Megan Taylor

front-end dev, volunteacher, news & data junkie, bibliophile, Flyers fan, sci-fi geek and kitteh servant

Intro to Databases: Online Analytical Processing (OLAP)

Two broad types of database processing
OLTP (Online Transaction Processing)
– Short transactions
– simple queries
– touch small portions of data
– frequent updates
OLAP (Online Analytical Processing)
– Long transactions
– complex queries
– touch large portions of data
– infrequent updates

More terminology
Data warehousing
– bring data from operational (OLTP) sources into a single warehouse for analysis
Decision support system (DDS)
– infrastructure for data analysis

“Star” Schema
Fact table
– updated frequently, often append-only, very large
Dimension tables
– updated infrequently, not as large
Fact table references dimension tables.

Example:
Sales(storeID, itemID, custID, qty, price) – Fact table
Store(storeID, city, state) – Dimension table
Item(itemID, category, brand, color, size) – Dimension table
Customer(custID, name, address) – Dimension table

Foreign keys into dimension tables are called dimension attributes. Other attributes are dependent attributes.

OLAP queries
Join -> Filter -> Group -> Aggregate

Performance
– inherently very slow: special indexes, query processing techniques
– extensive use of materialized views

Data Cube (a.k.a. multidimensional OLAP)
– dimension data forms axes of “cube”
– fact (dependent) data in cells
– aggregated data on sides, edges, corner

Fact table uniqueness for data cube
Sales(storeID, itemID, custID, qty, price)
If dimension attributes not a key, must aggregate
Date can be used to create key
– Dimension or dependent? dimension, but no table

Drill down
Examining summary data, break out by dimension attribute

Example:
Select state, brand, Sum(qty*price)
From Sales F, Store S, Item If
Where F.storeID = S.storeID and F.itemID = I.itemID
Group By state, category, brand

Roll-up
Examining data, summarize by dimension attribute

Example:
Select brand, Sum(qty*price)
From Sales F, Store S, Item If
Where F.storeID = S.storeID and F.itemID = I.itemID
Group By brand

SQL Constructs

With Cube
Select dimension-attrs, aggregates
From tables
Where conditions
Group By dimension-attrs With Cube
Add to result: faces, edges and corner of cube using NULL values

With Rollup
Select dimension-attrs, aggregates
From tables
Where conditions
Group By dimension-attrs With Rollup
For hierarchical dimensions, portion of With Cube

DEMO

Star Join

select *
from Sales F, Store S, Item I, Customer C
where F.storeID = S.storeID and F.itemID = I.itemID and F.custID = C.custID;

select S.city, I.color, C.cName, F.price
from Sales F, Store S, Item I, Customer C
where F.storeID = S.storeID and F.itemID = I.ItemID and F.custID = C.custID and S.state = ‘CA’ and I.category = ‘Tshirt’ and C.age < 22 and F.price < 25; Drilling Down select storeID, custID, sum(price) from Sales group by storeID, custID; select storeID, itemID, custID, sum(price) from Sales group by storeID, itemID, custID; select state, county, category, sum(price) from Sales F, Store S, Item I where F.storeID = S.storeID and F.itemID = I.itemID group by state, county, category; select state, county, category, gender, sum(price) from Sales F, Store S, Item I, Customer C where F.storeID = S.storeID and F.itemID = I.itemID group by state, county, category, gender; Slicing A query that analyzes a slice of the cube by contraining one of the dimensions. select F.storeID, itemID, custID, sum(price) from Sales F, Store S wher F.storeID = S.storeID and state = 'WA' group by F.storeID, itemID, custID; Dicing Slice in two dimensions to get a chunk of the Cube select F.storeID, F.itemID, custID, sum(price) from Sales F, Store S, Item I wher F.storeID = S.storeID and state = 'WA' and color = 'red' group by F.storeID, F.itemID, custID; Rolling Up select itemID, sum(price) from Sales F group by itemID; select state, category, sum(price) from Sales F, Store S, Item I where F.storeID = S.storeID and F.itemID = I.itemID group by state, category; select state, gender, sum(price) from Sales F, Store S, Customer C where F.storeID = S.storeID group by state, gender; With Cube select storeID, itemID, custID, sum(price) from Sales group by storeID, itemID, custID with cube; select storeID, itemID, custID, sum(price) from Sales group by storeID, itemID, custID with cube(storeID, custID); shows NULLs (ex: sales for store1, item1, any customer) Can create Cube table to query Cube directly gives pre-aggregated data select C.* from Cube C, Store S, Item I where c.storeID = S.storeID and C.itemID = I.itemID and state = 'CA' and color = 'blue' and custID is null; select C.* from Cube C, Store S, Item I where c.storeID = S.storeID and C.itemID = I.itemID and state = 'CA' and color = 'blue' and custID is not null; no summarized data With Rollup select storeID, itemID, custID, sum(price) from Sales F group by storeID, itemID, custID with rollup subsection of cube based on group by attributes best for hierarchical analysis select state, county, city, sum(price) from Sales F, Store S where F.storeID = S.storeID group by state, county, city with rollup

December 4, 2011 | Comments Off on Intro to Databases: Online Analytical Processing (OLAP) | Categories: Posts | Permalink

Comments are closed.

%d bloggers like this: