Two broad types of database processing
OLTP (Online Transaction Processing)
– Short transactions
– simple queries
– touch small portions of data
– frequent updates
OLAP (Online Analytical Processing)
– Long transactions
– complex queries
– touch large portions of data
– infrequent updates
More terminology
Data warehousing
– bring data from operational (OLTP) sources into a single warehouse for analysis
Decision support system (DDS)
– infrastructure for data analysis
“Star” Schema
Fact table
– updated frequently, often append-only, very large
Dimension tables
– updated infrequently, not as large
Fact table references dimension tables.
Example:
Sales(storeID, itemID, custID, qty, price) – Fact table
Store(storeID, city, state) – Dimension table
Item(itemID, category, brand, color, size) – Dimension table
Customer(custID, name, address) – Dimension table
Foreign keys into dimension tables are called dimension attributes. Other attributes are dependent attributes.
OLAP queries
Join -> Filter -> Group -> Aggregate
Performance
– inherently very slow: special indexes, query processing techniques
– extensive use of materialized views
Data Cube (a.k.a. multidimensional OLAP)
– dimension data forms axes of “cube”
– fact (dependent) data in cells
– aggregated data on sides, edges, corner
Fact table uniqueness for data cube
Sales(storeID, itemID, custID, qty, price)
If dimension attributes not a key, must aggregate
Date can be used to create key
– Dimension or dependent? dimension, but no table
Drill down
Examining summary data, break out by dimension attribute
Example:
Select state, brand, Sum(qty*price)
From Sales F, Store S, Item If
Where F.storeID = S.storeID and F.itemID = I.itemID
Group By state, category, brand
Roll-up
Examining data, summarize by dimension attribute
Example:
Select brand, Sum(qty*price)
From Sales F, Store S, Item If
Where F.storeID = S.storeID and F.itemID = I.itemID
Group By brand
SQL Constructs
With Cube
Select dimension-attrs, aggregates
From tables
Where conditions
Group By dimension-attrs With Cube
Add to result: faces, edges and corner of cube using NULL values
With Rollup
Select dimension-attrs, aggregates
From tables
Where conditions
Group By dimension-attrs With Rollup
For hierarchical dimensions, portion of With Cube
DEMO
Star Join
select *
from Sales F, Store S, Item I, Customer C
where F.storeID = S.storeID and F.itemID = I.itemID and F.custID = C.custID;
select S.city, I.color, C.cName, F.price
from Sales F, Store S, Item I, Customer C
where F.storeID = S.storeID and F.itemID = I.ItemID and F.custID = C.custID and S.state = ‘CA’ and I.category = ‘Tshirt’ and C.age < 22 and F.price < 25;
Drilling Down
select storeID, custID, sum(price)
from Sales
group by storeID, custID;
select storeID, itemID, custID, sum(price)
from Sales
group by storeID, itemID, custID;
select state, county, category, sum(price)
from Sales F, Store S, Item I
where F.storeID = S.storeID and F.itemID = I.itemID
group by state, county, category;
select state, county, category, gender, sum(price)
from Sales F, Store S, Item I, Customer C
where F.storeID = S.storeID and F.itemID = I.itemID
group by state, county, category, gender;
Slicing
A query that analyzes a slice of the cube by contraining one of the dimensions.
select F.storeID, itemID, custID, sum(price)
from Sales F, Store S
wher F.storeID = S.storeID and state = 'WA'
group by F.storeID, itemID, custID;
Dicing
Slice in two dimensions to get a chunk of the Cube
select F.storeID, F.itemID, custID, sum(price)
from Sales F, Store S, Item I
wher F.storeID = S.storeID and state = 'WA' and color = 'red'
group by F.storeID, F.itemID, custID;
Rolling Up
select itemID, sum(price)
from Sales F
group by itemID;
select state, category, sum(price)
from Sales F, Store S, Item I
where F.storeID = S.storeID and F.itemID = I.itemID
group by state, category;
select state, gender, sum(price)
from Sales F, Store S, Customer C
where F.storeID = S.storeID
group by state, gender;
With Cube
select storeID, itemID, custID, sum(price)
from Sales
group by storeID, itemID, custID with cube;
select storeID, itemID, custID, sum(price)
from Sales
group by storeID, itemID, custID with cube(storeID, custID);
shows NULLs (ex: sales for store1, item1, any customer)
Can create Cube table to query Cube directly
gives pre-aggregated data
select C.*
from Cube C, Store S, Item I
where c.storeID = S.storeID and C.itemID = I.itemID and state = 'CA' and color = 'blue' and custID is null;
select C.*
from Cube C, Store S, Item I
where c.storeID = S.storeID and C.itemID = I.itemID and state = 'CA' and color = 'blue' and custID is not null;
no summarized data
With Rollup
select storeID, itemID, custID, sum(price)
from Sales F
group by storeID, itemID, custID with rollup
subsection of cube based on group by attributes
best for hierarchical analysis
select state, county, city, sum(price)
from Sales F, Store S
where F.storeID = S.storeID
group by state, county, city with rollup