Introduction to Databases Statement of Accomplishment

Posted December 15th, 2011 by Megan Taylor

Just got my statement of accomplishment from Jennifer Widom for the Introduction to Databases class!

Congratulations! You have successfully completed the free online offering of Introduction to Databases, offered October through December, 2011. To successfully complete this free online class, students were required to watch lectures, complete quizzes and automated exercises, and take a midterm exam and a final exam. According to our automated system, your scores on these components were as follows:

Quizzes: 57 out of a maximum of 67
Exercises: 69 out of a maximum of 71
Exams: 26 out of a maximum of 38
Scaled total (exercises doubled, exams tripled): 273 out of a maximum of 323

We thank you for your interest in studying databases, and for participating in our ambitious experiment to deliver quality educational content to a worldwide audience.

Daily Summary

Posted December 15th, 2011 by Megan Taylor

I did:

Figured out what changes I need to make to my boundary service JavaScript files.

Checked out this crazy Facebook Timeline biznass.

Poked around online programming classes. I think I’m gonna work on mastering JavaScript next. Or at least one of the libraries.

I learned:

Spiders can make more than one kind of silk, and have multiple spigots in their spinnerets. Spiders are awesome.

If I bring homemade toffee to work, people will eat it.

How to find New York State geometry data with Google’s geocoding API

I read:

NPR’s StateImpact project explores regional topics through focused, data-driven journalism
The network is taking the resources of a national news organization and applying them to the local level.

24 Ways: Extracting the Content
Use page tables to break Content down into content.

Dalek Smash Cake & Doctor Who-Themed Birthday Party

Create Stunning HTML Email that Just Works by Matthew Patterson
We really need to start from scratch with our email templates at work.

Multi-Device Web Design: An Evolution

Intro to Databases: Online Analytical Processing (OLAP)

Posted December 4th, 2011 by Megan Taylor

Two broad types of database processing
OLTP (Online Transaction Processing)
– Short transactions
– simple queries
– touch small portions of data
– frequent updates
OLAP (Online Analytical Processing)
– Long transactions
– complex queries
– touch large portions of data
– infrequent updates

More terminology
Data warehousing
– bring data from operational (OLTP) sources into a single warehouse for analysis
Decision support system (DDS)
– infrastructure for data analysis

“Star” Schema
Fact table
– updated frequently, often append-only, very large
Dimension tables
– updated infrequently, not as large
Fact table references dimension tables.

Example:
Sales(storeID, itemID, custID, qty, price) – Fact table
Store(storeID, city, state) – Dimension table
Item(itemID, category, brand, color, size) – Dimension table
Customer(custID, name, address) – Dimension table

Foreign keys into dimension tables are called dimension attributes. Other attributes are dependent attributes.

OLAP queries
Join -> Filter -> Group -> Aggregate

Performance
– inherently very slow: special indexes, query processing techniques
– extensive use of materialized views

Data Cube (a.k.a. multidimensional OLAP)
– dimension data forms axes of “cube”
– fact (dependent) data in cells
– aggregated data on sides, edges, corner

Fact table uniqueness for data cube
Sales(storeID, itemID, custID, qty, price)
If dimension attributes not a key, must aggregate
Date can be used to create key
– Dimension or dependent? dimension, but no table

Drill down
Examining summary data, break out by dimension attribute

Example:
Select state, brand, Sum(qty*price)
From Sales F, Store S, Item If
Where F.storeID = S.storeID and F.itemID = I.itemID
Group By state, category, brand

Roll-up
Examining data, summarize by dimension attribute

Example:
Select brand, Sum(qty*price)
From Sales F, Store S, Item If
Where F.storeID = S.storeID and F.itemID = I.itemID
Group By brand

SQL Constructs

With Cube
Select dimension-attrs, aggregates
From tables
Where conditions
Group By dimension-attrs With Cube
Add to result: faces, edges and corner of cube using NULL values

With Rollup
Select dimension-attrs, aggregates
From tables
Where conditions
Group By dimension-attrs With Rollup
For hierarchical dimensions, portion of With Cube

DEMO

Star Join

select *
from Sales F, Store S, Item I, Customer C
where F.storeID = S.storeID and F.itemID = I.itemID and F.custID = C.custID;

select S.city, I.color, C.cName, F.price
from Sales F, Store S, Item I, Customer C
where F.storeID = S.storeID and F.itemID = I.ItemID and F.custID = C.custID and S.state = ‘CA’ and I.category = ‘Tshirt’ and C.age < 22 and F.price < 25;

Drilling Down

select storeID, custID, sum(price)
from Sales
group by storeID, custID;

select storeID, itemID, custID, sum(price)
from Sales
group by storeID, itemID, custID;

select state, county, category, sum(price)
from Sales F, Store S, Item I
where F.storeID = S.storeID and F.itemID = I.itemID
group by state, county, category;

select state, county, category, gender, sum(price)
from Sales F, Store S, Item I, Customer C
where F.storeID = S.storeID and F.itemID = I.itemID
group by state, county, category, gender;

Slicing
A query that analyzes a slice of the cube by contraining one of the dimensions.

select F.storeID, itemID, custID, sum(price)
from Sales F, Store S
wher F.storeID = S.storeID and state = ‘WA’
group by F.storeID, itemID, custID;

Dicing
Slice in two dimensions to get a chunk of the Cube

select F.storeID, F.itemID, custID, sum(price)
from Sales F, Store S, Item I
wher F.storeID = S.storeID and state = ‘WA’ and color = ‘red’
group by F.storeID, F.itemID, custID;

Rolling Up

select itemID, sum(price)
from Sales F
group by itemID;

select state, category, sum(price)
from Sales F, Store S, Item I
where F.storeID = S.storeID and F.itemID = I.itemID
group by state, category;

select state, gender, sum(price)
from Sales F, Store S, Customer C
where F.storeID = S.storeID
group by state, gender;

With Cube

select storeID, itemID, custID, sum(price)
from Sales
group by storeID, itemID, custID with cube;

select storeID, itemID, custID, sum(price)
from Sales
group by storeID, itemID, custID with cube(storeID, custID);

shows NULLs (ex: sales for store1, item1, any customer)

Can create Cube table to query Cube directly
gives pre-aggregated data

select C.*
from Cube C, Store S, Item I
where c.storeID = S.storeID and C.itemID = I.itemID and state = ‘CA’ and color = ‘blue’ and custID is null;

select C.*
from Cube C, Store S, Item I
where c.storeID = S.storeID and C.itemID = I.itemID and state = ‘CA’ and color = ‘blue’ and custID is not null;
no summarized data

With Rollup

select storeID, itemID, custID, sum(price)
from Sales F
group by storeID, itemID, custID with rollup

subsection of cube based on group by attributes
best for hierarchical analysis

select state, county, city, sum(price)
from Sales F, Store S
where F.storeID = S.storeID
group by state, county, city with rollup

Intro to Databases: Recursion in SQL, Nonlinear and Mutual Recursion

Posted December 4th, 2011 by Megan Taylor

Linear Recursion
R As (base query (not R) Union (removes duplicates) recursive query (one reference to R))

Example 1: Ancestors
ParentOf(parent, child)
Find all of Mary’s ancestors

Linear
with recursive
Ancestor (a, d) as (select parent as a, child as d, from ParentOf
union
select Ancestor.a, ParentOf.child as d
from Ancestor, ParentOf
where Ancestor.d = ParentOf.parent)
select a from Ancestor where d = ‘Mary’;

Non-Linear
with recursive
Ancestor (a, d) as (select parent as a, child as d from ParentOf
union
select A1.a, A2.d
from Ancestor A1, Ancestor A2
where A1.d = A2.a)
select a from Ancestor where d = ‘Mary’

Nonlinear (versus linear)
+ query looks cleaner
+ converges faster
- harder to implement
SQl standard only requires linear

Mututal Recursion

With recursive
R1(A1, A2, …, Am) As (query-1), – R2
R2 As (query-2), – R1
Rn As (query-n)

Example: Hubs and Authorities
Link(src, dest)
HubStart(node)
AuthStart(node)

Hub points to >= 3 Authority
Authority points to >= 3 Hub

with recursive
Hub(node) as (select node from HubStart
union
select src as node from Link L
where dest in (select node from Auth)
group by src having count(*) >= 3),
Auth(node) as (select node from AuthStart
union
select dest as node from Link L
where src in (select node from Hub)
group by dest having count(*) >= 3)
select * from Hub;

with recursive
Hub(node) as (select node from HubStart
union
select src as node from Link L
where src not in (select node from Auth)
and dest in (select node from Auth)
group by src having count(*) >= 3),
Auth(node) as (select node from AuthStart
union
select dest as node from Link L
where dest not in (select node from Hub)
and src in (select node from Hub)
group by dest having count(*) >= 3)
select * from Hub;

non-detirministic, not supported by SQL standard.

Recursion with Aggregation

Example:
P(x)
with recursive
R(x) as (select x from P
union
select sum(x) from R)
select * from R;

FAIL, disallowed in SQL standard

SQL Recursive With Statement
Extends expressiveness of SQL
- basic functionality: linear recursion
- extended functionality: nonlinear recursion, mutual recursion
- Disallowed: negative recursive subqueries, Aggregation

Intro to Databases: Recursion in SQL, Basic recursive WITH statement

Posted December 4th, 2011 by Megan Taylor

SQL is not a “Turing complete” language
- simple, convinient, declarative
- expressve enough for most database queries
- but basic SQL can’t express unbounded computations

Example 1: Ancestors
ParentOf(parent, child)
Find all of Mary’s ancestors

Example 2: Company hierarchy
Employee (ID, salary)
Manager(mID, eID)
Project(name, mgrID)
Find total salary cost of project ‘X’

Example 3: Airline flights
Flight(orig, dest, airline, cost)
Find cheapest way to fly from ‘A’ to ‘B’

SQL With statement
With R1(A1, A2, …, Am) As (query-1),
R2 As (query-2),
Rn As (query-n)

With Recursive R1(A1, A2, …, Am) As (query-1),
R2 As (query-2),
Rn As (query-n)

With Recursive
R As (base query (not R) Union (removes duplicates) recursive query (R))

DEMO

Example 1: Ancestors
ParentOf(parent, child)
Find all of Mary’s ancestors

with recursive
Ancestor (a, d) as (select parent as a, child as d, from ParentOf
union
select Ancestor.a, ParentOf.child as d
from Ancestor, ParentOf
where Ancestor.d = ParentOf.parent)
select a from Ancestor where d = ‘Mary’

Example 2: Company hierarchy
Employee (ID, salary)
Manager(mID, eID)
Project(name, mgrID)
Find total salary cost of project ‘X’

with recursive
Superior as (select * from Manager
union
select S.mID, M.eID
from Superior S, Manager M
where S.eID = M.mID)
select sum(salary)
from Employee
where ID in
(select mgrID from Project where name = ‘X’
union
select eID from Project, Superior
where Project.name = ‘X’ AND Project.mgrID = Superior.mID)

or

with recursive
Xemps(ID) as (select mgrID as ID from Project where name = ‘X’
union
select distinct eID as ID
from Manager M, Xemps X
where M.mID = X.ID)
select sum(salary)
from Employee
where ID in (select ID from Xemps)

Find the total salary cost of two projects ‘Y’ and ‘Z’

with recursive
Yemps(ID) as (select mgrID as ID from Project where name = ‘Y’
union
select distinct eID as ID
from Manager M, Yemps Y
where M.mID = Y.ID),
Zemps(ID) as (select mgrID as ID from Project where name = ‘Z’
union
select distinct eID as ID
from Manager M, Zemps Z
where M.mID = Z.ID)
select ‘Y-total’, sum(salary)
from Employee
where ID in (select ID from Yemps)
union
select ‘Z-total’, sum(salary)
from Employee
where ID in (select ID from Zemps)

Example 3: Airline flights
Flight(orig, dest, airline, cost)
Find cheapest way to fly from ‘A’ to ‘B’

with recursive
Route(orig, dest, total) as
(select orig, dest, cost as total from Flight
union
select R.orig, F.dest, cost+total as total
from Route R, Flight F
where R.dest = F.orig)
select min(total) from Route
where orig = ‘A’ and dest = ‘B’

or

with recursive
FromA(dest, total) as
(select dest, cost as total from Flight where orig = ‘A’
union
select F.dest, cost+total as total
from FromA FA, Flight F
where FA.dest = F.orig)
select min(total) from FromA where orig = ‘A’

or

with recursive
ToB(orig, total) as
(select orig, cost as total from Flight where dest = ‘B’
union select F.orig, cost+total as total
from Flight F, ToB TB
where F.dest = TB.orig)
select min(total) from ToB where orig = ‘A’

Beware infinite loops!

with recursive
Route(orig, dest, total) as
(select orig, dest, cost as total from Flight
union
select R.orig, F.dest, cost+total as total
from Route R, Flight F
where R.dest = F.orig)
select * from Route
where orig = ‘A’ and dest = ‘B’ limit 20

with recursive
Route(orig, dest, total, length) as
(select orig, dest, cost as total, 1 from Flight
union
select R.orig, F.dest, cost+total as total, R.length+1 as length
from Route R, Flight F
where R.length < 10 and R.dest = F.orig)
select * from Route
where orig = ‘A’ and dest = ‘B’

Intro to Databases: Views, Materialized Views

Posted December 3rd, 2011 by Megan Taylor

Why use materialized views?
Same as virtual views, plus improve query performance

Virtual views
View V = ViewQuery(R1, R…, Rn)
Schema of V is schema of query result
Query Q involving V, conceptually:
V:= ViewQuery(R1, R…, Rn)
Evaluate Q
In reality, Q rewritten to use R1, R…, Rn instead of V
Note: Ri could itself be a view

Materialized views
View V = ViewQuery(R1, R…, Rn)
Create table V with schema of query result
Execute ViewQuery and put results in V
Queries refer to V as if its a table, because it is
BUT…
V could be ginormous
Modification to R1, R2, R…, Rn -> recompute or modify V

Create Materialized View CA-CS As
Select C.cName, S.sName
From College C, Student S, Apply A
Where C.cName = A.cName and S.sID = A.sID and C.state = ‘CA’ and A.major = ‘CS’

Modifications to base data invalidate View

Modifications
Good news: just update the stored table
Bad News: base tables must stay in sync
– same issues as virtual views

Picking which materialized views to create
Efficiency benefits of a materialized view depend on:
– size of data
– complexity of View- number of queries using view
– number of modifications affecting view
– also “incremental maintenance” versus full recomputation

Intro to Databases: Views, Automatic View Modifications

Posted December 3rd, 2011 by Megan Taylor

2. Restrict views and modifications to that translation to base table modifications is meaningful and unambiguous

Restrictions in SQL standard for updatable views
1. Select(no distinct) on single table T
2. Attributes in table T not in view can be NULL or have default value
3. Subqueries must not refer to T, can refer to other tables
4. No GROUP BY or aggregation

Demo: simple college admissions database
College(cName, state, enrollment)
Student(sID, sName, GPA, sizeHS)
Apply(sID, cName, major, decision)

Example 1

create view CSaccept as
select sID, cName
from Apply
where major = ‘CS’ and decision = ‘Y’;

delete from CSaccept where sID = 123; – works, good

insert into CSaccept values (333, ‘Berkeley’); – works, bad

Example 2

create view CSEE as
select sID, cName, major
from Apply
where major = ‘CS’ or major = ‘EE’;

insert into CSEE values (111, ‘Berkeley’, ‘CS’) works, good

insert into CSEE values (111, ‘Berkeley’, ‘psychology’) works, bad

Example 4 Fix bad inserts

create view CSaccept as
select sID, cName
from Apply
where major = ‘CS’ and decision = ‘Y’
with check option;

insert into CSaccept values (444, ‘Berkeley’); – Error

create view CSEE as
select sID, cName, major
from Apply
where major = ‘CS’ or major = ‘EE’
with check option;

insert into CSEE values (444, ‘Berkeley’, ‘psychology’); – Error
insert into CSEE values (444, ‘Berkeley’, ‘CS’); – Works

Example 5

create view Bio as
select * from Student
where sID in (select sID from Apply where major like’bio%’);

delete from Bio where sName = ‘Bob’; – works, deletes from Student
insert into Bio values (555, ‘Karen’, 3.9, 1000); – works, only adds to Student, not in View

create view Bio2 as
select * from Student
where sID in (select sID from Apply where major like’bio%’)
with check option;

insert into Bio values (555, ‘Karen’, 3.9, 1000); – Error

Check option affects efficiency.

Example 6 Views that can’t be modified

create view HSgpa as
select sizeHS, avg(gpa)
From Sutdent
group by sizeHS;

create view Majors as
select distinct major from Apply;

create view NonUnique as
select * from Student S1
where exists (select * from Student S2
where S1.sID <> S2.sID
and S2.GPA = S1.GPA and S2.sizeHS = S1.sizeHS);

Example 7 View with join

create view Stan(sID, aID, sName, major) as
select Student.sID, Apply.sID, sName, major
from Student, Apply
where Student.sID = Apply.sID and cName = ‘Stanford’;

update Stan set sName = ‘CS major’ where major = ‘CS’

Cannot delete from join view, be careful with insertions.

Intro to Databases: Views, View Modifications using Triggers

Posted December 2nd, 2011 by Megan Taylor

Unlike queries, cannot be automated in general

1. Rewriting process specified explicitly by view creator
- Using special INSTEAD-OF trigger

Demo: simple college admissions database
College(cName, state, enrollment)
Student(sID, sName, GPA, sizeHS)
Apply(sID, cName, major, decision)

Example 1

create view CSaccept as
select sID, cName
from Apply
where major = ‘CS’ and decision = ‘Y’;

delete from CSaccept where sID = ’123′; – ERROR

Trigger
create trigger CSacceptDelete
instead of delete on CSaccept
for each row
begin
delete from Apply
where sID = Old.sID
and cName = Old.cName
and major = ‘CS’ and decision = ‘Y’;
end;

uupdate CSaccept set cName = ‘CMU’ where sID = 345;

Trigger
create trigger CSacceptUpdate
instead of update of cName on CSaccept
for each row
begin
update Apply
set cName = New.cName
where sID = Old.sID
and cName = Old.cName
and major = ‘CS’ and decision = ‘Y’;
end;

Example 2

create view CSEE as
select sID, cName, major
from Apply
where major = ‘CS’ or major = ‘EE’;

insert into CSEE values (111, ‘Berkeley’, ‘CS’)

Trigger
create trigger CSEEinsert
instead of insert on CSEE
for each row
begin
insert into Apply values (New.sID, New.cName, New.major, null);
end;

insert into CSEE values (222, ‘Berkeley’, ‘biology’)

Need to write better triggers to prevent users from write insert commands against a view that affect the base tables but don’t get reflected in the view because they don’t satisfy the conditions.

Better Trigger
create trigger CSEEinsert
instead of insert on CSEE
for each row
when New.major = ‘CS’ or New.major = ‘EE’
begin
insert into Apply values (New.sID, New.cName, New.major, null);
end;

Intro to Databases: Views, View Modifications – Introduction

Posted December 2nd, 2011 by Megan Taylor

Querying views
Once V defined, can reference V like any table
Queries involving V rewritten to use base tables

Modifying Views
Once V defined, can we modify V like any table?
Doesn’t make sense: V is not stored
Has to make sense: Views are some users’ entire ‘view’ of the database

Solution: Modifications to V rewritten to modify base tables

R(A, B) -> (1, 2)
V = Select A from R (1)
user says insert (3)
What would R.B be?

1. Rewriting process specified explicitly by view creator (INSTEAD-OF trigger)
+ Can handle all modifications
- No guarantee of correctness

2. Restrict views and modifications to that translation to base table modifications is meaningful and unambiguous (SQL standard)
+ No user intervention
- Restrictions are significant

Intro to Databases: Views, Defining and using Views

Posted December 2nd, 2011 by Megan Taylor

Three-level vision of database
physical
conceptual
logical

A view is a query over a relation.

Why use views?
Hide some data from some users
make some queries easier / more natural
modularity of database access

Real applications tend to use lots of views.

Defining and using views

View V = ViewQuery(R1, R…, Rn)
Schema of V is schema of query result
Query Q involving V, conceptually:
V:= ViewQuery(R1, R…, Rn)
Evaluate Q
In reality, Q rewritten to use R1, R…, Rn instead of V
Note: Ri could itself be a view

SQL Syntax
Create View Vname As

or
Create View Vname(A1, A…, An) As

Demo: simple college admissions database
College(cName, state, enrollment)
Student(sID, sName, GPA, sizeHS)
Apply(sID, cName, major, decision)

create view CSaccept as
select sID, cName
from Apply
where major = ‘CS’ and decision = ‘Y’;

Can refer to view as if its a table.

create view CSberk as
select Student.sID, sName, GPA
From Student, CSaccept
where Student.sID = CSaccept.sID and cName = ‘Berkeley’ and sizeHS > 500;

create view Mega as
select College.cName, state, enrollment, Student.sID, sName, GPA, sizeHS, major, decision
from College, Student, Apply
where College.cName = Apply.cName and Student.sID = Apply.sID;