Database Theory notes Database Theory Notes

E F Codd definitions

Database
a collection of tables (a.k.a. relations).
Relation
a table of values.
Row
(a.k.a. a tuple) - a collection of one-to-one related values.
Column
(a.k.a. field, a.k.a attribute) provides for an attribute value, related one-to-one with the primary key.  A column name specifies an attribute.
Schema
a column and its data type.

Keys

Primary key
a unique (non-Null) value for each row.
Foreign key
(a.k.a. secondary key) - repeated primary key field values in a many-side table - to retrieve many-side rows of a one-to-many relationship. Therefore, unlike primary key values, foreign key values will not be unique in the many-side table.

Normal Form(s) (see Chris J Date)

"Normalization" reduces redundancy in "degrees of normal form" i.e. the degree to which the following constraints are met. Each "degree of normal form" requires the previous degree, as a prerequisite.

1st Normal Form
duplicate table columns are eliminated.
2nd Normal Form
break out many-to-one rows (e.g. order items) into a separate table with a foreign key to the one-side (e.g. orders) table repeated in the foreign key field.
3rd Normal Form
remove columns whose row value doesn't have a one-to-one relationship with the primary key.
4th Normal Form
No column can have multiple values for the same row e.g. no PIC stack fields.
5th Normal Form
A table cannot be reconstructed from multiple tables (of fewer columns) without the component tables having the same key.

Data warehouse dimensional modeling

Fact
table contains subject/context measurements with foreign keys from surrounding dimension tables (a.k.a. star schema, since the dimension metrics revolve around the Facts). A Fact table compound primary key is usually comprised of the foreign keys from surrounding dimension tables. "Facts" are the organization metrics measurement labels.
Dimension
tables provide the who, what, when, where and how, representing the business, and contains foreign keys from the dimension tables, where time series (e.g. dates) and other dimensions (e.g. store location, salesman, product) are stored.
A Fact table is de-normalized, mostly in third normal form (3NF); dimensional tables are usually de-normalized second normal form (2NF).

When dimension tables are normalized (into separate tables), they appear as a snowflake (ergo: the Snowflake schema). Snowflake schema query performance is inferior to star schema query performance, since snowflake queries must join (more) tables than a de-normalized Star Fact table requires.

OLTP (On-Line Transaction Processing)

Normalization to reduce duplicates is important for performance ... in contrast to OLAP (On-Line Analytical Processing) where performance improves with the duplication of de-normalization.

W.H. (Bill) Inmon

http://www.inmoncif.com/home/

Ralph Kimball articles

Ralph Kimball Associates articles

Back