What is Normalization
Designing to eliminate data redundancy and prevent logical inconsistencies in data.
Prerequisites
Keys
- Primary Key
- An identifier that uniquely identifies a row
- Composite Key
- A primary key composed of multiple attributes
- Foreign Key
- A key used to reference another table
- Candidate Key
- A set of attributes that uniquely identifies a row, is irreducible, and minimal (minimum number of attributes)
- Super Key
- A combination of attributes that uniquely identifies a row, including extra attributes
- Irreducible
- A state where there are no extra attributes (≒ cannot reduce attributes further)
| Employee Number | ID | Name | Gender | Address | Phone Number |
|---|---|---|---|---|---|
| 1 | 1001 | Taro Yamada | Male | Chiyoda, Tokyo | 03-1234-5678 |
| 2 | 1002 | Hanako Tanaka | Female | Shibuya, Tokyo | 03-2345-6789 |
| 3 | 1003 | Jiro Suzuki | Male | Shinjuku, Tokyo | 03-3456-7890 |
| 4 | 1004 | Saburo Sato | Male | Minato, Tokyo | 03-4567-8901 |
| 5 | 1005 | Shiro Takahashi | Male | Meguro, Tokyo | 03-5678-9012 |
Primary Key: {Employee Number, ID}
Candidate Key: {Employee Number, ID}, {Phone Number}
Super Key: {Employee Number, ID}, {Employee Number, ID, Name}, {Employee Number, ID, Name, Gender} etc...
Functional Dependency
- Functional Dependency
- The property that if A is determined, B is also determined
- Partial Functional Dependency
- A property where an attribute is dependent on one of the candidate keys
- Full Functional Dependency
- A property where all non-key attributes are functionally dependent on the primary key
- Transitive Functional Dependency
- A functional dependency between non-key attributes where if A is determined, B is determined, and if B is determined, C is determined
- Join Dependency
- The property that decomposed relations can be joined back to the original relation (≒ lossless decomposition)
- Multivalued Dependency
- A relationship where B and C are independent regarding attributes A and C, and B depends on A
- A special case of join dependency
- Lossless Decomposition
- Decomposing a relation in such a way that the original relation can be reconstructed by joining the decomposed parts
Normal Forms
First Normal Form (1NF)
- Rows are not ordered vertically, and columns are not ordered horizontally
- Although SQL has an order, avoid writing queries that depend on order
- e.g., Avoid SELECT *, specify column positions in ORDER BY (ORDER BY 1), etc.
- Although SQL has an order, avoid writing queries that depend on order
- No duplicate rows
- No NULLs
- Each column contains only one value that satisfies the domain (data type)
Second Normal Form (2NF)
- Satisfies 1NF
- Eliminates partial functional dependencies, achieving full functional dependency
- All non-key attributes are fully functionally dependent on the candidate key
Third Normal Form (3NF)
- Satisfies 2NF
- Eliminates transitive functional dependencies
- No non-key attributes are transitively functionally dependent on the candidate key
Boyce-Codd Normal Form (BCNF)
- Satisfies 3NF
- Eliminates partial and transitive functional dependencies of candidate keys
- All non-trivial functional dependencies are removed, and no further lossless decomposition is possible due to functional dependencies
- Trivial {Order Number, Product Number} → {Product Number}
- Non-trivial {Product Name} → {Material Name}
Fourth Normal Form (4NF)
- Normalization due to multivalued dependencies
Fifth Normal Form (5NF)
- All non-trivial or implicit join dependencies are removed
Normalization
1NF
Non-normalized form
| Invoice Number | Product Number | Product Name |
|---|---|---|
| 1 | A001A002A003 | AppleOrangeBanana |
| 2 | A004A005A006 | GrapePearStrawberry |
1NF
| Invoice Number | Product Number | Product Name |
|---|---|---|
| 1 | A001 | Apple |
| 1 | A002 | Orange |
| 1 | A003 | Banana |
| 2 | A004 | Grape |
| 2 | A005 | Pear |
| 2 | A006 | Strawberry |
2NF
Before 2NF
| Invoice Number | Product Number | Product Name |
|---|---|---|
| 1 | A001 | Apple |
| 1 | A002 | Orange |
| 1 | A003 | Banana |
| 2 | A004 | Grape |
| 2 | A005 | Pear |
| 2 | A006 | Strawberry |
The candidate key {Invoice Number, Product Number} and the non-key attribute Product Name are partially functionally dependent.
2NF Sales Detail Table
| Invoice Number | Product Number |
|---|---|
| 1 | A001 |
| 1 | A002 |
| 1 | A003 |
| 2 | A004 |
| 2 | A005 |
| 2 | A006 |
Product Table
| Product Number | Product Name |
|---|---|
| A001 | Apple |
| A002 | Orange |
| A003 | Banana |
| A004 | Grape |
| A005 | Pear |
| A006 | Strawberry |
Partial functional dependency only occurs when the candidate key is a composite key, so it does not occur when the candidate key is a single attribute.
3NF
Before 3NF
| Invoice Number | Product Number | Customer Number | Customer Name |
|---|---|---|---|
| 1 | A001 | B1 | Apple Corp |
| 1 | A002 | B1 | Orange Corp |
| 1 | A003 | B1 | Banana Corp |
| 2 | A004 | C1 | Grape Corp |
| 2 | A005 | C1 | Pear Corp |
| 2 | A006 | C1 | Strawberry Corp |
When the primary key is {Invoice Number, Product Number}, {Invoice Number, Customer Number}→{Customer Number}→{Customer Name} is transitively dependent.
3NF Sales Detail Table
| Invoice Number | Product Number |
|---|---|
| 1 | A001 |
| 1 | A002 |
| 1 | A003 |
| 2 | A004 |
| 2 | A005 |
| 2 | A006 |
Customer Table
| Customer Number | Customer Name |
|---|---|
| B1 | Apple Corp |
| B1 | Orange Corp |
| B1 | Banana Corp |
| C1 | Grape Corp |
| C1 | Pear Corp |
| C1 | Strawberry Corp |
BCNF
Before BCNF
| Name | Subject | Teacher |
|---|---|---|
| Bob | Math | Yamada |
| Tom | Math | Sato |
| John | Math | Suzuki |
| John | English | Ando |
When the primary key is {Name, Subject}, there is a functional dependency {Teacher}→{Subject}, and the determinant (A in A→B) is not a super key.
BCNF Enrollment Table
| Name | Subject |
|---|---|
| Bob | Math |
| Tom | Math |
| John | Math |
| John | English |
Teacher Table
| Teacher | Subject |
|---|---|
| Yamada | Math |
| Sato | Math |
| Suzuki | Math |
| Ando | English |
The information {Name, Subject}→{Teacher} is lost, so it is unclear who John's math teacher is.
4NF
Before 4NF
| Name | Hobby | Favorite Food |
|---|---|---|
| Tanaka | Baseball | Ramen |
| Suzuki | Soccer | Sushi |
| Sato | Basketball | Curry |
When the primary key is {Name, Hobby, Favorite Food}, multiple attributes are determined by {Name}→{Hobby}→{Favorite Food}.
4NF Hobby Table
| Name | Hobby |
|---|---|
| Tanaka | Baseball |
| Suzuki | Soccer |
| Sato | Basketball |
Favorite Food Table
| Name | Favorite Food |
|---|---|
| Tanaka | Ramen |
| Suzuki | Sushi |
| Sato | Curry |
5NF
Before 5NF
| Store | Stock Item | Manufacturer |
|---|---|---|
| Tokyo | TV | Company A |
| Tokyo | TV | Company B |
| Tokyo | PC | Company A |
| Kanagawa | TV | Company A |
{Store}→{Stock Item}, {Store}→{Manufacturer}, {Stock Item}→{Manufacturer} can be decomposed into multiple parts.
5NF Stock Table
| Store | Stock Item |
|---|---|
| Tokyo | TV |
| Tokyo | TV |
| Tokyo | PC |
| Kanagawa | TV |
Supplier Table
| Store | Supplier |
|---|---|
| Tokyo | Company A |
| Tokyo | Company B |
| Tokyo | Company A |
| Kanagawa | Company A |
Manufacturer Table
| Store | Manufacturer |
|---|---|
| Tokyo | Company A |
| Tokyo | Company B |
| Tokyo | Company A |
| Kanagawa | Company A |
Thoughts
I am not very confident in my understanding beyond BCNF...
References
- amzn.to - Learn Database Practice from Theory ~ Efficient SQL with Relational Model (WEB+DB PRESS plus)
- e-words.jp - Partial Functional Dependency
- datascience-lab.sakura.ne.jp - Understanding Various Types of "Keys"
- poppingcarp.com - Various Keys like Primary Key, Candidate Key, Foreign Key, Super Key | Database Basics
- koseki2580.github.io - Normalization
- zenn.dev - Understanding Database Normalization with Illustrations
- youtube.com - Database Normalization (1NF, 2NF, 3NF)
tabibou.com - Database Normalization from 1NF to 5NF, Boyce-Codd Normal Form Examples- poppingcarp.com - Various Keys like Primary Key, Candidate Key, Foreign Key, Super Key | Database Basics