✅ What is an Anomaly in a database?
✅ What are the types of anomalies?
- Insertion Anomaly
- Update Anomaly
- Deletion Anomaly
✅ What is an insertion anomaly? Can you explain each with an example?
- ✔️ Insertion Anomaly: when we cannot add data to a table without adding unrelated, empty data
1
2
3
4
5
6
| There is a student table with student score. If a student did not take any exams yet, how do we store data? This is insertion anomaly.
| StudentID | StudentName | TestScore |
|------------|-------------|-----------|
| 1001 | Alice | NULL |
|
- we are forced to enter
NULL
into TestScore
✅ What is an update anomaly? Can you explain each with an example?
- ✔️ Update Anomaly: when duplicate data is stored in multiple rows, and only some get updated
- result in inconsistend information
1
2
3
4
5
6
| If one teacher teaches more than one subject
| Course | Instructor | Phone |
|--------|------------|-------------|
| Math | John Smith | 123-4567 |
| Physics| John Smith | 987-6543 ❌ |
|
✅ What is a deletion anomaly? Can you explain each with an example?
- ✔️ Deletion Anomaly: deleting a piece of data also deletes important information unintentionally
1
2
3
4
5
6
| If a student drops out of a course and we delete that row
and end up losing all student info
| StudentID | StudentName | Course |
|-----------|-------------|---------|
| 1001 | Alice | Math | ← Deleting this removes student 1001 entirely
|
✅ To eliminate anomalies, how far should normalization go?
✅ What is Functional Dependency?
If we know X, then we know Y
- if you know one piece of data, you know another piece
1
2
3
| | StudentID | StudentName | Course |
|-----------|-------------|---------|
| 1001 | Alice | Math |
|
studentID
and studentName
has functional dependency
✅ What is a Full (Complete) Functional Dependency?
attribute depends on the entire primary key
happens when primary key is made up of two or more columns
1
2
3
4
| | StudentID | courseID | Grade |
|-----------|-------------|---------|
| 1001 | ab123 | A+ |
| 1001 | cd345 | B- |
|
grade
depends on both studentID
and courseID
✅ What is a Partial Functional Dependency?
attirbute depends on only part of a composite key
🔴 problem, creates repetition and data redundancy
1
2
3
4
| | StudentID | courseID | StudentName | courseName |
|-----------|-------------|---------------|-------------|
| 1001 | ab123 | Alice | math |
| 1001 | cd345 | Alice | science |
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| ✔️ student table
| StudentID | StudentName |
|-----------|-------------|
| 1001 | Alice |
✔️ course table
| courseID | courseName |
|-----------|-------------|
| ab123 | math |
| cd345 | science |
✔️ enrollment table
| StudentID | courseID |
|-----------|-------------|
| 1001 | ab123 |
| 1001 | cd345 |
|
✅ What is a Transitive Dependency?
A determine B, B determine C
thus, A also determines C indirectly
- 🔴 problem, update anoamilies
- data inconsistency problem
- redundancy
1
2
3
4
5
6
| | StudentID | AdvisorID | AdvisorName |
| --------- | --------- | ----------- |
| 1001 | A1 | Prof. Kim |
| 1002 | A2 | Prof. Lee |
| 1003 | A1 | Prof. Kim |
|
1
2
3
4
5
6
7
8
9
10
11
12
13
| - student table
| StudentID | AdvisorID |
| --------- | --------- |
| 1001 | A1 |
| 1002 | A2 |
| 1003 | A1 |
- advisor table
| AdvisorID | AdvisorName |
| --------- | ----------- |
| A1 | Prof. Kim |
| A2 | Prof. Lee |
|
✅ What is normalization in databases?
1
2
3
| | StudentID | StudentName | Hobbies |
|-----------|-------------|-------------------|
| 1001 | Alice | reading, swimming |
|
- 💊 after 1NF
- each hobby is stored in separate row or separate table
1
2
3
4
| | StudentID | StudentName | Hobby |
|-----------|-------------|----------|
| 1001 | Alice | reading |
| 1001 | Alice | swimming |
|
1
2
3
4
5
6
7
8
9
10
11
12
| - student table
| StudentID | Name |
| --------- | ---- |
| 1 | John |
- hobby table
| StudentID | Hobby |
| --------- | -------- |
| 1 | Reading |
| 1 | Swimming |
|
1
2
3
4
| | StudentID | courseID | StudentName | courseName |
|-----------|-------------|---------------|-------------|
| 1001 | ab123 | Alice | math |
| 1001 | cd345 | Alice | science |
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| ✔️ student table
| StudentID | StudentName |
|-----------|-------------|
| 1001 | Alice |
✔️ course table
| courseID | courseName |
|-----------|-------------|
| ab123 | math |
| cd345 | science |
✔️ enrollment table
| StudentID | courseID |
|-----------|-------------|
| 1001 | ab123 |
| 1001 | cd345 |
|
1
2
3
4
| | StudentID | courseID | courseName |
|-----------|-------------|--------------|
| 1001 | ab123 | math |
| 1001 | cd345 | science |
|
1
2
3
4
5
6
7
8
9
10
11
12
13
| - enrollment table
| StudentID | CourseID |
| --------- | -------- |
| 1001 | ab123 |
| 1001 | cd345 |
- course table
| CourseID | CourseName |
| -------- | ---------- |
| ab123 | math |
| cd345 | science |
|
1
2
3
4
5
6
| | StudentID | StudentName | RoomNumber |
|-----------|-------------|------------|
| 1001 | Alice | 101 |
| 1002 | Danny | 102 |
| 1003 | Smith | 203 |
| 1004 | Lilly | 203 |
|
- 👎🏻 example of violating BCNF
- also violates 2NF
1
2
3
4
5
| | StudentID | Course | Instructor |
| --------- | ------ | ---------- |
| 1001 | Math | Prof. Kim |
| 1002 | Math | Prof. Kim |
| 1003 | CS | Prof. Lee |
|
- primary key:
StudentID
, Course
- however,
Instructor
relys only on Course
➡️ 💥 violate 2NF Course ➡️ Instructor
, however Course
is not a candidate key ➡️ 💥 violate BCNF
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
| | Student | Language | Hobby |
| ------- | -------- | ------ |
| John | English | Guitar |
| John | English | Soccer |
| John | Spanish | Guitar |
| John | Spanish | Soccer |
- fix after 4NF
- Language table
| Student | Language |
| ------- | -------- |
| John | English |
| John | Spanish |
- Hobby table
| Student | Hobby |
| ------- | ------ |
| John | Guitar |
| John | Soccer |
|
- ✔️ 5th Normal Form: has no join dependency
- table can be split into smaller tables and re-joined without having redundancy problems
1
2
3
4
5
6
7
8
9
10
11
| | Employee | Skill | Trainer |
| -------- | ------ | ------- |
| Alice | Java | Bob |
| Alice | Python | Bob |
| Alice | Java | Carol |
| Alice | Python | Carol |
- can be split into
- Employee-Skill
- Skill-Trainer
- Employee-Trainer
|
✅ What is denormalization?
- ✔️ denormalization: intentionally add redundancy by combining tables
- 👍🏻 to improve read performace
- 👎🏻 increase storage
- 👎🏻 increase update anomality
✅
✅
✅
✅
✅
✅
✅
✅