# Normalization

Normalization is a schema/database design process that minimizes data duplication and enforces data integrity.

A simple definition for practical purposes is:

1. Separate each *entity* into its own table.
2. Separate each discrete *attribute* into its own column.
3. Uniquely identify each entity instance (row) using a *primary key*.
4. Use *foreign key* columns to link related entities.

{% embed url="<https://www.itprotoday.com/sql-server/sql-design-why-you-need-database-normalization>" %}

## First Normal Form (1NF)

* In the *first normal form*, only single values are permitted at the intersection of each row and column; hence, there are no repeating groups.
* To normalize a relation that contains a repeating group, remove the repeating group and form two new relations.
* The PK of the new relation is a combination of the PK of the original relation plus an attribute from the newly created relation for unique identification.

#### Process for 1NF

**`Student_Grade_Report`**` ``(StudentNo, StudentName, Major, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)`

* In the `Student_Grade_Report` table, the repeating group is the course information. A student can take many courses.
* Remove the repeating group. In this case, it’s the course information for each student.
* Identify the PK for your new table.
* The PK must uniquely identify the attribute value (StudentNo and CourseNo).
* After removing all the attributes related to the course and student, you are left with the student course table (**StudentCourse**).
* The Student table (**Student**) is now in first normal form with the repeating group removed.

**`Student`**` ``(StudentNo, StudentName, Major)`

**`StudentCourse`**` ``(StudentNo, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)`

#### How to update 1NF anomalies

* To add a new course, we need a student.
* When course information needs to be updated, we may have inconsistencies.
* To delete a *student,* we might also delete critical information about a course.

## Second Normal Form (2NF)

* For the *second normal form*, the relation must first be in 1NF. The relation is automatically in 2NF if, and only if, the PK comprises a single attribute.
* If the relation has a composite PK, then each non-key attribute must be fully dependent on the entire PK and not on a subset of the PK (i.e., there must be no partial dependency or augmentation).

#### Process for 2NF

To move to 2NF, a table must first be in 1NF.

* The Student table is already in 2NF because it has a single-column PK.

**`StudentCourse`**` ``(StudentNo, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)`

* When examining the `Student_Course` table, we see that not all the attributes are fully dependent on the PK; specifically, all course information. The only attribute that is fully dependent is grade.
* Identify the new table that contains the course information.
* Identify the PK for the new table.

**`CourseGrade`**` ``(StudentNo, CourseNo, Grade)`

**`CourseInstructor`**` ``(CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation)`

#### How to update 2NF anomalies

* When adding a new instructor, we need a course.
* Updating course information could lead to inconsistencies for instructor information.
* Deleting a course may also delete instructor information.

## Third Normal Form (3NF)

* To be in *third normal form*, the relation must be in second normal form. Also all transitive dependencies must be removed; a non-key attribute may not be functionally dependent on another non-key attribute.

#### Process for 3NF

* Eliminate all dependent attributes in transitive relationship(s) from each of the tables that have a transitive relationship.
* Create new table(s) with removed dependency.
* Check new table(s) as well as table(s) modified to make sure that each table has a determinant and that no table contains inappropriate dependencies.

**`Course`**` ``(CourseNo, CourseName, InstructorNo)`

**`Instructor`**` ``(InstructorNo, InstructorName, InstructorLocation)`

At this stage, there should be no anomalies in third normal form.

## Boyce-Codd Normal Form (BCNF)

Resources

{% embed url="<https://www.youtube.com/watch?v=ABwD8IYByfk>" %}

{% embed url="<https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/>" %}
