EnggPedia - The Engineering Encyclopedia

Wed03292017

Last update06:04:15 AM

Font Size

Profile

Menu Style

Cpanel
Back Computer Algorithms Normalization & Different Normal Forms (NF)

Normalization & Different Normal Forms (NF)

Normalization:

Normalization is the process of splitting a relation (table) into sub relations or sub tables. Normalization is used to avoid or eliminate the three types of anomalies (insertion, deletion and update anomalies) which a database may suffer from.

Normal Form:

The normal forms of relational database theory provide criteria for determining a table's degree of immunity against logical inconsistencies and anomalies. Normal form is abbreviated as NF. Following are the normal forms which we will be studied.

  1. First Normal Form (1NF)
  2. Second Normal Form (2NF)
  3. Third Normal Form (3NF)
  4. Fourth Normal Form (4NF)
  5. Fifth Normal Form (5NF)

First Normal Form:

A relation is in first normal form if all its attributes are simple. In other words, none of the attributes of the relation is a relation. We can say that the first property of a relation is followed in First Normal Form (1NF), that is, each cell in a relation will contain only single data item.

Example 1: Assume the following relation.

STUDENT

Student_ID Name Contact#

01 Ali 123456123

678910123

02 Khan 786534677

03 Saima 568787967

In the given Relation (Table), student_ID is the primary key attribute (Uniquely identify each row). Here we can assume that the first row student of Student_ID 1 having more than one contact number in one cell. So it will simplify it like

STUDENT

Student_ID Name Contact#

01 Ali 123456123

01 Ali 678910123

02 Khan 786534677

03 Saima 568787967

Now each cell having a single value.

Example 2: Assume the following relation.

Student-courses (Sid, Sname, Phone, Courses-taken)

Where attribute Sid is the primary key, Sname is student name, Phone is student's phone number and Courses-taken is a table contains course-id, course-description, credit hours and grade for each course taken by the student.

More precise definition of table Course-taken is:

Course-taken (Course, Course-description, Credit-hours, Grade)

According to the definition of first normal form relation Student-courses is not in first normal form (1NF) because one of its attribute Courses-taken is itself a table and is not a simple attribute. To clarify it more, assume the above tables contain the data as shown below:

Student-courses

Sid Sname Phone Courses-taken

100 John 4872454 St-100-courses-taken

200 Smith 6718120 St-200-courses-taken

300 Russell 8712356 St-300-courses-taken

St-100-Course-taken

Course-id Course-description Credit-hours Grade

IS380 Database Concepts 3 A

IS416 Unix Operating System 3 B

St-200-Course-taken

Course-id Course-description Credit-hours Grade

IS380 Database Concepts 3 B

IS416 Unix Operating System 3 B

IS420 Data Net Work 3 C

St-300-Course-taken

Course-id Course-description Credit-hours Grade

IS417 System Analysis 3 A

So First Normal Form (1NF) will be.

Student-courses

Sid Sname Phone Course-id Course-description Credit-hours Grade

100 John 4872454 IS380 Database Concepts 3 A

100 John 4872454 IS416 Unix Operating System 3 B

200 Smith 6718120 IS380 Database Concepts 3 B

200 Smith 6718120 IS416 Unix Operating System 3 B

200 Smith 6718120 IS420 Data Net Work 3 C

300 Russell 8712356 IS417 System Analysis 3 A

Examination of the above Student-courses relation reveals that Sid does not uniquely identify a row (tuple) in the relation hence cannot be the primary key. For the same reason Course-id cannot be the primary key. However the combination of Sid and Course-id uniquely identifies a row in Student-courses, Therefore (Sid, Course-id) is the primary key of the above relation.

Second normal relation:

A first normal form relation is in second normal form if all its non-primary attributes are fully functionally dependent on the primary key. Note that primary attributes are those attributes, which are parts of the primary key, and non-primary attributes do not participate in the primary key. In Student-courses relation both Sid and Course-id are primary attributes because they are components of the primary key. However attributes Sname, Phone, Course-description, Credit-hours and Grade all are non primary attributes because none of them is a component of the primary key.

To convert Student-courses to second normal relations we have to make all non-primary attributes to be fully functionally dependent on the primary key. To do that we need to project (that is we break it down to two or more relations) Student-courses table into two or more tables. However projections may cause problems. To avoid such problems it is important to keep attributes, which are dependent on each other in the same table, when a relation is projected to smaller relations.

Following this principle and examination, it indicate that we should divide Student-courses relation into following three relations:

  1. PROJECT Student-courses ON (Sid, Sname, Phone) creates a table call it Student. The relation Student will be Student (Sid, Sname, Phone).
  2. PROJECT Student-courses ON (Sid, Course-id, Grade) creates a table call it Student-grade. The relation Student-grade will be Student-grade (Sid:pk1:fk:Student, Course-id::pk2:fk:Courses, Grade) {pk indicates primary key, while fk indicates foreign key}
  3. PROJECT Student-courses ON (Course-id, Course-Description, Credit-hours) create a table call it Courses.

Following are these three relations and their contents:

Student (Sid:pk, Sname, Phone)

Sid Sname Phone

100 John 4872454

200 Smith 6718120

300 Russell 8712356

Courses (Course-id::pk, Course-Description)

Course-id Course-description Credit-hours

IS380 Database Concepts 3

IS416 Unix Operating System 3

IS420 Data Net Work 3

IS417 System Analysis 3

Student-grade (Sid:pk1:fk:Student, Course-id::pk2:fk:Courses, Grade)

Sid Course-id Grade

100 IS380 A

100 IS416 B

200 IS380 B

200 IS416 B

200 IS420 C

300 IS417 A

All these three relations are in second normal form. Examination of these relations shows that we have eliminated the redundancy in the database. Now relation Student contains information only related to the entity student, relation Courses contains information related to entity Courses only, and the relation Student-grade contains information related to the relationship between these two entities.

Third Normal Form:

A second normal form relation is in third normal form if all non-primary attributes (that is attributes that are not parts of the primary key or of any candidate key) have non-transitivity dependency on the primary key.

Example:

Assume the relation:

STUDENT (Sid: pk, Activity, fee)

Sid Activity Fee

100 Swimming 100

200 Tennis 100

300 Golf 300

400 Swimming 100

Table STUDENT is in first normal form because all its attributes are simple. Also STUDENT is in second normal form because all its non-primary attributes are fully functionally dependent on the primary key (Sid).

Notice that a first normal relation with non-composite (that is simple) primary key automatically will be in second normal form because all its non-primary attributes will be fully functionally dependent on the primary key.

Table STUDENT suffers from all 3 anomalies; a new student can not be added to the database unless he/she takes an activity and no activity can be inserted into the database unless we get a student to take that activity. There is redundancy in the table (see Swimming), therefore to change the fee for Swimming we must make changes in more than one place and that will cause update anomaly problem. If student 300 is deleted from the table we also loose the fact that we had Golf activity with its fee to be 300. To overcome these anomalies STUDENT table should be converted to smaller tables.

Consider the following three projection of the STUDENT relation:

PROJECT STUDENT on [Sid, Activity] and we get a relation name it STUD-AVT (Sid:pk, Activity)

with the following data :

STUD-AVT (Sid:pk, Activity)

Sid Activity

100 Swimming

200 Tennis

300 Golf

400 Swimming

PROJECT STUDENT on [Activity, Fee] and we get a relation name AVT-Fee (Activity:pk, Fee)

with the following data :

AVT-Fee (Activity:pk, Fee)

Activity Fee

Swimming 100

Tennis 100

Golf 300

Swimming 100

PROJECT STUDENT on [Sid, Fee] and we get a relation name Sid-Fee (Sid:pk, Fee)

with the following data :

Sid-Fee (Sid:pk, Fee)

Sid Fee

100 100

200 100

300 300

400 100

The question is which pairs of these projections should we choose?

The answer to that is to choose the pair STUD-AVT and AVT-Fee because the join of these two projections produces the original STUDENT table. Such projections are called non-loss projections. Therefore the join of STUD-AVT and AVT-Fee on the common attribute Activity recreate the original STUDENT table. On the other hand as shown below the join of projections Sid-Fee and AVT-Fee on their common attribute Sid generates erroneous data that were not in the original STUDENT table and such projections are called loss projections.