What is Normalization
Normalization is a design
technique that is widely used as a guide in designing
relational databases. Normalization is essentially a two
step process that puts data into tabular form by removing
repeating groups and then removes duplicated data from the
relational tables.
Normalization theory is based
on the concepts of
normal forms. A relational table is said to be a
particular normal form if it satisfied a certain set of
constraints. There are currently five normal forms that have
been defined.
Why Normalization
The goal of normalization is
to create a set of relational tables that are free of
redundant data and that can be consistently and correctly
modified. This means that all tables in a relational
database should be in the third normal form (3NF).
A relational table is in 3NF if and only if all non-key
columns are
(a) mutually independent and
(b) fully dependent upon the primary key.
Mutual independence means that no non-key column is
dependent upon any combination of the other columns. The
first two normal forms are intermediate steps to achieve the
goal of having all tables in 3NF. In order to better
understand the 2NF and higher forms, it is necessary to
understand the concepts of functional dependencies and
lossless decomposition.
Functional Dependencies
The concept of functional
dependencies is the basis for the first three normal forms.
A column, Y, of the relational table R is said to be
functionally dependent upon column X of R if and only if
each value of X in R is associated with precisely one value
of Y at any given time. X and Y may be composite. Saying
that column Y is functionally dependent upon X is the same
as saying the values of column X identify the values of
column Y. If column X is a primary key, then all columns in
the relational table R must be functionally dependent upon
X.
A short-hand notation for
describing a functional dependency is:
R.x —> R.y
which can be read as in the
relational table named R, column x functionally determines
(identifies) column y.
Full functional dependence
applies to tables with composite keys. Column Y in
relational table R is fully functional on X of R if it is
functionally dependent on X and not functionally dependent
upon any subset of X. Full functional dependence means that
when a primary key is composite, made of two or more
columns, then the other columns must be identified by the
entire key and not just some of the columns that make up the
key.
Normalization is the process
of removing redundant data from relational tables by
decomposing (splitting) a relational table into smaller
tables by projection. The goal is to have only primary keys
on the left hand side of a functional dependency. In order
to be correct, decomposition must be lossless. That is, the
new tables can be recombined by a natural join to recreate
the original table without creating any spurious or
redundant data.
Sample data is used to
illustrate the process of normalization. A company obtains
parts from a number of suppliers. Each supplier is located
in one city. A city can have more than one supplier located
there and each city has a status code associated with it.
Each supplier may provide many parts. The company creates a
simple relational table to store this information that can
be expressed in relational notation as:
FIRST (s#, status, city, p#,
qty)
where
|
s# |
Supplier identification
number (Primary key) |
|
status |
Status code assigned to
city |
|
city |
Name of city where
supplier is located |
|
p# |
Part number of part
supplied |
|
qty |
Quantity of parts
supplied to date |
In order to uniquely
associate quantity supplied (qty) with part (p#) and
supplier (s#), a composite primary key composed of s# and
p# is used.