|
Normalizing the database ensures that the structure of
the database allows changes to be made without incurring unexpected
consequences. The role of normalization is to maintain stable, reliable data
through good database design.
The goal of good database design is to ensure that all
restrictions are logical consequences of domain and key restrictions.
Tables, like paragraphs, should have a single theme.
The table in the anomalies examples has two themes:
Information about products
Information about the suppliers of products
The way to manage this information most efficiently is
to split the table into two tables: a table of products and a table of
suppliers.
Products
| Prod ID
|
Description |
Supplier
|
| 34 |
Keyboard |
Super Computer |
| 68 |
Processor |
Super
Computer |
| 42 |
Printer |
TechPoint |
| 20 |
RAM |
Interlink |
| 21 |
Scanner |
Computer
Mart |
| 61 |
Hard Disk |
Giga Computer |
| 46 |
Flash Disk |
Super Computer |
| 35 |
Floppy |
Micro Computer |
Suppliers
| Supplier
|
Address
|
City
|
Region> |
Country
|
| Super Compute> |
Hafeez Center |
Lahore |
OR |
Pakistan |
| Next
Generation |
Mureee Center |
Mureee |
|
Pakistan |
| Super
Computer |
Hafeez Center |
Lahore |
|
Pakistan |
| TechPoint |
Main market |
Lahore |
|
Pakistan |
| Interlink |
Raja Center |
Rawalpindi |
|
Pakistan |
| Computer
Mart |
Hall Road |
Lahore |
|
Pakistan |
| Giga Computer |
Hall Road |
Lahore |
Punjab |
Pakistan |
| Super Computer |
Hafeez Center |
Lahore |
|
Pakistan |
| Micro Computer |
Hall Road |
Lahore |
OR |
Pakistan |
Now you can add products without duplications, change
supplier locations without changing several rows, and not lose information if
you delete a part.
If you wish, you can always bring the original table
back using a query with a join on Supplier.
A Method of Database Design
As you have seen, database design plays a major role
in the stability and the reliability of your data. In this section, we show you
the process of designing a database. To help illustrate the design process, a
database named Rags is created for a fictitious wholesale clothing manufacturer
called Unlimited Rags.
Although there are a number of rules that can be
followed in designing a database structure, the design process is as much an
art as it is a science. Follow these rules when at all possible, but not to the
point where the database loses the functionality that is so important to the
user.
Doing a paper design first has several advantages:
Saves time, money, and problems
Makes system more reliable; avoids potential
data-modification problems
Serves as a blueprint for discussion
Helps in estimating costs and size
A good design should have the following objectives:
Meet the users' needs
Solve the problem
Be free of modification anomalies
Have a reliable and stable database, where the tables
are as independent as possible
Be easy to use
Design of the Database Model
The design of the database structure requires the
following steps:
1. List the objects.
2. List the facts about the objects.
3. Turn the objects and facts into tables and columns.
4. Determine the relationship among objects.
5. Determine the key columns.
6. Determine the linking columns.
7. Determine the constraints.
8. Evaluate the design model.
9. Implement the database.
Step 1: List the Objects
Make a list of all objects. An object
is a single theme, similar to a paragraph. At Unlimited Rags the objects are:
| Customer |
Ship Rate
|
| Product |
Invoice
|
| Employee |
Dependent
|
Step 2: List the Facts About the
Objects
There is a great deal of information associated with
every object. In this step, you should list the facts about an object and then
eliminate the facts that are not important to the solution of the problem. The
customer object, for example, can have many facts associated with it: company
name, address, city, founders, number of employees, stock price. In this case,
it is not important to keep information about the number of employees, stock
price, or founders. Unlimited Rags needs only the information it will use now
and possibly in the future.
| Object |
Important
Facts About the Object
|
| Employee |
employee, name, birth date, gender, SSN,
marital status
|
| Customer |
company name, address, city, state, zip,
contact, title
|
| Invoice |
date, salesperson, customer, quantity,
shipping charge, tax, freight
|
| Product |
product name, description, cost, markup
|
| Dependent |
name, date of birth
|
| Ship Rates |
state, rates
|
Step 3: Turn the Objects and Facts
into Tables and Columns
Objects automatically become tables, and facts become
columns once the column domains are determined. Recall that a domain is a set
of values that a column can have. Every column has a domain, which has both
physical and logical properties. For example, the column for employee last name
is defined as TEXT 15. TEXT 15 is the physical property of the column. Because
of this definition, its domain is the set of all employee last names with 15
characters or less.
If a column is used to link two or more tables, the
domains must be the same and the columns should be given the same name. If the
logical description differs (for example, employee last name and customer last
name), the columns are not the same and should not share the same name.
The following is a list of the preliminary tables,
columns, and domains for Unlimited Rags:
Table: CUSTOMER Table: PRODUCT
| Name |
Type
|
Length |
|
Name |
Type |
Length |
| COMPANY |
TEXT
|
45 |
|
PRODNAME
|
TEXT |
30 |
| CADD1 |
TEXT
|
30 |
|
PRODDESC
|
TEXT |
50 |
| CADD2 |
TEXT
|
30 |
|
PRODCOST
|
CURR |
|
| CCITY |
TEXT
|
25 |
|
PMARKUP
|
NUMB |
|
| CSTATE |
TEXT
|
2 |
|
|
|
|
| CZIP |
TEXT
|
10 |
|
Table: DEPENDENT
|
| CAC |
TEXT
|
3 |
|
Name
|
Type |
Length
|
| CTELPH |
TEXT
|
7 |
|
DLAST
|
TEXT |
15 |
| CONTACT |
TEXT
|
30 |
|
DFIRST
|
TEXT |
10 |
| TITLE |
TEXT
|
30 |
|
DDOB
|
D/T |
|
Table: INVOICE Table: EMPLOYEE
| Name |
Type |
Length
|
|
Name |
Type
|
Length |
| INVDATE |
D/T
|
|
|
ESSN
|
TEXT |
11
|
| REQDATE |
D/T
|
|
|
ELASTN
|
TEXT |
15
|
| SHIPNAME |
TEXT
|
45 |
|
EFIRSTN
|
TEXT |
10
|
| SHIPADDR |
TEXT
|
30 |
|
EDOB
|
D/T |
|
| SHIPCITY |
TEXT
|
25 |
|
EGENDER
|
TEXT |
1
|
| SHIPZIP |
TEXT
|
10 |
|
EMARITAL
|
TEXT |
1
|
| INVTOTAL |
CURR
|
|
|
EADDR1
|
TEXT |
30
|
| |
|
|
|
EADDR2 |
TEXT
|
20 |
| |
|
ECITY
|
TEXT |
25
|
| Table: SHIP RATE
|
|
ESTATE
|
TEXT |
2
|
| Name |
Type
|
Length |
|
EZIP |
TEXT
|
10 |
| SHIPST |
TEXT
|
2 |
|
EAC
|
TEXT |
3
|
| SHIPRATE |
NUMB
|
|
|
EHOMEPH
|
TEXT |
7
|
Often it helps in the design stages to draw boxes to
represent the tables. In later steps you can then fill in key columns and draw
the relationships among the tables.
Step 4: Determine the
Relationship Among Objects
To determine the relationship among the objects, take
each object and look at how that object may be related to another. Keep in mind
that not every relationship existing between objects is important. The
relationships that are important are those that allow you to model the database
after the real-world situation that the database represents.
One-to-one relationships. For any given row
in Table A, there is only one row in Table B. For any given row in Table B,
there is only one row in Table A. There are no one-to-one relationships in the
Rags database. An example of a one-to-one relationship is that of employee data
and private employee data. General information, such as employee name, address,
and start date, is kept in one table, and to ensure privacy, personal
information, such as salary, is kept in another table.
One-to-many relationships. For any given row
in Table A, there are many rows in Table B. For any given row in Table B, there
is only one row in Table A. The relationship between an employee and an
employee's dependents is one-to-many, because one employee may have many
dependents, but a dependent is related to only one employee. The relationship
between customers and invoices is also one-to-many. One invoice is related to
one customer, but a customer can have many invoices.
Many-to-many relationships. For any given row
in Table A, there are many rows in Table B. For any given row in Table B, there
are many rows in Table A. There is a many-to-many relationship between the
product table and the invoice table. A product can be associated with many
different invoices and an invoice can contain many different products.
In the case of the Rags database, we are attempting to
model an environment that is based on sales transactions. Take the example of
products and customers: Although in some circumstances we may be interested in
the relationship between customers and products, in a sales transaction, the
customer is related to a product only when a sale occurs. Therefore, a customer
is related to an invoice, and the invoice carries the relationship to a
product.
The first step in determining the type of relationship
between tables is to list every table and to see how it relates to any others:
Customer is related to invoice.
Customer is not related to any other table in
the list.
Employee is related to dependent.
Employee (sales) is related to invoice.
Product is related to invoice.
An effective method to find the type of relationship
is to ask whether a specific record in Table A can point to (is linked to) one
or to many rows in Table B, and then reverse the tables and ask the question
again.
Does a customer record point to one or many invoices?
Many
Does an invoice row link with one or many customers?
One
The relationship between the tables is one-to-many.
A sales employee writes one or many invoices? Many
An invoice is written by one or many employees? One
The relationship between employee and invoice is also
one-to-many.
A product can be a line item on one or many invoices?
Many
Can an invoice be linked to one or many products? Many
The relationship between product and invoice is
many-to-many.
The Ship Rate table illustrates that a table can be
included in a database and not need to be relationally linked to any other
table.
Step 5: Determine the Key Columns
A key can be an account number, social security
number, part number, license number, or any other numeric value or combination
of characters that are unique. A complex key is one that is derived from more
than one column. Microsoft Access supports complex keys directly.
No other row in the table can have the value of the
key column(s). Other tables may share the same set of key information. If a
company name is universally unique, it is used as a unique row identifier.
However, if there is any possibility another company could have the same name,
then it is not unique and must not be employed as a key column. Do not use any
column as a key where the possibility exists for a duplicate. A key column
cannot contain null values.
By definition, all key columns should be indexed.
Because text names are usually not unique and cannot
be used in math operations, it is useful to make key columns a sequential
numeric value. In many cases, it is easier to develop your own unique row
identifier. If you want automatic numbering for invoice numbers or employee ID
numbers, COUNTER data type in Microsoft Access is a good choice for a physical
description for the domain of a key column.
Most of the tables in the final Rags database contain
columns with a COUNTER or NUMBR data type for the unique row identifier. Each
key is also indexed, and duplicates are not allowed. Database performance is
enhanced with a single numeric column as the key.
Step 6: Determine the Linking Columns
If you have been careful about designating key
columns, you also have determined the linking columns. Links provide a way to
tie information (rows) in one table to another table. If a table has a key
column, that column can generally serve as the link. Tables are linked together
through their key columns. However, the placement of the key is important, and
where the link is placed depends on the type of relationship between the
tables.
To determine the placement of the links, you must
first know the type of relationship among the objects or tables. Once you know
the type of relationship among tables, it is much easier to determine where to
place the linking column to tie two tables together.
Note that not all tables need to be linked
relationally. Employees must be linked with dependents, but you would not link
employees with ship rates or products.
Linking in a one-to-one relationship. In
one-to-one relationships the link should be the most stable column or should be
from the table where the key column is created. The most stable is the column
least likely to change. If an automatic numbering system is being used, then
use that column as the linking column.
Linking in a one-to-many relationship. In
one-to-many relationships the linking column should come from the one table.
The key column from the employee table (one side) should be placed in the
dependent table (many side). When the key empid is placed in the dependent
table, it is referred to as a foreign key in the dependent table.
Linking in a many-to-many relationship. The
many-to-many relationship causes problems when attempting to retrieve data and
when relating a value in one table to its corresponding value in the other
table. It is important to understand this relationship to be able to recognize
and control this situation when it arises.
A classic many-to-many relationship is product and
invoice. A product can be an item on many different invoices and an invoice can
have many products associated with it.
But which key will we use for a link? If invid is
placed in the product table, then all of the product data would have to be
repeated for each invoice that contains that product. If prodid is placed in
the invoice table, then the invoice information has to be repeated for each
product contained in the invoice. This leads to redundant data, and the
potential for invalid data is increased. Performance may suffer.
The solution to many-to-many relationships is to
create an intersection table. This table should contain the key columns from
both tables. This is illustrated in the following diagram.
Step 7: Determine the Relationship
Constraints
Often the information we get from a database comes
from more than one table. For example, if we want to know who the parent of a
particular dependent is, the name is determined by using the value in empid to
look up the correct row in the employee table. The question of who the parent
is can be answered only if there is a row in the employee table with an empid
value corresponding to that in the dependent table.
To ensure the integrity of the data in our database,
our model should require, for example, that no row can be added to the
dependent table, unless there is already a corresponding row in the employee
table. This requirement is known as a relationship constraint. In this case, a
constraint must exist on the dependent table that ensures that the employee
(parent) exists.
If you are creating an invoice, you must have a
customer to bill. An entry in the customer table must exist before the invoice
can be written. In this case, a constraint must exist on the invoice table to
ensure that the customer exists.
There are at least four methods to implement
relationship constraints:
Built-in controls in the DBMS
Data entry and access procedures
Programming
Implementation of rules
Microsoft Access has certain referential integrity
constraint mechanisms built into the engine. With Microsoft FoxPro, the
relationship constraints must be handled programmatically.
In Microsoft Access, rules at the database or form
level can be employed to enforce column domains (for example, accept values
less than 200, or text value must be F or M) or in any other operation where
you want a data entry test to be performed.
Step 8: Evaluate the Design
The next step in the design process is the evaluation
of the design. In this step, you should look for any design flaws that could
cause the data to be unreliable, unstable, or redundant.
Every table should be evaluated by asking the
following questions:
1. Does each table have a single theme? It should.
Each column should be a fact about the key.
2. Does each table have a key column(s)? It should.
3. Are there any dependencies? Only logical
consequences of the key should exist.
4. Are the domains unique among tables? Do not mix
domains unless the column is common between tables.
5. Are the restrictions domain or key?
6. Is the table easy to use?
Evaluation of the Customer Table
| CUSTID |
COMPANY
|
| |
CADD1
|
| |
CADD2
|
| |
CCITY
|
| |
CSTATE
|
| |
CZIP |
| |
CAC |
| |
CTELPH
|
| |
CONTACT
|
| |
TITLE
|
The table has a single theme: customers.
The table has a key: custid.
The table does not have any dependencies that are not
logical consequences of the key. Given custid, a company and company address
can be uniquely determined. Given a company, we cannot determine any particular
custid. Given a state, we cannot determine any particular custid. Therefore,
the customer table does not have any dependencies.
The column names are not used in any other tables
except for custid, which is a foreign key in the invoice table.
The restrictions are domain or key.
Step 9: Implement the Design
Once the database had been designed on paper, the next
step is to implement the design in Microsoft Access. When defining tables in
Microsoft Access, it is extremely important to keep your paper design in mind.
Designing a database on the fly can cause problems that may be quite difficult
to recover from. (Remember the anomalies earlier in this chapter.)
In Microsoft Access 2.0, there are two tools that will
help you complete the implementation of your design. The Table Wizard can be
used to generate a variety of common tables. The graphical system relationships
window can be used to set up relationships and key dependencies. Note, that
while using the Table Wizard ensures proper relational design, the eight steps
prior to implementation remain important and should be completed prior to
constructing the tables.
The following is a list of the final tables, columns,
and domains for Unlimited Rags, including linking columns:
Table: CUSTOMER Table: PRODUCT
| Name |
Type
|
Length |
|
Name |
Type |
Length |
| CUSTID |
COUNTER
|
|
PRODID
|
COUNTER |
| COMPANY |
TEXT
|
45 |
|
PNAME
|
TEXT |
30 |
| CADD1 |
TEXT
|
30 |
|
PDESCRIP
|
TEXT |
50 |
| CADD2 |
TEXT
|
30 |
|
PCOST
|
CURR |
|
| CCITY |
TEXT
|
25 |
|
PMARKUP
|
NUMB |
|
| CSTATE |
TEXT
|
2 |
|
|
|
|
| CZIP |
TEXT
|
10 |
|
|
| CAC |
TEXT
|
3 |
|
Table: SHIP RATE
|
| CTELPH |
TEXT
|
7 |
|
Name
|
Type |
Length
|
| CONTACT |
TEXT
|
30 |
|
SHIPST
|
TEXT |
2 |
| TITLE |
TEXT
|
30 |
|
SHIPRATE
|
NUMB |
|
Table: INVOICE Table: TRANSACTION
| Name |
Type
|
Length |
|
Name |
Type
|
Length |
| INVID |
COUNTER
|
|
INVID
|
NUMB |
|
| CUSTID |
NUMB
|
|
| |