ABSTRACT:
Relational databases remain the most popular databases used by enterprise applications to store persistent data. They offer a great deal of flexibility and efficiency. A process called database normalization helps ensure that the database is free from redundancies and update anomalies. In a Database-First approach to software development, the database is designed first, and then an Object-Relational Mapping (ORM) tool is used to generate the programming classes (data layer) that interact with the database.
Finally, the business logic code is written to interact with the data layer and persist the business data to the database. However, in modern application development, a process called the Code-First approach has evolved, in which the domain classes and the business logic that interacts with them are written first. Then an Object-Relational Mapping (ORM) tool is used to generate the database from the domain classes.
In this approach, since database design is not a concern, software programmers may ignore the process of database normalization altogether. To help software programmers in this process, this thesis takes the theory behind the five database normal forms (1NF – 5NF) and proposes Five Class Normal Forms (1CNF – 5CNF) that software programmers may use to normalize their domain classes.
This thesis demonstrates that when the Five Class Normal Forms are applied manually to a class by a programmer, the resulting database that is generated from the Code-First approach is also normalized according to the rules of relational theory.
SNIPPETS:
Importance of Relational Databases:
Object Relational Mapping (ORM):
ORM is a technique in which metadata such as table names, column names, relationships (sometimes called foreign keys), indexes, and more are extracted from a database and stored in an XML or JSON file. This metadata is then fed to a tool that understands both the type of database and the programming language in use. The tool (usually built in-house or supplied by a third party) generates object-oriented programming classes for the database tables, columns, relationships, and more, so that application code can persist data to the database by creating, reading, updating, and deleting records.
This generation of programming classes from the database schema is possible because of the correspondence between a relational database schema and an object-oriented programming class. In object-oriented programming, a class is a blueprint for creating objects. Objects are instances of a class, and they have attributes that define the state of that object. If an analogy were drawn between databases and programming classes, database tables could be compared to classes, columns to class attributes, and rows (which contain the data in a database) to object instances.
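The analogy can be made concrete with a minimal sketch of the database-first direction. It is written in Python with the standard `sqlite3` and `dataclasses` modules rather than with the tooling used in the thesis, and the `Customer` table, the `SQL_TO_PY` type map, and the `class_from_table` helper are hypothetical names introduced only for illustration.

```python
import sqlite3
from dataclasses import make_dataclass, fields

# Assumed, deliberately tiny mapping from SQLite column types to Python types.
SQL_TO_PY = {"INTEGER": int, "TEXT": str, "REAL": float}

def class_from_table(conn, table_name):
    """Build a class whose attributes mirror the columns of a table."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    attrs = [(col[1], SQL_TO_PY.get(col[2].upper(), str)) for col in columns]
    return make_dataclass(table_name, attrs)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT, Balance REAL)")
conn.execute("INSERT INTO Customer VALUES (1, 'Ada', 12.5)")

Customer = class_from_table(conn, "Customer")        # table  -> class
attribute_names = [f.name for f in fields(Customer)]  # column -> attribute
row = conn.execute("SELECT * FROM Customer").fetchone()
customer = Customer(*row)                             # row    -> object instance
print(attribute_names, customer)
```

A real ORM tool would also carry over relationships, indexes, and nullability; this sketch covers only names and basic types.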
Code First Approach:
In modern application development, instead of using the Database-First approach just discussed, a popular technique is to build business applications using the Code-First approach. In a Code-First approach, the programming classes constituting the business/application domain are created first.
Then a tool such as Entity Framework (EF), an ORM framework from Microsoft, is used to generate the database from the classes. This is the reverse of the Database-First approach, where an ORM tool generates the programming classes using metadata extracted from the database schema.
The steps by which ORM tools generate a database from programming classes are given below:
1. Extract metadata information such as class names, attribute names, attribute type names, relationships between classes, etc. from the entity classes.
2. Store this metadata information either in memory or a file such as an XML or JSON file.
3. Using the metadata and built-in logic for issuing the right database statements (for example, the required SQL statements), the ORM tool generates a database that maps directly to the entity classes.
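The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Entity Framework is implemented: the `Product` class, the `PY_TO_SQL` type map, and the helper functions are hypothetical, and treating an attribute named `Id` as the primary key is an assumed convention chosen for the example.

```python
import sqlite3
from dataclasses import dataclass, fields

# Assumed mapping from Python attribute types to SQLite column types.
PY_TO_SQL = {int: "INTEGER", str: "TEXT", float: "REAL"}

@dataclass
class Product:
    Id: int
    Name: str
    Price: float

def extract_metadata(entity):
    """Steps 1 and 2: read class and attribute names and types, keep them in memory."""
    return {
        "table": entity.__name__,
        "columns": [(f.name, f.type) for f in fields(entity)],
    }

def create_table_sql(meta):
    """Step 3: turn the metadata into a CREATE TABLE statement."""
    cols = []
    for name, py_type in meta["columns"]:
        col = f"{name} {PY_TO_SQL[py_type]}"
        if name == "Id":  # assumed convention for this sketch only
            col += " PRIMARY KEY"
        cols.append(col)
    return f"CREATE TABLE {meta['table']} ({', '.join(cols)})"

meta = extract_metadata(Product)
ddl = create_table_sql(meta)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)  # the generated table now mirrors the entity class
generated_columns = [r[1] for r in conn.execute("PRAGMA table_info(Product)")]
print(ddl)
```

Because the table is derived mechanically from the class, any redundancy in the class design is reproduced in the schema, which is why the shape of the classes matters.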
METHODOLOGY:
The goal of this thesis is to generate a relationally normalized database that is free from data redundancies and update anomalies using the Code-First approach of software design. This is done by taking the theory behind relational normal forms, creating class normal forms from them, and applying the class normal forms manually to programming classes. After applying the class normal forms to the programming classes, a tool such as Entity Framework is used to generate a database from the normalized classes. After the database is generated, it can be tested to see whether it is normalized. If it is, then the goal of this thesis is met.
To achieve this goal, the following steps will be followed for each of the five database normal forms in relational database theory:
1. State the rules of the database normal form
a. This step will state the definition of one of the five normal forms in relational database theory. This step will also explain the rules/theory behind the normal form.
2. Provide class normal form rules based on the database normal form
a. This step will take the rules behind the database normal form in step 1 and provide similar rules that can be applied to programming classes. This can be accomplished using a simple mapping chart between the relational database paradigm and the object-oriented paradigm (tables to classes, columns to class attributes, rows to object instances).
3. Apply the class normal form to an example
a. This step will provide an example of what a programming class looks like before and after applying the class normal form provided in step 2.
4. Generate the database and test
a. This step will use the Entity Framework tool to generate a database from the example normalized programming classes given in step 3.
b. A screenshot of the generated database output will be provided. The goal is to show that the generated database is normalized.
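As a rough illustration of steps 3 and 4, the sketch below shows a class before and after removing a repeating group, which is the familiar idea behind first normal form. The thesis's own class normal form definitions are not reproduced here, so the `StudentBefore`, `Student`, and `StudentPhone` classes are hypothetical, and the hand-written SQLite tables merely stand in for what an ORM tool would generate from the normalized classes.

```python
import sqlite3
from dataclasses import dataclass, astuple

@dataclass
class StudentBefore:
    # Before: phone numbers form a repeating group inside the class.
    Id: int
    Name: str
    Phone1: str
    Phone2: str

@dataclass
class Student:
    # After: the class keeps only single-valued attributes.
    Id: int
    Name: str

@dataclass
class StudentPhone:
    # After: each phone number is its own object that refers back to a student.
    Id: int
    StudentId: int
    Number: str

conn = sqlite3.connect(":memory:")
# Stand-in for the ORM-generated schema: one table per normalized class.
conn.execute("CREATE TABLE Student (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE StudentPhone ("
    "Id INTEGER PRIMARY KEY, "
    "StudentId INTEGER REFERENCES Student(Id), "
    "Number TEXT)"
)

student = Student(1, "Ada")
phones = [
    StudentPhone(1, 1, "555-0100"),
    StudentPhone(2, 1, "555-0101"),
    StudentPhone(3, 1, "555-0102"),  # a third number needs no schema change
]
conn.execute("INSERT INTO Student VALUES (?, ?)", astuple(student))
conn.executemany("INSERT INTO StudentPhone VALUES (?, ?, ?)", [astuple(p) for p in phones])

phone_count = conn.execute(
    "SELECT COUNT(*) FROM StudentPhone WHERE StudentId = ?", (student.Id,)
).fetchone()[0]
print(phone_count)
```

With `StudentBefore`, storing a third phone number would require adding a `Phone3` attribute and column; with the normalized classes it is just one more row.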
SOFTWARE USED:
- Microsoft® SQL Server 2014 and Management Studio
SQL Server is the database store and Management Studio is used to manage the server and create database diagrams.
- Microsoft® Visual Studio Professional 2015
Visual Studio is the development IDE used to create the programming classes.
- Microsoft® Windows 10 Home Edition
Windows 10 is the operating system upon which other software programs run.
- Microsoft® Entity Framework 6.1.3
The Entity Framework library is installed as a NuGet package into Visual Studio for the purpose of generating the database from domain programming classes. A simple representation of the working of Entity Framework’s Code-First approach is given below:
CONCLUSION:
The Class Normal Forms provided in this thesis are derived from the well-known theory behind database normal forms. Class Normalization in a Code-First approach is important because database normalization is important in relational database design. The purpose of this thesis is to help developers from an object-oriented development background use Class Normalization as a technique to make sure their relational databases are normalized.
Database normalization is a critical process in the development of a business application that relies on a relational database. However, since the introduction of the Code-First approach, database normalization could be missed completely by programmers because the focus has shifted from developing the database first to developing the code first. Since database normalization is an afterthought in the Code-First approach, it can lead to redundancies and update anomalies in the database system if normalization is not accounted for.
If a database is not designed correctly, it could cost businesses hundreds to thousands of man-hours to fix problems when issues arise. Also, if database normalization is not taken into account, it could lead to redundant data being stored in the database. This can increase the size of the overall database, which in turn increases the storage costs for a business.
Therefore, Class Normalization using the Class Normal Forms proposed in this thesis needs to be incorporated into the software development process of enterprise organizations. Just by including a process such as Class Normalization into their software development, enterprises can save a lot of money in the long run.
FUTURE WORK:
This thesis has proposed Five Class Normal Forms in the form of rules to be applied to a programming class during software development. These rules can be understood by a software programmer, and the programmer could manually follow these rules to make sure a programming class is normalized.
However, these rules could potentially be translated into programmatic rules and fed into a software system. This software system could then be used to validate a software solution to see if any entity programming classes within it violate any of the Class Normalization rules discussed in this thesis.
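One possible shape for such a system is sketched below in Python. The two rules shown (numbered repeating attributes and collections of primitive values) are hypothetical stand-ins chosen for illustration and are not the Class Normal Form rules defined in the thesis; the `Order` class and all function names are likewise assumptions.

```python
import re
from dataclasses import dataclass, fields
from typing import List, get_args, get_origin

@dataclass
class Order:
    Id: int
    Item1: str
    Item2: str
    Tags: List[str]

def check_numbered_attributes(entity):
    """Hypothetical rule: flag attributes such as Item1, Item2 (a repeating group)."""
    stems = {}
    for f in fields(entity):
        match = re.fullmatch(r"(.*?)(\d+)", f.name)
        if match:
            stems.setdefault(match.group(1), []).append(f.name)
    return [
        f"{entity.__name__}: repeating group {names}"
        for names in stems.values()
        if len(names) > 1
    ]

def check_primitive_collections(entity):
    """Hypothetical rule: flag attributes that hold a collection of primitive values."""
    primitives = (int, str, float)
    return [
        f"{entity.__name__}.{f.name}: collection of primitive values"
        for f in fields(entity)
        if get_origin(f.type) in (list, set, tuple)
        and all(arg in primitives for arg in get_args(f.type))
    ]

# Each rule is a function from an entity class to a list of violation messages.
RULES = [check_numbered_attributes, check_primitive_collections]

def validate(entities):
    """Run every rule against every entity class and collect the violations."""
    violations = []
    for entity in entities:
        for rule in RULES:
            violations.extend(rule(entity))
    return violations

violations = validate([Order])
for message in violations:
    print(message)
```

Keeping each rule as an independent function means further rules could be added to `RULES` without changing the validation loop.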
Source: University of North Florida
Author: Daniel Sushil Sudhindaran