DBMS Tutorial – Database Management System notes

Here are the DBMS notes to help you learn database systems in a systematic manner. Happy learning!

Introduction to DBMS

DBMS stands for Database Management System. We can break it up like this: DBMS = Database + Management System. A database is a collection of data, and a management system is a set of programs to store and retrieve that data. Based on this, we can define DBMS as follows: a DBMS is a collection of inter-related data and a set of programs to store and access that data in an easy and effective manner.

Why do we need a DBMS?
Database systems are developed for large amounts of data. When dealing with a huge amount of data, there are two things that require optimization: storage of data and retrieval of data.

Storage: According to the principles of database systems, data is stored in such a way that it occupies far less space, because redundant data (duplicate data) is removed before storage. Let's take a layman example to understand this:
In a banking system, suppose a customer has two accounts: a savings account and a salary account. Say the bank stores the savings account data at one place (these places are called tables; we will learn about them later) and the salary account data at another place. If the customer information such as name and address is stored at both places, that is simply a waste of storage (redundancy/duplication of data). To organize the data in a better way, the customer information should be stored at one place and both accounts should be linked to that information. That is exactly what we achieve in a DBMS.

Fast retrieval of data: Along with storing the data in an optimized and systematic manner, it is also important that we can retrieve the data quickly when needed. Database systems ensure that data is retrieved as quickly as possible.
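
For instance, the banking example above could be organised as two tables that keep the customer details in one place and let both accounts refer to them. This is only a minimal sketch; the table and column names (customer, account, cust_id and so on) are illustrative assumptions, not part of any real banking schema.

CREATE TABLE customer (
    cust_id   INT PRIMARY KEY,      -- customer details stored only once
    cust_name VARCHAR(50),
    address   VARCHAR(100)
);

CREATE TABLE account (
    account_no   INT PRIMARY KEY,
    account_type VARCHAR(10),       -- e.g. 'SAVING' or 'SALARY'
    cust_id      INT REFERENCES customer(cust_id)  -- both accounts link to the same customer row
);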


Database Applications – DBMS

Applications where we use Database Management Systems are:

  • Telecom: There is a database to keep track of information regarding calls made, network usage, customer details, etc. Without database systems it would be hard to maintain such a huge amount of data that keeps updating every millisecond.

  • Industry: Whether it is a manufacturing unit, a warehouse or a distribution centre, each one needs a database to keep records of the ins and outs. For example, a distribution centre should keep track of the product units supplied into the centre as well as the products delivered out from the centre each day; this is where the DBMS comes into the picture.

  • Banking System: For storing customer information, tracking day-to-day credit and debit transactions, generating bank statements, etc. All this work is done with the help of database management systems.

  • Education sector: Database systems are frequently used in schools and colleges to store and retrieve data regarding student details, staff details, course details, exam details, payroll data, attendance details, fee details, etc. There is a large amount of inter-related data that needs to be stored and retrieved in an efficient manner.

  • Online shopping: You must be aware of online shopping websites such as Amazon, Flipkart, etc. These sites store product information, your addresses and preferences, and credit details, and then provide a relevant list of products based on your query. All this involves a database management system.



Advantages of DBMS over file system

Drawbacks of File system:

  • Data Isolation: Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.

  • Duplication of data – Redundant data

  • Dependency on application programs – Changing files would lead to change in application programs.

Advantages of DBMS over file system

There are several advantages of a database management system over a file system. A few of them are as follows:

  • No redundant data – Redundancy removed by data normalization

  • Data Consistency and Integrity – data normalization takes care of it too

  • Secure – Each user has a different set of access

  • Privacy – Limited access

  • Easy access to data

  • Easy recovery

  • Flexible

Disadvantages of DBMS:

  • DBMS implementation cost is high compared to the file system

  • Complexity: Database systems are complex to understand

  • Performance: Database systems are generic, making them suitable for various applications. However, this feature affects their performance for some applications.





View of Data in DBMS

Abstraction is one of the main features of database systems. Hiding irrelevant details from users and providing an abstract view of the data helps in easy and efficient user-database interaction.

To understand the view of data, you must have a basic knowledge of data abstraction and of instances & schemas. Refer to these two tutorials to learn them in detail.

  1. Data abstraction

  2. Instance and schema



Data Abstraction in DBMS

Database systems are made up of complex data structures. To ease user interaction with the database, the developers hide internal, irrelevant details from users. This process of hiding irrelevant details from the user is called data abstraction.

 

We have three levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how the data is actually stored in the database. The complex data-structure details are available at this level.

Logical level: This is the middle level of the 3-level data abstraction architecture. It describes what data is stored in the database.

View level: This is the highest level of data abstraction. It describes the user's interaction with the database system.

Example: Let's say we are storing customer information in a customer table. At the physical level these records are described as blocks of storage (measured in bytes, gigabytes, terabytes, etc.). These details are often hidden from programmers.

At the logical level these records can be described as fields and attributes along with their data types, and the relationships among them can be logically implemented. Programmers generally work at this level because they are aware of such details of database systems.

At the view level, users just interact with the system through a GUI and enter details on the screen; they are not aware of how the data is stored or what data is stored. Such details are hidden from them.
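
As a small illustration of these levels, most relational systems let us expose only selected columns through a view, so end users work at the view level while the table definition (logical level) and the storage details (physical level) stay hidden. This is a hedged sketch; the customer table and its columns are assumptions made up for the example.

-- Logical level: the table the programmer designs and queries
CREATE TABLE customer (
    cust_id   INT PRIMARY KEY,
    cust_name VARCHAR(50),
    address   VARCHAR(100),
    ssn       VARCHAR(11)
);

-- View level: users see only what they need; how and where the data is
-- stored, and sensitive columns such as ssn, remain hidden behind the view
CREATE VIEW customer_contact AS
SELECT cust_name, address FROM customer;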



Instance and schema in DBMS

Definition of schema: The design of a database is called the schema. A schema is of three types: physical schema, logical schema and view schema.

The design of a database at the physical level is called the physical schema; how the data is stored in blocks of storage is described at this level.

The design of a database at the logical level is called the logical schema; programmers and database administrators work at this level. Here, data is described as certain types of data records stored in data structures; however, internal details such as the implementation of those data structures remain hidden (they are available at the physical level).

The design of a database at the view level is called the view schema. It generally describes the end-user interaction with the database system.

To learn more about these schemas, refer to the 3-level data abstraction architecture.

Definition of instance: The data stored in the database at a particular moment of time is called an instance of the database. The database schema defines the variable declarations in the tables that belong to a particular database; the values of these variables at a moment in time form an instance of that database.



Keys in DBMS

A key plays an important role in a relational database; it is used to identify unique rows in a table. It also establishes relationships among tables.

Types of keys in DBMS

Primary Key – A primary key is a column or set of columns in a table that uniquely identifies tuples (rows) in that table.

Super Key – A super key is a set of one or more columns (attributes) that uniquely identifies rows in a table.

Candidate Key – A super key with no redundant attributes is known as a candidate key.

Alternate Key – Out of all candidate keys, only one gets selected as the primary key; the remaining keys are known as alternate or secondary keys.

Composite Key – A key that consists of more than one attribute to uniquely identify rows (also known as records & tuples) in a table is called a composite key.

Foreign Key – Foreign keys are columns of a table that point to the primary key of another table. They act as a cross-reference between tables.




Primary key in DBMS

Definition: A primary key is a column or set of columns in a table that uniquely identifies tuples (rows) in that table.

Example:
Student Table

Stu_Id | Stu_Name | Stu_Age
101    | Steve    | 23
102    | John     | 24
103    | Robert   | 28
104    | Carl     | 22

In the above Student table, the Stu_Id column uniquely identifies each row of the table.

Note:

  • We denote the primary key by underlining the column name.

  • The value of primary key should be unique for each row of the table. Primary key column cannot contain duplicate values.

  • Primary key column should not contain nulls.

  • Primary keys are not necessarily single columns; more than one column can together form the primary key of a table. For e.g. {Stu_Id, Stu_Name} collectively could play the role of primary key in the above table, but that does not make sense because Stu_Id alone is enough to uniquely identify rows, so why make things complex? That said, we should choose more than one column as the primary key only when there is no single column that can play that role.

How to choose a primary key?
There are two ways: either create a column and let the database automatically fill it with numbers in increasing order for each row, or choose a column yourself, making sure that it does not contain duplicates or nulls. For e.g. in the above Student table, the Stu_Name column cannot be a primary key as more than one person can have the same name; similarly, the Stu_Age column cannot play the primary key role as more than one person can have the same age.
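
In SQL, the chosen column is declared with a PRIMARY KEY constraint. A minimal sketch for the Student table above (the data types are assumptions):

CREATE TABLE Student (
    Stu_Id   INT PRIMARY KEY,   -- unique, non-null identifier for each row
    Stu_Name VARCHAR(35),
    Stu_Age  INT
);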




Super key in DBMS

Definition: A super key is a set of one or more columns (attributes) that uniquely identifies rows in a table. People often get confused between super key and candidate key, so we will also discuss the candidate key a little here.

How is a candidate key different from a super key?
The answer is simple – candidate keys are selected from the set of super keys; the only thing we take care of while selecting a candidate key is that it should not have any redundant attributes. That is the reason they are also termed minimal super keys.

Let’s take an example to understand this: Employee table

Emp_SSN   | Emp_Number | Emp_Name
123456789 | 226        | Steve
999999321 | 227        | Ajeet
888997212 | 228        | Chaitanya
777778888 | 229        | Robert

Super keys:

  • {Emp_SSN}

  • {Emp_Number}

  • {Emp_SSN, Emp_Number}

  • {Emp_SSN, Emp_Name}

  • {Emp_SSN, Emp_Number, Emp_Name}

  • {Emp_Number, Emp_Name}

All of the above sets are able to uniquely identify rows of the employee table.

Candidate Keys:
As stated above, they are the minimal super keys with no redundant attributes.

  • {Emp_SSN}

  • {Emp_Number}

Only these two sets are candidate keys; all the other sets contain redundant attributes that are not necessary for unique identification.

Primary key:
The primary key is selected from the set of candidate keys by the database designer. So either {Emp_SSN} or {Emp_Number} can be the primary key.




Foreign key in DBMS

Definition: Foreign keys are columns of a table that point to the primary key of another table. They act as a cross-reference between tables.

For example:
In the example below, the Stu_Id column in the Course_enrollment table is a foreign key, as it points to the primary key of the Student table.

Course_enrollment table:

Course_Id | Stu_Id
C01       | 101
C02       | 102
C03       | 101
C05       | 102
C06       | 103
C07       | 102

Student table:

Stu_Id | Stu_Name  | Stu_Age
101    | Chaitanya | 22
102    | Arya      | 26
103    | Bran      | 25
104    | Jon       | 21

Note: Practically, the foreign key has nothing to do with the primary key tag of the other table; if it points to a unique column (not necessarily the primary key) of another table, it is still a foreign key. So a more precise definition would be: foreign keys are columns of a table that point to a candidate key of another table.
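
Here is a hedged sketch of how the two tables above could be declared in SQL, with Stu_Id in Course_enrollment referencing the Student table (the data types and the composite primary key on Course_enrollment are assumptions):

CREATE TABLE Student (
    Stu_Id   INT PRIMARY KEY,
    Stu_Name VARCHAR(35),
    Stu_Age  INT
);

CREATE TABLE Course_enrollment (
    Course_Id VARCHAR(5),
    Stu_Id    INT,
    PRIMARY KEY (Course_Id, Stu_Id),
    FOREIGN KEY (Stu_Id) REFERENCES Student(Stu_Id)  -- cross-reference to the Student table
);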




Candidate Key in DBMS

A super key with no redundant attributes is known as a candidate key. Candidate keys are selected from the set of super keys; the only thing we take care of while selecting a candidate key is that it should not have any redundant attributes. That is the reason they are also termed minimal super keys.

For example:

Emp_Id | Emp_Number | Emp_Name
E01    | 2264       | Steve
E22    | 2278       | Ajeet
E23    | 2288       | Chaitanya
E45    | 2290       | Robert

There are two candidate keys in above table:
{Emp_Id}
{Emp_Number}

Note: The primary key is selected from the group of candidate keys. That means we can have either Emp_Id or Emp_Number as the primary key.




Composite key in DBMS

A key that consists of more than one attribute to uniquely identify rows (also known as records or tuples) in a table is called a composite key. It is also known as a compound key.

Example: Table – Sales

cust_Id | order_Id | product_code | product_count
C01     | O001     | P007         | 23
C02     | O123     | P007         | 19
C02     | O123     | P230         | 82
C01     | O001     | P890         | 42

Key in the above table: {cust_Id, order_Id, product_code}
This is a composite key as it consists of more than one attribute. (Note that {cust_Id, order_Id} alone would not be unique here, since the same order can contain several products.)
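
A minimal sketch of how such a composite key can be declared in SQL (the data types are assumptions):

CREATE TABLE Sales (
    cust_Id       VARCHAR(5),
    order_Id      VARCHAR(5),
    product_code  VARCHAR(5),
    product_count INT,
    PRIMARY KEY (cust_Id, order_Id, product_code)  -- composite key: more than one column
);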

Alternate key in DBMS

Out of all candidate keys, only one gets selected as the primary key; the remaining keys are known as alternate or secondary keys.

For example: Consider the below table

Emp_Id | Emp_Number | Emp_Name
E01    | 2264       | Steve
E22    | 2278       | Ajeet
E23    | 2288       | Chaitanya
E45    | 2290       | Robert

There are two candidate keys in above table:
{Emp_Id}
{Emp_Number}

Since we have selected Emp_Id as the primary key, the remaining key Emp_Number would be called the alternate or secondary key.



















Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database

Normalization is a process of organizing the data in a database to avoid data redundancy, insertion anomalies, update anomalies and deletion anomalies. Let's discuss anomalies first and then discuss the normal forms with examples.

Anomalies in DBMS

There are three types of anomalies that occur when the database is not normalized: insertion, update and deletion anomalies. Let's take an example to understand this.

Example: Suppose a manufacturing company stores employee details in a table named employee that has four attributes: emp_id for the employee's id, emp_name for the employee's name, emp_address for the employee's address and emp_dept for the department the employee works in. At some point in time the table looks like this:

emp_id | emp_name | emp_address | emp_dept
101    | Rick     | Delhi       | D001
101    | Rick     | Delhi       | D002
123    | Maggie   | Agra        | D890
166    | Glenn    | Chennai     | D900
166    | Glenn    | Chennai     | D004

The above table is not normalized. We will see the problems that we face when a table is not normalized.

Update anomaly: In the above table we have two rows for employee Rick, as he belongs to two departments of the company. If we want to update Rick's address, we have to update it in both rows or the data will become inconsistent. If the correct address somehow gets updated in one department but not in the other, then as per the database Rick would have two different addresses, which is not correct and leads to inconsistent data.

Insert anomaly: Suppose a new employee joins the company who is under training and not yet assigned to any department. We would not be able to insert this employee into the table if the emp_dept field doesn't allow nulls.

Delete anomaly: Suppose at some point the company closes department D890. Deleting the rows that have emp_dept equal to D890 would also delete the information about employee Maggie, since she is assigned only to this department.
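
The delete anomaly can be seen directly in SQL; a hedged sketch against the unnormalized employee table above:

-- Closing department D890 also silently erases everything we know about
-- employee Maggie (emp_id 123), because her details exist only in that one row
DELETE FROM employee WHERE emp_dept = 'D890';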

To overcome these anomalies we need to normalize the data. In the next section we will discuss normalization.

Normalization

Here are the most commonly used normal forms:

  • First normal form(1NF)

  • Second normal form(2NF)

  • Third normal form(3NF)

  • Boyce & Codd normal form (BCNF)

First normal form (1NF)

As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only atomic values.

Example: Suppose a company wants to store the names and contact details of its employees. It creates a table that looks like this:

emp_id | emp_name | emp_address | emp_mobile
101    | Herschel | New Delhi   | 8912312390
102    | Jon      | Kanpur      | 8812121212, 9900012222
103    | Ron      | Chennai     | 7778881212
104    | Lester   | Bangalore   | 9990000123, 8123450987

Two employees (Jon & Lester) have two mobile numbers, so the company stored both in the same field, as you can see in the table above.

This table is not in 1NF as the rule says "each attribute of a table must have atomic (single) values"; the emp_mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF we should have the data like this:

emp_id | emp_name | emp_address | emp_mobile
101    | Herschel | New Delhi   | 8912312390
102    | Jon      | Kanpur      | 8812121212
102    | Jon      | Kanpur      | 9900012222
103    | Ron      | Chennai     | 7778881212
104    | Lester   | Bangalore   | 9990000123
104    | Lester   | Bangalore   | 8123450987
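
One possible way to declare the 1NF-compliant table in SQL; since emp_id alone no longer identifies a row, the sketch below assumes a composite key {emp_id, emp_mobile} (the data types are assumptions too):

CREATE TABLE employee (
    emp_id      INT,
    emp_name    VARCHAR(35),
    emp_address VARCHAR(50),
    emp_mobile  VARCHAR(15),          -- one atomic phone number per row
    PRIMARY KEY (emp_id, emp_mobile)  -- an employee appears once per mobile number
);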

Second normal form (2NF)

A table is said to be in 2NF if both the following conditions hold:

  • Table is in 1NF (First normal form)

  • No non-prime attribute is dependent on the proper subset of any candidate key of table.

An attribute that is not part of any candidate key is known as non-prime attribute.

Example: Suppose a school wants to store data about teachers and the subjects they teach. They create a table that looks like this (since a teacher can teach more than one subject, the table can have multiple rows for the same teacher):

teacher_id | subject   | teacher_age
111        | Maths     | 38
111        | Physics   | 38
222        | Biology   | 38
333        | Physics   | 40
333        | Chemistry | 40

Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, as the rule says "no non-prime attribute is dependent on a proper subset of any candidate key of the table".

To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:

teacher_id | teacher_age
111        | 38
222        | 38
333        | 40

teacher_subject table:

teacher_id | subject
111        | Maths
111        | Physics
222        | Biology
333        | Physics
333        | Chemistry

Now the tables comply with Second normal form (2NF).
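
A hedged sketch of this decomposition in SQL (the data types are assumptions):

CREATE TABLE teacher_details (
    teacher_id  INT PRIMARY KEY,   -- teacher_age now depends on the whole key
    teacher_age INT
);

CREATE TABLE teacher_subject (
    teacher_id INT,
    subject    VARCHAR(30),
    PRIMARY KEY (teacher_id, subject),
    FOREIGN KEY (teacher_id) REFERENCES teacher_details(teacher_id)
);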

Third Normal form (3NF)

A table design is said to be in 3NF if both of the following conditions hold:

  • Table is in 2NF (second normal form)

  • No non-prime attribute is transitively dependent on any candidate key of the table.

An attribute that is not part of any candidate key is known as a non-prime attribute.

In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each functional dependency X -> Y, at least one of the following conditions holds:

  • X is a super key of table

  • Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.

Example: Suppose a company wants to store the complete address of each employee. They create a table named employee_details that looks like this:

emp_id | emp_name | emp_zip | emp_state | emp_city | emp_district
1001   | John     | 282005  | UP        | Agra     | Dayal Bagh
1002   | Ajeet    | 222008  | TN        | Chennai  | M-City
1006   | Lora     | 282007  | TN        | Chennai  | Urrapakkam
1101   | Lilly    | 292008  | UK        | Pauri    | Bhagwan
1201   | Steve    | 222999  | MP        | Gwalior  | Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any candidate keys.

Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That makes the non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on the candidate key (emp_id). This violates the rule of 3NF.

To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:

employee table:

emp_id | emp_name | emp_zip
1001   | John     | 282005
1002   | Ajeet    | 222008
1006   | Lora     | 282007
1101   | Lilly    | 292008
1201   | Steve    | 222999

employee_zip table:

emp_zip | emp_state | emp_city | emp_district
282005  | UP        | Agra     | Dayal Bagh
222008  | TN        | Chennai  | M-City
282007  | TN        | Chennai  | Urrapakkam
292008  | UK        | Pauri    | Bhagwan
222999  | MP        | Gwalior  | Ratan
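
A hedged sketch of the 3NF decomposition in SQL, with emp_zip acting as the link between the two tables (the data types are assumptions):

CREATE TABLE employee_zip (
    emp_zip      VARCHAR(6) PRIMARY KEY,  -- state, city and district depend only on the zip
    emp_state    VARCHAR(20),
    emp_city     VARCHAR(30),
    emp_district VARCHAR(30)
);

CREATE TABLE employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(35),
    emp_zip  VARCHAR(6) REFERENCES employee_zip(emp_zip)  -- transitive dependency removed
);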

Boyce Codd normal form (BCNF)

BCNF is an advanced version of 3NF; that is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a super key of the table.

Example: Suppose there is a company wherein employees work in more than one department. They store the data like this:

emp_id | emp_nationality | emp_dept                     | dept_type | dept_no_of_emp
1001   | Austrian        | Production and planning      | D001      | 200
1001   | Austrian        | stores                       | D001      | 250
1002   | American        | design and technical support | D134      | 100
1002   | American        | Purchasing department        | D134      | 600

Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF, as neither emp_id nor emp_dept alone is a super key.

To make the table comply with BCNF we can break it into three tables like this:
emp_nationality table:

emp_id | emp_nationality
1001   | Austrian
1002   | American

emp_dept table:

emp_dept                     | dept_type | dept_no_of_emp
Production and planning      | D001      | 200
stores                       | D001      | 250
design and technical support | D134      | 100
Purchasing department        | D134      | 600

emp_dept_mapping table:

emp_id | emp_dept
1001   | Production and planning
1001   | stores
1002   | design and technical support
1002   | Purchasing department

Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}

This is now in BCNF as, in both functional dependencies, the left-hand side is a super key.
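
A hedged sketch of the three BCNF tables in SQL (the data types are assumptions):

CREATE TABLE emp_nationality (
    emp_id          INT PRIMARY KEY,
    emp_nationality VARCHAR(20)
);

CREATE TABLE emp_dept (
    emp_dept       VARCHAR(40) PRIMARY KEY,  -- determines dept_type and dept_no_of_emp
    dept_type      VARCHAR(10),
    dept_no_of_emp INT
);

CREATE TABLE emp_dept_mapping (
    emp_id   INT REFERENCES emp_nationality(emp_id),
    emp_dept VARCHAR(40) REFERENCES emp_dept(emp_dept),
    PRIMARY KEY (emp_id, emp_dept)
);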








Data models in DBMS


A data model is the logical structure of a database. It describes the design of the database to reflect entities, attributes, relationships among data, constraints, etc.

Types of Data Models:
Object based logical Models – Describe data at the conceptual and view levels.

  1. E-R Model

  2. Object oriented Model

Record based logical Models – Like Object based model, they also describe data at the conceptual and view levels. These models specify logical structure of database with records, fields and attributes.

  1. Relational Model

  2. Hierarchical Model

  3. Network Model – The network model is the same as the hierarchical model except that it has a graph-like structure rather than a tree-based structure. Unlike the hierarchical model, this model allows each record to have more than one parent record.

Physical Data Models – These models describe data at the lowest level of abstraction.

 





E-R model in DBMS

An entity–relationship model (ER model) is a systematic way of describing and defining a business process. An ER model is typically implemented as a database. The main components of E-R model are: entity set and relationship set.

Here are the geometric shapes and their meaning in an E-R Diagram –

Rectangle: Represents Entity sets.

Ellipses: Attributes

Diamonds: Relationship Set

Lines: They link attributes to Entity Sets and Entity sets to Relationship Set

Double Ellipses: Multivalued Attributes

Dashed Ellipses: Derived Attributes

Double Rectangles: Weak Entity Sets

Double Lines: Total participation of an entity in a relationship set

A sample E-R Diagram:


Multivalued Attributes: An attribute that can hold multiple values is known as a multivalued attribute. We represent it with double ellipses in an E-R diagram. E.g. a person can have more than one phone number, so the phone number attribute is multivalued.

Derived Attribute: A derived attribute is one whose value is dynamic and derived from another attribute. It is represented by a dashed ellipse in an E-R diagram. E.g. a person's age is a derived attribute, as it changes over time and can be derived from another attribute (date of birth).

E-R diagram with multivalued and derived attributes:


Total Participation of an Entity set:
Total participation of an entity set means that each entity in the entity set must participate in at least one relationship in the relationship set. For example, in the diagram, each college must have at least one associated student.




Relational model in DBMS

In the relational model, data and relationships are represented by a collection of inter-related tables. Each table is a group of columns and rows, where columns represent the attributes of an entity and rows represent records.

Sample relational model: a Student table with 3 columns and four records.

Stu_Id | Stu_Name | Stu_Age
111    | Ashish   | 23
123    | Saurav   | 22
169    | Lester   | 24
234    | Lou      | 26

Course table:

Stu_Id | Course_Id | Course_Name
111    | C01       | Science
111    | C02       | DBMS
169    | C22       | Java
169    | C39       | Computer Networks

Here Stu_Id, Stu_Name & Stu_Age are attributes of the Student table, and Stu_Id, Course_Id & Course_Name are attributes of the Course table. The rows with values are the records (commonly known as tuples).





Hierarchical model in DBMS

In the hierarchical model, data is organized into a tree-like structure, with each record having one parent record and many children. The main drawback of this model is that it can have only one-to-many relationships between nodes.

Sample Hierarchical Model Diagram:

Example of hierarchical data represented as relational tables: The above hierarchical model can be represented as relational tables like this:

Stu_Id | Stu_Name  | Stu_Age
123    | Steve     | 29
367    | Chaitanya | 27
234    | Ajeet     | 28

Course Table:

Course_Id | Course_Name | Stu_Id
C01       | Cobol       | 123
C21       | Java        | 367
C22       | Perl        | 367
C33       | JQuery      | 234












DBMS languages

Database languages are used to read, update and store data in a database. There are several such languages that can be used for this purpose; one of them is SQL (Structured Query Language).

Types of DBMS languages:
Data Definition Language (DDL): DDL is used for specifying the database schema. Let’s take SQL for instance to categorize the statements that comes under DDL.

  • To create a database and its objects (tables, views, etc.) – CREATE

  • To alter the structure of the database – ALTER

  • To drop database objects – DROP

  • To remove all records from a table (keeping the table itself) – TRUNCATE

  • To rename database objects – RENAME

All these commands specify or update the database schema; that is why they come under data definition language.

Data Manipulation Language (DML): DML is used for accessing and manipulating data in a database.

  • To read records from table(s) – SELECT

  • To insert record(s) into the table(s) – INSERT

  • Update the data in table(s) – UPDATE

  • To delete record(s) from a table – DELETE

 

Data Control language (DCL): DCL is used for granting and revoking user access on a database –

  • To grant access to user – GRANT

  • To revoke access from user – REVOKE

In practice, data definition language, data manipulation language and data control language are not separate languages; rather, they are parts of a single database language such as SQL.
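
A few hedged SQL examples of each category (the table, column and user names are illustrative assumptions, and DCL syntax varies a little between databases):

-- DDL: define or change the schema
CREATE TABLE Student (Stu_Id INT PRIMARY KEY, Stu_Name VARCHAR(35));
ALTER TABLE Student ADD Stu_Age INT;

-- DML: read and manipulate the data
INSERT INTO Student (Stu_Id, Stu_Name, Stu_Age) VALUES (101, 'Steve', 23);
SELECT Stu_Name FROM Student WHERE Stu_Age > 20;
UPDATE Student SET Stu_Age = 24 WHERE Stu_Id = 101;
DELETE FROM Student WHERE Stu_Id = 101;

-- DCL: grant and revoke access
GRANT SELECT ON Student TO some_user;
REVOKE SELECT ON Student FROM some_user;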



Functional dependency in DBMS

The attributes of a table are said to be functionally dependent when one attribute of the table uniquely determines another attribute of the same table.

For example: Suppose we have a student table with attributes Stu_Id, Stu_Name and Stu_Age. Here the Stu_Id attribute uniquely identifies the Stu_Name attribute of the student table, because if we know the student id we can tell the student name associated with it. This is known as functional dependency and can be written as Stu_Id -> Stu_Name; in words, Stu_Name is functionally dependent on Stu_Id.

Formally:
If column A of a table uniquely identifies column B of the same table, then it can be represented as A -> B (attribute B is functionally dependent on attribute A).

Types of Functional Dependencies









RDBMS Concepts

RDBMS stands for relational database management system. A relational database has the following major components: table, record/tuple, field and column/attribute.

Table:
A table is a collection of data represented in rows and columns. For e.g. following table stores the information of students.

Student_Id | Student_Name | Student_Addr     | Student_Age
101        | Chaitanya    | Dayal Bagh, Agra | 27
102        | Ajeet        | Delhi            | 26
103        | Rahul        | Gurgaon          | 24
104        | Shubham      | Chennai          | 25

Records / Tuple:
Each row of a table is known as a record; it is also known as a tuple. For e.g. the below row is a record.

102 | Ajeet | Delhi | 26

Field:
The above table has four fields: Student_Id, Student_Name, Student_Addr & Student_Age.

Column / Attribute:
Each column of a table is known as an attribute, together with the set of values it holds. For e.g. the set of values of the Student_Id field is one of the four columns of the Student table.

Student_Id
101
102
103
104




Cardinality in DBMS

In DBMS you may hear the term cardinality in two different contexts, and it has two different meanings as well.

In Context of Data Models:
In terms of data modeling, cardinality refers to the relationship between two tables. Relationships can be of four types:

One to One – A single row of table 1 associates with a single row of table 2

One to Many – A single row of table 1 associates with more than one row of table 2

Many to One – Many rows of table 1 associate with a single row of table 2

Many to Many – Many rows of table 1 associate with many rows of table 2

In Context of Query Optimization:
In terms of queries, cardinality refers to the uniqueness of a column in a table. A column with all unique values has high cardinality, while a column with many duplicate values has low cardinality. These cardinality estimates help in query optimization.
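
In the query-optimization sense, cardinality can be estimated simply by counting distinct values; a hedged sketch against the Student table used earlier:

-- A result close to the row count means high cardinality (e.g. a primary key column);
-- a much smaller result (many duplicates) means low cardinality
SELECT COUNT(DISTINCT Stu_Id)  AS id_cardinality,
       COUNT(DISTINCT Stu_Age) AS age_cardinality
FROM Student;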


ACID properties in DBMS

To ensure the integrity of data during a transaction (a transaction is a unit of work that reads and updates various data items), the database system maintains the following properties, widely known as the ACID properties:

  • Atomicity: This property ensures that either all the operations of a transaction are reflected in the database or none of them are. Let's take a banking example to understand this: suppose account A has a balance of $400 and account B has $700. Account A is transferring $100 to account B. This transaction has two operations: a) debiting $100 from A's balance, b) crediting $100 to B's balance. Say the first operation succeeds while the second fails; in that case A's balance would be $300 while B would still have $700 instead of $800. This is unacceptable in a banking system. Either the transaction should fail without executing any of the operations, or it should execute both. The atomicity property ensures exactly that.

  • Consistency: To preserve the consistency of the database, the execution of transactions should take place in isolation (that means no other transaction should run concurrently while a transaction is already running). For example, account A has a balance of $400 and is transferring $100 to both account B and account C, so we have two transactions. Say these transactions run concurrently and both read a $400 balance; in that case the final balance of A would be $300 instead of $200, which is wrong. If the transactions ran in isolation, the second transaction would have read the correct balance of $300 (before debiting $100) once the first transaction completed successfully.

  • Isolation: For every pair of transactions, one transaction should start execution only when the other has finished execution. The consistency example above already illustrates isolation.

  • Durability: Once a transaction completes successfully, the changes it has made to the database should be permanent even if there is a system failure. The recovery-management component of the database system ensures the durability of transactions.
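
The bank-transfer example can be written as a single transaction; a hedged sketch (the account table and its columns are assumptions, and the exact transaction syntax varies by database):

BEGIN TRANSACTION;

UPDATE account SET balance = balance - 100 WHERE account_no = 'A';  -- debit A
UPDATE account SET balance = balance + 100 WHERE account_no = 'B';  -- credit B

-- Atomicity: either both updates become permanent at COMMIT...
COMMIT;
-- ...or, if anything fails before the COMMIT, the whole transaction is undone (ROLLBACK).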




Constraints in DBMS

Constraints enforce limits on the data, or the type of data, that can be inserted, updated or deleted in a table. The whole purpose of constraints is to maintain data integrity during inserts, updates and deletes. In this tutorial we will learn about the several types of constraints that can be created in an RDBMS.

Types of constraints

  • NOT NULL

  • UNIQUE

  • DEFAULT

  • CHECK

  • Key Constraints – PRIMARY KEY, FOREIGN KEY

  • Domain constraints

  • Mapping constraints

NOT NULL:

The NOT NULL constraint makes sure that a column does not hold a NULL value. When we don't provide a value for a particular column while inserting a record into a table, it takes the NULL value by default. By specifying a NOT NULL constraint, we can be sure that a particular column cannot have NULL values.

Example:

CREATE TABLE STUDENT(
    ROLL_NO INT NOT NULL,
    STU_NAME VARCHAR (35) NOT NULL,
    STU_AGE INT NOT NULL,
    STU_ADDRESS VARCHAR (235),
    PRIMARY KEY (ROLL_NO)
);


UNIQUE:

The UNIQUE constraint enforces a column or set of columns to have unique values. If a column has a unique constraint, it means that particular column cannot have duplicate values in the table.

CREATE TABLE STUDENT(
    ROLL_NO INT NOT NULL,
    STU_NAME VARCHAR (35) NOT NULL UNIQUE,
    STU_AGE INT NOT NULL,
    STU_ADDRESS VARCHAR (35) UNIQUE,
    PRIMARY KEY (ROLL_NO)
);


DEFAULT:

The DEFAULT constraint provides a default value to a column when there is no value provided while inserting a record into a table.

CREATE TABLE STUDENT(
    ROLL_NO INT NOT NULL,
    STU_NAME VARCHAR (35) NOT NULL,
    STU_AGE INT NOT NULL,
    EXAM_FEE INT DEFAULT 10000,
    STU_ADDRESS VARCHAR (35),
    PRIMARY KEY (ROLL_NO)
);


CHECK:

This constraint is used for specifying a range of values for a particular column of a table. When this constraint is set on a column, it ensures that the column can only hold values falling within the specified range.

CREATE TABLE STUDENT(
    ROLL_NO INT NOT NULL CHECK (ROLL_NO > 1000),
    STU_NAME VARCHAR (35) NOT NULL,
    STU_AGE INT NOT NULL,
    EXAM_FEE INT DEFAULT 10000,
    STU_ADDRESS VARCHAR (35),
    PRIMARY KEY (ROLL_NO)
);

In the above example we have set a check constraint on the ROLL_NO column of the STUDENT table. Now, the ROLL_NO field must have a value greater than 1000.

Key constraints:

PRIMARY KEY:

A primary key uniquely identifies each record in a table. It must have unique values and cannot contain nulls. In the below example the ROLL_NO field is marked as the primary key; that means the ROLL_NO field cannot have duplicate or null values.

CREATE TABLE STUDENT(
    ROLL_NO INT NOT NULL,
    STU_NAME VARCHAR (35) NOT NULL UNIQUE,
    STU_AGE INT NOT NULL,
    STU_ADDRESS VARCHAR (35) UNIQUE,
    PRIMARY KEY (ROLL_NO)
);

FOREIGN KEY:

Foreign keys are columns of a table that point to the primary key of another table. They act as a cross-reference between tables.

Domain constraints:

Each table has a certain set of columns, and each column allows only one type of data, based on its data type; the column does not accept values of any other data type.
A domain constraint is a user-defined data type, and we can define it like this:

Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY / FOREIGN KEY / CHECK / DEFAULT)
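
Some databases (PostgreSQL, for example) let us name such a domain explicitly and reuse it across tables; a hedged sketch (the domain and table names are assumptions):

-- A domain bundles a data type with constraints
CREATE DOMAIN age_domain AS INT
    CHECK (VALUE > 0 AND VALUE < 150);

CREATE TABLE STUDENT_INFO (
    ROLL_NO INT PRIMARY KEY,
    STU_AGE age_domain NOT NULL   -- the column inherits the domain's CHECK constraint
);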

Mapping constraints:

Mapping constraints define the cardinality of a relationship between entity sets: one-to-one, one-to-many, many-to-one or many-to-many.


Deadlock in DBMS

A deadlock is a condition wherein two or more tasks are waiting for each other to finish, but none of the tasks is willing to give up the resources that the other task needs. In this situation no task ever finishes; they all remain in a waiting state forever.

Coffman conditions

Coffman stated four conditions for a deadlock to occur. A deadlock may occur if all of the following conditions hold true.

  • Mutual exclusion condition: There must be at least one resource that cannot be used by more than one process at a time.

  • Hold and wait condition: A process that is holding a resource can request for additional resources that are being held by other processes in the system.

  • No preemption condition: A resource cannot be forcibly taken from a process. Only the process holding a resource can release it.

  • Circular wait condition: A condition where one process is waiting for a resource held by a second process, the second process is waiting for a third process, and so on, with the last process waiting for the first, thus forming a circular chain of waiting.

Deadlock Handling

Ignore the deadlock (Ostrich algorithm)

Did that make you laugh? You may be wondering how ignoring a deadlock can count as deadlock handling. But the Windows system you use on your PC follows this approach to deadlock handling, and that is one reason it sometimes hangs and you have to reboot it to get it working. Not only Windows but UNIX also uses this approach.

The question is why? Why, instead of dealing with a deadlock, do they ignore it, and why is this called the Ostrich algorithm?

Well, let me answer the second question first: this is known as the Ostrich algorithm because in this approach we ignore the deadlock and pretend it will never occur, just like the ostrich's proverbial behaviour of sticking its head in the sand and pretending there is no problem.

Let's discuss why we ignore it: when deadlocks are believed to be very rare and the cost of handling them is high, ignoring them is a better solution than handling them. For example, in the operating system case, if the time required to handle a deadlock is higher than the time required to reboot Windows, then rebooting is the preferred choice, considering that deadlocks are very rare in Windows.

Deadlock detection

A resource scheduler keeps track of the resources allocated to and requested by processes. Thus, if there is a deadlock, it is known to the resource scheduler. This is how a deadlock is detected.

Once a deadlock is detected, it can be corrected by the following methods:

  • Terminating processes involved in the deadlock: Terminating all the processes involved in the deadlock, or terminating them one by one until the deadlock is resolved, can be solutions, but neither approach is good. Terminating all processes is costly and the partial work done by the processes gets lost. Terminating one by one takes a lot of time because each time a process is terminated we need to check whether the deadlock is resolved. Thus, the best approach is to consider process age and priority while terminating processes during a deadlock condition.

  • Resource preemption: Another approach is to preempt resources and allocate them to other processes until the deadlock is resolved.

Deadlock prevention

We have learnt that a deadlock occurs when all four Coffman conditions hold true, so preventing one or more of them can prevent the deadlock.

  • Removing mutual exclusion: All resources must be sharable, meaning that more than one process can hold a resource at a time. This approach is practically impossible for most resources.

  • Removing the hold and wait condition: This can be removed if a process acquires all the resources it needs before starting out. Another way is to enforce the rule that a process may request resources only when it is holding none.

  • Preemption of resources: Preempting resources from a process can result in rollback, and thus it needs to be avoided in order to maintain the consistency and stability of the system.

  • Avoiding the circular wait condition: This can be avoided if resources are maintained in a hierarchy and a process may acquire resources only in increasing order of precedence, which prevents a circular wait. Another way is to enforce a one-resource-per-process rule – a process can request a resource only after it releases the resource it currently holds. This also avoids the circular wait.
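
In a database setting, the resource-ordering idea is often applied by always locking rows in the same order. A hedged sketch (the account table is an assumption): because both sessions below update accounts in ascending account_no order, a circular wait between them cannot arise.

-- Session 1: transfer 100 from account 1 to account 2
BEGIN TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE account_no = 1;  -- lower id locked first
UPDATE account SET balance = balance + 100 WHERE account_no = 2;
COMMIT;

-- Session 2: transfer 100 from account 2 to account 1
BEGIN TRANSACTION;
UPDATE account SET balance = balance + 100 WHERE account_no = 1;  -- still locks the lower id first
UPDATE account SET balance = balance - 100 WHERE account_no = 2;
COMMIT;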

Deadlock Avoidance

Deadlock can be avoided if resources are allocated in such a way that a deadlock never occurs. There are two algorithms for deadlock avoidance:

  • Wait/Die

  • Wound/Wait

Here is a table representation of resource allocation for each algorithm. Both algorithms take process age into consideration while determining the best possible way to allocate resources for deadlock avoidance.


Scenario                                                | Wait/Die             | Wound/Wait
Older process needs a resource held by younger process  | Older process waits  | Younger process dies
Younger process needs a resource held by older process  | Younger process dies | Younger process waits

One of the famous deadlock avoidance algorithms is the Banker's algorithm.

