Notes on DBMS

Saturday, July 16, 2016

Structure of DBMS

At a very high level, a database system can be viewed as the set of components shown in the diagram below. Each component is described in turn.

Applications: - An application can be thought of as a user-friendly web page where the user enters requests. The user simply enters the details required and presses buttons to get the data.
End User: - End users are the real users of the database. They can be developers, designers, administrators or the people who actually work with the data.
DDL: - Data Definition Language (DDL) is the set of commands used to create the database, schemas, tables, mappings and so on. These commands create objects such as tables and indexes for the first time. In other words, they create the structure of the database.
DDL Compiler: - This part of the database is responsible for processing DDL commands. The compiler breaks each command down into a form the machine can execute. It is also responsible for storing metadata such as table names, the space used by each table, the number of columns, mapping information and so on.
DML Compiler: - When the user inserts, deletes, updates or retrieves a record, the request is sent in a form the user understands, for example by pressing some buttons. For the database to act on the request, it must be broken down into object code. This is done by the DML compiler. One can compare this to the way a question put to a person is converted into signals that reach the brain.
Query Optimizer: - When a user fires a request, the user is not concerned with how it will be run against the database, and may know nothing about the database or how it performs. Whatever the request, it should fetch, insert, update or delete the data efficiently. The query optimizer decides the best way to execute the request received from the DML compiler. It is similar to selecting the best nerve to carry the signals to the brain.
Stored Data Manager: - This is also known as the Database Control System. It is one of the central components of the database. It is responsible for the following tasks:

It converts the requests received from the query optimizer into a machine-understandable form and makes the actual request inside the database. It is like reaching the exact part of the brain that holds the answer.
It helps maintain consistency and integrity by applying constraints. For example, it does not allow a record to be deleted, or its key updated, while child records still refer to it. Similarly, it does not allow duplicate values in columns that are declared unique.
It controls concurrent access. If multiple users access the database at the same time, it makes sure that all of them see correct data, and that no data is lost or mismatched between the transactions of different users.
It helps to back up the database and recover data whenever required. In a large database, reverting the changes made by an unexpectedly failed transaction is not easy. The stored data manager maintains backups of the data so that it can be recovered.
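
The constraint checks described above can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in for a DBMS; the department and employee tables and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (1, 'Physics')")
conn.execute("INSERT INTO employee VALUES (10, 'Asha', 1)")


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Deleting a parent row that still has a child row is refused.
parent_delete_rejected = rejected("DELETE FROM department WHERE dept_id = 1")
# A second department with the same name violates the UNIQUE constraint.
duplicate_rejected = rejected("INSERT INTO department VALUES (2, 'Physics')")
# An employee pointing at a department that does not exist is refused.
orphan_rejected = rejected("INSERT INTO employee VALUES (11, 'Ravi', 99)")

print(parent_delete_rejected, duplicate_rejected, orphan_rejected)
```

In each case the statement is refused by the database itself, not by the application, which is the point of letting the stored data manager apply the constraints.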

Data Files: - These hold the real data. They can be stored on magnetic tapes, magnetic disks or optical disks.
Compiled DML: - Some processed DML statements (insert, update, delete) are stored here so that they can be reused when similar requests arrive.
Data Dictionary: - It contains all the information about the database. As the name suggests, it is the dictionary of all the data items. It contains descriptions of all the tables, views, materialized views, constraints, indexes, triggers etc.

Database Users and Administrators

A primary goal of a database system is to retrieve information from and store new information in the database. People who work with a database can be categorized as database users or database administrators.

1) Database Users and User Interfaces

There are four different types of database-system users, differentiated by the way they expect to interact with the system. Different types of user interfaces have been designed for the different types of users:

• Naive users are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously. For example, a bank teller who needs to transfer $50 from account A to account B invokes a program called transfer. This program asks the teller for the amount of money to be transferred, the account from which the money is to be transferred, and the account to which the money is to be transferred.

As another example, consider a user who wishes to find her account balance over the World Wide Web. Such a user may access a form, where she enters her account number. An application program at the Web server then retrieves the account balance, using the given account number, and passes this information back to the user.

The typical user interface for naive users is a forms interface, where the user can fill in appropriate fields of the form. Naive users may also simply read reports generated from the database.

•Application programmers are computer professionals who write application programs. Application programmers can choose from many tools to develop user interfaces. Rapid application development (RAD) tools are tools that enable an application programmer to construct forms and reports without writing a program. There are also special types of programming languages that combine imperative control structures (for example, for loops, while loops and if-then-else statements) with statements of the data manipulation language. These languages, sometimes called fourth-generation languages, often include special features to facilitate the generation of forms and the display of data on the screen. Most major commercial database systems include a fourth-generation language.

•Sophisticated users interact with the system without writing programs. Instead, they form their requests in a database query language. They submit each such query to a query processor, whose function is to break down DML statements into instructions that the storage manager understands. Analysts who submit queries to explore data in the database fall in this category.

Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view summaries of data in different ways. For instance, an analyst can see total sales by region (for example, North, South, East, and West), or by product, or by a combination of region and product (that is, total sales of each product in each region). The tools also permit the analyst to select specific regions, look at data in more detail (for example, sales by city within a region) or look at the data in less detail (for example, aggregate products together by category).
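
The three summaries mentioned above (by region, by product, and by both) can be approximated with plain GROUP BY queries. This is only a sketch of the idea, not how an OLAP tool is built; it uses Python's sqlite3 module, and the sales table and its figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "pen", 10), ("North", "ink", 5), ("South", "pen", 7), ("South", "ink", 3)],
)

# Total sales by region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Total sales by product.
by_product = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()

# Total sales of each product in each region.
by_both = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales"
    " GROUP BY region, product ORDER BY region, product"
).fetchall()

print(by_region)   # [('North', 15), ('South', 10)]
print(by_product)  # [('ink', 8), ('pen', 17)]
```

Looking at the data in more or less detail corresponds to adding or dropping columns in the GROUP BY clause.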

Another class of tools for analysts is data mining tools, which help them find certain kinds of patterns in data.

•Specialized users are sophisticated users who write specialized database applications that do not fit into the traditional data-processing framework. Among these applications are computer-aided design systems, knowledge base and expert systems, systems that store data with complex data types (for example, graphics data and audio data), and environment-modeling systems.

2) Database Administrator

One of the main reasons for using DBMSs is to have central control of both the data and the programs that access those data. A person who has such central control over the system is called a database administrator (DBA). The functions of a DBA include:

•Schema definition. The DBA creates the original database schema by executing a set of data definition statements in the DDL.

•Storage structure and access-method definition.

•Schema and physical-organization modification. The DBA carries out changes to the schema and physical organization to reflect the changing needs of the organization, or to alter the physical organization to improve performance.

•Granting of authorization for data access. By granting different types of authorization, the database administrator can regulate which parts of the database various users can access. The authorization information is kept in a special system structure that the database system consults whenever someone attempts to access the data in the system.

•Routine maintenance. Examples of the database administrator’s routine maintenance activities are:

a) Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss of data in case of disasters such as flooding.

b) Ensuring that enough free disk space is available for normal operations, and upgrading disk space as required.

c) Monitoring jobs running on the database and ensuring that performance is not degraded by very expensive tasks submitted by some users.

Summary:

DBA Responsibilities

Installation, configuration and upgrading of Microsoft SQL Server/MySQL/Oracle server software and related products.
Evaluate MSSQL/MySQL/Oracle features and MSSQL/MySQL/Oracle related products.
Establish and maintain sound backup and recovery policies and procedures.
Take care of the Database design and implementation.
Implement and maintain database security (create and maintain users and roles, assign privileges).
Database tuning and performance monitoring.
Application tuning and performance monitoring.
Setup and maintain documentation and standards.
Plan growth and changes (capacity planning).
Work as part of a team and provide 24×7 support when required.
Do general technical troubleshooting and give consultation to development teams.
Interface with MSSQL/MySQL/Oracle for technical support.
ITIL skill set requirement (Problem Management/Incident Management/Change Management etc.)

Types of DBA

Administrative DBA – Works on maintaining the server and keeping it running. Concerned with backups, security, patches, replication, etc.: the things that concern the actual server software.
Development DBA – Works on building queries, stored procedures, etc. that meet business needs. This is the equivalent of the programmer. On SQL Server, the work is primarily writing T-SQL.
Architect – Designs schemas. Builds tables, FKs, PKs, etc. Works to build a structure that meets the business needs in general. The design is then used by developers and development DBAs to implement the actual application.
Data Warehouse DBA – A newer role, responsible for merging data from multiple sources into a data warehouse. May have to design the warehouse, and cleans, standardizes, and scrubs data before loading. In SQL Server, this DBA would use DTS (or its successor, SSIS) heavily.
OLAP DBA – Builds multi-dimensional cubes for decision support or OLAP systems. In SQL Server, the primary language here is MDX, not SQL.

Application DBA – Application DBAs straddle the fence between the DBMS and the application software and are responsible for ensuring that the application is fully optimized for the database and vice versa. They usually manage all the application components that interact with the database and carry out activities such as application installation and patching, application upgrades, database cloning, building and running data cleanup routines, data load process management, etc.

Wednesday, July 13, 2016

Architecture of Database

Database architecture can be 2-tier or 3-tier, depending on how users connect to the database to get their requests served. They either connect to the database directly, or their requests are received by an intermediary layer, which processes each request and then sends it to the database.

2-tier Architecture

In 2-tier architecture, the application program interacts directly with the database. No separate user interface or user is involved in the database interaction. Imagine a front-end application for a school that needs to display reports of all the students who have opted for different subjects. In this case, the application interacts directly with the database and retrieves all the required data. No input from the user is required. This is a 2-tier architecture.

Let us consider another example of two-tier architecture: a railway ticket reservation system. How does it work? Imagine a person reserving a ticket from Delhi to Goa on a particular day. At the same time, another person elsewhere in Delhi is also reserving a ticket to Goa on the same day for the same train. Now there is a requirement for two tickets, for different persons. What does the reservation system do? It takes the requests from both of them and queues them. Each request is entered at the application layer and sent to the database layer. Once the request is processed in the database, the result is sent back to the application layer for the user.

Advantages of 2-tier architecture:

Easy to understand, as the application communicates directly with the database.
Requested data can be retrieved very quickly when the number of users is small.
Easy to modify – any change required can be sent directly to the database.
Easy to maintain – multiple requests are handled in a queue, so there is no chaos.

Disadvantages of 2-tier architecture:

It is time consuming when there is a huge number of users. All the requests are queued and handled one after another, so the system cannot respond to multiple users at the same time.
This architecture is not very cost-effective.

3-tier Architecture

3-tier architecture is the most widely used database architecture. It can be viewed as below.

Presentation layer / User layer is the layer where the user uses the database. The user has no knowledge of the underlying database and simply interacts with it as though all the data were directly at hand. You can imagine this layer as a registration form where you enter your details. Did you ever wonder where the data goes after you press the ‘submit’ button? Probably not. You just know that your details are saved. This is the presentation layer, where all the details are taken from the user and sent to the next layer for processing.
Application layer is the underlying program which is responsible for saving the details that you have entered and retrieving your details to show on the page. This layer holds all the business logic, such as validation, calculations and manipulation of data, and then sends requests to the database to get the actual data. If this layer sees that a request is invalid, it sends a message back to the presentation layer and does not hit the database layer at all.
Data layer or Database layer is the layer where the actual database resides. All the tables, their mappings and the actual data are present in this layer. When you save your details from the front end, they are inserted into the respective tables in the database layer by the programs in the application layer. When you want to view your details in the web browser, the application layer sends a request to the database layer. The database layer runs the queries and gets the data. The data is then transferred to the browser (presentation layer) by the programs in the application layer.
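
The division of work among the three layers can be sketched in a few lines. This is a toy illustration in Python using the built-in sqlite3 module; the function names (submit, register, save_student, count_students) and the student table are hypothetical, and in a real system each layer would usually run as a separate program or on a separate machine.

```python
import sqlite3

# Data layer: the only code that touches the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")


def save_student(name, age):
    conn.execute("INSERT INTO student VALUES (?, ?)", (name, age))
    conn.commit()


def count_students():
    return conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]


# Application layer: business rules. Invalid requests never reach the data layer.
def register(form):
    name = form.get("name", "").strip()
    age = form.get("age", "")
    if not name or not str(age).isdigit():
        return "invalid request"
    save_student(name, int(age))
    return "saved"


# Presentation layer: stands in for the registration form and its submit button.
def submit(form):
    return "Result: " + register(form)


print(submit({"name": "Meera", "age": "15"}))  # Result: saved
print(submit({"name": "", "age": "15"}))       # Result: invalid request
print(count_students())                         # 1
```

The second form is rejected by the application layer, so only one row ever reaches the database.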

Advantages of 3-tier architecture:

Easy to maintain and modify. A requested change does not affect other data in the database. The application layer does all the validations.
Improved security. Since there is no direct access to the database, data security is increased and there is less risk of mishandling the data. The application layer can filter out malicious actions.
Good performance. Since the application layer can cache data once it has been retrieved, there is no need to hit the database for each request. This reduces the time consumed by multiple requests and helps the system respond to many users at the same time.

Disadvantages of 3-tier architecture: it is a little more complex, and a little more effort is required to reach the database.

Characteristics of a Good DBMS

Should be able to store all kinds of data that exist in the real world. Since we need to work with all kinds of data and requirements, the database should be capable of storing every kind of data around us.
Should be able to relate the entities / tables in the database by means of a relation, i.e. it should be possible to relate any two tables. Let us say an employee works for a department. This implies that Employee is related to a particular department. We should be able to define such a relationship between any two entities in the database. No table should be left without any mapping.
Data and application should be isolated. The database is the system that provides the platform to store the data, and the data is what the database works on. Hence there should be a clear separation between them.
There should not be any duplication of data in the database. Data should be stored in such a way that it is not repeated in multiple tables. Repeated data wastes database space and is difficult to maintain.
Should have a strong query language. Once the database is designed, the query language helps the user retrieve and manipulate the data. A user who wants specific data can apply as many filtering conditions as required and pull only the data needed.
Multiple users should be able to access the same database without affecting each other, i.e. if several teachers want to update a student’s marks in the Results table at the same time, each should be allowed to update the marks for their own subject without modifying the marks of other subjects. A good database should support this feature.
Should support multiple views, depending on the user’s role. In a school database, students will be able to see only their own reports, with read-only access. At the same time, teachers will have access to all the students, with modification rights. But the database is the same. Hence a single database provides different views to different users.
Should provide security, i.e. when multiple users are accessing the database, each user has their own level of rights. Some are allowed to see the whole database, and some have only partial rights. For example, an instructor who teaches Physics will be able to see and update the marks of that subject, but not those of other subjects. The HOD will have full access to all the subjects.
Should support the ACID properties, i.e. while performing transactions such as insert, update and delete, the database makes sure that each transaction either completes fully or has no effect, and that the data remains consistent. For example, if a student’s address is updated, no duplicate record is created and no mismatch is left in that student’s data.
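
The "all or nothing" part of ACID (atomicity) can be seen in a small sketch. It uses Python's sqlite3 module as a stand-in DBMS; the account table, the transfer function and the rule that a balance may not go negative are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()


def transfer(source, target, amount):
    """Move amount between accounts; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT account_number, balance FROM account"))


print(transfer("A", "B", 50), balances())   # True {'A': 50, 'B': 70}
print(transfer("A", "B", 500), balances())  # False {'A': 50, 'B': 70}
```

In the second call the credit to B succeeds first, but the debit from A violates the CHECK constraint, so the whole transaction is rolled back and B's balance returns to 70.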

Now that we know what a database is, who are its users? The developers, of course, use the database to design and develop applications. Who else? There is an administrator, who monitors the database and its usage, tracks who is accessing it, grants access to other users, restricts users’ privileges, and carries out any other maintenance work. Finally, there are the end users. These are the people who actually use the database and benefit from it. In a school database, teachers and students are the end users, who use the database for their daily needs.

Where are databases used? Everywhere! Nowadays databases are used in almost every place. We can see them in supermarkets, stock exchanges, colleges, libraries, ATMs, offices, banks, hospitals etc.

Summary:

Characteristics

Traditionally, data was organized in file formats. DBMS was a new concept then, and all the research was done to make it overcome the deficiencies in traditional style of data management. A modern DBMS has the following characteristics −

Real-world entity − A modern DBMS is more realistic and uses real-world entities to design its architecture. It uses the behavior and attributes too. For example, a school database may use students as an entity and their age as an attribute.
Relation-based tables − DBMS allows entities and relations among them to form tables. A user can understand the architecture of a database just by looking at the table names.
Isolation of data and application − A database system is entirely different from its data. A database is an active entity, whereas data is said to be passive, on which the database works and organizes. DBMS also stores metadata, which is data about data, to ease its own process.
Less redundancy − DBMS follows the rules of normalization, which splits a relation when any of its attributes is having redundancy in values. Normalization is a mathematically rich and scientific process that reduces data redundancy.
Consistency − Consistency is a state where every relation in a database remains consistent. There exist methods and techniques that can detect an attempt to leave the database in an inconsistent state. A DBMS can provide greater consistency than earlier forms of data storage such as file-processing systems.
Query Language − DBMS is equipped with a query language, which makes it more efficient to retrieve and manipulate data. A user can apply as many different filtering options as required to retrieve a set of data. This was not possible with traditional file-processing systems.
ACID Properties − DBMS follows the concepts of Atomicity, Consistency, Isolation, and Durability (normally shortened as ACID). These concepts are applied on transactions, which manipulate data in a database. ACID properties help the database stay healthy in multi-transactional environments and in case of failure.
Multiuser and Concurrent Access − DBMS supports a multi-user environment and allows users to access and manipulate data in parallel. There are restrictions on transactions when users attempt to handle the same data item, but users are generally unaware of them.
Multiple views − DBMS offers multiple views for different users. A user who is in the Sales department will have a different view of the database than a person working in the Production department. This feature enables users to have a concentrated view of the database according to their requirements.
Security − Features like multiple views offer security to some extent, since users are unable to access the data of other users and departments. DBMS offers methods to impose constraints while entering data into the database and retrieving the same at a later stage. DBMS offers many different levels of security features, which enable multiple users to have different views with different features. For example, a user in the Sales department cannot see the data that belongs to the Purchase department. Additionally, it can also be managed how much data of the Sales department should be displayed to the user. Because a DBMS does not keep its data as ordinary files that any program can open and read, as a traditional file system does, it is harder for miscreants to get at the data directly.

Various Objectives of Database Management System

Mass Storage

A DBMS can store a large amount of data, which makes it an ideal technology for big firms. It can store thousands of records or more, and the data can be fetched whenever it is needed.

Removes Duplicity

With a lot of data, duplicate records are likely to creep in at some point. A DBMS helps prevent this: when keys and uniqueness constraints are defined, it checks each new record and rejects one that repeats data already stored.

Multiple Users Access

No one handles the whole database alone. Many users can access the database, so two or more of them may be working on it at the same moment. The DBMS makes sure that they can work concurrently without interfering with each other’s changes.

Data Protection

Information such as bank details, employees’ salary details and sale and purchase details should always be kept secure. Companies also need their data protected from unauthorized use. A DBMS provides strong access control over the data: no one can alter or modify the information without the privilege to do so.

Data Back up and recovery

Database failures do occur, and it is not acceptable to simply say that all the data has been lost. There should be a backup of the database so that it can be recovered after a failure. A DBMS has the ability to back up and recover the data in the database.

Everyone can work on DBMS

There is no need to be a master of a programming language to work with a DBMS. An accountant with little technical knowledge can work with it. The definitions and descriptions are provided within it, so that even a person with a non-technical background can work with it.

Integrity

Integrity means your data is authentic and consistent. A DBMS has various validity checks that help keep your data accurate and consistent.

Platform Independent

Most DBMS products can run on several platforms, so a database management system does not usually tie you to one particular platform.


Database Languages

A database system provides a data definition language to specify the database schema and a data manipulation language to express database queries and updates. In practice, the data definition and data manipulation languages are not two separate languages; instead they simply form parts of a single database language, such as the widely used SQL language.

a) Data-Definition Language

We specify a database schema by a set of definitions expressed by a special language called a data-definition language (DDL). For instance, the following statement in the SQL language defines the account table:

create table account
    (account-number char(10),
     balance integer)

Execution of the above DDL statement creates the account table. In addition, it updates a special set of tables called the data dictionary or data directory. A data dictionary contains metadata, that is, data about data. The schema of a table is an example of metadata. A database system consults the data dictionary before reading or modifying actual data.
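
The link between a DDL statement and the data dictionary can be observed directly. The sketch below uses Python's sqlite3 module, where the catalog table is called sqlite_master; other systems name and organize their dictionaries differently. The column is spelled account_number here because a hyphen is not allowed in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# sqlite_master is SQLite's catalog table: one row per schema object.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column; fields 1 and 2 are the
# column name and its declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

print(tables)   # ['account']
print(columns)  # [('account_number', 'CHAR(10)'), ('balance', 'INTEGER')]
```

No row of account data has been inserted yet; everything printed comes from the metadata recorded when the DDL statement ran.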

We specify the storage structure and access methods used by the database system by a set of statements in a special type of DDL called a data storage and definition language. These statements define the implementation details of the database schemas, which are usually hidden from the users.

The data values stored in the database must satisfy certain consistency constraints. For example, suppose the balance on an account should not fall below $100. The DDL provides facilities to specify such constraints. The database systems check these constraints every time the database is updated.
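
The $100 rule above could be written as a CHECK constraint. A minimal sketch using Python's sqlite3 module follows; the account number A-101 and its starting balance are illustrative values, and the column is spelled account_number because hyphens are not valid in unquoted SQL identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number CHAR(10),"
    " balance INTEGER CHECK (balance >= 100))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")

# Withdrawing 450 would leave a balance of 50, which breaks the constraint.
try:
    conn.execute(
        "UPDATE account SET balance = balance - 450 WHERE account_number = 'A-101'"
    )
    update_rejected = False
except sqlite3.IntegrityError:
    update_rejected = True

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(update_rejected, balance)  # True 500
```

The constraint is declared once in the DDL, and the database then checks it on every update without any help from the application.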

b) Data-Manipulation Language

Data manipulation is

•The retrieval of information stored in the database

•The insertion of new information into the database

•The deletion of information from the database

•The modification of information stored in the database

A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. There are basically two types:

•Procedural DMLs require a user to specify what data are needed and how to get those data.

•Declarative DMLs (also referred to as non procedural DMLs) require a user to specify what data are needed without specifying how to get those data.

Declarative DMLs are usually easier to learn and use than are procedural DMLs. However, since a user does not have to specify how to get the data, the database system has to figure out an efficient means of accessing data. The DML component of the SQL language is non procedural.
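
The difference between the two styles can be illustrated as follows. This is only an analogy written in Python with the sqlite3 module: the loop plays the part of a procedural request that says how to find the rows, while the WHERE clause is the declarative form. The table contents and the threshold of 450 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-102", 400)],
)

# Procedural style: the caller spells out how to find the rows
# (visit every row, test it, collect the matches, sort them).
procedural = []
for number, balance in conn.execute("SELECT account_number, balance FROM account"):
    if balance > 450:
        procedural.append(number)
procedural.sort()

# Declarative style: the caller states what is wanted;
# the database system decides how to get it.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT account_number FROM account"
        " WHERE balance > 450 ORDER BY account_number"
    )
]

print(procedural == declarative, declarative)  # True ['A-101', 'A-215']
```

Both give the same answer, but only in the declarative form is the system free to choose a better access path, for example by using an index on balance.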

A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language. Although technically incorrect, it is common practice to use the terms query language and data manipulation language synonymously.

This query in the SQL language finds the name of the customer whose customer-id is 192-83-7465:

select customer.customer-name
from customer
where customer.customer-id = '192-83-7465'

The query specifies that those rows from the table customer where the customer-id is 192-83-7465 must be retrieved, and the customer-name attribute of these rows must be displayed. If the query were run on the table in Figure 1.3, the name Johnson would be displayed.

Queries may involve information from more than one table. For instance, the following query finds the balance of all accounts owned by the customer with customer id 192-83-7465.

select account.balance
from depositor, account
where depositor.customer-id = '192-83-7465' and
      depositor.account-number = account.account-number

If the above query were run on the tables in Figure 1.3, the system would find that the two accounts numbered A-101 and A-201 are owned by customer 192-83-7465 and would print out the balances of the two accounts, namely 500 and 900. There are a number of database query languages in use, either commercially or experimentally.
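
Both queries can be run against a small sample database. Figure 1.3 is not reproduced in these notes, so the sketch below builds its own tables with Python's sqlite3 module: the Johnson rows follow the values quoted in the text, while the Smith customer and account A-215 are made-up filler. Column names use underscores because hyphens are not valid in unquoted SQL identifiers, and the customer id is passed as a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT, customer_name TEXT);
    CREATE TABLE account (account_number TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
    INSERT INTO customer VALUES ('192-83-7465', 'Johnson');
    INSERT INTO customer VALUES ('019-28-3746', 'Smith');
    INSERT INTO account VALUES ('A-101', 500);
    INSERT INTO account VALUES ('A-201', 900);
    INSERT INTO account VALUES ('A-215', 700);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101');
    INSERT INTO depositor VALUES ('192-83-7465', 'A-201');
    INSERT INTO depositor VALUES ('019-28-3746', 'A-215');
""")

customer_id = "192-83-7465"

# Single-table query: the name of the customer with the given id.
names = [
    row[0]
    for row in conn.execute(
        "SELECT customer.customer_name FROM customer"
        " WHERE customer.customer_id = ?",
        (customer_id,),
    )
]

# Two-table query: balances of all accounts owned by that customer.
balances = [
    row[0]
    for row in conn.execute(
        "SELECT account.balance FROM depositor, account"
        " WHERE depositor.customer_id = ?"
        " AND depositor.account_number = account.account_number"
        " ORDER BY account.account_number",
        (customer_id,),
    )
]

print(names)     # ['Johnson']
print(balances)  # [500, 900]
```

The second query returns one balance per matching depositor row, which is why two values come back for this customer.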

The levels of abstraction apply not only to defining or structuring data, but also to manipulating data. At the physical level, we must define algorithms that allow efficient access to data. At higher levels of abstraction, we emphasize ease of use. The goal is to allow humans to interact efficiently with the system.

c) Database Access from Application Programs

Application programs are programs that are used to interact with the database. Application programs are usually written in a host language, such as Cobol, C, C++, or Java. Examples in a banking system are programs that generate payroll checks, debit accounts, credit accounts, or transfer funds between accounts.

To access the database, DML statements need to be executed from the host language. There are two ways to do this:

•By providing an application program interface (set of procedures) that can be used to send DML and DDL statements to the database, and retrieve the results. The Open Database Connectivity (ODBC) standard defined by Microsoft for use with the C language is a commonly used application program interface standard. The Java Database Connectivity (JDBC) standard provides corresponding features to the Java language.

•By extending the host language syntax to embed DML calls within the host language program. Usually, a special character prefaces DML calls, and a preprocessor, called the DML pre-compiler, converts the DML statements to normal procedure calls in the host language.

Tuesday, July 12, 2016

Data Abstraction and Data Independence

Data Abstraction:

For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use complex data structures to represent data in the database. Since many database-systems users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users’ interactions with the system:

•Physical level. The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail.

•Logical level. The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. Database administrators, who must decide what information to keep in the database, use the logical level of abstraction.

•View level. The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.

Figure below shows the relationship among the three levels of abstraction.

An analogy to the concept of data types in programming languages may clarify the distinction among levels of abstraction. Most high-level programming languages support the notion of a record type. For example, in a Pascal-like language, we may declare a record as follows:

type customer = record
    customer-id: string;
    customer-name: string;
    customer-street: string;
    customer-city: string;
end;

This code defines a new record type called customer with four fields. Each field has a name and a type associated with it. A banking enterprise may have several such record types, including

•account, with fields account-number and balance

•employee, with fields employee-name and salary

At the physical level, a customer, account, or employee record can be described as a block of consecutive storage locations (for example, words or bytes). The language compiler hides this level of detail from programmers. Similarly, the database system hides many of the lowest-level storage details from database programmers. Database administrators, on the other hand, may be aware of certain details of the physical organization of the data.

At the logical level, each such record is described by a type definition, as in the previous code segment, and the interrelationship of these record types is defined as well. Programmers using a programming language work at this level of abstraction. Similarly, database administrators usually work at this level of abstraction. Finally, at the view level, computer users see a set of application programs that hide details of the data types. Similarly, at the view level, several views of the database are defined, and database users see these views. In addition to hiding details of the logical level of the database, the views also provide a security mechanism to prevent users from accessing certain parts of the database. For example, tellers in a bank see only that part of the database that has information on customer accounts; they cannot access information about salaries of employees.
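The teller example can be sketched with an SQL view, here run through Python's sqlite3 module. The table names, column names, view name (teller_view), and sample rows are all hypothetical. SQLite itself has no per-user permissions, so the sketch only shows how a view narrows what is visible; in a multi-user DBMS, tellers would additionally be granted access to the view and not to the underlying tables.

```python
import sqlite3


def build_bank():
    # Logical level: two tables, one of which holds salary data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (employee_name TEXT, salary REAL)")
    conn.execute(
        "CREATE TABLE account "
        "(account_number TEXT, customer_name TEXT, balance REAL)"
    )
    conn.execute("INSERT INTO employee VALUES ('Smith', 4000.0)")
    conn.execute("INSERT INTO account VALUES ('A-101', 'Hayes', 500.0)")
    # View level: tellers see account information only.
    conn.execute(
        "CREATE VIEW teller_view AS "
        "SELECT account_number, customer_name, balance FROM account"
    )
    return conn


def teller_columns(conn):
    # cursor.description lists the columns the view exposes.
    cur = conn.execute("SELECT * FROM teller_view")
    return [col[0] for col in cur.description]
```

A program written against teller_view never sees the employee table, so salary data stays out of reach of that program.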

Summary:

Database systems are made up of complex data structures. To ease user interaction with the database, developers hide irrelevant internal details from users. This process of hiding irrelevant details from the user is called data abstraction.

We have three levels of abstraction:

Physical level: This is the lowest level of data abstraction. It describes how data is actually stored in the database. The details of the complex data structures are found at this level.

Logical level: This is the middle level of the three-level data abstraction architecture. It describes what data is stored in the database and what relationships exist among those data.

View level: This is the highest level of data abstraction. It describes the user's interaction with the database system.

Example: Let’s say we are storing customer information in a customer table. At the physical level, these records can be described as blocks of storage locations (for example, words or bytes). These details are often hidden from programmers.

At the logical level, these records can be described as fields and attributes along with their data types, and the relationships among them can be implemented logically. Programmers generally work at this level because they are aware of such details of the database system.

At the view level, users just interact with the system through a GUI and enter details on the screen. They are not aware of how the data is stored or what data is stored; such details are hidden from them.

Data Independence

A database system normally contains a lot of data in addition to users’ data. For example, it stores data about data, known as metadata, to locate and retrieve data easily. It is rather difficult to modify or update a set of metadata once it is stored in the database. But as a DBMS expands, it needs to change over time to satisfy the requirements of the users. If all of the data were interdependent, making such changes would be a tedious and highly complex job.

Metadata itself follows a layered architecture, so that when we change data at one layer, it does not affect the data at another layer. The layers are independent of one another but are mapped to each other.

Logical Data Independence

Logical data is data about the database; that is, it stores information about how data is managed inside it. An example is a table (relation) stored in the database together with all the constraints applied to that relation.

Logical data independence is a mechanism that decouples this logical description from the actual data stored on the disk. If we make changes to the table format, the data residing on the disk should not change.

Physical Data Independence

All the schemas are logical, and the actual data is stored in bit format on the disk. Physical data independence is the ability to change the physical data without impacting the schema or logical data.

For example, if we want to change or upgrade the storage system itself, say by replacing hard disks with SSDs, it should not have any impact on the logical data or schemas.

Summary:

The ability to modify a schema definition in one level without affecting a schema definition in a higher level is called data independence.
There are two kinds:
- Physical data independence
  - The ability to modify the physical schema without causing application programs to be rewritten
  - Modifications at this level are usually to improve performance
- Logical data independence
  - The ability to modify the conceptual schema without causing application programs to be rewritten
  - Usually done when the logical structure of the database is altered
Logical data independence is harder to achieve, as application programs are usually heavily dependent on the logical structure of the data. The analogy is to abstract data types in programming languages, where an interface hides implementation details from the code that uses it.
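Physical data independence can be illustrated with a small sketch using Python's sqlite3 module: an index is added (a physical-level change, typically made for performance), while the application's query text and its result stay the same. The customer table, the index name idx_customer_city, and the sample rows are assumptions made for the example.

```python
import sqlite3

# The application's query; it is never edited in this sketch.
QUERY = (
    "SELECT customer_name FROM customer "
    "WHERE customer_city = ? ORDER BY customer_name"
)


def customers_in(conn, city):
    return [row[0] for row in conn.execute(QUERY, (city,))]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer "
        "(customer_id TEXT, customer_name TEXT, customer_city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [
            ("C-1", "Hayes", "Harrison"),
            ("C-2", "Jones", "Harrison"),
            ("C-3", "Smith", "Rye"),
        ],
    )
    before = customers_in(conn, "Harrison")
    # Physical-level change: a new access structure, same logical schema.
    conn.execute("CREATE INDEX idx_customer_city ON customer (customer_city)")
    after = customers_in(conn, "Harrison")
    conn.close()
    return before, after
```

Both calls return the same list of names; only the way the system may locate the rows has changed.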

Database Systems versus File Systems

Consider part of a savings-bank enterprise that keeps information about all customers and savings accounts. One way to keep the information on a computer is to store it in operating system files. To allow users to manipulate the information, the system has a number of application programs that manipulate the files, including:

•A program to debit or credit an account

•A program to add a new account

•A program to find the balance of an account

•A program to generate monthly statements

System programmers wrote these application programs to meet the needs of the bank. New application programs are added to the system as the need arises. For example, suppose that the savings bank decides to offer checking accounts. As a result, the bank creates new permanent files that contain information about all the checking accounts maintained in the bank, and it may have to write new application programs to deal with situations that do not arise in savings accounts, such as overdrafts. Thus, as time goes by, the system acquires more files and more application programs.

This typical file-processing system is supported by a conventional operating system. The system stores permanent records in various files, and it needs different application programs to extract records from, and add records to, the appropriate files. Before database management systems (DBMSs) came along, organizations usually stored information in such systems. Keeping organizational information in a file-processing system has a number of major disadvantages:

•Data redundancy and inconsistency. Since different programmers create the files and application programs over a long period, the various files are likely to have different formats and the programs may be written in several programming languages. Moreover, the same information may be duplicated in several places (files). For example, the address and telephone number of a particular customer may appear in a file that consists of savings-account records and in a file that consists of checking-account records. This redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the same data may no longer agree. For example, a changed customer address may be reflected in savings-account records but not elsewhere in the system.

•Difficulty in accessing data. Suppose that one of the bank officers needs to find out the names of all customers who live within a particular postal-code area. The officer asks the data-processing department to generate such a list. Because the designers of the original system did not anticipate this request, there is no application program on hand to meet it. There is, however, an application program to generate the list of all customers. The bank officer now has two choices: either obtain the list of all customers and extract the needed information manually, or ask a system programmer to write the necessary application program. Both alternatives are obviously unsatisfactory. Suppose that such a program is written, and that, several days later, the same officer needs to trim that list to include only those customers who have an account balance of $10,000 or more. As expected, a program to generate such a list does not exist. Again, the officer has the preceding two options, neither of which is satisfactory. The point here is that conventional file-processing environments do not allow needed data to be retrieved in a convenient and efficient manner. More responsive data-retrieval systems are required for general use.

•Data isolation. Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.

•Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints. For example, the balance of a bank account may never fall below a prescribed amount (say, $25). Developers enforce these constraints in the system by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them. The problem is compounded when constraints involve several data items from different files.

•Atomicity problems. A computer system, like any other mechanical or electrical device, is subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure. Consider a program to transfer $50 from account A to account B. If a system failure occurs during the execution of the program, it is possible that the $50 was removed from account A but was not credited to account B, resulting in an inconsistent database state. Clearly, it is essential to database consistency that either both the credit and debit occur, or that neither occur. That is, the funds transfer must be atomic—it must happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.

•Concurrent-access anomalies. For the sake of overall performance of the system and faster response, many systems allow multiple users to update the data simultaneously. In such an environment, interaction of concurrent updates may result in inconsistent data. Consider bank account A, containing $500. If two customers withdraw funds (say $50 and $100 respectively) from account A at about the same time, the result of the concurrent executions may leave the account in an incorrect (or inconsistent) state. Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by the amount being withdrawn, and write the result back. If the two programs run concurrently, they may both read the value $500, and write back $450 and $400, respectively. Depending on which one writes the value last, the account may contain either $450 or $400, rather than the correct value of $350. To guard against this possibility, the system must maintain some form of supervision. But supervision is difficult to provide because data may be accessed by many different application programs that have not been coordinated previously.

•Security problems. Not every user of the database system should be able to access all the data. For example, in a banking system, payroll personnel need to see only that part of the database that has information about the various bank employees. They do not need access to information about customer accounts. But, since application programs are added to the system in an ad hoc manner, enforcing such security constraints is difficult.
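For contrast with the atomicity problem in the list above, here is a minimal sketch of how a database transaction keeps the $50 transfer all-or-nothing. It uses Python's sqlite3 module; the account table, the starting balances, and the fail_midway flag (which simulates a failure between the debit and the credit) are assumptions made purely for illustration.

```python
import sqlite3


def make_accounts():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        # Debit: sqlite3 opens a transaction implicitly before this UPDATE.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        if fail_midway:
            # Hypothetical failure between the debit and the credit.
            raise RuntimeError("simulated crash")
        # Credit.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.commit()
        return True
    except RuntimeError:
        # Undo the debit so that neither half of the transfer is visible.
        conn.rollback()
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

When the failure is simulated, the rollback restores account A, so the balances are the same as before the transfer started. A file-processing program would have to build this undo logic itself.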

SUMMARY:

Introduction

In daily life, we come across various needs to store data: daily household bills, bank account details, salary details, payment details, student information, student reports, books in a library and so on. How can it all be recorded in one place, so that we can get it back when required? It should be recorded in such a way that we:

Should be able to get the data at any later point in time
Should be able to add details to it whenever required
Should be able to modify stored information, as needed
Should also be able to delete it

In the traditional approach, before computers, all information was stored on paper. When we needed information, we searched through the papers. If we knew the particular date or category of the information we were looking for, we went to that particular section of the papers. When we wanted to update or delete some data, we searched for it and then modified it or struck it off. If the data is limited, all these tasks are easy. But imagine library information, information about the students in a school, or a banking system! How do we search for a single required item in the papers? It is a never-ending task. Computers solved this problem.

File Processing System

When computers came, all these jobs became easy. But in the initial days, these records were stored in the form of files. The way data was stored in files is similar to papers: in flat files or, to put it simply, in Notepad. The information was all in text files, with each field of information separated by a space, tab, comma, semicolon or some other symbol.

All the files were grouped by category; each file held only related information and was named properly. The sample file above, for instance, holds Student information. Student files for each class were bundled inside different folders so that they could be identified quickly.

Now, if we want to see the details of a specific student from a file, what do we do? We know which file will have the data, so we open that file and search for the details. Here we can see the files, open them and search them ourselves. But imagine we want to display student details in a UI. How will we open a file, then read or update it? There are different programming languages, like C, C++ and COBOL, which help with this task. Using these languages, we can search for files, open them, search for the data inside them, go to a specific line in a file, and add, update or delete specific information.
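A program of the kind described above can be sketched as follows, using Python in place of C or COBOL. The file layout (student id, name, address, separated by commas) and the sample rows are assumptions made for illustration.

```python
import csv


def write_sample(path):
    # Create a small comma-separated Student file (hypothetical contents).
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["100", "Joseph", "12 Hill Road"])
        writer.writerow(["101", "Allen", "4 Lake Street"])
        writer.writerow(["102", "Chris", "9 Park Lane"])


def find_student(path, student_id):
    # Scan the file line by line until the id matches.
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if row and row[0] == student_id:
                return row
    return None
```

Note that find_student has to know the separator and the position of each field; that knowledge lives in the program, not in the file.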

Disadvantages of file processing

A file processing system is good when there are only a limited number of files and they hold very little data. As the data and files in the system grow, handling them becomes difficult.

Data Mapping and Access: - Although all the related information is grouped and stored in different files, there is no mapping between any two files; that is, two dependent files are not linked. Even though the Student and Student_Report files are related, they are two different files and are not linked by any means. Hence, if we need to display a student's details along with his report, we cannot pick the data directly from those two files. We have to write a lengthy program that first searches the Student file and gets all the details, then goes to the Student_Report file and searches for his report.
When there is a very large amount of data, searching the file system for a particular piece of information is always time-consuming and inefficient.
Data Redundancy: - A file system has no way to prevent the insertion of duplicate data. Any user can enter any data. The file system validates neither the kind of data being entered nor whether the same data already exists in the same file. Duplicate data in the system is undesirable, as it wastes space and leads to confusion and mishandling of data. When a file contains duplicate data and we need to update or delete a record, we might end up updating or deleting only one of the copies, leaving the other in the file. Again, the file system does not validate this process. Hence the purpose of storing the data is lost.
Though the file name says Student file, staff information or report information may still be entered into it. The file system allows any information to be entered into any file. It does not restrict the data being entered to the group it belongs to.
Data Dependence: - In the files, data is stored in a specific format, say separated by tabs, commas or semicolons. If the format of any file is changed, then the program that processes this file needs to be changed. But many programs may depend on this file. We need to know in advance all the programs that use the file and change every one of them. Missing the change in any one place will make the whole application fail. Similarly, changes in the storage structure, or in the way the data is accessed, affect every place where the file is used, and all those programs have to be changed. That is, the smallest change in a file affects all the programs that use it and requires changes in all of them.
Data inconsistency: - Imagine that the Student and Student_Report files both contain the student’s address, and there is a change request for one particular student’s address. The program searches only the Student file for the address and updates it correctly. Another program prints the student’s report and mails it to the address given in the Student_Report file. What happens to the report of the student whose address was changed? There is a mismatch with the actual address, and the report is sent to the old address. This mismatch between different copies of the same data is called data inconsistency. It has occurred here because there is no proper record of which files hold copies of the same data.
Data Isolation: - Imagine we have to generate a single report on a student in a particular class, covering his study report, his library book details and his hostel information. All this information is stored in different files. How do we get all these details into one report? We have to write a program. But before writing the program, the programmer must find out which files hold the information needed, what the format of each file is, how to search for data in each file, and so on. Only once all this analysis is done can he write the program. If only two or three files are involved, the programming is fairly simple. But imagine that many files are involved: it would require a lot of effort from the programmer. Since the data is isolated in different files, programming becomes difficult.
Security: - Each file can be password protected. But what if we have to give access to only a few records in the file? For example, users have to be given access to view only their own bank account information in the file. This is very difficult in a file system.
Integrity: - If we need to check certain criteria while entering data into a file, it is not possible directly. We can do it only by writing programs. Say, if we have to restrict students above age 18, this can be done by means of a program alone. There is no direct checking facility in the file system. Hence these kinds of integrity checks are not easy in a file system.
Atomicity: - If an insert, update or delete fails in the file system, there is no mechanism to switch back to the previous state. Imagine that marks for one particular subject need to be entered into the Report file and then the total needs to be calculated. Suppose that after the new marks are entered, the file is closed without saving. The required transaction is then only partly performed: the marks have been totaled, but the new marks have not been added to the file, so the total stored is wrong. Atomicity refers to completing a whole transaction or not performing it at all. Partial completion of any transaction leads to incorrect data in the system. The file system does not guarantee atomicity. It may be possible with complex programs, but introducing them for each transaction is costly.
Concurrent Access: - Accessing the same data in the same file by more than one user at the same time is called concurrent access. In the file system, concurrent access leads to incorrect data. For example, a student wants to borrow a book from the library. He searches for the book in the library file and sees that only one copy is available. At the same time, another student also wants to borrow the same book and also sees that one copy is available. The first student opts to borrow and gets the book. But the file has not yet been updated to zero copies, so the second student also opts to borrow, even though no books are left. This is the problem of concurrent access in the file system.
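The library scenario can be modelled with a small sketch. It does not use real threads; it replays the interleaving explicitly, so the outcome is deterministic. The function names and the dictionary standing in for the library file are hypothetical.

```python
def borrow_unsafely(library, title, students):
    # Step 1: every student reads the copy count before anyone writes.
    seen = {s: library[title] for s in students}
    granted = []
    # Step 2: each student acts on the stale value he read earlier.
    for s in students:
        if seen[s] > 0:
            library[title] = seen[s] - 1
            granted.append(s)
    return granted


def borrow_safely(library, title, students):
    # Each check-and-update runs as one uninterrupted step, in turn.
    granted = []
    for s in students:
        if library[title] > 0:
            library[title] -= 1
            granted.append(s)
    return granted
```

With one copy and two students, the unsafe version grants both requests because both saw the count before it was updated, while the safe version grants only the first. Making the check and the update a single step is the kind of supervision a DBMS provides and a plain file system does not.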