Tuesday, July 12, 2016

Database Systems versus File Systems

Consider part of a savings-bank enterprise that keeps information about all customers and savings accounts. One way to keep the information on a computer is to store it in operating system files. To allow users to manipulate the information, the system has a number of application programs that manipulate the files, including:

• A program to debit or credit an account
• A program to add a new account
• A program to find the balance of an account
• A program to generate monthly statements

System programmers wrote these application programs to meet the needs of the bank. New application programs are added to the system as the need arises. For example, suppose that the savings bank decides to offer checking accounts. As a result, the bank creates new permanent files that contain information about all the checking accounts maintained in the bank, and it may have to write new application programs to deal with situations that do not arise in savings accounts, such as overdrafts. Thus, as time goes by, the system acquires more files and more application programs.

This typical file-processing system is supported by a conventional operating system. The system stores permanent records in various files, and it needs different application programs to extract records from, and add records to, the appropriate files. Before database management systems (DBMSs) came along, organizations usually stored information in such systems. Keeping organizational information in a file-processing system has a number of major disadvantages:

• Data redundancy and inconsistency. Since different programmers create the files and application programs over a long period, the various files are likely to have different formats and the programs may be written in several programming languages. Moreover, the same information may be duplicated in several places (files). For example, the address and telephone number of a particular customer may appear in a file that consists of savings-account records and in a file that consists of checking-account records. This redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the same data may no longer agree. For example, a changed customer address may be reflected in savings-account records but not elsewhere in the system.

• Difficulty in accessing data. Suppose that one of the bank officers needs to find out the names of all customers who live within a particular postal-code area. The officer asks the data-processing department to generate such a list. Because the designers of the original system did not anticipate this request, there is no application program on hand to meet it. There is, however, an application program to generate the list of all customers. The bank officer now has two choices: either obtain the list of all customers and extract the needed information manually, or ask a system programmer to write the necessary application program. Both alternatives are obviously unsatisfactory. Suppose that such a program is written, and that, several days later, the same officer needs to trim that list to include only those customers who have an account balance of $10,000 or more. As expected, a program to generate such a list does not exist. Again, the officer has the preceding two options, neither of which is satisfactory. The point here is that conventional file-processing environments do not allow needed data to be retrieved in a convenient and efficient manner. More responsive data-retrieval systems are required for general use.
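
To make this concrete, below is a minimal sketch (in Python, for brevity) of the kind of one-off program the officer would have to request. The file name customers.txt and its comma-separated layout of name, address, and postal code are assumptions made purely for illustration:

    # Hypothetical one-off program: list customers in one postal-code area.
    # Assumes customers.txt holds one record per line: name,address,postal_code
    TARGET_POSTAL_CODE = "60201"  # illustrative value

    with open("customers.txt") as f:
        for line in f:
            name, address, postal_code = line.rstrip("\n").split(",")
            if postal_code == TARGET_POSTAL_CODE:
                print(name)

The later $10,000-balance request would mean writing and deploying another program just like this one; nothing here can be reused as an ad hoc query.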

• Data isolation. Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.

• Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints. For example, the balance of a bank account may never fall below a prescribed amount (say, $25). Developers enforce these constraints in the system by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them. The problem is compounded when constraints involve several data items from different files.
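
As an illustration, here is a minimal sketch in Python of how such a constraint ends up buried in application code; the function and constant names are hypothetical:

    MINIMUM_BALANCE = 25  # the prescribed floor from the example above

    def withdraw(balance, amount):
        # Every application program that changes balances must repeat this
        # check; adding a new constraint means finding and editing them all.
        if balance - amount < MINIMUM_BALANCE:
            raise ValueError("balance may not fall below the minimum")
        return balance - amount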

• Atomicity problems. A computer system, like any other mechanical or electrical device, is subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure. Consider a program to transfer $50 from account A to account B. If a system failure occurs during the execution of the program, it is possible that the $50 was removed from account A but was not credited to account B, resulting in an inconsistent database state. Clearly, it is essential to database consistency that either both the credit and debit occur, or that neither occur. That is, the funds transfer must be atomic—it must happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.
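
A minimal sketch of why this is hard, assuming purely for illustration that balances live in a single JSON file; the two separate writes open a window in which a failure leaves the data inconsistent:

    import json

    def transfer(path, source, target, amount):
        with open(path) as f:
            balances = json.load(f)

        balances[source] -= amount
        with open(path, "w") as f:
            json.dump(balances, f)   # the debit is now on disk

        # If the system fails here, the money has left the source account
        # but never reaches the target, and nothing rolls the file back.

        balances[target] += amount
        with open(path, "w") as f:
            json.dump(balances, f)   # the credit is on disk only now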

• Concurrent-access anomalies. For the sake of overall system performance and faster response, many systems allow multiple users to update the data simultaneously. In such an environment, interaction of concurrent updates may result in inconsistent data. Consider bank account A, containing $500. If two customers withdraw funds (say $50 and $100, respectively) from account A at about the same time, the result of the concurrent executions may leave the account in an incorrect (or inconsistent) state. Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by the amount being withdrawn, and write the result back. If the two programs run concurrently, they may both read the value $500, and write back $450 and $400, respectively. Depending on which one writes the value last, the account may contain either $450 or $400, rather than the correct value of $350. To guard against this possibility, the system must maintain some form of supervision. But supervision is difficult to provide because data may be accessed by many different application programs that have not been coordinated previously.
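
The lost update is easy to reproduce. Here is a minimal sketch in Python; the two threads stand in for two independently written programs, and the short sleep merely widens the window between the read and the write:

    import threading, time

    balance = 500  # account A, as in the example above

    def withdraw(amount):
        global balance
        old = balance            # both withdrawals may read the same $500
        time.sleep(0.01)         # processing time between read and write
        balance = old - amount   # last writer wins; the other update is lost

    t1 = threading.Thread(target=withdraw, args=(50,))
    t2 = threading.Thread(target=withdraw, args=(100,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(balance)               # typically prints 450 or 400, not the correct 350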

• Security problems. Not every user of the database system should be able to access all the data. For example, in a banking system, payroll personnel need to see only that part of the database that has information about the various bank employees. They do not need access to information about customer accounts. But, since application programs are added to the system in an ad hoc manner, enforcing such security constraints is difficult.


SUMMARY:

Introduction

In daily life, we come across various needs to store data: daily household bills, bank account details, salary details, payment details, student information, student reports, books in the library, and so on. How do we record all of this in one place so that we can get it back when required? It should be recorded in such a way that we:
  1. Can get the data back at any point in time later
  2. Can add details to it whenever required
  3. Can modify the stored information as needed
  4. Can also delete it
In the traditional approach, before computers, all information was stored on paper. When we needed information, we searched through the papers. If we knew the particular date or category of the information we were looking for, we went to that particular section of the papers. When we wanted to update or delete some data, we searched for it and then modified it or struck it out. If the data is limited, all these tasks are easy. But imagine library records, information about every student in a school, or a banking system! How do we search for a single piece of data among all those papers? It is a never-ending task. Computers solved this problem.

File Processing System

When computers came along, all these jobs became easier. But in the early days, records were stored in the form of files. The way we stored data in files was similar to paper: as flat files, or to put it simply, in a notepad. The information all sat in plain text files, with the fields of each record separated by a space, tab, comma, semicolon, or some other symbol.
[Figure: a sample Student file stored as a flat text file]
All the files were grouped by category; each file held only related information, and each file was named appropriately. As we can see, the sample file above holds student information. Student files for each class were bundled inside different folders so they could be identified quickly.
[Figure: student files for each class grouped into folders]
Now, if we want to see a specific student's details from a file, what do we do? We know which file has the data, so we open that file and search for the details. That is fine while we can see the files, open them, and search by hand. But imagine we want to display student details in a UI: how does a program open a file, and read or update it? Programming languages like C, C++, and COBOL help with this task. Using them, we can search for files, open them, search for the data inside them, go to a specific line in the file, and add, update, or delete specific information, as the sketch below illustrates.
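
Here is a minimal sketch in Python (rather than the C or COBOL of that era, for brevity) of what every such program has to redo from scratch. The file name Student.txt and its tab-separated layout of roll number, name, and class are assumptions made for illustration:

    def find_student(path, roll_no):
        # The program must know the physical layout: one record per line,
        # fields separated by tabs, in a fixed order.
        with open(path) as f:
            for line in f:
                fields = line.rstrip("\n").split("\t")
                if fields[0] == roll_no:
                    return {"roll_no": fields[0], "name": fields[1], "class": fields[2]}
        return None  # no such student in this file

    print(find_student("Student.txt", "1023"))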

Disadvantages of file processing

A file processing system works well when there are only a limited number of files and very little data. As the data and files in the system grow, handling them becomes difficult.
  1. Data Mapping and Access: Although all related information is grouped and stored in different files, there is no mapping between any two files; that is, two dependent files are not linked. Even though the Student file and the Student_Report file are related, they are two separate files that are not linked in any way. Hence, if we need to display a student's details along with his report, we cannot pick them directly from those two files. We have to write a lengthy program that searches the Student file first, gets all the details, and then goes to the Student_Report file and searches for his report (see the sketch after this list). When there is a huge amount of data, searching the file system for a particular piece of information is always time consuming, and it is an inefficient way to retrieve data.
  2. Data Redundancy: There is no mechanism to prevent the insertion of duplicate data in a file system. Any user can enter any data: the file system validates neither the kind of data being entered nor whether the same data already exists in the same file. Duplicate data is undesirable because it wastes space and leads to confusion and mishandling of data. When a file contains duplicate records and we need to update or delete one, we might end up updating or deleting only one of the records and leaving the other behind; again, the file system does not catch this, and the purpose of storing the data is defeated. Moreover, even though a file is named Student, nothing prevents staff information or report information from being entered into it. The file system allows any information to be entered into any file; it does not tie the data being entered to the group it belongs to.
  3. Data Dependence: Data in the files is stored in a specific format, say tab-, comma-, or semicolon-separated. If the format of a file changes, the programs that process the file must change as well. But many programs may depend on that file: we need to know in advance every program that uses it and change each one, and missing even a single place can break the whole application. Similarly, changes to the storage structure, or to how the data is accessed, affect every place the file is used. In short, the smallest change to a file affects all the programs that depend on it and forces changes in all of them.
  4. Data Inconsistency: Imagine the Student and Student_Report files both contain a student's address, and a change request arrives for one particular student's address. The update program searches only the Student file and correctly updates the address there. Another program prints the student's report and mails it to the address recorded in the Student_Report file. What happens to the report of the student whose address changed? The stored address no longer matches the real one, and the report is sent to the old address. This mismatch between different copies of the same data is called data inconsistency. It occurs here because there is no proper record of which files hold copies of the same data.
  5. Data Isolation: Imagine we have to generate a single report for a student in a particular class covering his study report, his library book details, and his hostel information. All of this information is stored in different files. How do we get all these details into one report? We have to write a program. But before writing it, the programmer must find out which files hold the needed information, the format of each file, how to search for data in each file, and so on. Only after all this analysis can the program be written. With two or three files involved, the programming is fairly simple; with many more files, it requires a great deal of effort from the programmer. Because all the data is isolated in different files, programming becomes difficult.
  6. Security: Each file can be password protected. But what if we have to give access to only a few records in a file? For example, a user should be able to view only his own bank account information in the file. This is very difficult to arrange in a file system.
  7. Integrity: If we need to check certain criteria while entering data into a file, there is no direct way to do it; it is possible only by writing programs. Say we have to restrict entry to students above age 18: that can be enforced only in program code. The file system itself offers no checking facility, so these kinds of integrity checks are not easy in a file system.
  8. Atomicity: If an insert, update, or delete fails partway, the file system has no mechanism to roll back to the previous state. Imagine that marks for one particular subject need to be entered into the Report file and the total then recalculated. Suppose the new mark is never saved but the total is still recalculated and stored: only part of the required transaction has been performed, and the stored total no longer matches the marks on file. Atomicity means that a transaction either completes as a whole or does not happen at all; partial completion leaves incorrect data in the system. The file system does not guarantee atomicity. It may be achievable with complex programs, but writing such a program for every transaction is costly.
  9. Concurrent Access: Accessing the same data from the same file at the same time is called concurrent access. In a file system, concurrent access can lead to incorrect data. For example, a student wants to borrow a book from the library. He searches the library file and sees that one copy is available. At the same time, another student also wants to borrow the same book and likewise sees that one copy is available. The first student opts to borrow and gets the book, but before the count in the file is updated to zero, the second student also opts to borrow it, even though no copies are left. This is the problem of concurrent access in the file system.
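
To see how much hand coding even the simple Student/Student_Report lookup from point 1 takes, here is a minimal sketch in Python; the comma-separated layouts, keyed on the roll number in the first column, are assumptions made for illustration:

    def load_records(path, key_index=0):
        # Read a comma-separated file into a dict keyed on one column.
        records = {}
        with open(path) as f:
            for line in f:
                fields = line.rstrip("\n").split(",")
                records[fields[key_index]] = fields
        return records

    students = load_records("Student.txt")
    reports = load_records("Student_Report.txt")

    # The "join" between the two files has to be coded by hand:
    for roll_no, student in students.items():
        report = reports.get(roll_no)
        if report is not None:
            print(student[1], report[1:])   # name followed by the report fields

Every new combination of files means another program like this one.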
