It’s all about data
Manipulating data is what computer applications do: your web browser fetches HTML-encoded text (data) and renders it onscreen; your banking application moves cash balances (data) around; the simple calculators I use here as examples add numbers (data) and display results (more data). An application’s data may be transitory – like the numbers and results in that calculator – or may be made to persist by storing it in some type of database, from a simple comma-separated file to a fully relational monster like SQL Server or Oracle. ADO.NET is the key component of the .NET Framework that enables applications to retrieve data from databases, and in this column I’ll be examining its architecture and features.
ADO.NET is the latest in a long line of data access methods provided by Microsoft. The first of these was Open Database Connectivity (ODBC), which separated the business of accessing data from the details of the provider. This let programs work with multiple database products without needing to know all their idiosyncrasies – database manufacturers merely had to supply an ODBC driver for their product. ODBC was followed in turn by Data Access Objects (DAO), Remote Data Objects (RDO), Object Linking and Embedding Database (OLE DB) and eventually ActiveX Data Objects (ADO), which was in effect a COM wrapper that made OLE DB easier to use. Along the way, the MFC (Microsoft Foundation Classes) library offered ODBC and DAO classes that let C++ programmers hook into databases relatively easily, while RDO gave Visual Basic programmers a thin object layer over ODBC.
DAO was built on the Jet database engine and was aimed at locally stored databases where high performance wasn't essential. RDO and ADO were both designed from the start for networked client/server architectures, where the database and the data consumer ran on different computers. ADO works through Recordset objects, which hold the retrieved data. In a typical ADO application, a connection to the database is opened when the application starts and stays open until the application closes. This scheme didn't scale particularly well and, because it kept the database open for so long, it raised security concerns.
ADO.NET is the latest step in this data access evolution, designed to resolve some of the weaknesses of those earlier technologies. Its name reflects its ADO heritage, but ADO.NET is a set of managed classes within the .NET Framework that support data access and manipulation. ADO.NET expands on ADO and its predecessors by building XML support into its design. Because it forms an integral part of .NET, developers can take full advantage of its features from any .NET programming language (C#, VB.NET, COBOL for .NET and more). One of its greatest benefits is that no extra database code needs to be installed on client systems, since everything an application needs is already in the .NET Framework.
Unlike the typical ADO application, ADO.NET is built around a disconnected data access model. Whenever an application requests data, a database connection is opened and then closed as soon as the request completes. The same scheme is used when updating a database. (Behind the scenes, the data providers normally pool physical connections, so opening one is usually cheap.) Connections stay open only while they are in use. As a result, ADO.NET conserves server resources and narrows the window in which an open connection could be misused.
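The sketch below shows the connect-per-request idea in Python. It is not ADO.NET code. Python's built-in sqlite3 module stands in for a data provider. The helper names run_query and run_update, the accounts table and the temporary database file are all hypothetical, chosen only for illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Hypothetical database file standing in for a server database.
DB_PATH = os.path.join(tempfile.mkdtemp(), "accounts.db")


def run_query(sql, params=()):
    """Open a connection, run one query, and close the connection at once."""
    with closing(sqlite3.connect(DB_PATH)) as conn:
        rows = conn.execute(sql, params).fetchall()
    return rows  # the connection is already closed at this point


def run_update(sql, params=()):
    """Open a connection, apply one change, commit, and close."""
    with closing(sqlite3.connect(DB_PATH)) as conn:
        with conn:  # commits on success, rolls back on exception
            conn.execute(sql, params)


run_update("CREATE TABLE accounts (name TEXT, balance REAL)")
run_update("INSERT INTO accounts VALUES (?, ?)", ("alice", 100.0))
print(run_query("SELECT name, balance FROM accounts"))
# → [('alice', 100.0)]
```

No connection survives between calls. Each request pays the cost of connecting, which is exactly the overhead that connection pooling is meant to absorb.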
The diagram below depicts the ADO.NET architecture, which consists of two principal components: the DataSet and the Data Provider. A DataSet is an in-memory representation of data retrieved from the stored database, which can be manipulated and updated independently of that database. The Data Provider connects to the physical database, runs commands against it and moves data between the database and the DataSet. ADO.NET supplies two main Data Providers: the SQL Server Data Provider and the OLE DB Data Provider. The SQL Server Data Provider is highly optimised for Microsoft's SQL Server. The OLE DB Data Provider offers a generic way to connect to other databases (including SQL Server) through their OLE DB drivers.