ADO.NET is a powerful toolbox, but it's not a software magic wand.

To get the most out of the ADO.NET classes, developers must fully understand the model and study a few best practices. Based on years of real-world experience with ADO, ADO.NET provides a richer set of more powerful tools. But ADO.NET is not designed to be an out-of-the-box tool that reduces any programming work to point-and-click. In this article, you'll learn about common best practices for using three key elements of any data access strategy: connections, security, and transactions.

Database connections, data access security, pooling, and transactions are all key elements of any real-world enterprise distributed application. No matter which technology you use for each tier of your .NET application (Windows Forms, ASP.NET, or Web services), data access is concentrated in the back end of the system and is implemented through a common programming interface: the ADO.NET object model.

ADO.NET comes with a set of objects that represent the various logical entities of a data access model, connection and transaction in particular. To create a connection, that is, to establish a physical link with the back-end database, you instantiate the connection object of a particular .NET data provider. To create a transaction, you can either instantiate a transaction object over an existing connection or explicitly execute a BEGIN TRAN SQL statement. Around a connection and a transaction, though, lie a number of choices; they are all important and critical to the health and effectiveness of the application.
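
Here's a quick sketch of both operations using the SQL Server managed provider; the server, database, and other names are placeholders:

   using System.Data.SqlClient;

   class ConnectionAndTransaction
   {
      static void Main()
      {
         // Placeholder connection string pointing to a local SQL Server.
         string connString =
            "server=(local);database=northwind;integrated security=SSPI;";

         SqlConnection conn = new SqlConnection(connString);
         conn.Open();                                   // physical link established

         SqlTransaction tran = conn.BeginTransaction(); // manual transaction
         // ... run commands bound to conn and tran ...
         tran.Commit();

         conn.Close();                                  // return to the pool
      }
   }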

How do you store the connection string, and how do you protect the sensitive data (such as the password) it may contain? How do you design your overall data access strategy to take security (that is, authentication and authorization) into account without hurting performance and scalability too much? If you need transactional behavior, what would be an effective way of implementing and controlling transactions? Should you go for automatic or manual transactions? These questions are only a few examples of the choices you face when devising and building the data access layer of a .NET application. ADO.NET helps provide the tools. Smart developers take advantage of those tools in light of accepted best practices.

Database Connections

Database connections represent a critical, expensive, and limited resource. Optimizing how you use them is therefore fundamental for any real application, and for multi-tier Web applications in particular. The bottom line for database connections can be summarized in the following two points:

  • Store connection strings securely
  • Open connections late and close them early

A connection string is the key to connecting to the database and accessing its content. The connection string contains the credentials of the user as well as any information needed to identify the data to retrieve. When it comes to performing a database operation, the user credentials are the most important element: they enable data access and determine the rights on the data. Developers should think in terms of user roles rather than user accounts, choose Windows authentication, and exploit connection pooling. Let's examine these points in more detail.

Connection String Storage

When you need to operate on a database, a connection string is needed. How do you retrieve this critical information for use in your application(s)? By far, hard-coded strings offer the best performance because they compile directly into your application code. However, hard-coded strings also diminish flexibility and require you to recompile the application if the string changes.

External storage offers much greater flexibility at the price of extra overhead. All in all, the impact of this extra burden on performance is negligible; it may pose some security issues, however. Attackers, in fact, might tamper with external files and corrupt them. If you want to consider external storage, here are a few options: .config files, UDL files, and the Windows registry.

The .NET Framework configuration files are easy to access programmatically and are deployed as plain text files. Their text-based nature is also their most significant drawback if they contain passwords, because those are stored as clear text by default. You can arrange a tailor-made engine to encrypt and decrypt the settings, but the task is completely up to you.
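
For reference, here's a minimal sketch of reading a connection string from the appSettings section of a .config file; the key name is hypothetical and must match the entry in your configuration file:

   using System;
   using System.Configuration;

   class ConfigReader
   {
      static void Main()
      {
         // "ConnString" is a hypothetical key; it must match an entry like
         // <add key="ConnString" value="..." /> under <appSettings>.
         string connString =
            ConfigurationSettings.AppSettings["ConnString"];
         Console.WriteLine(connString);
      }
   }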

UDL files are text files recognized by the OLE DB providers. This means that the SQL Server managed provider doesn't support them. UDL files suffer from the same security issues as .config files and don't offer significant advantages over them.

Finally, you can use the Windows registry as a naturally more secure storage medium. The registry is a system repository for critical information and reaches a high level of security, especially if combined with encryption. The main drawback of using the registry is that it poses some deployment issues: you need to create the registry keys (and optionally store encrypted data in them) and read their contents back. The .NET Framework provides a set of wrapper classes that call into the underlying Win32 API; none of these classes, though, supplies encryption capabilities. The aspnet_setreg.exe utility (see Knowledge Base article Q329290) can be used to create a registry key under the HKEY_LOCAL_MACHINE hive and store a user name and password there. The following code snippet shows the typical command line of the utility.

aspnet_setreg.exe
   -k "Software\MyData"
   -u:userID
   -p:password

It encrypts the specified userID and password and stores them under the given key.
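
To read registry settings back at run time, you can use the wrapper classes in the Microsoft.Win32 namespace, as in the sketch below. The key path mirrors the sample above and the value name is hypothetical; note that values written by aspnet_setreg are encrypted and are meant to be referenced from configuration attributes, not decrypted by hand.

   using System;
   using Microsoft.Win32;

   class RegistryReader
   {
      static void Main()
      {
         // "Software\MyData" mirrors the key used above; "ConnString" is a
         // hypothetical value name written by your own setup code.
         RegistryKey key =
            Registry.LocalMachine.OpenSubKey(@"Software\MyData");
         if (key != null)
         {
            string connString = (string) key.GetValue("ConnString", "");
            key.Close();
            Console.WriteLine(connString);
         }
      }
   }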

Connection Pooling

Database connections are not only expensive to create; it's also critical that you reuse them. Connection pooling allows you to reuse an existing object from a pool instead of repeatedly establishing a new connection with the database. In this way, a limited number of connections can serve a much larger number of clients.

Each connection pool is associated with a distinct connection string and transaction context. When you open a new connection, the data provider attempts to find a match between the specified connection string and a string already used by an existing pool. If no match is found, the data provider creates a new pool. Once created, connection pools are not destroyed until the process ends. You may think that this results in a performance hit, but it really doesn't: maintenance of inactive or empty pools requires only minimal overhead.

When a connection pool is created, multiple connection objects are created and added so that the minimum size is reached. Additional connections are then added to the pool on demand, up to the maximum pool size. If no usable connection is available, the pool handler attempts to create a new connection object. When the maximum pool size is reached, requests are queued and served as soon as an existing connection object is released to the pool. The process of creating new connections is serialized, and requests hit the server one at a time.
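
Pool sizes are controlled through connection string keywords. The sketch below uses the standard SqlClient keywords; server and database names are placeholders.

   using System.Data.SqlClient;

   class PoolSizingDemo
   {
      static void Main()
      {
         // Min Pool Size and Max Pool Size are standard SqlClient keywords;
         // Connect Timeout bounds how long a request waits in the queue
         // when the pool is at its maximum size.
         string connString =
            "server=(local);database=northwind;integrated security=SSPI;" +
            "Min Pool Size=5;Max Pool Size=50;Connect Timeout=30;";

         using (SqlConnection conn = new SqlConnection(connString))
         {
            conn.Open();   // the first Open creates the pool and fills it
         }                 // Dispose returns the connection to the pool
      }
   }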

You should avoid building connection strings programmatically. If you build a string by concatenating input data together, you expose yourself to the risk of injection attacks. To avoid that, you should first validate the data you're putting in the connection string, as the sketch below illustrates. This little precaution could save you from sneaky and harmful attacks.
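
As a minimal illustration (the rejected characters are an assumption, not an exhaustive list), a validation routine might refuse any input that could smuggle extra key/value pairs into the string:

   class ConnStringValidator
   {
      // Reject input containing characters that could introduce extra
      // key/value pairs or break quoting in a connection string.
      public static bool IsSafeForConnString(string input)
      {
         return input.IndexOf(';') < 0 && input.IndexOf('=') < 0 &&
                input.IndexOf('\'') < 0 && input.IndexOf('"') < 0;
      }
   }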

Closing the Connection

Two golden rules characterize any code working with connections. First: open the connection as late as possible. Second: close the connection as early as possible. You want your application to work with connections for the shortest time possible.

When you close a connection, the connection is returned to the pool and made available for reuse; the actual connection to the database is not severed. If you disable connection pooling, the actual connection to the database is closed. Note that you must close the connection to make it return to the pool; do not rely on the garbage collector to free connections. In fact, the connection is not necessarily closed when the reference goes out of scope. What the garbage collector destroys is the .NET wrapper object that represents the physical connection; in no case does this automatically mean that the underlying connection is closed.

Connections are released to the pool when you call methods such as Close or Dispose. A connection object is removed from the pool only if its lifetime has expired or if a serious error has occurred.
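
The C# using statement makes it easy to honor both golden rules; here's a sketch in which the query, table, and connection string are placeholders:

   using System.Data;
   using System.Data.SqlClient;

   class CloseEarlyDemo
   {
      // Open as late as possible, close as early as possible: the using
      // statement guarantees Dispose (and the return to the pool) even if
      // the query throws.
      public static DataTable LoadCustomers(string connString)
      {
         using (SqlConnection conn = new SqlConnection(connString))
         {
            SqlDataAdapter adapter = new SqlDataAdapter(
               "SELECT * FROM customers", conn);
            DataTable table = new DataTable();
            adapter.Fill(table);   // Fill opens and closes the connection
            return table;
         }
      }
   }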

Pooling and Security

The real advantage of connection pooling comes when programmers use the same connection string for all data access tasks in their applications. However, this approach may clash with other characteristics of the application. For example, using a single connection string makes it hard to manage security at the database level. On the other hand, if you give each user their own connection string (i.e., you impersonate user accounts) to access the database, you end up with many small pools and little connection reuse. As often happens in life, the solution lies somewhere between these opposite positions. You could use a common set of accounts and modify your stored procedures to include an extra parameter that indicates the user's identity. Based on the passed identity, each stored procedure could perform different tasks. This is an instance of a more general problem: determining the identity (or identities) to use to perform data access.
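
As a sketch of the extra-parameter technique (the procedure, parameter, and column names are all hypothetical), the data tier might pass the caller's identity along with every call:

   using System.Data;
   using System.Data.SqlClient;

   class RoleAwareDataTier
   {
      // Passes the caller's identity to the stored procedure so that it
      // can authorize (or refuse) the operation internally.
      public static void DeleteOrder(SqlConnection conn, int orderID,
                                     string userIdentity)
      {
         SqlCommand cmd = new SqlCommand("DeleteOrder", conn);
         cmd.CommandType = CommandType.StoredProcedure;
         cmd.Parameters.Add("@OrderID", SqlDbType.Int).Value = orderID;
         cmd.Parameters.Add("@UserIdentity",
            SqlDbType.NVarChar, 64).Value = userIdentity;
         cmd.ExecuteNonQuery();
      }
   }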

You can access data through a variety of accounts, with the final choice determined by the data access model in use. Overall, I suggest three possible approaches:

  • The process identity of the calling process
  • One or more service identities
  • The original caller's identity (impersonation)

Let's see what happens if the process identity is used. This scenario is particularly common among ASP.NET applications. In this scenario, the application must take advantage of the Windows authentication model to connect to SQL Server.

The idea is that the account of the currently logged-on user is silently passed to SQL Server and no explicit user ID or password (nor any other sensitive information) is embedded in the connection string. The main advantage of this approach is that you deal with only one security model (the Windows security model) and inherit features such as account lockout and automatic password expiration that can only increase the overall level of security of the system. Sensitive data is still passed but that happens out of the application's control and, more importantly, the data travels as an encrypted stream.

To inform SQL Server that it has to retrieve the user's credentials from Windows, you set either the Integrated Security or the Trusted_Connection attribute to true in the connection string. You'll notice that SQL Server needs more time to handle a trusted connection; however, the performance degradation is negligible compared to the overall execution time of any SQL command. Furthermore, consider that SQL Server must be configured to run in either Windows Authentication Mode or Mixed Mode if you want to successfully use trusted connections.
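
For example (server and database names are placeholders), either of the following connection strings requests a trusted connection from the SQL Server managed provider:

   class TrustedConnStrings
   {
      // Both keywords are recognized by the SQL Server managed provider;
      // "Integrated Security" also accepts the value SSPI.
      public const string Conn1 =
         "server=(local);database=northwind;Integrated Security=true;";
      public const string Conn2 =
         "server=(local);database=northwind;Trusted_Connection=yes;";
   }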

With special regard to ASP.NET applications, you create your data tier components to use trusted connections to perform data-driven operations. The process account (i.e., the ASPNET or NetworkService account) must be mirrored at the database level and granted the needed privileges. This approach doesn't have any negative impact on performance and scalability, but it is ideal only when one fixed identity can serve all database-related requests. If the requirements of the application dictate that more identities be used, a different approach is needed.

The second approach is based on one or more service identities. The user is authenticated at the gate (i.e., IIS/ASP.NET) and assigned a role: a string that indicates the part the account plays within the context of the application. Each role is bound to a collection of privileges and permissions. The key point, though, is that neither the database nor the operating system checks these permissions. Instead, the middle tier of the application guarantees that calls to the database are valid and authorized according to role membership. In other words, the middle-tier component performs database operations only if the role of the user enables the task. Once the call reaches the data tier, any involved components correctly assume that the user is legitimate. The data tier components then look at the role and impersonate a built-in account configured to access the tables and the stored procedures defined for that role. Of course, the account is given database permissions to operate according to its privileges. Each role is associated with a unique account; as a result, a few accounts serve all users, ensuring both good connection pooling and security.
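
A sketch of this trusted model follows; the role names, account names, and connection strings are hypothetical, and the caller's principal is assumed to have been attached to the thread when the user was authenticated at the gate:

   using System.Threading;

   class ServiceIdentityPicker
   {
      // Maps the caller's role to one of a few fixed service accounts,
      // so all users in a role share the same connection pool.
      public static string GetConnString()
      {
         if (Thread.CurrentPrincipal.IsInRole("Manager"))
            return "server=(local);database=northwind;" +
                   "uid=MgrAccount;pwd=secret1;";
         return "server=(local);database=northwind;" +
                "uid=StdAccount;pwd=secret2;";
      }
   }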

The third approach covers the circumstance in which limiting the number of accounts is impossible because your data layer logic requires user-level authorization. In this case you need to impersonate the original caller when accessing the database, and database objects (i.e., stored procedures, database tables) determine what a user can and cannot do. In ASP.NET applications, you use impersonation together with trusted connections to maintain a high level of security. Windows Forms applications, in contrast, let you use trusted connections that simply take on the current user's identity. User-level authorization affects the overall scalability of the application because it defeats connection pooling; you should not take this route unless strictly necessary. In general, a "trusted model" based on roles and a fixed number of user-independent accounts alleviates this issue while maintaining a certain association between users and functions.

Transaction Models

Enterprise distributed applications can hardly work without transactions: the logical entities that group multiple operations into a single monolithic step. The transacted operations can be local to a particular database or span a variety of resource managers deployed on various nodes of the network. There are two ways to incorporate transaction management into your data access code: manual and automatic transactions.

With manual transactions, the programmer is responsible for writing any code that configures and uses the transaction support provided by either ADO.NET or the SQL dialect of the database. With automatic (or COM+) transactions, instead, you add declarative attributes to your .NET classes to specify the transactional requirements of your objects at run time. This model allows you to easily configure multiple components to perform work within the same transaction. Both techniques can be used to perform local or distributed transactions, but the automatic transaction model greatly simplifies distributed transaction processing.

Another key thing to bear in mind is that transactions are computationally expensive. As a result, you should always ask yourself whether you really need transactions. If you do, try to minimize their duration to minimize locks on the databases involved. Transactions are not needed for individual SQL statements because SQL Server automatically runs each statement as an individual transaction. Manual local transactions are always significantly faster because they do not require interaction with the Distributed Transaction Coordinator (DTC). This holds true even compared to automatic transactions run against a single local resource manager, where no interprocess communication with the DTC actually occurs.

In ADO.NET, you manually create a transaction by calling the BeginTransaction method on the connection object of the specific managed data provider. The method returns a data source-specific transaction object with Commit and Rollback methods, an isolation level (ReadCommitted by default), plus any extra features that the underlying data provider supports. For example, the SqlTransaction object (the transaction object for SQL Server 2000) supports named savepoints for partial rollbacks, while the OleDbTransaction object supports nested transactions. The isolation level remains in effect until explicitly changed; it can be changed at any time, though, and if changed during a transaction, the server is expected to apply the new locking level to all remaining statements. In the SQL Server .NET data provider, the BeginTransaction method is implemented as a wrapper around the T-SQL BEGIN TRAN statement; the method and the statement are therefore equivalent. However, using the ADO.NET method is recommended because it adds more flexibility to your code and abstracts you from the details of the underlying SQL dialect.
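
Here's a minimal sketch of the pattern (table and column names are placeholders); note that each command must be explicitly bound to the pending transaction:

   using System;
   using System.Data.SqlClient;

   class ManualTransactionDemo
   {
      // Moves money between two accounts as a single manual transaction;
      // either both UPDATEs commit or both roll back.
      public static void Transfer(string connString)
      {
         using (SqlConnection conn = new SqlConnection(connString))
         {
            conn.Open();
            SqlTransaction tran = conn.BeginTransaction();
            SqlCommand cmd = conn.CreateCommand();
            cmd.Transaction = tran;   // enlist the command explicitly
            try
            {
               cmd.CommandText =
                  "UPDATE accounts SET balance=balance-100 WHERE id=1";
               cmd.ExecuteNonQuery();
               cmd.CommandText =
                  "UPDATE accounts SET balance=balance+100 WHERE id=2";
               cmd.ExecuteNonQuery();
               tran.Commit();
            }
            catch (Exception)
            {
               tran.Rollback();
               throw;
            }
         }
      }
   }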

Automatic transactions simplify the programming model because they do not require that you explicitly begin a new transaction or explicitly commit or abort it. What the programmer is required to do is declare the type of transactional support expected for a given component (i.e., an ASP.NET page, a Web service method, a .NET Framework class). The most significant advantage of automatic transactions is that they can work in conjunction with the DTC; in this way, you declaratively configure multiple components to perform tasks that comprise a single transaction.

Automatic transactions rely on the distributed transaction support features provided by the COM+ environment. Only serviced components can use automatic transactions. Typically, a serviced component is a class derived from the ServicedComponent base class. However, ASP.NET pages that set the Transaction attribute on the @Page directive, and Web service methods that set the TransactionOption property of the WebMethod attribute, are treated as serviced components as well.
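
A minimal serviced component might look like the sketch below; the class and method names are hypothetical:

   using System.EnterpriseServices;

   // Requires an automatic (COM+) transaction; the AutoComplete attribute
   // votes to commit on a normal return and to abort on an exception.
   [Transaction(TransactionOption.Required)]
   public class OrderProcessor : ServicedComponent
   {
      [AutoComplete]
      public void PlaceOrder()
      {
         // Data access performed here runs inside the COM+ transaction.
      }
   }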

Manual and automatic transactions should be considered two distinct, mutually exclusive technologies. When you perform a transaction against a single database, use manual transactions. Use automatic transactions when a single transaction spans multiple remote databases or touches multiple resource managers (i.e., a database and an MSMQ resource manager). In any case, you should avoid mixing the two transaction models. If you're not particularly concerned about performance, consider using automatic transactions even against a single database because they result in simpler (but slower) code. Automatic transactions make it easy for multiple components to perform operations that are part of the same transaction.

Summary

Effective database programming is the outcome of many effective practices applied to a common layer of services. In .NET, the ADO.NET object model provides the common API that all types of applications use, whether Windows Forms, ASP.NET pages, or Web services. ADO.NET, though, is not a black box that takes input data and returns result sets. ADO.NET is rather a toolbox made of several objects designed to let programmers build more powerful enterprise applications more easily.

To build an effective data access layer you must combine several objects in the ADO.NET model: connections, transactions, and commands. These objects must be configured and parameterized, and must work against a data source such as SQL Server or Oracle. In doing so, you hit the security context of the database and must interact with it. But security is far too large a topic to be confined to the database. Enterprise distributed applications have security built into their foundation, not just (or not necessarily) insulated in the data tier. In this context, even an apparently insignificant element such as the connection string can be of key importance.