Replication figures as one of the more prominent features of SQL Server 2000.

Replication is a complex application that uses a combination of stored procedures and executables to distribute and copy data between SQL Server databases. If you take care not to get lost in the details and confused by occasionally misleading terms, replication can be an important component in the database architecture of a SQL Server application. In this article, you'll learn about the terminology surrounding replication, the types of replication available in SQL Server, and how to configure (that is, install) replication.

Replication can provide you with some interesting solutions for your data tier architecture because it allows you to use multiple database servers on the back end. You can publish to multiple servers, gather data from multiple servers, provide loosely connected clients with the ability to share data with a centralized database, and scale out database work in a horizontal fashion.

Getting clear on terminology is one of the most important hurdles to overcome when diving into SQL Server replication.

Let's take a look at replication terminology, then the types of replication, and finally with the steps required to configure replication.

Replication Terminology

Getting clear on terminology is one of the most important hurdles to overcome when diving into SQL Server replication, so in this section you'll learn about the basic terms and concepts.

The purpose of SQL Server replication is to copy or distribute a database's data to one or more data sources, usually other SQL Server databases. The destination database(s) could be on the same SQL Server, but most commonly you'll want to get the data from one SQL Server to another SQL Server.

The Publish/Subscribe Metaphor

SQL Server replication adapts a publish/subscribe metaphor to name various replication roles and objects. You define certain data sources as publications that become available for replication. When you have database servers subscribe to the publication, replication begins.

The publish/subscribe metaphor is pretty good, but don't take it too literally. Remember, you're publishing data from a database, not publishing a magazine. The publish/subscribe metaphor works well when you use it to help understand replication's components and roles, but do not expect all of the elements of a magazine or newspaper publishing business to apply.

Publisher, Publications, and Articles

The SQL Server that you define as having the origin of the data you want to replicate is called the publisher, or publishing server. A publisher is always one SQL Server, and you can only define a given SQL Server as a replication publisher once. It is logical to think of a publisher server first, but as you'll learn below in the section on configuring publishing, you actually have to define the distributor before you define the publisher.

You can replicate SQL Server 2000 database tables, views, and stored procedures. When you replicate a table or view, you are sending data; when you replicate stored procedures, there are some restrictions on what you can send: snapshot replication will send the text of a stored procedure, but transactional replication will only send the execution of stored procedures.

For each database on the publisher SQL Server, you can define selected sets of tables and views (and sometimes stored procedures) containing the data you want to replicate. Each defined set of tables and views is called a publication. A publication must exist in one database only; it cannot span databases.

Within a publication, each table or view is called an article. An article can be an entire table or view, or just a subset. You can publish a subset of a table or view's columns, and you can also restrict the rows of the table or view by using a WHERE condition when you define the article. There are other specialized methods of filtering as well, which you'll learn about in the sections on Transactional and Merge replication.

When you define a publication, you choose which type of replication to use; every publication must be a snapshot, transactional, or merge publication. Once you've defined the type of the publication, your options for creating articles are adapted to the type of replication you've chosen. You'll learn more about these types in the next section.

Distribution

Before you can set up a replication publisher, you must define a distributor. The distribution server contains at least one distribution database in which SQL Server places system tables and procedures unique to replication.

The distribution server plays a major role in snapshot and transactional replication as a store-and-forward location for replicated data. However, it only plays a minor role in merge replication. In transactional replication, where the distribution server plays a store-and-forward role for data, it's more common and often more efficient to locate the distributor on its own SQL Server. On the other hand, with merge replication it's very common to make the publisher server do double-duty as the distributor server.

Subscriber and Subscriptions

Once you have defined publications on a publisher server, other data sources known as subscribers can then receive or interact with the replicated data. You can create either a push or a pull subscription. You define a push subscription at the publisher, and you define a pull subscription on the subscribing server.

The limitations and behaviors of subscribers and subscriptions can change depending on the type of replication you're using. You'll learn more about the specifics of each subscriber behavior in the next section called Types of Replication.

Types of Replication

Before you get into the details of setting up replication, you should have a basic understanding of the types of replication. There are only three: snapshot, transactional, and merge replication. Three is a nice simple number, but the variations and options on transactional and merge can make the number of possible options much larger.

Replication can provide you with some interesting solutions for your data tier architecture because it allows you to use multiple database servers on the back end. You can publish to multiple servers, gather data from multiple servers, provide loosely connected clients with the ability to share data with a centralized database, and scale out database work in a horizontal fashion.

Because snapshot replication delivers a one-time-only copy of selected data, and both transactional and merge replication use snapshot replication to initialize, you'll hear most often about just transactional replication and merge replication.

Snapshot Replication

Snapshot is the simplest form of replication. It makes a simple copy of a data set and delivers it to a subscriber. The direction of the data is one-way only, with no updates allowed from the snapshot's subscribers back to the publisher. You can use a snapshot to send data once, or you can repeat a snapshot on a regular basis.

All three types of replication use snapshot replication. You can use snapshot replication by itself, but SQL Server also uses it as the method to initialize both transactional and merge replication. Behind the scenes, snapshot replication uses a Snapshot Agent (snapshot.exe) in combination with the Distribution Agent (distrib.exe) to export all relevant information about a publication's data to a text file. The Distribution Agent then delivers that data to the subscriber.

Transactional Replication

Transactional replication is usually also a one-way-only incremental form to update subscribers. It gets its name because it uses an agent on the distributor that reads the transaction log of the publisher database (logread.exe.) Transactional replication is normally one-way only, and you treat the subscribers as read-only.

Transactional replication is very sensitive to connectivity issues, and normally assumes continuous connections between publisher and subscribers. If a subscriber is disconnected, data will build up in the distribution database, to apply to the subscriber once it connects again. If the subscriber gets too far out of date, a snapshot will have to be run again.

When you use transactional replication to distribute data to read-only subscribers, it can be very fast, delivering data changes from publisher to subscriber in a matter of seconds. This can make it an ideal mechanism for reporting servers. If you have updating subscribers, however, the transaction delivery time can increase significantly.

Merge Replication

Merge replication was introduced as a more efficient method of allowing updates across publishers and subscribers. It uses a Merge Agent (merge.exe) on each server to communicate between publisher and each subscriber. Merge replication requires a ROWGUID column on each replicated table that it will use for tracking changes.

In merge replication, SQL Server uses triggers on each replicated table to populate the MSmerge_contents system table in each database. It does not read the transaction log of the publisher and does not use the distribution server for store-and-forward.

Merge replication has the remarkable feature that subscribers need only be occasionally connected to the publisher, and when the subscribers do connect, they can automatically catch up.

The extra metadata stored on each node in a merge replication application can cause a merge to sometimes be somewhat slow under heavy transaction loads.

Choosing the Type of Replication

Before you dive into configuring or installing replication, you should decide which type of replication you need. You need to ask yourself several questions before you choose a type of replication.

First of all, what type of update scenario do you need between the publisher and the subscriber data? Do you need updates on the subscriber to replicate back to the publisher? Will the subscribers be read only?

If you want to have updates made on a subscriber replicated back to the publisher, you must use merge or transactional replication with queued or immediate updating subscribers. You cannot use snapshot replication or one-way only transactional replication.

You also need to consider how transactionally consistent the data must be between the publisher and the subscriber. If you allow updates among subscribers, will having data converge to the same values across publisher and subscribers be sufficient, or do you need the subscribers to be more transactional?

Snapshot replication is, by definition, out of date right after the application of the snapshot, until the next snapshot occurs. Transactional replication provides the maximum amount of consistency between publisher and subscribers. Merge replication uses a strategy of convergence, and the resulting convergent data may differ depending on the order that changes are applied.

Once you've decided on the type of replication you desire, you can move on to configuring replication.

Configuring Replication

Setting up SQL Server replication is easy because Enterprise Manager gives you a number of wizards to use. However, just the act of initially configuring replication requires that you make some decisions.

Preliminary Steps

Your servers must meet several requirements before you can configure replication. You should ensure that all the SQL Servers involved are using service accounts for both the SQL Server and SQL Agent logins, and that the servers can connect to each other sufficiently so that the type of replication you install will work.

You should use either SQL Server Enterprise or Standard Edition for the publishing and distribution servers. (For development, you can use the Developer Edition. There does not appear to be any difference between all three of these editions for replication purposes.)

Table 1 shows a consolidated version of the SQL Server Books Online matrix that shows you which edition of SQL Server you can use for the various types of replication.

You can also replicate to heterogonous subscribers (non-SQL Server databases) for snapshot and transactional replication without updating subscribers. You can use SQL Server 2000 to subscribe to data being published from other databases such as Oracle, DB2, Microsoft Access, and other databases that satisfy certain ODBC or OLE DB subscriber requirements, but you must set it up programmatically. (For more information, see "Programming Replication from Heterogeneous Data Sources" in SQL Server Books Online.) When you choose to use non-SQL Server publishers, you can only use transactional and snapshot replication.

The Configure Publishing and Distribution Wizard

To get started configuring replication, you run the Configure Publishing, Subscribers, and Distribution wizard from Enterprise Manager. Despite its name, this wizard asks you to initially choose a location for the distribution server, and then asks you to enable one server as a publisher and one or more servers as subscribers. So before you start, you need to decide whether you are going to separate the distribution server from the publishing server.

For merge replication, you would normally place the distribution server on the publishing server because the activities of the distribution server are fairly light. But for snapshot and transactional replication, if the publisher production server is under significant load, it's a better practice to identify a separate server as a distribution server.

You can find the Configure Publishing, Subscribers, and Distribution option under Replication in the Enterprise Manager menu. Make sure you've registered each server you want involved in replication, and then select the server that you want to have as the distribution server.

The wizard initially asks whether you want to make the selected server its own distributor, as you see illustrated in Figure 1. If this is the first time you're running the dialog, choose this option, which is the default.

Figure 1: The first decision you must make when you configure replication is which server will play the role of distribution server. The default choice is make the server its own distributor.

You can only choose the "Use the following server..." option if one of the servers in the list has already been designated as a distributor. You would choose this second option if you had more than one publisher using the same server as a distributor. (If you do have more than one publisher for a given distribution server, a good practice is to use distinctly named distribution databases for each publisher.)

Next, you'll choose the default location of the snapshot folder, as shown in Figure 2. You can override this location later when you define each publication. The wizard will share this folder to users with administrative privileges on the distributor machine.

Figure 2: Next, you must choose the default location for the snaphot folder for this distributor, The default is an administrative share on the distribution server.

Customizing the Configuration

Now you must choose whether you want to customize the configuration. The default choice is no, but normally you will want to customize the configuration because this choice will let you designate other registered servers as publishers and subscribers. If you choose yes, the wizard's next dialog box will let you designate the name of the distribution database and its default file locations. Here you can override the defaults and assign your own. If you only have one distribution database, calling it "distribution" is fine. If your distribution server will handle several publishers, then you should assign a meaningful name other than just "distribution."

After you click Next, you'll be able to designate one or more publisher servers in the Enable Publishers dialog, as shown in Figure 3. If you define more than one publisher, you'll have to specify the distribution database for it. You can also register a new server as a publisher using this dialog.

Figure 3: The Enable Publishers dialog box lets you define publisher or publisher servers. You can also register another server not on the list.

If your distribution server is the same as the publishing server, you can just click Next and go on. You have more to do when you need to define a publisher on a different server. (You can clear the checkbox next to the distribution server if you don't need it as a publisher.)

Distributor-Publisher Login Security

When you choose another server other than the distribution server as a publisher, SQL Server pops up a one-time-only dialog box (see Figure 4) that tells you what you need to proceed. For each server other than the distribution server that you enabled as a publisher, you must:

Figure 4: SQL Server's Configure Publishing and Distribution wizard will remind you about specifying distribution databases, logins, and the snapshot folder share when you choose a server other than the distribution server as a publisher.
  • Identify the distribution database that the publisher will use.
  • Specify the login that the Snapshot Agent will use to connect to the publisher SQL Server.
  • Identify a default location of the snapshot folder.

Probably the most significant choice you need to make concerns the login. As soon as you click OK, you'll see the Publisher Properties dialog box for the publisher server, as shown in Figure 5. You've already chosen the name of the distribution database. You can't change it, but you can change its default location.

Figure 5: When you choose a publisher server that is on a different server from the distribution server, you need to specify the snapshot location and type of security between publisher and distributor SQL Servers.

The Replication Agent selection of the Publisher Properties dialog box has to do with the Snapshot Agent authentication. Here you choose how you would like the Snapshot Agent to connect to the SQL Server on the publisher server. Normally the Snapshot Agent will be run by SQL Agent, and if you choose the default selection here, the Snapshot Agent will use the same account as SQL Agent to connect to the publisher. You can see now how important it is to have your SQL Server and SQL Agent service accounts set up properly, because the SQL Agent on one machine will often need to connect to the SQL Server on another. If you don't want to have the Snapshot Agent use a trusted connection, you must enter a SQL Server login id and password that are valid on the publisher SQL Server.

When you use transactional replication to distribute data to read-only subscribers, it can be very fast, delivering data changes from publisher to subscriber in a matter of seconds. This can make it an ideal mechanism for reporting servers.

The "Administrative link to the Distributor" section of the Publisher Properties dialog box is where you establish the publisher SQL Server as a remote server on the distributor SQL Server. If you leave the default selection chosen, you will have to enter an administrator level (sa) password. Otherwise, the remote server will be established using a trusted connection. This allows the distributor to execute RPC calls against the publisher database.

After you are finished with the Publisher Properties dialog box for the one publisher, you can also specify other publisher servers and modify their properties using the browse buttons next to the publisher's name. You can also return to this dialog box later from Enterprise Manager.

Enabling Publication Databases

When you click Next, you may enter the "Enable Publication Databases" dialog box, where you can enable individual databases on the publisher server for snapshot, transactional, and merge replication, as shown in Figure 6. You will not see this dialog box unless you've also enabled the distribution server as a publisher.

The dialog box in Figure 6 does not do as much as you might at first think. All it does is enable the databases for the stated type of replication for database owners. If you log into the publisher SQL Server with a system administrator account, you can initiate merge or transactional replication on those databases later without needing to check anything here.

Figure 6: You can enable publisher server databases for replication by database owners. If you are using a system administrator level SQL account, you do not need these databases enabled here.

In my opinion, the Enable Publications dialog box has some confusing terminology. Notice that it only gives you choices for transactional and merge replication?what about snapshot replication? It turns out that enabling a database for transactional replication here also enables it for snapshot replication.

Enabling Subscribers

The last step you make in the Configure Publishing and Distribution wizard is to enable subscriber databases, as shown in Figure 7. This dialog box is meant for named subscribers, subscriber servers that the publisher server communicates with by name. Information about the subscriber server will be stored in the publisher, in system tables that belong to the master database.

Figure 7: You can also enable named subscribers. SQL Server will store information about each named subscriber on the publisher.

Note that once you choose a subscriber server, a Browse button appears on the same line. If you click it, you'll see the login properties that SQL Server will store in the publisher, as shown in Figure 8. You can override these values later if you desire.

Figure 8: The login properties of each named subscriber will be stored on the publisher server's master database.

Finishing Up

To finish the configuration, just click Next on the dialog box, review your chosen options, and click Finish in the "Completing the Configure Publishing and Distribution Wizard" dialog box.

Merge replication has the remarkable feature that subscribers need only be occasionally connected to the publisher, and when the subscribers do connect, they can automatically catch up.

Once you've finished, SQL Server gives you a dialog box indicating success. When you click OK, you'll then see an informational message telling you that the Replication Monitor has been added to the Enterprise Manager console tree for the distribution server. The Replication Monitor is a set of dialogs located on the distribution server from which you can monitor replication. At this stage there's not much to monitor, so all it shows is the publisher server. The Replication Monitor becomes much more important once publications and subscriptions are defined.

Finishing at this point means that the SQL Server on the publisher server has been marked as a publisher, and that the particular distribution server you designated is that publisher's distributor. In addition, any named subscriber servers that you enabled, along with their login properties, will be stored in a system table in the publisher server's master database. The distribution server will now have a distribution database on it.

You can modify configuration options in Enterprise Manager by selecting the distribution server and then click Replication and then choose "Configure Publishing, Subscribers, and Distribution." The wizard you saw earlier when choosing this option is replaced by the Publisher and Distributor Properties dialog box, as shown in Figure 9. All these dialogs were accessed in some fashion during the Configure Publishing and Distribution wizard. However, note one difference: the Agent Profiles button and resulting dialog was not available in the wizard. These are profiles for the various replication agents that will be used. I'll discuss the profiles in more detail when I write about replication agents in a future article.

Figure 9: You can edit replication server properties after you've configured replication on the distribution server.

Removing Replication

To undo or remove the current replication configuration, choose the Disable Publishing and Distribution option from within Enterprise Manager, again having selected the distribution server in the console tree, and then the Replication menu option. When you remove replication, SQL Server will remove the entry marking the publisher SQL Server as a replication publisher, and remove the distribution database on the distribution SQL Server.

Concluding Thoughts

Once you've configured or installed replication on a set of servers, you're free to move on to defining the publications and articles (the data) that you want to replicate. When configuring replication, the steps are mostly the same no matter what type of replication you choose. But when defining publications and subscriptions, the three types of replication tend to diverge. How they differ, and what makes them unique, I'll leave for another time.