Whether you’re building a traditional distributed system or an offline Web app, synchronizing data and reconciling conflicts are accompanied by some hard realities. Sometimes data gets stale, sometimes users update the same data simultaneously, and sometimes synchronization attempts fail. This article demonstrates how to gracefully resolve conflicts and synchronize disconnected databases.

The examples explored in this article demonstrate how to work with the PouchDB API (Listing 1) as well as how to create a to-do list application that synchronizes with a server (Listing 2 and Listing 3). Figure 1 shows a screenshot of the running application. The application is available on GitHub at https://github.com/craigshoemaker/synchronize-dbs-demo.

Figure 1: Screenshot of running application

Different Databases in Different Contexts

CouchDB (http://couchdb.apache.org) is a server-side, multi-master document database that seamlessly synchronizes data among disconnected database instances. As data changes, a complete revision history for each document is stored, giving CouchDB the context to handle synchronization and resolve conflicts. As databases are synchronized, the revision history is used to decide which revisions prevail among the different versions. When dealing with conflicts, the revision information allows users to select winning revisions.

A core aspect of CouchDB known as "eventual consistency" means that changes are incrementally replicated across the network. This same principle is at work when dealing with databases found inside a Web browser.

PouchDB (https://pouchdb.com) is a browser-based database interface that’s tailor-made to synchronize with CouchDB. In the same fashion as with multiple server instances of CouchDB, data from PouchDB synchronizes with server-side databases. This means that data manipulated in a disconnected state from the server can seamlessly flow up to the server.

PouchDB is a JavaScript implementation of CouchDB that uses IndexedDB, and, on rare occasion, Web SQL. The following similarities exist between PouchDB and CouchDB.

  • The APIs are consistent. Although not identical, much of the code you write for PouchDB works directly against CouchDB.
  • PouchDB implements CouchDB’s replication algorithm. The same rules are enforced on the client as exist on the server that decide how data is synchronized across multiple database instances.
  • HTTP as a core transport. CouchDB exposes RESTful HTTP/JSON APIs that allow direct access to data. Exposing data through HTTP side-steps the data access layers often required to work with other databases. PouchDB capitalizes on this feature and sends JSON payloads via HTTP to interface directly with CouchDB.

Document Revisions

Synchronization is made possible by carefully tracking document revisions. Each document revision generates a unique identifier, known as the revision ID. There are two parts to a revision ID. The first part is a human-readable incrementing integer. The second part of the revision ID is a GUID-like value that’s generated by the database API.

When you create a new document in the database, the revision ID is prefixed with the number 1 followed by a GUID-like value. In the following examples, a three-letter string is used instead of an actual GUID-like value to keep the examples readable. The first revision ID looks like the following example.

1-abc

As the document changes, the prefix is incremented by 1 and a new GUID-like value is generated. Therefore, when you update the document, the revision ID prefix advances from 1 to 2.

2-def

Revision IDs are updated in this way in concert with any data changes. Even if you delete the document from the database at this point, the revision ID advances to 3 and the document metadata is marked as deleted. Tracking with revision IDs allows the database to maintain a full revision history of each document. By sustaining a running revision history for every document, the database has the context necessary to replicate changes among different database instances.
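The two-part structure of a revision ID is easy to pull apart in code. The following is a minimal sketch; parseRevision is a hypothetical helper written for this illustration and isn't part of the PouchDB API.

```javascript
// Hypothetical helper (not a PouchDB API): split a revision ID such as
// "2-def" into its incrementing prefix and its GUID-like suffix.
const parseRevision = (rev) => {
    const separator = rev.indexOf('-');
    const generation = Number.parseInt(rev.slice(0, separator), 10);
    if (separator < 1 || Number.isNaN(generation)) {
        throw new Error(`Malformed revision ID: ${rev}`);
    }
    return {
        generation,                       // the human-readable integer
        suffix: rev.slice(separator + 1)  // the GUID-like value
    };
};

console.log(parseRevision('1-abc')); // { generation: 1, suffix: 'abc' }
console.log(parseRevision('2-def')); // { generation: 2, suffix: 'def' }
```

Comparing the generation values of two revisions of the same document tells you which one has been through more edits, although it doesn't tell you on its own whether the two revisions conflict.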

Working with PouchDB

To begin working with a database in the browser, you first need to reference the pouchdb.js script in your HTML page.

<script src="scripts/pouchdb.js"></script>

Next, inside a script tag or in a separate JavaScript file, create a new instance of PouchDB. The constructor accepts the database name.

const localDB = new PouchDB('people');

When you create a new instance of PouchDB, the resulting object either points to an existing database or creates a new database for you. In this case, a new IndexedDB database is created in the browser. PouchDB uses one of a series of adapters to interface with different databases. If you inspect the localDB instance in the browser console, you’ll see that the adapter, as shown in Figure 2, is set to idb. This indicates that, in the browser, PouchDB is using the IndexedDB adapter.

Figure 2: Create a local instance of PouchDB.

PouchDB is architected with a Promise-based API that provides an opportunity to use JavaScript’s async/await syntax when calling methods. The following snippet demonstrates how to add a new object to the database by calling the put method.

const add = async () => {
    const person = {
        _id: 'craigshoemaker',
        name: 'Craig Shoemaker',
        twitter: 'craigshoemaker'
    };
    const response = await localDB.put(person);
    console.log(response);
};

The result returned from the database resembles an HTTP response. When the operation succeeds, PouchDB returns an object with ok: true, the document’s unique identifier, and the revision ID value.

{
  ok: true,
  id: "craigshoemaker",
  rev: "1-747b2b81bf8ef992e8ec1f44aa737c48"
}

Once you have the identifier and revision ID, you can access and manipulate the data as you wish. To retrieve a record from the database, you pass the document ID to the get method.

const get = async () => {
    const person = await localDB.get('craigshoemaker');
    console.log(person);
};

The response from the database includes the full document data including the unique identifier and revision ID.

{
  _id: "craigshoemaker",
  _rev: "1-747b2b81bf8ef992e8ec1f44aa737c48",
  name: "Craig Shoemaker",
  twitter: "craigshoemaker"
}

Updating data in the database requires that you have the latest revision ID associated with a specific document. Often, the most reliable way to reference the latest revision ID is to get the latest version of the document from the database just prior to updating values. To update the document, you can call the get method, add or update the object’s values, and then call the put method to persist changes to the database.

const update = async () => {
    const person = await localDB.get('craigshoemaker');
    person.github = 'craigshoemaker';
    const response = await localDB.put(person);
    console.log(response);
};

Once updated, the response from the database includes the new revision ID, as shown in the following code snippet.

{
  ok: true,
  id: "craigshoemaker",
  rev: "2-101931707fec4f12ff20776d94690c9f"
}

To retrieve a list of documents from the database, you use the allDocs method. The response from allDocs varies depending on the options you provide. In the following snippet, the include_docs: true option is set, which tells the method to return full document data along with the query. The default value for include_docs is false and when not enabled, the only information returned from allDocs is the _id and _rev values.

const getAll = async () => {
    const options = {
        include_docs: true
    };
    const response = await localDB.allDocs(options);
    console.log(response);
    return response.rows;
};

The response, as shown in Figure 3, includes a rows array that holds data from the database. Inside each element the id and key values are copied from the data document to make working with the data easier, and the entire document’s data is available via the doc property.

Figure 3: Return value from the allDocs method
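Because each row wraps the document in a doc property, application code often unwraps the rows before rendering them. The following sketch shows one way to do that; toDocuments is a hypothetical helper, and the sample response is made-up data shaped like an allDocs result.

```javascript
// Hypothetical helper: pull the plain documents out of an allDocs response.
// Rows only carry a doc property when include_docs is true, so rows
// without one are skipped.
const toDocuments = (response) =>
    response.rows
        .filter(row => row.doc)
        .map(row => row.doc);

// Made-up sample data shaped like the value returned from allDocs.
const sampleResponse = {
    total_rows: 2,
    offset: 0,
    rows: [
        {
            id: 'craigshoemaker',
            key: 'craigshoemaker',
            value: { rev: '2-def' },
            doc: { _id: 'craigshoemaker', _rev: '2-def', name: 'Craig Shoemaker' }
        },
        {
            id: 'janedoe',
            key: 'janedoe',
            value: { rev: '1-abc' },
            doc: { _id: 'janedoe', _rev: '1-abc', name: 'Jane Doe' }
        }
    ]
};

console.log(toDocuments(sampleResponse).map(doc => doc.name));
```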

Removing a document from the database also requires the unique identifier and the latest revision ID. The best way to get the latest values is to call get immediately before attempting to remove the document from the database.

const remove = async () => {
    const person = await localDB.get('craigshoemaker');
    const response = await localDB.remove(person);
    console.log(response);
};

The response from the database is reminiscent of the response returned from the put method. Here, you get back the document’s ID and a new revision ID.

{
  ok: true,
  id: "craigshoemaker",
  rev: "3-70fb7e034b076663cd6861a46516c7f9"
}

Internally, the database hasn’t deleted your record, but has marked it as deleted by adding the _deleted property to the document. Figure 4 shows how a deleted record appears in the database.

Figure 4: State of the document after removal

In fact, if you create a new document in the database with the same primary key value, instead of getting an entirely new revision ID, the database returns a response with a revision ID incremented from the deleted state. The following snippet shows the database’s response after creating a new document with the same ID as the previously deleted document.

{
  ok: true,
  id: "craigshoemaker",
  rev: "4-ffc5ec971505cfb9b37318877441e646"
}

The revision ID starts with a 4 instead of a 1, even though a new document is inserted into the database. Building on these API basics, you can begin synchronizing data between two databases.

Synchronizing with the Server

To synchronize with the server, you first need to create an instance of PouchDB in the client script that points to the server-side database. By providing PouchDB with a URL and authorization credentials, the browser creates an authenticated connection to the remote database.

const remoteDB = new PouchDB(
    'http://localhost:5984/people',
    {
        skipSetup: true,
        auth: {
            username: 'account_user_name',
            password: 'secret_password',
        }
    });

When you create an instance of PouchDB against the server, the adapter used is http, as shown in Figure 5. This means that each call to the PouchDB API is ultimately expressed as an HTTP call to CouchDB over the network. The benefit to you is that your application code remains unchanged regardless of whether your commands are against the local database or the server.

Figure 5: Create a remote instance of PouchDB

This example uses the PouchDB Authentication (https://github.com/pouchdb-community/pouchdb-authentication) plugin to handle authentication with the remote server. The plugin allows you to add options to the constructor that authenticate your connection to the server.

Once you have instances of PouchDB that point to both the in-browser database and the server, you can then begin to synchronize data between the two.

The code required to handle synchronization accepts a few options. During synchronization, you can create a persistent live connection and choose to retry failed attempts. The following example creates a function that sets up synchronization between the local and remote databases.

let syncer = {};
const sync = (live = true, retry = true) => {
    const options = {
        live: live,
        retry: retry
    };
    syncer = localDB.sync(remoteDB, options);
    syncer.on('complete', e => {
       // handle complete
    });
    syncer.on('error', e => {
       // handle error
    });
};

The syncer object is declared outside the sync function so that you have access to the synchronization instance throughout the application. The arguments defined in the function allow you to select if you want to establish a live connection and whether you want to retry failed synchronization attempts.
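Beyond complete and error, the object returned from sync also emits change, paused, active, and denied events, and it exposes a cancel method for stopping a live synchronization. The following sketch wires those events to a single callback. The watchSync helper, the report callback, and the status labels are assumptions made for this illustration; only the event names and cancel come from PouchDB.

```javascript
// Hypothetical helper: report the state of a synchronization to one callback.
// `handle` is the object returned from localDB.sync(); `report` is any
// function you supply. The status labels are made up for this example.
const watchSync = (handle, report) => {
    // documents were replicated in either direction
    handle.on('change', info => report('change', info));
    // paused fires when a live sync is waiting for changes, or with an
    // error when an attempt failed and PouchDB is about to retry
    handle.on('paused', err => report(err ? 'retrying' : 'idle', err));
    // replication resumed after a pause
    handle.on('active', () => report('active'));
    // a document was rejected, for example because of permissions
    handle.on('denied', err => report('denied', err));
    handle.on('complete', info => report('complete', info));
    handle.on('error', err => report('error', err));

    // return a function that stops the synchronization
    return () => handle.cancel();
};

// Usage with the syncer object from the earlier example:
// const stopSync = watchSync(syncer, status => console.log(status));
// stopSync(); // later, to end a live synchronization
```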

As the databases are synchronized, data flows in both directions. Data added to the remote database is replicated to the local database, and vice versa. Ultimately, the sync method is a wrapper for CouchDB’s underlying replication feature. As data is replicated among individual databases, conflicts are not just a possibility, but an inevitability.

Managing Conflicts

Dealing with conflicting data sits at the heart of any attempt to synchronize databases. Embracing the inevitability of conflicts, the CouchDB and PouchDB APIs make conflict management a first-class concern. The dual nature of the revision ID allows the database to resolve different types of conflicts. Conflicts are managed by continuously evaluating the revision ID during any operation that manipulates data. There are at least two different types of conflicts that arise when using PouchDB.

Immediate and Eventual Conflicts

An immediate conflict arises when you attempt to save changes to a document, but the revision ID provided is older than what’s in the database. For instance, let’s say a new record added to the database results in a revision ID of 1-aa1. As the document is updated, the revision ID becomes 2-aa2. If the first revision of the document (1-aa1) is cached somewhere and the user tries to persist that stale version while the database holds a newer version, an immediate conflict is encountered.

Any operation that manipulates data should be wrapped in a try/catch block, giving you the chance to handle conflicts.

try {
    const response = await db.put(person);
} catch(error) {
    if(error.name === 'conflict') {
        // handle conflict
    } else {
        // handle other error
    }
}

The following code snippet shows the error object returned from the database during an immediate conflict. When a conflict is encountered, PouchDB returns a 409 (conflict) error.

{
  "status": 409,
  "name": "conflict",
  "message": "Document update conflict",
  "error": true,
  "id": "2019-06-08T12:33:00.169Z",
  "docId": "2019-06-08T12:33:00.169Z"
}

The easiest way to resolve this conflict is to fetch the document’s latest version, update the required values and then attempt to save the document again.
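That fetch, update, and retry cycle can be wrapped in a small function. The following is a sketch rather than a definitive implementation: updateWithRetry, its change callback, and the maxAttempts limit are hypothetical names chosen for this example, and db stands for any PouchDB instance.

```javascript
// Hypothetical helper: apply `change` to the newest copy of a document and
// try again when the database reports a 409 conflict.
// maxAttempts is an assumed safety limit, not a PouchDB setting.
const updateWithRetry = async (db, id, change, maxAttempts = 3) => {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        // always start from the latest revision in the database
        const doc = await db.get(id);
        // let the caller set the values it needs on the fresh copy
        change(doc);
        try {
            return await db.put(doc);
        } catch (error) {
            const isConflict = error.status === 409;
            if (!isConflict || attempt === maxAttempts) {
                throw error;
            }
            // another writer saved first; loop to fetch and try again
        }
    }
};

// Usage with the local database from the earlier examples:
// await updateWithRetry(localDB, 'craigshoemaker', person => {
//     person.github = 'craigshoemaker';
// });
```

Passing the change as a function matters here: the update is reapplied to whatever revision is current on each attempt, so a retry doesn't overwrite values that another writer saved in the meantime.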

By contrast, an eventual conflict happens when a revision ID is mismatched during a synchronization attempt. Consider the situation when an existing document is updated in the browser and the resulting revision ID becomes 2-bb1. Then the same document is updated directly on the server, and that copy’s revision ID becomes 2-bb2. The document is updated for the second time in both locations, but the disconnected databases are unaware of each other’s change. Eventually, the databases will synchronize together and the conflict for this document must be resolved.

The CouchDB replication logic handles conflicts seamlessly. As databases are synchronized, the replication algorithm automatically selects a revision as the winner for you. As the winning version is selected, the metadata of the document is flagged as being in a conflicted state and is associated with an array of revision IDs that represent the conflicted versions.
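The automatic choice is deterministic, so every database instance arrives at the same winner without coordinating. The following sketch approximates the rule under simplifying assumptions: a revision that isn't deleted beats a deleted one, a higher prefix number (used here as a stand-in for a longer revision history) wins next, and any remaining tie goes to the revision ID that sorts highest. The pickWinner function and its input shape are hypothetical; the real logic lives inside CouchDB and PouchDB.

```javascript
// Simplified sketch of deterministic winner selection.
// `leaves` is an assumed shape: [{ rev: '2-bb1', deleted: false }, ...]
const pickWinner = (leaves) => {
    const generationOf = rev => Number.parseInt(rev.split('-')[0], 10);

    const ranked = [...leaves].sort((a, b) => {
        // 1. revisions that aren't deleted come first
        if (Boolean(a.deleted) !== Boolean(b.deleted)) {
            return a.deleted ? 1 : -1;
        }
        // 2. then the higher prefix number
        const byGeneration = generationOf(b.rev) - generationOf(a.rev);
        if (byGeneration !== 0) {
            return byGeneration;
        }
        // 3. then the revision ID that sorts highest as a string
        if (a.rev === b.rev) {
            return 0;
        }
        return a.rev < b.rev ? 1 : -1;
    });

    return ranked[0].rev;
};

// The two revisions from the eventual conflict described earlier:
console.log(pickWinner([{ rev: '2-bb1' }, { rev: '2-bb2' }])); // 2-bb2
```

The winner chosen this way is arbitrary from the user's point of view, which is why the losing revisions are kept and flagged rather than discarded.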

When you retrieve data from the database, you have the option to request conflicts associated with a specified document. The following example demonstrates how you can request conflicts when calling the allDocs method.

const options = {
    include_docs: true,
    conflicts: true
};
const response = await localDB.allDocs(options);
console.log(response);

The result from this code returns a collection of documents from the database that includes an array of revision IDs that conflict with the current version of the document. Figure 6 shows a document in a conflicted state.

Figure 6: Conflicts array in a document

Storing conflicting revision IDs as document metadata allows your applications to always be aware of conflicts. By writing conflict-aware code, you can surface conflicts early and often, giving users a chance to decide which version is ultimately the winner.
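One way to make an application conflict-aware is to scan the allDocs response for documents that carry a _conflicts array. The following sketch assumes the response was requested with both include_docs and conflicts enabled; findConflicted is a hypothetical helper and the sample data is made up.

```javascript
// Hypothetical helper: return only the documents that are in a conflicted
// state. Expects a response from allDocs({ include_docs: true, conflicts: true }).
const findConflicted = (response) =>
    response.rows
        .map(row => row.doc)
        .filter(doc =>
            doc &&
            Array.isArray(doc._conflicts) &&
            doc._conflicts.length > 0);

// Made-up sample data: one conflicted document and one clean document.
const conflictedSample = {
    total_rows: 2,
    offset: 0,
    rows: [
        {
            id: 'task-1',
            key: 'task-1',
            value: { rev: '2-bb2' },
            doc: { _id: 'task-1', _rev: '2-bb2', _conflicts: ['2-bb1'], title: 'Buy milk' }
        },
        {
            id: 'task-2',
            key: 'task-2',
            value: { rev: '1-abc' },
            doc: { _id: 'task-2', _rev: '1-abc', title: 'Walk dog' }
        }
    ]
};

console.log(findConflicted(conflictedSample).map(doc => doc._id)); // [ 'task-1' ]
```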

Resolving eventual conflicts involves fetching documents from the database with the associated conflict data. Conflict data is not returned by default, so when you call the get method, you need to enable the conflicts option. Once data is returned from the database, you can allow the user to designate which revision is the desired version.
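To let a user choose, the application needs the content of each competing revision, not only the revision IDs. The get method accepts a rev option for fetching a specific revision, so one possible approach is sketched below. The loadRevisions helper is a hypothetical name, and db stands for any PouchDB instance.

```javascript
// Hypothetical helper: load the current winner plus the full content of
// every conflicting revision so they can be shown side by side.
const loadRevisions = async (db, id) => {
    // the current winner, with its _conflicts array attached
    const winner = await db.get(id, { conflicts: true });
    // _conflicts is absent when the document has no conflicts
    const conflictRevs = winner._conflicts || [];
    // fetch each losing revision by its revision ID
    const losers = await Promise.all(
        conflictRevs.map(rev => db.get(id, { rev })));
    return [winner, ...losers];
};

// Usage with the local database from the earlier examples:
// const versions = await loadRevisions(localDB, id);
// versions.forEach(version => console.log(version._rev, version));
```

The _rev of whichever version the user picks is then the winning revision ID used when the remaining revisions are deleted.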

The following example extracts a document from the database with conflict information. The revision IDs are evaluated to find the winning revision and the database is updated to mark all revisions as deleted except the winning revision.

// get item with conflicts
const item = await localDB.get(id, { conflicts: true });
// copy the conflicting revision IDs so item itself isn't modified
let revIds = [...(item._conflicts || [])];
// add the current winner so every competing revision is considered
revIds.push(item._rev);
// filter out the revision the user wants to keep
revIds = revIds.filter(
    conflictId => conflictId !== winningRevId);
// delete the rest of the revisions
const conflicts = revIds.map(rev => {
    return {
        _id: item._id,
        _rev: rev,
        _deleted: true
    };
});
const response = await localDB.bulkDocs(conflicts);

The call to get includes the document ID and an options object where conflicts is set to true. This tells the database to fetch the document with the matching ID and return the conflicting revision IDs in an array named _conflicts. Next, the revision IDs are isolated into a variable named revIds. The current winning revision ID is added to the array with revIds.push and then the user-selected winning version is filtered out of the revision IDs array. Now the revIds array only contains the revision IDs that the user didn’t select as the winning document version. These revisions are meant to be deleted from the database.

The map method is used to transform the revIds array into an array named conflicts. This becomes an array of objects that includes the unique identifier, the losing revision ID and the _deleted property set to true. This object array is then passed to bulkDocs to update the database revisions simultaneously.

Conclusion

Built as a multi-master database from the ground up, CouchDB makes conflict resolution a first-class concern. The replication logic, which powers synchronization, is robust enough to recognize conflicts, temporarily select winning versions, and provide the context necessary to allow users to decide how to resolve conflicted data. In the browser, PouchDB is a JavaScript implementation of CouchDB that makes it easy not only to carry out simple data operations but also to synchronize data from the browser to the server.