Best Practices Using Node.js and npm

When CODE Magazine reached out to me to write an article on Node.js best practices, I was delighted to inform the managing editor that I had recently taken a job at npm, Inc., and was itching to write an article discussing both Node.js and npm (Node's package management system).

Node.js and npm are difficult to separate from one another, and they are, each in their own way, contributing to the renaissance currently taking place in the JavaScript development world. As I write this, we just finished a record day at npm: 13,000,000 open-source JavaScript packages were downloaded in 24 hours. The exploding popularity of Node.js is simply astounding.

Before you dive into the meat of this article, I'd like to address a frequently asked question: Why should I use Node.js? The answer that I often hear is, “because it's fast!” Although I agree, I'd like to expand on this. The Node.js community (through peer pressure) encourages library authors to use non-blocking IO (Input/Output). Many scaling problems that Web applications run into are IO-bound: interacting with external APIs, waiting on uploads, and streaming downloads. By delegating the waiting to the underlying operating system's scheduling, non-blocking IO lets a single process keep serving other requests in the meantime, which can contribute to dramatic improvements in performance. That said, asynchronous IO can be a double-edged sword.
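
A minimal sketch of what non-blocking means in practice, assuming only Node's built-in fs module. The choice of fs.stat and of the current working directory is an illustration, not something taken from npm-typeahead:

```javascript
var fs = require('fs');

var order = [];

order.push('stat requested');
// fs.stat hands the work off and returns immediately; the callback
// runs later, once the result is ready.
fs.stat(process.cwd(), function (err, stats) {
    order.push('stat finished');
    console.log(order.join(' -> '));
    // prints: stat requested -> moving on -> stat finished
});
order.push('moving on');
```

The program reaches 'moving on' before the callback fires, which is the property that lets a server keep accepting requests while earlier ones wait on IO.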

“JavaScript code-bases become absolute nightmares.” This is another statement that I often hear. Unfortunately, there's a seed of truth to it. An asynchronous paradigm is more difficult to work within than a synchronous one. Steps can be taken to mitigate the challenge: good testing practices, using powerful open-source helper libraries, and making a concerted effort to write cleaner code.

All things considered, I'd argue that Node.js is a great technology choice for Web development: it encourages design decisions that scale well; it's easy to pick up, compared to Erlang, Scala, or other languages that companies frequently move toward when scaling is a concern; it has a vibrant open-source community, and a plethora of great libraries are available.

In this article, I'll look at the steps involved in getting a simple Node.js Web application off the ground. I'll start with a “Hello World!” application and move toward a more complicated typeahead searching application. The goal of these apps is to provide a jumping off point for a conversation about best-practices, covering:

How to structure a Node.js application
How to find interesting dependencies and install them using npm
How to avoid spaghetti code through unit-testing and the better decomposition of functions
How to deploy your finished project
How to use npm to share your finished project with the world

Crack out your favorite text editor, get the newest version of Node.js installed (https://nodejs.org/en/), and get ready to follow along. Node.js is an incredible technology, and I'm delighted to have the opportunity to introduce you to it.

The Structure of a Node.js Application

Before diving into a complex application, I think it's useful to discuss getting a minimal “Hello World” app up and running.

I began by creating a repository on GitHub called npm-typeahead, which you can find here: https://www.npmjs.com/package/npm-typeahead. After cloning the empty repository to my computer, I typed npm init inside the project folder to generate a package.json file. The package.json file describes the project's dependencies, provides meta-information about the project for other developers, and lets you specify command-line shortcuts for your application. (See https://www.npmjs.com/package/npm-typeahead for more documentation.) The package.json I generated for npm-typeahead looked like this:

{
    "name": "npm-typeahead",
    "version": "0.0.0",
    "description": "Typeahead search for npm packages.",
    "main": "lib/index.js",
    "scripts": {
        "test": "./node_modules/.bin/mocha -u bdd"
    },
    "repository": {
        "type": "git",
        "url": "git@github.com:package/npm-typeahead.git"
    },
    "keywords": [
        "npm",
        "search",
        "typeahead"
    ],
    "author": "Benjamin Coe <ben@npmjs.com>",
    "license": "ISC",
    "bugs": {
        "url": "https://github.com/package/npm-typeahead/issues"
    },
    "homepage": "https://www.npmjs.com/package/npm-typeahead"
}

After generating a package.json, I stubbed out the rest of the project's directory structure. JavaScript proponents boast of not being dogmatic: You could organize your project any number of ways. As a heuristic guideline, here's how I organize my projects:

./lib/index.js: The entrance point to the application. If I create a project called npm-typeahead and require('npm-typeahead'), the contents of index.js get loaded. Ideally, your application's logic should be divided into descriptively named files, and index.js is used to pull everything together.
./lib/search.js: The model file into which I'll build the typeahead search functionality. index.js will include this model.
./test/search.js: By convention, for every model I create, I create a corresponding test file.
./README.md: A README file is important, especially for open-source projects. I like to use my README file as a scratch-pad for testing out API ideas.
./server.js: A script used to launch the Web application. When you run npm start, it looks for this file.
./assets: A place to store static assets (css, html, etc.) for the Web application.

The First Dependency

Having set up a rough directory structure for the Node.js app, let's look at the process involved in adding the first open-source dependency.

To get the bare-bones “Hello World” application up and running, you rely on a single external dependency: node-restify. The node-restify library simplifies the process of getting an HTTP server up and running.

To install dependencies, you specify them inside your package.json file. Rather than making you edit the file directly, npm provides a handy shortcut that installs a dependency and records it in package.json. You can run:

npm install restify --save

With node-restify installed, I wrote the following code in server.js.

// require the restify library.
var restify = require('restify'),
    // create an HTTP server.
    server = restify.createServer();
// add a route that listens on http://localhost:5000/hello
server.get('/hello', function (req, res, cb) {
    res.send("Hello World!");
    return cb();
});

server.listen(process.env.PORT || 5000, function () { // bind server to port 5000 unless PORT is set.
    console.log('%s listening at %s', server.name, server.url);
});

Let's walk through the code:

require('restify'): Loads the restify library
restify.createServer(): Creates an HTTP server with default settings
server.listen: Binds the server on a port so that you can start receiving HTTP requests

That's all there is to it! You've got a minimum-viable Web application ready to go in 12 lines of code. Simply type npm start, and visit http://localhost:5000/hello in your Web browser. In the next section of this article, you'll look at how the structure of the application evolves as you take on a more complex challenge: implementing typeahead search for npm.

The Anatomy of a Complex Application

As you move from the simple “Hello World” application toward the more complex typeahead searching application, you can re-apply much of what you've already seen: finding and installing open-source dependencies; further building out the roughed-out directory structure by adding models, tests, assets, etc.; and, finally, writing code to stitch everything together.

There are over 70,000 open-source Node.js packages on npm and choosing good dependencies is complicated. Obviously, re-inventing the wheel should be the exception. For most common tasks that you take on, like connecting to a popular database, there will be mature libraries with good community support, reasonable licenses, and battle-tested production deployments. Even if a library lacks functionality that you're craving, it's often much better to give back to the project by forking it and adding the functionality yourself. Here's the basic discovery and evaluation process I use when choosing libraries for my projects:

I reach out to coworkers, friends, and to communities like Twitter. “What do you recommend for a lightweight Webserver #web-development #kittens?”
I read various blogs and news sites, keeping up-to-date with technologies that get a lot of buzz.
I look for projects on GitHub that fit the following criteria: they have a large number of followers, they've been updated recently, and they have a reasonably small number of open pull-requests and issues.
I lean towards projects that are supported by well-known development shops, such as Twitter.
Finally, but perhaps most importantly, I read the codebase looking for good documentation, thorough testing, and an API that's intuitive and fun to use.

The dependencies that I settled on for the typeahead searching app can be organized into three categories: server-side dependencies, browser dependencies, and development dependencies.

Server-Side Dependencies

Server-side dependencies refer to libraries that run on a remote server that you deploy, rather than running in the Web browser. In the previous section, you installed the first server-side dependency, node-restify. Rather than a single endpoint that echoes “Hello World,” my master plan is to create an endpoint that interacts with ElasticSearch and returns a list of npm packages matching a query. Let's look at the open-source dependencies that will make this dream a reality:

restify: A lightweight HTTP server used to return search results
lodash: A useful set of helpers for interacting with arrays and objects. Lodash provides functionality like merging, mapping, updating, etc.
elasticsearch: Provides a simple, clean API for performing searches against ElasticSearch.

All of these packages were installed using npm install package-name --save, which I introduced in the previous section.

Browser Dependencies

One thing that makes Node.js applications unique is that both server-side and client-side dependencies are implemented in the same language. It turns out that npm is a great tool for installing both server-side and client-side packages. Let's look at the browser-side libraries on which npm-typeahead relies:

jQuery: If you've ever done any front-end development, chances are that you've heard of jQuery, which provides an abstraction on top of the browser's DOM (Document Object Model), making it easier to interact with HTML elements using JavaScript.
typeahead.js: Twitter's open-source typeahead searching library. Typeahead suggestions are a hassle to implement; you need to worry about caching, throttling, and various other technical challenges. It's great that we can outsource this part of our system to Twitter.

Development Dependencies

The final class of dependency that the application relies on is development dependencies. Development dependencies are libraries used during the process of writing code but that won't run in the production environment. There are three development dependencies:

mocha: The testing framework that npm-typeahead uses
browserify: A tool that makes it possible to use packages installed via npm in the Web browser. Browserify does this by making it possible to include your browser-side dependencies using a require statement.
uglify-js: Takes the JavaScript code and minifies it. Minification refers to the process of shortening long variable names, stripping whitespace, etc., in a JavaScript library so that it can be transferred to the browser with less bandwidth.

That's everything that our typeahead searching library relied on. Next, let's look at the code that was written to pull everything together.

Pulling Everything Together

You can try out npm-typeahead here: http://npm-typeahead.herokuapp.com/. Since I managed to find open-source projects to facilitate most of the hard technical challenges, the code to pull everything together was quite simple. Let's dig into it a bit.

server.js

Endpoints were added to server.js for handling searches and for returning static assets:

var restify = require('restify'),
    server = restify.createServer(),
    search = new (require('./lib').Search);

// add the query-string parsing extension
// to restify.
server.use(restify.queryParser());

// lookup packages by their name.
server.get('/search', function(req, res, cb) {
    search.search(req.params.q,
    function(err, results) {
        // pass failures to restify rather than sending an empty response.
        if (err) return cb(err);
        res.send(results);
        cb();
    });
});

// serve static JavaScript and CSS.
server.get(/\/(js|css|images)\/?.*/, restify.serveStatic({
    directory: './assets'
}));

Let's look at the important bits:

server.use(restify.queryParser()): Out of the gate, restify has bare-bones functionality. This line enables the query parsing extension so that URLs like http://www.example.com/?foo=bar are parsed automatically.
search.search(req.params.q): I try to keep controllers as thin as possible. Searching for packages is handled by the search model (search.js), and the controller simply grabs the query from the URL and passes it along to the model.
restify.serveStatic(): A helper for serving static assets (images, css, html, JavaScript) through the node-restify server.
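
One payoff of a thin controller is that it can be exercised without starting a server. The sketch below is an illustration under assumptions: makeSearchHandler, fakeSearch, and the fake request and response objects are hypothetical names that do not appear in npm-typeahead, and the handler forwards any error to the callback before sending results.

```javascript
// Hypothetical helper: builds the /search handler around any object
// that exposes search(query, callback).
function makeSearchHandler(search) {
    return function (req, res, next) {
        search.search(req.params.q, function (err, results) {
            if (err) return next(err);
            res.send(results);
            next();
        });
    };
}

// Exercise the handler with hand-rolled fakes; no server is started.
var fakeSearch = {
    search: function (q, cb) { cb(null, [{value: q + '-result'}]); }
};
var sent = null;
var handler = makeSearchHandler(fakeSearch);
handler(
    {params: {q: 'lodash'}},
    {send: function (body) { sent = body; }},
    function () {}
);
console.log(sent); // [ { value: 'lodash-result' } ]
```

Because the handler only talks to the object it is given, the same function could be registered with server.get('/search', ...) and handed the real search model.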

Those are all the updates that were required in server.js. Next, let's look at the search model itself.

search.js

The search model exposes a single method, search, which uses the elasticsearch library to perform queries against the npm package index. ElasticSearch is an easy-to-use, JSON-based, full-text search engine. For the purposes of following along with this article, I recommend trying a hosted ElasticSearch solution, such as https://www.bonsai.io/. To populate the npm index used by our search model, I used the open-source tool npm2es https://github.com/npm/npm2es2, and ran the following command to populate it:

npm2es --couch="https://skimdb.npmjs.com/registry" --es=[bonsai-url]

Let's look at the interesting lines of code in search.js. See Listing 1.

Listing 1: Server-Side JavaScript

var _ = require('lodash'), elasticsearch = require('elasticsearch');

function Search(opts) {
    _.extend(this, {
        client: new elasticsearch.Client({
            host: process.env.ELASTIC_SEARCH_URL || 'localhost:9200'
        })
    }, opts);
}

// The search method called by server.js.
Search.prototype.search = function(q, cb) {
    this.client.search({
        index: 'npm',
        size: 50,
        body: {
            query: {
                query_string: {
                    fields: ['_id'],
                    query: q + '*'
                }
            }
        }
    }, function(err, resp) {
        // resp is not usable when the request fails, so bail out early.
        if (err) return cb(err);
        return cb(
            null,
            _.map(resp.hits.hits, function(hit) {
                return {value: hit._id};
            })
        );
    });
};

exports.Search = Search;

_.extend(this, defaults, options): This is a favorite pattern of mine, using the lodash library. I populate the model with a default set of options, overriding them with any options that are passed in.
this.client.search(): Using the elasticsearch library, this line hits the external ElasticSearch server, performing the actual search.
_.map(resp.hits.hits): This line takes the raw search results returned by ElasticSearch, and translates them into a form that is more easily consumed by the browser-side typeahead.js library.
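
To make the translation step concrete, here is a small sketch that runs without ElasticSearch. The sampleResponse object is a hypothetical, trimmed-down stand-in for a real response (only the fields that Listing 1 reads are included), and toSuggestions is an illustrative name; it uses Array.prototype.map in place of lodash so that the sketch has no dependencies.

```javascript
// Hypothetical sample of the response shape that the mapping code relies on.
var sampleResponse = {
    hits: {
        hits: [
            {_id: 'lodash'},
            {_id: 'lodash-node'}
        ]
    }
};

// Same transformation as the _.map call in Listing 1, written with
// Array.prototype.map so that the sketch has no dependencies.
function toSuggestions(resp) {
    return resp.hits.hits.map(function (hit) {
        return {value: hit._id};
    });
}

console.log(toSuggestions(sampleResponse));
// [ { value: 'lodash' }, { value: 'lodash-node' } ]
```

Each object carries a value property, which is the key that the browser-side code displays.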

That's it for the server-side component of npm-typeahead. Next let's look at the browser-side JavaScript.

main.js

All of the client-side JavaScript is pulled in through main.js. Browserify is used to compile this file, creating code that can be run in the Web browser. See Listing 2.

Listing 2: Client-Side JavaScript

var $ = window.jQuery = require('jquery');
var typeahead = require('typeahead.js'),
    npmUrl = 'https://www.npmjs.org';

$(document).ready(function() {

    // Create the engine, used to interact
    // with our search backend.
    var engine = new Bloodhound({
        name: 'packages',
        local: [],
        remote: '/search?q=%QUERY',
        datumTokenizer: function(d) {
            return Bloodhound.tokenizers.whitespace(
                d.value
            );
        },
        queryTokenizer:
        Bloodhound.tokenizers.whitespace
    });

    engine.initialize();

    // attach the typeahead extension to
    // our search box using jQuery.
    var typeahead = $('.typeahead').typeahead({},
        {
            name: 'states',
            displayKey: 'value',
            source: engine.ttAdapter()
        }
    );
});

Most of the heavy lifting in the browser-side code is facilitated by Twitter's typeahead.js library. Let's look at the parts used to glue everything together:

Bloodhound({remote: '/search?q=%QUERY'}): Bloodhound is the library used by typeahead.js to connect the front-end search widget with the backend server. The remote option tells typeahead.js that it should hit the search endpoint.
$('.typeahead').typeahead({}, {source: engine.ttAdapter()}): this line uses jQuery to attach a typeahead search to the search box and indicates that the Bloodhound engine created above should be used.
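
The %QUERY token in the remote option is a placeholder for whatever the user has typed. As a rough sketch of that substitution (buildRemoteUrl is a hypothetical helper written for illustration, not part of Bloodhound's API):

```javascript
// Hypothetical helper that mimics the substitution: the wildcard in the
// remote template is replaced with the URI-encoded text from the search box.
function buildRemoteUrl(template, query) {
    return template.replace('%QUERY', encodeURIComponent(query));
}

console.log(buildRemoteUrl('/search?q=%QUERY', 'type ahead'));
// /search?q=type%20ahead
```

The resulting URL is what the /search endpoint in server.js receives, with the typed text arriving as the q parameter.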

That's all there is to it. In 100 lines of code (give or take), you got the typeahead searching app up and running. This would not have been possible without the awesome Node.js open-source community and npm. In the next section of this article, I'll talk about managing complexity as the application grows even larger.

Avoiding Spaghetti Code

JavaScript code-bases are notorious for devolving into hectic messes. I'd like to postulate two potential reasons for this. First, there isn't the same obsession with unit-testing in the JavaScript community as there is in the communities around other programming languages. Making changes to large code-bases with poor unit tests is terrifying, and as a result, these code-bases don't get refactored and cleaned up. Second, asynchronous logic is hard to write. It can quickly devolve into a mess of nested callbacks, like this:

a(b, function(err, c) {
  // wait for a to finish.
  d(c, function(err, e) {
    // wait for d to finish.
    f(e, function(err, g) {
      // do something with the result of f.
    })
  })
})

Let's talk about how both of these problems can be avoided.

Unit-Testing

Writing unit-tests for asynchronous code is hard. I think this is why a testing culture has taken a longer time to catch on in JavaScript. I recommend taking a disciplined approach to testing in JavaScript early on, because retrofitting old code with tests is painful. I also strongly advocate a test-first approach to writing unit tests. If you hammer out your tests first, it makes it much easier to figure out parts of your application that are a hassle to test. I wrote an article for CODE on this very topic, and recommend giving it a read for a detailed discussion about asynchronous unit-testing in JavaScript: http://www.codemag.com/Article/1308061.

To get your feet wet with unit testing in Node.js, I recommend checking out the following libraries:

mocha: Mocha is the de-facto testing framework used for Node.js currently. It's easy to learn, and has a straightforward approach to managing asynchronous logic (you simply call the done() method when the test has reached a terminal point).
sinon: Provides spies, stubs, and mocks for JavaScript. If your application interacts with external services, e.g., ElasticSearch, using stubs and mocks can be a great way to test these dependencies.
nock: A library for mocking HTTP connections. In general, it's good to avoid integrating with external services in your tests. Both nock and sinon are great ways to avoid this.

Avoiding Callback Hell

The problem of overly nested asynchronous callbacks is jokingly referred to as Callback Hell. In general, it's good practice to keep nesting callbacks to a minimum. One of the best ways to do this is to simply unravel your callbacks:

function a(b, cb) {
    d(null, b * 2, cb);
}
function d(err, c, cb) {
    if (err) cb(err);
    else f(null, c * 2, cb);
}
function f(err, e, cb) {
    if (err) cb(err);
    else cb(null, e * 2);
}
a(3, function(err, result) {
    console.log(result);
});

There are certain asynchronous operations that tend to lead to Callback Hell:

A Chain of Asynchronous Operations: Grab something from an API, process it with another API, and write the result to a database.
Asynchronous Set Operations: For a collection of objects, perform an asynchronous operation and perform an action when all of the operations are complete.
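
The second pattern can be sketched by hand with a counter. The helper below, eachParallel, is a hypothetical name written for illustration; it assumes each worker calls its callback exactly once.

```javascript
// Hypothetical helper: run worker(item, cb) for every item and call done
// once, either on the first error or after every worker has reported back.
function eachParallel(items, worker, done) {
    var pending = items.length,
        failed = false;

    if (pending === 0) return done(null);

    items.forEach(function (item) {
        worker(item, function (err) {
            if (failed) return;
            if (err) {
                failed = true;
                return done(err);
            }
            pending -= 1;
            if (pending === 0) done(null);
        });
    });
}

var seen = [];
eachParallel(['restify', 'lodash', 'mocha'], function (name, cb) {
    seen.push(name.toUpperCase());
    cb(null);
}, function (err) {
    console.log(err || seen.join(', '));
    // prints: RESTIFY, LODASH, MOCHA
});
```

Writing this bookkeeping once and reusing it keeps the counting logic out of application code.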

There are some great libraries available for managing these common patterns:

async.js: A set of helpers for common asynchronous operations, like the scenarios described above.
bluebird: A promise library. Promises are constructs that attempt to make asynchronous logic more manageable. As with async.js, bluebird provides many helpers for dealing with common asynchronous patterns, such as mapping and enqueueing.
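
As a rough illustration of how promises flatten both patterns, the sketch below uses the Promise object built into current versions of Node.js rather than bluebird, which exposes the same then, catch, and all methods. The functions fetchPackage and summarize are hypothetical, and the version string is a placeholder value.

```javascript
// Hypothetical asynchronous steps, each returning a promise.
function fetchPackage(name) {
    return Promise.resolve({name: name, version: '1.0.0'}); // placeholder data
}

function summarize(pkg) {
    return Promise.resolve(pkg.name + ': ' + pkg.version);
}

// A chain of asynchronous operations reads top to bottom instead of nesting.
var chained = fetchPackage('restify').then(summarize);

// A set of asynchronous operations: wait until every lookup has finished.
var all = Promise.all(['lodash', 'mocha'].map(fetchPackage));

chained.then(function (line) {
    console.log(line);
});
all.then(function (pkgs) {
    console.log(pkgs.length + ' packages loaded');
}).catch(function (err) {
    console.error(err);
});
```

An error thrown in any step skips the remaining then handlers and lands in the nearest catch, which removes the need for an if (err) check at every level.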

As with unit testing, avoiding Callback Hell is largely a matter of discipline. Take steps early on to avoid heavily nesting callbacks in your application.

Having given my two-cents about avoiding messy code-bases in JavaScript, I can take my grumpy-old-man-hat off. Let's talk about how to go about shipping what you've built.

Releasing What You've Built

On the topic of shipping code, there are two things that I'd like to discuss:

Deploying the Node.js code to production
Publishing the open-source project to https://www.npmjs.com/

Deploying the Code

For the sake of simplicity, I opted to release npm-typeahead on Heroku. For a detailed discussion on the pros and cons of using a service such as Heroku, check out my article on Hosted Solutions: http://www.codemag.com/Article/1403051.

If you use the directory structure outlined in the first section of this article, it turns out that it's really straightforward to release to Heroku:

I created a project on Heroku and added it to npm-typeahead as a remote repository called heroku.
Using heroku config:set, I set the environment variable ELASTIC_SEARCH_URL and pointed it at my index of npm on Bonsai.
I pushed my master branch to Heroku, with git push heroku master.
Heroku uses the package.json file to automatically install the application's dependencies.
Heroku runs npm start by default, executing the server.js file.

That's all it took to get npm-typeahead up and running on Heroku.

Publishing an Open-Source Project

The npm-typeahead project relied heavily on open-source JavaScript dependencies. I love giving back to the community by making as many of my projects open source as possible. Publishing a project as open source on npm is easy:

If you haven't already, create a user on npm by typing npm add-user.
Make sure that there is no sensitive information, such as passwords, credit-card numbers, etc., inside your project directory. The files .gitignore and .npmignore can be used to indicate that files should not be published.
Publish the package by typing npm publish in the project directory. As with Heroku, npm looks at the package.json file and uses this to infer information about the project.

Congratulations. You're now an active member of the vibrant Node.js open-source community!

The Future of Node.js

The JavaScript community prides itself on avoiding dogmatism. In this article, I've attempted to keep to this spirit:

There are benefits to conventions in directory structure. As an example, naming conventions made it easy for Heroku to auto-discover the server.js file. The directory structure I outline in this article is just a suggestion; the important part is that you divide up your source files and assets logically.
There are many HTTP servers available other than node-restify, and there are certainly other typeahead search widgets. Experiment when you're adding dependencies for your projects and pick those that you love.
Unit-testing makes it far easier to maintain old code and for others to contribute to your projects. I find that a test-first approach works great. You can find a flow that works well for you.
If you let nested callbacks get out of control, they will ultimately make your codebase unmaintainable. Give async.js and promises a shot, and find an approach to decomposing asynchronous logic that you find intuitive.

Node.js hits an interesting sweet spot: It allows you to write highly performant asynchronous code, and it does so in a language that is easy to pick up and that many developers already know. With libraries like browserify, Node.js also begins to blur the line between server-side development and developing for the browser. These reasons, along with many others (such as the vibrant open-source community, driven by npm), make Node.js a great technology choice for the Web. I hope that this article has encouraged you to stop resting on your laurels and try it out.
