In 1963, the USS Thresher, the lead ship of a new class of nuclear-powered attack submarines, sank off the coast of Boston during deep-diving tests. The entire crew was lost. The investigation that followed concluded that the sinking was caused by two consecutive failures.

First, the ship lost power due to a leak in the engine room. Losing power in a submarine that’s several hundred feet below the surface of the ocean is not good. However, the crew was trained to deal with a loss of power. The protocol for such a situation called for the ballast tanks to be blown so that the submarine would surface. This wasn’t the normal way to surface. Normally, a submarine maneuvers into an upward angle and powers to the surface. This is a quiet and controlled way to surface. Blowing the ballast tanks causes a submarine to shoot to the surface with little control.

Unfortunately, a design flaw caused the pipes that blow air into the ballast tanks to freeze shut. Air stored under high pressure cools sharply when the pressure is suddenly released, enough to freeze any moisture in the line. (If you’ve ever used a pressurized air canister to clean the dust from your computer, you’re familiar with this phenomenon.) Consequently, the ballast tanks couldn’t be blown free of water. With no way to power to the surface and no way to make the submarine buoyant, the Thresher sank until the surrounding water pressure caused it to implode.

After the Thresher disaster, the US Navy put together a program called SUBSAFE: a rigorous series of tests and certifications that every submarine must pass before being put to sea. Before SUBSAFE, the Navy had lost 16 submarines to noncombat causes. Since the program began, it hasn’t lost a single SUBSAFE-certified submarine.

Imagine if one day you woke up and none of the software on any of your devices worked. What would you be able to do? This simple thought experiment illustrates how important a role software plays in your life. Software is almost as important in your life as a submarine is to a crew that needs to travel hundreds of feet below the surface of the ocean and return safely to land.

This article is about doing for your code what SUBSAFE did for submarine safety. I’ll show how to use unit tests to improve code quality, thus reducing the chances that an undiscovered defect will cause your application to experience an outage.

Approach

In the Jan/Feb 2017 issue of CODE Magazine (http://www.codemag.com/article/1701081), I introduced the concept of Code Coverage. Code Coverage is a metric that measures how thoroughly your code is exercised by unit tests and end-to-end tests. I did this using Python and a popular open-source tool for Python called Coverage.py.

In this article, I’ll pick up where my last article left off. I’ll show how to choose and then use a Python unit testing tool with Coverage.py. I’ll start out by sharing a handful of requirements that I often use when looking for a unit testing tool for any language. I’ll then compare two popular unit testing tools in the Python community and share my reasons for picking one of them. From there, I’ll show how to install the tool of choice, create unit tests, integrate with Coverage.py, and configure test runs to your needs. Along the way, I’ll share best practices for organizing unit tests within your repository and I’ll also clarify the results of test runs.

Choosing a Tool

Before researching and tinkering with any unit testing tools, list your requirements. Once your requirements are understood, you can start reading up on tools.

I wanted the best tool for all of my Python experiments. Specifically, I wanted something that would work well with simple language experiments and something that would work well for the more ambitious topics I intend to research, such as Data Science and Machine Learning. Consequently, I came up with the five requirements listed below.

  • Plain old Python. This is sometimes referred to as no-boilerplate code. Any unit testing tool has a few functions and classes that must be used within the unit tests you write. That’s unavoidable, but the less boilerplate the tool requires, the easier it is to write unit tests.
  • Integration with Coverage.py. I need to be able to turn on coverage tracking before a single line of code is tested. This could be done at the command line if there’s a way to combine the command-line interface for both Coverage.py and a unit testing tool. If CLI integration is too messy, I’m willing to write code but I don’t want to have to write a lot of code to get this to happen.
  • Exception checks. Unit tests can simulate error conditions and check that your code handles them correctly. For example, if a function throws an exception under certain conditions, the unit testing tool should let you declare the expected exception before calling the function. If the correct exception is thrown, the test passes. Otherwise, it fails.
  • Debugging information. For tests that produce an error, I’d like to see all the information I’d get if I ran the test interactively. Specifically, the full error description, the line of code that blew up, and the stack trace.
  • Test discovery. As my projects become more robust, the number of unit tests I write grows. I’d like to be able to run a simple command and have all my unit tests automatically discovered and run. I don’t want to have to maintain a list of my unit tests and specify each one during a test run.

I next set out to create a list of tools available within the Python community that could potentially meet my requirements. After reading several blogs and a handful of tutorials, I narrowed my decision down to two tools: Unittest and Pytest. Unittest is part of Python itself, so it has the advantage that no install is necessary. Pytest is another popular tool that many have adopted, and it’s gaining momentum in the Python community. A couple of tools that I eliminated before even comparing against my requirements were Nose and doctest. Both are good tools. If your requirements are different from mine, give them a read.

Armed with my well-thought-out requirements, I set out to compare the capabilities of Unittest and Pytest. Table 1 shows this comparison.

As you can see, the comparison is close. The Plain Old Python requirement isn’t incredibly important to me. The extra code you need to write when using Unittest is not much of a hindrance. I was also able to perform all important checks with Unittest’s built-in checker. In the end, I chose Pytest because of the better error reporting for tests that produce a runtime error. Additionally, Pytest can be used for creating functional tests where an entire feature of an application is tested. In this article, I’ll focus only on unit tests.

Another tidbit I came across during my research is that Pytest can be used as a runner for tests written using Unittest. In other words, if tests written using Unittest conform to the naming conventions expected by Pytest, Pytest can discover and run them. So if you come across a situation where Unittest is better suited, go ahead and use it.

If tests written using Unittest conform to the naming conventions expected by Pytest, Pytest can discover and run them.

Installing Pytest

To install Pytest, you’ll use Python's package manager pip. The command below works on both Windows and Mac. It installs Pytest if it doesn’t exist and it upgrades Pytest if you aren’t on the latest version.

pip install -U pytest

If you have two versions of Python on your computer, make sure that you use the correct pip command. If you have a Mac, chances are that you do: Python 2.7 comes preinstalled on every Mac, so when you install Python 3.6 (the latest version as of this writing), it’s installed alongside Python 2.7. In this case, the command above installs Pytest into your Python 2.7 library. To install Pytest into your Python 3.6 library, use the pip3 command:

pip3 install -U pytest

To check your version of Pytest, use this command:

pytest --version

Writing Unit Tests

Unit tests, as the name implies, are tests that you run against the individual units of your application. In practice, this means testing the functions and methods within your modules and classes. If you find that you have a unit test that calls several functions, see if there’s a way to break it up into a collection of simple tests. When a simple test fails, it’s easy to find the problem and correct it.

As the functionality of your application increases, so will the number of your unit tests. Don’t worry if you find yourself with a lot of unit tests. It’s worth it.

In general, there are three things you can check using unit tests.

  • Did the code work? In other words, did it complete successfully, or did it throw a runtime error when it should have run to completion?
  • Did the function return the correct value? Or is the application state correct after the function call?
  • Did the function throw the correct error? When error conditions are simulated, does the function handle the simulated error condition correctly?

The snippet below is a simple unit test that loads a CSV file containing transactions downloaded from Mint. (This code can be found in the code sample for this article. See the sidebar "About the Code Sample" for more details.) This test calls the get_categories function within the data access module, which is imported as data. The get_categories function returns a dictionary that organizes the transactions by category (Shopping, Dining, Car Payment, and so on). When the function completes, Python’s assert statement checks the dictionary of categories to make sure that the "Shopping" category is present. If it isn’t, the unit test is marked as failed.

def test_get_categories():
   ''' Test the get_categories function.'''
   transactions = data.load_transaction_file(
      settings.DATA_FILE)
   cats = data.get_categories(transactions)
   assert 'Shopping' in cats

This code is interesting because it doesn’t use anything from the Pytest package. It’s plain old Python. Pytest makes use of the assert statement that’s part of Python itself. Engineers typically use the assert statement to check assumptions in their code: if the condition after assert evaluates to false, it raises an AssertionError, which stops program execution and provides debugging information so that the problem can be corrected. When an assert fails inside a Pytest unit test, however, the test run doesn’t stop. Pytest runs all your unit tests and makes note of every failed test so that they can be reported at the end of the test run.
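To see the plain assert statement in action outside of any test runner, consider the short, self-contained sketch below. The apply_discount function is invented for illustration; it isn’t part of the article’s code sample.

```python
# Plain Python assert, outside any test runner. The apply_discount
# function is invented for illustration only.
def apply_discount(price, discount):
    return price - discount

# A passing assert does nothing; execution simply continues.
result = apply_discount(100, 20)
assert result == 80, 'expected 80, got {0}'.format(result)

# A failing assert raises AssertionError, which normally halts the
# program; here it's caught so the script keeps running.
try:
    assert apply_discount(100, 20) == 75
except AssertionError:
    caught = True
```

Pytest hooks into this same mechanism: it intercepts the AssertionError, records the failure, and moves on to the next test.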

The name of every unit test must start with test_. The name of every module (or file) that contains unit tests must also start with test_. This allows Pytest to automatically discover and run all tests in the current directory and in any child directories. You don’t have to manually specify each test. (In a later section, I’ll share best practices for organizing tests within a repository.)

The next snippet demonstrates how to check whether a function throws the correct error. The load_transaction_file function throws a custom exception, MissingTransactionFile, if the transaction file can’t be found. The test simulates this error condition by passing a file name that doesn’t exist. It uses the pytest.raises context manager to check the type of the thrown exception. Consequently, you need to import pytest.

def test_load_transaction_file_error_2():
    ''' Make sure the correct message is
    created within the exception. '''
    file_name = 'does_not_exist.csv'
    with pytest.raises(data.MissingTransactionFile) as ex:
        transactions = data.load_transaction_file(file_name)
    assert file_name in str(ex.value)

Notice in this snippet that it’s possible to check the state of the thrown exception. This test not only makes sure that the exception type is MissingTransactionFile, but also that the exception’s value contains the name of the file that couldn’t be found. Providing as much information as possible about an error condition helps an engineer consuming your code fix the problem. If your application throws errors, write a unit test for every condition that can cause the code to throw one, checking both the type and the state of the exception.
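For context, the code under test might look something like the following. This is a hypothetical sketch, not the actual implementation from the article’s code sample; only the class and function names mirror those used in the test above.

```python
# Hypothetical sketch of the data access module's error handling.
# The real load_transaction_file lives in the article's code sample.
import os

class MissingTransactionFile(Exception):
    ''' Raised when the transaction file can't be found. '''

def load_transaction_file(file_name):
    if not os.path.exists(file_name):
        # Include the file name in the message so tests (and callers)
        # can see exactly which file was missing.
        raise MissingTransactionFile(
            'Transaction file not found: {0}'.format(file_name))
    with open(file_name) as f:
        return f.read().splitlines()
```

Because the exception carries the file name, a test can assert on str(ex.value) as well as on the exception type.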

Fixtures

If you look closely at the unit tests written in the previous section, you’ll notice that each unit test has been written to run independently. In other words, set-up code and tear-down code are a part of each test. I load the transaction.csv file within every test and when the test completes, the dictionary object that I use to hold all the transactions falls out of scope and is removed from memory. This is less than optimal when you have a lot of tests and you need to gather a lot of data before running your test. For example, you may need to create a small database that emulates your production database or you may need to call a third-party service that returns data to your application for processing. If you need to do this for every unit test, your test run will take a long time to complete.

Fortunately, Pytest provides a mechanism known as fixtures. Fixtures allow common set-up code to be placed in functions that can be re-used across unit tests. Consider the code in Listing 1. This code may look funny if you’re new to Python or if you’re following along with no working experience of Python. The transactions function loads the transaction file into a Python list. The @pytest.fixture(scope='session') on top of the function definition tells Pytest that this function is a fixture and not a test. In Python, this syntax is known as a decorator. (It serves the same purpose as an attribute in C#.) It’s a form of aspect-oriented programming: a way to tell the runtime to inject behavior. In this case, the injected behavior is that every unit test with a parameter whose name matches the fixture function receives the fixture’s return value in that parameter.

This code appears much cleaner than its previous version, as duplicate code isn’t littered across each test, making the test file large and unwieldy. If you wish to share fixtures across modules, place your fixtures in a module named conftest.py. The conftest.py module shouldn’t be imported by modules that contain unit tests. Pytest reads this file for you without an import statement.

When fixtures are shared, they should be assigned a scope. This tells Pytest how long a fixture should be kept in memory. The fixture in Listing 1 is set up with a scope that keeps it around for the entire test run or session. In other words, the fixture only gets called once and the data it returns is around for the duration of the test run. This is done by setting the scope parameter to session, as shown in Listing 1. If you change the decorator to @pytest.fixture(scope='module'), any data returned from the fixture function is scoped to the module of the unit test that used the fixture. If another module uses the same fixture, the fixture gets called again.

Finally, notice the yield statement in the fixture function. The yield keyword is standard Python generator syntax, not a Pytest invention, but Pytest puts it to work for clean-up: when the fixture falls out of scope, Pytest resumes the function and runs all code after the yield. This gives you a place to put tear-down code.
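Listing 1 appears in the code sample; the self-contained demonstration below shows the same pattern on a smaller scale. All names are illustrative, and the throwaway test module is executed here via pytest.main() rather than from the command line.

```python
# Minimal demonstration of a session-scoped fixture with tear-down
# code after the yield. A throwaway test module is written to a
# temporary directory and run with pytest.main().
import os
import tempfile
import textwrap

import pytest

TEST_MODULE = textwrap.dedent('''
    import pytest

    @pytest.fixture(scope='session')
    def transactions():
        # Set-up: runs once for the entire test session.
        data = ['Shopping,12.50', 'Dining,30.00']
        yield data
        # Tear-down: runs when the session-scoped fixture is finalized.
        data.clear()

    def test_transaction_count(transactions):
        assert len(transactions) == 2
''')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test_fixture_demo.py')
    with open(path, 'w') as f:
        f.write(TEST_MODULE)
    # Exit code 0 means every discovered test passed.
    exit_code = pytest.main([path, '-q'])
```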

Running Unit Tests with Coverage.py

There are a few ways to run unit tests with Coverage.py: you can combine the two tools at the command line, you can use a Pytest plug-in known as pytest-cov, or you can create a Python module that calls the Coverage.py and Pytest APIs. Regardless of the technique you choose, use configuration files to turn on the features of both Coverage.py and Pytest.

Coverage.py uses the .coveragerc file for configuration. All command-line options can be placed in this file. The file in the code sample uses all the features of Coverage.py that I described in my previous article.

Pytest uses pytest.ini for configuration. Setting up pytest.ini requires only two lines. The next snippet turns on my favorite features.

[pytest]
addopts = --maxfail=5 --durations=5 --color=yes

These options tell Pytest to abort the run after five test failures, show the five slowest unit tests, and colorize all output. There are many more ways to change the behavior of Pytest. You can get a complete list of options by running pytest --help at the command line.

Once your configuration files are set up, you can run Pytest and Coverage.py without having to re-specify your options every time you start a test run. To run Coverage.py and Pytest at the command line, use the following command:

coverage run -m pytest

Don’t forget the -m option: you’re running Pytest as a library module, not a single .py file.

Using the APIs allows you to print a coverage report after Pytest prints the results of your tests. This is a nice touch, as it allows engineers running unit tests to quickly get an understanding of the code that’s being tested and the code for which no tests have been written. To implement this technique, create a Python module that’s your main entry point for calling all of the unit tests in your repository. A Python module named test_main_pytest.py that integrates Pytest and Coverage.py is shown in Listing 2. In the code sample, you’ll find this module in the test directory.
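Listing 2 is in the code sample’s test directory; the sketch below shows what such an entry point generally looks like using the public Coverage.py and Pytest APIs. The exact contents of Listing 2 may differ.

```python
# Sketch of a test_main_pytest.py-style entry point: start coverage
# tracking, run the test suite with Pytest, then print a coverage
# report after the test results. The article's Listing 2 may differ.
import coverage
import pytest

def main():
    cov = coverage.Coverage()    # picks up .coveragerc if present
    cov.start()
    exit_code = pytest.main([])  # discovers tests per pytest.ini
    cov.stop()
    cov.save()
    cov.report()                 # prints the coverage table last
    return exit_code
```

In the real module, main() is invoked under an if __name__ == '__main__' guard so the suite runs when you call python test_main_pytest.py.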

Once this module’s in place, you can run your entire suite of unit tests by calling it, as shown in the next snippet. It’s best to start your unit tests from the test directory. By default, both Coverage.py and Pytest look for configuration files in the present working directory.

$ python test_main_pytest.py

Understanding the Results of a Test Run

Unit tests look for three types of problems: application errors, incorrectly created application state, and incorrectly thrown exceptions when error conditions are simulated. The Pytest runner provides all of the information needed to find and correct these three types of problems when you run your suite of unit tests.

The output produced when an application error is encountered is shown in Listing 3. Notice that the unit test that found the error is shown. The exact error that occurred is also shown, as well as the location of the error. Tracking down application errors is usually a simple matter, especially when this information is viewed by an engineer who just wrote a piece of functionality. For this reason, it’s a good idea to run unit tests locally on an engineer’s workstation prior to submitting any changes to a source code management tool.

Pytest doesn’t stop the execution of your suite of tests when one test produces an application error. Its default behavior is to run all tests regardless of results. It’s a common mistake to mess up an application configuration setting in such a way that it causes all your unit tests to bomb out. For example, in the code sample, if the name of the transaction file is specified incorrectly, the majority of the unit tests produce an application error. If you have hundreds of unit tests, this is a waste of time and computer cycles, as it could take some time before Pytest gets through calling all your unit tests. To get around this potential problem, use the maxfail option when running Pytest. This option lets you specify the maximum number of failures that can occur before Pytest aborts the current run.

Listing 4 shows the output produced when a unit test didn’t produce the correct application state. The unit test that failed is listed along with the assertion that found the problem.

Listing 5 shows the output that occurs when a test fails because it isn’t receiving the correct exception when error conditions are simulated. The output shows the unit test that failed, the exception that is being thrown, and the code that threw the exception.

Listing 6 shows a unit test that’s receiving the correct exception, however the error message within the exception doesn’t contain enough detail for an engineer to fix the problem.

Finally, Listing 7 shows a successful unit test run along with coverage information.

Organizing Your Repository

Consider putting all unit tests into a single folder. I usually create a folder off of my project’s root folder named test for this purpose. The files in the test folder should contain only unit tests, functional tests, fixtures, and configuration files. They shouldn’t contain application logic. It may be tempting to put unit tests in the same files that contain application logic, as this would provide an easy way for a code reviewer to ensure that every function has a unit test. However, this has the undesirable effect of bloating your files. For this reason, I keep unit tests separate from application logic.

To help engineers locate unit tests, maintain parity between the test folder and the root of the application: give the test directory the same subfolders as the root, with each unit test located in the like-named folder under test. Unit tests should be named after the functions they test, prefixed with test_. Files should be named the same way. If a file containing data access code is named dataaccess.py, all of the unit tests for the functions in that module belong in a file named test_dataaccess.py in the test folder.
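Put together, this parity scheme might produce a layout like the following. The reports subfolder is invented purely for illustration; only dataaccess.py, test_dataaccess.py, test_main_pytest.py, conftest.py, pytest.ini, and .coveragerc come from the article.

```
project_root/
    dataaccess.py
    reports/
        summary.py
    test/
        .coveragerc
        pytest.ini
        conftest.py
        test_main_pytest.py
        test_dataaccess.py
        reports/
            test_summary.py
```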

These organizational tips can be a real headache-saver for large repositories that have many engineers working within them.

Summary

In this article, I showed you how to use Pytest, a Python unit testing tool, with Coverage.py. I didn’t go into detail on how to set up and use Coverage.py. If you’ve never used this tool, read my previous article in the Jan/Feb 2017 issue of CODE Magazine, "Improving Code Quality Using Coverage.py" (http://www.codemag.com/article/1701081).

The techniques shown in this article were about running unit tests locally on an engineer’s development computer. Another place that unit tests should be run is within a build job that’s run by a continuous integration server. Introducing continuous integration would have made this article too large, so I’ve left it for another article.