Test matrix explosions are a growing challenge in modern software testing, especially when using parameterized tests in machine learning pipelines. This article presents a hybrid approach that combines Python's Abstract Syntax Tree (AST) analysis with the Pytest testing framework to map test parameterizations to the models they exercise, thereby revealing hidden redundancies. By strategically trimming the test matrix, this approach reduced my development team's test cases by 35% and cut CI/CD pipeline execution times by one-fourth to one-half, demonstrating the power of static analysis for smarter test management.

Python-based testing frameworks like Pytest are widely used for unit, integration, and system-level testing. As testers, we strive to maximize coverage across diverse workflows, from simple user interactions to complex developer pipelines. However, test matrices tend to grow exponentially as features evolve, especially when leveraging parameterized testing.

The Challenge: When Parameterization Becomes Unmanageable

Parameterization enhances coverage, but it can silently grow to an unmanageable scale. This is particularly common in machine learning (ML) environments, where tests involve varying model parameters.
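
To get a feel for how quickly this grows, here is a small, hypothetical illustration (the parameter names below are made up, not taken from our suite): stacking Pytest parametrize decorators produces the Cartesian product of their values, so three models, two precisions, and four target devices already yield 24 generated cases for a single test function.

import pytest

# Hypothetical parameters for illustration only
MODELS = ["model_a", "model_b", "model_c"]
PRECISIONS = [16, 32]
DEVICES = ["cpu", "gpu", "npu_v1", "npu_v2"]

@pytest.mark.parametrize("device", DEVICES)
@pytest.mark.parametrize("precision", PRECISIONS)
@pytest.mark.parametrize("model", MODELS)
def test_conversion(model, precision, device):
    # Pytest generates 3 * 2 * 4 = 24 test cases from this single function
    ...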

My workplace supports multiple open-source ML model formats and validates their conversion within our ecosystem. This led to challenges such as:

  • Longer test execution times delaying feedback loops
  • Redundant testing, where similar paths were repeatedly verified
  • Increased maintenance overhead, making test suite management harder
  • Scaling challenges when running tests across multiple devices and platforms

To maintain efficient testing without sacrificing coverage, I wanted to analyze and optimize the test matrix strategically.

The testing pipeline followed a consistent workflow:

  1. Parameterize the test function.
  2. Fetch an external/open-source machine learning model and/or data.
  3. Convert the fetched model to the in-house format.
  4. Execute the updated model under various test parameters.

Because every test worked with some open-source model, I hypothesized that certain models were over-represented across multiple test cases. Therefore, I needed a way to bridge information between those models (Step 2 above) and the expanded test cases (Step 4 above).

To achieve this, I used a hybrid approach combining:

  • Python's Abstract Syntax Tree (AST) for static analysis of test files
  • The Pytest --collect-only option to list all parameterized test cases

Let's first explore the concept of Abstract Syntax Trees in general and how you can use them effectively in Python toward this goal.

Understanding Python's Abstract Syntax Tree

An Abstract Syntax Tree (AST) is a tree representation of the structure of source code. Python can parse a script into an AST without executing it, which allows developers to analyze code statically. ASTs are widely used for code inspection in compilers, linters, parsers, and similar tools.

Python's ast module comes built in, so there's no need to install it separately. All that's needed is a simple import of the module:

import ast

Parsing Python Code with AST

Consider a simple example, saved as sample.py:

def greet(name):
    message = f"Hello, {name}!"
    print(message)

greet("Sam")

Now, to parse this file with AST:

import ast

# Read and parse the file
with open("sample.py", "r") as f:
    tree = ast.parse(f.read())

# Print the AST structure
print(ast.dump(tree, indent=4))

You can see the output in Listing 1. Although Listing 1 is a large dump, you might only want or need to examine specific information in it, as explained below.

Listing 1: Output displays a structured code representation revealing function definitions, variable assignments, and function calls

Module(
  body=[
    FunctionDef(
      name='greet',
      args=arguments(
        posonlyargs=[],
        args=[
          arg(arg='name')
        ],
        vararg=None,
        kwonlyargs=[],
        kw_defaults=[],
        kwarg=None,
        defaults=[]
      ),
      body=[
        Assign(
          targets=[
            Name(id='message', ctx=Store())
          ],
          value=JoinedStr(
            values=[
              Constant(value='Hello, '),
              FormattedValue(
                value=Name(id='name', ctx=Load()),
                conversion=-1
              ),
              Constant(value='!')
            ]
          )
        ),
        Expr(
          value=Call(
            func=Name(id='print', ctx=Load()),
            args=[
              Name(id='message', ctx=Load())
            ],
            keywords=[]
          )
        )
      ],
      decorator_list=[]
    ),
    Expr(
      value=Call(
        func=Name(id='greet', ctx=Load()),
        args=[
          Constant(value='Sam')
        ],
        keywords=[]
      )
    )
  ],
  type_ignores=[]
)


Example: Extracting Specific Information from the AST

Let's say that you want to extract all function names from the script. You can walk through the AST like this:

import ast

class FunctionVisitor(ast.NodeVisitor):
    def visit_FunctionDef(self, node):
        print(f"Function name found: {node.name}")
        self.generic_visit(node)

# Parse and analyze the script
with open("sample.py") as f:
    tree = ast.parse(f.read())

visitor = FunctionVisitor()
visitor.visit(tree)

This outputs:

Function name found: greet

The base class ast.NodeVisitor used above allows you to traverse an AST and perform operations on specific node types. By subclassing NodeVisitor, you can define custom logic to extract information from the syntax tree.

When you create a class that inherits from ast.NodeVisitor, you can override specific visit methods, such as:

  • visit_FunctionDef(self, node): Called when a function definition (def func_name(...)) is encountered
  • visit_Call(self, node): Called when a function call (some_function(...)) is encountered
  • visit_Assign(self, node): Called when an assignment (x = 5) is encountered

Each visit_ method is automatically invoked when the visitor walks over a corresponding node type in the AST.
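
As a quick illustration, here is a minimal sketch (reusing the sample.py file from earlier) that overrides visit_Call and visit_Assign to report calls and assignments:

import ast

class UsageVisitor(ast.NodeVisitor):
    def visit_Call(self, node):
        # Report direct calls such as print(...) or greet(...)
        if isinstance(node.func, ast.Name):
            print(f"Call found: {node.func.id}")
        self.generic_visit(node)

    def visit_Assign(self, node):
        # Report simple assignments such as message = ...
        for target in node.targets:
            if isinstance(target, ast.Name):
                print(f"Assignment found: {target.id}")
        self.generic_visit(node)

with open("sample.py") as f:
    UsageVisitor().visit(ast.parse(f.read()))

Running this against sample.py reports the message assignment along with the print and greet calls.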

Now that you have a basic understanding of AST, we can apply these concepts to our original problem.

Analyzing Test Files to Extract Parameterized Test Information

By extending the AST analysis, you can look for specific function calls, such as test case parameters, and identify redundancy patterns:

  • AST Analysis: AST allows static code analysis without execution, making it possible to extract information about input models used in tests. It parses the Python source code into an abstract tree structure, enabling detailed inspection of function calls, arguments, and variables.
  • Pytest --collect-only: This option lists all parameterizations of tests without executing them, providing an expanded view of the test matrix.

One of the useful Pytest options when dealing with parameterized test names across an entire module is --collect-only. It lets you observe the full set of parameterizations for each test without executing them. Let's explore this further.

Consider the following parameterized test:

import itertools
import unittest
from parameterized import parameterized

class TestModels(unittest.TestCase):
    @parameterized.expand(itertools.product(PARAM, PRECISION))
    def test_modelexported_platform(self, param, precision):
        ...  # Test implementation here

Assuming:

PARAM = ["param1", "param2"]
PRECISION = ["precision1"]

To observe how this expands, install the Pytest package and execute the following from the terminal against only the test file:

> (python environment) pytest --collect-only test_*

The output looks like this:

Test session starts
collecting ...
<Package ABC>
  <Module M>
    <UnitTestCase TestModels>
      <TestCaseFunction test_modelexported_platform_param1_precision1>
      <TestCaseFunction test_modelexported_platform_param2_precision1>

This provides visibility into the expanded test matrix. However, it doesn't reveal which input models are used (Step 2 of the workflow above), which was critical for identifying redundancies in this case.

Extracting Input Models Using AST Analysis

To bridge this gap, I used the AST module to statically analyze the source code and extract input models.

For example, most of the machine learning tests in this space loaded models using a function like:

original_model = my_model_loading_function(<model_name>)

Example:

import pytest

@pytest.mark.parametrize("precision", [16, 32])
def test_model_conversion(precision):
    model = my_model_loading_function('ResNet50.pt')
    converted_model = model.export()
    converted_model.save()

A simple AST-based analysis walks over the code and searches for such function usages:

import ast

def get_model_loading_calls(file_path):
    with open(file_path, "r") as source:
        tree = ast.parse(source.read())

    models = []
    # Walk over the code and collect usages of the loading function
    for node in ast.walk(tree):
        # If this is a call to the loading function with a string literal argument
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == 'my_model_loading_function'
                and node.args
                and isinstance(node.args[0], ast.Constant)):
            models.append(node.args[0].value)
    return models

This extracts ResNet50.pt, allowing you to map model strings to the actual expanded test cases, thereby enabling redundancy analysis.

Mapping Input Models to Test Cases

After extracting test parameterizations (pytest --collect-only) and the input models from the AST analysis, the two datasets were combined to identify model-test relationships:

ast_results = []
for file_path in test_files:
    ast_results.extend(get_model_loading_calls(file_path))

Assume the Pytest results were collected in the format of:

package::module::classname::expandedtestname
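
How the Pytest results get captured isn't shown above. One hedged option, if you prefer to script it rather than copy terminal output, is to run the collection step via subprocess and keep the node IDs. The helper below, collect_pytest_ids, is hypothetical, and the IDs pytest prints with -q use the file path rather than the package::module form, so a small normalization step may still be needed:

import subprocess

def collect_pytest_ids(test_path="tests"):
    """Collect test node IDs without executing the tests (hypothetical helper)."""
    result = subprocess.run(
        ["pytest", "--collect-only", "-q", test_path],
        capture_output=True, text=True, check=False,
    )
    # With -q, pytest prints one node ID per line, e.g.
    # tests/test_models.py::TestModels::test_modelexported_platform_param1_precision1
    # Keep only ID-looking lines and drop the summary footer.
    return [line for line in result.stdout.splitlines() if "::" in line]

pytest_results = collect_pytest_ids()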

The final mapping is then obtained by the merge:

mapping = combine_results(ast_results, pytest_results)
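
combine_results is specific to our suite and isn't reproduced here. A minimal sketch of one possible join is shown below; it assumes the AST results are kept per module (a dict such as {"moduleA": ["ResNet50.pt"]} instead of the flat list built earlier) and that the module name is the second ::-separated field of each collected test ID. Both are simplifications you would adjust for your own layout.

from collections import defaultdict

def combine_results(models_per_module, pytest_results):
    """Join model names to the collected test IDs defined in the same module.

    models_per_module: {"moduleA": ["ResNet50.pt", ...], ...}
    pytest_results:    ["packageA::moduleA::classnameA::test_model_conversion_16", ...]
    """
    mapping = defaultdict(list)
    for test_id in pytest_results:
        module = test_id.split("::")[1]  # the module component of the ID
        for model in models_per_module.get(module, []):
            mapping[model].append(test_id)
    return dict(mapping)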

The following code snippet shows the final result:

{ "ResNet50.pt": [ "packageA::moduleA::classnameA::test_model_conversion_16", "packageA::moduleA::classnameA::test_model_conversion_32", "packageM::moduleB::classnameF::testnameX", "packageM::moduleB::classnameF::testnameY" ] }

From this mapping, you can observe that it's not just test_model_conversion() that references ResNet50.pt, but also other test cases like testnameX and testnameY located in different packages/modules. This comprehensive overview of model-to-test relationships reveals the extent of model reuse across the test suite.

By identifying these overlaps, you can focus on over-tested models, guiding targeted optimizations to reduce redundancy and improve CI/CD efficiency.

Using this test-to-model mapping, the team inspected the suite and adopted a smaller test matrix where quick turnaround was needed. The following optimizations were applied:

  • Reduced redundant tests where multiple tests used the same model to validate similar behaviors
  • Scheduled less critical test cases for periodic execution
  • Reduced CI/CD execution time without losing meaningful coverage

Challenges/Limitations

The approach has some limitations because it depends on how the tests are written. My team most often encounters the following:

  • AST Limitations: Custom handling is needed when the parsing isn't straightforward, for example with nested function calls or loops. The example above used a simple string literal for the model name. My team's setup, in practice, handles definitions expressed as string literals or variables, but not cases where resolving the loaded model requires a deeper code traversal (see the sketch after this list).
  • Dynamic Model Selection: If the model isn't specified up front but is selected dynamically at runtime, it's harder to detect statically. This requires collaboration with developers to standardize simple model-loading patterns.
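
To give a flavor of the custom handling mentioned in the first bullet, here is a hedged sketch (not our production code) of resolving the simple variable case: if the argument to my_model_loading_function is a name rather than a string literal, it looks for a top-level assignment of that name to a string constant.

import ast

def resolve_model_arg(tree, arg):
    """Resolve a call argument to a model-name string, if statically possible."""
    if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
        return arg.value  # direct string literal, as in the earlier example
    if isinstance(arg, ast.Name):
        # Look for a simple assignment such as MODEL_NAME = "ResNet50.pt"
        for node in ast.walk(tree):
            if (isinstance(node, ast.Assign)
                    and any(isinstance(t, ast.Name) and t.id == arg.id
                            for t in node.targets)
                    and isinstance(node.value, ast.Constant)
                    and isinstance(node.value.value, str)):
                return node.value.value
    return None  # dynamic or nested cases still need manual handling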

Conclusion and Adaptability

By leveraging AST for static analysis and Pytest for parameter mapping, this approach helps you decide how to reduce combinatorial test explosions while maintaining coverage. It empowers testers to visualize test redundancy and strategically optimize test matrices, leading to faster and more maintainable CI/CD pipelines.

This methodology can be adapted to broader testing scenarios, making it a powerful tool for modern test engineers. You can adapt it wherever repeatable static patterns in code reveal which inputs your various tests rely on.