Python Static Analysis Tools

Development teams are under pressure. Releases need to be delivered on time. Coding and quality standards need to be met. And mistakes are not an option. That’s why development teams are using static analysis.
Static code analysis tools examine an application’s source code or compiled code to identify bugs and vulnerabilities without executing the program.

Why use static analysis?

  • Provides insight into code without executing it
  • Executes quickly relative to dynamic analysis
  • Can automate maintaining code quality
  • Can automate finding bugs early (not all of them though)
  • Can automate finding security issues early
  • You're already using it (if you use an IDE, it already has static analysers built in; PyCharm, for example, uses pep8)
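
To make the "without executing it" point concrete, here is a minimal sketch built on Python's standard ast module. The sample source and the function name functions_missing_docstrings are made up for illustration; real linters do far more than this.

```python
import ast

# Hypothetical sample module; the top-level raise would abort a real run.
SOURCE = '''
raise RuntimeError("never executed by the analysis")

def documented():
    """Has a docstring."""
    return 1

def undocumented():
    return 2
'''


def functions_missing_docstrings(source):
    """Return the names of functions in `source` that lack a docstring."""
    tree = ast.parse(source)  # parses only; nothing in `source` is run
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and ast.get_docstring(node) is None
    ]


print(functions_missing_docstrings(SOURCE))  # ['undocumented']
```

The sample would crash immediately if it were run, yet the analysis still reports the undocumented function, because the code is only parsed.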

What types of static analysis exist?

  • Code styling analysis
  • Security linting
  • Error detection
  • UML diagram creation
  • Duplicate code detection
  • Complexity analysis
  • Comment styling analysis
  • Unused code detection

Let's look at the tools that exist in the Python ecosystem for static analysis:

1. Pylint (pylint.org)

Well known in the Python community, Pylint takes first place on my list. It supports a number of features, from coding standards to error detection, and it also helps with refactoring (by detecting duplicated or unused code).

Pylint is overly pedantic out of the box and benefits from a small amount of configuration; it is fully customizable through a .pylintrc file in which you select which errors or conventions are relevant to you.

On Ubuntu, you can install Pylint with:

$ sudo apt-get install pylint

Then run it on a file or directory, optionally pointing it at your configuration file:

$ pylint <file/dir> --rcfile=<.pylintrc>

Running Pylint on a piece of code will result in something like this (which will be followed by some statistics):

$ pylint app.py
C:122, 4: Missing method docstring (missing-docstring)
R:136, 0: Too many instance attributes (9/7) (too-many-instance-attributes)
R:217, 4: Too many local variables (23/15) (too-many-locals)
C:345,16: Variable name "mo" doesn't conform to snake_case naming style (invalid-name)
R:304, 8: Too many nested blocks (6/5) (too-many-nested-blocks)
C:377,24: Variable name "mo" doesn't conform to snake_case naming style (invalid-name)
W:403,34: Access to a protected member _payload of a client class (protected-access)
R:304, 8: Too many nested blocks (6/5) (too-many-nested-blocks)
C:405,28: Variable name "mo" doesn't conform to snake_case naming style (invalid-name)
W:408,32: Access to a protected member _payload of a client class (protected-access)
R:304, 8: Too many nested blocks (6/5) (too-many-nested-blocks)
W:268,16: Unused variable 'msg' (unused-variable)
R:217, 4: Too many return statements (7/6) (too-many-return-statements)
R:217, 4: Too many branches (58/12) (too-many-branches)
R:217, 4: Too many statements (160/50) (too-many-statements)

Note that Pylint prefixes each of the problem areas with an R, C, W, E, or F, meaning:

    [R]efactor for a “good practice” metric violation
    [C]onvention for coding standard violation
    [W]arning for stylistic problems, or minor programming issues
    [E]rror for important programming issues (i.e. most probably a bug)
    [F]atal for errors which prevented further processing
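
As a rough illustration of these categories, here is a small made-up module (the Message class and payload_length function are hypothetical). The comments name the messages I would expect Pylint to report, but the exact set depends on your Pylint version and configuration.

```python
class Message:
    """Hypothetical class used only for illustration."""

    # With default settings, Pylint may also report an [R]efactor message
    # here (too-few-public-methods), since the class has no public methods.

    def __init__(self, payload):
        self._payload = payload


def payload_length(message):
    """Return the length of the message payload."""
    msg = "computing length"  # never read again: [W] unused-variable
    return len(message._payload)  # used outside the class: [W] protected-access


print(payload_length(Message("hello")))  # 5
```

The code runs fine, which is the point: these are warnings about maintainability and likely mistakes, not crashes.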

Regarding coding style, Pylint follows the PEP8 style guide.

Pylint ships with Pyreverse, with which it creates UML diagrams for your code. You can automate Pylint with Apycot, Hudson or Jenkins; it also integrates with several editors.

You can also write small plugins to add personal features.

2. Pyflakes (https://github.com)

Pyflakes takes a different approach: it “makes a simple promise: it will never complain about style, and it will try very, very hard to never emit false positives”. This means that Pyflakes won’t tell you about missing docstrings or argument names not conforming to a naming style. It focuses on logical code issues and potential errors.
Pyflakes only examines the syntax tree of each file individually. That, combined with a limited set of errors, makes it faster than Pylint. On the other hand, pyflakes is more limited in terms of the things it can check.
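
To give a feel for a syntax-tree-only check on a single file, here is a simplified sketch of unused-import detection using the standard ast module. This is not Pyflakes' actual implementation; the unused_imports function and the sample source are assumptions made for illustration, and the sketch ignores cases such as star imports, __all__ and shadowed names.

```python
import ast

# Hypothetical file contents to analyze.
SOURCE = '''
import os
import sys
from json import dumps, loads

print(sys.argv, loads("[]"))
'''


def unused_imports(source):
    """Return the sorted names that are imported but never referenced."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a"
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


print(unused_imports(SOURCE))  # ['dumps', 'os']
```

Because only one file's tree is inspected and nothing is imported or executed, a check like this stays fast, at the cost of missing anything that requires cross-file knowledge.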

It can be installed with:

$ pip install --upgrade pyflakes

While pyflakes doesn’t do any stylistic checks, there is another tool that combines pyflakes with style checks against PEP8: Flake8. Flake8, aside from combining pyflakes and pep8, also adds per-project configuration ability.

3. Mypy (mypy-lang.org)

Mypy is a static type checker for Python.

The requirement here is that your code is annotated using the Python 3 function annotation syntax (PEP 484). Mypy can then type check your code and find common bugs. Its goal is to combine the benefits of dynamic typing and static typing (using the typing module).

Installation and usage are similar to the other tools.

While mypy is still in development, it already supports a significant subset of Python features.

Type declarations act as machine-checked documentation and static typing makes your code easier to understand and easier to modify without introducing bugs.
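
Here is a small sketch of what annotated code looks like. The average function is a made-up example; the commented-out calls are the kind of mistakes I would expect a type checker such as mypy to flag before the code ever runs.

```python
from typing import List, Optional


def average(values: List[float]) -> Optional[float]:
    """Return the mean of `values`, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)


print(average([1.0, 2.0, 3.0]))  # 2.0

# Calls a type checker would be expected to reject:
# average("123")        # a str is not a List[float]
# average([1.0]) + 1    # the result may be None
```

The annotations also document the contract: a reader sees at a glance that an empty list yields None rather than an exception.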

4. Prospector (prospector.landscape.io)

Prospector is a powerful tool that analyzes Python code and outputs information about errors, potential problems, convention violations and complexity. It incorporates:

  • PyLint - Code quality/Error detection/Duplicate code detection
  • pep8.py - PEP8 code quality
  • pep257.py - PEP257 docstring quality
  • pyflakes - Error detection
  • mccabe - Cyclomatic Complexity Analyser
  • dodgy - secrets leak detection
  • pyroma - setup.py validator
  • vulture - unused code detection
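
To illustrate what a complexity analyser measures, here is a deliberately simplified sketch: start at 1 and add one for each branching construct. It is an approximation written for this article, not the mccabe package's algorithm, and the names complexity and BRANCH_NODES are my own.

```python
import ast

# Node types counted as decision points in this simplified sketch.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

# Hypothetical function to measure.
SOURCE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in (2, 3):
        if n % d == 0 and n > d:
            return "composite"
    return "other"
'''


def complexity(source):
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths
            score += len(node.values) - 1
    return score


print(complexity(SOURCE))  # 5
```

Here the two if statements, the for loop and the "and" each add one path to the base value of 1, which gives 5.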

Chances are, you will consider most of the warnings that come from tools like pylint or pep8 or pyflakes to be a bit picky. There are warnings about line length, there are warnings about whitespace on empty lines, there are warnings about how much space there is between methods on your class etc. What you probably want, however, is a list of actual problems in your code.

For that reason, Prospector has a series of settings and default behaviors to suppress the more picky warnings and only provide things that are important.

It's my favorite tool for static analysis because of its power, team-level customization and ease of use.

It can be installed with:

$ pip install prospector

You can customize severity, ignore some errors and enable or disable tools by providing a .prospector.yml file. For example:

strictness: medium
test-warnings: true
doc-warnings: true
autodetect: false
max-line-length: 120

pep8:
    full: true
    disable:
      - N803 # argument name should be lowercase
      - N806 # variable in function should be lowercase
      - N812 # lowercase imported as non lowercase

pylint:
    run: true
    disable:
      - too-many-locals
      - arguments-differ
      - no-else-return
      - inconsistent-return-statements

pep257:
    run: true
    disable:
      - D203 # 1 blank line required before class docstring
      - D212 # Multi-line docstring summary should start at the first line
      - D213 # Multi-line docstring summary should start at the second line

5. Bandit (github.com)

Bandit is a static analysis tool designed to find common security issues in Python code. Among other things, it can detect:

  • Hardcoded passwords
  • Unsafe pickle serialization/deserialization
  • Shell injections
  • SQL injections

Its report looks like this:

Test results:
>> Issue: [B307:blacklist] Use of possibly insecure function - consider using safer ast.literal_eval.
   Severity: Medium   Confidence: High
   Location: test.py:3
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b307-eval


Code scanned:
    Total lines of code: 2
    Total lines skipped (#nosec): 0

Run metrics:
    Total issues (by severity):
        Undefined: 0.0
        Low: 0.0
        Medium: 1.0
        High: 0.0
    Total issues (by confidence):
        Undefined: 0.0
        Low: 0.0
        Medium: 0.0
        High: 1.0
Files skipped (0):

Let's look specifically at the Test results section. It shows an issue labeled B307 and named blacklist. The message usually tells us what the specific issue is and suggests a potential fix; blacklist here means that eval is on Bandit's list of calls considered insecure (and rightly so).

After that message, we are given information about:

  1. How severe the issue is - Medium in this case
  2. How confident Bandit is that there's a problem - High
  3. Where the issue is - in test.py on line 3
  4. The code in question, complete with line numbers (not shown in the excerpt above).
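
The B307 message suggests ast.literal_eval as a safer alternative to eval. Here is a minimal sketch of the difference; the function names parse_value_unsafe and parse_value are made up for illustration and are not taken from the scanned test.py.

```python
import ast


def parse_value_unsafe(text):
    """Evaluates arbitrary expressions: the pattern B307 warns about."""
    return eval(text)


def parse_value(text):
    """Accepts only Python literals (numbers, strings, lists, dicts, ...)."""
    return ast.literal_eval(text)


print(parse_value("[1, 2, 3]"))  # [1, 2, 3]

try:
    # A function call is not a literal, so it is refused instead of executed.
    parse_value("__import__('os').getcwd()")
except ValueError:
    print("rejected")
```

If the input only ever needs to be data, literal_eval gives you the parsing without handing the caller the ability to run code.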

It's pretty straightforward and easy to use.

Other tools out there are PyChecker, PEP8, Frosted (a fork of PyFlakes) and Flake8 (a wrapper around PyFlakes and PEP8).

In conclusion, spending time on static analysis will really (really) benefit you and your team: less time spent hunting bugs, easier onboarding of project newcomers, lower project costs and so on. Doing it upfront may feel like you're not working on features, but it will pay off in the future.
