Development teams are under pressure. Releases need to be delivered on time. Coding and quality standards need to be met. And mistakes are not an option. That's why development teams are using static analysis.
Static code analysis tools analyze an application's source code or compiled code to identify bugs and vulnerabilities without executing the program.
Why use static analysis?
- Provides insight into code without executing it
- Executes quickly relative to dynamic analysis
- Can automate maintaining code quality
- Can automate finding bugs early (not all of them though)
- Can automate finding security issues early
- You're already using it if you use an IDE: most IDEs have static analyzers built in (PyCharm, for example, checks your code against PEP8)
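To make "insight into code without executing it" concrete, here is a minimal sketch that uses only Python's standard-library ast module (my choice for illustration, not something the tools below require). It parses a snippet that would raise ZeroDivisionError if it were run, yet still lists its imports and function definitions. The names SOURCE and summarize are hypothetical.

```python
import ast

# Hypothetical source to inspect. Executing it would raise ZeroDivisionError
# on the last line, but parsing it is perfectly safe.
SOURCE = """
import os
import json

def load(path):
    with open(path) as fh:
        return json.load(fh)

def broken():
    return 1 / 0

result = 1 / 0
"""


def summarize(source: str) -> dict:
    """Collect imported module names and (function name, line) pairs without running the code."""
    tree = ast.parse(source)  # builds a syntax tree; nothing is executed
    imports = []
    functions = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.append(node.module)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append((node.name, node.lineno))
    return {"imports": imports, "functions": functions}


if __name__ == "__main__":
    print(summarize(SOURCE))
```

Every tool in this article builds on the same principle: read the code as data instead of running it.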
What types of static analysis exist?
- Code styling analysis
- Security linting
- Error detection
- UML diagram creation
- Duplicate code detection
- Complexity analysis
- Comment styling analysis
- Unused code detection
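To give a feel for how one of these analyses can work, here is a toy duplicate-code detector. It only illustrates the general idea of comparing windows of consecutive, whitespace-normalized lines; it is not how any particular tool implements it, and the function name find_duplicates, the sample files and the 4-line window are assumptions chosen for the example.

```python
from collections import defaultdict


def find_duplicates(files: dict[str, str], window: int = 4) -> list[list[tuple[str, int]]]:
    """Group (filename, start_line) locations whose next `window` non-blank
    lines are identical once leading/trailing whitespace is stripped."""
    seen = defaultdict(list)  # tuple of normalized lines -> list of locations
    for name, text in files.items():
        # Keep (original line number, stripped content) for non-blank lines only.
        lines = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if line.strip()
        ]
        # Slide a fixed-size window over the normalized lines.
        for i in range(len(lines) - window + 1):
            chunk = tuple(content for _, content in lines[i:i + window])
            seen[chunk].append((name, lines[i][0]))
    # Any chunk seen at more than one location is reported as duplicated.
    return [locations for locations in seen.values() if len(locations) > 1]


# Hypothetical example files: the last four lines of each function match.
A = """
def total(items):
    result = 0
    for item in items:
        result += item.price
    return result
"""

B = """
def invoice_total(items):
    result = 0
    for item in items:
        result += item.price
    return result
"""

if __name__ == "__main__":
    print(find_duplicates({"a.py": A, "b.py": B}))
```

Overlapping windows mean a long duplicated block is reported once per window, so a real tool would merge adjacent hits.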
Let's look at the static analysis tools available in the Python ecosystem:
1. Pylint
Pylint, well known in the Python community, is first on my list. It supports a number of features, from coding standards to error detection, and it also helps with refactoring (by detecting duplicated or unused code).
Out of the box, Pylint is overly pedantic and benefits from a little configuration. It is fully customizable through a .pylintrc file, where you select which errors or conventions are relevant to you.
On Ubuntu, you can install Pylint and run it like this:
$ sudo apt-get install pylint
$ pylint <file/dir> --rcfile=<.pylintrc>
Running Pylint on a piece of code will result in something like this (which will be followed by some statistics):
$ pylint app.py
C:122, 4: Missing method docstring (missing-docstring)
R:136, 0: Too many instance attributes (9/7) (too-many-instance-attributes)
R:217, 4: Too many local variables (23/15) (too-many-locals)
C:345,16: Variable name "mo" doesn't conform to snake_case naming style (invalid-name)
R:304, 8: Too many nested blocks (6/5) (too-many-nested-blocks)
C:377,24: Variable name "mo" doesn't conform to snake_case naming style (invalid-name)
W:403,34: Access to a protected member _payload of a client class (protected-access)
R:304, 8: Too many nested blocks (6/5) (too-many-nested-blocks)
C:405,28: Variable name "mo" doesn't conform to snake_case naming style (invalid-name)
W:408,32: Access to a protected member _payload of a client class (protected-access)
R:304, 8: Too many nested blocks (6/5) (too-many-nested-blocks)
W:268,16: Unused variable 'msg' (unused-variable)
R:217, 4: Too many return statements (7/6) (too-many-return-statements)
R:217, 4: Too many branches (58/12) (too-many-branches)
R:217, 4: Too many statements (160/50) (too-many-statements)
Note that Pylint prefixes each message with an R, C, W, E, or F, meaning:
- [R]efactor for a “good practice” metric violation
- [C]onvention for a coding standard violation
- [W]arning for stylistic problems, or minor programming issues
- [E]rror for important programming issues (i.e. most probably a bug)
- [F]atal for errors which prevented further processing
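If you want to post-process this output, for example to count messages per category, a small parser is enough. This is a sketch that assumes exactly the `C:122, 4: message (symbol)` layout shown above; the default output format differs between Pylint versions, so the regular expression may need adjusting. MESSAGE_RE, CATEGORY_NAMES and count_by_category are hypothetical names.

```python
import re
from collections import Counter

# Matches lines such as: C:122, 4: Missing method docstring (missing-docstring)
MESSAGE_RE = re.compile(
    r"^(?P<category>[RCWEF]):\s*(?P<line>\d+),\s*(?P<column>\d+):\s*"
    r"(?P<text>.*?)\s*\((?P<symbol>[\w-]+)\)$"
)

# The one-letter prefixes described above.
CATEGORY_NAMES = {
    "R": "refactor",
    "C": "convention",
    "W": "warning",
    "E": "error",
    "F": "fatal",
}


def count_by_category(output: str) -> Counter:
    """Count parsed Pylint messages per category; lines that don't match are skipped."""
    counts = Counter()
    for raw in output.splitlines():
        match = MESSAGE_RE.match(raw.strip())
        if match:
            counts[CATEGORY_NAMES[match["category"]]] += 1
    return counts


SAMPLE = """\
C:122, 4: Missing method docstring (missing-docstring)
R:136, 0: Too many instance attributes (9/7) (too-many-instance-attributes)
W:268,16: Unused variable 'msg' (unused-variable)
R:217, 4: Too many branches (58/12) (too-many-branches)
"""

if __name__ == "__main__":
    print(count_by_category(SAMPLE))
```

The lazy `text` group lets a message keep its own parentheses, such as "(9/7)", while the last parenthesized word is taken as the symbol.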
Regarding the coding style, Pylint follows the PEP8 style guide.
Pylint ships with Pyreverse, which creates UML diagrams from your code. You can automate Pylint with Apycot, Hudson or Jenkins, and it also integrates with several editors.
You can also write small plugins to add your own checks.
2. Pyflakes
Pyflakes, another similar tool, tries very hard not to emit false positives. Pyflakes “makes a simple promise: it will never complain about style, and it will try very, very hard to never emit false positives”. This means that Pyflakes won’t tell you about missing docstrings or argument names that don't conform to a naming style. It focuses on logical code issues and potential errors.
Pyflakes only examines the syntax tree of each file individually. That, combined with a limited set of errors, makes it faster than Pylint. On the other hand, Pyflakes is more limited in what it can check.
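The "syntax tree of each file" approach can be illustrated with the standard-library ast module. The sketch below is a toy version of one kind of check Pyflakes performs (reporting imports that are never used); it is not Pyflakes' actual implementation, and it ignores subtleties such as `__all__`, re-exports and names used only inside strings. The function name unused_imports is hypothetical.

```python
import ast


def unused_imports(source: str) -> list[tuple[str, int]]:
    """Return (name, line) pairs for imported names never referenced by a Name node."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line number of its import
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports can't be tracked this simply
                # "import os.path" binds "os"; "import numpy as np" binds "np".
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.Name):
            # os.getcwd() still contains a Name node for "os".
            used.add(node.id)
    unused = [(name, line) for name, line in imported.items() if name not in used]
    return sorted(unused, key=lambda item: item[1])


SAMPLE = """
import os
import sys
from collections import OrderedDict, defaultdict

def main():
    counts = defaultdict(int)
    print(os.getcwd(), counts)
"""

if __name__ == "__main__":
    print(unused_imports(SAMPLE))
```

Because it looks at one file's tree and nothing else, a check like this is fast, which is the trade-off described above.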
It can be installed with:
$ pip install --upgrade pyflakes
Since Pyflakes doesn’t do any stylistic checks, there is another tool that combines it with style checks against PEP8: Flake8. Besides combining pyflakes and pep8, Flake8 also supports per-project configuration.
3. Mypy
Mypy is a static type checker for Python.
It requires your code to be annotated using the Python 3 function annotation syntax (PEP484). Mypy can then type-check your code and find common bugs. Its goal is to combine the benefits of dynamic typing and static typing (using the typing module).
Installation and usage are similar to the other tools:
$ pip install mypy
$ mypy <file/dir>
While mypy is still in development, it already supports a significant subset of Python features.
Type declarations act as machine-checked documentation and static typing makes your code easier to understand and easier to modify without introducing bugs.
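Here is a small annotated module showing the kind of bug mypy targets. The code runs as written; the commented-out call is the one mypy should reject, because find_user may return None while greeting expects a str. The function names are hypothetical, and since the exact wording of mypy's error message varies between versions, it is not reproduced here.

```python
from typing import Dict, Optional


def greeting(name: str) -> str:
    return "Hello, " + name


def find_user(user_id: int, users: Dict[int, str]) -> Optional[str]:
    # dict.get returns None when the key is missing, hence Optional[str].
    return users.get(user_id)


def greet_user(user_id: int, users: Dict[int, str]) -> str:
    user = find_user(user_id, users)
    if user is None:  # this check narrows Optional[str] to str for the type checker
        return "Hello, stranger"
    return greeting(user)


USERS = {1: "alice"}

# mypy should flag this call as an incompatible argument type:
# find_user can return None, but greeting expects a str.
# print(greeting(find_user(2, USERS)))

if __name__ == "__main__":
    print(greet_user(1, USERS))
    print(greet_user(2, USERS))
```

Uncommenting the flagged line and running mypy on the file should produce an error, while plain Python would only fail at runtime, and only when the user is missing.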
4. Prospector (prospector.landscape.io)
Prospector is a powerful static analysis tool that analyzes Python code and outputs information about errors, potential problems, convention violations and complexity. It incorporates:
- PyLint - Code quality/Error detection/Duplicate code detection
- pep8.py - PEP8 code quality
- pep257.py - PEP257 docstring style
- pyflakes - Error detection
- mccabe - Cyclomatic Complexity Analyser
- dodgy - secrets leak detection
- pyroma - setup.py validator
- vulture - unused code detection
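Cyclomatic complexity, which mccabe reports, is roughly 1 plus the number of decision points in a function. The sketch below approximates it with the ast module; it is my own simplification rather than mccabe's algorithm (mccabe works from a graph of each function's control flow), so its numbers may differ for some constructs. BRANCH_NODES, cyclomatic_complexity and complexity_report are hypothetical names.

```python
import ast

# Each of these nodes adds one decision point (an approximation, not mccabe's exact rules).
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate complexity as 1 + decision points inside `func` (nested defs included)."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one branch per extra operand.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # Every "if" filter in a comprehension is another branch.
            complexity += len(node.ifs)
    return complexity


def complexity_report(source: str) -> dict[str, int]:
    """Map each function name in `source` to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for _ in range(3):
        if n > 10 and n < 100:
            return "medium"
    return "positive"

def identity(x):
    return x
'''

if __name__ == "__main__":
    print(complexity_report(SAMPLE))
```

Here classify scores 6: the base of 1, two if/elif tests, the loop, the inner if and the extra `and` operand.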
Chances are, you will consider most of the warnings that come from tools like pylint or pep8 or pyflakes to be a bit picky. There are warnings about line length, there are warnings about whitespace on empty lines, there are warnings about how much space there is between methods on your class, etc. What you probably want, however, is a list of actual problems in your code.
For that reason, Prospector has a series of settings and default behaviors to suppress the more picky warnings and only provide things that are important.
It's my favorite static analysis tool because of its power, its team-level customization and its ease of use.
$ pip install prospector
You can customize severity, ignore some errors and enable/disable tools by providing the .prospector.yml file. For example:
strictness: medium
test-warnings: true
doc-warnings: true
autodetect: false
max-line-length: 120

pep8:
  full: true
  disable:
    - N803 # argument name should be lowercase
    - N806 # variable in function should be lowercase
    - N812 # lowercase imported as non lowercase

pylint:
  run: true
  disable:
    - too-many-locals
    - arguments-differ
    - no-else-return
    - inconsistent-return-statements

pep257:
  run: true
  disable:
    - D203 # 1 blank line required before class docstring
    - D212 # Multi-line docstring summary should start at the first line
    - D213 # Multi-line docstring summary should start at the second line
5. Bandit (github.com)
Bandit is a static analysis tool designed to find common security issues in Python code, such as:
- Hardcoded passwords
- Insecure pickle serialization/deserialization
- Shell injections
- SQL injections
Test results:
>> Issue: [B307:blacklist] Use of possibly insecure function - consider using safer ast.literal_eval.
   Severity: Medium   Confidence: High
   Location: test.py:3
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b307-eval
print(eval("1+1"))

--------------------------------------------------

Code scanned:
    Total lines of code: 2
    Total lines skipped (#nosec): 0

Run metrics:
    Total issues (by severity):
        Undefined: 0.0
        Low: 0.0
        Medium: 1.0
        High: 0.0
    Total issues (by confidence):
        Undefined: 0.0
        Low: 0.0
        Medium: 0.0
        High: 1.0
Files skipped (0):
Let's look specifically at the Test results section. It reports an issue with the ID B307 from the blacklist group. The message tells us what the specific issue is and usually suggests a potential fix. Here, "blacklist" means that eval is on Bandit's list of insecure functions (and rightly so).
After that message, we are given information about:
- How severe the issue is - Medium in this case
- How confident Bandit is that there's a problem - High
- Where the issue is - in test.py on line 3
- And the code in question, complete with line numbers.
It's pretty straightforward and easy to use.
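At its core, a check like B307 comes down to finding calls to a disallowed function in the syntax tree. This is a minimal sketch of that idea with the ast module, not Bandit's code; the DISALLOWED_CALLS set, the find_disallowed_calls function, the sample file contents and the report format are all assumptions made for illustration.

```python
import ast

# Hypothetical list of disallowed builtins, chosen for illustration only.
DISALLOWED_CALLS = {"eval", "exec"}


def find_disallowed_calls(source: str, filename: str = "<string>") -> list[str]:
    """Return 'file:line: call to name()' for each direct call to a disallowed builtin."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)  # plain calls like eval(...), not obj.eval(...)
            and node.func.id in DISALLOWED_CALLS
        ):
            hits.append((node.lineno, node.func.id))
    # ast.walk is breadth-first, so sort to report findings in line order.
    return [f"{filename}:{line}: call to {name}()" for line, name in sorted(hits)]


# Hypothetical file contents; the eval call sits on line 3.
SAMPLE = '''# hypothetical test.py
x = 1
print(eval("1+1"))
'''

if __name__ == "__main__":
    for finding in find_disallowed_calls(SAMPLE, "test.py"):
        print(finding)
```

Bandit goes much further (severity and confidence ratings, many more plugins), but the principle of matching patterns in the tree without running the code is the same.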
In conclusion, spending time on static analysis will really (really) benefit you and your team: less time spent hunting bugs, an easier time explaining code to project newcomers, lower project costs, and so on. Time spent on it upfront may feel like time not spent on features, but it will pay off in the future.