Development teams are under pressure. Releases must be delivered on time. Coding and quality standards must be met. And errors are not an option. That's why developer teams use static analysis.
Static code analysis tools examine source code or compiled application code so that you can detect vulnerabilities and other defects without executing the program.
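To make "without executing a program" concrete, here is a minimal sketch that uses Python's standard ast module to locate calls to eval in a piece of source text. The sample source and the find_calls helper are made up for illustration; real analyzers do far more than this.

```python
import ast

# Hypothetical source to inspect. It is only parsed, never run.
SOURCE = '''
import os

def run(expr):
    return eval(expr)
'''


def find_calls(source, func_name):
    """Return (line, column) pairs for every direct call to func_name."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func_name):
            hits.append((node.lineno, node.col_offset))
    return hits


for line, col in find_calls(SOURCE, "eval"):
    print(f"line {line}, column {col}: call to eval()")
```

The code in SOURCE is never imported or executed; the finding comes purely from walking the syntax tree.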
Why use static analysis?
- Provides insight into code without executing it
- Executes quickly in comparison with dynamic analysis
- Can automate code quality maintenance
- Can automate the search for bugs at an early stage (although it will not catch all of them)
- Can automate the finding of security problems at an early stage
- You are probably already using it: most IDEs ship with static analyzers (PyCharm, for example, runs pep8 checks)
What types of static analysis exist?
- Code styling analysis
- Security linting
- Error detection
- UML diagram creation
- Duplicate code detection
- Complexity analysis
- Comment styling analysis
- Unused code detection
Let's move on to the tools that exist in the Python ecosystem for static analysis:
1. Pylint
Well known in the community, Pylint is number one on my list. It covers everything from coding standards to error detection, and it helps with refactoring by detecting duplicate or unused code.
Pylint is overly pedantic out of the box and benefits from a small configuration effort. It is fully customizable through a .pylintrc file, where you choose which errors and conventions are relevant to you.
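Besides the .pylintrc file, Pylint also honours inline comments of the form `# pylint: disable=<message-name>`, which is handy for one-off exceptions. A small sketch follows; the function is invented for illustration, and whether Pylint actually flags the short name `mo` depends on your Pylint version and naming settings.

```python
import re


def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    # Silence the naming check for this one line only.
    mo = re.search(r"\d+", text)  # pylint: disable=invalid-name
    return mo.group(0) if mo else None


print(first_number("order 42 shipped"))
```

An inline comment keeps the exception next to the code it applies to, while project-wide decisions belong in the configuration file.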
On Ubuntu, you can install and run Pylint with:
$ sudo apt-get install pylint
$ pylint <file/dir> --rcfile=<.pylintrc>
Running Pylint on a piece of code will result in something like this (which will be followed by some statistics):
$ pylint app.py
C:122, 4: Missing method docstring (missing-docstring)
R:136, 0: Too many instance attributes (9/7) (too-many-instance-attributes)
R:217, 4: Too many local variables (23/15) (too-many-locals)
C:345,16: Variable name "mo" doesn't conform to snake_case naming style (invalid-name)
R:304, 8: Too many nested blocks (6/5) (too-many-nested-blocks)
C:377,24: Variable name "mo" doesn't conform to snake_case naming style (invalid-name)
W:403,34: Access to a protected member _payload of a client class (protected-access)
R:304, 8: Too many nested blocks (6/5) (too-many-nested-blocks)
C:405,28: Variable name "mo" doesn't conform to snake_case naming style (invalid-name)
W:408,32: Access to a protected member _payload of a client class (protected-access)
R:304, 8: Too many nested blocks (6/5) (too-many-nested-blocks)
W:268,16: Unused variable 'msg' (unused-variable)
R:217, 4: Too many return statements (7/6) (too-many-return-statements)
R:217, 4: Too many branches (58/12) (too-many-branches)
R:217, 4: Too many statements (160/50) (too-many-statements)
Note that Pylint prefixes each of the problem areas with an R, C, W, E, or F, meaning:
- [R]efactor for a “good practice” metric violation
- [C]onvention for a coding standard violation
- [W]arning for stylistic problems or minor programming issues
- [E]rror for important programming issues (i.e. most probably a bug)
- [F]atal for errors which prevented further processing
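If you want a quick tally per category, the report is easy to post-process. The sketch below assumes the line format shown in the sample output above (newer Pylint versions print a different layout); LINE_RE, CATEGORIES and summarize are names made up for this example.

```python
import re
from collections import Counter

# Matches lines such as: C:122, 4: Missing method docstring (missing-docstring)
LINE_RE = re.compile(r"^([RCWEF]):\s*(\d+),\s*(\d+): (.*) \(([\w-]+)\)$")

CATEGORIES = {
    "R": "refactor",
    "C": "convention",
    "W": "warning",
    "E": "error",
    "F": "fatal",
}


def summarize(report):
    """Count report lines per category, ignoring lines that do not match."""
    counts = Counter()
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[CATEGORIES[match.group(1)]] += 1
    return counts


SAMPLE = """\
C:122, 4: Missing method docstring (missing-docstring)
R:136, 0: Too many instance attributes (9/7) (too-many-instance-attributes)
W:268,16: Unused variable 'msg' (unused-variable)
C:345,16: Variable name "mo" doesn't conform to snake_case naming style (invalid-name)
"""

print(summarize(SAMPLE))
```

A tally like this can help decide which category to tackle first, or which messages to disable in the configuration.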
Regarding the coding style, Pylint follows the PEP8 style guide.
Pylint comes with Pyreverse, which creates UML diagrams for your code. You can automate Pylint with Apycot, Hudson or Jenkins; it also integrates with multiple editors.
You can also write small plugins to add your own features.
2. Pyflakes
Pyflakes is a similar tool with a different philosophy: it "makes a simple promise: it will never complain about style, and it will try very, very hard to never emit false positives". This means that Pyflakes will not tell you about missing docstrings or argument names that do not match the naming style. It focuses on logical code issues and potential errors.
Pyflakes only examines the syntax tree of each file individually. This, combined with a limited set of checks, makes it faster than Pylint. On the other hand, Pyflakes is more limited in what it can check.
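To illustrate what a per-file, syntax-tree-only check looks like, here is a rough sketch that reports imports that are never referenced. It is not Pyflakes' actual implementation, only the general idea; unused_imports and the sample source are hypothetical.

```python
import ast


def unused_imports(source):
    """Return sorted (line, name) pairs for imported names never referenced."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line number of the import
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the name "os"
                name = (alias.asname or alias.name).split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported[alias.asname or alias.name] = node.lineno
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted((line, name) for name, line in imported.items()
                  if name not in used)


SAMPLE = '''\
import os
import sys
from json import dumps

print(sys.argv)
'''

print(unused_imports(SAMPLE))
```

Because the check never looks outside the file being parsed, it stays fast, but for the same reason it cannot reason about what other modules do with these names.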
It can be installed with:
$ pip install --upgrade pyflakes
Although Pyflakes does not do any stylistic checks, another tool combines Pyflakes with PEP 8 style checks: flake8. In addition to combining pyflakes and pep8, flake8 adds per-project configuration options.
3. Mypy
Mypy is a static type checker for Python.
The requirement is that your code is annotated using the Python 3 function annotation syntax (PEP 484). Mypy can then type check your code and find common bugs. Its purpose is to combine the advantages of dynamic and static typing (using the typing module).
Installation and usage are similar to the other tools.
Although mypy is still under development, it already supports a significant subset of Python features.
Type declarations act as machine-checked documentation, and static typing makes your code clearer and easier to modify without introducing errors.
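As a small illustration of annotated code, consider the sketch below. The functions and data are made up; the commented-out calls show the kind of mistake a type checker such as mypy is expected to reject, although the exact wording of its messages is not reproduced here.

```python
from typing import Dict, Optional


def find_age(ages: Dict[str, int], name: str) -> Optional[int]:
    """Return the age stored for name, or None if the name is unknown."""
    return ages.get(name)


def next_birthday(age: int) -> int:
    """Return the age after the next birthday."""
    return age + 1


ages = {"alice": 30}
age = find_age(ages, "alice")
if age is not None:  # the None case must be handled before using the value
    print(next_birthday(age))

# Calls a type checker should complain about (left commented out on purpose):
# next_birthday(find_age(ages, "bob"))  # the argument may be None
# next_birthday("30")                   # a str is passed where int is declared
```

The annotations document the contract of each function, and the checker verifies that callers respect it before the code ever runs.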
4. Prospector (prospector.landscape.io)
Prospector is a powerful static analysis tool that analyzes Python code and reports errors, potential issues, convention violations and complexity. It includes:
- PyLint - Code quality/Error detection/Duplicate code detection
- pep8.py - PEP8 code quality
- pep257.py - PEP 257 docstring quality
- pyflakes - Error detection
- mccabe - Cyclomatic Complexity Analyser
- dodgy - secrets leak detection
- pyroma - setup.py validator
- vulture - unused code detection
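To give a feel for what a complexity checker such as mccabe measures, here is a simplified sketch that counts branch points per function with the ast module. It is an approximation for illustration only, not mccabe's algorithm; BRANCH_NODES and complexity are invented names.

```python
import ast

# Node types treated as one extra path through the function (a simplification).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def complexity(source):
    """Map each function name to 1 plus the number of branch points in it."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1
            # Note: branches of nested functions are counted in the outer one too.
            for child in ast.walk(node):
                if isinstance(child, BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # "a or b or c" adds two extra paths
                    score += len(child.values) - 1
            scores[node.name] = score
    return scores


SAMPLE = '''
def classify(n):
    if n < 0 or n > 100:
        return "out of range"
    for d in (2, 3):
        if n % d == 0:
            return "divisible"
    return "other"
'''

print(complexity(SAMPLE))
```

A function whose score keeps growing is usually a candidate for splitting into smaller pieces.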
Most of the warnings coming from tools such as pylint, pep8 or pyflakes are likely to be a bit picky. There are warnings about line lengths, warnings about spaces on empty lines, warnings about how much space there is between methods in your class, etc. However, what you actually need is a list of actual problems in your code.
For this reason, prospector has a number of settings and default behaviors that suppress the pickier warnings and report only what is important.
It's my favorite tool for static analysis because of its power, team-level customization and ease of use.
$ pip install prospector
You can customize severity, ignore some errors and enable/disable tools by providing the .prospector.yml file. For example:
strictness: medium
test-warnings: true
doc-warnings: true
autodetect: false
max-line-length: 120

pep8:
  full: true
  disable:
    - N803 # argument name should be lowercase
    - N806 # variable in function should be lowercase
    - N812 # lowercase imported as non lowercase

pylint:
  run: true
  disable:
    - too-many-locals
    - arguments-differ
    - no-else-return
    - inconsistent-return-statements

pep257:
  run: true
  disable:
    - D203 # 1 blank line required before class docstring
    - D212 # Multi-line docstring summary should start at the first line
    - D213 # Multi-line docstring summary should start at the second line
5. Bandit (github.com)
Bandit is a static analysis tool designed to find common security issues in Python code. It can detect:
- Hardcoded passwords
- Unsafe pickle serialization/deserialization
- Shell injections
- SQL injections
Test results:
>> Issue: [B307:blacklist] Use of possibly insecure function - consider using safer ast.literal_eval.
   Severity: Medium   Confidence: High
   Location: test.py:3
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b307-eval
print(eval("1+1"))
--------------------------------------------------
Code scanned:
    Total lines of code: 2
    Total lines skipped (#nosec): 0
Run metrics:
    Total issues (by severity):
        Undefined: 0.0
        Low: 0.0
        Medium: 1.0
        High: 0.0
    Total issues (by confidence):
        Undefined: 0.0
        Low: 0.0
        Medium: 0.0
        High: 1.0
Files skipped (0):
Let's look specifically at the Test results section. We see an issue labeled B307 and named blacklist. The message usually tells us what the specific issue is and suggests a potential fix. Here, blacklist means that eval is on Bandit's list of calls it considers insecure (as it should be).
After that message, we are given information about:
- How severe the issue is - Medium in this case
- How confident Bandit is that there's a problem - High
- Where the issue is - in test.py on line 3
- And the code in question, complete with line numbers.
It's pretty straightforward and easy to use.
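The fix Bandit suggests in the report, ast.literal_eval, can be sketched as follows. The parse_literal wrapper and the inputs are made up for illustration. Note that literal_eval is not a drop-in replacement for eval: it accepts only Python literals (numbers, strings, lists, dicts and the like) and rejects anything else.

```python
import ast


def parse_literal(text):
    """Parse a Python literal from text without executing any code."""
    return ast.literal_eval(text)


# A plain literal is parsed into the corresponding Python object.
print(parse_literal("[1, 2, 3]"))

# An expression that would run code under eval() is rejected instead.
try:
    parse_literal("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```

If the input really needs to be an arbitrary expression, a dedicated parser is usually a safer choice than eval.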
In conclusion, the time spent on static analysis pays off for you and your team: less time hunting for errors, less time explaining code to project newcomers, lower project cost, and so on. Setting it up in advance may feel like time not spent on features, but the investment will pay you back later.