TLDR
Here are a few practices to adopt in your Python projects to increase robustness and quality:
- Type hinting
- Pure functions
- Leaving __init__.py alone
- Pylinting
- Using data classes
- Migrating to pipenv
- Following accepted organizational standards
It might seem pointless at first, but these practices will save you time and time again.
Type Hinting
Type hinting makes our code more readable, more self-documenting, and less error-prone out of the box.
PEP 484 defines a way to optionally declare expected types, similar to a statically typed language. This might seem odd at first since Python is a dynamically typed language, but the goal is to get the best of both worlds. Here are two examples:
Variable assignment
user_name: str = "Sam"
Function
def say_hello(name: str) -> str:
    return "Hello, " + name + ". How are you?"
Always use a specific static type and avoid dynamic escape hatches such as Any. In other words, avoid this:
from typing import Any
name: Any = "Sam"
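To illustrate why Any is worth avoiding, here is a minimal sketch. The function names shout_any and shout are hypothetical, made up for this example. Because value is annotated as Any in the first function, a type checker has nothing to check a call against, so the mistake only surfaces at runtime.

```python
from typing import Any


def shout_any(value: Any) -> str:
    # Any switches type checking off for `value`, so nothing stops a caller
    # from passing an int here.
    return value.upper()


def shout(value: str) -> str:
    # With a concrete type, a checker such as mypy can reject shout(42)
    # before the code ever runs.
    return value.upper()


print(shout("sam"))

try:
    shout_any(42)  # not flagged statically; fails only when executed
except AttributeError as error:
    print("runtime failure:", error)
```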
Mypy is a great static type checker that will flag any incorrect type usage in your code. mypy.ini is its configuration file, usually kept at the project root. Here is a simple setup:
#http://mypy.readthedocs.io/en/latest/config_file.html#config-file
[mypy]
warn_redundant_casts = True
warn_unused_ignores = True
[mypy-*]
disallow_untyped_calls = True
disallow_untyped_defs = True
check_untyped_defs = True
warn_return_any = True
no_implicit_optional = True
strict_optional = True
ignore_missing_imports = True
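As a rough illustration of what the settings above ask for (the function greet is a hypothetical example): with disallow_untyped_defs every function needs annotations, and with no_implicit_optional a parameter that defaults to None has to be declared Optional explicitly.

```python
from typing import Optional


def greet(name: Optional[str] = None) -> str:
    # Writing `name: str = None` would be rejected under no_implicit_optional;
    # the Optional has to be spelled out.
    if name is None:
        # strict_optional makes the checker insist on this None check before
        # `name` is used as a str.
        return "Hello, stranger."
    return "Hello, " + name + "."


print(greet())
print(greet("Sam"))
```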
Pure Functions
What is a pure function? Do we need to only use pure functions?
A pure function is a function whose result depends only on its arguments: it does not read or modify global variables, and it has no side effects. This might seem restrictive, but writing code in this way has huge benefits.
Avoid global variables. Why? Global variables make your code less readable. It is also extremely hard to determine which code uses which global variable, and that makes your code brittle. By avoiding global variables we follow the principle of encapsulation. For a review of the SOLID principles, this video is a great resource: S.O.L.I.D
Side effects are the things our code does besides producing a return value, such as input/output. Side effects should be kept at the edges of your codebase, and the vast majority of your functions should not have any. By avoiding side effects, we keep our logic testable, composable, and pure.
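A minimal sketch of that separation, with made-up function names and numbers: the two calculation functions are pure, and the only side effect (printing) lives in main at the edge of the program.

```python
def apply_discount(price: float, rate: float) -> float:
    # Pure: the result depends only on the arguments, nothing else is touched.
    return round(price * (1 - rate), 2)


def format_receipt(price: float, rate: float) -> str:
    # Still pure: builds a string instead of printing it.
    return f"Total after discount: {apply_discount(price, rate):.2f}"


def main() -> None:
    # The edge of the program: the only place where output happens.
    print(format_receipt(100.0, 0.25))


if __name__ == "__main__":
    main()
```

Because apply_discount and format_receipt never touch the outside world, they can be tested with plain assertions and no mocking.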
Immutable variables are your friend
This concept brings to light the bad practice of reusing meaningless variable names. Your variable names should be meaningful, follow PEP 8 naming conventions, and be unique within their scope.
A common example is creating several dataframes in one function and calling all of them df. Not only is the code less readable, it is also more prone to mistakes. If you find your function has too many variables or too much logic, it is probably not following the single responsibility principle (the S in SOLID), i.e. you should break it up into several single-purpose functions.
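Here is a small sketch of the naming point. It uses plain lists of dictionaries rather than real dataframes so that it runs without third-party packages; the data and the function names are made up for illustration.

```python
from typing import Dict, List

Order = Dict[str, float]


def keep_paid(orders: List[Order]) -> List[Order]:
    # One job: filter out unpaid orders, returning a new list.
    return [order for order in orders if order["paid"] > 0]


def total_paid(orders: List[Order]) -> float:
    # One job: add up what was paid.
    return sum(order["paid"] for order in orders)


all_orders = [{"id": 1, "paid": 20.0}, {"id": 2, "paid": 0.0}, {"id": 3, "paid": 5.5}]
paid_orders = keep_paid(all_orders)  # a new, descriptive name instead of a reused `df`
revenue = total_paid(paid_orders)
print(revenue)
```

Each intermediate result keeps its own name and the original list is left untouched, so any step can be inspected on its own.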
Avoid adjusting the Python path
The Python path (sys.path, which can be extended through the PYTHONPATH environment variable) is the list of locations the interpreter searches when it imports modules, and __init__.py files mark directories as packages. When working with a disjointed folder structure it can be tempting to manipulate your path so the interpreter can see all the modules you need. This is almost always a bad idea. It's better to reorganize and follow a correct folder structure, or to publish the shared dependency as a PyPI package.
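For reference, the path in question is visible at runtime as sys.path. The sketch below only inspects it; the commented-out line shows the kind of manipulation to avoid, and the ../shared directory in it is a hypothetical example.

```python
import sys

# sys.path is the list of locations searched, in order, on every import.
for location in sys.path:
    print(location)

# The tempting shortcut, shown only as what not to do:
# sys.path.append("../shared")
```

If the shared code lives in its own installable package, installing it into the environment (for example with pipenv install -e and the path to that package) is one way to make it importable without touching sys.path.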
__init__.py is not the ideal place for code
If you are writing a PyPI package there are legitimate reasons to use __init__.py for Python code.
In all other cases, it's rare to have a good reason to put code in __init__.py. Doing so increases complexity, moves away from accepted norms, and honestly is a huge pet peeve of mine.
Follow Pylint like it’s dogma
Pylint isn’t just a linter; it's a critical tool for learning and enforcing the best practices accepted by the Python community. Your code should score 10/10 every time you open a PR. No exceptions.
That being said, there are times when the PEP standards do not accommodate a specific domain’s conventions. A perfect example is the use of X and y as variable names in machine learning code in the data science community, which breaks the PEP 8 naming conventions. In one-off situations like this, it is perfectly OK to tell Pylint to ignore that rule. Ideally, you have a separate repository for each application, so an exception granted to one small group of code stays contained.
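A minimal sketch of such a scoped exception, using Pylint's inline disable comment for its invalid-name check. The function split_features and its row layout (label in the last column) are hypothetical.

```python
from typing import List, Tuple


def split_features(rows: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
    # pylint: disable=invalid-name
    # The comment above silences the naming check for the rest of this
    # function only, so X and y can keep their conventional ML names.
    X = [row[:-1] for row in rows]
    y = [row[-1] for row in rows]
    return X, y


features, labels = split_features([[1.0, 2.0, 0.0], [3.0, 4.0, 1.0]])
print(features, labels)
```

Keeping the disable comment inside the one function that needs it means the rest of the module is still held to the normal rules.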
I'll conclude with a confession: at first I refused to follow Pylint’s advice, telling myself “it really isn’t that big a deal if I do my own thing”. Many projects later, having worked with a large number of developers, I feel very strongly that I was 100% wrong.
Data Classes
PEP 557 adds support for data classes. Using data classes increases the readability and robustness of your code. When you create a data class, you define a template to which your data must conform. This sets clear expectations and allows granular type checking (the field types are not enforced at runtime, but a static checker such as mypy can verify them). Data class instances are mutable by default; passing frozen=True to the decorator makes them immutable, which is not ideal for all cases. Here's a basic example:
from dataclasses import dataclass
...
@dataclass
class ApiData:
    __slots__ = ['first_name', 'last_name', 'SSN']
    first_name: str
    last_name: str
    SSN: str

customer_prod: ApiData = ApiData(api_response["fname"],
                                 api_response["lname"],
                                 api_response["ussn"])

def process_customer_data(customer: ApiData):
    ...
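On the question of mutability, here is a short sketch with two hypothetical classes, Customer and FrozenCustomer. A regular data class allows field assignment, while one declared with frozen=True raises FrozenInstanceError on any attempt to change a field.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass
class Customer:
    first_name: str
    last_name: str


@dataclass(frozen=True)
class FrozenCustomer:
    first_name: str
    last_name: str


customer = Customer("Sam", "Smith")
customer.first_name = "Alex"  # allowed: regular data classes are mutable

frozen = FrozenCustomer("Sam", "Smith")
try:
    frozen.first_name = "Alex"
except FrozenInstanceError:
    print("frozen instances reject assignment")

# The generated __eq__ compares instances field by field.
print(customer == Customer("Alex", "Smith"))
```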
Pipenv
Pip is a very useful tool for installing Python packages, but on its own it has limitations, including historically weak dependency resolution and no project isolation. Pipenv combines package installation with virtual environment management, so it can replace direct use of pip in your workflow. For legacy systems that require a requirements.txt file, pipenv can be used to create that file more reliably. Here's your basic setup:
Install pipenv with pip (the only thing you should use pip for):
pip install pipenv
Create a Pipenv Environment
mkdir my_project
cd my_project
pipenv --python 3.9.2 install
pipenv lock
Install a package
pipenv install "requests~=1.2"
__main__.py
Your application can have multiple possible entry points or a single one. In the latter case, I prefer to name the module containing the entry point __main__.py. This is straightforward and clear for anyone looking at the project. I also follow a well-accepted standard pattern for this module:
def main():
    ...

if __name__ == "__main__":
    main()
This guard prevents main() from running when the module is imported rather than executed.
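A slightly fuller sketch of the same pattern, assuming a made-up greeting program; build_greeting is a hypothetical helper. The logic lives in a function that can be imported and tested, and main only handles the command line and the output.

```python
import sys
from typing import List


def build_greeting(args: List[str]) -> str:
    # Pure helper: easy to import and test without running the program.
    name = args[0] if args else "world"
    return "Hello, " + name


def main() -> None:
    # Entry point: read the command line and do the I/O.
    print(build_greeting(sys.argv[1:]))


if __name__ == "__main__":
    main()
```

When a file like this is named __main__.py and placed inside a package, running python -m followed by the package name executes it.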
These are a few tried-and-true tricks that I have used over the years to improve the quality of my Python projects. Give all (or a few) of these a try and let me know if you find them useful. I have a feeling you will!