Small software tips, part 1

2024-08-09

Some small software tips, with a bias towards python and robotics. They are mostly style suggestions for maintainability of code, and not solutions to problems. These are collected from personal experience, but I don't take credit for them. I learned by working with engineers better than myself. When I need an example application, I will consider software for an aerial drone that performs deliveries.

Contents

Saving indents

I like to reduce indent levels where possible. Code that is deeply indented might be an opportunity to refactor and simplify. It is hard to read and keep track of scope in such code, and I have found editing such code to be more error-prone.

def function():
    while condition:
        with resource:
            if predicate:
                # already at 4 levels of indent

Many other tips involve saving indents in some way.

import statements

Ideally all imports appear at the top of a file. A way to group modules is: standard library, external, internal.

import datetime
import itertools

import numpy

import data_types
import perception
import analysis

The imports can be further ordered, say alphabetically, or based on dependency. The last three imports above are from our hypothetical robotics project, and presumably the data_types module is used by perception, whose output is used in analysis.

All imports from a module should appear in a single line. So instead of

from perception.detectors import ObjectDetector
...
from perception.detectors import detect_edges

prefer

from perception.detectors import ObjectDetector, detect_edges

It's good to preserve logical dependency across project modules. For example, we might add a function calculate_area to our analysis module, which another engineer might then want to use in perception. Importing analysis code into perception might work, but it would go against the dependency order stated earlier, and possibly introduce a circular import situation. Instead, we can move calculate_area to perception, or to another module that both depend on, like perception_helpers (sounds silly here but might not in a large codebase). A fallback is to import analysis in perception, but in local scope, which can get around a circular dependency.

# in a file in perception/

def find_landing_zone():
    from analysis import calculate_area

if blocks

In a function, an if block that returns (or otherwise exits by raising an error) doesn't need to be followed by an else block. So instead of

def function():
    if condition:
        # do work
        return
    else:
        # do other work

prefer

def function():
    if condition:
        # do work
        return

    # do other work

Rather than a bunch of nested if statements, I prefer an if-ladder, which is closer to a switch, such as

if condn1:
    return value1
if condn2:
    return value2
...

return fallback_value

which is easier to debug and test. Due to the structure, only one block (the first matching condition) can be hit, and we know that all previous conditions evaluated to False. As an example, suppose our drone has two vision sensors, a lidar generating point clouds, and a stereo camera. When finding a landing zone, we need to check if the data is recent, and prefer using the point cloud if it is of good quality. A hard to read version is

def find_landing_zone():
    """ 
    :return: True if landing zone detected
    """
    if not data_stale():
        if point_cloud_good():
            if find_landing_zone_in_point_cloud():
                return True
            else:
                if find_landing_zone_in_stereo():
                    return True
                else:
                    return False
        else:
            if find_landing_zone_in_stereo():
                return True
            else:
                return False
    else:
        return False

which is the same as a flatter version.

def find_landing_zone():
    """ 
    :return: True if landing zone detected
    """
    if data_stale():
        return False
    if point_cloud_good() and find_landing_zone_in_point_cloud():
        return True
    if find_landing_zone_in_stereo():
        return True
    return False

return statements

Returning early from a function is a simple way to save indents and improve readability. So instead of

def function():
    if condition:
        # lots of work

prefer

def function():
    if not condition:
       return

    # reduced indent level for lots of work

Avoid using tuples to return multiple values. It is a pattern that is hard to extend. As an example, suppose find_landing_zone currently returns a tuple,

def find_landing_zone():
    """
    :return: whether landing zone found, area
    :rtype: (bool, float)
    """

and we want to add a measure of planarity.

def find_landing_zone():
    """
    :return: whether landing zone found, area, planarity
    :rtype: (bool, float, float)
    """

Then every return in the function has to be updated. This includes knowing what to use as a null/ undefined value, say (False, 0, None). If any calling code unpacks the return,

landing_zone_found, area = find_landing_zone()

it needs to be updated as well, otherwise it may lead to an error like ValueError: too many values to unpack .... An alternative is to return a dataclass.

from dataclasses import dataclass
from typing import Optional

@dataclass
class LandingZone:
      area: float = 0
      planarity: Optional[float] = None

def find_landing_zone():
    """
    :return: landing zone, if found
    :rtype: Optional[LandingZone]
    """

New fields can be added to LandingZone along with a default value, and return statements that don't have a planarity will not need an update. A nit is that I changed the convention of indicating that no landing zone was found. Earlier, it was by returning False as the first tuple element, now it is by returning None.

Using None

It is nice to use a clear default value for an argument, like

def algorithm(threshold=0.5):
    pass

but there are situations where using None as the default value of an argument helps. Suppose the drone executes a Trajectory

class Trajectory:
    def __init__(self, waypoints=[]):
        self.waypoints = waypoints

and that waypoints can be appended to as the drone is moving along the trajectory. The empty-list default value is mutable, and can lead to bugs.

>>> traj1 = Trajectory()
>>> traj1.waypoints.append(0.1)
>>> traj1.waypoints.append(0.2)
>>> traj2 = Trajectory()
>>> traj2.waypoints
[0.1, 0.2]

An fix is to use a default value of None.

class Trajectory:
    def __init__(self, waypoints=None):
        self.waypoints = waypoints or []

Another use is if an argument has to be passed down through many function calls. Suppose we add an optional threshold called minimum_landing_area with a default value of 1.5. This requires making sure the default value is the same everywhere, which is hard to maintain.

perform_mission(*args, minimum_landing_area=1.5, **kwargs)
reach_address(*args, minimum_landing_area=1.5, **kwargs)
find_landing_zone(minimum_landing_area=1.5)

An alternative is to use a default value of None

perform_mission(*args, minimum_landing_area=None, **kwargs)
reach_address(*args, minimum_landing_area=None, **kwargs)
find_landing_zone(minimum_landing_area=None)

and then apply the default threshold in find_landing_zone.

def find_landing_zone(minimum_landing_area=None):
    if minimum_landing_area is None:
        minimum_landing_area = 1.5

Avoid overloading the use of None as a return value. For example, the function find_landing_zone may return None on failure to find a suitable location. There may be many reasons: perception data is stale, corrupt, or there is genuinely no location. If the calling code needs to know which of these situations occurred, instead of returning None, we are better off explicitly communicating this information (as a return code or exception).

In customizable software, many components are optional, and their absence is often indicated by None. The price paid is that we have to check for None-ness, a sort of disciplined flexibility. For example, the same drone software might have to run on multiple generations of hardware:

If we want to make use of perception.stereo_camera.surface_normals, we have to account for every field being None. Such None checks are best handled by a few well-tested helper functions.

Another situation in which to check for None is state variables that may be set only after instantiation. For example, perception might passively subscribe to some sensor_data

class Perception:
    def __init__(self):
        self.sensor_data = None

    def receive_sensor_data(self, sensor_data):
        self.sensor_data = sensor_data

and None indicates that no data has been received yet. When there are multiple such state variables, a better approach is to use an explicit flag indicating that state has been initialized.

Avoid unnecessary variables

If a variable is created only to be passed as an argument (to a function or constructor)

def find_landing_zone():
    polygon = create_polygon(edges)
    area = calculate_area(polygon)

prefer

def find_landing_zone():
    area = calculate_area(polygon=create_polygon(edges))

For readability, use keyword arguments and proper formatting (a linter helps).

When calling a function that returns a tuple, avoid naming unused return variables. For readability, a comment can be added explaining why they are unused.

def execute_docking():
    # At this point in the function, we know the drone is
    # above a platform, so finding a landing zone is just a sanity check,
    # and we ignore the area detected.
    landing_zone_found, _ = find_landing_zone()

Avoid duplicate variables that store the same value. Suppose the system works by receiving an order with the address for the drone to go to.

@dataclass
class Order:
    address: str

A Mission object is created for each order. The mission needs the address, but instead of copying it into an instance variable

class Mission:
    def __init__(self, order):
        self.address = order.address

prefer storing the order itself. For convenient access, a property can be added to Mission.

class Mission:
    def __init__(self, order):
        self.order = order

    @property
    def address(self):
        return self.order.address

A duplicate variable is an opportunity for introducing error. There might be code that has access to both the order and mission objects, and now there is no chance of a discrepancy between order.address and mission.address. When storing references, however, something to watch for is memory leaks.

Naming

Choosing good names is important because renaming is hard. Names find a place not only in code but also in people's heads. I once erred on the side of being too descriptive with a name, e.g. landing_zone_minimum_area_perimeter_ratio_threshold. When it became clear that it was cumbersome and needed a refactor, edits were required in multiple locations:

Avoid including implementation details in variable names, such as landing_zones_list, instead preferring landing_zones. An implementation may change.

Remove redundant tokens from names. If each robot has a unique identifier, it can be stored as robot.uid, instead of robot.robot_uid.

When adding a new feature, resist the temptation to choose generic names. If an engineer writes software to detect potential collisions to stay away from, prefer CollisionRegionDetection to SceneUnderstanding. The latter makes it sound like the feature solves all problems. I've seen classes that do something very specific named a generic Action or State. Namespaces help in this regard to contain names.

Avoid playing telephone with the name of a field that travels across software. Consider a field like the address that a drone must travel to to. It could have a path like:

It seems harmless to substitute address with a synonym like destination at some stage, or abbreviate to addr, but it is best to keep using address. If the name has to be changed (because it is already in use), it can be done by a helper function, and unit tested.

Unrelated to naming, but a similar tip is to avoid mangling the order of arguments when calling a constructor or function, and use the order in the definition.

Standalone functions

I like to keep in mind three scenarios of calling a function.

I prefer standalone functions over instance methods for satisfying these scenarios. Instance methods have easy access to object state, but for this same reason, they can be harder to stand up in unit tests. This is especially true for objects with heavy state, like a high-level mission class.

Python's first-class support for functions means that we can write flexible software with standalone functions. A tip therefore is to pull out instance methods and convert them to standalone functions if possible.

Similarly, avoid defining large functions inside another function. I have found that these can be sloppily implemented. Moving the nested function out to where it is clearly visible forces a cleaner implementation and better documentation.