Python Style Guide

Warning

Deviations from this format will yield penalties on any submitted Python code.

Python comes with its own style recommendations, PEP 8; this guide prunes them down to the themes relevant to this course. It is also influenced by Google’s style guide.

Function and Variable Names

Function and variable names should be all lowercase. Separate distinct words with underscores to improve readability. Use brief, simple language to name variables, since the names themselves serve as documentation. Pro tip: give corresponding variables names of identical length for extra readability:

# poor form
FirstGuysScoreInTooLongVariableName += 1
y += 1

# same functionality, but documented properly:
score_player0 += 1
score_player1 += 1
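
The same conventions apply to function and argument names. The sketch below is illustrative only: `add_point` and `winner_idx` are hypothetical names, not part of the course material.

```python
def add_point(score_player0, score_player1, winner_idx):
    """ adds one point to the score of the winning player

    Args:
        score_player0 (int): current score of player 0
        score_player1 (int): current score of player 1
        winner_idx (int): 0 if player 0 won the point, 1 otherwise

    Returns:
        score_player0 (int): updated score of player 0
        score_player1 (int): updated score of player 1
    """
    if winner_idx == 0:
        score_player0 += 1
    else:
        score_player1 += 1

    return score_player0, score_player1


# lowercase, underscore-separated names of matching length
score_player0, score_player1 = add_point(0, 0, winner_idx=1)
```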

Function Docstrings

Your functions should contain a docstring (the multi-line string, delimited by triple quotes, shown below) which consists of:

- a single line which summarizes what the function does (e.g. “computes the greatest …”)
- (optional) a longer description of the function (e.g. “the greatest common divisor …”)
- a list and description of all arguments (i.e. inputs), if any
- a list and description of all returned variables (i.e. outputs), if any

For example:

def get_gcd(x, y):
    """ computes the greatest common divisor of two ints

    the greatest common divisor of two values is the biggest integer
    which evenly divides both values.  For example, the gcd of 12
    and 60 is 12.

    Args:
        x (int): input integer
        y (int): input integer

    Returns:
        gcd (int): gcd of x, y
    """

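The example above shows only the docstring. The sketch below fills in a body so the function can be run, assuming Euclid's algorithm (the guide does not say how `get_gcd` is implemented), and shows that Python stores the docstring on the function itself, which is what `help(get_gcd)` prints.

```python
def get_gcd(x, y):
    """ computes the greatest common divisor of two ints

    Args:
        x (int): input integer
        y (int): input integer

    Returns:
        gcd (int): gcd of x, y
    """
    # Euclid's algorithm (an assumed implementation, not from the guide)
    while y != 0:
        x, y = y, x % y

    return abs(x)


# the docstring is available at runtime via the __doc__ attribute
summary_line = get_gcd.__doc__.strip().splitlines()[0]
print(summary_line)

# matches the example given in the docstring text above
print(get_gcd(12, 60))
```
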
Class Docstrings

Class names should use CamelCase. Give a brief, high-level description of the class and list its Attributes (the properties every instance of the object has) just like one would list the inputs / outputs of a function. Each method, being a function, should follow all function documentation rules as above. Use self as the first argument of every method (referring to the instance of the object whose method has been called).

class BankAccount:
    """ tracks balance & ownership of a bank account

    Attributes:
        owner (str): name of the bank account owner
        balance (float): how much money is in the account
        is_open (bool): true if bank account is open.  (false when bank
            account closed)
    """

    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance
        self.is_open = True

    def change_balance(self, diff):
        """ credits (or debits if diff is negative) account

        Args:
            diff (float): amount account changes by
        """
        assert self.balance + diff >= 0, 'overdraft'

        self.balance = self.balance + diff
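
A short usage sketch may help show how `self` is supplied automatically when a method is called on an instance. The class is repeated in condensed form so the block runs on its own; the owner name and amounts are made up for illustration.

```python
class BankAccount:
    """ tracks balance & ownership of a bank account (condensed copy) """

    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance
        self.is_open = True

    def change_balance(self, diff):
        """ credits (or debits if diff is negative) account

        Args:
            diff (float): amount account changes by
        """
        assert self.balance + diff >= 0, 'overdraft'

        self.balance = self.balance + diff


# self is never passed explicitly: only owner and balance are given
account = BankAccount(owner='ada', balance=100)

# credit, then debit
account.change_balance(50)
account.change_balance(-30)

# a debit larger than the balance trips the assertion
try:
    account.change_balance(-1000)
    overdraft_caught = False
except AssertionError:
    overdraft_caught = True
```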

Comments and whitespace (within Python code)

Separate your code into “chunks”, each of which performs a single task, with a blank line between chunks. Label each chunk with a short comment which describes what it does. Ideally, these comments serve as labels which let a reader quickly find the lines of code that perform the task they’re interested in.

Consider the following function. Taken out of context, we expect readers to have a tough time understanding why it does what it does, but the chunking and comments (hopefully) provide an easy on-ramp for readers to begin learning about it. Notice how critical the documentation becomes when you’re tossed into this function without proper context, as one often is when writing software in a team:

import numpy as np


def snip_trial(df_mode, trial_len, feat_list, start_stamp=None, start_idx=None):
    """ extracts a single trial from a dataframe

    Args:
        df_mode (pd.DataFrame): dataframe, contains timestamp and trial data
        trial_len (int): number of samples in trial
        feat_list (list): columns of dataframe which make up trial data
        start_stamp (float): timestamp @ start of trial (inclusive)
        start_idx (int): index of start of trial (inclusive) in df_mode

    Returns:
        trial (np.array): (trial_len, len(feat_list)) trial data
    """
    # check that exactly one of start_stamp, start_idx is passed
    assert (start_stamp is None) != (start_idx is None)

    # get start_idx from start_stamp (a plain int start_idx has no .size,
    # so the uniqueness check only applies to the searchsorted result)
    if start_idx is None:
        timestamp = df_mode['timestamp'].to_numpy()
        start_idx = np.searchsorted(timestamp, v=start_stamp, side='left')
        assert start_idx.size == 1, 'non unique start'

    # extract trial (in time)
    stop_idx = int(start_idx + trial_len)
    trial = df_mode.iloc[start_idx: stop_idx, :]

    # extract trial (just relevant features) and cast to array
    trial = trial.loc[:, feat_list].to_numpy()

    # check that trial has proper shape
    if trial.shape[0] != trial_len:
        raise IOError('data stream ends before trial')

    return trial
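
The first two chunks of the function can also be looked at in isolation. The sketch below is a dependency-free approximation: `resolve_start` is a hypothetical helper, and the standard library's `bisect_left` stands in for `np.searchsorted(..., side='left')` on a sorted list of timestamps.

```python
from bisect import bisect_left


def resolve_start(timestamps, start_stamp=None, start_idx=None):
    """ finds the index at which a trial starts

    Args:
        timestamps (list): sorted timestamps, one per sample
        start_stamp (float): timestamp @ start of trial (inclusive)
        start_idx (int): index of start of trial (inclusive)

    Returns:
        start_idx (int): index of first sample in trial
    """
    # check that exactly one of start_stamp, start_idx is passed
    assert (start_stamp is None) != (start_idx is None)

    # get start_idx from start_stamp (first timestamp >= start_stamp)
    if start_idx is None:
        start_idx = bisect_left(timestamps, start_stamp)

    return start_idx


# made-up timestamps, half a second apart
timestamps = [0.0, 0.5, 1.0, 1.5]
idx_from_stamp = resolve_start(timestamps, start_stamp=1.0)
idx_from_index = resolve_start(timestamps, start_idx=2)
```

Comparing the two `is None` results with `!=` acts as an exclusive or: the assertion fails both when neither argument is given and when both are.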

Jupyter Notebook Style Notes

- Your Jupyter Notebook should be shared empty or with results which are consistent with a fresh “Kernel -> Restart & Run All Cells”. To do otherwise is clumsy and could be considered misleading in professional contexts.
- Use cells to chunk your program into pieces which perform a similar function.
- Suppress all output which you do not want to draw the reader’s attention to. (A semicolon on the last line will prevent Jupyter from parroting the last line in Out[]).
- Markdown gives you a chance to talk to your reader as they move through your analysis. Use it. Having clear language (and crisp visuals) goes a long way towards teaching the reader just what you’ve accomplished. Be as clear and brief as possible.

Odds and ends

Don’t do too much on one line:

import numpy as np; import sklearn as skl; import pandas as pd;
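
For contrast, here is a sketch of the same idea with one statement per line. Standard-library modules stand in for the packages above so that the block runs without extra installs; that substitution, and the variable names, are assumptions made for illustration.

```python
# one import per line: each is easy to spot, reorder or remove
import math
import statistics

# one statement per line applies to ordinary code as well
score_list = [1, 2, 3, 4]
score_mean = statistics.mean(score_list)
score_root = math.sqrt(16)
```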

Use either single or double quotes for all strings, but don’t mix them in the same file:

# preferred (if used consistently throughout code)
string0 = 'this is how Prof Higger does it'

# acceptable (if used consistently throughout code)
string1 = "I feel like such a rebel"

# don't mix and match
string2 = 'sometimes you feel like a nut'
string3 = "sometimes you don't"