Python Style Guide
---------------------

.. Documentation is even more valuable in Data Science as compared to CS in general as your code is expected to be a self-guided presentation of your analysis. To make a DS impact, it’s not sufficient to be correct: you must also be persuasive!

.. warning:: Deviations from this format will yield penalties on any submitted Python code.

Python comes with its own style recommendations, `PEP 8 `_; we prune that guide down to the themes relevant to this course. This guide is also influenced by `Google’s style guide `_.

Function and Variable Names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Function and variable names should be all lowercase. Separate distinct words with underscores to improve readability. Use brief, simple language to name variables: the names are themselves documentation. Pro tip: give corresponding variables names of identical length for extra readability::

    # poor form
    FirstGuysScoreInTooLongVariableName += 1
    y += 1

    # same functionality, but documented properly:
    score_player0 += 1
    score_player1 += 1

Function Docstrings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Your functions should contain a docstring (the multi-line string which begins and ends with triple quotes in the example below) which consists of:

- a single line which summarizes what the function does

  - (e.g. "computes the greatest ...")

- (optional) a longer description of the function

  - (e.g. "the greatest common divisor ...")

- a list and description of all arguments (i.e. inputs) if any
- a list and description of all returned variables (i.e. outputs) if any

For example::

    def get_gcd(x, y):
        """ computes the greatest common divisor of two ints

        the greatest common divisor of two values is the biggest integer
        which evenly divides both values.  For example, the gcd of 12 and
        60 is 12.

        Args:
            x (int): input integer
            y (int): input integer

        Returns:
            gcd (int): gcd of x, y
        """

Class Docstrings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Class names should use `CamelCase `_.
Give a brief, high-level description of the class and list its ``Attributes`` (the properties every instance of the class has) just as one would list the inputs / outputs of a function. Each method, being a function, should follow all the function documentation rules above. Use ``self`` as the first argument of every method (it refers to the instance whose method has been called).

::

    class BankAccount:
        """ tracks balance & ownership of a bank account

        Attributes:
            owner (str): name of the bank account owner
            balance (float): how much money is in the account
            is_open (bool): true if bank account is open.  (false when
                bank account closed)
        """
        def __init__(self, owner, balance=0):
            self.owner = owner
            self.balance = balance
            self.is_open = True

        def change_balance(self, diff):
            """ credits (or debits if diff is negative) account

            Args:
                diff (float): amount account changes by
            """
            assert self.balance + diff >= 0, 'overdraft'
            self.balance = self.balance + diff

Comments and whitespace (within Python code)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Separate your code into “chunks”, each of which performs a single task, with a blank line between chunks. Label each chunk with a short comment which describes what it does. Ideally, these comments serve as labels which let a reader quickly find the lines of code which perform the particular task they're interested in.

Consider the following function. Taken out of context, it is tough to understand why it does what it does, but the chunking and comments (hopefully) provide an easy on-ramp for readers to begin learning about it.
Notice how critical the documentation becomes when you're tossed into this function without proper context, as one often is when writing software in a team::

    import numpy as np


    def snip_trial(df_mode, trial_len, feat_list, start_stamp=None,
                   start_idx=None):
        """ extracts a single trial from a dataframe

        Args:
            df_mode (pd.DataFrame): dataframe, contains timestamp and trial
                data
            trial_len (int): number of samples in trial
            feat_list (list): columns of dataframe which make up trial data
            start_stamp (float): timestamp @ start of trial (inclusive)
            start_idx (int): index of start of trial (inclusive) in df_mode

        Returns:
            trial (np.array): (trial_len, len(feat_list)) trial data
        """
        # check that exactly one of start_stamp, start_idx is passed
        assert (start_stamp is None) != (start_idx is None)

        # get start_idx from start_stamp
        if start_idx is None:
            timestamp = df_mode['timestamp'].to_numpy()
            start_idx = np.searchsorted(timestamp, v=start_stamp, side='left')
            assert start_idx.size == 1, 'non unique start'

        # extract trial (in time)
        stop_idx = int(start_idx + trial_len)
        trial = df_mode.iloc[start_idx:stop_idx, :]

        # extract trial (just relevant features) and cast to array
        trial = trial.loc[:, feat_list].to_numpy()

        # check that trial has proper shape
        if trial.shape[0] != trial_len:
            raise IOError('data stream ends before trial')

        return trial

Jupyter Notebook Style Notes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Your Jupyter Notebook should be shared either with all output cleared or with results which are consistent with a fresh “Kernel -> Restart & Run All Cells”. To do otherwise is clumsy and could be considered misleading in professional contexts.
* Use cells to chunk your program into pieces which perform a similar function.
* Suppress all output which you do not want to draw the reader’s attention to. (A semicolon at the end of a cell's last line will prevent Jupyter from parroting that line's value in ``Out[]``.)
* Markdown provides you a chance to talk to your reader as they move through your analysis. Use it.
  Having clear language (and crisp visuals) goes a long way towards teaching the reader just what you’ve accomplished. Be as clear and brief as possible.

Odds and ends
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Don't do too much on one line::

    # poor form
    import numpy as np; import sklearn as skl; import pandas as pd;

* Use single or double quotes for all strings, but don't mix them in the same file::

    # preferred (if used consistently throughout)
    string0 = 'this is how Prof Higger does it'

    # acceptable (if used consistently throughout code)
    string1 = "I feel like such a rebel"

    # don't mix and match
    string2 = 'sometimes you feel like a nut'
    string3 = "sometimes you dont"
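
The conventions above can be pulled together in one short, runnable sketch. It fills in a body for the ``get_gcd`` docstring example using Euclid's algorithm. That choice of algorithm, the input check, and the names ``gcd_small`` and ``gcd_large`` are assumptions made for illustration and are not prescribed by this guide.

```python
def get_gcd(x, y):
    """ computes the greatest common divisor of two ints

    uses Euclid's algorithm (an illustrative choice, not a course
    requirement)

    Args:
        x (int): input integer
        y (int): input integer

    Returns:
        gcd (int): gcd of x, y
    """
    # check inputs
    assert isinstance(x, int) and isinstance(y, int), 'inputs must be ints'

    # work with magnitudes so the result is never negative
    x, y = abs(x), abs(y)

    # Euclid's algorithm: swap in (y, x mod y) until the remainder is 0
    while y:
        x, y = y, x % y

    return x


# corresponding variables get names of identical length
gcd_small = get_gcd(12, 60)
gcd_large = get_gcd(1071, 462)
print(gcd_small, gcd_large)
```

Note how each chunk is one task with a one-line label, names are lowercase with underscores, and every string uses single quotes.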