homework 7
attention neural networks
attention and the transformer
Summarization is an important task in natural language processing and can be useful for a consumer enterprise. For example, bots can scrape articles and summarize them, and sentiment analysis can then identify the sentiment about certain stocks. By completing this assignment you will learn to:
- Use built-in functions to preprocess your data
- Implement DotProductAttention
- Implement Causal Attention
- Understand how attention works
- Build the transformer model
- Evaluate your model
- Summarize an article
This model is slightly different from the ones you have already implemented. It relies heavily on attention and does not process its input one step at a time the way recurrent models do, which allows for parallel computation.
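The parallelism comes from computing attention for every position at once with a few matrix products, rather than with a step-by-step recurrence. The following is a minimal NumPy sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k) + mask) V, with an optional causal mask. It illustrates the general technique only and is not the TensorFlow function signature that assignment7.py expects; the names dot_product_attention and causal_mask are hypothetical, and masking blocked positions with a large negative value (-1e9) is one common convention assumed here.

```python
import numpy as np


def dot_product_attention(q, k, v, mask=None):
    """Scaled dot-product attention (illustrative sketch, hypothetical name).

    q: (..., L_q, d_k), k: (..., L_k, d_k), v: (..., L_k, d_v)
    mask: boolean array broadcastable to (..., L_q, L_k); True means "may attend".
    Returns (output, weights) with shapes (..., L_q, d_v) and (..., L_q, L_k).
    """
    d_k = q.shape[-1]
    # Similarity of every query with every key, computed in one matrix product.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Assumed convention: blocked positions get a large negative score,
        # so their softmax weight becomes (numerically) zero.
        scores = np.where(mask, scores, -1e9)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def causal_mask(length):
    """Lower-triangular mask: position i may attend only to positions 0..i."""
    return np.tril(np.ones((length, length), dtype=bool))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional embeddings
    out, w = dot_product_attention(x, x, x, causal_mask(4))
    print(np.round(w, 3))  # upper triangle is zero under the causal mask
```

With the causal mask, row i of the weight matrix is nonzero only in columns 0 through i, so the first token can attend only to itself and its output equals its own value vector. This is what prevents a summarizer from "peeking" at future tokens while still computing all positions in parallel.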

data and starter kits
The Python code you will need to modify and turn in is assignment7.py. You will be modifying several functions, each of the form:
def some_function(argument):
    '''Description of arguments and return values
    '''
    ### START CODE HERE ###
    return_values = "This is where you will add or edit the code"
    ### END CODE HERE ###
    return return_values
You will need to edit the code between ### START CODE HERE ### and ### END CODE HERE ###.
In addition, you will need the data. There are several ways to download it onto your workstation. On GCP VMs, you can use the bash command wget. To download inside a notebook, run each of the following commands as a shell command by prefixing it with !. The full package can be downloaded with:
wget -nc https://course.ccs.neu.edu/cs6120s25/assets/python/assignment7.py
wget -nc https://course.ccs.neu.edu/cs6120s25/data/samsum/utils.py
wget -nc https://course.ccs.neu.edu/cs6120s25/assets/python/assignment7_unittests.py
wget -nc https://course.ccs.neu.edu/cs6120s25/data/samsum/corpus.tar
tar -xvf corpus.tar
pip install dlai_grader
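If wget is not available (for example, on a local machine without it), the same downloads can be sketched in plain Python with only the standard library. This is a minimal sketch, assuming the four URLs listed above: fetch and fetch_all are hypothetical helper names that are not part of the assignment files, existing files are skipped to mimic wget -nc, and the pip install dlai_grader step still has to be run separately.

```python
import os
import tarfile
import urllib.request

# URLs copied from the wget commands above.
BASE = "https://course.ccs.neu.edu/cs6120s25"
URLS = [
    f"{BASE}/assets/python/assignment7.py",
    f"{BASE}/data/samsum/utils.py",
    f"{BASE}/assets/python/assignment7_unittests.py",
    f"{BASE}/data/samsum/corpus.tar",
]


def fetch(url, dest_dir="."):
    """Hypothetical helper: download url into dest_dir, skipping existing files (like wget -nc)."""
    path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


def fetch_all(urls=URLS, dest_dir="."):
    """Hypothetical helper: fetch every URL, then unpack any .tar archive (like tar -xvf)."""
    paths = [fetch(u, dest_dir) for u in urls]
    for p in paths:
        if p.endswith(".tar"):
            with tarfile.open(p) as tar:
                # Use the safer "data" extraction filter where this Python supports it.
                if hasattr(tarfile, "data_filter"):
                    tar.extractall(dest_dir, filter="data")
                else:
                    tar.extractall(dest_dir)
    return paths


if __name__ == "__main__":
    for p in fetch_all():
        print("ready:", p)
```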
This set of files contains:
- Your Code Template: found in assignment7.py, as described above.
- The SAMSum dataset: around 16k conversations paired with human-written summaries, downloaded as corpus.tar. Both the conversations and their summaries were created by linguists to reflect real-life messenger conversations, with varying style, formality, slang, emojis, and general language patterns.
- Loading Scripts: found in utils.py. This file has several functions that we will use to process the data, including splitting it into training and test sets given a folder name and preprocessing it. We will call most of these utils functions from the preprocess_data function.
- Unit Tests: found in assignment7_unittests.py. For every function you’re required to implement in assignment7.py, there is a corresponding *_test() function that runs preliminary dimensionality checks, initial sanity checks, and some unit tests. This function calls out to the unit test file, which has additional parameter checks after TensorFlow initialization. If you are using a TensorFlow version other than v2.18.0, you may not get exactly the same values, so you can either disregard the calls to assignment7_unittest or install v2.18.0.
Beyond the homework-specific files, you can use the following resources. Transformers are compute-heavy, so training is much easier on GPU hardware. You won’t need to train until you’ve completely finished your Python code, so you can save your Google credits until the end. There are several ways to secure GPUs:
- Google Cloud Platform VMs: remember to shut down or delete the VM if using GPUs.
- Google Cloud Workbench JupyterLab: remember to shut down or delete the instance if using GPUs.
- Google Colab: remember to close your session if using the T4 GPU.
For any of the options above, you may have more luck securing a GPU by choosing the more expensive NVIDIA L4 GPU type (just remember to shut it off right after use). You can check whether CUDA and TensorFlow are installed correctly with nvidia-smi and tf.config.list_physical_devices('GPU'). In Colab, your only available option is the T4 GPU. If you hold on to the T4 for too long, though, Google may lock you out of your account for a while, so make sure you close your session. On GPUs, each iteration takes around 20 seconds (except the first, which takes around 40 seconds); on CPUs, each iteration takes around 20 minutes.
Note: all the unit tests in this homework were run with TensorFlow version 2.18.0, and there are minor differences between TensorFlow versions. You can check your version with:
import tensorflow as tf
print(tf.__version__)
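The TensorFlow version check and the GPU check mentioned above can be combined into one small helper. This is a minimal sketch: check_environment is a hypothetical name, it reports only what TensorFlow itself can see (nvidia-smi is still the shell-level check for CUDA drivers), and it returns placeholder values instead of failing when TensorFlow is not installed.

```python
import importlib.util

EXPECTED_TF = "2.18.0"  # version the homework's unit tests were run with


def check_environment(expected=EXPECTED_TF):
    """Hypothetical helper: report the TensorFlow version and the GPUs TensorFlow can see."""
    if importlib.util.find_spec("tensorflow") is None:
        # TensorFlow is not installed in this environment.
        return {"tf_version": None, "matches_expected": False, "gpus": []}
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    return {
        "tf_version": tf.__version__,
        "matches_expected": tf.__version__ == expected,
        "gpus": [d.name for d in gpus],  # an empty list means training will run on CPU
    }


if __name__ == "__main__":
    print(check_environment())
```

An empty "gpus" list on a GPU machine usually points to a CUDA or driver problem worth fixing before training, given the roughly 20 minutes per iteration on CPU.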
submission instructions
Submit your Python solution in assignment7.py to Gradescope.