{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## DS4420: In-class exercise on clustering (`BERT` representations of) Trump tweets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I downloaded all of Trump's tweets from the past year and ran them through a pre-trained `BERT` model. (Don't worry; we will cover this later in the semester. For now it suffices to know that `BERT` is a large neural network that has been 'pre-trained' on a very large dataset and provides useful feature vectors for texts.) This means we have a 768-dimensional vector representation of each tweet." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"| | Unnamed: 0 | tweets | vectors |\n",
"|---|---|---|---|\n",
"| 0 | 0 | Always heard that as President “it’s all about... | [-0.6935118436813354, -0.3402666747570038, 0.7... |\n",
"| 1 | 1 | Be careful and try staying in your house. Larg... | [-0.7567042708396912, -0.41789475083351135, 0.... |\n",
"| 2 | 2 | Nancy Pelosi and some of the Democrats turned ... | [-0.7120085954666138, -0.3883921205997467, -0.... |\n",
"| 3 | 3 | No Amnesty is not a part of my offer. It is a ... | [-0.6233459115028381, -0.3593074679374695, -0.... |\n",
"| 4 | 4 | Nancy Pelosi has behaved so irrationally &... | [-0.6446494460105896, -0.32997167110443115, -0... |\n", "