Each individual student will submit a project proposal (3% of final grade) in .ipynb format which:

(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).

(1%) Explicitly loads and shows your dataset, provides a data dictionary which explains the meaning of each feature present, and demonstrates that this data is sufficient to make progress on the real-world problem described above.

(1%) Explains, in one or two sentences, how the data will be used to solve the problem. This early in the semester we won’t have studied the Machine Learning methods yet, but you should have a general idea of what the ML will set out to do. For example:

“We’ll cluster the movies into sets of movies which are often watched by the same users. Doing so allows us to discover if there is a more natural grouping of movies rather than the traditional genres: horror, comedy, romantic-comedy, etc”.

Fake news?

Claire Pan DS2500

Many people consume news every day, but it can be hard to tell fake news from real news. This is a problem because people may be getting their news from untrustworthy sources that push certain ideas or political agendas. As more fake news sources and articles appear, these ideas spread further and can end up clouding the judgement of individuals. For this reason, it is important that we are able to detect fraudulent news sources that publish fabricated stories.

To read more on the dangers of fake news, see this link: https://www.cits.ucsb.edu/fake-news/danger-social

For this project, I have decided to use a dataset that I found on Kaggle. It consists of two files, 'True.csv' and 'Fake.csv'; combining these two CSV files gives the dataset that we will use for the project.

In [1]:
import pandas as pd 
In [2]:
true = pd.read_csv('True.csv')
true
Out[2]:
title text subject date
0 As U.S. budget fight looms, Republicans flip t... WASHINGTON (Reuters) - The head of a conservat... politicsNews December 31, 2017
1 U.S. military to accept transgender recruits o... WASHINGTON (Reuters) - Transgender people will... politicsNews December 29, 2017
2 Senior U.S. Republican senator: 'Let Mr. Muell... WASHINGTON (Reuters) - The special counsel inv... politicsNews December 31, 2017
3 FBI Russia probe helped by Australian diplomat... WASHINGTON (Reuters) - Trump campaign adviser ... politicsNews December 30, 2017
4 Trump wants Postal Service to charge 'much mor... SEATTLE/WASHINGTON (Reuters) - President Donal... politicsNews December 29, 2017
... ... ... ... ...
21412 'Fully committed' NATO backs new U.S. approach... BRUSSELS (Reuters) - NATO allies on Tuesday we... worldnews August 22, 2017
21413 LexisNexis withdrew two products from Chinese ... LONDON (Reuters) - LexisNexis, a provider of l... worldnews August 22, 2017
21414 Minsk cultural hub becomes haven from authorities MINSK (Reuters) - In the shadow of disused Sov... worldnews August 22, 2017
21415 Vatican upbeat on possibility of Pope Francis ... MOSCOW (Reuters) - Vatican Secretary of State ... worldnews August 22, 2017
21416 Indonesia to buy $1.14 billion worth of Russia... JAKARTA (Reuters) - Indonesia will buy 11 Sukh... worldnews August 22, 2017

21417 rows × 4 columns

In [3]:
fake = pd.read_csv('Fake.csv')
fake 
Out[3]:
title text subject date
0 Donald Trump Sends Out Embarrassing New Year’... Donald Trump just couldn t wish all Americans ... News December 31, 2017
1 Drunk Bragging Trump Staffer Started Russian ... House Intelligence Committee Chairman Devin Nu... News December 31, 2017
2 Sheriff David Clarke Becomes An Internet Joke... On Friday, it was revealed that former Milwauk... News December 30, 2017
3 Trump Is So Obsessed He Even Has Obama’s Name... On Christmas day, Donald Trump announced that ... News December 29, 2017
4 Pope Francis Just Called Out Donald Trump Dur... Pope Francis used his annual Christmas Day mes... News December 25, 2017
... ... ... ... ...
23476 McPain: John McCain Furious That Iran Treated ... 21st Century Wire says As 21WIRE reported earl... Middle-east January 16, 2016
23477 JUSTICE? Yahoo Settles E-mail Privacy Class-ac... 21st Century Wire says It s a familiar theme. ... Middle-east January 16, 2016
23478 Sunnistan: US and Allied ‘Safe Zone’ Plan to T... Patrick Henningsen 21st Century WireRemember ... Middle-east January 15, 2016
23479 How to Blow $700 Million: Al Jazeera America F... 21st Century Wire says Al Jazeera America will... Middle-east January 14, 2016
23480 10 U.S. Navy Sailors Held by Iranian Military ... 21st Century Wire says As 21WIRE predicted in ... Middle-east January 12, 2016

23481 rows × 4 columns

In [4]:
# Stack the two tables row-wise; merge() is meant for key-based joins
news = pd.concat([true, fake], ignore_index=True)
news
Out[4]:
title text subject date
0 As U.S. budget fight looms, Republicans flip t... WASHINGTON (Reuters) - The head of a conservat... politicsNews December 31, 2017
1 U.S. military to accept transgender recruits o... WASHINGTON (Reuters) - Transgender people will... politicsNews December 29, 2017
2 Senior U.S. Republican senator: 'Let Mr. Muell... WASHINGTON (Reuters) - The special counsel inv... politicsNews December 31, 2017
3 FBI Russia probe helped by Australian diplomat... WASHINGTON (Reuters) - Trump campaign adviser ... politicsNews December 30, 2017
4 Trump wants Postal Service to charge 'much mor... SEATTLE/WASHINGTON (Reuters) - President Donal... politicsNews December 29, 2017
... ... ... ... ...
44893 McPain: John McCain Furious That Iran Treated ... 21st Century Wire says As 21WIRE reported earl... Middle-east January 16, 2016
44894 JUSTICE? Yahoo Settles E-mail Privacy Class-ac... 21st Century Wire says It s a familiar theme. ... Middle-east January 16, 2016
44895 Sunnistan: US and Allied ‘Safe Zone’ Plan to T... Patrick Henningsen 21st Century WireRemember ... Middle-east January 15, 2016
44896 How to Blow $700 Million: Al Jazeera America F... 21st Century Wire says Al Jazeera America will... Middle-east January 14, 2016
44897 10 U.S. Navy Sailors Held by Iranian Military ... 21st Century Wire says As 21WIRE predicted in ... Middle-east January 12, 2016

44898 rows × 4 columns
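The combined table above does not record which file each row came from, so the 'Real' versus 'Fake' distinction is lost once the rows are stacked. A minimal sketch of one way to keep it follows. It uses tiny made-up stand-in tables rather than the real CSV files, and the column name `label` is an assumption, not part of the Kaggle dataset.

```python
import pandas as pd

# Tiny stand-ins for True.csv and Fake.csv (hypothetical rows, same four columns)
true_demo = pd.DataFrame({
    "title": ["Real headline A", "Real headline B"],
    "text": ["Body of real article A", "Body of real article B"],
    "subject": ["politicsNews", "worldnews"],
    "date": ["December 31, 2017", "August 22, 2017"],
})
fake_demo = pd.DataFrame({
    "title": ["Fake headline C"],
    "text": ["Body of fake article C"],
    "subject": ["News"],
    "date": ["December 31, 2017"],
})

# Tag each table with its source before stacking; 'label' is an assumed name
labeled = pd.concat(
    [true_demo.assign(label="Real"), fake_demo.assign(label="Fake")],
    ignore_index=True,
)

# Count how many rows came from each source
print(labeled["label"].value_counts())
```

With the real data, `true_demo` and `fake_demo` would be replaced by the `true` and `fake` tables loaded earlier. The extra column can then be used to check how well any grouping of the articles lines up with the known source.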


For the project, I propose that we cluster the news articles into 'Real' and 'Fake' groups based only on the text of the articles. Doing this allows us to analyze the types of articles, titles, and subjects associated with fake news; we could also see which key words tend to appear in these articles.
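As a rough illustration of the key-word idea, the sketch below counts words in two small sets of made-up snippets and lists the words that appear only in the fake set. The snippets, the stop-word list, and the helper `word_counts` are all hypothetical and stand in for the 'text' column of the real files. This is simple word counting, not the clustering method itself.

```python
import re
from collections import Counter

# Hypothetical snippets standing in for the 'text' column of each file
real_texts = [
    "The senate passed the budget bill on Tuesday",
    "The committee said the budget vote is delayed",
]
fake_texts = [
    "You will not believe the shocking budget secret",
    "Shocking video shows the secret plan",
]

# A very small stop-word list, chosen only for this illustration
STOP_WORDS = {"the", "on", "is", "you", "will", "not", "said"}


def word_counts(texts):
    """Count lowercase words across a list of strings, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


real_counts = word_counts(real_texts)
fake_counts = word_counts(fake_texts)

# Words seen in the fake snippets but never in the real ones
fake_only = {w: c for w, c in fake_counts.items() if w not in real_counts}

# Most frequent first, ties broken alphabetically
print(sorted(fake_only.items(), key=lambda item: (-item[1], item[0])))
```

On the full dataset the same counting could be run over every article, although a fuller stop-word list and some normalization of the text would likely be needed before the counts become informative.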
