This dataset contains the 100 most popular Instagram accounts (ranked by number of followers), each described by 7 attributes. The goal of this project is to determine how much of an impact hashtags have on the popularity of an Instagram account.

In [42]:
import pandas as pd

# Load the dataset, using the RANK column as the index
ig_df = pd.read_csv('most_followed_ig.csv', index_col='RANK', encoding='latin-1')
ig_df
Out[42]:
BRAND CATEGORIES 1 CATEGORIES 2 FOLLOWERS ER iPOSTS ON HASHTAG MEDIA POSTED
RANK
1 Selena Gomez celebrities musicians 105.4M(=) 2.62%(1342) 14.5M(48) 1.2k(2135)
2 Taylor Swift celebrities musicians 95.2M(=) 1.96%(2040) 10.5M(66) 958(2669)
3 Ariana Grande celebrities musicians 92.3M(=) 1.43%(2759) 16.9M(41) 2.8k(824)
4 Beyonce celebrities musicians 90.6M(=) 2.53%(1427) 9.2M(70) 1.4k(1897)
5 Kim Kardashian West celebrities tv 89.3M(=) 1.39%(2812) 5.1M(130) 3.6k(550)
... ... ... ... ... ... ... ...
96 DanialvesD2 My Twitter celebrities athletes 11.7M(=) 1.62%(2477) 122.4k(1486) 1.7k(1508)
97 Dolce & Gabbana fashion luxury 11.7M(=) 0.48%(4142) 6.1M(105) 3.9k(471)
98 Tyga / T-Raww celebrities musicians 11.6M(=) 1.31%(2922) 1.2M(421) 2.5k(948)
99 Paul Labile Pogba celebrities athletes 11.5M(=) 6.11%(170) 77.6k(1745) 396(4219)
100 Barack Obama celebrities political 11.5M(=) 3.37%(826) 2.5M(240) 231(4753)

100 rows × 7 columns
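Before any correlation can be computed, the numeric columns have to be converted from text. In the output above, each value is a number with an optional k or M suffix (or a % sign for ER), followed by a bracketed annotation such as (=) or (2135). The sketch below is one way to do that conversion with the standard library; `parse_metric` is a hypothetical helper name, and it assumes every cell starts with its numeric value and that everything after it can be discarded.

```python
import re

# A leading number, optionally followed by a k/M/B multiplier or a % sign.
# Whatever follows (stray separator characters and the bracketed annotation,
# e.g. "(=)" or "(2135)") is not matched and is therefore ignored.
_LEADING_VALUE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([kKmMbB%]?)")

# Multiplier for each suffix; "%" leaves the value in percentage points.
_MULTIPLIER = {"": 1.0, "k": 1e3, "m": 1e6, "b": 1e9, "%": 1.0}


def parse_metric(cell: str) -> float:
    """Convert a cell such as '105.4M(=)' or '2.62%(1342)' to a float.

    Hypothetical helper: assumes every cell starts with its numeric value.
    """
    match = _LEADING_VALUE.match(cell)
    if match is None:
        raise ValueError(f"cannot parse value: {cell!r}")
    number, suffix = match.groups()
    return float(number) * _MULTIPLIER[suffix.lower()]


# Values copied from the output above
for cell in ["105.4M(=)", "2.62%(1342)", "14.5M(48)", "958(2669)"]:
    print(cell, "->", parse_metric(cell))
```

Applied to the DataFrame loaded in In [42], a column could then be converted with, for example, `ig_df['FOLLOWERS'].map(parse_metric)`, and likewise for 'iPOSTS ON HASHTAG' and 'MEDIA POSTED'.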

In [46]:
# Dictionary mapping each feature (column) to a description of its meaning
col_dict = {'BRAND': 'name of the Instagram account',
            'CATEGORIES 1': 'general category of the account',
            'CATEGORIES 2': 'specific field/profession of the account',
            'FOLLOWERS': 'number of followers',
            'ER': 'engagement rate (%)',
            'iPOSTS ON HASHTAG': 'number of posts (by any user) on the hashtag associated with the account',
            'MEDIA POSTED': 'number of posts published by the account'}
col_dict
Out[46]:
{'BRAND': 'name of the Instagram account',
 'CATEGORIES 1': 'general category of the account',
 'CATEGORIES 2': 'specific field/profession of the account',
 'FOLLOWERS': 'number of followers',
 'ER': 'engagement rate (%)',
 'iPOSTS ON HASHTAG': 'number of posts (by any user) on the hashtag associated with the account',
 'MEDIA POSTED': 'number of posts published by the account'}

This data should be sufficient to answer the question: using the 'iPOSTS ON HASHTAG', 'MEDIA POSTED', and 'FOLLOWERS' attributes, it may be possible to determine whether hashtag activity is correlated with follower count and, since follower count is the measure of popularity used here, with popularity.
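One way to measure such a correlation is Spearman's rank correlation, sketched below. The counts span several orders of magnitude (from 77.6k to 16.9M hashtag posts in the rows shown), so a rank-based measure is less dominated by the handful of largest accounts than Pearson's correlation on the raw counts. The choice of Spearman, the helper names, and the standard-library implementation are assumptions for illustration, not part of the original analysis. Tied values (such as the two accounts at 11.7M followers) receive the average of their ranks.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5  # ZeroDivisionError if either input is constant


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    return pearson(average_ranks(xs), average_ranks(ys))


# Illustration only: the ten rows visible in the truncated output above,
# converted by hand (FOLLOWERS and iPOSTS ON HASHTAG, in rank order).
followers = [105.4e6, 95.2e6, 92.3e6, 90.6e6, 89.3e6,
             11.7e6, 11.7e6, 11.6e6, 11.5e6, 11.5e6]
hashtag_posts = [14.5e6, 10.5e6, 16.9e6, 9.2e6, 5.1e6,
                 122.4e3, 6.1e6, 1.2e6, 77.6e3, 2.5e6]

rho = spearman(followers, hashtag_posts)
print(f"Spearman rho (10 visible rows only): {rho:.2f}")
```

Because only the ten visible rows are used, the printed value illustrates the mechanics and says nothing about the full dataset. Once both columns are numeric, pandas offers the same statistic through `ig_df['FOLLOWERS'].corr(ig_df['iPOSTS ON HASHTAG'], method='spearman')`.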

To do this, we would extract the three attributes above for each ranked account. Doing so would let us visualize the data and look for a possible correlation between followers and hashtags.
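A minimal sketch of that per-account view follows, again using only the ten rows visible in the truncated output, converted to numbers by hand. The `AccountMetrics` record and the log10 text table are illustrative choices, not part of the original notebook. Listing accounts by hashtag posts next to their follower rank shows at a glance whether the two orders agree, and log10 values keep counts ranging from a few hundred to over a hundred million on a comparable scale.

```python
import math
from typing import NamedTuple


class AccountMetrics(NamedTuple):
    """The three attributes of interest for one ranked account (already numeric)."""
    rank: int
    brand: str
    followers: float      # FOLLOWERS
    hashtag_posts: float  # iPOSTS ON HASHTAG
    media_posted: float   # MEDIA POSTED


# Ten rows from the truncated output above, converted by hand.
# A full analysis would build this list from all 100 rows of ig_df.
accounts = [
    AccountMetrics(1, "Selena Gomez", 105.4e6, 14.5e6, 1.2e3),
    AccountMetrics(2, "Taylor Swift", 95.2e6, 10.5e6, 958.0),
    AccountMetrics(3, "Ariana Grande", 92.3e6, 16.9e6, 2.8e3),
    AccountMetrics(4, "Beyonce", 90.6e6, 9.2e6, 1.4e3),
    AccountMetrics(5, "Kim Kardashian West", 89.3e6, 5.1e6, 3.6e3),
    AccountMetrics(96, "DanialvesD2 My Twitter", 11.7e6, 122.4e3, 1.7e3),
    AccountMetrics(97, "Dolce & Gabbana", 11.7e6, 6.1e6, 3.9e3),
    AccountMetrics(98, "Tyga / T-Raww", 11.6e6, 1.2e6, 2.5e3),
    AccountMetrics(99, "Paul Labile Pogba", 11.5e6, 77.6e3, 396.0),
    AccountMetrics(100, "Barack Obama", 11.5e6, 2.5e6, 231.0),
]

# Index by rank so each account's three attributes can be looked up directly.
by_rank = {account.rank: account for account in accounts}

# Order by hashtag posts (largest first); if hashtag activity tracked
# popularity, this order would roughly match the follower rank.
by_hashtag = sorted(accounts, key=lambda a: a.hashtag_posts, reverse=True)

print(f"{'rank':>4}  {'brand':<24}{'log10 followers':>16}"
      f"{'log10 hashtag posts':>21}{'log10 media':>13}")
for a in by_hashtag:
    print(f"{a.rank:>4}  {a.brand:<24}{math.log10(a.followers):>16.2f}"
          f"{math.log10(a.hashtag_posts):>21.2f}{math.log10(a.media_posted):>13.2f}")
```

A scatter plot of the same values on log-scaled axes (for example with matplotlib) would be the graphical counterpart of this table.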

A somewhat related research paper is “The Effect of #Enhancement-Free Instagram Images and Hashtags on Women's Body Image” (Tiggemann et al., 2019).

It examines how women engage with Instagram posts with and without body-enhancement hashtags, and reports that when a woman's body in a post had been enhanced, more hashtags (and thus more engagement) were correlated with higher reported dissatisfaction. This shows the impact hashtags can have on a population, and suggests that analyzing how certain accounts gain popularity, and what they promote, may provide further insight into how powerful hashtags are at influencing a population.

Additionally, hashtags are widely regarded as a useful and powerful tool for gaining a following and popularity. It would be interesting to understand how much the most popular accounts on Instagram actually engage with this tool.

CITATIONS

Tiggemann, Marika, et al. “The Effect of #Enhancement-Free Instagram Images and Hashtags on Women's Body Image.” Body Image, Elsevier, 9 Oct. 2019, https://www.sciencedirect.com/science/article/pii/S1740144519300981.