Dog Popularity Prediction for Dog Actors in Movies¶

Motivation:¶

Problem¶

Movies often feature dog actors of many different kinds. Deciding which dogs to cast has become a real problem for filmmakers.

Solution¶

Stefano Ghirlanda, Alberto Acerbi, and Harold Herzog collected data on dog actors in films released before 2014 and analyzed in detail the popularity, influence, and revenue associated with those dog actors over periods of n years. The aim of this project is to analyse which characteristics of dog actors make them popular.

Impact¶

If the analysis is successful, filmmakers could first of all cast dog actors based on which dogs are more popular, which may raise film ratings to some extent.

In addition, depending on the filmmaker's goals, different kinds of dogs can be chosen to serve other aims, such as higher film revenue.

Dataset¶

Detail¶

I will use the "Dog movie stars and dog breed popularity" dataset, which records the following features for each dog:

-dog: name of the dog actor

-breed: the portrayed dog's breed

-year: the year of movie release

-movie: the movie title

-earnings1: movie earnings during the opening weekend (in 2012 USD)

-earnings: total movie earnings (in 2012 USD)

-disney: whether the movie was produced by the Walt Disney Company

-before[n]: the n-year popularity trend of the considered breed before movie release

-after[n]: the n-year popularity trend of the considered breed after movie release

-popularity[n]: average number of registrations for the considered breed in the 2n+1 years around movie release (between n years before and n years after)

-effect[n]: the n-year effect of the movie on the breed's popularity trend

-excess[n]: registrations of the considered breed attributable to movie release (actual registrations over the n years after movie release minus registrations predicted based on the trend observed n years before movie release)

-viewers: estimated number of people who saw the movie

-viewers1: estimated number of people who saw the movie over its opening weekend
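The definitions of popularity[n] and excess[n] above can be made concrete with a small sketch. It works on a made-up series of yearly registrations (the `registrations` dictionary is hypothetical, not taken from the dataset), and it assumes that the "trend observed n years before movie release" is a straight line fitted to the years from n years before release up to the release year. The dataset description does not say how the trend was actually fitted, so this is only one possible reading.

```python
import numpy as np

def popularity(registrations, release_year, n):
    """Mean registrations over the 2n+1 years centred on the release year."""
    window = range(release_year - n, release_year + n + 1)
    return float(np.mean([registrations[y] for y in window]))

def excess(registrations, release_year, n):
    """Actual minus trend-predicted registrations over the n years after release.

    ASSUMPTION: the pre-release trend is a straight line fitted to the
    years release_year - n .. release_year (not confirmed by the source).
    """
    before = np.arange(-n, 1)  # year offsets -n .. 0
    y_before = [registrations[release_year + int(k)] for k in before]
    slope, intercept = np.polyfit(before, y_before, deg=1)

    after = np.arange(1, n + 1)  # year offsets 1 .. n
    predicted = slope * after + intercept
    actual = np.array([registrations[release_year + int(k)] for k in after])
    return float(np.sum(actual - predicted))

# Hypothetical registrations per year for one breed (illustrative values only).
registrations = {1993: 1000, 1994: 1100, 1995: 1200, 1996: 1500, 1997: 1600}

pop2 = popularity(registrations, release_year=1995, n=2)
exc2 = excess(registrations, release_year=1995, n=2)
```

Here the registrations grow by 100 per year up to the release year and then jump, so the excess is positive: the breed gained more registrations than the pre-release trend would suggest.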

In [3]:
import pandas as pd
data = pd.read_csv("moviesAnalyzed.csv")
data
Out[3]:
|  | Unnamed: 0 | dog | breed | year | movie | earnings1 | earnings | disney | main | negative | ... | effect2 | effect5 | effect10 | excess1 | excess2 | excess5 | excess10 | viewers | viewers1 | include |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Toto | Cairn Terriers | 1939 | The wizard of Oz | NaN | 22342633.0 | False | True | False | ... | 0.139550 | -0.112201 | -0.037940 | 0.0 | 52.5 | -510.0 | -1476.0 | 9.714188e+07 | NaN | True |
| 1 | 4 | Lassie | Collies | 1943 | Lassie come home | NaN | NaN | False | True | False | ... | 0.603563 | 0.431417 | 0.203161 | 0.0 | 3492.0 | 35332.0 | 116873.5 | NaN | NaN | True |
| 2 | 5 | Laddie | Collies | 1945 | Son of Lassie | NaN | NaN | False | True | False | ... | 0.361880 | 0.206243 | 0.097500 | 0.0 | 3377.0 | 19339.0 | 57231.5 | NaN | NaN | False |
| 3 | 6 | Rusty | German Shepherd Dogs | 1945 | Adventures of Rusty | NaN | NaN | False | True | False | ... | 0.259486 | 0.199355 | 0.331190 | 0.0 | 790.5 | 5912.0 | 56935.0 | NaN | NaN | True |
| 4 | 7 | Lassie | Collies | 1946 | The courage of Lassie | NaN | NaN | False | True | False | ... | -0.132187 | 0.041341 | 0.037298 | 0.0 | -4227.0 | -4806.0 | -568.0 | NaN | NaN | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 63 | 92 | Various | Retrievers (Golden) | 2001 | The retrievers | NaN | NaN | False | True | False | ... | -0.081739 | NaN | NaN | 0.0 | -3516.5 | NaN | NaN | NaN | NaN | False |
| 64 | 93 | ? | Border Collies | 2002 | Snow Dogs | 17814259.0 | 81172560.0 | True | True | False | ... | 0.206130 | NaN | NaN | 0.0 | 531.0 | NaN | NaN | 1.397118e+07 | 3.066138e+06 | True |
| 65 | 94 | Scooby | Great Danes | 2002 | Scooby Doo | 54155312.0 | 153294164.0 | False | True | False | ... | 0.093605 | NaN | NaN | 0.0 | 1178.5 | NaN | NaN | 2.638454e+07 | 9.321052e+06 | False |
| 66 | 95 | Various | Siberian Huskies | 2002 | Snow Dogs | 17814259.0 | 81172560.0 | True | True | False | ... | 0.129457 | NaN | NaN | 0.0 | 2509.5 | NaN | NaN | 1.397118e+07 | 3.066138e+06 | False |
| 67 | 96 | Hubble | Border Terriers | 2003 | Good boy! | 13107022.0 | 37667746.0 | False | True | False | ... | -0.007811 | NaN | NaN | 0.0 | -69.5 | NaN | NaN | 6.246724e+06 | 2.173635e+06 | True |

68 rows × 33 columns

Potential Problems¶

For some of the earlier films, a lot of data is missing. As a result, the dogs in those films may be left out of some calculations, which can skew the final conclusions.
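To see how serious the gaps are, one option is to measure the share of missing values per column and count how many rows survive once the incomplete ones are dropped. The sketch below uses a tiny hand-built sample copied from four rows of the output above rather than the full CSV, so the numbers only illustrate the approach; the names `sample`, `missing_share`, and `complete_rows` are my own.

```python
import numpy as np
import pandas as pd

# Four rows copied from the displayed output (a subset of the columns).
sample = pd.DataFrame({
    "dog": ["Toto", "Lassie", "Scooby", "Hubble"],
    "year": [1939, 1943, 2002, 2003],
    "earnings": [22342633.0, np.nan, 153294164.0, 37667746.0],
    "viewers": [9.714188e+07, np.nan, 2.638454e+07, 6.246724e+06],
    "effect10": [-0.037940, 0.203161, np.nan, np.nan],
})

# Fraction of missing values in each column, worst column first.
missing_share = sample.isna().mean().sort_values(ascending=False)

# Rows that remain if an analysis requires both earnings and viewers.
complete_rows = sample.dropna(subset=["earnings", "viewers"])

print(missing_share)
print(len(complete_rows), "of", len(sample), "rows have earnings and viewers")
```

Running the same two steps on the full `data` frame would show which columns, and which years, are affected most.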

In addition, because the data spans such a long period, the influence of other variables cannot be ruled out. For example, the population has grown since 1939, so raw viewer counts are hard to compare across decades. An earlier film with a smaller audience in absolute terms may still have been more popular relative to the population of its time.
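One way to put viewer counts from different decades on a comparable scale is to divide them by the population at the time of release. The sketch below does this for three films from the output above. It assumes the viewer estimates refer to US audiences, which the dataset description does not state, and the `population_by_year` values are rough, rounded US population figures that I entered by hand. They should be replaced with official figures before drawing any conclusions.

```python
import pandas as pd

# Viewer estimates copied from the displayed output.
movies = pd.DataFrame({
    "movie": ["The wizard of Oz", "Scooby Doo", "Good boy!"],
    "year": [1939, 2002, 2003],
    "viewers": [9.714188e+07, 2.638454e+07, 6.246724e+06],
})

# ASSUMPTION: rough, rounded US population per year (not part of the dataset).
population_by_year = {1939: 131e6, 2002: 288e6, 2003: 290e6}

movies["population"] = movies["year"].map(population_by_year)
movies["viewers_per_capita"] = movies["viewers"] / movies["population"]

print(movies[["movie", "year", "viewers_per_capita"]])
```

The same idea applies to the earnings columns, which are already adjusted to 2012 USD but not to population size.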

Method:¶

First, dogs are ranked from most to least popular. Since popularity is measured over several windows (popularity[n] for different n), we can either average these values or repeat the ranking for each window and compare the results. Then, using the before[n] columns, we identify the breeds and names of the dogs that have been most popular in recent years, which gives a shortlist of the most popular dogs. Eventually, filmmakers could use this shortlist to cast dog actors.
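The ranking step could look roughly like the sketch below. It assumes the popularity columns are named `popularity1`, `popularity2`, `popularity5`, and `popularity10`, by analogy with the `effect` and `excess` columns visible in the output (the popularity columns themselves are hidden there). The popularity values are invented for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: column names follow the effect[n] / excess[n] pattern.
popularity_cols = ["popularity1", "popularity2", "popularity5", "popularity10"]

# Dog and breed names come from the output above; popularity values are made up.
dogs = pd.DataFrame({
    "dog": ["Toto", "Lassie", "Scooby", "Hubble"],
    "breed": ["Cairn Terriers", "Collies", "Great Danes", "Border Terriers"],
    "popularity1": [5000.0, 20000.0, 9000.0, 800.0],
    "popularity2": [5200.0, 21000.0, 9100.0, 820.0],
    "popularity5": [5100.0, 25000.0, np.nan, np.nan],
    "popularity10": [4900.0, 30000.0, np.nan, np.nan],
})

# Average across the available windows; missing windows are skipped.
dogs["mean_popularity"] = dogs[popularity_cols].mean(axis=1, skipna=True)

# Rank from most to least popular.
ranked = dogs.sort_values("mean_popularity", ascending=False).reset_index(drop=True)
ranked["rank"] = ranked["mean_popularity"].rank(ascending=False).astype(int)

print(ranked[["rank", "dog", "breed", "mean_popularity"]])
```

Averaging with `skipna=True` keeps recent films whose long windows are still missing, at the cost of comparing averages built from different numbers of windows. Ranking each `popularity[n]` column separately avoids that mix.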