One of the primary goals of our project was to dissect Netflix's catalog in a humanistic manner, integrating background knowledge, outside events, and additional bibliography to inform our interpretations of the trends we observe. This data not only provides insight into the streaming industry as a whole; it also tells us about individual subscribers, whose preferences and diversity are directly reflected in the data and shape many of Netflix's licensing decisions.

Sources

The ‘Netflix Titles’ dataset was our main source for this final project. We found this dataset on Kaggle; it was compiled by a data scientist from Singapore using regular API calls, with the underlying data coming from Flixable, a third-party search engine that sources its information directly from Netflix. The dataset comprises information on over 6,000 titles (movies and TV shows) available on Netflix as of January 2020.

The details most pertinent to our final project are attributes such as IMDb ratings and the countries where the titles were produced. These attributes helped us approach our research question from a humanistic perspective and draw conclusions about consumer behavior and practices.

We found that the dataset covers the shows on offer fairly comprehensively. The inclusion of geographical data, in the form of the countries where each title was produced, helped us gain some insight into Netflix’s international presence.

Web sources:

  • Our website is hosted on Wix.

  • Our visualizations were created using Tableau.

  • Our timeline feature was built on Northwestern University Knight Lab’s TimelineJS framework.

Processing

Most of our data processing was done in R, Excel, and Tableau. The dataset was relatively clean and well-formatted to begin with, but we used cleaning tools to understand it better and to tailor it to our research questions.

  • For a more comprehensive analysis, we used a supplementary dataset that included IMDb ratings and combined it with the original dataset. (You can find all datasets under the "Data" section.)

  • We used R to clean our dataset and remove all missing, null, or NA values. As part of our data cleaning process, we merged genres and MPA ratings into general categories based on common features; for example, “Anime Features” and “Anime Series” were grouped into “Anime,” and “R,” “TV-MA,” and “NC-17” were grouped together as “Adults or Mature Audiences.”

  • We merged non-rated and empty cells in mpa_ratings into the “Not Rated” category.

  • Because some of the movies and TV shows were produced before historical events such as German reunification, their countries were listed as “West Germany” or “East Germany”; we merged both into “Germany,” as the country is called today. Similarly, we renamed the “Soviet Union” to “Russia.” For consistency, we kept the US as “United States”; that is, where a title listed “United States of America,” we kept only the first two words.

  • Some data was cleaned directly in the Tableau workbook using the ‘custom split’ function, especially for columns such as ‘country’ and ‘date_added’, which contain multiple values. We also used Tableau’s built-in calculations feature to compute values such as average IMDb ratings.
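
The cleaning steps listed above can be sketched roughly as follows. This is an illustrative Python sketch, not the R and Tableau workflow itself. The column names (`country`, `rating`, `imdb_rating`), the function names, the sample rows, and the "NR"/"UR" labels for non-rated titles are assumptions made for the example; only the category pairs named in the list above come from the project.

```python
from collections import defaultdict

# Category merges named in the text above.
GENRE_GROUPS = {"Anime Features": "Anime", "Anime Series": "Anime"}
RATING_GROUPS = {
    "R": "Adults or Mature Audiences",
    "TV-MA": "Adults or Mature Audiences",
    "NC-17": "Adults or Mature Audiences",
}
COUNTRY_NAMES = {
    "West Germany": "Germany",
    "East Germany": "Germany",
    "Soviet Union": "Russia",
    "United States of America": "United States",
}
# Assumption: which raw labels count as "non-rated" (besides empty cells).
NOT_RATED_LABELS = {"", "NR", "UR"}


def group_genre(raw):
    """Map a raw genre label to its merged category, if one is defined."""
    value = (raw or "").strip()
    return GENRE_GROUPS.get(value, value)


def group_rating(raw):
    """Map a raw rating to its merged category; blanks become 'Not Rated'."""
    value = (raw or "").strip()
    if value in NOT_RATED_LABELS:
        return "Not Rated"
    return RATING_GROUPS.get(value, value)


def split_countries(raw):
    """Split a comma-separated country cell and normalize each name."""
    names = [part.strip() for part in (raw or "").split(",")]
    normalized = [COUNTRY_NAMES.get(name, name) for name in names if name]
    # dict.fromkeys drops duplicates (e.g. West + East Germany) in order.
    return list(dict.fromkeys(normalized))


def average_rating_by_country(rows):
    """Average the IMDb rating per country, skipping rows with no rating."""
    totals = defaultdict(lambda: [0.0, 0])  # country -> [sum, count]
    for row in rows:
        score = row.get("imdb_rating")
        if score is None:
            continue
        for country in split_countries(row.get("country")):
            totals[country][0] += score
            totals[country][1] += 1
    return {country: total / count for country, (total, count) in totals.items()}


# Made-up rows for illustration only; these are not records from the dataset.
SAMPLE_ROWS = [
    {"country": "West Germany, East Germany", "rating": "R", "imdb_rating": 7.0},
    {"country": "United States of America, Soviet Union", "rating": "", "imdb_rating": 6.0},
    {"country": "Germany", "rating": "TV-MA", "imdb_rating": 8.0},
    {"country": "United States", "rating": "PG", "imdb_rating": None},
]

if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        print(split_countries(row["country"]), group_rating(row["rating"]))
    print(average_rating_by_country(SAMPLE_ROWS))
```

Keeping the merges in plain lookup tables means a new grouping only needs one more dictionary entry, and any label without an entry passes through unchanged.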

Presentation

Of all the hosting platforms available, we chose Wix because of its high level of customizability. We intentionally designed the website to resemble Netflix's, our topic of investigation, to give readers a sense of familiarity while browsing our project. Our site's colors, layout, and typography were carefully chosen to reflect Netflix's branding; the home page looks like the "Who's Watching?" welcome page, and the navigation features resemble the "About Us" pages on Netflix's official site.

For our data narrative, we chose to embed data visualizations from Tableau and Voyant Tools because embedding retains the software's interactive features, letting readers engage with the charts directly. We created our timeline with Northwestern University Knight Lab's TimelineJS for the same reason.
