Film Archive & Analytics

From Letterboxd to Interactive Web Visualization

ABOUT THE PROJECT

As a movie enthusiast, I have been keeping track of every film I watched over the years on Letterboxd, along with my personal ratings. This project documents how I transformed that raw data into a structured dataset, enriched it with additional information from IMDb, and prepared it for further analysis and visualization.

1. Data Extraction

Movies watched over several years were exported from Letterboxd in .txt format. The export includes the following fields:

Date, Name, Year, Letterboxd URI, Rating

2. Data Cleaning & Transformation (Excel)

The dataset was processed in Excel. Personal ratings were rescaled from Letterboxd’s 5-star system to IMDb’s 10-point scale by multiplying each rating by 2. The table was then simplified to the essential fields: Name, Year, and Rating.

Name	Year	Rating
Three Colors: Blue	1993	8
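
The rescaling itself was done in Excel; purely as an illustration, the same step could be sketched in Python as follows. The sketch assumes the export is comma-delimited with the header shown in section 1, and the date and URI in the sample row are placeholders, not real data.

```python
import csv
import io


def rescale_rating(stars: str) -> int:
    """Convert a Letterboxd star rating (0.5 to 5.0) to a 10-point score."""
    return round(float(stars) * 2)


def simplify(export_text: str) -> list[dict]:
    """Keep only Name, Year and the rescaled Rating; skip unrated films."""
    reader = csv.DictReader(io.StringIO(export_text))
    rows = []
    for row in reader:
        if not row["Rating"]:
            continue  # film was logged without a rating
        rows.append({
            "Name": row["Name"],
            "Year": int(row["Year"]),
            "Rating": rescale_rating(row["Rating"]),
        })
    return rows


# Placeholder date and URI; only Name, Year and Rating matter here.
sample = (
    "Date,Name,Year,Letterboxd URI,Rating\n"
    "2020-01-01,Three Colors: Blue,1993,https://boxd.it/EXAMPLE,4\n"
)
simplified = simplify(sample)
```

Because Letterboxd ratings move in half-star steps, doubling always yields a whole number, so no precision is lost on the 10-point scale.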

3. Data Enrichment (Python + OMDb API)

To enhance the dataset, a Python script was developed to connect to the OMDb API (IMDb data) and fetch additional attributes for each movie, including:

IMDb Rating
Country
Director(s)
Duration (runtime in minutes)
Language(s)
IMDb link
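
A minimal sketch of such a lookup is shown here; it is not the script from the repository. It assumes OMDb's title search parameters (apikey, t, y) and the response keys imdbRating, Country, Director, Runtime, Language and imdbID. The function names and output keys are hypothetical, and the sample payload uses made-up values.

```python
import json
import urllib.parse
import urllib.request

OMDB_URL = "https://www.omdbapi.com/"


def build_query(title: str, year: int, api_key: str) -> str:
    """Build an OMDb title-lookup URL for one film."""
    params = {"apikey": api_key, "t": title, "y": str(year)}
    return OMDB_URL + "?" + urllib.parse.urlencode(params)


def extract_fields(payload: dict) -> dict:
    """Pick the attributes listed above out of an OMDb response."""
    runtime = payload.get("Runtime", "")        # e.g. "98 min" or "N/A"
    first = runtime.split()[0] if runtime.split() else ""
    rating = payload.get("imdbRating", "N/A")
    imdb_id = payload.get("imdbID")
    return {
        "imdb_rating": float(rating) if rating != "N/A" else None,
        "country": payload.get("Country"),
        "director": payload.get("Director"),
        "duration_min": int(first) if first.isdigit() else None,
        "language": payload.get("Language"),
        "imdb_link": f"https://www.imdb.com/title/{imdb_id}/" if imdb_id else None,
    }


def fetch_movie(title: str, year: int, api_key: str):
    """Query OMDb over the network; return None when no match is found."""
    with urllib.request.urlopen(build_query(title, year, api_key), timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("Response") != "True":
        return None
    return extract_fields(payload)


# Made-up payload in the assumed response shape, for offline illustration.
sample_payload = {
    "Title": "Example Film", "imdbRating": "7.5", "Country": "France, Poland",
    "Director": "Jane Doe", "Runtime": "98 min", "Language": "French",
    "imdbID": "tt0000000", "Response": "True",
}
enriched = extract_fields(sample_payload)
```

Keeping the parsing in extract_fields separate from the network call makes it possible to check the field mapping without an API key.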

You can access the full code from the GitHub link or image below.

4. JSON Conversion (Python)

Once the cleaned and enriched dataset was prepared in Excel, it was converted into JSON format for use in web applications.

This process resulted in a structured films.json file, which was subsequently utilized alongside a data.js file in the front-end movie table project.
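
As a rough sketch of the conversion step, assuming the Excel rows have already been loaded into a list of dictionaries (reading the workbook itself would need a third-party library and is left out), the serialization could look like this. The key names and the sample values are hypothetical; only the films.json file name comes from the project.

```python
import json


def rows_to_json(rows: list[dict]) -> str:
    """Serialize film rows as a JSON array, keeping non-ASCII titles readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def write_films(rows: list[dict], path: str = "films.json") -> None:
    """Write the JSON array to disk as UTF-8."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(rows_to_json(rows))


# Hypothetical row; the real column names may differ.
films = [
    {"name": "Three Colors: Blue", "year": 1993, "rating": 8},
]
films_json = rows_to_json(films)
```

Setting ensure_ascii=False keeps accented characters in titles and director names as they are instead of escaping them, which makes the file easier to inspect by hand.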

You can access the full code from the GitHub link or image below.

5. Frontend Implementation

After preparing the dataset, a fully interactive table was created for visualization on a personal website, instead of embedding a static Excel or Power BI file. This was implemented using HTML, CSS, and JavaScript.

Key features implemented:

Sortable Columns: Clicking on a column header allows sorting the table in ascending or descending order.
Clickable Rows: Each row redirects to the corresponding IMDb page in a new tab.
Responsive Design: On mobile devices, some columns (e.g., Duration, Country, Language) are hidden for improved readability.
Custom Sorting on Load: By default, the table is pre-sorted by personal rating (highest to lowest).
Breadcrumb Adjustment: On this specific page, the global breadcrumb (“Home/…”) was disabled to avoid visual overlap with the

You can access the full code from the GitHub link or images below.

6. Advanced Analysis with Power BI

To complement the raw data table, a separate Power BI project was created for in-depth analytics. The analysis included:

Country-based average ratings: Filtered to show only countries with more than 5 films.
Genre & Language distributions.
Decade Analysis: Films were grouped into 10-year periods (1920s, 1930s, …) and average ratings were compared to identify the preferred eras.
Directors Analysis: An unpivoted dataset was used to handle films with multiple directors.
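
These aggregations were built in Power BI; the underlying logic can be sketched in Python as follows. This is an illustration only, assuming each film is a dictionary with Year, Rating and a comma-separated Director field. All names and sample values are hypothetical.

```python
from collections import defaultdict


def decade_label(year: int) -> str:
    """Map a release year to its decade, e.g. 1993 -> '1990s'."""
    return f"{year // 10 * 10}s"


def split_names(value: str) -> list[str]:
    """Unpivot a comma-separated field into individual names."""
    return [part.strip() for part in value.split(",") if part.strip()]


def average_by(films, key_fn, min_count: int = 1) -> dict:
    """Average Rating per group; key_fn returns every group a film belongs to.

    Groups with fewer than min_count films are dropped.
    """
    groups = defaultdict(list)
    for film in films:
        for key in key_fn(film):
            groups[key].append(film["Rating"])
    return {
        key: sum(ratings) / len(ratings)
        for key, ratings in groups.items()
        if len(ratings) >= min_count
    }


# Made-up sample rows.
films = [
    {"Name": "A", "Year": 1993, "Rating": 8, "Director": "X, Y"},
    {"Name": "B", "Year": 1998, "Rating": 6, "Director": "X"},
    {"Name": "C", "Year": 2004, "Rating": 7, "Director": "Z"},
]

by_decade = average_by(films, lambda f: [decade_label(f["Year"])])
by_director = average_by(films, lambda f: split_names(f["Director"]))
```

The same helper covers the country view: passing a key function that splits the Country field together with min_count=6 keeps only countries with more than 5 films.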

This approach allows the web table to display the raw archive, while the Power BI dashboards provide detailed visual insights.

You can click here to see more details about my analysis.

7. GitHub Integration

Since the website includes a GitHub link, the following items were uploaded to a public repository:

You can access the full repository from the GitHub logo below.

eyupatakanozkan.com
