OpenRights Entertainment Rights & Catalog Intelligence
Data Cleaning beginner

Standardize Title Metadata Across Your Catalog

Beginner-friendly workflow for normalizing title records to controlled vocabularies for genre, rating, and language using dropdown validation and EIDR format checks.

By OpenRights Team · · 7 min read
titlesdata cleaningmetadatavocabularies
On this page

A catalog where genre is free text has no usable genre data. “Sci-Fi”, “Sci Fi”, “Science Fiction”, “SciFi”, “sf” — same concept, five buckets, none of them groupable. The same pain applies to rating (“PG-13” vs “PG13” vs “TV-14”), language (“English” vs “EN” vs “en-US”), and type (“Film” vs “Movie” vs “Feature”).

The fix is controlled vocabularies + dropdown validation at input time. This tutorial walks you through it using Google Sheets or Excel, no scripting required. By the end, your catalog will be filterable, groupable, and platform-ingest-ready.

Step 1: Decide Your Vocabularies

Four fields need controlled values:

  • TypeFeature | Series | Episode | Special | Short | Trailer
  • Genre — a 10-item list is plenty: Action, Comedy, Drama, Documentary, Family, Horror, Romance, Sci-Fi, Thriller, Reality
  • RatingG, PG, PG-13, R, NC-17, NR (MPAA) or TV-Y, TV-G, TV-PG, TV-14, TV-MA (FCC) — pick one standard per title
  • Language — ISO 639-1 two-letter codes: en, es, fr, de, ja, ko, zh, hi, pt, it (extend as needed)

If you have a deeper internal taxonomy — say, 35 genres — use that instead. The point is the list is finite and someone owns the definition.

Step 2: Apply Dropdown Validation

In Google Sheets: select the column → Data > Data ValidationDropdown (from a range) → reference a cell range on the Vocabularies sheet. Check Reject input on invalid entries. Now nobody can type “Sci Fi” by hand.

In Excel: select the column → Data > Data ValidationAllow: List → reference the vocabulary range. Same idea.

Step 3: Validate EIDR Format

EIDR is a structured ID: starts with 10.5240/, followed by seven 4-char hex segments separated by hyphens, ending with a check digit. Total length 34 characters.

Formula for the Status column:

=IF(LEN(D2)=34, IF(LEFT(D2,8)="10.5240/", "OK", "WARN"), IF(ISBLANK(D2), "BLANK", "WARN"))

This won’t validate the check digit, but it catches 90% of data-entry errors — truncated IDs, missing prefix, extra spaces. For check-digit validation you need the EIDR SDK or an API call.

Step 4: Build the Missing-Value Report

On a separate sheet, count blanks per field:

=COUNTBLANK('Titles Input'!A2:A1000)   // adjust range to your catalog size

One row per field, one column for the count. Sort descending. Whichever field has the most blanks is where you start your cleanup — the highest-impact gap.

Common Issues

  • Rating is jurisdiction-specific. MPAA is US movies, FCC is US TV, BBFC is UK, FSK is Germany. Pick one per title (typically the origin rating) and record it consistently. Don’t try to store every jurisdiction’s rating in one field.
  • Genre drift over time. Your 2023 taxonomy had 8 genres; now it has 12. Existing titles still use the old labels. Before expanding a vocabulary, decide: backfill old rows, or accept the drift with a “legacy” flag.
  • Multi-language titles. Use the primary audio language. If a film is 60% English / 40% Spanish, pick en and note the secondary in a note column; don’t invent new codes.

Download the Template

The Title Metadata Cleanup Workbook ships with all four vocabularies pre-loaded as dropdowns, the EIDR format check, and the missing-value report. Paste your catalog into the Titles Input tab and the validation kicks in immediately.

OpenRights Weekly

Free templates, tutorials, and data quality tips. Every week.

Need help with your catalog data?

Share your challenge and we'll help — by email or callback. 100% free, no sales pitch. Part of our mission to help rights and catalog professionals.

Get Free Help