Standardize Title Metadata Across Your Catalog
Beginner-friendly workflow for normalizing title records to controlled vocabularies for genre, rating, and language using dropdown validation and EIDR format checks.
On this page
A catalog where genre is free text has no usable genre data. “Sci-Fi”, “Sci Fi”, “Science Fiction”, “SciFi”, “sf” — same concept, five buckets, none of them groupable. The same pain applies to rating (“PG-13” vs “PG13” vs “TV-14”), language (“English” vs “EN” vs “en-US”), and type (“Film” vs “Movie” vs “Feature”).
The fix is controlled vocabularies + dropdown validation at input time. This tutorial walks you through it using Google Sheets or Excel, no scripting required. By the end, your catalog will be filterable, groupable, and platform-ingest-ready.
Step 1: Decide Your Vocabularies
Four fields need controlled values:
- Type —
Feature | Series | Episode | Special | Short | Trailer - Genre — a 10-item list is plenty:
Action, Comedy, Drama, Documentary, Family, Horror, Romance, Sci-Fi, Thriller, Reality - Rating —
G, PG, PG-13, R, NC-17, NR(MPAA) orTV-Y, TV-G, TV-PG, TV-14, TV-MA(FCC) — pick one standard per title - Language — ISO 639-1 two-letter codes:
en, es, fr, de, ja, ko, zh, hi, pt, it(extend as needed)
If you have a deeper internal taxonomy — say, 35 genres — use that instead. The point is the list is finite and someone owns the definition.
Step 2: Apply Dropdown Validation
In Google Sheets: select the column → Data > Data Validation → Dropdown (from a range) → reference a cell range on the Vocabularies sheet. Check Reject input on invalid entries. Now nobody can type “Sci Fi” by hand.
In Excel: select the column → Data > Data Validation → Allow: List → reference the vocabulary range. Same idea.
Step 3: Validate EIDR Format
EIDR is a structured ID: starts with 10.5240/, followed by seven 4-char hex segments separated by hyphens, ending with a check digit. Total length 34 characters.
Formula for the Status column:
=IF(LEN(D2)=34, IF(LEFT(D2,8)="10.5240/", "OK", "WARN"), IF(ISBLANK(D2), "BLANK", "WARN"))
This won’t validate the check digit, but it catches 90% of data-entry errors — truncated IDs, missing prefix, extra spaces. For check-digit validation you need the EIDR SDK or an API call.
Step 4: Build the Missing-Value Report
On a separate sheet, count blanks per field:
=COUNTBLANK('Titles Input'!A2:A1000) // adjust range to your catalog size
One row per field, one column for the count. Sort descending. Whichever field has the most blanks is where you start your cleanup — the highest-impact gap.
Common Issues
- Rating is jurisdiction-specific. MPAA is US movies, FCC is US TV, BBFC is UK, FSK is Germany. Pick one per title (typically the origin rating) and record it consistently. Don’t try to store every jurisdiction’s rating in one field.
- Genre drift over time. Your 2023 taxonomy had 8 genres; now it has 12. Existing titles still use the old labels. Before expanding a vocabulary, decide: backfill old rows, or accept the drift with a “legacy” flag.
- Multi-language titles. Use the primary audio language. If a film is 60% English / 40% Spanish, pick
enand note the secondary in a note column; don’t invent new codes.
Download the Template
The Title Metadata Cleanup Workbook ships with all four vocabularies pre-loaded as dropdowns, the EIDR format check, and the missing-value report. Paste your catalog into the Titles Input tab and the validation kicks in immediately.
OpenRights Weekly
Free templates, tutorials, and data quality tips. Every week.
Need help with your catalog data?
Share your challenge and we'll help — by email or callback. 100% free, no sales pitch. Part of our mission to help rights and catalog professionals.
Get Free Help