Detect Non-ASCII Characters Breaking Your Platform Avails Ingest
Smart quotes, non-breaking spaces, and em-dashes silently break avails ingest. One regex formula surfaces every offending cell so you can fix before shipping.
On this page
You ship an avails file to a platform. The ingest rejects 40 rows. Spot-checking, you can’t see what’s wrong — the rows look identical to the ones that passed. Then you notice one Title has Jean's Story where the apostrophe is the typographic ' (U+2019) instead of the plain ' (U+0027). The platform’s parser chokes on it. Your document looked clean because the visual difference is almost invisible.
The Common Offenders
Four characters cause 90% of these ingest failures:
| Character | Unicode | Often typed as | Where it comes from |
|---|---|---|---|
' curly apostrophe | U+2019 | straight ' | Word/Google Docs autocorrect |
" curly quotes | U+201C, U+201D | straight " | Word/Google Docs autocorrect |
— em-dash | U+2014 | hyphen - | autocorrect or Mac Option+Shift+hyphen |
non-breaking space | U+00A0 | regular space | HTML paste, Word paste |
Zero-width characters (U+200B, U+FEFF) sneak in from rich-text paste and are completely invisible. These are the worst because you can’t even see them when you’re looking right at the cell.
The Fix — Detection Formula
Add a helper column next to your Title and Notes columns with this formula (Google Sheets):
=IF(REGEXMATCH(A2, "[^\x00-\x7F]"), "NON-ASCII", "")
The regex [^\x00-\x7F] matches any character outside the ASCII range (0–127). Any cell containing a non-ASCII character flags NON-ASCII. Conditional-format that column — amber for NON-ASCII — and the offending rows surface at a glance.
For Excel (which lacks REGEXMATCH pre-365), use:
=IF(SUMPRODUCT((CODE(MID(A2, ROW(INDIRECT("1:"&LEN(A2))), 1))>127)*1)>0, "NON-ASCII", "")
More verbose, same result.
The Fix — Bulk Replace
Once you’ve located the offending cells, fix them with find-and-replace. In Sheets:
- Ctrl+H for find-and-replace.
- Check Also search using regular expressions.
- Find
['']→ Replace with'. Repeat for[""]→",—→-, and so on. - For NBSP, find
\u00A0→ replace with a single space.
Or paste this one-shot CLEAN-like formula that strips the offenders:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,CHAR(8216),"'"),CHAR(8217),"'"),CHAR(8220),""""),CHAR(8221),"""")
Apply to a helper column, then paste-values back over the original.
Gotchas
- International titles with legitimately non-ASCII characters.
Amélie,Pokémon,東京— these are valid. The detection flags them asNON-ASCII, but they’re not the problem. Before bulk-replacing, check the platform’s spec: most modern platforms accept UTF-8 and these will ingest fine. The ingest failures are almost always autocorrect artifacts, not intentional diacritics. - Differential handling by platform. Amazon accepts UTF-8 throughout. Some older CSV-based ingests accept ASCII-only. Know which bucket each platform is in before stripping characters.
- Don’t strip zero-width characters blindly.
REGEXREPLACE(A2, "[\x{200B}-\x{200D}\x{FEFF}]", "")removes the truly-invisible troublemakers. But run it as a pass on a copy first and diff — if it removes too much, you’ve got a bigger encoding issue than autocorrect. - CSV vs. XLSX. XLSX preserves characters reliably; CSVs depend on the exporter’s encoding. If you’re exporting to CSV for ingest, explicitly save as UTF-8 CSV, not “CSV (default)” which is often Windows-1252.
If You Need the Full Workflow
The Validate an Avails Sheet Against IFTA + EMA Rules tutorial includes the character check as one of its seven validation passes, plus territory/date/required-field validation alongside.
Download
The Avails Data Quality Scorecard template flags non-ASCII characters in a dedicated rule row. Paste your avails, and the check runs automatically.
OpenRights Weekly
Free templates, tutorials, and data quality tips. Every week.
Need help with your catalog data?
Share your challenge and we'll help — by email or callback. 100% free, no sales pitch. Part of our mission to help rights and catalog professionals.
Get Free Help