Sanitize CSV/TSV column headers with d3.js
CSV column names are often all over the place. It’s not rare to see this:
Name, description, Publication Date, Updated date,
Notice how description is lowercase and Updated date has different casing to Publication Date?
That’s why I usually want to clean up my CSV column headers before doing anything else.
In regular JS, using d3-dsv:
import { csvParse, autoType } from 'd3-dsv';

const sanitizeHeaders = (d) => {
  const returnObj = {};
  for (const [key, value] of Object.entries(d)) {
    // I lowercase & replace spaces with underscores, but do whatever you want
    const sanitizedKey = key.toLowerCase().replace(/\s/g, '_');
    returnObj[sanitizedKey] = value;
  }
  // everything in CSV is a string by default
  // autoType will try to convert your data to the correct type (so "13" becomes 13, etc)
  return autoType(returnObj);
};

// Pass in your row conversion function
const data = csvParse(csv, sanitizeHeaders);
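One thing to watch for: the example header row has a space after each comma and a trailing comma at the end. As far as I know, d3-dsv does not trim header names, so sanitizeHeaders as written would likely produce keys like _description plus one empty key. Here is a rough sketch of a stricter variant. The names sanitizeKey and sanitizeRow are my own (not part of d3-dsv), the sample row is made up, and I left autoType out so the snippet runs without any dependency.

```javascript
// Hypothetical helper (not part of d3-dsv): normalizes a single header name
const sanitizeKey = (key) =>
  key
    .trim() // drop leading/trailing whitespace
    .toLowerCase()
    .replace(/\s+/g, '_'); // turn each run of whitespace into one underscore

// Row conversion function with the same shape as sanitizeHeaders, minus autoType
const sanitizeRow = (row) => {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    const cleanKey = sanitizeKey(key);
    if (cleanKey === '') continue; // skip unnamed columns, e.g. from a trailing comma
    out[cleanKey] = value;
  }
  return out;
};

// Made-up row, keyed the way I'd expect the parser to key it for the header above
const row = {
  Name: 'Ada',
  ' description': 'First programmer',
  ' Publication Date': '2020-01-01',
  ' Updated date': '2021-01-01',
  '': '',
};

console.log(sanitizeRow(row));
// keys: name, description, publication_date, updated_date
```

If you still want type conversion, wrapping the return value in autoType(out) and passing sanitizeRow to csvParse should work the same way as before.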
Using the Deno standard library: