Sanitize CSV/TSV column headers with d3.js

CSV column names are often all over the place. It’s not rare to see this:

Name, description, Publication Date, Updated date,

Notice how description is lowercase and Updated date has different casing to Publication Date?

That’s why I usually want to clean up my CSV column headers before doing anything else.

In regular JS, using d3-dsv:

import { csvParse, autoType } from 'd3-dsv';

const sanitizeHeaders = (d) => {
  const returnObj = {};

  for (const [key, value] of Object.entries(d)) {
    // I trim, lowercase & replace whitespace with underscores, but do whatever you want
    // (trim first, otherwise a header like " description" becomes "_description")
    const sanitizedKey = key.trim().toLowerCase().replace(/\s+/g, '_');
    returnObj[sanitizedKey] = value;
  }
  // everything in CSV is a string by default
  // autoType will try to convert your data to the correct type (so "13" becomes 13, etc)
  return autoType(returnObj);
};

// `csv` is your raw CSV text as a string
// Pass in your row conversion function as the second argument
const data = csvParse(csv, sanitizeHeaders);
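
To see what the key cleanup does on its own, here is a small dependency-free sketch run against the header line from the top of the post. The helper name sanitizeKey is made up for this example, and it trims surrounding whitespace before lowercasing so that " description" doesn't end up as "_description". Splitting on commas is only a stand-in for a real CSV parser and won't handle quoted fields.

```javascript
// Hypothetical helper (not part of d3-dsv): trim, lowercase,
// then turn runs of whitespace into a single underscore.
const sanitizeKey = (key) => key.trim().toLowerCase().replace(/\s+/g, '_');

// The messy header line from above, without the trailing comma.
const headerLine = 'Name, description, Publication Date, Updated date';

// Naive split for illustration only; use csvParse for real data.
const sanitized = headerLine.split(',').map(sanitizeKey);

console.log(sanitized);
// [ 'name', 'description', 'publication_date', 'updated_date' ]
```

Every column now follows the same snake_case convention, so you can write d.publication_date and d.updated_date without remembering how each header was capitalized in the source file.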

Using the Deno standard library: