If you care about performance, you may want to avoid CSV files. But data sources are often like family: we don’t get to choose them. This blog post shows how to process a CSV file as fast as possible.

  • @[email protected]
    27 points, 7 months ago

    Holy shit, switching to PyArrow is going to make me seem like a mystical wizard when I merge in the morning. I’ve easily halved the execution time of a horrible but unavoidable job (yay crappy vendor “API” that returns a huge CSV).

    • @[email protected]
      3 points, 7 months ago

      You and me both. I’ve been parsing CSVs of around 10 to 100 million rows lately and… this will hopefully help.
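
For anyone wanting to try the PyArrow switch mentioned in the comments above, here is a minimal sketch rather than the linked post's actual code. It compares a row-by-row parse with the standard-library `csv` module against `pyarrow.csv.read_csv`, which reads the file into a columnar table. The sample data, the column names (`id`, `value`), and the helper function names are all made up for illustration, and PyArrow is assumed to be installed separately (`pip install pyarrow`). If it is not installed, only the baseline runs.

```python
import csv
import io
import time

try:
    import pyarrow.csv as pacsv  # third-party; assumed to be installed
except ImportError:
    pacsv = None  # only the stdlib baseline will run


def make_sample_csv(n_rows: int) -> bytes:
    """Build a small in-memory CSV (hypothetical columns: id, value)."""
    lines = ["id,value"]
    lines.extend(f"{i},{i * 0.5}" for i in range(n_rows))
    return ("\n".join(lines) + "\n").encode("utf-8")


def count_rows_stdlib(data: bytes) -> int:
    """Baseline: parse row by row with the standard-library csv module."""
    reader = csv.reader(io.StringIO(data.decode("utf-8")))
    next(reader)  # skip the header row
    return sum(1 for _ in reader)


def count_rows_pyarrow(data: bytes) -> int:
    """Parse with PyArrow's CSV reader into a columnar Table."""
    table = pacsv.read_csv(io.BytesIO(data))
    return table.num_rows


sample = make_sample_csv(100_000)

start = time.perf_counter()
n_stdlib = count_rows_stdlib(sample)
print(f"csv module: {n_stdlib} rows in {time.perf_counter() - start:.3f}s")

if pacsv is not None:
    start = time.perf_counter()
    n_arrow = count_rows_pyarrow(sample)
    print(f"pyarrow:    {n_arrow} rows in {time.perf_counter() - start:.3f}s")
```

Timings depend heavily on the file, the column types, and the machine, so treat any speedup seen here as indicative only. For a real job you would pass a file path to `read_csv` instead of an in-memory buffer.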