Choosing a good file format for Pandas

Before you can process your data with Pandas, you need to load it (from disk or remote storage).
Pandas supports plenty of data formats, from CSV to JSON to Parquet, and many others as well.

Which should you use?

  • You don’t want loading the data to be slow or to use lots of memory: that’s pure overhead.
    Ideally you’d want a file format that’s fast, efficient, small, and broadly supported.
  • You also want to make sure the loaded data has all the right types: numeric types, datetimes, and so on.
    Some data formats do a better job at this than others.
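To make the type problem concrete, here’s a minimal sketch that round-trips a DataFrame through CSV. It uses an in-memory `io.StringIO` buffer rather than a real file, and the column names `when` and `count` are just illustrative. CSV stores everything as text, so by default the datetime column doesn’t come back as a datetime. You have to tell `read_csv()` which columns to parse with `parse_dates`.

```python
import io

import pandas as pd

# A small DataFrame with a datetime column and an integer column.
df = pd.DataFrame({
    "when": pd.to_datetime(["2023-01-01", "2023-06-15"]),
    "count": [10, 20],
})

# Write it out as CSV into an in-memory buffer (stands in for a file on disk).
buf = io.StringIO()
df.to_csv(buf, index=False)

# Reading it back with default settings: "when" comes back as text, not datetimes.
buf.seek(0)
naive = pd.read_csv(buf)
print(naive.dtypes)

# Reading it back while explicitly asking for "when" to be parsed as dates.
buf.seek(0)
parsed = pd.read_csv(buf, parse_dates=["when"])
print(parsed.dtypes)
```

The integer column survives either way, because `read_csv()` infers numeric types from the text. Dates are the kind of type information CSV itself doesn’t record, which is why it has to be supplied again at load time.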

