Choosing a good file format for Pandas
Before you can process your data with Pandas, you need to load it (from disk or remote storage).
Pandas supports plenty of data formats: CSV, JSON, Parquet, and many others.
Which should you use?
- You don’t want loading the data to be slow or to use lots of memory: that’s pure overhead.
Ideally you’d want a file format that’s fast, efficient, small, and broadly supported.
- You also want to make sure the loaded data has all the right types: numeric types, datetimes, and so on.
Some data formats do a better job at this than others.
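To make the point about types concrete, here is a minimal sketch that round-trips a small DataFrame through two formats and compares the resulting column types. CSV is plain text with no type information, so Pandas has to guess, and a datetime column comes back as strings unless you pass `parse_dates`. Pickle is used here only as an illustration of a format that stores types alongside the data, because it needs no extra dependencies. Parquet also preserves column types but requires `pyarrow` or `fastparquet`. The column names and values are made up for the example.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical example data: an integer, a float, and a datetime column.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "price": [9.99, 12.50, 3.25],
        "when": pd.to_datetime(["2021-01-01", "2021-02-15", "2021-03-31"]),
    }
)

# CSV round-trip: the file is just text, so read_csv infers types,
# and the datetime column is read back as strings.
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# The type can be restored, but only if the reading code knows
# which columns are dates; the CSV file itself doesn't record it.
from_csv_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

# Pickle round-trip: the dtypes are saved together with the data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    df.to_pickle(path)
    from_pickle = pd.read_pickle(path)

for name, frame in [
    ("csv", from_csv),
    ("csv + parse_dates", from_csv_dates),
    ("pickle", from_pickle),
]:
    print(name, {col: str(dtype) for col, dtype in frame.dtypes.items()})
```

Pickle keeps the types, but it is specific to Python, so it scores poorly on being broadly supported; this is exactly the kind of trade-off that makes choosing a format less than obvious.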
While there is no one true answer that works for everyone, this article will try to help you narrow down the field and make an