Don’t bother trying to estimate Pandas memory usage

You have a file with data you want to process with Pandas, and you want to make sure you won’t run out of memory.
How do you estimate memory usage given the file size?

At times you may see estimates like these:

  • “Have 5 to 10 times as much RAM as the size of your dataset”, or
  • “several times the size of your dataset”, or
  • 2×-3× the size of the dataset.

Any of these rules of thumb can underestimate or overestimate memory usage, depending on the situation.
In fact, I will go so far as to say that estimating memory usage is just not worth doing.
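
To see why no single multiplier can work, here is a minimal sketch. It is not taken from a real dataset: the two synthetic CSVs and the helper name `memory_to_file_ratio` are made up purely for illustration. It loads each CSV with Pandas and compares the size of the resulting DataFrame in memory to the size of the CSV text.

```python
import io

import pandas as pd


def memory_to_file_ratio(csv_text: str) -> float:
    """Return (DataFrame bytes in memory) / (CSV bytes) for the given CSV text.

    Hypothetical helper for illustration only; it measures the loaded
    DataFrame, not the peak memory used while parsing.
    """
    df = pd.read_csv(io.StringIO(csv_text))
    in_memory = int(df.memory_usage(deep=True).sum())
    on_disk = len(csv_text.encode("utf-8"))
    return in_memory / on_disk


n = 100_000

# One column of single-digit integers: 2 bytes per row as text ("0\n"),
# but 8 bytes per row once parsed into an int64 column.
small_ints = "x\n" + "0\n" * n

# One column of long decimal numbers: 18 bytes per row as text,
# but still only 8 bytes per row as float64.
long_floats = "x\n" + "1234567890.123456\n" * n

ratio_small = memory_to_file_ratio(small_ints)
ratio_long = memory_to_file_ratio(long_floats)

print(f"small ints:  memory is {ratio_small:.2f}x the file size")
print(f"long floats: memory is {ratio_long:.2f}x the file size")
```

With these two made-up files, the first DataFrame takes several times the file size in memory, while the second takes less than the file size. Same library, same loading code, and the "right" multiplier differs by almost an order of magnitude.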

In particular, this article will: