Measuring the memory usage of a Pandas DataFrame

How much memory is your Pandas DataFrame or Series using?
Pandas provides an API for measuring this, but a variety of implementation details mean the results can be confusing or misleading.

Consider the following example:

>>> import pandas as pd
>>> series = pd.Series(["abcdefhjiklmnopqrstuvwxyz" * 10
...                     for i in range(1_000_000)])
>>> series.memory_usage()
8000128
>>> series.memory_usage(deep=True)
307000128

Which is correct: is the memory usage 8MB or 300MB?
Neither!

In this special case, it’s actually 67MB, at least with the default Python interpreter, CPython.
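
To see what each of the two reported figures is counting, here is a minimal sketch. It makes a few assumptions that are not in the example above: a smaller Series so it runs quickly, an explicit dtype=object so the values are stored as references to Python strings, and index=False so the index is left out of the arithmetic. It does not try to reproduce the 67MB figure; it only breaks the two API numbers into their parts, as they appear to be computed on CPython.

```python
import sys

import pandas as pd

# A smaller Series than the one in the example above, so this runs quickly.
# dtype=object is an assumption made for this sketch: it forces the values
# to be stored as references to Python str objects.
n = 1_000
series = pd.Series(
    ["abcdefhjiklmnopqrstuvwxyz" * 10 for i in range(n)], dtype=object
)

# Shallow measurement: only the array of references, one pointer per row.
shallow = series.memory_usage(index=False)
pointer_bytes = n * series.to_numpy().itemsize

# Deep measurement: the references plus sys.getsizeof() of the object each
# row points to, added once per row even if several rows share one object.
deep = series.memory_usage(index=False, deep=True)
per_row_bytes = sum(sys.getsizeof(value) for value in series)

# How many distinct Python objects the rows actually point to.
distinct_objects = len({id(value) for value in series})

print("shallow:", shallow, "pointers:", pointer_bytes)
print("deep - shallow:", deep - shallow, "sum of getsizeof:", per_row_bytes)
print(distinct_objects, "distinct object(s) for", n, "rows")
```

On CPython the two numbers on each of the first two lines should match. The last line is the one to watch: the shallow figure ignores the strings entirely, while the deep figure charges for a string on every row, so neither has to match what the process really allocated.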
