Saving memory with Pandas 1.3’s new string dtype

When you’re loading many strings into Pandas, you’re going to use a lot of memory.
If your data contains only a small number of distinct strings, you can save memory with categoricals, but that only helps in a limited number of situations.
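
As a rough illustration of the categorical approach, here is a minimal sketch that compares memory usage before and after conversion. The column contents and variable names are made up for this example, and the exact byte counts will vary with your Pandas version and platform.

```python
import pandas as pd

# Hypothetical data: 30,000 rows drawn from only three distinct strings.
colors = pd.Series(["red", "green", "blue"] * 10_000)

# deep=True also counts the memory used by the string data itself,
# not just the array that refers to it.
plain_bytes = colors.memory_usage(deep=True)

# A categorical stores each distinct string once,
# plus a small integer code per row.
categorical_bytes = colors.astype("category").memory_usage(deep=True)

print(f"plain strings: {plain_bytes:,} bytes")
print(f"categorical:   {categorical_bytes:,} bytes")
```

The saving comes from the repetition: if nearly every row held a different string, the categorical would have little to deduplicate.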

With Pandas 1.3, there’s a new option that can save memory on large numbers of strings as well, simply by changing to a new column type.
Let’s see how.

Pandas’ different string dtypes

Every pandas.Series, and every column in a pandas.DataFrame, has a dtype: the type of object stored inside it.
By default, Pandas will store strings using the object dtype, meaning it stores strings as a NumPy array of pointers to normal Python objects.
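
You can check this yourself with a short sketch like the following. The sample names are invented for illustration, and the default dtype shown assumes Pandas 1.x or 2.x; later releases may choose a different default for strings.

```python
import pandas as pd

# Hypothetical sample data.
names = pd.Series(["alice", "bob", "carol"])

# On Pandas 1.x and 2.x this prints "object";
# later versions may use a different default.
print(names.dtype)

# Each element you get back is an ordinary Python str.
first = names.iloc[0]
print(type(first))

# A DataFrame column built from the Series has the same dtype.
df = pd.DataFrame({"name": names})
print(df["name"].dtype)
```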

In Pandas 1.0, a new "string" dtype was added,

 

 

 
