Python to Read Large Excel/CSV File Faster

Image source: https://unsplash.com/photos/Wpnoqo2plFA

Read a CSV with PyArrow

In Pandas 1.4, released in January 2022, there is a new backend for CSV reading, relying on the Arrow library’s CSV parser. It’s still marked as experimental, and it doesn’t support all the features of the default parser—but it is faster.1
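Opting in is a one-line change. A minimal sketch, assuming pandas >= 1.4 and pyarrow are installed (the file name is a placeholder):

import pandas as pd

# Use the experimental Arrow-based CSV parser instead of the default C parser
df = pd.read_csv("large_file.csv", engine="pyarrow")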

Full article: Python to Read Large Excel/CSV File Faster — Hung, Chien-Hsiang | Blog (chienhsiang-hung.github.io)

Note that the PyArrow engine is only available for pd.read_csv(), not pd.read_excel().

In pd.read_excel():

engine: str, default None

If io is not a buffer or path, this must be set to identify io. Supported engines: “xlrd”, “openpyxl”, “odf”, “pyxlsb”.2
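So for Excel files the only choice on this front is one of those supported engines. A minimal sketch, assuming openpyxl is installed (the workbook name is a placeholder):

import pandas as pd

# read_excel has no Arrow backend; explicitly pick one of the supported engines
df = pd.read_excel("large_file.xlsx", engine="openpyxl")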

Upgrade Pandas

pip install --upgrade pandas --user

Note that --user may be needed for Windows users to avoid permission errors when upgrading.

Read Large Excel File Faster

Parallel

Let’s imagine that you received Excel files and have no other choice but to load them as they are. You can use joblib to parallelize this3. Compared to the pickle-based code in the cited article, only the loop function needs to be updated4, as sketched below.
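A minimal sketch of the pattern, assuming joblib is installed and a hypothetical data/ folder of .xlsx files:

from pathlib import Path

import pandas as pd
from joblib import Parallel, delayed

def load_one(path):
    # Each worker reads one workbook in full
    return pd.read_excel(path)

# Hypothetical folder of Excel files; n_jobs=-1 uses all available CPU cores
excel_files = sorted(Path("data").glob("*.xlsx"))
frames = Parallel(n_jobs=-1)(delayed(load_one)(p) for p in excel_files)
df = pd.concat(frames, ignore_index=True)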

Just One File

For a single file, the standard levers are the usecols, nrows, and skiprows parameters, which limit how many columns and rows get parsed, as sketched below.
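A sketch combining the three parameters (the file name, column range, and row counts are placeholders):

import pandas as pd

df = pd.read_excel(
    "large_file.xlsx",
    usecols="A:C",   # parse only the first three columns
    skiprows=2,      # skip a banner above the header row
    nrows=10_000,    # stop after 10,000 data rows
)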

For other ideas that reduce reading time, such as loading the data in chunks, see Big Data from Excel to Pandas | Python Charmers.
