
4 Examples to Compare the Speed of Pandas and Vaex

Soner Yıldırım
4 min read · Jan 4, 2023

Performance matters more as the data size increases.

I admire Pandas. It’s one of the first tools I learned in my data science journey, and I have used it frequently ever since.

Pandas was the only tool I needed to do data cleaning, manipulation, and analysis until I had to work with very large datasets.

Pandas starts to slow down as datasets grow very large because it performs in-memory analytics: the whole dataset has to be loaded into RAM. Hence, if the dataset is larger than the available memory, using Pandas becomes very difficult, or impossible.
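A quick way to see this in-memory footprint is `DataFrame.memory_usage(deep=True)`. The sketch below builds a hypothetical two-column DataFrame of one million random floats and sums the per-column byte counts; the data and sizes are illustrative assumptions, not figures from the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one million rows, two float64 columns (8 bytes per value)
rng = np.random.default_rng(seed=0)
n_rows = 1_000_000
df = pd.DataFrame({"x": rng.random(n_rows), "y": rng.random(n_rows)})

# memory_usage(deep=True) returns bytes per column (plus the index);
# the sum is roughly the RAM the DataFrame's data occupies
total_bytes = int(df.memory_usage(deep=True).sum())
print(f"{total_bytes / 1e6:.1f} MB")  # about 2 columns x 1M rows x 8 bytes
```

Scaling the same two columns to one billion rows would need roughly 16 GB of RAM just to hold the data, before any intermediate results are created.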

I have previously written about some tips for making Pandas more efficient with large datasets. In this article, we will learn about an alternative to Pandas for working with large datasets: Vaex.

What is Vaex?

Vaex is also a Python library for data analysis and manipulation. Its key features make it outperform Pandas when working with large datasets.

Some of these key features are:

  • Memory-mapping
  • Lazy execution: calculations are performed only when needed. Vaex uses expression objects to keep track of the computations.
  • Virtual columns, which are treated as regular columns but do not occupy…
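The first two ideas can be illustrated without Vaex itself. The sketch below uses NumPy's `np.memmap` to map a file on disk into an array, so the operating system reads pages in only when they are touched, and a small hypothetical `LazyColumn` class that stores an expression and evaluates it only on request, in the spirit of lazy execution and virtual columns. Neither is Vaex's API; the temporary file and the class are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

# --- Memory-mapping: write an array to disk, then map it instead of loading it ---
n_rows = 1_000_000
path = os.path.join(tempfile.mkdtemp(), "x.float64")  # hypothetical demo file
np.arange(n_rows, dtype=np.float64).tofile(path)

# np.memmap maps the file into the address space; data is paged in on demand
x = np.memmap(path, dtype=np.float64, mode="r", shape=(n_rows,))


class LazyColumn:
    """Hypothetical stand-in for a virtual column: stores an expression, not data."""

    def __init__(self, func):
        self.func = func  # the expression to evaluate later

    def evaluate(self):
        # The computation happens only here, when the values are requested
        return self.func()


# --- Lazy execution: defining the column does no arithmetic yet ---
y = LazyColumn(lambda: x * 2 + 1)

# Only asking for a result forces evaluation of the stored expression
y_sum = float(y.evaluate().sum())
print(y_sum)  # 1000000000000.0, since the sum of (2i + 1) for i < n is n**2
```

A real library does considerably more (for example, deciding how to evaluate expressions efficiently over large files), so treat this only as a picture of the underlying ideas.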
