cuDF

cuDF (CUDA DataFrame) is a Python GPU DataFrame library that accelerates loading, processing, and manipulating large datasets, enabling users to perform compute-intensive operations quickly. cuDF is built on the Apache Arrow columnar memory layout, which we will discuss later.

To move from CPU to GPU, i.e. from Pandas to cuDF, one doesn't need to learn a new library from scratch. cuDF provides a Pandas-like API, which makes the move quite simple for data scientists, analysts, and machine learning engineers. Just like Pandas, cuDF offers two data structures, Series and DataFrame, and most of the built-in Pandas functions are also available in cuDF with the same syntax.
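As a rough illustration of the shared API, the sketch below runs the same DataFrame and Series operations through whichever library is available. The alias `xdf` and the sample data are made up for this example, and the fallback to Pandas is only there so the snippet also runs on a machine without cuDF or a GPU.

```python
import pandas as pd

# Fall back to pandas when cuDF (or a usable GPU) is not available,
# so this sketch still runs on a CPU-only machine.
try:
    import cudf
    xdf = cudf
except Exception:
    xdf = pd

# Hypothetical sample data, used only for illustration.
data = {"city": ["A", "B", "A", "C"], "sales": [10, 20, 30, 40]}

df = xdf.DataFrame(data)                      # same constructor in both libraries
series = df["sales"]                          # selecting a column yields a Series
total = int(series.sum())                     # reduction on a Series
by_city = df.groupby("city")["sales"].sum()   # group-by aggregation
top = df[df["sales"] > 15]                    # boolean-mask filtering

print("backend:", xdf.__name__)
print("total sales:", total)
print("rows with sales > 15:", len(top))
```

Only the import changes between the two backends; the calls on `df` and `series` are written once. Not every Pandas function has a cuDF counterpart, so it is worth checking the cuDF documentation before porting a larger script.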

CUDA/GPU requirements:

  • CUDA 11.0+
  • NVIDIA driver 450.80.02+
  • Pascal architecture or better (Compute Capability >=6.0)
  • Conda

cuDF can be installed with conda from the rapidsai channel:

# for CUDA 11.0
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
   cudf=21.08 python=3.7 cudatoolkit=11.0

# or, for CUDA 11.2
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
   cudf=21.08 python=3.7 cudatoolkit=11.2
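
After installation, a quick check from Python can confirm that the package is visible in the active conda environment. This is a minimal sketch; the helper name `cudf_installed` is invented for this example, and a successful import still depends on a compatible GPU and driver being present.

```python
import importlib.util


def cudf_installed() -> bool:
    """Return True if the cudf package can be found in this environment."""
    return importlib.util.find_spec("cudf") is not None


if cudf_installed():
    try:
        import cudf
        print("cuDF version:", cudf.__version__)
    except Exception as exc:
        # The package is present but could not be loaded,
        # for example because no compatible GPU or driver was found.
        print("cuDF is installed but failed to import:", exc)
else:
    print("cuDF is not installed in this environment")
```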

How to speed up Pandas with cuDF?

Pandas data frames in Python are extremely useful; they provide an easy and flexible way to work with data, along with a large number of built-in functions to handle, analyze, and process it. Pandas performs reasonably well on small and medium-sized data, but computationally intensive operations on large datasets tend to be slow, causing delays in data science and ML workflows. The main reason is that Pandas runs on the CPU, which has a relatively small number of cores (often around 8 on a typical machine), and most Pandas operations use only one of them. GPU acceleration addresses this problem: a GPU has thousands of cores that can work on the data in parallel, which can speed up these operations substantially.
