Binned Statistics in Vaex


Vaex offers ‘binby’ as a faster alternative to pandas’s groupby. It computes statistics quickly over a regular N-dimensional grid of bins.
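To make the idea of a regular grid concrete, here is a minimal pure-Python sketch of what a one-dimensional binned count computes. It is only an illustration of the concept, not how Vaex implements it; the helper name `binned_count`, the sample values, and the choice to treat each bin as a half-open interval are assumptions made for this example.

```python
import math

def binned_count(values, limits, shape):
    """Count how many values fall into each of `shape` equal-width bins."""
    lo, hi = limits
    width = (hi - lo) / shape
    counts = [0] * shape
    for v in values:
        # Skip missing values and anything outside [lo, hi) (assumed convention).
        if v is None or math.isnan(v) or not (lo <= v < hi):
            continue
        # Map the value to a bin index; clamp to guard against rounding at the edge.
        index = min(int((v - lo) / width), shape - 1)
        counts[index] += 1
    return counts

# Hypothetical sample data standing in for a numeric column.
sample = [0.5, 1.9, 2.0, 7.3, 7.9, 19.99, 20.0, -1.0, float("nan")]
counts = binned_count(sample, limits=[0, 20], shape=10)
print(counts)
```

Because every bin has the same width, the bin index is a single arithmetic step per row and no sorting or hashing of group keys is needed, which is one reason grid-based statistics can be cheap to compute.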

Python3

%time df_vaex.count(binby=df_vaex.column7, 
                    limits=[0, 20], shape=10) 

Output:

Fast Visualization in Vaex:

Visualizing a large dataset is usually slow and tedious, but Vaex can compute these visualizations quickly. A large dataset is easier to understand when its distribution is summarized in bins, and Vaex excels at grouped aggregations, selections, and binning, so it can visualize data quickly and interactively. With Vaex, visualizations of large datasets can even be done in 3 dimensions.
Let’s plot a simple 1-dimensional histogram:

Python3

%time df_vaex.viz.histogram(df_vaex.column1,
                            limits=[0, 20])

Output:

Let’s plot a 2-dimensional heat-map:

Python3

df_vaex.viz.heatmap(df_vaex.column7, df_vaex.column8 +
                    df_vaex.column9, limits=[-3, 20]) 

Output:

We can also visualize a statistic other than the plain count by passing the “what=<statistic>(<expression>)” argument. Let’s try a slightly more complicated visualization:

Python3

df_vaex.viz.heatmap(df_vaex.column1, df_vaex.column2, 
                    what=(vaex.stat.mean(df_vaex.column4) /
                          vaex.stat.std(df_vaex.column4)), 
                    limits='99.7%') 

Output:

Here, the ‘vaex.stat.<statistic>’ objects are very similar to Vaex expressions: they represent an underlying calculation, so we can apply ordinary arithmetic and NumPy functions to them.
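As a rough illustration of what a mean-over-standard-deviation statistic on a 2-dimensional grid amounts to, the following pure-Python sketch accumulates a count, a sum, and a sum of squares per bin and then combines them. This is a conceptual sketch only, not Vaex's implementation; the function name `binned_mean_over_std`, the sample points, the use of the population standard deviation, and the choice to leave empty or zero-spread bins as NaN are all assumptions of this example.

```python
import math

def binned_mean_over_std(xs, ys, zs, x_limits, y_limits, shape):
    """Return a shape x shape grid holding mean(z) / std(z) for each (x, y) bin."""
    (x_lo, x_hi), (y_lo, y_hi) = x_limits, y_limits
    # Per-bin accumulators: count, sum of z, sum of z squared.
    n = [[0] * shape for _ in range(shape)]
    s1 = [[0.0] * shape for _ in range(shape)]
    s2 = [[0.0] * shape for _ in range(shape)]
    for x, y, z in zip(xs, ys, zs):
        if not (x_lo <= x < x_hi and y_lo <= y < y_hi):
            continue  # point lies outside the grid
        i = min(int((x - x_lo) / (x_hi - x_lo) * shape), shape - 1)
        j = min(int((y - y_lo) / (y_hi - y_lo) * shape), shape - 1)
        n[i][j] += 1
        s1[i][j] += z
        s2[i][j] += z * z
    grid = [[math.nan] * shape for _ in range(shape)]
    for i in range(shape):
        for j in range(shape):
            if n[i][j] == 0:
                continue  # empty bin stays NaN
            mean = s1[i][j] / n[i][j]
            # Population variance; clamp tiny negative values caused by rounding.
            var = max(s2[i][j] / n[i][j] - mean * mean, 0.0)
            std = math.sqrt(var)
            if std > 0:
                grid[i][j] = mean / std
    return grid

# Hypothetical points standing in for two axis columns and one value column.
xs = [0.1, 0.2, 0.3, 0.9]
ys = [0.1, 0.2, 0.3, 0.9]
zs = [1.0, 3.0, 2.0, 5.0]
grid = binned_mean_over_std(xs, ys, zs, [0, 1], [0, 1], shape=2)
print(grid)
```

The point of the sketch is that the per-bin mean and standard deviation are computed first and the division happens afterwards on the resulting grids, which mirrors how arithmetic on the statistic objects is described above.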

Introduction to Vaex in Python

Working with big data has become very common, so we need libraries that let us process large datasets on our own machines (desktops, laptops) with fast code execution and low memory usage.

Vaex is a Python library that helps us achieve this and makes working with large datasets much easier. It provides lazy, out-of-core DataFrames (similar to pandas) and can visualize, explore, and perform computations on big tabular datasets quickly and with minimal memory usage.
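The out-of-core idea can be sketched in plain Python: rather than loading every row into memory, the data is consumed one chunk at a time and only small running totals are kept. The code below is a simplified illustration of that principle and not Vaex's actual machinery; the `chunks` generator (which fabricates values instead of reading a file) and the `streaming_mean` helper are hypothetical names introduced for this example.

```python
def chunks(n_rows, chunk_size):
    """Yield lists of values one chunk at a time (stands in for reading from disk)."""
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        yield [float(i % 20) for i in range(start, stop)]

def streaming_mean(chunk_iter):
    """Compute a mean while holding only one chunk in memory at any time."""
    count, total = 0, 0.0
    for chunk in chunk_iter:
        count += len(chunk)
        total += sum(chunk)
    return total / count if count else float("nan")

mean = streaming_mean(chunks(n_rows=100_000, chunk_size=10_000))
print(mean)
```

Memory use here depends on the chunk size rather than on the total number of rows, which is the property that makes datasets larger than RAM workable.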
