Binned Statistics in Vaex

Vaex offers ‘binby’ as a faster alternative to pandas’ groupby: it calculates statistics on a regular N-dimensional grid, that is, in evenly spaced bins.
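To make the idea of a regular grid concrete, here is a minimal pure-Python sketch of what a binned count over limits [0, 20] with shape 10 computes. It is only an illustration, not how Vaex implements it: the function name `binned_count` and the sample values are hypothetical, and the sketch assumes each bin is half-open, so values outside [0, 20) are ignored.

```python
def binned_count(values, limits, shape):
    """Count how many values fall into each of `shape` equal-width bins."""
    lo, hi = limits
    width = (hi - lo) / shape
    counts = [0] * shape
    for v in values:
        # Assumption: the range is half-open, [lo, hi); other values are skipped.
        if lo <= v < hi:
            # Map the value to its bin index; clamp to guard against rounding.
            index = min(int((v - lo) / width), shape - 1)
            counts[index] += 1
    return counts


# Hypothetical sample data; 20.0 and -1.0 fall outside the limits.
sample = [0.5, 1.9, 2.0, 7.3, 19.99, 20.0, -1.0]
counts = binned_count(sample, limits=[0, 20], shape=10)
print(counts)
```

Because the bin edges are fixed by the limits and the shape, each value is placed with one subtraction and one division, with no sorting or hashing of group keys. This is part of why binning on a regular grid can be cheaper than a general groupby.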

Python3




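# Count the rows of column7 in 10 equal-width bins spanning [0, 20]
# (%time is an IPython/Jupyter magic that reports the run time)
%time df_vaex.count(binby=df_vaex.column7,
                    limits=[0, 20], shape=10)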


Output:

Fast Visualization in Vaex:

Visualizing a large dataset is usually a tedious task, but Vaex can compute these visualizations quickly. The distribution of the data is easier to see when it is aggregated into bins, and Vaex excels at group aggregates, selections, and binning, so it can visualize data swiftly and interactively. Vaex can even produce 3-dimensional visualizations of large datasets.
Let’s plot a simple 1-dimensional graph:

Python3




# 1-D histogram of column1, restricted to the range [0, 20]
%time df_vaex.viz.histogram(df_vaex.column1, 
                            limits = [0, 20])


Output:

Let’s plot a 2-dimensional heat-map:

Python3




# 2-D heat-map of column7 against the expression column8 + column9
df_vaex.viz.heatmap(df_vaex.column7, df_vaex.column8 +
                    df_vaex.column9, limits=[-3, 20])


Output:

We can visualize a statistic instead of plain counts by passing the “what=<statistic>(<expression>)” argument. Let’s perform a slightly more complicated visualization:

Python3




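# Colour each (column1, column2) bin by mean(column4) / std(column4);
# limits='99.7%' picks axis limits that enclose 99.7% of the data
df_vaex.viz.heatmap(df_vaex.column1, df_vaex.column2,
                    what=(vaex.stat.mean(df_vaex.column4) /
                          vaex.stat.std(df_vaex.column4)),
                    limits='99.7%')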


Output:

Here, the ‘vaex.stat.<statistic>’ objects are very similar to Vaex expressions: they represent an underlying calculation, so we can apply typical arithmetic and NumPy functions to them.
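The idea of an object that represents a calculation and can be combined with arithmetic can be sketched in plain Python. The `Stat` class below is hypothetical and is not Vaex's implementation. It only shows how dividing two deferred statistics can produce a new deferred statistic that is evaluated later. The sketch assumes the population standard deviation.

```python
from statistics import fmean, pstdev


class Stat:
    """Hypothetical deferred statistic: stores a function, computes on demand."""

    def __init__(self, func, label):
        self.func = func
        self.label = label

    def __truediv__(self, other):
        # Build a new deferred statistic; nothing is evaluated here.
        return Stat(lambda data: self.func(data) / other.func(data),
                    f"({self.label} / {other.label})")

    def calculate(self, data):
        # Only now is the underlying calculation actually run.
        return self.func(data)


mean = Stat(fmean, "mean")
std = Stat(pstdev, "std")  # assumption: population standard deviation

ratio = mean / std  # still just a description of the calculation
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
result = ratio.calculate(data)
print(ratio.label, result)
```

In the heat-map call above, an expression such as mean divided by std is evaluated once per bin, which is why it can be passed as a single “what” argument.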



Introduction to Vaex in Python

Working with big data has become very common, so we need libraries that let us handle it on our own systems (i.e., desktops and laptops) with near-instantaneous code execution and low memory usage.

Vaex is a Python library that helps us achieve this and makes working with large datasets much easier. It provides lazy, out-of-core DataFrames (similar to pandas) and can visualize, explore, and perform computations on big tabular datasets swiftly and with minimal memory usage.

Installation:

Using Conda:...

Why Vaex?

Vaex helps us work with large datasets efficiently through lazy computation, virtual columns, memory mapping, a zero memory copy policy, efficient data cleansing, and more. It has efficient algorithms and emphasizes aggregate data properties instead of individual samples, which lets it overcome several shortcomings of other libraries such as pandas. So, let’s explore Vaex:...

Vaex does computations lazily

...

Statistics Performance:

...

Vaex follows a zero memory copy policy

...

Virtual Columns in Vaex

Vaex uses lazy computation (i.e., it computes on the fly without wasting RAM). Instead of performing the complete calculation immediately, it creates a Vaex expression, which shows a few preview values when printed. Vaex performs the calculation only when it is needed and otherwise just stores the expression, which makes it exceptionally fast. Let’s look at an example of a simple computation:...

Binned Statistics in Vaex:

...