How Does PowerTransformer Work?
The `PowerTransformer` supports two main transformations:
- Box-Cox Transform
- Yeo-Johnson Transform
Both methods estimate an optimal transformation parameter λ that makes the data more nearly normal.
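As a minimal sketch of this parameter estimation, the snippet below fits a `PowerTransformer` on synthetic skewed data (the exponential sample is purely illustrative) and reads the fitted λ from the `lambdas_` attribute, which holds one value per feature:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Illustrative right-skewed, strictly positive sample data
X = np.random.default_rng(0).exponential(scale=2.0, size=(100, 1))

pt = PowerTransformer(method="yeo-johnson")  # "yeo-johnson" is the default method
X_t = pt.fit_transform(X)

# The lambda chosen by maximum likelihood, one entry per feature
print(pt.lambdas_)
```

Because `standardize=True` by default, the transformed output is also zero-centered with unit variance.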
Box-Cox Transform
The Box-Cox transformation is a statistical method used to stabilize variance and make data more closely meet the assumptions of normality. It can be applied only to strictly positive data. The transformation is parameterized by a value λ, which is varied to find the best approximation of a normal distribution.
The formula for the Box-Cox transformation is:

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt] \ln(y), & \lambda = 0 \end{cases}$$
This transformation helps improve the validity of many statistical techniques that assume normality.
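A short sketch of Box-Cox in practice: lognormal data (used here as an assumed example of skewed positive input) is a natural fit, since taking the log (λ = 0) makes it exactly normal, so the fitted λ should land near zero.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(42)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 1))  # positive, right-skewed

# Box-Cox requires strictly positive input; zeros or negatives raise ValueError
pt = PowerTransformer(method="box-cox", standardize=True)
X_t = pt.fit_transform(X)

print("fitted lambda:", pt.lambdas_[0])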
Yeo-Johnson Transform
The Yeo-Johnson transformation is an extension of the Box-Cox method that also stabilizes variance and normalizes data distributions. Because it accommodates both positive and negative values, it is more adaptable to real-world data.
The transformation is defined as follows for values of λ and y:

$$\psi(y, \lambda) = \begin{cases} \dfrac{(y + 1)^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\ y \geq 0 \\[4pt] \ln(y + 1), & \lambda = 0,\ y \geq 0 \\[4pt] -\dfrac{(-y + 1)^{2 - \lambda} - 1}{2 - \lambda}, & \lambda \neq 2,\ y < 0 \\[4pt] -\ln(-y + 1), & \lambda = 2,\ y < 0 \end{cases}$$
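To illustrate the key difference from Box-Cox, the sketch below (with assumed synthetic data mixing positive and negative values) applies Yeo-Johnson to input that Box-Cox would reject:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(7)
# Mix of positive and negative values -- Box-Cox would raise a ValueError here
X = np.concatenate(
    [rng.exponential(2.0, 150), -rng.exponential(0.5, 50)]
).reshape(-1, 1)

pt = PowerTransformer(method="yeo-johnson")
X_t = pt.fit_transform(X)

print("fitted lambda:", pt.lambdas_[0])
```

No shifting or clipping of the negative values is needed, which is why Yeo-Johnson is scikit-learn's default method.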
PowerTransformer in scikit-learn
Many machine learning algorithms perform better when input variables follow a roughly Gaussian distribution. `PowerTransformer` is a scikit-learn preprocessing class that transforms features to make them more Gaussian-like. This article explores the PowerTransformer technique, its methods, and its implementation in scikit-learn.
Table of Contents
- What is a PowerTransformer?
- How Does PowerTransformer Work?
- Box-Cox Transform
- Yeo-Johnson Transform
- Implementation: PowerTransformer in Scikit-Learn
- Step 1: Import Libraries
- Step 2: Generating Skewed Data
- Step 3: Applying PowerTransformer
- Advantages of PowerTransformer