Creating Machine Learning Pipeline with Scikit-Learn
Step 1: Import Libraries and Load Data
First, import the necessary libraries and load your dataset. For this example, we'll use the Iris dataset.
from sklearn import datasets
from sklearn.model_selection import train_test_split
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
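As a quick sanity check on the split, the sketch below prints the shapes of the resulting arrays. Iris has 150 samples and 4 features, so test_size=0.2 should leave 120 rows for training and 30 for testing. The stratify=y variation at the end is an addition not used in this article's snippet; it is shown only as one way to keep class proportions equal in both sets.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same split as in the article
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Optional variation (an assumption, not part of the article's code):
# a stratified split keeps the three classes balanced in the test set.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_test_strat))  # [10 10 10]
```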
Step 2: Define the Pipeline
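Next, define the pipeline as an ordered list of (name, estimator) tuples. Every step except the last must be a transformer; the last step here is the classifier. This pipeline standardizes the features, projects them onto two principal components, and then fits a logistic regression model.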
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('classifier', LogisticRegression())
])
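Once a pipeline is defined, its steps can be inspected and reconfigured by name. The following is a minimal sketch of that; changing n_components to 3 is purely illustrative and is not something the article itself does.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('classifier', LogisticRegression()),
])

# The step names, in execution order
step_names = [name for name, _ in pipeline.steps]
print(step_names)  # ['scaler', 'pca', 'classifier']

# Look up a single step by its name
print(pipeline.named_steps['pca'].n_components)  # 2

# Parameters of a step are addressed as '<step name>__<parameter>'
# (illustrative change only)
pipeline.set_params(pca__n_components=3)
print(pipeline.named_steps['pca'].n_components)  # 3
```

The same '<step name>__<parameter>' naming is what grid search tools expect when tuning a pipeline's hyperparameters.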
Step 3: Train the Pipeline
Fit the pipeline on the training data.
pipeline.fit(X_train, y_train)
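To make clear what fit does internally, the sketch below repeats the same work by hand: each transformer is fitted and applied in order, and the final estimator is fitted on the transformed data. The standalone variable names (scaler, pca, clf) are introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('classifier', LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# Hand-written equivalent of pipeline.fit(X_train, y_train)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)   # fit + transform step 1
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)    # fit + transform step 2
clf = LogisticRegression()
clf.fit(X_reduced, y_train)                # fit the final estimator

# Both routes should arrive at the same fitted classifier
coefs_match = np.allclose(pipeline.named_steps['classifier'].coef_, clf.coef_)
print(coefs_match)
```

The pipeline version is shorter and leaves no room to apply the steps in the wrong order or to forget one of them.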
Step 4: Make Predictions
Use the trained pipeline to make predictions on the test data.
y_pred = pipeline.predict(X_test)
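At prediction time the pipeline only calls transform on each preprocessing step (nothing is re-fitted on the test data) and then calls predict on the final estimator. A minimal sketch of that, using pipeline slicing to separate the transformers from the classifier:

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('classifier', LogisticRegression()),
])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# pipeline[:-1] is a sub-pipeline holding only the transformers;
# it applies the scaling and PCA learned from the training data.
X_test_transformed = pipeline[:-1].transform(X_test)
manual_pred = pipeline[-1].predict(X_test_transformed)
print(np.array_equal(y_pred, manual_pred))

# Class probabilities are also available because the final
# estimator (LogisticRegression) implements predict_proba.
proba = pipeline.predict_proba(X_test)
print(proba.shape)  # (30, 3)
```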
Step 5: Evaluate the Model
Evaluate the model on the held-out test set. Here we use accuracy, the fraction of test samples whose predicted class matches the true class.
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Output:
Accuracy: 0.97
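A single accuracy figure on 30 test samples is a fairly coarse estimate. As one possible extension (not part of the article's own steps), the sketch below adds a confusion matrix for the same test split and a 5-fold cross-validation of the whole pipeline. Because the pipeline is passed to cross_val_score as one estimator, the scaler and PCA are re-fitted on each training fold, so no information from the validation fold reaches the preprocessing.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('classifier', LogisticRegression()),
])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# 5-fold cross-validation over the full dataset; the pipeline is
# cloned and re-fitted from scratch for every fold.
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```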
What exactly is sklearn.pipeline.Pipeline?
The process of transforming raw data into a model-ready format often involves a series of steps, including data preprocessing, feature selection, and model training. Managing these steps efficiently and ensuring reproducibility can be challenging.
This is where sklearn.pipeline.Pipeline from the scikit-learn library comes into play. This article delves into the concept of sklearn.pipeline.Pipeline, its benefits, and how to implement it effectively in your machine learning projects.
Table of Contents
- Understanding sklearn.pipeline.Pipeline
- Components of a Pipeline
- Creating Machine Learning Pipeline with Scikit-Learn
- Step 1: Import Libraries and Load Data
- Step 2: Define the Pipeline
- Step 3: Train the Pipeline
- Step 4: Make Predictions
- Step 5: Evaluate the Model
- Advanced Techniques for Machine Learning Pipelines in Scikit-Learn
- 1. ColumnTransformer
- 2. FeatureUnion
- 3. Hyperparameter Tuning