Breast Cancer Wisconsin (Diagnostic) Dataset

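The load_breast_cancer function in scikit-learn loads the Breast Cancer Wisconsin (Diagnostic) dataset, a binary classification dataset for distinguishing malignant from benign breast tumors. Its 30 features are computed from digitized images of fine needle aspirates (FNA) of breast masses and describe characteristics of the cell nuclei. In the target array, 0 denotes malignant and 1 denotes benign.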

Classes	2
Samples per class	212 (M, malignant), 357 (B, benign)
Samples total	569
Dimensionality	30
Features	real, positive
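The figures in the table can be checked directly against the loaded data. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable names (data, class_counts, all_non_negative) are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the dataset as a Bunch object (dict-like container)
data = load_breast_cancer()

# Total number of samples and number of features (dimensionality)
n_samples, n_features = data.data.shape

# Count samples per class; index 0 is 'malignant', index 1 is 'benign'
counts = np.bincount(data.target)
class_counts = {str(name): int(n) for name, n in zip(data.target_names, counts)}

# Check that no feature value is negative
all_non_negative = bool((data.data >= 0).all())

print("Samples total:", n_samples)
print("Dimensionality:", n_features)
print("Samples per class:", class_counts)
print("All feature values non-negative:", all_non_negative)
```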

Breast Cancer Wisconsin (Diagnostic) Dataset Example:

from sklearn.datasets import load_breast_cancer
import pandas as pd

# Load the breast cancer dataset
breast_cancer = load_breast_cancer()

# Creating a DataFrame from the dataset for easier manipulation
cancer_df = pd.DataFrame(data=breast_cancer.data, columns=breast_cancer.feature_names)
cancer_df['target'] = breast_cancer.target

# Add a new column with target names for better readability
cancer_df['diagnosis'] = cancer_df['target'].apply(lambda x: breast_cancer.target_names[x])

# Print the first few rows of the DataFrame
print(cancer_df.head())

Output:

   mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
0        17.99         10.38          122.80     1001.0          0.11840   
1        20.57         17.77          132.90     1326.0          0.08474   
2        19.69         21.25          130.00     1203.0          0.10960   
3        11.42         20.38           77.58      386.1          0.14250   
4        20.29         14.34          135.10     1297.0          0.10030   

   mean compactness  mean concavity  mean concave points  mean symmetry  \
0           0.27760          0.3001              0.14710         0.2419   
1           0.07864          0.0869              0.07017         0.1812   
2           0.15990          0.1974              0.12790         0.2069   
3           0.28390          0.2414              0.10520         0.2597   
4           0.13280          0.1980              0.10430         0.1809   

   mean fractal dimension  ...  worst perimeter  worst area  worst smoothness  \
0                 0.07871  ...           184.60      2019.0            0.1622   
1                 0.05667  ...           158.80      1956.0            0.1238   
2                 0.05999  ...           152.50      1709.0            0.1444   
3                 0.09744  ...            98.87       567.7            0.2098   
4                 0.05883  ...           152.20      1575.0            0.1374   

   worst compactness  worst concavity  worst concave points  worst symmetry  \
0             0.6656           0.7119                0.2654          0.4601   
1             0.1866           0.2416                0.1860          0.2750   
2             0.4245           0.4504                0.2430          0.3613   
3             0.8663           0.6869                0.2575          0.6638   
4             0.2050           0.4000                0.1625          0.2364   

   worst fractal dimension  target  diagnosis  
0                  0.11890       0  malignant  
1                  0.08902       0  malignant  
2                  0.08758       0  malignant  
3                  0.17300       0  malignant  
4                  0.07678       0  malignant  

[5 rows x 32 columns]
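As an alternative to building the DataFrame by hand, load_breast_cancer accepts an as_frame=True argument (available in scikit-learn 0.23 and later) that returns the data as pandas objects. The sketch below assumes pandas is installed; the diagnosis column and the label_map name are illustrative additions that mirror the example above.

```python
from sklearn.datasets import load_breast_cancer

# Ask scikit-learn to return pandas objects instead of NumPy arrays
cancer = load_breast_cancer(as_frame=True)

# .frame holds the 30 feature columns plus a 'target' column
frame = cancer.frame.copy()

# Map the numeric target (0/1) to its class name for readability
label_map = {i: str(name) for i, name in enumerate(cancer.target_names)}
frame["diagnosis"] = frame["target"].map(label_map)

print(frame.shape)
print(frame["diagnosis"].value_counts())
```

Using map with a small dictionary avoids calling a Python lambda once per row, which is a common choice for label lookups like this one.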

What is Toy Dataset – Types, Purpose, Benefits and Application

Toy datasets are small, simple datasets commonly used in the field of machine learning for training, testing, and demonstrating algorithms. These datasets are typically clean, well-organized, and structured in a way that makes them easy to use for instructional purposes, reducing the complexities associated with real-world data processing.
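To illustrate how a toy dataset like this one is typically used to demonstrate an algorithm, here is a minimal sketch that trains and evaluates a classifier on the breast cancer data. The choice of logistic regression with feature scaling, the 25% test split, and random_state=0 are assumptions made for illustration, not something prescribed by the dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load features and labels as NumPy arrays
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the samples for testing, preserving the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Fraction of correctly classified test samples
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Scaling is included because the features span very different ranges (for example, mean area versus mean smoothness), which can slow or destabilize the solver when left unscaled.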
