Breast Cancer Wisconsin (Diagnostic) dataset
The load_breast_cancer function in scikit-learn loads a binary classification dataset for distinguishing benign from malignant breast tumors, using features computed from digitized images of cell nuclei.
Classes | 2
Samples per class | 212 malignant (M), 357 benign (B)
Samples total | 569
Dimensionality | 30
Features | real, positive
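The figures in this summary can be checked directly against the loaded data. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable names (data, class_counts) are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the dataset as a Bunch object (attribute-style dictionary)
data = load_breast_cancer()

# Total number of samples and number of features per sample
n_samples, n_features = data.data.shape

# Count samples per class; target_names is ordered by target value
class_counts = {
    str(name): int(count)
    for name, count in zip(data.target_names, np.bincount(data.target))
}

print("Samples total:", n_samples)
print("Dimensionality:", n_features)
print("Samples per class:", class_counts)
```

Note that the target encoding is 0 for malignant and 1 for benign, which is easy to get backwards when interpreting model output.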
Breast Cancer Wisconsin (Diagnostic) dataset example:
from sklearn.datasets import load_breast_cancer
import pandas as pd
# Load the breast cancer dataset
breast_cancer = load_breast_cancer()
# Create a DataFrame from the dataset for easier manipulation
cancer_df = pd.DataFrame(data=breast_cancer.data, columns=breast_cancer.feature_names)
cancer_df['target'] = breast_cancer.target
# Add a column with the class name for each target (0 = malignant, 1 = benign)
cancer_df['diagnosis'] = breast_cancer.target_names[breast_cancer.target]
# Print the first few rows of the DataFrame
print(cancer_df.head())
Output:
mean radius mean texture mean perimeter mean area mean smoothness \
0 17.99 10.38 122.80 1001.0 0.11840
1 20.57 17.77 132.90 1326.0 0.08474
2 19.69 21.25 130.00 1203.0 0.10960
3 11.42 20.38 77.58 386.1 0.14250
4 20.29 14.34 135.10 1297.0 0.10030
mean compactness mean concavity mean concave points mean symmetry \
0 0.27760 0.3001 0.14710 0.2419
1 0.07864 0.0869 0.07017 0.1812
2 0.15990 0.1974 0.12790 0.2069
3 0.28390 0.2414 0.10520 0.2597
4 0.13280 0.1980 0.10430 0.1809
mean fractal dimension ... worst perimeter worst area worst smoothness \
0 0.07871 ... 184.60 2019.0 0.1622
1 0.05667 ... 158.80 1956.0 0.1238
2 0.05999 ... 152.50 1709.0 0.1444
3 0.09744 ... 98.87 567.7 0.2098
4 0.05883 ... 152.20 1575.0 0.1374
worst compactness worst concavity worst concave points worst symmetry \
0 0.6656 0.7119 0.2654 0.4601
1 0.1866 0.2416 0.1860 0.2750
2 0.4245 0.4504 0.2430 0.3613
3 0.8663 0.6869 0.2575 0.6638
4 0.2050 0.4000 0.1625 0.2364
worst fractal dimension target diagnosis
0 0.11890 0 malignant
1 0.08902 0 malignant
2 0.08758 0 malignant
3 0.17300 0 malignant
4 0.07678 0 malignant
[5 rows x 32 columns]
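Since the dataset is intended for binary classification, a natural next step is to fit a simple model on it. The sketch below is one possible baseline, not a recommended method: the choice of logistic regression with feature scaling, the 25% test split, and random_state=0 are all assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load features and targets as plain NumPy arrays
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the samples, preserving the class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the features first, because they span very different ranges
# (for example, mean area versus mean smoothness)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Scaling and the classifier are wrapped in a single pipeline so that the scaler is fitted on the training split only, which avoids leaking information from the test set.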
What Is a Toy Dataset? Types, Purpose, Benefits, and Applications
Toy datasets are small, simple datasets commonly used in machine learning for training, testing, and demonstrating algorithms. They are typically clean, well organized, and structured so that they are easy to use for instruction, without the cleaning and preprocessing that real-world data usually requires.
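To give a sense of how small these datasets are, the sketch below loads a few of the toy datasets bundled with scikit-learn and prints their sizes. The selection of datasets (iris and wine alongside breast cancer) is an assumption made for illustration; the names loaders and shapes are hypothetical.

```python
from sklearn.datasets import load_breast_cancer, load_iris, load_wine

# Map a readable name to each bundled loader function
loaders = {
    "iris": load_iris,
    "wine": load_wine,
    "breast_cancer": load_breast_cancer,
}

# Each loader returns a Bunch whose .data attribute is an
# array of shape (n_samples, n_features)
shapes = {name: loader().data.shape for name, loader in loaders.items()}

for name, (n_samples, n_features) in shapes.items():
    print(f"{name}: {n_samples} samples, {n_features} features")
```

All three load from files shipped with the library, so no download is needed.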