Investigating the Role of Feature Variation and Data Transformations of Different Types of Machine Learning Algorithms in Classifying Benign - Malignant Breast Cancer
Abstract
Objective: To explain how data transformation and feature selection can improve the performance of machine learning in classifying breast tumors as benign or malignant, based on an available breast cancer dataset. Method: The data come from the Breast Cancer Wisconsin dataset on Kaggle, which contains 569 samples (357 benign, 212 malignant). 70% of the data was used for training and 30% for testing. Three feature sets were evaluated (10 features, 30 features, and optional features), and each feature set was subjected to three types of data transformation (original, binary, and bipolar). For each of 7 algorithms (logistic regression, decision tree, naïve Bayes, random forest, SVM, ANN, KNN), the values of TP, FP, FN, and TN were obtained, and accuracy, sensitivity, specificity, and precision were calculated from them. Results: The ANN with optional features and bipolar-transformed data had the highest accuracy, sensitivity, specificity, and precision. Conclusion: Proper feature selection can improve the performance of machine learning, as can the use of binary and bipolar data transformations.
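The transformations and metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not define "binary" and "bipolar", so this sketch assumes the common reading: min-max scaling of a feature to [0, 1] and to [-1, 1], respectively. It also assumes that malignant is the positive class. The function names and the confusion-matrix counts below are hypothetical and chosen only for illustration; they are not results from the study.

```python
from typing import Dict, List


def binary_transform(values: List[float]) -> List[float]:
    """Min-max scale one feature column to [0, 1].

    Assumption: this is what the abstract means by "binary" transformation.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bipolar_transform(values: List[float]) -> List[float]:
    """Scale one feature column to [-1, 1].

    Assumption: this is what the abstract means by "bipolar" transformation.
    """
    return [2.0 * b - 1.0 for b in binary_transform(values)]


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute the four metrics listed in the abstract from TP, FP, FN, TN."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "precision": _ratio(tp, tp + fp),     # positive predictive value
    }


if __name__ == "__main__":
    # Hypothetical feature values, for illustration only.
    column = [10.0, 15.0, 20.0]
    print(binary_transform(column))
    print(bipolar_transform(column))

    # Hypothetical counts for a test split of 171 samples (about 30% of 569).
    print(confusion_metrics(tp=60, fp=3, fn=4, tn=104))
```

In a real train/test workflow the minimum and maximum would normally be taken from the training split only and then reused on the test split, so that no information from the test data leaks into the transformation.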