Feature Stream Selection Using Alpha Investing and XGBoost for Enhanced Predictive Accuracy
Abstract
Real-time applications such as patient monitoring, predictive maintenance of machines, recommendation systems, and video surveillance often produce large amounts of data. Gaining useful insight from this data for decision making requires considerable processing, and processing such large volumes of data is challenging. Feature selection techniques address this by selecting the important features from the available feature set. The predictive accuracy of machine learning models such as classifiers and regressors can be increased by selecting a relevant and optimal number of features from the dataset and training the model on those features alone. Feature selection becomes even more challenging when not all features are available at the same instant of time; features that arrive at varying instants are called dynamic features. For example, in predictive maintenance of machines, different sensor readings may arrive at different times, yet potential failures must be detected in real time, so the data has to be processed early rather than waiting for the entire feature space to arrive. Many feature selection algorithms exist, but most either require the entire feature set to be available at once or yield classifiers with low predictive accuracy when built on the selected features. This paper proposes a novel methodology that integrates Alpha Investing, an online streaming feature selection technique, with XGBoost, a tree-based classifier, for feature selection, resulting in improved predictive performance on high-dimensional datasets. The proposed technique first selects relevant features using Alpha Investing and then refines the result using the XGBoost algorithm. Experimental results on various datasets show that the proposed method achieves higher predictive accuracy with fewer selected features than other available feature selection methods.
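To illustrate the two-stage idea described in the abstract, the following is a minimal sketch, not the authors' implementation: a simplified Alpha Investing rule scans features in their arrival order and keeps those whose significance beats the current wealth-based threshold, after which an XGBoost model trained on the candidates keeps only the most important ones. The function names (`alpha_investing`, `refine_with_xgboost`), the univariate t-test used as the significance test, the wealth-update constants, and the `keep_ratio` cutoff are all illustrative assumptions.

```python
import numpy as np
from scipy import stats
from xgboost import XGBClassifier


def alpha_investing(X, y, w0=0.5, dw=0.5):
    """Simplified Alpha Investing sketch: scan features in arrival order and
    keep a feature if its p-value beats the current alpha threshold."""
    selected, wealth = [], w0
    for i in range(X.shape[1]):
        alpha_i = wealth / (2 * (i + 1))           # spend a fraction of current wealth
        # Illustrative significance test for a binary target:
        # two-sample t-test of the feature values across the two classes.
        p = stats.ttest_ind(X[y == 0, i], X[y == 1, i]).pvalue
        if p < alpha_i:
            selected.append(i)
            wealth += dw - alpha_i                 # earn wealth back on a selection
        else:
            wealth -= alpha_i / (1 - alpha_i)      # pay for the failed test
    return selected


def refine_with_xgboost(X, y, candidates, keep_ratio=0.5):
    """Refinement step: train XGBoost on the candidate features and retain
    the top fraction by feature importance (keep_ratio is an assumed cutoff)."""
    model = XGBClassifier(n_estimators=100, eval_metric="logloss")
    model.fit(X[:, candidates], y)
    order = np.argsort(model.feature_importances_)[::-1]
    n_keep = max(1, int(keep_ratio * len(candidates)))
    return [candidates[j] for j in order[:n_keep]]


# Example usage on synthetic data:
# X, y = np.random.rand(200, 50), np.random.randint(0, 2, 200)
# final_features = refine_with_xgboost(X, y, alpha_investing(X, y))
```

The sketch assumes a binary classification target and a batch of already-arrived features purely for readability; in a genuine streaming setting the Alpha Investing loop would be driven by features as they arrive rather than by iterating over the columns of a fixed matrix.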