An Interpretable AutoML Pipeline: Automated Feature Engineering with FeatureTools and SurX
Abstract
Feature engineering is a foundational step in building effective machine learning models. This study presents a modular and interpretable AutoML pipeline that automates data preprocessing, feature synthesis, and feature selection using FeatureTools and a novel explainability tool, SurX (Surrogate Explainer). The pipeline includes preprocessing stages such as imputation, outlier handling, and label encoding, and supports advanced transformations including logarithmic, Box-Cox, arcsine, and square-root scaling. Deep Feature Synthesis is applied to generate high-quality features, while SurX provides localized explanations and ranks features across datasets. Performance was evaluated on nine datasets covering binary and multiclass classification tasks. Results show substantial improvements in model accuracy and interpretability with reduced human intervention.
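The four scaling transformations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `TRANSFORMS` registry, and `apply_transform` are hypothetical, the Box-Cox lambda is passed in by hand rather than estimated from data, and the arcsine transform is assumed to be the common arcsine square-root form for proportions in [0, 1].

```python
import math

# Illustrative helpers only; names and the fixed-lambda Box-Cox are assumptions,
# not the pipeline's actual API.

def log_transform(x: float) -> float:
    """Natural log of (1 + x); defined for x > -1."""
    return math.log1p(x)

def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of a strictly positive value for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def arcsine_sqrt(p: float) -> float:
    """Arcsine square-root transform (assumed variant) for a proportion in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("arcsine transform requires 0 <= p <= 1")
    return math.asin(math.sqrt(p))

def sqrt_transform(x: float) -> float:
    """Square root of a non-negative value."""
    if x < 0:
        raise ValueError("square root requires x >= 0")
    return math.sqrt(x)

# Hypothetical registry mapping a transform name to its function.
TRANSFORMS = {
    "log": log_transform,
    "box_cox": box_cox,
    "arcsine": arcsine_sqrt,
    "sqrt": sqrt_transform,
}

def apply_transform(values, name, **kwargs):
    """Apply the named transform to every value in a numeric column."""
    fn = TRANSFORMS[name]
    return [fn(v, **kwargs) for v in values]
```

For example, `apply_transform(column, "box_cox", lam=0.5)` would rescale a strictly positive column. In practice the lambda is usually fitted by maximum likelihood, which this sketch leaves out.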