Dynamic Feature Engineering Through Reinforcement and Prompt-Based Learning

Abstract

Feature engineering is a critical stage of the machine learning pipeline that strongly influences model performance, interpretability, and overall effectiveness. Feature selection and transformation are typically performed with filter, wrapper, and embedded methods, but these often depend on manual heuristics and domain expertise, and they scale poorly in high-dimensional, complex settings. Recent studies have explored automated alternatives based on large language models (LLMs) and reinforcement learning (RL) to address these limitations. This paper presents a thorough, critical review of state-of-the-art research on RL-based feature selection, RL-driven feature generation, and LLM-guided feature optimization. Three primary methodological paradigms are identified. In the first, feature selection is framed as a collaborative or guided decision-making problem solved with interactive and multi-agent reinforcement learning: agents are assigned to individual features and optimize long-term rewards derived from domain-specific importance, redundancy, or model accuracy. The second paradigm builds on Combinatorial Multi-Armed Bandits (CMAB), a computationally efficient alternative that enables scalable and effective feature selection with minimal learning overhead. In the third, LLMs are used either to derive effective reward functions or to generate novel features, drawing on reasoning-based prompts, external knowledge repositories, and prototype alignment. The paper also examines unresolved issues in bias management, computational overhead, and generalization to unfamiliar domains, as well as underexplored gaps, including the need for hybrid frameworks that combine the exploration efficiency of reinforcement learning with the semantic reasoning of large language models.
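To make the CMAB paradigm concrete, the sketch below is a minimal, hypothetical illustration rather than the method of any specific paper surveyed here. It assumes the common UCB-style semi-bandit setup: each feature is an arm, a fixed-size subset of features forms a "super-arm" scored by cheap cross-validated accuracy, and every arm in the played subset shares the observed reward. The dataset, model, and hyperparameters (k, rounds) are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data; every name and hyperparameter below is an illustrative assumption.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

n_features, k, rounds = X.shape[1], 5, 40
counts = np.zeros(n_features)   # how often each feature (arm) has been played
values = np.zeros(n_features)   # running mean reward per feature

for t in range(1, rounds + 1):
    # UCB score per arm; never-played arms get infinite priority (exploration).
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, values + bonus, np.inf)
    subset = np.argsort(-ucb)[:k]   # greedy oracle: top-k arms form the super-arm
    # Semi-bandit reward: cross-validated accuracy of the chosen feature subset.
    reward = cross_val_score(LogisticRegression(max_iter=500),
                             X[:, subset], y, cv=3).mean()
    counts[subset] += 1
    values[subset] += (reward - values[subset]) / counts[subset]  # running mean

print("selected features:", sorted(np.argsort(-values)[:k].tolist()))
```

The learning overhead here is only two length-n arrays of statistics, which is the scalability advantage the abstract attributes to the CMAB family relative to full multi-agent RL formulations.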
