Instruction Strategy Design for Autonomous Machine Learning Experimentation Systems: A Taxonomy, Cross-System Analysis, and Evidence-Based Practitioner Framework

Praneeth Kodumagulla

Abstract

Autonomous machine learning experimentation systems, in which a large language model (LLM) agent iteratively proposes, executes, and evaluates code modifications against a fixed scalar metric, represent a fundamental shift in how machine learning research is conducted. In these systems, the practitioner's primary lever is not the training code itself but the natural-language research program: the instruction document that specifies objectives, priorities, and constraints for the agent across dozens or hundreds of consecutive decisions. Despite this centrality, no principled framework for designing research programs exists in the literature. This survey addresses that gap through four contributions. First, we conduct a structured cross-system analysis of sixteen agentic AutoML and autonomous research systems, including AIDE, AIRA, R&D-Agent, AgentHPO, AlphaEvolve, MLAgentBench, AI-Researcher, and AI Scientist-v2, identifying the instruction document as a universal practitioner-facing control mechanism and cataloguing seven design dimensions. Second, we develop a five-family taxonomy of instruction strategies (Scope-Constrained, Hypothesis-Directed, Diversity-Preserving, Simplicity-Biased, and Curriculum-Staged), grounded in theory from the AutoML, evolutionary computation, prompt engineering, and curriculum learning literatures. Third, we provide multi-source empirical grounding: analysis of two publicly documented overnight sessions suggests that a cross-session curriculum intervention is associated with a 37% difference in total gain, with important caveats regarding session-length confounding, and independently controlled benchmarks from AIRA and AgentHPO corroborate the taxonomy's predictions. Fourth, we synthesise five practitioner guidelines with explicitly labelled calibration thresholds and validate them against all sixteen surveyed systems.
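The loop described in the abstract (propose a modification, execute it, evaluate it against a fixed scalar metric) can be illustrated with a minimal Python sketch. This is not taken from any of the surveyed systems: `run_session`, `make_toy_proposer`, and `toy_metric` are hypothetical names, the proposer is a seeded random perturbation standing in for an LLM agent conditioned on a research program, and the greedy keep-if-better rule is an assumption made only for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
History = List[Tuple[Config, float]]


def run_session(
    propose: Callable[[Config, History], Config],
    evaluate: Callable[[Config], float],
    initial: Config,
    steps: int,
) -> Tuple[Config, float, History]:
    """Greedy propose/execute/evaluate loop; a higher metric is better."""
    best, best_score = initial, evaluate(initial)
    history: History = [(initial, best_score)]
    for _ in range(steps):
        # In a real system this call would be an LLM agent guided by the
        # research program; here it is any callable (assumption).
        candidate = propose(best, history)
        score = evaluate(candidate)
        history.append((candidate, score))
        # Keep the candidate only if it improves the fixed scalar metric.
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score, history


def make_toy_proposer(seed: int) -> Callable[[Config, History], Config]:
    """Hypothetical stand-in for the agent: perturbs one hyperparameter."""
    rng = random.Random(seed)

    def propose(current: Config, history: History) -> Config:
        return {"lr": current["lr"] + rng.uniform(-0.05, 0.05)}

    return propose


def toy_metric(config: Config) -> float:
    """Toy scalar metric that peaks at lr = 0.1 (illustrative only)."""
    return -(config["lr"] - 0.1) ** 2


best, best_score, history = run_session(
    make_toy_proposer(0), toy_metric, {"lr": 0.5}, steps=50
)
```

In this framing, the instruction strategies the survey classifies would change what `propose` is allowed or encouraged to do, while the loop and the metric stay fixed.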

Version published to 10.21203/rs.3.rs-9286871/v1 on Research Square
Apr 2, 2026

Related articles

Real Science Is Harder Than Benchmarks: Evaluating Advanced AI Frameworks on Published Studies. I. Uncertainty Quantification, ML on Therapeutic Data Commons, and Agent-Based Modeling

This article has 22 authors:
1. Mohammed O. Ahmed
2. Sahil A. Amale
3. Rhythm D. Bhavsar
4. Pratham Chopra
5. Amos Jaimes
6. Arpita Kachhwah
7. Casual D. Kalotra
8. Peizhao Li
9. Xingbei Li
10. Yuantian Liao
11. Rahul Roy
12. Nivethini Senthilselvan
13. Yukun Shao
14. Alok D. Sharma
15. Arjun Shrivatsan
16. Renqianqian Xue
17. Yunjing You
18. Amitesh Badkul
19. Lei Xie
20. Mikhail Oet
21. KuoHao Lee
22. Anton V. Sinitskiy
This article has no evaluations. Latest version: Jun 27, 2026

CoG-MeM: A Cognitive-Behavior-Inspired and Logic-Aligned Design for Memory Encoding, Retrieval, and Synthesis

This article has 1 author:
1. Zhiqiang Gan
This article has no evaluations. Latest version: Jun 27, 2026

Practical Use of Advanced AI Frameworks on Real-Life Scientific Problems: Three Case Studies

This article has 21 authors:
1. Halime S. A. Gulluoglu
2. Jibin Baby
3. Kirti M. Bagul
4. Bhuvan R. Basangari
5. S. Akash Bathini
6. Nikhil K. R. Chalamalla
7. Jude Dcunha
8. Om Gupta
9. Lanqin Huang
10. Xutong Jiang
11. Yashas R. Naidu
12. Gokul Sathishkumar
13. Mayank Sehrawat
14. S. Lakshmi Thota
15. Dheeraj Thuvara
16. Mahesh B. Vanguri
17. Jiaxi Yin
18. Bat-Erdene Jugder
19. Isabel E. Lusky
20. Jianing Li
21. Anton V. Sinitskiy
This article has no evaluations. Latest version: Jun 29, 2026
