From Informal to Formal – Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs

Jialun Cao
Yaojie Lu

Abstract

Research in AI-based formal mathematical reasoning is growing rapidly. These studies have made significant progress and have excelled in mathematical competitions such as the IMO. However, they intertwine multiple skills at once (problem-solving, reasoning, and writing formal specifications), making it hard to pinpoint the strengths and weaknesses of LLMs in each task. This paper focuses on formal verification, an immediate application scenario of formal reasoning, and breaks it down into sub-tasks. We constructed 18k high-quality instruction-response pairs across five mainstream formal specification languages (Coq, Lean4, Dafny, ACSL, and TLA+) in six tasks by distilling GPT-4o, and evaluated against ten open-source LLMs, including the recently popular DeepSeek-R1. We found that LLMs are good at writing proof segments when given either the code or a detailed description of the proof steps. Fine-tuning also brought an improvement of up to nearly threefold. Interestingly, we observed that fine-tuning with formal data also enhances mathematics, reasoning, and coding capabilities. The fine-tuned models are released at https://huggingface.co/fm-universe to facilitate subsequent studies.

Version published to 10.32388/mlaotg
Feb 18, 2025

Related articles

EXa-LM: A Controlled Natural Language Bridge between Large Language Models and First-Order Logic Solvers
Author: Francis Frydman
Latest version: Dec 22, 2025

Applying Action Research to Developing a GPT-Based Assistant for Construction Cost Code Verification in State-Funded Projects in Vietnam
Authors: Quan T. Nguyen, Thuy-Binh Pham, Hai Phong Bui, Po-Han Chen
Latest version: Jan 26, 2026

Systematic Prompt Optimization for LLM-Based Backend API Generation: An Empirical Study in NestJS
Author: Himanshu Sharma
Latest version: Jan 28, 2026
