Frontier Topics Mining Method via AI-Agent

Abstract

Quickly identifying high-quality frontier topics in massive volumes of scientific research data is important for helping researchers direct their work accurately. Traditional analysis methods suffer from bottlenecks such as weak cross-domain adaptability, high resource consumption, and low efficiency. To address these problems, a frontier topics mining method based on AI agents is proposed, built around a generative-verification dual-agents (D-Agents) architecture. First, prompt engineering is used to construct a generative agent (G-Agent), which draws on the semantic understanding ability of large-scale pre-trained language models to generate candidate frontier topics automatically. Then, a verification agent (V-Agent) establishes a multi-dimensional evaluation system and automatically verifies the candidates along the dimensions of academic novelty, topic accuracy, and completeness to identify frontier topics. The effectiveness of the proposed method is verified on three labeled test datasets covering computer vision (CV), natural language processing (NLP), and machine learning (ML). The experimental results show that D-Agents can handle frontier topics mining tasks in multiple domains at the same time. On the three manually labeled datasets (CV-DataSet, NLP-DataSet, and ML-DataSet), the accuracy rate of D-Agents exceeds 74% while the coverage rate stays above 85%. In a comparison with traditional bibliometric methods, the accuracy and coverage rates of frontier topics mining in three different fields (altitude sickness, recommendation systems, and oyster reef ecosystems) exceed 67%. The automatic generation and self-verification mechanism in D-Agents effectively alleviates the hallucination problem of the G-Agent and greatly improves the efficiency of frontier topics mining.
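The generate-then-verify flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `threshold` value, the toy generator and scorer (which stand in for prompted LLM calls), and the definitions of accuracy and coverage are all assumptions made for illustration. Only the three verification dimensions come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Dimensions named in the abstract; how each one is scored is an assumption here.
DIMENSIONS = ("novelty", "accuracy", "completeness")


@dataclass
class Verdict:
    topic: str
    scores: Dict[str, float]  # one score in [0, 1] per dimension
    accepted: bool


def mine_frontier_topics(
    corpus: List[str],
    generate: Callable[[List[str]], List[str]],
    score: Callable[[str, str, List[str]], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Verdict]:
    """G-Agent proposes candidates; V-Agent keeps those passing every dimension."""
    verdicts = []
    for topic in generate(corpus):
        scores = {dim: score(topic, dim, corpus) for dim in DIMENSIONS}
        accepted = all(s >= threshold for s in scores.values())
        verdicts.append(Verdict(topic, scores, accepted))
    return verdicts


def toy_generate(corpus: List[str]) -> List[str]:
    # Stand-in for a prompted LLM call; the last candidate is deliberately ungrounded.
    return ["diffusion models", "vision transformers", "quantum oyster networks"]


def toy_score(topic: str, dimension: str, corpus: List[str]) -> float:
    # Stand-in for an LLM judge: share of topic words found in the corpus text.
    # The same toy score is reused for every dimension.
    text = " ".join(corpus).lower()
    words = topic.lower().split()
    return sum(w in text for w in words) / len(words)


def accuracy(accepted: Set[str], labeled: Set[str]) -> float:
    # Assumed definition: share of accepted topics that are labeled as frontier.
    return len(accepted & labeled) / len(accepted) if accepted else 0.0


def coverage(accepted: Set[str], labeled: Set[str]) -> float:
    # Assumed definition: share of labeled frontier topics that were accepted.
    return len(accepted & labeled) / len(labeled) if labeled else 0.0


corpus = [
    "Diffusion models for image synthesis",
    "Vision transformers scale to large datasets",
]
labeled = {"diffusion models", "vision transformers", "neural radiance fields"}

verdicts = mine_frontier_topics(corpus, toy_generate, toy_score)
accepted = {v.topic for v in verdicts if v.accepted}
print(sorted(accepted))
print(accuracy(accepted, labeled), coverage(accepted, labeled))
```

In this toy run the ungrounded candidate is rejected by the verification step, which mirrors the abstract's claim that self-verification can filter hallucinated topics. In a real system, `toy_generate` and `toy_score` would be replaced by model calls, and the metric definitions would need to follow whatever the paper specifies.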

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/20125731.

    The study finds that a dual-agent AI system can identify frontier research topics with strong performance, reporting over 74% accuracy and over 85% coverage on labeled datasets in computer vision, natural language processing, and machine learning. It also shows that the verification agent helps reduce hallucinations from the generation step, making the overall process more reliable and efficient than traditional bibliometric methods.

    This work moves the field forward by introducing a practical generative-plus-verification framework for large-scale topic discovery. Instead of relying only on citation patterns or manual screening, it uses AI to both propose and validate emerging topics, which could make frontier-topic mining faster, more scalable, and more adaptable across domains.

    Major issues

    • The evaluation appears to be based on the authors' own metrics such as accuracy and coverage, but it is unclear how these were defined and whether they capture real-world topic quality.

    • The system depends on LLM-generated outputs and prompt engineering, so it may still inherit bias, instability, and hidden hallucination risks even with the verification agent.

    • The main takeaway is that the approach looks promising, but the evidence in the available summary is not yet strong enough to rule out overfitting, evaluation bias, or limited generalizability.

    Minor issues

    • The abstract is dense and repeats the same core claim in several ways, which makes the main contribution harder to scan quickly.

    • The evaluation summary would be clearer if it separated dataset construction, labeling, and performance results into distinct parts.

    • Terms such as "frontier topics," "coverage," and "accuracy" would be clearer if the paper defined them early and used them consistently.

    • The abstract could improve readability by reducing long sentences and splitting technical descriptions into shorter, more direct statements.

    Competing interests

    The author declares that they have no competing interests.

    Use of Artificial Intelligence (AI)

    The author declares that they used generative AI to come up with new ideas for their review.