Automated Classification and Trend Analysis of Large Language Model Survey Papers Using Machine Learning and Natural Language Processing Techniques

Meherunnesa Tania

Abstract

This study investigates the application of machine learning (ML) and natural language processing (NLP) techniques to classify academic survey papers into predefined taxonomy categories. The dataset, consisting of paper titles, summaries, release dates, taxonomy labels, and categories, was analyzed to uncover trends and patterns in the publication of research papers. Exploratory data analysis (EDA) revealed important insights through visualizations, such as publication trends over time, the distribution of taxonomy categories, and the most common terms used in paper summaries. Key NLP techniques, including Term Frequency-Inverse Document Frequency (TF-IDF), were employed to transform the textual data into numerical features, while one-hot encoding was applied to the categorical data. A Random Forest Classifier was trained on the extracted feature matrix to predict the taxonomy category of each paper. The model achieved promising accuracy, effectively capturing patterns in the dataset. The study also identified areas for future improvement, including addressing class imbalance and exploring more sophisticated models. These findings demonstrate the potential of ML and NLP for automating the classification of academic papers, providing a scalable solution for managing large collections of research literature while offering insights into publication dynamics and trends.
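The feature-construction step described in the abstract (TF-IDF for text, one-hot encoding for categorical fields) can be illustrated with a minimal sketch. This is not the author's code: the toy records, the field names `summary` and `category`, the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration, and the classifier step is left out.

```python
import math
from collections import Counter

# Hypothetical toy records; the real dataset's fields and values are not
# given in the abstract.
papers = [
    {"summary": "survey of prompting methods for language models", "category": "cs.CL"},
    {"summary": "survey of evaluation benchmarks for language models", "category": "cs.CL"},
    {"summary": "efficient training of large models on gpus", "category": "cs.LG"},
]

def tfidf_matrix(docs):
    """Return (vocabulary, rows), weighting each term by tf * idf.

    tf is the term's relative frequency in the document; idf is the smoothed
    form ln((1 + n) / (1 + df)) + 1 (an assumed variant). Rows are not
    length-normalised, and documents are assumed to be non-empty.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[tok] / len(toks) * idf[tok] for tok in vocab])
    return vocab, rows

def one_hot(values):
    """Return (levels, rows) with one indicator column per distinct value."""
    levels = sorted(set(values))
    return levels, [[1.0 if v == level else 0.0 for level in levels] for v in values]

vocab, text_features = tfidf_matrix([p["summary"] for p in papers])
levels, cat_features = one_hot([p["category"] for p in papers])

# Concatenate the text and categorical columns into one feature row per paper.
features = [t + c for t, c in zip(text_features, cat_features)]
```

In this sketch, terms that occur in fewer summaries receive larger weights than terms shared by every summary. A tree-ensemble classifier (for example scikit-learn's `RandomForestClassifier`) could then be fitted on `features` against the taxonomy labels.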

Version published to 10.31224/3984
Oct 2, 2024

Related articles

Random forests in corpus research: A systematic review

This article has 1 author:
1. Lukas Sönning
This article has no evaluations. Latest version: Jan 17, 2026
Intelligent Business Document Processing Using AI- and NLP-Based Techniques: A Systematic Literature Review

This article has 4 authors:
1. Naif Alotaibi
2. Morteza Saberi
3. Madhushi Bandara
4. Thantrira Porntaveetus
This article has no evaluations. Latest version: Jan 29, 2026
