Taxonomy Classification using Machine Learning Based Model


Abstract

The trends and taxonomy of large language models (LLMs) have changed rapidly in the last few years, driven primarily by advances in natural language processing (NLP) and deep learning and by ever-growing computational resources. These models aim to enhance logical and mathematical reasoning beyond pattern recognition. This work explores trends in survey papers over time and analyzes their associated taxonomies through data exploration, visualization, and machine learning modeling. First, the dataset of survey papers is preprocessed by counting the surveys published in each year and month, which reveals publication trends over time. A detailed analysis of taxonomy distributions then identifies how prevalent each survey category is. The titles and summaries of the papers are vectorized with TF-IDF, which transforms the text into numerical features, and the survey categories are one-hot encoded so that machine learning models can use them. The results show that the Random Forest classifier and the Support Vector Machine achieved the highest accuracies in classifying survey papers by taxonomy. This research not only highlights trends in the publication of surveys but also offers an automated approach to classifying them, which could help future research organize and categorize survey literature efficiently.
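
The preprocessing steps named in the abstract (monthly counts, TF-IDF vectorization, one-hot encoding) can be illustrated with a minimal standard-library sketch. Everything specific here is an assumption made for illustration: the three toy records, the field names `date`, `title`, and `category`, the category labels, whitespace tokenization, and the smoothed IDF with L2 normalization. The abstract does not state which TF-IDF variant, tokenizer, or library the authors used.

```python
import math
from collections import Counter

# Hypothetical toy records; the real dataset and its field names are not
# described in the abstract.
papers = [
    {"date": "2023-05-14", "title": "A survey of large language models",
     "category": "Comprehensive"},
    {"date": "2023-05-30", "title": "Prompting methods for language models",
     "category": "Prompting"},
    {"date": "2024-01-09", "title": "A survey of reasoning in language models",
     "category": "Comprehensive"},
]


def surveys_per_month(records):
    """Count records per (year, month), parsed from ISO 'YYYY-MM-DD' dates."""
    return Counter(
        tuple(int(part) for part in r["date"].split("-")[:2]) for r in records
    )


def tfidf(docs):
    """Return (vocabulary, rows) of L2-normalized TF-IDF weights.

    Assumed variant: idf(t) = ln((1 + n) / (1 + df(t))) + 1, raw term counts
    as tf, lowercase whitespace tokenization.
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows


def one_hot(labels):
    """Return (sorted class names, one indicator row per label)."""
    classes = sorted(set(labels))
    return classes, [[1 if c == label else 0 for c in classes] for label in labels]


monthly = surveys_per_month(papers)
vocab, X = tfidf([p["title"] for p in papers])
classes, Y = one_hot([p["category"] for p in papers])
```

In a full pipeline, rows such as `X` would be passed with the category labels to a classifier such as a Random Forest or a Support Vector Machine; that step is omitted here because the abstract gives no model settings.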
