Taxonomy Classification using Machine Learning Based Model

Anup Majumder

Abstract

Large language model (LLM) trends and taxonomies have changed rapidly in the last few years, driven primarily by advances in natural language processing (NLP) and deep learning and by ever-growing computational resources. These models aim to extend logical and mathematical reasoning beyond pattern recognition. This work explores trends in survey papers over time and analyzes their associated taxonomies through data exploration, visualization, and machine learning modeling. First, the dataset of survey papers is preprocessed by counting the surveys published in each year and month, revealing publication trends over time. The distribution of taxonomy labels is then analyzed to identify how prevalent each survey category is. The titles and summaries of the papers are vectorized with TF-IDF, transforming the text into numerical features, and the survey categories are one-hot encoded to give machine learning models a better feature representation. The results show that the Random Forest classifier and the Support Vector Machine achieved the highest accuracies in classifying survey papers by taxonomy. This research not only highlights trends in the publication of surveys but also offers an automated approach for classifying them, which could help future work organize and categorize survey literature efficiently.
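
The preprocessing steps named in the abstract (counting surveys per year and month, TF-IDF vectorization of text, and one-hot encoding of categories) can be sketched as follows. This is a minimal illustration and not the author's code: the three records, the taxonomy labels, the whitespace tokenizer, and the helper names are all hypothetical, and the TF-IDF variant used here (term frequency divided by document length, times ln(N/df)) is one common choice that the preprint may not share. Only the Python standard library is used.

```python
import math
from collections import Counter
from datetime import date

# Hypothetical toy records: (publication date, title, taxonomy label).
papers = [
    (date(2023, 5, 2), "A survey of prompt engineering", "Prompting"),
    (date(2023, 5, 20), "Survey of instruction tuning", "Training"),
    (date(2024, 1, 9), "Prompt compression methods: a survey", "Prompting"),
]

def surveys_per_month(records):
    """Count surveys per (year, month) to expose publication trends."""
    return Counter((d.year, d.month) for d, _, _ in records)

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(".,:;()") for t in text.lower().split())
    return [t for t in tokens if t]

def tfidf_vectors(texts):
    """Return (vocabulary, vectors) with tf = count/len(doc), idf = ln(N/df)."""
    docs = [tokenize(t) for t in texts]
    vocab = sorted({tok for doc in docs for tok in doc})
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    n = len(docs)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [counts[t] / len(doc) * math.log(n / df[t]) for t in vocab]
        )
    return vocab, vectors

def one_hot(labels):
    """Map each label to a 0/1 indicator vector over the sorted label set."""
    classes = sorted(set(labels))
    return classes, [[int(lbl == c) for c in classes] for lbl in labels]

monthly = surveys_per_month(papers)
vocab, X = tfidf_vectors([title for _, title, _ in papers])
classes, Y = one_hot([label for _, _, label in papers])
```

In this toy data the word "survey" appears in every title, so its weight is zero in every vector, which shows how TF-IDF discounts terms that do not help separate categories. The resulting feature matrix `X` is the kind of input that a classifier such as the Random Forest or Support Vector Machine mentioned in the abstract would be trained on.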

Version published to 10.31224/3967
Oct 22, 2024

Related articles

A Comprehensive Evaluation of Llama 3 for Text Classification Tasks

This article has 4 authors:
1. AmirAhmad Amjadi
2. Shiva TaghipourEivazi
3. Bahman Arasteh
4. Huseyin Kusetogullari
This article has no evaluations. Latest version Dec 23, 2025

Best Practices for Using Large Language Models at Scale

This article has 5 authors:
1. Bhargavee Kannikanti
2. Arjun Coimbatore Nagarasan
3. Alberto Rosas
4. Sriram Kothandaraman
5. Sravan Kumar Kannuri
This article has no evaluations. Latest version Dec 12, 2025

A Hybrid Rule-Based and Machine Learning Morphological Analyzer for the Kangri Language Using UD Treebank

This article has 1 author:
1. Prateek Kaushal
This article has no evaluations. Latest version Dec 31, 2025
