An Optimized Content Retriever from Web Articles using Large Language Models and FAISS Indexing

Abstract

Natural Language Processing (NLP) has transformed the way unstructured textual data is processed and analyzed, enabling the extraction of meaningful insights from extensive information sources. Traditional keyword-based retrieval systems often fail to capture the semantic relationships between words, leading to suboptimal search accuracy. Existing retrieval methods explore various NLP techniques, including rule-based and machine learning-based approaches, to analyze skill acquisition, taxonomy creation, and future skill prediction. However, these methods face limitations in handling large-scale datasets, maintaining contextual coherence, and providing semantically relevant responses. The proposed system addresses these challenges by integrating Large Language Models (LLMs), similarity search using FAISS indexing, text segmentation, and optimized embeddings to enhance semantic understanding. The system applies vector search techniques to improve content analysis, intelligent retrieval, and response generation. By employing an advanced dense retrieval model, the system ensures that retrieved content is not only relevant but also contextually accurate and semantically rich. Performance is evaluated using Precision, Recall, F1-score, Hit@K, BERTScore, and embedding similarity scores, demonstrating 91% accuracy against manually curated ground-truth answers.
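The abstract outlines a chunk-embed-index-retrieve pipeline; the article's own code is not reproduced here, so the following is a minimal sketch of that pipeline under stated assumptions: sentence-transformers embeddings, a flat inner-product FAISS index, word-window segmentation, and a Hit@K helper. The model name ("all-MiniLM-L6-v2"), chunk sizes, and function names are illustrative choices, not the authors' exact configuration.

```python
# Hypothetical sketch of the retrieval pipeline described in the abstract.
# Assumptions: sentence-transformers for embeddings, a flat inner-product
# FAISS index, and simple overlapping word-window segmentation.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def segment(text, chunk_words=200, overlap=50):
    """Split an article into overlapping word-window chunks."""
    words = text.split()
    step = chunk_words - overlap
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, max(len(words) - overlap, 1), step)]


def build_index(chunks):
    """Embed chunks and index them for cosine (inner-product) search."""
    emb = model.encode(chunks, convert_to_numpy=True,
                       normalize_embeddings=True)
    index = faiss.IndexFlatIP(emb.shape[1])
    index.add(emb.astype(np.float32))
    return index


def retrieve(index, chunks, query, k=5):
    """Return (chunk_id, chunk_text, score) for the top-k matches.

    The retrieved chunks would then be passed to an LLM as context
    for answer generation (omitted here).
    """
    q = model.encode([query], convert_to_numpy=True,
                     normalize_embeddings=True)
    scores, ids = index.search(q.astype(np.float32), k)
    return [(int(i), chunks[i], float(s))
            for i, s in zip(ids[0], scores[0])]


def hit_at_k(ranked_chunk_ids, relevant_id, k=5):
    """Hit@K: 1 if the ground-truth chunk appears in the top-k results."""
    return int(relevant_id in ranked_chunk_ids[:k])
```

A typical use would segment a scraped article with `segment`, build the index once with `build_index`, and call `retrieve` per user query; `hit_at_k` over the returned chunk ids illustrates one of the retrieval metrics named in the abstract, while Precision/Recall/F1, BERTScore, and embedding similarity would be computed against the ground-truth answers.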
