Email Summarizer: A Novel Hybrid Approach to Email Summarization
Abstract
Email has become an essential mode of communication, but the sheer volume of messages makes it difficult for users to stay organized and quickly find key information. This paper explores the field of automatic email summarization, reviewing a range of summarization techniques and how their effectiveness is evaluated. Building on this groundwork, we present a new approach designed to create clear, informative summaries by combining three complementary strategies. First, we identify important terms using TF-IDF (Term Frequency–Inverse Document Frequency) to determine which words carry the most weight in an email. Second, we apply Latent Dirichlet Allocation (LDA) to uncover the underlying themes or topics within the message. Third, we leverage sentence embeddings from the MiniLM transformer model to capture the deeper meaning of each sentence. By integrating these methods in a unified framework, our system evaluates sentence importance in the context of the entire email and its subject. We developed this solution with efficiency in mind and tested it against benchmark methods such as LexRank Summarizer, Lead Sentence, and Random Sentence selection. Results show that our approach generates summaries that are more informative and easier to understand than these baselines. This work serves both as a review of current summarization techniques and as a practical contribution toward reducing the problem of email overload with an effective, accessible summarization system.
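The combination step described above can be illustrated with a minimal sketch. The abstract does not state how the three signals are merged, so the equal-weight sum of min-max normalized scores used here is an assumption, not the authors' formula. TF-IDF is computed in plain Python, treating each sentence as a document. The LDA topic scores and MiniLM similarity scores are taken as precomputed per-sentence inputs rather than computed here. All function names and the parameters `weights` and `k` are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens; a stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def tfidf_sentence_scores(sentences):
    """Mean TF-IDF weight of the distinct terms in each sentence.

    Each sentence is treated as one 'document'; the IDF is smoothed
    as log((1 + n) / (1 + df)) + 1.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        total = sum(
            (count / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, count in tf.items()
        )
        scores.append(total / len(tf))
    return scores


def min_max(values):
    """Rescale scores to [0, 1]; constant or empty input maps to zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def summarize(sentences, tfidf, topic, semantic, weights=(1 / 3, 1 / 3, 1 / 3), k=2):
    """Pick the k highest-scoring sentences and return them in original order.

    ASSUMPTION: the weighted sum of normalized scores is illustrative only.
    `topic` and `semantic` stand for per-sentence scores from LDA and from
    MiniLM embedding similarity, supplied by the caller.
    """
    if not sentences:
        return []
    parts = [min_max(tfidf), min_max(topic), min_max(semantic)]
    combined = [
        sum(w * part[i] for w, part in zip(weights, parts))
        for i in range(len(sentences))
    ]
    top = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In use, `tfidf_sentence_scores(split_sentences(body))` would supply the first signal, while the other two would come from a fitted topic model and a sentence-embedding model. Normalizing each signal before summing keeps any one of them from dominating only because of its scale.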