Leveraging Deep Learning for Automated Generative Grading of Science Subject-Based Structured Questions
Abstract
The increasing dependence on online education highlights the need for scalable and efficient assessment tools. Manual grading of structured science questions is time-consuming and subjective, leading to inefficiencies and inconsistencies that compromise the fairness and reliability of assessment. This research addresses the challenge by developing and testing a deep learning model for automated generative grading. The model employs a hybrid Seq2Seq architecture that pairs BERT and ResNet encoders with a GRU decoder, allowing it to process both the text and the images within questions. It was evaluated using token-level metrics (Accuracy, Precision, and F1-Score) alongside generative metrics (Corpus BLEU Score and Average BERT Similarity Score). The results reveal a notable contrast: the model achieved a low Corpus BLEU score of 4.34, indicating few exact syntactic matches with the reference answers, yet an Average BERT Similarity of 0.9944, demonstrating strong semantic and contextual understanding. This finding shows that the model captures the meaning and relevance of marking schemes despite variation in wording. The study confirms the model's ability to interpret both textual and visual data and to generate relevant, meaningful outputs. Overall, this work validates the concept, offering a robust architectural framework and evaluation method for AI-powered educational tools. The findings support rejection of the null hypothesis, indicating that the model significantly improves grading accuracy through enhanced semantic understanding and scalability, and offering a promising solution for educators.
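To make the described architecture concrete, the following is a minimal PyTorch sketch of a hybrid Seq2Seq grader. The class name HybridGraderSeq2Seq, the choice of bert-base-uncased and resnet50, the 768-dimensional hidden size, and the additive fusion of text and image context are illustrative assumptions; the paper's exact configuration may differ.

```python
import torch
import torch.nn as nn
from transformers import BertModel
from torchvision.models import resnet50

class HybridGraderSeq2Seq(nn.Module):
    """Hypothetical hybrid encoder-decoder: BERT encodes the question/answer
    text, ResNet encodes the question image, and a GRU decodes a graded
    response token by token."""
    def __init__(self, vocab_size, hidden_dim=768):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        resnet = resnet50(weights="IMAGENET1K_V1")
        # Drop ResNet's classification head; keep the 2048-d pooled features.
        self.image_encoder = nn.Sequential(*list(resnet.children())[:-1])
        self.image_proj = nn.Linear(2048, hidden_dim)
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, input_ids, attention_mask, image, target_ids):
        # Text context: BERT's pooled [CLS] representation.
        text_ctx = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output
        # Image context: ResNet features projected to the hidden size.
        img_ctx = self.image_proj(self.image_encoder(image).flatten(1))
        # Fuse the two modalities into the GRU's initial hidden state
        # (fusion by addition is an assumption for this sketch).
        h0 = (text_ctx + img_ctx).unsqueeze(0)
        dec_out, _ = self.decoder(self.embedding(target_ids), h0)
        return self.out(dec_out)  # token logits over the vocabulary
```

During training, target_ids would be the teacher-forced tokens of the marking-scheme answer; at inference the decoder would be unrolled greedily or with beam search.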
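The two generative metrics can likewise be computed with standard libraries. The sketch below assumes NLTK's corpus_bleu reported on a 0-100 scale (consistent with the reported 4.34) and cosine similarity between BERT [CLS] embeddings as one plausible reading of "Average BERT Similarity"; the authors' exact procedure is not specified in the abstract.

```python
import torch
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from transformers import BertModel, BertTokenizer

def corpus_bleu_score(references, hypotheses):
    """Corpus BLEU over whitespace-tokenized strings, on a 0-100 scale."""
    refs = [[r.split()] for r in references]   # one reference per hypothesis
    hyps = [h.split() for h in hypotheses]
    smooth = SmoothingFunction().method1       # avoid zero n-gram counts
    return 100 * corpus_bleu(refs, hyps, smoothing_function=smooth)

def avg_bert_similarity(references, hypotheses,
                        model_name="bert-base-uncased"):
    """Mean cosine similarity between BERT [CLS] embeddings of each
    (reference answer, generated answer) pair."""
    tok = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name).eval()
    sims = []
    with torch.no_grad():
        for ref, hyp in zip(references, hypotheses):
            enc = tok([ref, hyp], return_tensors="pt",
                      padding=True, truncation=True)
            cls = model(**enc).last_hidden_state[:, 0]  # [CLS] vectors
            sims.append(torch.cosine_similarity(cls[0], cls[1], dim=0).item())
    return sum(sims) / len(sims)
```

A low BLEU with high embedding similarity, as reported, is exactly the pattern these two functions would produce when generated answers paraphrase the marking scheme rather than copying its wording.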