ResNet34-Based Galaxy Morphology Classification with Machine Unlearning

Abstract

Galaxy morphology classification is fundamental to observational astronomy. The structure of a galaxy, whether it is smooth and elliptical, features spiral arms and rotation, or is an optical artifact, reveals much about how it formed, whether it has merged with others, and where it is headed evolutionarily. As surveys have grown from thousands to hundreds of thousands of galaxies, manual classification has become impossible, and automated deep-learning pipelines are now the standard approach for scaling galaxy morphology classification. A challenge remains, however: large citizen-science datasets such as Galaxy Zoo 2 suffer from label noise, because volunteers have different levels of expertise and sometimes disagree on ambiguous images. This noise is especially damaging for rare classes, where even a small percentage of wrong labels can significantly compromise what the model learns. Moreover, once a model is trained on noisy data, standard fine-tuning offers no effective way to remove the influence of label errors short of retraining from scratch. This paper addresses both problems simultaneously. We trained a ResNet34-based CNN on 61,578 Galaxy Zoo 2 images to classify galaxies into three categories (Smooth, Featured/Disk, and Artifact), then applied three machine unlearning methods to reduce the influence of approximately 10–12% intentionally mislabeled samples: Gradient Ascent Unlearning, Fisher Forgetting, and Full Retraining. Our classifier achieved 85.58% validation accuracy, 79.34% balanced accuracy, and a macro-F1 score of 75.15%. Among the unlearning methods, Gradient Ascent was fastest (11.01% forget-set accuracy in 103.6 seconds), while Full Retraining gave the best retention (99.64% at 2649.4 seconds). Our experiments also reveal an important implementation constraint with significant practical consequences: Fisher Forgetting can collapse when its numerical stabilizer is too small, because parameters with low Fisher importance then receive very large noise perturbations instead of being selectively forgotten.
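The stabilizer failure mode in Fisher Forgetting can be illustrated with a minimal sketch. This is not the paper's implementation; it is a simplified NumPy version of the common scheme in which each parameter is perturbed by Gaussian noise scaled inversely to its estimated Fisher importance, with a stabilizer added before the division (the function name and constants below are illustrative assumptions):

```python
import numpy as np

def fisher_forget(params, fisher, alpha=0.1, stabilizer=1e-8, rng=None):
    """Perturb parameters with noise scaled inversely to Fisher importance.

    Low-importance parameters receive larger noise (they are "forgotten"
    more aggressively). If `stabilizer` is too small, any parameter with
    near-zero Fisher importance gets an enormous perturbation, which is
    the collapse described in the abstract.
    """
    rng = rng or np.random.default_rng(0)
    noise_scale = alpha / np.sqrt(fisher + stabilizer)
    return params + rng.normal(size=params.shape) * noise_scale

# Toy case: one high-importance weight, one near-zero-importance weight.
params = np.array([1.0, 1.0])
fisher = np.array([10.0, 1e-12])

small_stab = fisher_forget(params, fisher, stabilizer=1e-12)  # collapses
large_stab = fisher_forget(params, fisher, stabilizer=1e-3)   # stays bounded
```

With the tiny stabilizer, the second weight's noise scale is on the order of 10^4, so that parameter is destroyed rather than selectively forgotten; raising the stabilizer caps the scale at roughly `alpha / sqrt(stabilizer)`.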
