Exploratory Data Analysis (EDA) on Undergraduate Data Science Students Through R Programming
Abstract
This study explores the use of exploratory data analysis (EDA) as a tool for experiential learning in the third-year AIML course "Introduction to R Programming" (PCC-CSE-354G). Conducted with undergraduate data science students, the research aimed to provide hands-on experience in data collection, manipulation, and visualization using R programming. The dataset, encompassing attributes such as age, gender, height, weight, and physical activity status, was self-collected by students in three randomly assigned groups (Alpha, Beta, and Gamma) under instructor supervision. Height and weight were recorded using measuring tapes and digital weighing machines to ensure precision.

The study employed R libraries such as ggplot2, dplyr, and tidyr to perform EDA, focusing on descriptive and comparative analyses of team-based and gender-based patterns. Insights included the relationships between age, physical characteristics, and activity status, highlighting trends such as greater physical activity among lighter individuals and team-specific differences in gender composition. Correlation analysis and statistical testing were further employed to deepen the analysis, revealing weak but notable relationships between age and physical activity.

This hands-on approach not only enabled students to engage deeply with real-world data but also fostered teamwork, critical thinking, and technical proficiency in R programming. The findings demonstrate the effectiveness of integrating EDA into active learning frameworks, providing a valuable blueprint for similar educational initiatives in data science curricula.
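
The kind of analysis described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the study's actual code: the data frame `students`, its column names (`team`, `gender`, `age`, `height_cm`, `weight_kg`, `active`), and every value in it are hypothetical placeholders, and only dplyr and ggplot2 (two of the libraries the abstract names) are assumed to be installed.

```r
# Illustrative sketch only: all rows below are made-up placeholder values,
# not the data collected in the study.
library(dplyr)
library(ggplot2)

students <- data.frame(
  team      = c("Alpha", "Alpha", "Beta", "Beta", "Gamma", "Gamma"),
  gender    = c("F", "M", "F", "M", "F", "M"),
  age       = c(20, 21, 20, 22, 21, 20),
  height_cm = c(160, 175, 158, 180, 165, 172),
  weight_kg = c(52, 70, 55, 82, 60, 68),
  active    = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Descriptive summary per team: group size, mean measurements,
# and the share of physically active members.
team_summary <- students %>%
  group_by(team) %>%
  summarise(
    n              = n(),
    mean_height_cm = mean(height_cm),
    mean_weight_kg = mean(weight_kg),
    share_active   = mean(active),
    .groups = "drop"
  )

# Gender composition within each team.
gender_counts <- students %>%
  count(team, gender)

# Pearson correlation between age and activity status coded as 0/1
# (equivalent to a point-biserial correlation).
age_activity_cor <- cor(students$age, as.numeric(students$active))

# Comparative plot: weight distribution by activity status.
weight_plot <- ggplot(students, aes(x = active, y = weight_kg)) +
  geom_boxplot() +
  labs(x = "Physically active", y = "Weight (kg)")

print(team_summary)
print(gender_counts)
print(age_activity_cor)
```

Calling `print(weight_plot)` in an interactive session would render the box plot. With a real dataset, a formal test such as `cor.test()` could follow the correlation step to check whether a weak relationship is distinguishable from noise.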