Automated Deployment and Performance Benchmarking of Machine Learning Workloads on Hadoop Clusters Using Ansible

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

With the growing adoption of big data frameworks in cloud computing, ensuring efficient deployment and performance evaluation of distributed systems has become crucial. This project focuses on automating the deployment of machine learning benchmarks on Hadoop clusters using Ansible, a leading DevOps tool for IT automation. We leverage HiBench, a comprehensive benchmarking suite developed by Intel, to evaluate the runtime and throughput of Naïve Bayes Classification and K-Means Clustering workloads implemented in Apache Mahout. The deployment process involves setting up a virtual cluster, installing necessary software packages, and configuring Hadoop and Spark environments through Ansible playbooks. The study presents benchmarking results across different workload scales, analyzing execution time and throughput per node. By automating the entire setup, this work simplifies the evaluation of large-scale machine learning tasks, making it easier to assess and optimize Hadoop-based distributed computing environments.

Article activity feed