EPOBPA: Extensible Parallelizable Optimized Buddy Prima Algorithm
Abstract
Data is a growing natural resource: many organizations now hold vast amounts of data and need methods to extract value from it. Systems generate more than 2.5 quintillion bytes of data every day, and 90% of the data has been created in the last 10 years. Frequent Itemset Mining (FIM) finds valuable associations and correlations among large sets of data items. Association rules describe attribute-value conditions that frequently occur together in a given dataset. Existing FIM algorithms depend mainly either on candidate set generation, as in the Apriori algorithm, or on constructing a dedicated data structure to hold the dataset, as in FP-Growth. In the big data era, these techniques incur high time overhead. E-POBPA presents a new FIM technique that handles big data with neither a candidate generation step nor a special-purpose data structure. The proposed algorithm builds on the original Buddy Prima algorithm. It includes a distribution method that makes it customizable for any hardware architecture. Experimental results show that E-POBPA surpasses state-of-the-art techniques in time performance. The time improvement over other approaches ranges between 36% and 99%, depending on the dataset and the minimum support used.
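To make the terms "frequent itemset" and "minimum support" concrete, the following is a minimal sketch of the underlying definitions only. It is a brute-force enumeration, not E-POBPA, not Buddy Prima, and not a faithful Apriori or FP-Growth implementation. The function name `frequent_itemsets`, the toy transactions, and the threshold of 0.5 are all assumptions chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    Hypothetical helper: brute-force enumeration, for illustration only.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if cset <= t)
            if count / n >= min_support:
                result[cset] = count / n
                found_any = True
        if not found_any:
            # If no itemset of this size is frequent, no larger one can be.
            break
    return result


# Toy dataset (assumed for illustration): four market-basket transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter"}),
]
frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With this toy data, every single item and every pair meets the 0.5 threshold, while the three-item set appears in only one of four transactions and is excluded. Enumerating all candidates this way grows exponentially with the number of items, which is the kind of cost that motivates approaches avoiding candidate generation.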