EPOBPA: Extensible Parallelizable Optimized Buddy Prima Algorithm
Abstract
Data is the new, ever-growing natural resource: many organizations today hold vast amounts of data and need methods to benefit from it. Different systems generate more than 2.5 quintillion bytes of data every day, and 90% of the data has been created in the last 10 years. Frequent Itemset Mining (FIM) finds valuable associations and establishes correlation relationships among items in large datasets. Association rules describe attribute-value conditions that frequently occur together in a given dataset. Existing FIM algorithms rely mainly on one of two approaches. The first is candidate set generation, as in the Apriori algorithm. The second is constructing a dedicated data structure to handle the dataset, as in FP-Growth. In the big data era, both approaches incur a high time overhead. E-POBPA is a new FIM technique for big data that requires neither a candidate generation step nor a specific data structure. The proposed algorithm builds on the original Buddy Prima algorithm and includes a distribution method that makes it customizable to any underlying hardware architecture. Experimental results show that E-POBPA outperforms state-of-the-art techniques in execution time. The improvement over other approaches ranges from 36% to 99%, depending on the dataset and the minimum support used.
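To make the FIM problem statement concrete, the following minimal Python sketch computes every itemset whose support (the fraction of transactions containing it) meets a minimum support threshold. It is an illustration under stated assumptions, not E-POBPA, the Buddy Prima algorithm, Apriori, or FP-Growth. It simply enumerates candidate itemsets level by level, which is the kind of candidate-generation cost the abstract attributes to Apriori-style methods. The function name `frequent_itemsets` and the toy transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    This is a naive level-wise enumeration (hypothetical helper for
    illustration only); its cost grows exponentially with the number
    of distinct items, so it is only suitable for toy data.
    """
    tx_sets = [frozenset(t) for t in transactions]
    n = len(tx_sets)
    items = sorted(set().union(*tx_sets))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        # Generate every k-item candidate and count how many transactions contain it.
        for candidate in combinations(items, k):
            c = frozenset(candidate)
            count = sum(1 for t in tx_sets if c <= t)
            support = count / n
            if support >= min_support:
                result[c] = support
                found_any = True
        # Downward closure: if no k-itemset is frequent, no larger itemset can be.
        if not found_any:
            break
    return result


if __name__ == "__main__":
    # Hypothetical toy market-basket data.
    transactions = [
        ["bread", "milk"],
        ["bread", "butter"],
        ["bread", "milk", "butter"],
        ["milk"],
    ]
    found = frequent_itemsets(transactions, min_support=0.5)
    for itemset, support in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

With a minimum support of 0.5, the single items and the pairs {bread, butter} and {bread, milk} are frequent, while {butter, milk} (support 0.25) is not. Lowering the threshold quickly enlarges the candidate space. This is why the abstract reports results as a function of the minimum support.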