
An Improved Apriori Algorithm Based on the Spark Platform

Congshuai Xia, Gang Fang, Wenqiang Gao, Qian Zhao

Abstract


The Apriori algorithm is one of the classical algorithms for mining frequent itemsets, but it must scan the transaction set repeatedly, which seriously reduces its computational efficiency and makes it difficult to parallelize. The problem grows as data scale increases in the era of big data. To address it, this paper proposes GC-Apriori, a parallel Apriori method based on the vertical data format and the Spark computing framework. The vertical data format reduces duplication between transactions and improves both the efficiency of data storage and the efficiency of frequent itemset mining. In addition, Spark's broadcast variable mechanism is used to improve overall computing efficiency. Compared with other distributed Apriori algorithms on data of the same scale, GC-Apriori achieves higher computational efficiency. The results show that the algorithm effectively improves the efficiency of Apriori-based frequent itemset mining in a distributed environment.
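
As an illustration of why the vertical data format avoids repeated scans, the following is a minimal single-machine sketch in plain Python. It is not the authors' GC-Apriori implementation: the function names `to_vertical` and `frequent_itemsets` and the sample data are assumptions made for this example. Each item is mapped once to the set of transaction ids that contain it (its tidset), and the support of a candidate itemset is then the size of a tidset intersection, so the transaction set is read only once. In a Spark setting, a table like `vertical` is the kind of read-only structure that could be shared with workers through a broadcast variable.

```python
from itertools import combinations


def to_vertical(transactions):
    """Map each item to the set of transaction ids (tidset) containing it."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def frequent_itemsets(transactions, min_support):
    """Level-wise mining over tidsets (hypothetical helper, not GC-Apriori).

    Returns a dict mapping each frequent itemset (a frozenset) to its
    absolute support count.
    """
    vertical = to_vertical(transactions)  # the only pass over the raw data
    # Frequent 1-itemsets with their tidsets.
    current = {
        frozenset([item]): tids
        for item, tids in vertical.items()
        if len(tids) >= min_support
    }
    result = {}
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(current.items(), 2):
            candidate = a | b
            # Join step: only pairs that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every k-subset must already be frequent.
            if any(candidate - {x} not in current for x in candidate):
                continue
            # Support counting by tidset intersection, no rescan needed.
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                next_level[candidate] = tids
        current = next_level
    return result


if __name__ == "__main__":
    sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in frequent_itemsets(sample, min_support=3).items():
        print(sorted(itemset), support)
```

With the sample data and a minimum support of 3, every single item and every pair is frequent, while the triple `{a, b, c}` appears in only two transactions and is discarded.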

Keywords


Apriori algorithm; Spark; Vertical data format




