pisco_log
banner

The Research on the Improvement of FP-growth Algorithm

Zixian Wu, Gang Fang

Abstract


This paper focuses on the issue of low efficiency in the FP-growth algorithm for frequent pattern mining and proposes an improved algorithm, ICFM-growth. Experimental results demonstrate that the improved algorithm outperforms the FP-growth algorithm in terms of both runtime and space utilization. By studying the classical frequent pattern algorithms Apriori and FP-growth, the latter is an improved algo- rithm based on Apriori to address the problems of generating a large number of candidate itemsets and consuming significant memory space. While FP-growth exhibits superior mining efficiency compared to Apriori, it faces challenges when dealing with large and long transactional databases due to the construction of numerous FP-trees, which increases computational tasks and prolongs runtime, leading to lagging mining efficiency. To address this issue, the improved algorithm ICFM-growth is proposed. It constructs a co-occurrence frequency matrix to perform preliminary screening on the transaction set, focusing on high-frequency and co-occurring item pairs, thereby reducing unnecessary computa- tions. In the initial stage, as important item pairs have already been filtered, the algorithm can directly operate on these item pairs and items, rather than the entire dataset, thereby reducing the search space and computational complexity. Additionally, the structure of the FP-tree ena- blesthe algorithm to store and process data more efficiently, avoiding frequent scanning of the entire database, as seen in traditional Apriori algorithms. Finally, through simulation experiments on publicly available datasets such as Movie Data and House Data, validated using cross- validation, the ICFM-growth algorithm proves to be significantly superior to FP-growth in terms of time and space efficiency. It demonstrates faster runtime, lower memory consumption, and superior mining efficiency compared to FP-growth.

Keywords


Frequency matrix; Frequent patterns; Data mining

Full Text:

PDF

Included Database


References


[1] Hu Shichang. Research and Improvement of Apriori Algorithm [D]. Qingdao University, 2019.

[2] Qu Rui, Zhang Tianjiao. Improved Apriori Algorithm Based on Matrix Compression[J]. Computer Engineering and Design, 2017, (8): 2127-2131.

[3] Wei Kun, Wang Fang, Huang Shucheng. Improved Frequent Pattern Mining Algorithm[J]. Computer and Digital Engineering, 2021, 49(11): 2175-2179.

[4] Malarvizhi SP, Sathiyabhama B. Frequent Pagesets from WebLog by Enhanced Weighted Association Rule Mining[J]. Cluster Comput- ing, 2016, 19(1): 269-277.

[5] Wang Meng, Zou Shurong, Fang Rui. An Improved Apriori Algorithm Based on Matrix[J]. Information Technology, 2018, (3): 150-154, 158.

[6] Wei Kun. Research and Application of MGFP-growth Improved Algorithm Based on FP-growth Association Rules [D]. Jiangsu Univer- sity of Science and Technology, 2020. DOI:10.27171/d.cnki.ghdcc.2020.000636




DOI: http://dx.doi.org/10.18686/aitr.v2i1.3860

Refbacks

  • There are currently no refbacks.