IMPROVING THE EFFICIENCY OF MINING THE TOP-N HIGH UTILITY ITEMSETS IN TRANSACTION DATABASES
Abstract
High utility itemset mining has many practical applications. However, the mining process requires a minimum utility threshold to be fixed in advance, and choosing a suitable value is difficult for users. To address this, many studies replace the threshold with a positive integer N and mine the N itemsets with the highest utility (top-N high utility itemset mining, or top-N HUI mining). In this study, the authors propose the topNEFIM algorithm to mine the top-N HUIs in a transaction database efficiently, together with the RTU threshold-raising strategy, which sets the initial threshold value automatically. The algorithm uses a global priority queue (priorityQueue) to optimize the threshold-raising process. In addition, the authors propose a data structure called ExtentionP_set that reduces the number of times the database must be scanned during top-N HUI mining. Experimental results show that the proposed algorithm achieves better execution time than the two algorithms TKEH and THUI on both dense and sparse databases.
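The general idea of raising the threshold with a priority queue can be illustrated with a minimal sketch. It is not the topNEFIM implementation: the class `TopNQueue`, its method `offer`, and the sample utilities are hypothetical names and values chosen for illustration, and the initial threshold is simply passed in rather than computed by the RTU strategy. The sketch assumes a min-heap that keeps the N highest utilities seen so far; once the heap holds N entries, its smallest utility becomes the new minimum utility threshold.

```python
import heapq


class TopNQueue:
    """Hypothetical helper: keeps the N highest-utility itemsets seen so far."""

    def __init__(self, n, initial_threshold=0):
        self.n = n
        self.heap = []  # min-heap of (utility, itemset) pairs
        # Assumption: the starting threshold is supplied by the caller.
        self.threshold = initial_threshold

    def offer(self, itemset, utility):
        """Try to add a candidate; return True if it was kept."""
        if utility < self.threshold:
            return False  # cannot be among the top N
        entry = (utility, tuple(sorted(itemset)))
        if len(self.heap) < self.n:
            heapq.heappush(self.heap, entry)
        elif utility > self.heap[0][0]:
            # Replace the current weakest entry with the better candidate.
            heapq.heapreplace(self.heap, entry)
        else:
            return False
        if len(self.heap) == self.n:
            # Raise the threshold to the smallest utility still in the queue.
            self.threshold = max(self.threshold, self.heap[0][0])
        return True

    def results(self):
        """Return the kept itemsets ordered by decreasing utility."""
        return sorted(self.heap, reverse=True)


queue = TopNQueue(n=2)
queue.offer({"a"}, 10)
queue.offer({"b"}, 5)       # queue is full, threshold rises to 5
queue.offer({"a", "b"}, 20)  # evicts utility 5, threshold rises to 10
kept = queue.offer({"c"}, 7)  # below the threshold, rejected
```

In this sketch, every candidate whose utility falls below the current threshold is discarded immediately, which is what allows a top-N miner to prune more of the search space as better itemsets are found.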