Abstract
Frequent pattern mining can generate a large number of candidates, which consumes considerable memory and mining time. In real applications, however, only a small number of frequent patterns are used. Therefore, mining top-rank-k frequent patterns, which limits the number of mined patterns by ranking them by frequency, has received increasing interest. This paper proposes iNTK, an improved version of the NTK algorithm for mining top-rank-k frequent patterns. iNTK employs the N-list structure to represent patterns and uses the subsume concept to speed up the mining process. Experiments on eight datasets compare iNTK and NTK in terms of mining time and memory usage. The results show that iNTK is faster and more memory-efficient than NTK.
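The notion of top-rank-k can be made concrete with a brute-force sketch. It assumes the usual definition: a pattern's rank is the number of distinct support values greater than or equal to its own support, so the top-rank-k patterns are all patterns whose support is among the k highest distinct support values. The function names are hypothetical, and the exhaustive enumeration is exponential; it is only a reference for toy data, not how NTK or iNTK work.

```python
from collections import Counter
from itertools import combinations

def all_pattern_supports(transactions):
    """Support count of every non-empty itemset occurring in at least one transaction.
    Exhaustive and exponential in transaction length: toy data only."""
    supports = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                supports[frozenset(combo)] += 1
    return supports

def top_rank_k(transactions, k):
    """Return {pattern: support} for all patterns of rank <= k, where a pattern's rank is
    the number of distinct support values >= its own support (assumed definition)."""
    if k < 1:
        return {}
    supports = all_pattern_supports(transactions)
    distinct = sorted(set(supports.values()), reverse=True)[:k]
    if not distinct:
        return {}
    threshold = distinct[-1]  # the k-th highest distinct support value
    return {p: s for p, s in supports.items() if s >= threshold}

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"]]
print(sorted("".join(sorted(p)) for p in top_rank_k(transactions, 2)))  # ['a', 'b', 'c']
```

Here the distinct supports are 4 ({a}), 3 ({b}, {c}), 2 (the three pairs) and 1 ({a, b, c}), so k = 2 returns the three 1-patterns while k = 3 would add the three 2-patterns. Note that the result can contain more than k patterns, because patterns with equal support share a rank.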
1. Introduction
An expert system is an intelligent system that solves complex problems by applying knowledge through inference procedures. An expert system generally has three components: a knowledge base, an inference engine, and a user interface (Jackson, 1999). The knowledge base is the core of an expert system because it contains the problem-solving knowledge of the particular application (Ahmed, 2008). Therefore, reducing this knowledge space plays a major role in the performance of expert systems. Association rules, which represent the relationships between items in a dataset, are an important form of such knowledge (Daniel & Viorel, 2004; Guil, Bosch, Túnez, & Marín, 2003). To generate association rules, traditional approaches first mine frequent patterns, i.e., itemsets, subsequences, or substructures that appear in large transactional or relational datasets with a frequency no less than a given threshold. The system then uses these frequent patterns and a minimum confidence threshold to find all rules. Both phases require considerable memory and mining time. Therefore, reducing the time needed to mine frequent patterns is very useful for enhancing expert systems.
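The second phase, turning frequent patterns into rules with a minimum confidence, can be sketched as follows. The sketch assumes the frequent patterns are already available as a support table that is downward closed (every subset of a frequent pattern is also present) and uses the standard confidence conf(X ⇒ Y) = sup(X ∪ Y) / sup(X). The function name `generate_rules` is hypothetical.

```python
from itertools import combinations

def generate_rules(supports, min_conf):
    """supports: {frozenset pattern: support count}, assumed downward closed.
    Returns (antecedent, consequent, confidence) for every rule X => Y with
    X ∪ Y frequent, X and Y non-empty and disjoint, and confidence >= min_conf."""
    rules = []
    for pattern, pattern_sup in supports.items():
        if len(pattern) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(pattern)):
            # Items are sorted only to make enumeration order deterministic.
            for antecedent in combinations(sorted(pattern), size):
                lhs = frozenset(antecedent)
                confidence = pattern_sup / supports[lhs]
                if confidence >= min_conf:
                    rules.append((lhs, pattern - lhs, confidence))
    return rules

supports = {frozenset(p): s for p, s in
            [("a", 4), ("b", 3), ("c", 3), ("ab", 2), ("ac", 2), ("bc", 2)]}
for lhs, rhs, conf in generate_rules(supports, 0.6):
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

Enumerating every non-empty proper subset of each pattern is the simplest correct approach. Apriori-style rule generation additionally prunes consequents, since moving items from the antecedent to the consequent of a rule over the same itemset cannot increase its confidence; that refinement is omitted here.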
6. Conclusion and future work
This paper presents iNTK, an efficient improved algorithm for mining top-rank-k frequent patterns. The advantages of iNTK lie in its use of N-lists and the subsume index of 1-patterns. N-lists are more compact than Node-lists, and the subsume index lets iNTK mine patterns directly when a pattern belonging to the top-rank-k table has other 1-patterns in its subsume set. As a result, iNTK requires less memory and runtime. Extensive experiments show that iNTK outperforms NTK on various datasets.
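The two ingredients named above can be illustrated on a toy dataset. The sketch builds a PPC-tree (a prefix tree whose nodes are ranked in pre-order and post-order), derives each item's N-list of (pre, post, count) codes, and computes 2-itemset supports with the ancestor-based N-list intersection used by N-list miners. For brevity no minimum-support filter is applied. The subsume set S(i) is computed naively from its definition (j ∈ S(i) iff every transaction containing i also contains j, i.e. sup({i, j}) = sup(i)); this is an assumption for illustration and need not match how iNTK builds its subsume index. The last function shows the payoff: every {i} ∪ Y with Y ⊆ S(i) has the same support as i, so these patterns can be emitted without further intersections. All names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

class Node:
    """PPC-tree node: an item, a count, and children keyed by item."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_ppc_tree(transactions):
    """Insert each transaction with items sorted by descending support (ties by name)."""
    freq = Counter(item for t in transactions for item in set(t))
    rank = {item: r for r, item in enumerate(sorted(freq, key=lambda i: (-freq[i], i)))}
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted(set(t), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += 1
            node = child
    return root, rank

def item_nlists(root):
    """Number nodes in pre-order and post-order; collect each item's (pre, post, count) codes."""
    nlists = defaultdict(list)
    pre = post = 0
    def visit(node):
        nonlocal pre, post
        node_pre = pre
        pre += 1
        for child in node.children.values():
            visit(child)
        if node.item is not None:  # the root carries no item
            nlists[node.item].append((node_pre, post, node.count))
        post += 1
    visit(root)
    return {item: sorted(codes) for item, codes in nlists.items()}

def intersect(upper, lower):
    """Combine the N-list of a higher-ranked item (upper) with that of a lower-ranked item
    (lower): keep each upper node that is an ancestor of lower nodes, summing their counts.
    Both inputs must be sorted by pre-order rank."""
    result, i, j = [], 0, 0
    while i < len(upper) and j < len(lower):
        pre1, post1, _ = upper[i]
        pre2, post2, count2 = lower[j]
        if pre1 < pre2 and post1 > post2:  # upper[i] is an ancestor of lower[j]
            if result and result[-1][0] == pre1:
                result[-1] = (pre1, post1, result[-1][2] + count2)
            else:
                result.append((pre1, post1, count2))
            j += 1
        elif pre1 < pre2:  # upper[i]'s subtree ends before lower[j]
            i += 1
        else:  # no remaining upper node can be an ancestor of lower[j]
            j += 1
    return result

def support(nlist):
    return sum(count for _, _, count in nlist)

def pair_support(nlists, rank, x, y):
    upper, lower = (x, y) if rank[x] < rank[y] else (y, x)
    return support(intersect(nlists[upper], nlists[lower]))

def subsume_sets(nlists, rank):
    """S(i) = {j != i : sup({i, j}) == sup(i)}, computed naively from the definition."""
    sup = {item: support(nl) for item, nl in nlists.items()}
    return {i: {j for j in nlists if j != i and pair_support(nlists, rank, i, j) == sup[i]}
            for i in nlists}

def patterns_from_subsume(item, subsume, item_support):
    """Every {item} ∪ Y with Y ⊆ S(item) has support item_support; no intersection needed."""
    others = sorted(subsume[item])
    return [(frozenset((item,) + combo), item_support)
            for size in range(len(others) + 1) for combo in combinations(others, size)]

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
root, rank = build_ppc_tree(transactions)
nlists = item_nlists(root)
subsume = subsume_sets(nlists, rank)
print(subsume["c"])  # {'a'}
print(patterns_from_subsume("c", subsume, support(nlists["c"])))
```

In this toy data every transaction containing c also contains a, so S(c) = {a} and {a, c} inherits the support of c (3) without an intersection. Computing S(i) from pairwise intersections is quadratic in the number of items; it is chosen here only because it follows the definition directly.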