應用於關聯性規則探勘之平行化硬體架構

Ying-Hsiang Wen; 溫英翔

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/32693

標題:	應用於關聯性規則探勘之平行化硬體架構 Parallel Hardware Architecture for Mining Association Rules
作者:	Ying-Hsiang Wen 溫英翔
指導教授:	陳銘憲(Ming-Syan Chen)
關鍵字:	資料探勘,關聯性規則,硬體加速,心縮陣列,管線化,可程式化邏輯閘陣列, data mining,association rules,hardware-enhanced,systolic array,pipeline,FPGA,
出版年 :	2006
學位:	碩士
摘要:	自從早期關聯性規則探勘演算法Apriori被提出後，相關演算法已被廣泛研究與討論。然而隨著資料大量的成長，利用軟體演算法處理大量的資料，在效能上會遇到瓶頸。有鑑於此，在本論文中，我們提出了一個管線化(Pipeline)排程的硬體架構，可以同時執行以下多個運算：利用心縮陣列(Systolic Array)做平行化的項目組(Itemsets)比對，並且可以同時運用雜湊表(Hash Table)方法來刪去掉不必要的候選人項目組(Candidate Itemsets)，此外透過紀錄每筆交易資料中，項目出現在候選人項目組的次數，使用修飾過濾器(Trimming filter)，我們可以過濾掉低頻(Infrequent)的項目，進而加速整個資料探勘的速度。從各種實驗數據看來，相對於傳統完全採用軟體去執行資料探勘的演算法，使用硬體加速在效能上可以得到可觀的改進。 Generally speaking, to implement Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Since the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database is larger than the hardware capacity, the items are loaded into the hardware separately. The time complexity is in proportion to the number of candidate itemsets multiplied by the number of items in the database. Too many candidate itemsets and a large database would create a performance bottleneck. In this thesis, we propose a HAsh-based and PiPelIned architecture (abbreviated as HAPPI) for hardware-enhanced association rule mining. We apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information for reducing the number of candidate itemsets and items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. At the same time, trimming information is collected from each transaction. In addition, itemsets are generated from transactions and hashed into a hash table. The useful trimming information and the hash table enable us to reduce the number of items in the database and the number of candidate itemsets. Therefore, we can effectively reduce the frequency of loading the database into the hardware. As such, HAPPI solves the bottleneck problem in Apriori-based hardware schemes. We also derive some properties to investigate the performance of this hardware implementation. As shown by the experiment results, HAPPI significantly outperforms the previous hardware approach in terms of execution cycles.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/32693
全文授權:	有償授權
顯示於系所單位：	電機工程學系

文件中的檔案：

檔案	大小	格式
ntu-95-1.pdf 目前未授權公開取用	410.24 kB	Adobe PDF

顯示文件完整紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料（如：文字、圖片、PDF）並使其易於取用。