Information Retrieval and Analysis in Sequential Data: An Information Theoretical Approach

Hung-Jui Chang; 張紘睿

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71152

Title:	Information Retrieval and Analysis in Sequential Data: An Information Theoretical Approach (基於資訊理論分析的序列型資料資料探勘及分析)
Author:	Hung-Jui Chang (張紘睿)
Advisor:	薛智文 (Chih-Wen Hsueh)
Co-advisor:	徐讚昇 (Tsang-sheng Hsu)
Keywords:	sequential data, information theory, information retrieval, expert system, randomized algorithm
Publication year:	2018
Degree:	Doctoral
Abstract:	In this dissertation, the author applies information-theoretic analysis to sequential (time-ordered) databases. By observing how data points transition between states at different timestamps, hidden information and causal relationships between events are uncovered. To increase the precision of causal judgments, two methods for selecting data points are proposed. The first builds filtering rules over the different states observed at the same timestamp; removing unreliable data points makes the data set formed at that timestamp more reliable. The second applies randomized matching between the data sets of two timestamps to reduce the bias that can arise when estimating causal relationships. Besides computing transition probabilities and causal relations between states, the author uses the sequential databases to verify the randomness of the randomized matching algorithm. The solution sets produced by the Monte Carlo tree search algorithm are also treated as sequential data, and their correctness and convergence over time are examined.
In this dissertation, two different types of databases are presented and analyzed. The first is a health insurance database, which records natural human behavior. The second is a game-theoretical value database, which is computed with a retrograde-analysis-related algorithm. The author describes both databases with a general sequential model and analyzes them with probabilistic methods. The hidden information found in the two databases is applied to build and enhance existing knowledge-based expert systems. Experimental results show that the event pairs extracted from the health insurance database cover most of the critical event pairs published in earlier studies, and that mining the whole database efficiently yields many further event pairs of high research value that domain experts have not yet confirmed or published. In the game-theoretical value database, the extracted information is used to fine-tune the expert system, correcting points that domain experts had previously overlooked, and to develop new general domain knowledge. Experimental results show that applying the newly discovered domain knowledge effectively improves the game-playing program. The author also uses entropy analysis to verify and enhance the performance of randomized algorithms used on sequential databases. The results show that the initial parameter settings strongly affect the stability of the matching produced by randomized matching in cohort studies. The improved randomized matching algorithm increases the randomness with which control candidates are selected, which in turn improves the stability of the resulting statistics. Finally, the experiments show that, in stochastic games, the Monte Carlo tree search algorithm converges to a solution set rather than a single solution; with carefully chosen parameters, the result converges to a solution set with specific properties.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71152
DOI:	10.6342/NTU201801967
Full-text license:	Paid license
Appears in collections:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-107-1.pdf (not authorized for public access)	2.29 MB	Adobe PDF


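Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.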
