類別:

類別: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/117 2026-04-25T03:31:48Z 2026-04-25T03:31:48Z 點集合資料庫之時空樣式探勘 Wen-Kwang Tsao 曹文光 http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22794 2021-06-08T04:28:27Z 2010-01-01T00:00:00Z

標題: 點集合資料庫之時空樣式探勘; Mining Spatial and Temporal Patterns in Pointset Databases 作者: Wen-Kwang Tsao; 曹文光摘要: 本論文提出一個新的資料表示方式與三種探勘頻繁樣式的演算法：MaxFSP、PCP 及CVP。MaxFSP演算法從二維點集合資料庫中探勘最大頻繁空間樣式；PCP演算法從二維點集合資料庫中探勘封閉性頻繁空間樣式；CVP演算法則從視訊點集合資料庫中探勘封閉性頻繁時空樣式。在MaxFSP 及PCP 演算法中，我們除了運用Apriori 演算法所提出之反單調修剪原則，修剪不可能的候選樣式外，我們還針對空間關係的特性，提出快速修剪方法。並參考MAFIA與CHARM演算法，各別加入最大頻繁樣式與封閉性頻繁樣式所需的修剪方法。透過這些方法，我們可以在探勘過程中有效刪除不可能的候選樣式。CVP演算法更進一步考慮了時間軸上的關係，探勘視訊資料庫中封閉性頻繁時空樣式。CVP 演算法除了應用了空間中修剪方法之外，並提出「快速成長」(fast-grow)的策略，可有效地在樣式索引樹中標示不須延展的節點，進而避免重複檢查的運算，減少耗時的候選樣式產生時間。實驗結果顯示，MaxFSP 演算法的執行效率優於改良式的MAFIA演算法；PCP演算法的執行效率優於改良式的Apriori演算法、SASMiner演算法與MaxGeo演算法；CVP演算法的執行效率優於改良式的Apriori演算法。; In this dissertation, we propose a novel data representation and three algorithms, MaxFSP, PCP, and CVP algorithms. The MaxFSP algorithm is proposed to find the maximal frequent spatial patterns in image databases. The PCP algorithm is designed to find the frequent closed spatial patterns in image databases. The CVP algorithm is developed to find the frequent closed spatial-temporal patterns in video databases. In the MaxFSP and PCP algorithms, in addition to using the anti-monotone pruning strategy to prune unnecessary candidate patterns, we utilize the characteristics of the spatial relationships to design the pruning strategies. Specifically, we design pruning strategies based on the MAFIA and CHARM algorithms respectively to prune non-maximal and non-closed candidates. By using these strategies, we can prune a large number of unnecessary candidate patterns. The CVP algorithm is further considering temporal information to mine all frequent closed spatial-temporal patterns in video databases. In the CVP algorithm, we not only use the spatial relationship to prune unnecessary candidate patterns, we also propose a fast-grow pruning strategy, which can speed up the mining process in the temporal dimension. Therefore, the CVP algorithm can effectively prune the unnecessary branches in the frequent pattern tree and avoid the costly candidates’ generation. The experimental results show that the MaxFSP algorithm outperforms the modified MAFIA algorithm; the PCP algorithm outperform the modified Apriori, SASMiner, and MaxGeo algorithms; and the CVP algorithm outperform the modified Apriori algorithm

2010-01-01T00:00:00Z 點集合資料庫中封閉性樣式之資料探勘 Po-Yin Chen 陳柏吟 http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9826 2021-05-20T20:43:43Z 2008-01-01T00:00:00Z

標題: 點集合資料庫中封閉性樣式之資料探勘; Mining Closed Patterns in Pointset Databases 作者: Po-Yin Chen; 陳柏吟摘要: 隨著行動運算及定位技術的進步，位置定位服務（Location Based Services，LBS）也跟著蓬勃發展。透過定位裝置，我們可以將大量點座標資料搜集到點集合資料庫中，而點集合是由一些點座標所形成的集合。藉由資料探勘的技術，可以幫助我們在點集合資料庫中發現物體移動的經常路徑或行為。因此，這篇論文將探討如何在點集合資料庫中尋找經常出現的樣式或路徑，我們提出一個有效率的探勘演算法叫「PCP-Miner」，用來找尋點集合資料庫中的封閉性樣式。演算法主要可分為兩個階段。第一階段，我們產生出所有長度為2的頻繁樣式。第二階段，我們利用頻繁樣式樹以深先搜尋法的方式遞迴產生所有的頻繁樣式。在產生的過程中，我們會對這些樣式做檢查，檢查它們是否為封閉的。由於PCP-Miner只需掃描資料庫一次，且能避免產生不必要的候選樣式，實驗結果顯示，不管在合成資料或真實資料中，我們所提出的方法皆比改良式的Apriori演算法有效率。; With advance in mobile computing and positioning technologies, location-based services (LBS) have gained significant progress. By using these technologies, a large amount of pointsets can be collected in an LBS database where a pointset contains a set of points. Mining frequent pointset in a pointset database can help us understand the movement patterns of objects. In this thesis, we proposed a novel algorithm, PCP-Miner (Pointset Closed Pattern Miner), to mine frequent closed pointset patterns. Our proposed algorithm consists of two phases. First, we find all frequent patterns of length two in the database. Second, for each pattern found in the first phase, we recursively generate frequent patterns by a frequent pattern tree in a depth-first search manner. During the process of pattern generation, we check whether the frequent patterns are closed or not. Since the PCP-Miner only needs to scan the database once and doesn’t generate unnecessary candidates, it is more efficient than the modified Apriori algorithm. The experiment results show that the PCP-Miner outperforms the modified Apriori by one order of magnitude in both synthetic and real data.

2008-01-01T00:00:00Z 點集合影片資料庫中封閉性樣式之資料探勘 Yi-Yu Lai 賴奕伃 http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9593 2021-05-20T20:30:25Z 2008-01-01T00:00:00Z

標題: 點集合影片資料庫中封閉性樣式之資料探勘; Mining Closed Patterns in Pointset Video Databases 作者: Yi-Yu Lai; 賴奕伃摘要: 隨著多媒體影像技術的蓬勃發展，多媒體資料量快速地遽增，如何從龐大的多媒體資料中找到有意義的資訊和特性已成為熱門的研究議題。我們可以將影片中發生的一個事件視為一個連續的點集合，而找到影片資料庫中由點集合所構成的封閉性樣式，則可以表達出影片中發生該事件的特性。因此，在本論文中，我們提出一個有效率的探勘演算法「CVP」來探勘出影片資料庫中的封閉性樣式。我們所提出的演算法主要先利用兩種資料結構儲存頻繁樣式的資訊，以深度優先搜尋的方式先對空間維度、再對時間維度產生出可能的頻繁樣式，最後再利用我們所提出的方法來修剪不符合或不必要的樣式以及判斷其封閉性。我們的演算法利用投影資料庫去產生可能的樣式和進行修剪，並不需要重複地搜尋整個影片資料庫，因此效率能夠得到明顯的改善。在人造及真實資料庫的實驗結果中顯示我們所提出的方法較改良式Apriori的方法來得更有效率。; Nowadays, the number of multimedia datasets is increasing rapidly. Thus, mining implicit and meaningful patterns from multimedia databases has attracted more and more attention in recent years. The event object can be viewed as a sequence of pointsets in a video. Mining closed patterns in pointset video databases can help us understand the pattern of an event in video databases. In this thesis, we first devise two data structures, called rplist and CV-tree, to store the information of frequent video patterns. Next, we propose a novel algorithm, called CVP, to mine frequent closed patterns from a video database in a depth-first search (DFS) manner. Our proposed algorithm consists of two phases. We first grow frequent video patterns in the spatial dimension and then grow them in the temporal dimension. To efficiently mine frequent closed patterns, we develop several pruning strategies to prune non-closed patterns. The CVP algorithm can localize the candidate generation, pattern join, and support counting in a small amount of rplists. Therefore, it can efficiently mine frequent closed patterns in a video database. The experiment results show that our proposed method outperforms modified Apriori algorithm in synthetic data and real data.

2008-01-01T00:00:00Z 點對點網路下複合檢索系統之設計與實作 Li-Wei Yang 楊立偉 http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/25692 2021-06-08T06:25:06Z 2006-01-01T00:00:00Z

標題: 點對點網路下複合檢索系統之設計與實作; Complex Search in Peer-to-Peer Networks 作者: Li-Wei Yang; 楊立偉摘要: 本研究提出了一種全新的索引與搜尋方案，稱之為KISS（以關鍵符記為基礎的索引與搜尋方案；keytoken-based index and search scheme）。這是將物件以其關鍵詞之字元與位置資訊來索引在超立方空間的一種方法。透過這個索引與搜尋方案，可以完整實現在點對點網路下各種複合檢索的能力，包含了關鍵詞搜尋、前置搜尋、通用字元搜尋、以及多維度搜尋等。為了避免傳統透過倒置列表法所帶來的問題，這個索引與搜尋方案使用超立方空間做為基礎，並將物件以其所屬的關鍵詞集合來做為索引的依據。因此，這個索引與搜尋方案具有以下特點：　　一、透過單一的查找操作，就能有效率地找到任何物件（就好比在DHT網路下的鍵值搜尋一樣有效率）；若要插入、刪除、或維護物件時，也都可在僅一個查找操作內完成，而不會跟該物件的關鍵詞集大小有關係。　　二、單一關鍵詞的索引資訊是由一群節點所負責：越常出現的關鍵詞，會由越多的節點來負責。因此由索引與搜尋所造成的工作負荷可以平均地分散在各節點上。　　三、每個查詢關鍵詞組均可以得到一個子超立方空間，並進而得到一個二元展開樹；而遍歷該二元展開樹，就可以搜尋到所有符合查詢的物件。其中，查詢結果可以互動、累進的方式來呈現。此外，導引式深度優先搜尋（Guided Depth-Frist Search，GDFS）及快取等技巧，能夠提升搜尋效率；而利用關鍵符記的連續性或對稱性來修剪二元展開樹，則可以縮小搜尋範圍。　　四、基於該二元展開樹的遍歷方式，能夠提供富有彈性的定序機制。例如，查詢結果可以依照物件的明確程度、關鍵詞長度、字典順序、或是相似度來進行排序，而不需經過全域統計的方式來完成。　　五、當把每個屬性視為一個子超立方空間時，則多個屬性可以構成一個較大的超立方空間，也因此多維度搜尋可以在這個較大的超立方空間上被實現，不論該多維度搜尋是針對單一屬性或是多個屬性的查詢。本索引與搜尋方案在進行多維度搜尋時，是一次同時考慮到多個屬性，因此當有越多屬性被指定時，搜尋會變得越有效率；而不像其它傳統方法是一次處理一個屬性，然後再將各屬性處理後的結果做合併。此外，透過將關鍵符記進行合併的技巧，可以調整該超立方空間的維度，端視各種上層應用、或底層網路之需要。　　總括來說，KISS這個索引與搜尋方案提供了一種簡單、有效、統一的方式，來實現點對點網路下各種複合檢索的能力，並且解決了各種既有的問題，如增加的索引成本、頻寬額外負擔、不平衡的負荷、網路熱點與故障點等。而這個方案也支援了富有彈性的定序機制，卻不需要任何的全域資訊。　　本研究以Java程式語言建立了一個KISS的雛形系統，並在該雛形系統上進行一連串的實驗，以評估本方案的負荷平衡程度與搜尋效率。本研究使用了兩組真實資料來進行實驗，第一組為網站分類目錄資料庫，共有131,180筆資料；第二組為網際網路上最大的音樂專輯資料庫，共有2,412,613筆資料。實驗結果證明之前所提到的特點都能夠被具體實現。而本研究也提供了關於本方案複雜性與維度的數學分析。; In this research work, KISS (Keytoken-based index and search scheme) is introduced with a novel technique to extract keytokens - character-position information - from keywords to index objects over a hypercube. With KISS, complex search capabilities are fully provided in P2P networks, including keyword search, prefix search, wildcard search, and multi-dimensional search. To address the problems of other existing approaches mainly based on inverted index, KISS uses hypercube to index objects according to their keyword set. As a result, the following features are provided: 1. An object can be efficiently and deterministically located by one lookup operation such as key search in DHT-based overlay. Object insertion, deletion, and maintenance also take one lookup only, regardless of the keyword set size of the object. 2. The index entries of a single keyword are handled by a set of nodes: the more popular a keyword is, the higher the number of nodes responsible for the keyword. So the index/search load of nodes can be balanced. 3. Searching the matching objects corresponds to traversing the spanning binomial tree of a subhypercube obtained from the queried keyword set. Every matched object can be retrieved interactively and cumulatively by exploring the tree. Techniques such as Guided Depth-Frist Search (GDFS) and caching can boost the search efficiency; while tree pruning by the continuous and/or symmetric property of keytoken can reduce the search space. 4. Flexible ranking is provided based on how the spanning binomial tree is traversed. For example, ranking by object specificness, keyword length, lexicographical order, or similarity, can be facilitated without global statistics. 5. Multi-dimensional search is supported by constructing a larger hypercube, which consists of subhypercubes for every attribute. Thus, queries on any single/multiple attribute(s) can be fulfilled as well. KISS takes multiple attributes into consideration at a time, and becomes more efficient when more attributes are specified, contrary to other existing approaches that can only process one attribute at a time and then merge all the search results for each one. In addition, by keytoken clustering, the dimensionality r of the hypercube is adjustable to fit various upper applications, or the underlying overlay. In summary, KISS provides a simple yet effective unified solution for complex search capabilities in P2P networks, and eliminates the problems such as increased index cost, bandwidth overhead, unbalanced load, hot spots, and points of failures, etc. It also supports a flexible ranking mechanism without any global knowledge required. A prototype of KISS is implemented in Java, and a series of experiments have been conducted to evaluate KISS in load balancing and search efficiency. Two real datasets are used: one consisting of 131,180 records from a website directory and the other consisting of 2,412,613 records from Internet CD database. Experimental results are presented to support the above-mentioned features, while a mathematical analysis of the complexity and dimensionality of KISS is also given.

2006-01-01T00:00:00Z