Data Mining on Market-Basket Data (購物資料之資料採礦)

Ching-Huang Yun; 雲晴煌

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/24687

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳銘憲(Ming-Syan Chen)
dc.contributor.author	Ching-Huang Yun	en
dc.contributor.author	雲晴煌	zh_TW
dc.date.accessioned	2021-06-08T05:36:44Z	-
dc.date.copyright	2005-01-27
dc.date.issued	2005
dc.date.submitted	2005-01-24
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/24687	-
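dc.description.abstract	With the spread of the World Wide Web and mobile devices, customers can make transactions anywhere at any time. These transaction data are all digitized and collected in a variety of market-basket databases. In the database research field, data mining techniques have received wide attention because they can be broadly applied to improving marketing strategies. This dissertation addresses three main research topics. First, for market-basket data from physical retail stores, we study the impact of item taxonomy on transaction clusters. Second, for market-basket data from physical retail stores, we consider the impact of both item association and item taxonomy on item clusters. Third, for mobile commerce market-basket data, we design algorithms to mine mobile sequential patterns. The specific topics are summarized as follows. To mine transaction clusters from retail market-basket data, we devise a new measurement, called category-based adherence, to measure the similarity between transaction clusters. We also devise a new algorithm to mine transaction clusters quickly. The distance between an item and a cluster is defined as the number of links between the item and its nearest cluster representative. The category-based adherence between a transaction and a cluster is defined as the average distance between all items in the transaction and the cluster. We also propose an information gain mechanism to validate the quality of the resulting transaction clusters. To mine item clusters from retail market-basket data, we devise a new measurement, called association-taxonomy similarity, to measure the similarity between item clusters. We also devise a new algorithm to mine item clusters quickly. For the resulting item clusters, we propose two new mechanisms, an association index and a taxonomy index, to validate their quality. To mine mobile sequential patterns from mobile commerce market-basket data, we devise three algorithms, designed respectively (1) as an extension of association rule algorithms, (2) by considering both the associations and the paths in the data and applying a path trimming mechanism, and (3) by exploiting the observed pattern family phenomenon. In the experiments, we generate simulated mobile commerce data to analyze the proposed algorithms.	zh_TW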
dc.description.abstract	With the popularity of mobile devices, customers are able to make transactions from anywhere at any time. These data have been digitized and collected in various market-basket databases. Mining such databases has attracted growing attention in the database community because of its wide applicability to improving marketing strategies. In this dissertation, we first study the impact of item taxonomy on the mining of transaction clusters from the retail market-basket database. Then, we take both association and taxonomy relationships into consideration for mining item clusters from the retail market-basket database. Finally, we investigate the problem of mining mobile sequential patterns from the mobile commerce market-basket database, which contains the moving patterns and purchase patterns of customers. Specifically, for mining transaction clusters, we devise a novel measurement, called category-based adherence, and use it to perform the clustering. Based on this measurement, we develop algorithm k-todes for market-basket data, with the objective of minimizing the category-based adherence. The distance of an item to a given cluster is defined as the number of links between this item and its nearest tode. The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in this transaction to that cluster. Our experimental results show that, with the taxonomy information, algorithm k-todes significantly outperforms prior works in both execution efficiency and clustering quality. For mining item clusters, we devise the association-taxonomy similarity measurement and use it to perform the clustering. Based on this measurement, we develop algorithm AT for efficiently mining item clusters. Two validation indexes based on association and taxonomy properties are also devised to assess the quality of clustering for item data. Our experimental results show that algorithm AT significantly outperforms prior works in clustering quality as measured by the validation indexes, indicating the usefulness of association-taxonomy similarity in item data clustering. For mining mobile sequential patterns, we devise three algorithms (TJLS, TJPT, and TJPF). Algorithm TJLS is devised in light of the concept of association rules. Algorithm TJPT takes both association rules and path traversal patterns into consideration and gains performance improvement through path trimming. Algorithm TJPF uses the pattern family technique, which is developed to exploit the relationship between moving and purchase behaviors. A simulation model for the mobile commerce environment is developed and a synthetic workload is generated for performance studies. Our experimental results show that algorithm TJPF significantly outperforms the others in both execution efficiency and memory saving, indicating the usefulness of the pattern family technique devised in this dissertation.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T05:36:44Z (GMT). No. of bitstreams: 1 ntu-94-F86921114-1.pdf: 935197 bytes, checksum: 41e92acb09dc5990367955ea611302b8 (MD5) Previous issue date: 2005	en
dc.description.tableofcontents	Contents: 1 Introduction; 1.1 Motivation and Overview of the Dissertation; 1.2 Organization of the Dissertation; 2 Adherence Clustering: An Efficient Method for Mining Transaction Clusters; 2.1 Introduction; 2.2 Preliminaries; 2.2.1 Problem Description; 2.2.2 Information Gain Validation Model; 2.3 Algorithm k-todes; 2.3.1 Similarity Measurement: Category-Based Adherence; 2.3.2 Procedure of Algorithm k-todes; 2.3.3 An Illustrative Example; 2.3.4 Complexity Analysis; 2.4 Experimental Results; 2.4.1 Data Generation; 2.4.2 Performance Study; 2.5 Summary; 3 Integrating Association and Taxonomy Similarities for Mining Item Clusters; 3.1 Introduction; 3.2 Preliminaries; 3.2.1 Problem Description; 3.2.2 Validation Indexes; 3.3 Algorithm AT (Association Taxonomy); 3.3.1 Similarity Measurement; 3.3.2 Procedure of Algorithm AT; 3.3.3 An Illustrative Example; 3.3.4 Complexity Analysis; 3.4 Experimental Studies; 3.4.1 Data Generation; 3.4.2 Performance Study; 3.5 Summary; 4 Mining Mobile Sequential Patterns in a Mobile Commerce Environment; 4.1 Introduction; 4.2 Preliminaries; 4.2.1 Problem Formulation; 4.2.2 Related Works; 4.2.3 The Procedure for Mining Mobile Sequential Patterns; 4.3 Algorithms for Mining Mobile Sequential Patterns; 4.3.1 Algorithm TJLS; 4.3.2 Algorithm TJPT; 4.3.3 Algorithm TJPF; 4.4 Experimental Results; 4.4.1 Generation of Synthetic Mobile Transaction Sequences; 4.4.2 Performance Comparison; 4.5 Summary; 5 Conclusion
dc.language.iso	en
dc.title	購物資料之資料採礦	zh_TW
dc.title	Data Mining on Market-Basket Data	en
dc.type	Thesis
dc.date.schoolyear	93-1
dc.description.degree	Doctoral
dc.contributor.oralexamcommittee	李強(Chiang Lee),陳宜欣(Yi-Shin Chen),陳良弼(Arbee L.P. Chen),曾新穆(Vincent Shin-Mu Tseng),洪宗貝(Tzung-Pei Hong)
dc.subject.keyword	data mining (資料採礦),market-basket data (購物資料)	zh_TW
dc.subject.keyword	data mining,market-basket data	en
dc.relation.page	90
dc.rights.note	Not authorized
dc.date.accepted	2005-01-24
dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Electrical Engineering	zh_TW
Appears in Collections:	Department of Electrical Engineering
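
The abstract defines the distance of an item to a cluster as the number of links between the item and its nearest tode, and the category-based adherence of a transaction to a cluster as the average of those item distances. The following Python sketch illustrates only these two definitions. It is not the dissertation's k-todes implementation: the toy taxonomy, the function names, and the choice to count links along the tree path through the lowest common ancestor are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy taxonomy (child -> parent); None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "drinks": "root",
    "snacks": "root",
    "cola": "drinks",
    "tea": "drinks",
    "chips": "snacks",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def links_between(a: str, b: str) -> int:
    """Count taxonomy links on the tree path between a and b.

    Assumption: links are counted through the lowest common ancestor.
    """
    depth_from_a = {n: d for d, n in enumerate(path_to_root(a))}
    for depth_from_b, n in enumerate(path_to_root(b)):
        if n in depth_from_a:
            return depth_from_a[n] + depth_from_b
    raise ValueError("nodes do not share a taxonomy tree")


def item_distance(item: str, todes: Set[str]) -> int:
    """Distance of an item to a cluster: links to its nearest tode."""
    return min(links_between(item, tode) for tode in todes)


def adherence(transaction: Iterable[str], todes: Set[str]) -> float:
    """Category-based adherence: average item distance to the cluster."""
    items = list(transaction)
    return sum(item_distance(i, todes) for i in items) / len(items)
```

Under these assumptions, a lower adherence value means the transaction sits closer to the cluster in the taxonomy, which matches the abstract's stated objective of minimizing category-based adherence. For example, `adherence(["cola", "tea"], {"drinks"})` averages two one-link distances.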

Files in This Item:

File	Size	Format
ntu-94-1.pdf (currently not authorized for public access)	913.28 kB	Adobe PDF

Unless specific copyright terms are indicated, all documents in this system are protected by copyright, with all rights reserved.