Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/35957
Full metadata record
DC field: value [language]
dc.contributor.advisor: 李瑞庭
dc.contributor.author: Yu Ling [en]
dc.contributor.author: 凌宇 [zh_TW]
dc.date.accessioned: 2021-06-13T07:48:45Z
dc.date.available: 2005-08-01
dc.date.copyright: 2005-08-01
dc.date.issued: 2005
dc.date.submitted: 2005-07-26
dc.identifier.citation:
[1] Agrawal, R. and Srikant, R. (1994). “Fast algorithms for mining association rules”, In Proc. Int. Conf. Very Large Data Bases (VLDB’94), 487-499.
[2] Agrawal, R. and Srikant, R. (1996). “Mining sequential patterns: Generalizations and performance improvements”, In Proc. 5th Int. Conf. Extending Database Technology (EDBT’96), 3-17.
[3] Ayres, J., Flannick, J., Gehrke, J. and Yiu, T. (2002). “Sequential pattern mining using a bitmap representation”, In Proc. Int. Conf. Knowledge Discovery and Data Mining (KDD’02), 429-435.
[4] Bahar, I. and Chen, S-C. (2004). “Mining frequent patterns in protein structures: a study of protease families”, Bioinformatics, 20, i77-i85.
[5] Bork, P. and Koonin, E. (1996). “Protein sequence motifs”, Current Opinion in Structural Biology, 6, 366-376.
[6] Braun, W., Ivanciuc, O. and Schein, C. (2002). “Data mining of sequences and 3D structures of allergenic proteins”, Bioinformatics, 18, 1358-1364.
[7] Braun, W., Mathura, V. and Schein, C. (2003). “Identifying property based sequence motifs in protein families and superfamilies: application to DNase-1 related endonucleases”, Bioinformatics, 19, 1381-1390.
[8] Chang, B. and Halgamuge, S. (2002). “Protein motif extraction with neuro-fuzzy optimization”, Bioinformatics, 18, 1084-1090.
[9] Cosic, I. (1994). “Macromolecular bioactivity: is it resonant interaction between macromolecules? - theory and applications”, IEEE Transactions on Bio-medical Engineering, 41, 1101-1114.
[10] Gusfield, D. (1997). Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, Cambridge, U.K.
[11] Han, J. and Kamber, M. (2001). Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA, USA.
[12] Holm, L. and Heger, A. (2003). “Sensitive pattern discovery with ‘fuzzy’ alignments of distantly related proteins”, Bioinformatics, 19, i130-i137.
[13] Keles, S., Laan, M. and Vulpe, C. (2004). “Regulatory motif finding by logic regression”, Bioinformatics, 20, 2799-2811.
[14] Krishnan, A., Li, K-B. and Issac, P. (2004). “Rapid detection of conserved regions in protein sequences using wavelets”, In Silico Biology, 13-22.
[15] Landgraf, R., Xenarios, I. and Eisenberg, D. (2001). “Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins”, Journal of Molecular Biology, 307, 1487-1502.
[16] Li, H. and Li, J. (2004). “Discovery of stable and significant binding motif pairs from PDB complexes and protein interaction datasets”, Bioinformatics Advance Access.
[17] McCreight, E.M. (1976). “A space-economic suffix tree construction algorithm”, Journal of the ACM, 23, 262-272.
[18] Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U. and Hsu, M.-C. (2001). “PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth”, In Proc. Int. Conf. Data Engineering (ICDE’01), 215-224.
[19] Rigoutsos, I. and Floratos, A. (1998). “Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm”, Bioinformatics, 14, 55-67.
[20] Ukkonen, E. (1985). “Finding approximate patterns in strings”, Journal of Algorithms, 6, 132-137.
[21] Ukkonen, E. (1995). “On-line construction of suffix trees”, Algorithmica, 14, 249-260.
[22] Yan, X., Han, J. and Afshar, R. (2003). “CloSpan: Mining closed sequential patterns in large datasets”, In Proc. SIAM Int. Conf. on Data Mining (SDM’03), 166-177.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/35957
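dc.description.abstract: Sequential patterns in protein sequences are closely tied to the functions that proteins perform. How to mine important sequential patterns systematically from protein sequence databases has therefore become an important research topic.
To address this problem, this thesis proposes a suffix-tree-based algorithm. Every stage of the algorithm, including mining the closed frequent substrings of sequential patterns, combining closed substrings into maximal frequent sequential patterns, and adjusting the gaps between substrings within a pattern, can be carried out using the occurrence information recorded in the suffix tree. To keep the set of patterns compact, the algorithm prunes unnecessary patterns and retains only maximal sequential patterns. The experimental results show that the algorithm not only finds the sequential patterns recorded in the PROSITE database but also discovers other results that merit further study by biologists, such as longer sequential patterns and classifier pattern sets. In the experiments, the algorithm also produced better results than the method of Chang and Halgamuge. [zh_TW, in translation]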
dc.description.abstract: Because sequential patterns are closely related to protein function, systematically mining significant sequential patterns in protein databases has become an important research topic.
In this thesis, we propose a suffix-tree-based algorithm to discover patterns in protein databases. We use the occurrence information maintained in the suffix tree to mine closed frequent substrings, generate maximal frequent sequential patterns, and adjust the gaps within the patterns. To keep the generated pattern set compact, we generate only maximal patterns rather than all patterns. The experimental results show that the proposed algorithm finds not only the patterns recorded in the PROSITE database but also other patterns worthy of further biological study, such as longer patterns and the classifier pattern set. In addition, the proposed algorithm produced better results than Chang and Halgamuge’s method in the experiments. [en]
dc.description.provenance: Made available in DSpace on 2021-06-13T07:48:45Z (GMT). No. of bitstreams: 1
ntu-94-R92725020-1.pdf: 689972 bytes, checksum: 0a56349722299f29aeef7bc575780ecf (MD5)
Previous issue date: 2005 [en]
dc.description.tableofcontents:
Table of Contents i
List of Figures ii
List of Tables iii
Chapter 1 Introduction 1
Chapter 2 Literature Survey 3
2.1 Sequential mining methods 3
2.2 Feature-based methods 3
2.3 Mathematical methods 6
2.4 MSA-based methods 6
2.5 Chang and Halgamuge’s algorithm 7
2.6 Discussion 8
Chapter 3 Our Proposed Algorithm 10
3.1 Term and notation description 10
3.2 Problem definition and algorithm overview 12
3.3 Finding all closed frequent substrings 14
3.4 Finding all maximal frequent patterns 21
3.5 Gap adjustment 23
Chapter 4 Experiments and Performance Evaluation 28
4.1 Experiments on real data 28
4.2 Comparisons with PROSITE database 33
4.3 Comparisons with Chang and Halgamuge’s algorithm 38
Chapter 5 Conclusions and Future Work 40
References 42
dc.language.iso: en
dc.subject: 蛋白質 [zh_TW]
dc.subject: 字尾樹 [zh_TW]
dc.subject: 最大序列樣式 [zh_TW]
dc.subject: suffix tree [en]
dc.subject: maximal sequential pattern [en]
dc.subject: protein [en]
dc.title: 蛋白質最大序列樣式探勘演算法 [zh_TW]
dc.title: Mining Maximal Sequential Patterns in Protein Databases [en]
dc.type: Thesis
dc.date.schoolyear: 93-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 陳良華, 傅楸善
dc.subject.keyword: 蛋白質, 最大序列樣式, 字尾樹 [zh_TW]
dc.subject.keyword: protein, maximal sequential pattern, suffix tree [en]
dc.relation.page: 43
dc.rights.note: Paid authorization
dc.date.accepted: 2005-07-26
dc.contributor.author-college: 管理學院 (College of Management) [zh_TW]
dc.contributor.author-dept: 資訊管理學研究所 (Graduate Institute of Information Management) [zh_TW]
Appears in collections: 資訊管理學系 (Department of Information Management)

Files in this item:
File, Size, Format
ntu-94-1.pdf (not authorized for public access), 673.8 kB, Adobe PDF
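
The English abstract above names "closed frequent substrings" as the first thing the algorithm mines. The record does not give the thesis's exact definitions, so the following is only a minimal sketch under stated assumptions: a substring counts as frequent when it occurs in at least `min_support` sequences, and as closed when no longer substring containing it has the same support. It uses brute-force enumeration instead of the suffix tree described in the thesis, and the function names, `min_support`, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def frequent_substrings(sequences, min_support):
    """Map each substring to the set of sequence indices that contain it,
    keeping only substrings found in at least `min_support` sequences.

    Brute-force enumeration (an assumption for illustration); the thesis
    itself relies on a suffix tree instead.
    """
    occurrences = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            for end in range(start + 1, len(seq) + 1):
                occurrences[seq[start:end]].add(idx)
    return {s: ids for s, ids in occurrences.items() if len(ids) >= min_support}


def closed_frequent_substrings(sequences, min_support):
    """Return {substring: support} for frequent substrings that have no
    longer frequent substring with the same support."""
    frequent = frequent_substrings(sequences, min_support)
    non_closed = set()
    for sub, ids in frequent.items():
        # Dropping the first or last character gives the two substrings that
        # `sub` extends by one character. Each is frequent too, because it
        # occurs wherever `sub` occurs. Equal support means it is not closed.
        for shorter in (sub[:-1], sub[1:]):
            if shorter and len(frequent[shorter]) == len(ids):
                non_closed.add(shorter)
    return {s: len(ids) for s, ids in frequent.items() if s not in non_closed}


if __name__ == "__main__":
    toy_sequences = ["ACDEF", "XCDEY", "CDQEF"]  # hypothetical toy data
    print(closed_frequent_substrings(toy_sequences, min_support=2))
    # prints {'CD': 3, 'CDE': 2, 'E': 3, 'EF': 2}
```

Checking only one-character extensions is enough here: if any longer substring had the same support, the one-character extension lying between the two would have that support as well. Combining closed substrings into maximal patterns and adjusting gaps, the later stages named in the abstract, are not sketched because the record gives no detail on them.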
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.