請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/10769
完整後設資料紀錄
DC 欄位 | 值 | 語言 |
---|---|---|
dc.contributor.advisor | 趙坤茂 | |
dc.contributor.author | Yi-Ching Chen | en |
dc.contributor.author | 陳怡靜 | zh_TW |
dc.date.accessioned | 2021-05-20T21:57:09Z | - |
dc.date.available | 2013-07-28 | |
dc.date.available | 2021-05-20T21:57:09Z | - |
dc.date.copyright | 2010-07-28 | |
dc.date.issued | 2010 | |
dc.date.submitted | 2010-07-22 | |
dc.identifier.citation | [1] A.V. Aho, D.S. Hirschberg, and J.D. Ullman. Bounds on the complexity of the longest common subsequence problem. Journal of ACM, 23:1-12, 1976.
[2] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403-410, 1990. [3] S.F. Altschul, T.L. Madden, A.A. SchaRer, J. Zhang, Z. Zhang, W. Miller, and D.J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389-3402, 1997. [4] C.E.R. Alves, N. Cáceres, and S.W. Song. An all-substrings common subsequence algorithm. Discrete Applied Mathematics, 156(7):1025-1035, 2008. [5] A. Amir and G. Benson. Efficient two-dimensional compressed matching. In Proceeding of the 2nd IEEE Data Compression Conference (DCC'92), pp. 279-288, 1992. [6] A. Amir, G, Benson, and M. Farach. An alphabet independent approach to two dimensional pattern matching. SIAM Journal on Computing, 23(2):313-323, 1994. [7] A. Amir, A. Butman, M. Crochemore, G.M. Landau, and M. Schaps. Two-dimensional pattern matching with rotations. In Proceeding of the 14th Annual Symposium on Combinatorial Pattern Matching (CPM'03), pp. 17-31, 2003. [8] A. Amir and E. Chencinski. Faster Two Dimensional Scaled Matching. Algorithmica, 56(2):214-234, 2010. [9] A. Amir, G.M. Landau, and D. Sokol. Inplace run-length 2d compressed search. Theoretical Computer Science, 290(3):1361-1383, 2003. [10] A. Amir, D. Tsur, and O. Kapah. Faster two dimensional pattern matching with rotations. In Proceeding of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM'04), pp. 409-419, 2004. [11] H.Y. Ann, C.B. Yang, C.T. Tseng, and C.Y. Hor. A fast and simple algorithm for computing the longest common subsequence of run-length encoded sequences. Information Processing Letters, 108(6):360-364, 2008. [12] H.Y. Ann, C.B. Yang, C.T. Tseng, and C.Y. Hor. Fast algorithms for computing the constrained LCS of run-length encoded strings. In Proceedings of the 2009 International Conference on Bioinformatics and Computational Biology (BIOCOMP'09), vol. 2, pp. 646-649, 2009. [13] A. Apostolico and C. Guerra. The longest common subsequence problem revisited. Algorithmica, 2:315-336, 1987. [14] A. Apostolico, G.M. Landau, and S. Skiena. Matching for run-length encoded sequences. Journal of Complexity, 15(1):4-16, 1999. [15] O. Arbell, G.M. Landau, and J.S.B. Mitchell. Edit distance of run-length encoded sequences. Information Processing Letters, 83(6):307-314, 2002. [16] V.L. Arlazarov, E.A. Dinic, M.A. Kronrod, and I.A. Faradzev, On economic construction of the transitive closure of a directed graph. Soviet mathematics - Doklady, 11(5):1209-1210, 1970 (in English). [17] A.N. Arslan and A Ö. Eğecioğlu. Algorithms for the constrained longest common subsequence problems. International Journal of Foundations of Computer Science, 16(6):1099-1109, 2005. [18] L. Bergroth, H. Hakonen, and T. Raita. A survey of longest common subsequence algorithms. In Proceedings of the 7th International Symposium on Sequence Processing and Information Retrieval (SPIRE'00), pp. 39-48, 2000. [19] R. Bird. Two dimensional pattern matching. Information Processing Letters, 6(5):168-170, 1977. [20] D. Bodson, K.R. McConnell, and R. Schaphorst. FAX: Digital Facsimile Technology and Applications, Artech House, Norwood, MA, 1989. [21] P. Bonizzoni, G.D. Vedova, R. Dondi, G. Fertin, R. Rizzi, and S. Vialette. Exemplar longest common subsequence. IEEE Transactions on Computational Biology and Bioinformatics, 4(4):535-543, 2007. [22] E.A. Breimer, M.K. Goldberg, and D.T. Lim. A learning algorithm for the longest common subsequence problem. Journal of Experimental Algorithmics, 8(2.1), 2003. [23] G.S. Brodal, K. Kaligosi, I. Katriel, and M. Kutz. Faster algorithms for computing longest common increasing subsequence. In Proceedings of the 17th Annual Symposium on Combinatorial Pattern Matching (CPM'06), pp. 330-341, 2006. [24] H. Bunke and J. Csirik. An improved algorithm for computing the edit distance of run-length coded sequences. Information Processing Letters, 54(2):93-96, 1995. [25] K.M. Chao and L. Zhang. Sequence Comparison: Theory and Methods, Springer, 2009. [26] Y.C. Chen and K.M. Chao. On the generalized constrained longest common subsequence problems. Journal of Combinatorial Optimization, accepted, 2009. [27] K.Y. Chen, P.H. Hsu, and K.M. Chao. Hardness of comparing two run-length encoded strings. Journal of Complexity, accepted, 2010. [28] F.Y.L. Chin, N.L. Ho, T.W. Lam, and P.W.H. Wong. Efficient constrained multiple sequence alignment with performance guarantee. Journal of Bioinformatics and Computational Biology, 3(1):1-18, 2005. [29] F.Y.L. Chin, A.D. Santis, A.L. Ferrara, N.L. Ho, and S.K. Kim. A simple algorithm for the constrained longest common sequence problems. Information Processing Letters, 90:175-179, 2004. [30] Y.S. Chung, C.L. Lu, and C.Y. Tang. Constrained sequence alignment: a general model and the hardness results. Discrete Applied Mathematics, 155:2471-2486, 2007. [31] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction to Algorithms, 2nd edition, MIT Press and McGraw-Hill, 2001. [32] M. Crochemore, G.M. Landau, and M. Ziv-Ukelson. A subquadratic sequence alignment algorithm for unrestricted scoring matrices. SIAM Journal on Computing, 32(6): 1654-1673, 2003. [33] J.J. Fanm and K.Y. Su. The design of efficient algorithms for two-dimensional pattern matching, IEEE Transactions on Knowledge and Data Engineering, 7(2):318-327, 1995. [34] V. Freschi and A. Bogliolo. Longest common subsequence between run-length-encoded sequences: a new algorithm with improved parallelism. Information Processing Letters, 90(4):167-173, 2004. [35] Z. Gotthilf, D. Hermelin, and M. Lewenstein. Constrained LCS: hardness and approximation. In Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM'08), pp. 255-262, 2008. [36] R.I. Greenberg. Fast and simple computation of all longest common subsequences. Technical Report, Department of Mathematical and Computer Sciences, Loyola University, Chicago, 2002. [37] D. Gusfield. Algorithms on Sequences, Trees, and Sequences, Cambridge University Press, 1997. [38] D. He and A.N. Arslan. A space-efficient algorithm for the constrained pairwise sequence alignment problem. Genome Informatics, 16(2):237-246, 2005. [39] D. He and A.N. Arslan. A parallel algorithm for the constrained multiple sequence alignment problem. In Proceedings of the 5th IEEE Symposium on Bioinformatics and Bioengineering (BIBE'05), pp. 258-262, 2005. [40] D. He, A.N. Arslan, and A.C.H. Ling. A fast algorithm for the constrained multiple sequence alignment problem. Acta Cybernetica, 17(4):701-717, 2005. [41] D.S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Communications of ACM, 18:341-343, 1975. [42] D.S. Hirschberg. Algorithms for the longest common subsequence problem. Journal of ACM, 24:664-675, 1977. [43] W.J. Hsu and M.W. Du. New algorithms for the LCS problem. Journal of Computer and System Sciences, 29:133-152, 1984. [44] G.S. Huang, J.J. Liu, and Y.L. Wang. Sequence alignment algorithms for run-length-encoded sequences. In Proceedings of the 14th Annual International Computing and Combinatorics (COCOON'08), pp. 319-330, 2008. [45] J.W. Hunt and T.G. Szymanski. A fast algorithm for computing longest common subsequence. Communications of ACM, 20(5):350-353, 1977. [46] C.S. Iliopoulos and M.S. Rahman. New efficient algorithms for the LCS and constrained LCS problems. Information Processing Letters, 106:13-18, 2008. [47] C.S. Iliopoulos. and M.S. Rahman. A new efficient algorithm for computing the longest common subsequence. Theory of Computing Systems, 355-371, 2008. [48] W. Just. Computational complexity of multiple sequence alignment with SP-score. Journal of Computational Biology, 8(6):615-23, 2001. [49] J. KAarkkAainen and E. Ukkonen. Two and higher dimensional pattern matching in optimal expected time. In Proceedings of the 5th annual ACM-SIAM symposium on Discrete algorithms, pp.715-723, 1994. [50] J.W. Kim, A. Amir, G.M. Landau, and K. Park. Computing similarity of run-length encoded sequences with affine gap penalty. Theoretical Computer Science, 395(2{3):268-282, 2008. [51] G.M. Landau, E.W. Myers, and M. Ziv-Ukelson. Two algorithms for LCS consecutive suffix alignment. Journal of Computer and System Sciences, 73(7):1095-1117, 2007. [52] J.J. Liu, G.S. Huang, Y.L. Wang, and R.C.T. Lee. Edit distance for a run-length-encoded sequence and an uncompressed sequence. Information Processing Letters, 105(1):12-16, 2007. [53] J.J. Liu, Y.L. Wang, and R.C.T. Lee. Finding a longest common subsequence between a run-length-encoded sequence and an uncompressed sequence. Journal of Complexity, 24(2):173-184, 2008. [54] C.L. Lu and Y.P. Huang. A memory-efficient algorithm for multiple sequence alignment with constraints. Bioinformatics, 21:20-30, 2005. [55] D. Maier. The complexity of some problems on subsequences and supersequence. Journal of ACM, 25:322-336, 1978. [56] V. MAakinen, E. Ukkonen, and G. Navarro. Approximate matching of run-length compressed sequences. Algorithmica, 35(4):347-369, 2003. [57] W.J. Masek and M.S. Paterson. A faster algorithm computing string edit distances. Journal of Computer and System Sciences, 20:18-31, 1980. [58] J.S.B. Mitchell. A geometric shortest path problem, with application to computing a longest common subsequence in run-length encoded sequences. Technical Report, Department of Applied Mathematics, SUNY, Stony Brook, NY, 1997. [59] E.W. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1:251-266, 1986. [60] E.W. Myers. A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM, 46(3):395-415, 1999. [61] N. Nakatsu, Y. Kambayashi, and S. Yajima. A longest common subsequence algorithm suitable for similar text strings. Acta Informatica, 18:171-179, 1982. [62] S.B. Needleman and C.D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443-453, 1970. [63] W.R. Pearson and D.J. Lipman. Improved tools for biological sequence comparison. In Proceedings of the National Academy of Sciences of the United States of America, vol. 85, pp. 2444-2448, 1988. [64] Z.S. Peng and H.F. Ting. Time and space efficient algorithms for constrained sequence alignment. In Proceedings of the 9th International Conference on Implementation and Application of Automata (CIAA'04), pp. 237-246, 2004. [65] Y.H. Peng, C.B. Yang, K.S. Huang, and K.T. Tseng. An algorithm and applications to sequence alignment with weighted constraints. International Journal of Foundations of Computer Science, 21(1):51-59, 2010. [66] P.A. Pevzner. Computational Molecular Biology: an Algorithmic Approach, The MIT Press, Cambridge, MA, 2000. [67] C. Rick. Simple and fast linear space computation of longest common subsequence. Information Processing Letters, 75:275-281, 2000. [68] C. Schensted. Longest increasing and decreasing subsequences. Canadian Journal of Mathematics, 13:179-191, 1961. [69] T.F. Smith and M.S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147:195-197, 1981. [70] C.Y. Tang, C.L. Lu, M.D. Chang, Y.T. Tsai, Y.J. Sun, K.M. Chao, J.M. Chang, Y.H. Chiou, C.M. Wu, H.T. Chang, and W.I. Chou. Constrained multiple sequence alignment tool development and its application to RNase family alignment. Journal of Bioinformatics and Computational Biology, 1(2):267-287, 2003. [71] R.E. Tarjan. Data Structure and Network Algorithms, CBMS 44 (Society for Industrial and Applied Mathematics, Philadephia, PA, 1983. [72] Y.T. Tsai. The constrained longest common subsequence problem. Information Processing Letters, 88:173-176, 2003. [73] Y.T. Tsai, Y.P. Huang, C.T. Yu, and C.L. Lu. MuSiC: a tool for multiple sequence alignment with constraints. Bioinformatics, 20:2309-2311, 2004 [74] H.J. Tsai, C.Y. Lin, Y.C. Chung, and C.Y. Tang. An Efficient Parallel Algorithm for Constraint Multiple Sequence Alignment. In Proceedings of the International Computer Symposium (ICS'06), pp. 1261-1266, 2006. [75] P. van Emde Boas. Preserving order in a forest in less than logarithmic time and linear space. Information Processing Letters, 6:80-82, 1977. [76] P. van Emde Boas, R. Kaas, and E. Zijlstra. Design and implementation of an efficient priority queue. Mathematical Systems Theory, 10:99-127, 1977. [77] R.A. Wagner and M.J. Fischer. The sequence-to-string correction problem. Journal of ACM, 21(1):168-173, 1974. [78] L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1:337-348, 1994. [79] R.F. Zhu and T. Takaoka. A technique for two-dimensional pattern matching. Communications of the ACM, 32(9):1110-1120, 1989. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/10769 | - |
dc.description.abstract | 本篇論文探討數個最長共同子序列的變異問題,是由分子生物學和序列比對的實際應用與理論興趣發展而來。
在論文的第一部份,我們研究四個附加條件限制的最長共同子序列問題,目的是在兩條序列之共同子序列中,求得包含或排除一條附加限制字串為子序列或子字串之最長序列。我們研究這些問題的最佳化特性,並針對每個問題提出一個動態規劃演算法。理論分析顯示,我們提出的演算法之時間複雜度與兩條序列及一條附加限制字串的長度乘積成正比。此外,我們也考慮任意多條附加限制字串的情況。 為了使序列的相似度衡量更有彈性,在論文的第二部份,我們研究一個混合附加條件限制的問題,目的是在兩條序列之共同子序列中,求得包含一條附加限制字串為子序列且排除另一條附加限制字串為子序列之最長序列。我們提出一個動態規劃演算法來解決這個問題,演算法所需的時間與兩條序列及兩條附加限制字串的長度乘積成正比。另外,我們提出一個針對兩條序列配對位置做計算的快速演算法。 在論文最後一個部份,我們在最長共同子序列問題與一個附加條件限制的最長共同子序列問題上考慮一個被廣泛使用的資料壓縮技術,稱為區段長度編碼法。為了解決以區段長度編碼的序列之最長共同子序列問題,我們研究序列以區段分割動態規劃矩陣的特性,並利用簡化矩陣內部份計算來求得最長共同子序列的長度。最後,我們設計兩個演算法,在兩條以區段長度編碼的序列之共同子序列中,求得包含一條以區段長度編碼的附加限制字串為子序列之最長序列。 | zh_TW |
dc.description.abstract | This dissertation studies several variants of the longest common subsequence (abbreviated LCS) problem. These variants arise from some applications and theoretical interests in molecular biology and sequence comparison.
In the First part of this dissertation, we study four constrained LCS (abbreviated CLCS) problems, each of which is to find a longest sequence that is a common subsequence of two sequences and either includes or excludes a constrained pattern as a subsequence or substring. We investigate the optimality principles of these problems and then derive a dynamic programming algorithm for each problem. The theoretical analyses show that the time complexity of each algorithm is proportional to the product of the lengths of the given sequences and constrained pattern. We also consider the case where the number of constrained patterns in each problem is arbitrary. To make the similarity measurement of sequences more flexible, in the second part of this dissertation, we study the problem of finding a longest sequence that is a common subsequence of two sequences and not merely includes a constrained pattern as a subsequence but excludes the other constrained pattern as a subsequence. We give a dynamic programming algorithm whose time complexity is proportional to the product of the lengths of the given sequences and constrained patterns. We also present a fast algorithm which restricts the computation on the positions of matches between the sequences. In the last part of this dissertation, we consider a common used data compression scheme called run-length encoding (abbreviated RLE) on the input sequences of the LCS problem and one of the CLCS problems. To solve the LCS problem of two RLE sequences, we investigate the properties of the partition, induced by the runs of two sequences, in the dynamic programming matrix for the LCS problem and exploit the sequences for computing the length of an LCS by utilizing the simplicity of some positions. Finally, we devise two algorithms for the problem of finding a longest sequence that is a common subsequence of two RLE sequences and includes a constrained RLE pattern as a subsequence. | en |
dc.description.provenance | Made available in DSpace on 2021-05-20T21:57:09Z (GMT). No. of bitstreams: 1 ntu-99-D94922010-1.pdf: 697703 bytes, checksum: b2a10cfd2e174e85269b8be58aa2a3f7 (MD5) Previous issue date: 2010 | en |
dc.description.tableofcontents | Acknowledgement i
Abstract in Chinese iii Abstract v List of Figures xii List of Tables xiii 1 Introduction 1 2 Longest Common Subsequences (LCSs) 9 2.1 Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2 Space-Saving Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3 Hunt-Szymanski Strategy . . . . . . . . . . . . . . . . . . . . . . . . 16 2.4 Previous Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3 Constrained LCSs 23 3.1 Related Works on Problem SEQ-IC-LCS . . . . . . . . . . . . . . . . 25 3.2 Problem STR-IC-LCS . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.3 Problem SEQ-EC-LCS . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.4 Problem STR-EC-LCS . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.5 Problem CLCS with an Arbitrary Number of Constrained Patterns . 37 3.5.1 Hardness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.5.2 Exact Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4 Hybrid Constrained LCSs 53 4.1 Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.2 Bounded Heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.3 Speeding up the Computation . . . . . . . . . . . . . . . . . . . . . . 60 4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 5 LCSs of Run-Length Encoded Sequences 67 5.1 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5.2 Sliding-Window Maxima with a Dynamic Window Size . . . . . . . . 74 5.3 An E±cient Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 75 5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 6 Constrained LCSs of Run-Length Encoded Sequences 81 6.1 A Simple Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 6.2 A Faster Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 6.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 7 Concluding Remarks 93 7.1 Summary and Contributions . . . . . . . . . . . . . . . . . . . . . . . 93 7.2 Further Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 Bibliography 101 | |
dc.language.iso | en | |
dc.title | 附加限制條件的最長共同子序列問題之演算法設計 | zh_TW |
dc.title | Efficient Algorithms for the Constrained Longest Common Subsequence Problems | en |
dc.type | Thesis | |
dc.date.schoolyear | 98-2 | |
dc.description.degree | 博士 | |
dc.contributor.oralexamcommittee | 陳俊良,陳健輝,劉邦鋒,曾宇鳳,蔡英德 | |
dc.subject.keyword | 動態規劃,最長共同子序列,附加限制條件之最長共同子序列,區段長度編碼, | zh_TW |
dc.subject.keyword | Dynamic Programming,Longest Common Subsequence,Constrained Longest Common Subsequence,Run-Length Encoding, | en |
dc.relation.page | 111 | |
dc.rights.note | 同意授權(全球公開) | |
dc.date.accepted | 2010-07-22 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
顯示於系所單位: | 資訊工程學系 |
文件中的檔案:
檔案 | 大小 | 格式 | |
---|---|---|---|
ntu-99-1.pdf | 681.35 kB | Adobe PDF | 檢視/開啟 |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。