請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48322
完整後設資料紀錄
DC 欄位 | 值 | 語言 |
---|---|---|
dc.contributor.advisor | 趙坤茂 | |
dc.contributor.author | Kuan-Yu Chen | en |
dc.contributor.author | 陳冠宇 | zh_TW |
dc.date.accessioned | 2021-06-15T06:52:30Z | - |
dc.date.available | 2012-02-20 | |
dc.date.copyright | 2011-02-20 | |
dc.date.issued | 2011 | |
dc.date.submitted | 2011-02-14 | |
dc.identifier.citation | [1] Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039-1051, 1987.
[2] Alok Aggarwal and James K. Park. Notes on searching in multidimensional monotone arrays (preliminary version). In FOCS, pages 497-512, 1988. [3] Amihood Amir and Gary Benson. Efficient two-dimensional compressed matching. In Data Compression Conference, pages 279-288, 1992. [4] Amihood Amir, Gary Benson, and Martin Farach. Let sleeping files lie: Pattern matching in z-compressed ‾les. J. Comput. Syst. Sci., 52(2):299-307, 1996. [5] Amihood Amir, Gad M. Landau, and Dina Sokol. Inplace run-length 2d compressed search. Theor. Comput. Sci., 290(3):1361-1383, 2003. [6] Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004. [7] Hsing-Yen Ann, Chang-Biau Yang, Chiou-Ting Tseng, and Chiou-Yi Hor. A fast and simple algorithm for computing the longest common subsequence of run-length encoded strings. Inf. Process. Lett., 108(6):360-364, 2008. [8] Hsing-Yen Ann, Chang-Biau Yang, Chiou-Ting Tseng, and Chiou-Yi Hor. Fast algorithms for computing the constrained lcs of run-length encoded strings. In BIOCOMP, pages 646-649, 2009. [9] Alberto Apostolico, Gad M. Landau, and Steven Skiena. Matching for run-length encoded strings. J. Complexity, 15(1):4-16, 1999. [10] Ora Arbell, Gad M. Landau, and Joseph S. B. Mitchell. Edit distance of run-length encoded strings. Inf. Process. Lett., 83(6):307-314, 2002. [11] Ilya Baran, Erik D. Demaine, and Mihai Patrascu. Subquadratic algorithms for 3sum. Algorithmica, 50(4):584-596, 2008. [12] Gill Barequet and Sariel Har-Peled. Polygon containment and translational min-hausdorff-distance between segment sets are 3sum-hard. Int. J. Comput. Geometry Appl., 11(4):465-474, 2001. [13] Jon Louis Bentley. Solutions to klee's rectangle problems. Carnegie-Mellon University, 1977. [14] Horst Bunke and János Csirik. An improved algorithm for computing the edit distance of run-length coded strings. Inf. Process. Lett., 54(2):93-96, 1995. [15] Kuan-Yu Chen and Kun-Mao Chao. Optimal algorithms for locating the longest and shortest segments satisfying a sum or an average constraint. Inf. Process. Lett., 96(6):197-201, 2005. [16] Kuan-Yu Chen and Kun-Mao Chao. A fully compressed algorithm for computing the edit distance of run-length encoded strings. In ESA (1), pages 415-426, 2010. [17] Peter Clifford and Raphaël Clifford. Simple deterministic wildcard matching. Inf. Process. Lett., 101(2):53-54, 2007. [18] Raphaël Clifford, Manolis Christodoulakis, Tim Crawford, David Meredith, and Geraint A. Wiggins. A fast, randomised, maximal subset matching algorithm for document-level music retrieval. In ISMIR, pages 150-155, 2006. [19] Raphaël Clifford, Klim Efremenko, Ely Porat, and Amir Rothschild. From coding theory to efficient pattern matching. In SODA, pages 778-784, 2009. [20] Maxime Crochemore, Gad M. Landau, and Michal Ziv-Ukelson. A subquadratic sequence alignment algorithm for unrestricted scoring matrices. SIAM J. Comput., 32(6):1654-1673, 2003. [21] Tali Eilam-Tsoreff and Uzi Vishkin. Matching patterns in a string subject to multilinear transformations. Theor. Comput. Sci., 60(3):231-254, 1988. [22] Martin Farach and Mikkel Thorup. String matching in lempel-ziv compressed strings. Algorithmica, 20(4):388-404, 1998. [23] Martin Farach-Colton, Paolo Ferragina, and S. Muthukrishnan. On the sorting-complexity of suffix tree construction. J. ACM, 47(6):987-1011, 2000. [24] Anka Gajentaan and Mark H. Overmars. On a class of o(n2) problems in computational geometry. Comput. Geom., 5:165-185, 1995. [25] H. Gajewska and Robert Endre Tarjan. Deques with heap order. Inf. Process. Lett., 22(4):197-200, 1986. [26] Leszek Gasieniec, Marek Karpinski, Wojciech Plandowski, and Wojciech Rytter. Efficient algorithms for lempel-zip encoding (extended abstract). In SWAT, pages 392-403, 1996. [27] Leszek Gasieniec and Wojciech Rytter. Almost optimal fully lzw-compressed pattern matching. In Data Compression Conference, pages 316-325, 1999. [28] Dov Harel and Robert Endre Tarjan. Fast algorithms for finding nearest common ancestors. SIAM J. Comput., 13(2):338-355, 1984. [29] Danny Hermelin, Gad M. Landau, Shir Landau, and Oren Weimann. A unified algorithm for accelerating edit-distance computation via text-compression. In STACS, pages 529-540, 2009. [30] Antonio Hernández-Barrera. Finding an o(n2 log n) algorithm is sometimes hard. In CCCG, pages 289-294, 1996. [31] Masahiro Hirao, Ayumi Shinohara, Masayuki Takeda, and Setsuo Arikawa. Fully compressed pattern matching algorithm for balanced straight-line programs. In SPIRE, pages 132-138, 2000. [32] Daniel S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Commun. ACM, 18(6):341-343, 1975. [33] Ping-Hui Hsu, Kuan-Yu Chen, and Kun-Mao Chao. Finding all approximate gapped palindromes. Int. J. Found. Comput. Sci., 21(6):925-939, 2010. [34] Guan-Shieng Huang, Jia Jie Liu, and Yue-Li Wang. Sequence alignment algorithms for run-length-encoded strings. In COCOON, pages 319-330, 2008. [35] Roy Hunter and A. Harry Robinson. International digital facsimile coding standards. In Proceeding of the IEEE, pages 854-867, 1980. [36] Marek Karpinski, Wojciech Rytter, and Ayumi Shinohara. An efficient pattern-matching algorithm for strings with short descriptions. Nord. J. Comput., 4(2):172-186, 1997. [37] Jin Wook Kim, Amihood Amir, Gad M. Landau, and Kunsoo Park. Computing similarity of run-length encoded strings with affine gap penalty. Theor. Comput. Sci., 395(2-3):268-282, 2008. [38] Jia Jie Liu. Distinct squares in run-length encoded strings. Theor. Comput. Sci., 411:4235-4241, 2010. [39] Jia Jie Liu, Guan-Shieng Huang, and Yue-Li Wang. A fast algorithm for finding the positions of all squares in a run-length encoded string. Theor. Comput. Sci., 410:3942-3948, 2009. [40] Jia Jie Liu, Guan-Shieng Huang, Yue-Li Wang, and Richard C. T. Lee. Edit distance for a run-length-encoded string and an uncompressed string. Inf. Process. Lett., 105(1):12-16, 2007. [41] Jia Jie Liu, Yue-Li Wang, and Richard C. T. Lee. Finding a longest common subsequence between a run-length-encoded string and an uncompressed string. J. Complexity, 24(2):173-184, 2008. [42] George S. Lueker. A data structure for orthogonal range queries. In FOCS, pages 28-34, 1978. [43] Veli Mäkinen. Parameterized Approximate String Matching and Local-Similarity-Based Point-Pattern Matching. PhD thesis, University of Helsinki, 2003. [44] Veli Mäkinen, Esko Ukkonen, and Gonzalo Navarro. Approximate matching of run-length compressed strings. Algorithmica, 35(4):347-369, 2003. [45] Marc van Kreveld Mark Overmars Mark de Berg, Otfried Cheong. Computational Geometry: Algorithms and Applications. Springer-Verlag, 2008. [46] William J. Masek and Mike Paterson. A faster algorithm computing string edit distances. J. Comput. Syst. Sci., 20(1):18-31, 1980. [47] Wataru Matsubara, Shunsuke Inenaga, Akira Ishino, Ayumi Shinohara, Tomoyuki Nakamura, and Kazuo Hashimoto. Efficient algorithms to compute compressed longest common substrings and compressed palindromes. Theor. Comput. Sci., 410(8-10):900-913, 2009. [48] Joseph S. B. Mitchell. A geometric shortest path problem, with application to computing a longest common subsequence in run-length encoded strings. SUNY Stony Brook, 1997. [49] Wojciech Rytter. Algorithms on compressed strings and arrays. In SOFSEM, pages 48-65, 1999. [50] Jeanette P. Schmidt. All highest scoring paths in weighted grid graphs and their application to finding all approximate repeats in strings. SIAM J. Comput., 27(4):972-992, 1998. [51] Armita Sheari, Mehdi Kargar, Ali Katanforoush, Shahriar Arab, Mehdi Sadeghi, Hamid Pezeshk, Changiz Eslahchi, and Sayed-Amir Marashi. A tale of two symmetrical tails: Structural and functional characteristics of palindromes in proteins. BMC Bioinformatics, 9, 2008. [52] Esko Ukkonen. Finding approximate patterns in strings. J. Algorithms, 6(1):132-137, 1985. [53] Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168-173, 1974. [54] P. S. P. Wang. Handbook of Optical Character Recognition and Document Image Analysis. World Scientific Publishing Company, 1997. [55] Dan E. Willard. Predicate-Oriented Database Search Algorithms. Outstanding Dissertations in the Computer Sciences. Garland Publishing, New York, 1978. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48322 | - |
dc.description.abstract | 在字串學的研究中,一個最新的趨勢結合了字串搜尋與資料壓縮這兩種概念,創造了所謂「壓縮字串搜尋問題」。在此研究課題中,最終的目標是希望能設計出不必解碼的字串比對演算法。此論文中所探討的壓縮方式稱為「區段長度編碼」,儘管這個編碼方式十分簡易,唯一正面的結果為計算區段長度編碼字串的「插入、刪除距離」,所需花費的時間為O(mn log mn),其中m和n分別代表輸入字串的區段數。此論文中,我們探討了包含如何比較兩區段長度編碼字串與尋找特徵片段等問題,所提出的演算法皆不必經過解碼步驟,其中,我們更提出了第一個不必經過解碼就能計算區段長度編碼字串「編輯距離」的演算法。論文中更對所討論的問題建立其時間複雜度的下界,分別由「三總和問題」或「排序成對和問題」當成其下界,這說明了所討論的問題具有Ω(mn)和Ω(mn logm)等猜測的時間複雜度下界。我們相信本論文的結果有助於設計出不必經過解碼的區段長度編碼字串的序列比對演算法。 | zh_TW |
dc.description.abstract | A recent trend in stringology combines the two concepts of pattern matching and data compression, creating the so-called compressed pattern matching problem. The ultimate goal in this line of investigation is to design algorithms that can cope with encoded strings without resorting to any decoding step. The underlying compression scheme considered in this dissertation is called run-length encoding. Despite its simple coding nature, the only positive result before this work is the computation of indel-distance (dual of longest common subsequence) of two run-length encoded strings, achieving O(mnlog mn) time, where m and n are the number of runs of the input strings. Both comparing and identifying featured patterns in run-length encoded strings are explored in this dissertation. All the presented algorithms contain no decoding step, one of which is the firs-known algorithm that computes the Levenshtein distance of two run-length encoded strings without decoding. Several lower bounds are established in the dissertation, by reduction from either the 3sum problem or the problem of sorting pairwise-sums, implying O­mega(mn) and Omega­(mnlogm) conjectured time bounds, respectively. We believe that the work accomplished in this dissertation shed some light on solutions to aligning two run-length encoded strings without decoding. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T06:52:30Z (GMT). No. of bitstreams: 1 ntu-100-D94922004-1.pdf: 627684 bytes, checksum: 015ab11945371aa8f23934c0c4494647 (MD5) Previous issue date: 2011 | en |
dc.description.tableofcontents | 1 Introduction 2
1.1 Historical Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Our Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.1 Compressed Pattern Matching . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.2 Edit Distance Computation . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2.3 Palindrome Identi‾cation . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Manuscript Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 On the Hardness of Comparing Two Run-Length Encoded Strings 6 2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1.2 Related Work and Our Results . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.3 The 3SUM Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Lower Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2.1 The Wildcard Matching Problem . . . . . . . . . . . . . . . . . . . . . . 9 2.2.2 The k-Mismatch Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3 An Upper Bound of the k-Mismatch with Wildcards Problem . . . . . . . . . . 12 2.3.1 A Plane-Sweep Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3.2 The Function Problems and Sorting Pairwise Sums . . . . . . . . . . . . 19 2.3.3 Two-Dimensional Matching . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3 Algorithm for Computing the Edit Distance of Two Run-Length Encoded Strings without Decoding 24 3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.1.1 A Propagation-Based Approach . . . . . . . . . . . . . . . . . . . . . . . 24 3.1.2 Problem Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.2 A Geometric View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.2.1 A Succinct Representation of the Border Values . . . . . . . . . . . . . . 28 3.2.2 Propagation of Turning Points . . . . . . . . . . . . . . . . . . . . . . . . 29 3.3 A Linear-Time Algorithm for the Continuous Sliding-Window Minima Problem . 31 3.3.1 The Auxiliary Trajectory . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.3.2 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.3.3 Properties of the Output Trajectory . . . . . . . . . . . . . . . . . . . . . 37 3.4 Time Complexity of the Edit Distance Problem . . . . . . . . . . . . . . . . . . 39 3.5 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4 Identifying Approximate Palindromes in a Run-Length Encoded String 44 4.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.1.1 Palindrome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.1.2 Notation and Similarity Measure . . . . . . . . . . . . . . . . . . . . . . 45 4.1.3 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.2 An 〈n log n, log n + min{k, n}〉-Time Algorithm . . . . . . . . . . . . . . . . . . 46 4.3 An 〈n2 log n, log2 n〉-Time Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 49 4.3.1 Query Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4.3.2 Answering a max level Query . . . . . . . . . . . . . . . . . . . . . . . . 52 4.3.3 Answering a roof Query . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.3.4 Answering a cost Query . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 5 Future Work 58 | |
dc.language.iso | en | |
dc.title | 不必解碼的區段長度編碼字串比對問題演算法 | zh_TW |
dc.title | Algorithms for Comparing Run-Length Encoded Strings without Decoding | en |
dc.type | Thesis | |
dc.date.schoolyear | 99-1 | |
dc.description.degree | 博士 | |
dc.contributor.oralexamcommittee | 王有禮,楊昌彪,劉邦鋒,傅楸善 | |
dc.subject.keyword | 區段長度,壓縮,字串搜尋,比對,迴文, | zh_TW |
dc.subject.keyword | run length,compression,pattern matching,alignment,palindrome, | en |
dc.relation.page | 65 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2011-02-14 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
顯示於系所單位: | 資訊工程學系 |
文件中的檔案:
檔案 | 大小 | 格式 | |
---|---|---|---|
ntu-100-1.pdf 目前未授權公開取用 | 612.97 kB | Adobe PDF |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。