Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57079

Full metadata record

| DC field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 曾宇鳳 (Yufeng Jane Tseng) | |
| dc.contributor.author | Chieh Lin | en |
| dc.contributor.author | 林潔 | zh_TW |
| dc.date.accessioned | 2021-06-16T06:34:22Z | - |
| dc.date.available | 2019-08-14 | |
| dc.date.copyright | 2014-08-14 | |
| dc.date.issued | 2014 | |
| dc.date.submitted | 2014-08-04 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57079 | - |
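| dc.description.abstract | Applying data mining to the polymer science literature is currently very challenging because of the complex ways in which polymers are named. Existing techniques can handle only the structure-based chemical names for polymers defined by IUPAC. In this study, a new system, PolyName2Structure, was developed that converts both IUPAC structure-based polymer names and monomer-based (source-based) polymer names into polymer structures. Structure-based names are processed with the latest version of OPSIN (Open Parser for Systematic IUPAC Nomenclature). For source-based names, the monomers are first parsed from the polymer name and converted into monomer structures via OPSIN and PubChem PUG (Power User Gateway) REST (Representational State Transfer). Prediction models learned from the PoLyInfo database then predict the reaction pathways of the monomers, and finally the polymer structure is generated from the predicted reaction pathways. Along the way, new algorithms were also developed to generate the variables used by the machine learning models, simplify polymer structures, generate all shortest repeating units, simulate the various polymer reactions, and find reactive groups in polymer and monomer structures. To evaluate PolyName2Structure, the Sigma-Aldrich polymer product catalog was adopted as an external test dataset. Nearly all models for predicting reaction pathways reach accuracies above 95%. PolyName2Structure achieves 98.1% accuracy on the training data and 92.1% on the external test data. With such an accurate system, polymer names in journals, textbooks, and patent documents can be converted into structures more reliably, strengthening data mining of polymers. The methods, machine learning variables, and prediction models developed in this study can all be reused and applied to future research in polymer informatics. | zh_TW (English translation) |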
| dc.description.abstract | Current text mining of polymer science documents is highly challenging, mainly because of the complex names used in polymer science. Existing tools can handle only systematic IUPAC structure-based polymer names. In this study, PolyName2Structure, a system that automatically converts both structure-based and source-based polymer names into polymer structures, was developed. Structure-based names are processed using the latest version of OPSIN (Open Parser for Systematic IUPAC Nomenclature). Source-based names are first analyzed to obtain the structures of their monomers using OPSIN and PubChem PUG (Power User Gateway) REST (Representational State Transfer). Prediction models learned from the PoLyInfo dataset then predict the reaction pathways of the monomers, and the predicted pathways are used to convert the monomer structures into a polymer structure. Several algorithms were designed to generate the descriptor sets used by each prediction model, simplify polymer structures, generate all repeating units, simulate polymer reaction types, and find functional groups in given sets of monomer and polymer structures. To validate the performance of PolyName2Structure, the Sigma-Aldrich polymer product catalog was used as an external test dataset. The models that predict polymer reaction pathways perform very well (most above 95% accuracy). PolyName2Structure also achieves 98.1% accuracy on the training dataset and 92.1% on the external test dataset. Given this accuracy, PolyName2Structure can be used to convert polymer names in journal papers, textbooks, patents, and other documents into polymer structures. All the methods, descriptor sets, and models designed in this study can also be reused in future polymer informatics research. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T06:34:22Z (GMT). No. of bitstreams: 1. ntu-103-R02922027-1.pdf: 6682555 bytes, checksum: aec586d86f7ff0887ddb79bb2de21418 (MD5). Previous issue date: 2014 | en |
| dc.description.tableofcontents | ACKNOWLEDGEMENTS; CHINESE ABSTRACT; ABSTRACT; CONTENTS; LIST OF FIGURES; LIST OF TABLES; GLOSSARY; CHAPTER 1 INTRODUCTION; 1.1 BACKGROUND; 1.2 POLYMERS; 1.2.1 Representation of Polymers; 1.3 COMPUTER REPRESENTATIONS OF MOLECULES AND POLYMERS; 1.3.1 MDL Mol File Format; 1.3.2 SMILES File Format; 1.4 POLYMER NAMES; 1.4.1 Typical Names; 1.4.2 Structure-Based Names; 1.4.3 Source-Based Names; 1.5 POLYMER REACTION TYPES; 1.5.1 Addition Reactions; 1.5.2 Ring Opening Reactions; 1.5.3 Condensation Reactions; CHAPTER 2 MATERIALS AND TOOLS; 2.1 POLYINFO DATABASE; 2.1.1 Polymer Data Format and Extraction; 2.1.2 Errors in PoLyInfo Database; 2.2 SIGMA-ALDRICH POLYMER PRODUCTS CATALOG; 2.2.1 The Method to Extract External Dataset from Sigma-Aldrich Polymer Products Catalog; 2.2.2 Name Errors of Sigma-Aldrich Polymer Products Catalog; 2.2.3 The Dictionary of Common Names of Polymer Structures Built from Sigma-Aldrich Polymer Products Catalog; 2.3 OPSIN: OPEN PARSER FOR SYSTEMATIC IUPAC NOMENCLATURE; 2.4 NIH/NCBI PUBCHEM POWER USER GATEWAY (PUG) REPRESENTATIONAL STATE TRANSFER (REST); 2.5 CHEMAXON LIBRARIES; 2.5.1 Data Structure; 2.5.2 Structure Checker; 2.5.3 Canonicalization of SMILES; 2.5.4 Substructure Search; 2.5.5 Chemical Structures Drawing and Viewing; 2.6 C5.0 CLASSIFIER; 2.6.1 Decision Trees or Rules; 2.6.2 Boosting; CHAPTER 3 ALGORITHMS AND IMPLEMENTATIONS; 3.1 POLYNAME2STRUCTURE SYSTEM; 3.1.1 Overview of PolyName2Structure System; 3.1.2 Overview of Handling Source-Based Names; 3.2 DATA CLEANING OF POLYINFO DATABASE; 3.3 ALGORITHMS OF HANDLING POLYMER STRUCTURES; 3.3.1 Finding the Main Chain of a Polymer Structure; 3.3.2 Connecting Multiple Repeating Units of the Same Polymer; 3.3.3 Checking If Two Repeating Units Are the Same; 3.3.4 Generating All Possible Repeating Units of a Polymer; 3.3.5 Simplifying the Polymer Structure; 3.3.6 Splitting the Polymer Structure; 3.4 ALGORITHMS OF HANDLING REACTION TYPES; 3.4.1 Addition Reactions; 3.4.1.1 Double/Triple Bond Opening; 3.4.1.2 Special Cases; 3.4.2 Ring Opening Reactions; 3.4.2.1 Ring Bond Opening; 3.4.2.2 Special Cases; 3.4.3 Condensation Reactions; 3.4.3.1 Functional Groups; 3.4.3.2 Special Cases for Rearrangements without Forming Rings, Ring Opening, etc.; 3.4.3.3 Automatically Finding Functional Groups; 3.4.3.4 Special Cases (with Ring Formation/Opening, Rearrangement etc.); 3.5 MACHINE LEARNING FOR POLYMER NAMES TO STRUCTURES CONVERSION – FROM POLYMER NAMES TO MONOMER NAMES, STRUCTURES AND REACTION TYPES TO GENERATE POLYMER STRUCTURES; 3.5.1 Datasets; 3.5.2 Predicting the Double/Triple Bond Opening of Addition Reaction; 3.5.3 Predicting the Ring Bond Opening of Ring Opening Reaction; 3.5.4 Predicting the Functional Groups Reaction of Condensation Reaction; 3.5.5 Predicting the Type of Reaction Pathway of Monomers; CHAPTER 4 RESULTS; 4.1 MODEL EVALUATION METHOD; 4.2 PERFORMANCE OF MACHINE LEARNING MODELS FOR PREDICTING REACTION PATHWAYS FROM POLYMER NAMES; 4.2.1 Addition Reaction Type with Single Monomer; 4.2.2 Addition Reaction Type with Two Monomers; 4.2.3 Ring Opening Reaction Type with Single Monomer; 4.2.4 Condensation Reaction Type with a Single Monomer; 4.2.5 Condensation Reaction Type with Two Monomers; 4.2.6 Double/Triple Bond Opening in Addition Reactions; 4.2.7 The Ring Bond Opening of Ring Opening Reactions; 4.2.8 The Functional Group Reactions of Condensation Reactions; 4.3 VALIDATION OF POLYNAME2STRUCTURE SYSTEM; 4.3.1 Comparative Performance of Exhaustive Mode and Predictive Mode in PolyName2Structure System; 4.3.2 Examples of Successful Cases and Failed Cases Using the Prediction Models of the PolyName2Structure System; 4.3.2.1 Examples of Successful Cases in Training Set; 4.3.2.2 Examples of Failed Cases in Training Set; 4.3.2.3 Examples of Failed Cases in External Dataset; CHAPTER 5 DISCUSSION; 5.1 NOVELTY OF POLYNAME2STRUCTURE; 5.1.1 Algorithms for Manipulating Polymer Structures and Polymer Reaction Types; 5.1.2 Descriptor Sets Generated from PoLyInfo Database; 5.1.3 Prediction Models of Reaction Pathways; 5.1.4 Applications of PolyName2Structure System; 5.2 LIMITATIONS; 5.2.1 Limitations of Datasets; 5.2.1.1 Patterns Not Present in the Training Set; 5.2.1.2 Errors in the Dataset; 5.2.1.3 The Reaction Condition; 5.2.1.4 Number of Monomers; 5.2.2 Limitations of Problem Definition; 5.2.2.1 Non-Structure-Based or Non-Source-Based Names; 5.2.2.2 Homopolymers and Alternating Copolymers Only; 5.2.3 Limitations due to Other Restrictions; 5.2.3.1 Too Many Hits in Substructure Search; 5.2.3.2 Rearrangement; 5.2.3.3 File Format; 5.2.3.4 Cis/Trans Isomers, Chiral Isomers, Ionic Bonds, and Aromaticity; 5.2.3.5 Condensation Reaction Scheme; 5.3 FUTURE WORK; 5.3.1 Gathering More Open Polymer Data; 5.3.2 Detecting More Functional Groups Involved in Rearrangement or Ring Forming/Opening Reactions; 5.3.3 Handling Other Types of Copolymers and Polymers with End Groups; CHAPTER 6 CONCLUSIONS; REFERENCE; APPENDIX; A. RESUME; B. EXTERNAL DATASET EXTRACTED FROM SIGMA-ALDRICH POLYMER PRODUCT CATALOG; C. DICTIONARY FOR COMMON NAMES OF POLYMER STRUCTURES EXTRACTED FROM SIGMA-ALDRICH POLYMER PRODUCT CATALOG; D. ADDITION: DOUBLE/TRIPLE BOND OPENING DESCRIPTORS (SMILES FORMAT); E. RING OPENING: RING BOND OPENING DESCRIPTORS (SMILES FORMAT); F. CONDENSATION: FUNCTIONAL GROUPS AS DESCRIPTOR SET (SMARTS FORMAT) | |
| dc.language.iso | en | |
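| dc.subject | 預測反應途徑 (predicting reaction pathways) | zh_TW |
| dc.subject | 找反應基團 (finding functional groups) | zh_TW |
| dc.subject | 反應物基準名稱 (source-based names) | zh_TW |
| dc.subject | 資料探勘 (data mining) | zh_TW |
| dc.subject | 名稱轉結構 (name to structure conversion) | zh_TW |
| dc.subject | 聚合物名稱 (polymer name) | zh_TW |
| dc.subject | 結構基準名稱 (structure-based names) | zh_TW |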
| dc.subject | predicting reaction pathways | en |
| dc.subject | finding functional groups | en |
| dc.subject | structure-based names | en |
| dc.subject | source-based names | en |
| dc.subject | text mining | en |
| dc.subject | name to structure conversion | en |
| dc.subject | polymer name | en |
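| dc.title | PolyName2Structure系統 - 利用預測聚合物單體結構的反應途徑,來達成從聚合物名稱轉成結構的目標 (PolyName2Structure System: Converting Polymer Names to Structures by Predicting the Reaction Pathways of Polymer Monomer Structures) | zh_TW |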
| dc.title | PolyName2Structure - A Polymer Name to Structure System of Structure-Based and Source-Based Names by Predicting the Reaction Pathways of Monomers | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 102-2 (ROC academic year 102, second semester; spring 2014) | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 牟中原 (Chung-Yuan Mou), 詹益慈 (Yi-Tsu Chan), 鄭雅如 (Joy Cheng) | |
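| dc.subject.keyword | 聚合物名稱 (polymer name), 名稱轉結構 (name to structure conversion), 資料探勘 (data mining), 找反應基團 (finding functional groups), 預測反應途徑 (predicting reaction pathways), 結構基準名稱 (structure-based names), 反應物基準名稱 (source-based names) | zh_TW |
| dc.subject.keyword | polymer name, name to structure conversion, text mining, source-based names, structure-based names, finding functional groups, predicting reaction pathways | en |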
| dc.relation.page | 244 | |
| dc.rights.note | Licensed for a fee | |
| dc.date.accepted | 2014-08-04 | |
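| dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) | zh_TW |
| Appears in collections: | Department of Computer Science and Information Engineering | |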
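
The abstract says that source-based names are first analyzed to obtain their monomers. The table of contents limits the problem to homopolymers and alternating copolymers. The record does not show how the thesis parses names, so what follows is only a minimal sketch within that scope. It assumes IUPAC-style source-based names such as `polystyrene`, `poly(vinyl chloride)` and `poly(styrene-alt-maleic anhydride)`. The helper `split_source_based_name` is hypothetical. It ignores end groups, other connectives such as `-co-`, and names whose parentheses do not enclose the whole monomer part.

```python
def split_source_based_name(name: str) -> list[str]:
    """Hypothetical helper: return the monomer names in a simple source-based name.

    Supports 'polyX', 'poly(X)' and the alternating-copolymer form 'poly(X-alt-Y)'.
    """
    name = name.strip()
    if not name.lower().startswith("poly"):
        raise ValueError(f"not a source-based poly-name: {name!r}")
    body = name[len("poly"):].strip()
    # Drop one pair of enclosing parentheses: poly(vinyl chloride) -> vinyl chloride.
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1].strip()
    # '-alt-' separates the two monomers of an alternating copolymer.
    monomers = [part.strip() for part in body.split("-alt-")]
    if not all(monomers):
        raise ValueError(f"empty monomer name in {name!r}")
    return monomers
```

Each returned monomer name must still be resolved to a structure. The abstract names OPSIN and PubChem PUG REST for that step.
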
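
The abstract states that monomer structures are obtained with OPSIN and PubChem PUG REST. The record does not show how the thesis issues those requests, so the sketch below is an assumption and covers only the PubChem side. It builds a PUG REST URL of the form `compound/name/<name>/property/<property>/TXT` and reads back a SMILES string. The helper names `pug_rest_smiles_url` and `fetch_smiles` are hypothetical. The default property name `CanonicalSMILES` follows the PUG REST documentation of that period, but property names may have changed since, so check the current PUG REST documentation before relying on it.

```python
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_smiles_url(monomer_name: str, prop: str = "CanonicalSMILES") -> str:
    """Hypothetical helper: URL that looks a compound up by name and returns
    one SMILES property as plain text."""
    # Monomer names often contain spaces or commas ("maleic anhydride",
    # "1,3-butadiene"), so percent-encode the whole name as one path segment.
    return f"{PUG_REST_BASE}/compound/name/{quote(monomer_name, safe='')}/property/{prop}/TXT"


def fetch_smiles(monomer_name: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: query PubChem (requires network access).

    A name can match several compounds. TXT output has one line per match,
    and this sketch keeps only the first line.
    """
    with urlopen(pug_rest_smiles_url(monomer_name), timeout=timeout) as resp:
        return resp.read().decode("utf-8").splitlines()[0].strip()
```

For example, `pug_rest_smiles_url("maleic anhydride")` ends in `/compound/name/maleic%20anhydride/property/CanonicalSMILES/TXT`. Percent-encoding with `safe=''` prevents spaces and commas in IUPAC names from breaking the URL path.
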
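
For addition polymerization, the thesis opens a double or triple bond of the monomer (table of contents, 3.4.1.1) and uses prediction models to decide which bond opens. The sketch below uses a hypothetical minimal molecule graph: a list of element symbols plus a map from atom pairs to bond orders. This is not the ChemAxon `Molecule` class. The code first enumerates carbon-carbon double bonds as candidate sites, which is an assumption about what the "exhaustive mode" in the table of contents might try. It then opens one of those bonds into a repeating unit whose two ends are marked with `*` attachment points. Triple bonds and ring opening are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Mol:
    """Hypothetical minimal molecule graph: element symbols and bond orders."""
    atoms: list[str]
    bonds: dict[frozenset[int], int] = field(default_factory=dict)


def candidate_double_bonds(mol: Mol) -> list[tuple[int, int]]:
    """C=C bonds that an addition reaction could open (C=O and others are skipped)."""
    return sorted(
        tuple(sorted(pair))
        for pair, order in mol.bonds.items()
        if order == 2 and all(mol.atoms[i] == "C" for i in pair)
    )


def open_double_bond(mol: Mol, a: int, b: int) -> Mol:
    """Lower the a=b bond to a single bond and cap each end with a '*' attachment point."""
    key = frozenset((a, b))
    if mol.bonds.get(key) != 2:
        raise ValueError(f"atoms {a} and {b} are not double-bonded")
    bonds = dict(mol.bonds)  # copy so the monomer graph is left unchanged
    bonds[key] = 1
    head, tail = len(mol.atoms), len(mol.atoms) + 1  # indices of the new '*' atoms
    bonds[frozenset((a, head))] = 1
    bonds[frozenset((b, tail))] = 1
    return Mol(mol.atoms + ["*", "*"], bonds)
```

For methyl methacrylate, `CH2=C(CH3)C(=O)OCH3`, the ester C=O is skipped and the vinyl C=C is the only candidate. Opening it gives the -CH2-C(CH3)(COOCH3)- unit with `*` at both ends.
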
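
For condensation reactions, the thesis identifies the reacting functional groups (listed in appendix F in SMARTS format) using substructure search. That matching is not reproduced here. The sketch below only checks the elemental bookkeeping that any generated repeating unit must satisfy. It assumes a linear step-growth reaction that releases one water molecule per new linkage. For an AA + BB pair there are two linkages per repeating unit:

C6H10O4 (adipic acid) + C6H16N2 (hexamethylenediamine) − 2 H2O = C12H22N2O2

This result is the nylon-6,6 repeating unit. The helper names are hypothetical.

```python
from collections import Counter

WATER = Counter({"H": 2, "O": 1})


def condensation_repeat_unit(monomers, eliminated=WATER, linkages=2):
    """Elemental formula of one repeating unit from step-growth condensation.

    Assumes each repeating unit contains `linkages` new bonds and that each
    bond releases one `eliminated` molecule (water by default).
    """
    total = Counter()
    for formula in monomers:
        total.update(formula)          # add the monomer's element counts
    for _ in range(linkages):
        total.subtract(eliminated)     # remove one small molecule per linkage
    if any(n < 0 for n in total.values()):
        raise ValueError("eliminated more atoms than the monomers contain")
    return Counter({el: n for el, n in total.items() if n > 0})


def hill_formula(formula: Counter) -> str:
    """Hill order for carbon-containing formulas: C, H, then other elements A-Z."""
    order = ["C", "H"] + sorted(el for el in formula if el not in ("C", "H"))
    return "".join(
        el + (str(formula[el]) if formula[el] > 1 else "")
        for el in order
        if formula[el] > 0
    )
```

With `linkages=1` the same helper covers a single AB monomer. For example, 6-aminohexanoic acid (C6H13NO2) gives C6H11NO, the nylon-6 repeating unit.
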
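
The abstract mentions generating all repeating units, and the zh_TW abstract specifies all *shortest* repeating units. The table of contents also lists checking whether two repeating units are the same. The thesis works on full molecular graphs with the ChemAxon libraries. The toy version below is an assumption: it reduces a polymer's main chain to a cyclic list of backbone atom labels and ignores side groups and stereochemistry. It proceeds in three steps:

1. Find the shortest period of the chain.
2. List every rotation of that period as a candidate repeating unit.
3. Compare two units by a canonical form that allows the chain to be read in either direction.

All function names are hypothetical.

```python
from collections.abc import Sequence


def shortest_period(chain: Sequence[str]) -> list[str]:
    """Smallest block whose repetition reproduces the backbone sequence."""
    n = len(chain)
    for p in range(1, n + 1):
        if n % p == 0 and list(chain[:p]) * (n // p) == list(chain):
            return list(chain[:p])
    return []  # only reached for an empty chain


def all_repeating_units(chain: Sequence[str]) -> list[list[str]]:
    """Every rotation of the shortest period; each is a valid choice of repeating unit."""
    unit = shortest_period(chain)
    # A shortest (primitive) period has exactly len(unit) distinct rotations.
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def canonical_unit(chain: Sequence[str]) -> tuple[str, ...]:
    """Lexicographically smallest rotation, reading the chain in either direction."""
    unit = shortest_period(chain)
    candidates = all_repeating_units(unit) + all_repeating_units(unit[::-1])
    return min(tuple(c) for c in candidates) if candidates else ()


def same_repeating_unit(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if both backbones describe the same repeating unit."""
    return canonical_unit(a) == canonical_unit(b)
```

For instance, poly(ethylene oxide) written over two units as `["O", "CH2", "CH2", "O", "CH2", "CH2"]` reduces to the period `["O", "CH2", "CH2"]`. That period has three rotations, and all of them compare equal to `["CH2", "O", "CH2"]`.
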

Files in this item:

| File | Size | Format |
|---|---|---|
| ntu-103-1.pdf (not authorized for public access) | 6.53 MB | Adobe PDF |
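
Unless their copyright terms are specifically stated otherwise, all items in this system are protected by copyright, with all rights reserved.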
