Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16127
Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 曾宇鳳
dc.contributor.author: Ze-Hao Lin (en)
dc.contributor.author: 林澤豪 (zh_TW)
dc.date.accessioned: 2021-06-07T18:02:04Z
dc.date.copyright: 2012-08-07
dc.date.issued: 2012
dc.date.submitted: 2012-08-03
dc.identifier.citation:
1. Hertzberg, R. P.; Pope, A. J., High-throughput screening: new technology for the 21st century. Current Opinion in Chemical Biology 2000, 4, (4), 445-451.
2. Geurts, P.; Fillet, M.; de Seny, D.; Meuwis, M.-A.; Malaise, M.; Merville, M.-P.; Wehenkel, L., Proteomic mass spectra classification using decision tree based ensemble methods. Bioinformatics 21, (14), 3138-3145.
3. Nayak, G. S.; Kamath, S.; Pai, K. M.; Sarkar, A.; Ray, S.; Kurien, J.; D'Almeida, L.; Krishnanand, B. R.; Santhosh, C.; Kartha, V. B.; Mahato, K. K., Principal component analysis and artificial neural network analysis of oral tissue fluorescence spectra: Classification of normal premalignant and malignant pathological conditions. Biopolymers 2006, 82, (2), 152-166.
4. Li, Q.; Wang, Y.; Bryant, S. H., A novel method for mining highly imbalanced high-throughput screening data in PubChem. Bioinformatics 2009, 25, (24), 3310-3316.
5. Dawson, W. R.; Windsor, M. W., Fluorescence yields of aromatic compounds. The Journal of Physical Chemistry 1968, 72, (9), 3251-3260.
6. Kinosita Jr, K.; Kawato, S.; Ikegami, A., A theory of fluorescence polarization decay in membranes. Biophysical Journal 1977, 20, (3), 289-305.
7. Simeonov, A.; Jadhav, A.; Thomas, C. J.; Wang, Y.; Huang, R.; Southall, N. T.; Shinn, P.; Smith, J.; Austin, C. P.; Auld, D. S.; Inglese, J., Fluorescence Spectroscopic Profiling of Compound Libraries. Journal of Medicinal Chemistry 2008, 51, (8), 2363-2371.
8. Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J. C.; Sheridan, R. P.; Feuston, B. P., Random Forest:  A Classification and Regression Tool for Compound Classification and QSAR Modeling. Journal of Chemical Information and Computer Sciences 2003, 43, (6), 1947-1958.
9. Tong, W.; Hong, H.; Fang, H.; Xie, Q.; Perkins, R., Decision Forest:  Combining the Predictions of Multiple Independent Decision Tree Models. Journal of Chemical Information and Computer Sciences 2003, 43, (2), 525-531.
10. Yap, C. W., PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. Journal of Computational Chemistry 2011, 32, (7), 1466-1474.
11. Valeur, B., Molecular Fluorescence: Principles and Applications. Wiley-VCH: 2001.
12. O’Boyle, N.; Banck, M.; James, C.; Morley, C.; Vandermeersch, T.; Hutchison, G., Open Babel: An open chemical toolbox. Journal of Cheminformatics 2011, 3, (1), 1-14.
13. Lagunin, A. A.; Zakharov, A. V.; Filimonov, D. A.; Poroikov, V. V., A new approach to QSAR modelling of acute toxicity. SAR and QSAR in Environmental Research 2007, 18, (3-4), 285-298.
14. Guha, R.; Schürer, S., Utilizing high throughput screening data for predictive toxicology models: protocols and application to MLSCN assays. Journal of Computer-Aided Molecular Design 2008, 22, (6), 367-384.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16127
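dc.description.abstract (zh_TW, translated): In recent years, machine learning and high-throughput screening have been widely applied in biomedical fields, so building a good predictive model suited to the research goal is an important problem. In this thesis, I use the results of fluorescence measurements on small molecules to build models of whether a small molecule absorbs or emits in specific wavelength bands, together with fluorescence filtering rules, and use them for prediction.
Fluorescence measurement is commonly used in high-throughput screening (HTS) assays of small-molecule to small-molecule or protein to protein interactions. It has long been known, however, that whether a small molecule itself fluoresces strongly affects the results of the analysis. The primary goal of this thesis is therefore to correctly predict, from high-throughput screening data, whether a small molecule is autofluorescent.
In this thesis, I use HTS data from the PubChem BioAssay database in which small molecules absorb energy at a specific wavelength and then emit at a specific wavelength, and build numerical prediction models of fluorescence. HTS data characteristically contain few active (autofluorescent) molecules and many inactive ones, and the data are widely spread and noisy, so selecting useful molecules with effective filtering criteria is also key. In this study I use PubChem Fingerprints, taking traditional one- and two-dimensional chemical structure properties as the features of the molecular fluorescence data. First, I took a total of 132371 molecules from the PubChem BioAssay database and built five models for different fluorescence wavelength bands. I then applied different filtering criteria to try to select useful molecules from the large and heterogeneous HTS data, and used a modified random forest to build high-accuracy classification models. A further 65419 molecules from the PubChem BioAssay database served as additional test data, to test whether a model has good predictive ability on a different data set.
I have also built these models into a small server, so that users who want to test their own small molecules can use my fluorescence classification models to predict whether a small molecule fluoresces.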
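One common way to cope with the class imbalance described in this abstract (few fluorescent molecules, many non-fluorescent ones) is to train each tree of an ensemble on a class-balanced bootstrap sample. The sketch below illustrates only that sampling step. It is an assumption for illustration and not necessarily the "modified random forest" used in the thesis, which the record does not specify; the function name `balanced_bootstrap` and the toy labels are hypothetical.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return indices for one class-balanced bootstrap sample.

    Draws, with replacement, the same number of active (1) and
    inactive (0) examples, so a tree trained on the sample would
    see a 1:1 class ratio. Hypothetical helper for illustration.
    """
    active = [i for i, y in enumerate(labels) if y == 1]
    inactive = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(active), len(inactive))
    sample = [rng.choice(active) for _ in range(n)]
    sample += [rng.choice(inactive) for _ in range(n)]
    return sample


# Hypothetical toy labels: 5 fluorescent (1) among 100 compounds.
labels = [1] * 5 + [0] * 95
rng = random.Random(0)
idx = balanced_bootstrap(labels, rng)
counts = Counter(labels[i] for i in idx)
print(counts[1], counts[0])
```

Each call yields a different sample, so an ensemble built this way would still see most of the inactive compounds across its trees while no single tree is dominated by them.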
dc.description.abstract (en): In recent years, high-throughput screening (HTS) has been widely used to screen potentially active compounds in the drug discovery process. Most HTS assays are based on fluorescence detection, and false-positive results are often caused by compounds that are themselves fluorescent. To avoid these false positives, it would help to identify such compounds and eliminate them before spending effort and money on a screen. Fluorescent molecules are known to have certain structural features, but predicting fluorescence purely from chemical structure is a challenging task.
In this thesis, we adopted five sets of high-throughput screening data from the PubChem BioAssay database in which small molecules absorb energy at a specific wavelength and then emit at a specific wavelength. These assays typically have a highly imbalanced ratio of fluorescent to non-fluorescent compounds. Constructing general rules and high-quality predictive models is therefore key to a good fluorescence predictor. We used PubChem Fingerprints, which encode 1D and 2D chemical substructure features, as descriptors. First, five models for different wavelength bands were constructed independently from 132371 compounds in the PubChem BioAssay database. Filters based on known chemical knowledge were applied to focus on relevant compounds in the HTS data. A total of 65419 compounds were used as testing data. Finally, a web server for predicting fluorescent molecules was established to help identify fluorescent compounds.
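The imbalance between fluorescent and non-fluorescent compounds noted in the abstract is why plain accuracy can mislead when judging such a classifier. The sketch below compares accuracy with balanced accuracy (the mean of sensitivity and specificity) on toy data. The metric choice, the function names, and the data are assumptions for illustration; the record does not state which evaluation measures the thesis uses.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = fluorescent)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; a class with no examples scores 0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 5 fluorescent compounds among 100, and a
# trivial predictor that labels every compound non-fluorescent.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(accuracy, balanced)
```

The trivial predictor looks strong on accuracy but is no better than chance on balanced accuracy, because it never identifies a fluorescent compound.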
dc.description.provenance (en): Made available in DSpace on 2021-06-07T18:02:04Z (GMT). No. of bitstreams: 1
ntu-101-R99945020-1.pdf: 6441134 bytes, checksum: 22fad3e4e9e8b686f58344f6f48bdf40 (MD5)
Previous issue date: 2012
dc.description.tableofcontents:
Oral Examination Committee Certification
ACKNOWLEDGEMENTS
Chinese Abstract
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Molecular Fluorescence in High Throughput Screening
1.2 Principle of Molecular Fluorescence
Chapter 2 Data
2.1 General Measurement of Data "2.2", "2.3", "2.4", "2.5", and "2.6"
2.2 PubChem BioAssay Database AID590: qHTS Assay for Spectroscopic Profiling in A350 Spectral Region
2.3 PubChem BioAssay Database AID923: qHTS Assay for Spectroscopic Profiling in AFC Spectral Region
2.4 PubChem BioAssay Database AID591: qHTS Assay for Spectroscopic Profiling in A488 Spectral Region
2.5 PubChem BioAssay Database AID594: qHTS Assay for Spectroscopic Profiling in Rhodamine Spectral Region
2.6 PubChem BioAssay Database AID587: qHTS Assay for Spectroscopic Profiling in Texas Red Spectral Region
Chapter 3 Method
3.1 Random Forest
3.2 PubChem Fingerprints
3.3 General Molecular Fluorescence Filter based on Common Chemistry Knowledge
3.4 Consensus Voting
3.5 Model Evaluation
3.6 Implementation of Molecular Fluorescence Predictor by Java Web Server
Chapter 4 Results and Discussion
4.1 Prediction Models of PubChem BioAssay Database AID590
4.1.1 Prediction Models with Raw Data
4.1.2 Prediction Models by Molecular Fluorescence Filter
4.1.3 Example of Molecular Fluorescence Filter Models
4.2 Prediction Models of PubChem BioAssay Database AID923
4.2.1 Building Model by Molecular Fluorescence Filter
4.2.2 Example of Molecular Fluorescence Filter Models
4.3 Prediction Models of PubChem BioAssay Database AID591
4.3.1 Building Model by Molecular Fluorescence Filter
4.3.2 Example of Molecular Fluorescence Filter Models
4.4 Prediction Models of PubChem BioAssay Database AID594
4.4.1 Building Model by Molecular Fluorescence Filter
4.4.2 Example of Molecular Fluorescence Filter Models
4.5 Prediction Models of PubChem BioAssay Database AID587
4.5.1 Building Model by Molecular Fluorescence Filter
4.5.2 Example of Molecular Fluorescence Filter Models
4.6 Example of Implementation of Molecular Fluorescence Predictor
Chapter 5 Conclusion
BIBLIOGRAPHY
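dc.language.iso: zh-TW
dc.title: 分子螢光分類模型 (zh_TW)
dc.title: Computational Classification Molecular Fluorescence Models (en)
dc.type: Thesis
dc.date.schoolyear: 100-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 歐陽明, 陳俊良, 林軒田
dc.subject.keyword: High-throughput screening, machine learning, random forest (zh_TW, translated)
dc.subject.keyword: High-throughput Screening, Machine Learning, Random Forest (en)
dc.relation.page: 136
dc.rights.note: Not authorized
dc.date.accepted: 2012-08-03
dc.contributor.author-college: College of Electrical Engineering and Computer Science (zh_TW, translated)
dc.contributor.author-dept: Graduate Institute of Biomedical Electronics and Bioinformatics (zh_TW, translated)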
Appears in Collections: Graduate Institute of Biomedical Electronics and Bioinformatics

Files in This Item:
File: ntu-101-1.pdf (currently not authorized for public access); Size: 6.29 MB; Format: Adobe PDF