Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/62631
Title: | 應用支持向量機於植物核醣核酸對微型核醣核酸目標基因預測 (Plant MicroRNA-mRNA Target Prediction Using Support Vector Machine) |
Author: | Shu-Yu Kang (康書語) |
Advisor: | Chien-Kang Huang (黃乾綱) |
Keywords: | plant miRNA, target gene prediction method, feature extraction, machine learning |
Year of publication: | 2013 |
Degree: | Master's |
Abstract: | Plant microRNAs (miRNAs) are small non-coding RNAs, typically 19-22 nucleotides long. They regulate gene expression by suppressing the translation of target genes into protein or by cleaving the target transcripts, and thereby affect many important biological processes. Because verifying a target gene experimentally is costly and time-consuming, algorithms that can predict target genes effectively have become an important research topic. Most existing prediction tools base their criteria on six categories of features that are widely accepted as important in miRNA-mRNA interactions: complementarity, thermodynamic stability of the duplex, site accessibility, evolutionary conservation, site location, and multiplicity of binding sites.
This study builds on these six categories, considers the features within each category as comprehensively as possible together with newly proposed features, and uses a Support Vector Machine (SVM) classifier to predict plant miRNA target genes. Feature selection with the RELIEF-F method is then used to assess the importance of each feature. In an independent test on Arabidopsis thaliana, the proposed method outperformed the other existing prediction methods, achieving a precision of 100%, accuracy of 97.8%, sensitivity of 97.1%, and specificity of 100%. According to the RELIEF-F scores, the minimum free energy (MFE) of the miRNA-mRNA duplex is the most important feature, followed by the dinucleotide composition (bigram) and the trinucleotide composition (trigram), the latter newly introduced in this study. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/62631 |
Full-text license: | Paid license |
Appears in collections: | Department of Engineering Science and Ocean Engineering |
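The abstract names bigram and trigram nucleotide composition among the most important features but does not say how they are computed. The sketch below is one common reading, assumed here rather than taken from the thesis: the normalized frequency of every overlapping dinucleotide and trinucleotide over the alphabet A, C, G, U. The function names `kmer_composition` and `composition_features` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA input is converted by mapping T to U


def kmer_composition(seq, k):
    """Return normalized frequencies of all 4**k k-mers, in lexicographic order.

    Assumption: overlapping windows, and windows containing characters
    outside ACGU (for example N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Guard against sequences shorter than k (total == 0).
    return [counts[m] / total if total else 0.0 for m in kmers]


def composition_features(seq):
    """Concatenate bigram (16 values) and trigram (64 values) composition."""
    return kmer_composition(seq, 2) + kmer_composition(seq, 3)


if __name__ == "__main__":
    features = composition_features("ACGUACGU")  # toy sequence, not real data
    print(len(features))
```

A vector like this could be concatenated with the other feature categories (for example duplex MFE) before being passed to an SVM classifier; how the thesis actually combines them is not stated in this record.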
Files in this item:
File | Size | Format | |
---|---|---|---|
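ntu-102-1.pdf (not currently authorized for public access) | 1.21 MB | Adobe PDF |
Unless their copyright terms are specifically stated otherwise, all items in this system are protected by copyright, with all rights reserved.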