單核苷酸多型性與單體型相關的最佳化問題之研究

Yao-Ting Huang; 黃耀廷

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34277

標題:	單核苷酸多型性與單體型相關的最佳化問題之研究 A Study on Some Optimization Problems Related to SNPs and Haplotypes
作者:	Yao-Ting Huang 黃耀廷
指導教授:	趙坤茂(Kun-Mao Chao)
關鍵字:	單核&#33527,酸多型性,單體型,基因型,近似演算法, Single Nucleotide Polymorphism,Haplotype,Genotype,Appproximation Algorithm,
出版年 :	2006
學位:	博士
摘要:	本論文探討一系列與單核苷酸多型性與單體型相關之最佳化問題。本論文中大多數探討之問題都被證明為難題。為了能有效率地解決這些難題，我們設計並實作一系列近似演算法。經由理論分析與實驗佐證，我們證明所設計的演算法不僅有良好的執行速度，其所找到的近似解亦可相當逼近最佳解。本論文的第一部份探討如何尋找一組單核苷酸多型性，稱為強固型標籤單核苷酸多型性，可以容許在實驗中遺失部分單核苷酸多型性。我們證明要尋找最少數量的強固型標籤單核苷酸多型性為一難題。為了能有效率的解決此一難題，我們提出了兩種貪婪演算法與一種線性規劃釋限演算法。我們的理論分析與實驗結果顯示這些演算法不僅執行速度很快，更可以找到相當接近最佳解之近似解。本論文的第二部份探討如何利用多標記單體型，尋找一組最少量的標籤單核苷酸多型性。我們將此問題切割成三個子問題，並證明其中兩個子問題為難題。我們提出一些精確與近似演算法來分別解決這三個子問題。實驗結果顯示我們整合這些演算法所開發出的軟體，不僅可比現有的軟體找出更少量的標籤單核苷酸多型性，其執行速度亦有顯著的提升。本論文的第三部份探討在基於最大簡約法則下，從基因型資料中推測出單體型資料。我們將此問題表示為一個整數二次方規劃的問題，然後提出一個基於半正定規劃之反覆式近似演算法來解決此問題。我們的理論分析與實驗結果顯示我們開發出的軟體，不僅可以找出相當逼近最佳解之近似解，更可以比現有的軟體在部分資料下得到較好的推測正確率。本論文的第四部份探討如何在在全基因組關連分析中，選擇一些具有鑑別性的單核苷酸多型性來區分出病例與對照之樣本。我們設計了一個有效率的演算法來尋找具有鑑別性的單核苷酸多型性，並與目前既有之軟體在各種分類器下作比較。我們的實驗結果顯示此演算法在選擇足夠的鑑別性單核苷酸多型性下，可以比其他方法得到較高的鑑別正確率。 This dissertation studies several optimization problems related to SNPs and haplotypes. Most problems studied in this dissertation are shown to be NP-hard. To efficiently solve these problems, we design and implement a series of approximation algorithms. Our theoretical analysis and experimental results indicate that these algorithms are not only efficient but the solutions found by them are also quite close to the optimal solution. In Part I of this dissertation, we show that there exists a set of SNPs called robust tag SNPs which can tolerate missing SNPs in genotyping. The problem of finding a minimum set of robust tag SNPs is shown to be NP-hard. We give two greedy algorithms and one linear programming relaxation algorithm to efficiently solve this problem. Our theoretical analysis and experimental results show that these algorithms not only run very fast but also find nearly-optimal solutions. In Part II of this dissertation, we study the problem of selecting a minimum set of tag SNPs by multimarker haplotypes. This problem is divided into three subproblems, two of which are shown to be NP-hard. Several exact and approximation algorithms are proposed to solve these subproblems. The experimental results indicate that the program developed by integrating these algorithms finds a smaller set of tag SNPs and runs much faster than existing methods. In Part III of this dissertation, we study the problem of haplotype inference by maximum parsimony. We formulate this problem as an integer quadratic programming problem and present an iterative semi-definite programming relaxation based approximation algorithm. Our theoretical analysis and experimental results show that the solution found is not only close to the optimal solution but the accuracy is also improved in comparison with existing methods. In Part IV of this dissertation, we study the problem of selecting discriminative SNPs for classifying cases and controls in genome-wide association studies. We describe an efficient algorithm for identifying discriminative SNPs and compare it with several existing methods using a variety of classifiers. The experimental results indicate that our method consistently obtains better accuracies than other methods when sufficient discriminative SNPs are selected.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34277
全文授權:	有償授權
顯示於系所單位：	資訊工程學系

文件中的檔案：

檔案	大小	格式
ntu-95-1.pdf 未授權公開取用	706.34 kB	Adobe PDF

顯示文件完整紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料（如：文字、圖片、PDF）並使其易於取用。