Breast Cancer Microarray Gene-Expression Data Sets Integration and Analysis

Hsin-Chieh Yao; 姚欣潔

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/41091

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	余政靖(Cheng-Ching Yu)
dc.contributor.author	Hsin-Chieh Yao	en
dc.contributor.author	姚欣潔	zh_TW
dc.date.accessioned	2021-06-14T17:16:21Z	-
dc.date.available	2008-07-30
dc.date.copyright	2008-07-30
dc.date.issued	2008
dc.date.submitted	2008-07-25
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/41091	-
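dc.description.abstract	The purpose of this study is to examine the effectiveness and applicability of subgroup standardization in data integration. One batch of breast cancer patient samples was hybridized using a buffer prepared from a different water source, so the hierarchical clustering dendrogram split into two large groups according to buffer rather than according to the underlying biological characteristics of the patient samples. The experiment was therefore repeated on the identical patient samples using the original buffer, and we ended up with two data sets covering the same 36 breast cancer patients. This experimental heterogeneity complicates the data analysis and even produces incorrect clustering results, but it also provides a replicate data set and an opportunity to find a way to solve the problem. We find that adding a subgroup standardization step after applying well-known and effective normalization methods from the literature removes the buffer-induced bias and achieves data integration. Because the experiments are replicates on the same patients, successful data integration should allow hierarchical clustering to divide the 72 samples from 36 patients into 36 small clusters, one per patient; we use the matching rate as the measure of performance. With subgroup standardization, the matching rate of data normalized with LOWESS increased by 94%, that of Median Rank Scores (MRS) improved by 31%, and even quantile normalization, which was already quite effective, improved by 8%. Validated against three normalization methods from the literature, subgroup standardization improves microarray data integration regardless of which normalization method is used. Next, clinical data from the breast cancer patients are used for further validation: using the same data processed by the different normalization methods, we observe the effect of subgroup standardization on ER-positive/ER-negative classification. We first apply the TSP classifier from the literature, which identifies the important genes, and then perform hierarchical clustering using all gene pairs with a score of 1, validating against the ER-positive/ER-negative classification. The results again show that subgroup standardization works well for ER classification. Having reached these conclusions, we further investigate the best results achievable under three objectives: matching rate, ER hierarchical clustering result, and sensitivity analysis. We use the TSP classifier to pick out all important gene pairs, then use simulated annealing optimization to select the combination of gene pairs for a given number of pairs, and finally plot the results to identify the best normalization method together with the best number of gene pairs. For the matching rate, the best normalization method is LOWESS plus subgroup standardization, and the number of gene pairs must not be too small. To optimize ER classification, the better normalization method is LOWESS combined with standardization. For sensitivity analysis of individual samples, quantile normalization or even the raw data gives better results, and the number of gene pairs must not be too large, otherwise sensitivity drops sharply.	zh_TW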
dc.description.abstract	Clustering failed because a second set of samples was hybridized with a different buffer, so the experiments were repeated on the same samples using the original buffer. We therefore have two sets of gene expression data for the same 36 breast cancer samples. This heterogeneity complicates the data analysis unnecessarily and, worse, produces false groupings in clustering. However, the repetition also provides a stringent test of data treatment methods for removing buffer effects and, eventually, a useful approach to data integration. Subgroup standardization is proposed to compensate for the buffer effect in microarray experiments; it is performed immediately after the normalization step. Given replicate microarray experiments on all 36 samples, the percentage of pair-wise matches among the 36 samples under hierarchical clustering can be used to evaluate different approaches. With subgroup standardization, the matching rate improves by 94%, 31%, and 8% for Lowess, Median Rank Scores (MRS), and quantile normalization, respectively. The proposed subgroup standardization thus enhances data integration for microarray data regardless of the normalization method. The results are validated with replicate experiments on the same samples using different buffers on the same platform. Measured by pair-wise matching from hierarchical clustering, quantile normalization performs better than MRS, and Lowess performs worst; all three improve further with subgroup standardization. Going one step further, we classify ER-positive and ER-negative patient groups using the different normalization methods with and without subgroup standardization. We follow the TSP classifier to choose candidate genes related to ER status and apply simulated annealing to search for the optimal combination of genes according to their scores. We then compare the outcomes using several indicators, namely matching rate, sensitivity, specificity, and ER hierarchical clustering results, on both training and testing data. We find that subgroup standardization helps classify ER-positive and ER-negative patients and also improves the matching rate when combined with hierarchical clustering. It is effective for viewing the group-level behavior of the whole data set. Because its sensitivity is poor, however, it should not be used to examine the behavior of individual samples.	en
dc.description.provenance	Made available in DSpace on 2021-06-14T17:16:21Z (GMT). No. of bitstreams: 1 ntu-97-R95524023-1.pdf: 2131584 bytes, checksum: f5dbc6354854a980e6b354125f2c339b (MD5) Previous issue date: 2008	en
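dc.description.tableofcontents	Contents: Acknowledgements; Abstract (Chinese); Abstract; Contents; List of Figures; List of Tables; 1 Introduction; 1.1 Preface; 1.2 Literature Review; 1.3 Motivation and Objectives; 1.4 Organization of Chapters; 2 Data Pretreatment and Normalization Methods; 2.1 Experimental Procedure; 2.2 Breast Cancer Gene-Expression Data Sets; 2.3 Microarray Data Pretreatment; 2.3.1 Background Correction; 2.3.2 Removal of Poor-Quality Spots; 2.3.3 Log Transformation; 2.3.4 Missing Data Handling; 2.4 Microarray Data Normalization; 2.4.1 Raw Data; 2.4.2 Quantile Normalization; 2.4.3 Median Rank Scores; 2.4.4 Locally Weighted Linear Regression (LOWESS); 2.4.5 Subgroup Standardization; 2.5 Selection of Highly or Lowly Expressed Genes (Expressed Genes Selection); 2.5.1 Filtering; 2.5.2 Hierarchical Clustering; 2.6 Conclusions; 3 Breast Cancer Subtypes Classification; 3.1 Introduction to the Estrogen Receptor; 3.2 Top-Scoring Pairs (TSP) Classifier; 3.2.1 Introduction to the TSP Method; 3.2.2 Experimental Results; 3.2.3 Validation on Testing Data; 3.3 Comparison of Training and Testing Sets Using TSP Gene Selection; 3.3.1 Optimization Principle: Simulated Annealing; 3.3.1.1 Monte Carlo and Simulated Annealing Principles; 3.3.1.2 Introduction to the Simulated Annealing Algorithm; 3.3.1.3 Simulated Annealing Parameter Selection; 3.3.2 Sensitivity and Specificity Analysis; 3.3.3 Judging ER Hierarchical Clustering Results by String Matching; 3.3.4 Optimization Results; 3.4 Conclusions; 4 Conclusions; References; Appendix A; Appendix B; Appendix C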
dc.language.iso	zh-TW
dc.subject	Systems biology	zh_TW
dc.subject	Normalization	zh_TW
dc.subject	Microarray biochip	zh_TW
dc.subject	Bioinformatics	en
dc.subject	Normalization	en
dc.subject	Microarray	en
dc.title	Integration and Analysis of Breast Cancer Gene Microarray Chip Data	zh_TW
dc.title	Breast Cancer Microarray Gene-Expression Data Sets Integration and Analysis	en
dc.type	Thesis
dc.date.schoolyear	96-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	黃孝平(Hsiao-Ping Huang),陳誠亮(Cheng-Liang Chen),阮雪芬(Hsueh-Fen Juan),王逢盛(Feng-Sheng Wang)
dc.subject.keyword	Microarray biochip, Normalization, Systems biology	zh_TW
dc.subject.keyword	Microarray, Normalization, Bioinformatics	en
dc.relation.page	87
dc.rights.note	Paid license
dc.date.accepted	2008-07-28
dc.contributor.author-college	College of Engineering	zh_TW
dc.contributor.author-dept	Graduate Institute of Chemical Engineering	zh_TW
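
The abstract describes subgroup standardization and the pair-wise matching rate only in words. The following is a minimal sketch, not the thesis's implementation. It assumes that subgroup standardization means z-scoring each gene separately within each buffer subgroup, and that a patient counts as "matched" when hierarchical clustering merges that patient's two replicate samples directly with each other. The function names, the synthetic data, and the choice of average linkage with Euclidean distance are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def subgroup_standardize(X, groups):
    """Z-score every gene (row) separately within each subgroup of samples (columns)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(X)
    for g in np.unique(groups):
        cols = groups == g
        sub = X[:, cols]
        mean = sub.mean(axis=1, keepdims=True)
        std = sub.std(axis=1, ddof=1, keepdims=True)
        std[std == 0] = 1.0  # leave constant genes at zero instead of dividing by zero
        out[:, cols] = (sub - mean) / std
    return out


def matching_rate(X, patient_ids):
    """Fraction of patients whose two replicates are merged directly as a leaf pair."""
    patient_ids = np.asarray(patient_ids)
    n = X.shape[1]
    Z = linkage(X.T, method="average", metric="euclidean")  # samples are rows of X.T
    matched = 0
    for a, b, _, _ in Z:
        a, b = int(a), int(b)
        # indices below n are original samples; larger indices are merged clusters
        if a < n and b < n and patient_ids[a] == patient_ids[b]:
            matched += 1
    return matched / len(np.unique(patient_ids))


# Synthetic illustration (hypothetical data): 6 patients, each measured in two buffers.
rng = np.random.default_rng(0)
n_genes, n_patients = 200, 6
profiles = rng.normal(size=(n_genes, n_patients))
buffer_shift = 5.0 * rng.normal(size=(n_genes, 1))  # gene-specific buffer bias
batch_a = profiles + 0.1 * rng.normal(size=profiles.shape)
batch_b = profiles + buffer_shift + 0.1 * rng.normal(size=profiles.shape)

X = np.hstack([batch_a, batch_b])
batch = np.repeat([0, 1], n_patients)
patients = np.tile(np.arange(n_patients), 2)

X_std = subgroup_standardize(X, batch)
rate_before = matching_rate(X, patients)
rate_after = matching_rate(X_std, patients)
print("matching rate before:", rate_before, "after:", rate_after)
```

In this toy setting the buffer shift dominates the raw distances, so replicates cluster by buffer; standardizing within each buffer removes the gene-specific offset and lets replicates of the same patient pair up. Real data would also need the normalization step that the abstract places before subgroup standardization.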
Appears in Collections:	Department of Chemical Engineering
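
The abstract mentions selecting gene pairs with a TSP score of 1. As a hedged illustration, the sketch below computes the usual top-scoring-pairs score for every gene pair: the absolute difference between the two classes in the frequency with which gene i is expressed below gene j. The function name `tsp_scores` and the small expression matrix are hypothetical; the thesis's own data and code are not shown in this record.

```python
import numpy as np


def tsp_scores(X, labels):
    """Return S where S[i, j] = |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|.

    X is genes x samples; labels holds exactly two distinct class values.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if len(classes) != 2:
        raise ValueError("TSP scoring needs exactly two classes")
    probs = []
    for c in classes:
        sub = X[:, labels == c]                    # genes x samples in class c
        less = sub[:, None, :] < sub[None, :, :]   # genes x genes x samples
        probs.append(less.mean(axis=2))            # frequency of X_i < X_j
    return np.abs(probs[0] - probs[1])


# Toy data: 3 genes, 3 ER-negative (0) and 3 ER-positive (1) samples.
X = np.array([
    [1, 2, 1, 9, 8, 9],   # gene A
    [5, 6, 5, 3, 2, 4],   # gene B
    [4, 1, 7, 2, 6, 3],   # gene C
])
labels = np.array([0, 0, 0, 1, 1, 1])

S = tsp_scores(X, labels)
# Pairs (i < j) whose ordering flips completely between the two classes.
top_pairs = [(int(i), int(j)) for i, j in np.argwhere(np.triu(S, k=1) >= 1.0)]
print(top_pairs)
```

Here only genes A and B reverse their ordering in every sample between the two classes, so only that pair reaches a score of 1. The pairwise comparison array grows with the square of the number of genes, so a real microarray would need the genes filtered first or the pairs processed in blocks.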

Files in this item:

File	Size	Format
ntu-97-1.pdf (not authorized for public access)	2.08 MB	Adobe PDF

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.