Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93077
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 蔡懷寬 | zh_TW |
dc.contributor.advisor | Huai-Kuang Tsai | en |
dc.contributor.author | 陳瑜欣 | zh_TW |
dc.contributor.author | Yu-Hsin Chen | en |
dc.date.accessioned | 2024-07-17T16:17:44Z | - |
dc.date.available | 2024-07-18 | - |
dc.date.copyright | 2024-07-17 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2024-07-10 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93077 | - |
dc.description.abstract | 蛋白質複合物是細胞功能的基礎作用單位,而恢復其組成對於理解細胞過程的機制至關重要。液相層析串聯式質譜儀 (CF-MS) 分析法是高通量技術之一,其分析結果可用於推斷多對多的蛋白質交互作用關係,對於蛋白質體系統性結構復原方法起到重要進展。為了從CF-MS數據推斷蛋白質相互作用,目前的機器學習分析流程通常先基於人為定義的CF-MS特徵來進行蛋白質配對關係的推斷,接著使用聚類演算法形成潛在的蛋白質複合物。然而,目前使用的分析方法存在人為定義特徵導致的分析偏差、數據分佈嚴重不平衡導致的過度擬合問題、CF-MS數據噪音導致的偽陰性以及偽陽性問題。
為了解決人為定義特徵和資料不平衡等問題,我提出了一種基於卷積神經網路 (CNN) 的端到端學習架構,SPIFFED,透過此架構將特徵提取與交互作用組預測在資料平衡的條件下串聯成一個完整的訓練流程。在傳統資料不平衡的訓練模式下,SPIFFED在預測蛋白質-蛋白質交互作用關係 (PPIs) 方面優於目前最先進的分析方法。當在資料平衡的模式下訓練時,SPIFFED大幅提高了對陽性PPIs的敏感度。此外,SPIFFED內建的整合模型提供了不同的投票方案來統整生物性重複樣本之間或不同CF-MS資料集之間的PPIs預測結果,讓使用者可以根據其實驗設計選擇信賴度較高的交互作用關係。
為了解決洗脫圖譜內在的噪音干擾引起的偽陰性問題和偽陽性問題,我提出了另一種平衡的端到端學習架構,FREEPII。FREEPII也使用CNN作為主要架構。與SPIFFED不同的是,FREEPII專注於學習個別蛋白質的特徵表示而非蛋白質對的特徵表示,因此其計算複雜度從O(N^2) 降低到O(N)。除此之外,FREEPII使用多個輸入來擴展可用於計算蛋白質之間資料相似性的信息,並使用蛋白質嵌入將存在於蛋白質複合物的網路層級資訊轉移到學習的特徵表示以重新調整CF-MS數據中相互作用的強度。FREEPII在PPIs分類和PPIs聚類方面的結果均優於EPIC 和 SPIFFED。透過視覺化,我揭示了FREEPII在表徵學習和分類判斷方面的優勢。透過交叉預測,我證明了結合不同解析度的CF-MS資料進行模型訓練可以顯著提高CNN對於不同實驗中PPIs分類預測的廣泛性。
綜上所述,我提出的方法解決了特徵提取步驟引入的資料壓縮和偏差問題,學習到更通用的特徵表示,並在平衡訓練下更準確地發現陽性的作用關係。透過考慮不同屬性的輸入和網路層級的資訊,我的方法有效地減少了洗脫曲線中存在的雜訊所造成的預測誤差,在 PPIs 分類和 PPIs 聚類方面均優於先前的分析方法。最後,我證明了結合不同解析度的CF-MS資料訓練模型可以顯著提高CNN對於不同實驗中PPIs分類預測的泛化能力。 | zh_TW |
dc.description.abstract | Protein complexes are key functional units in cellular processes, and recovering their composition is critical to understanding the mechanistic basis of those processes. Chromatographic fractionation coupled with mass spectrometry (CF-MS) is a high-throughput technique that has significantly advanced the study of protein complexes by enabling global interactome inference. To infer protein-protein interactions (PPIs) from CF-MS data, current machine learning pipelines usually first predict PPIs from handcrafted CF-MS features and then apply clustering algorithms to assemble putative protein complexes. While powerful, these methods suffer from bias introduced by handcrafted features, overfitting caused by severely imbalanced data distributions, and false negatives and false positives caused by noise in the CF-MS data itself.
To address the issues of handcrafted features and data imbalance, I present a balanced end-to-end learning architecture, Software for Prediction of Interactome with Feature-extraction Free Elution Data (SPIFFED), which uses a convolutional neural network (CNN) to integrate feature representation learning from raw CF-MS data with interactome prediction. SPIFFED outperforms state-of-the-art methods in predicting PPIs under conventional imbalanced training. When trained with balanced data, SPIFFED shows greatly improved sensitivity for true PPIs. Moreover, the ensemble SPIFFED model provides different voting schemes to integrate predicted PPIs from biological replicates or multiple CF-MS datasets, allowing users to select high-confidence interactions according to their CF-MS experimental design.
To reduce the prediction errors caused by noise in elution profiles, I further propose another balanced end-to-end learning architecture, Feature Representation Enhancement End-to-end Protein Interaction Inference (FREEPII). FREEPII also uses a CNN as its main architecture. Unlike SPIFFED, FREEPII focuses on learning feature representations of individual proteins rather than of protein pairs, reducing computational complexity from O(N^2) to O(N). In addition, FREEPII uses multiple inputs to extend the information available for calculating similarity between proteins, and uses protein embeddings to transfer network-level information about protein complexes into the feature representations, rescaling the strength of interactions present in CF-MS data. FREEPII outperforms EPIC and SPIFFED in both PPI classification and PPI clustering. Through visualization, I reveal the advantages of FREEPII in representation learning and classification. Through cross prediction, I demonstrate that combining CF-MS data of different resolutions for model training can significantly improve the generalizability of the CNN for PPI classification across experiments.
In summary, my proposed methods address the data compression and bias issues introduced by the feature extraction step, learn more general feature representations, and discover positive interactions more accurately under balanced training. By considering inputs with different properties together with network-level information, my methods effectively reduce the prediction errors caused by noise in the elution profiles, outperforming previous analysis methods in both PPI classification and PPI clustering. Finally, I demonstrate that combining CF-MS data of different resolutions for model training can significantly improve the generalizability of the CNN for PPI classification. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-17T16:17:44Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-07-17T16:17:44Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | English abstract
Chinese abstract
Contents
List of Tables
List of Figures
List of Abbreviations
Chapter 1. Introduction
Chapter 2. A feature extraction free approach for protein interactome inference from co-elution data
2.1 Materials and Methods
2.1.1 Overview
2.1.2 Datasets
2.1.3 Elution profile pre-processing
2.1.4 Pre-processing of reference protein complexes
2.1.5 Protein pairs creation and labelling
2.1.6 Model architecture
2.1.7 Model evaluation
2.1.8 Comparison with EPIC
2.1.9 Effect of data imbalance on model training
2.1.10 Ensemble module
2.1.11 Clustering and evaluation
2.2 Results
2.2.1 Performance comparison of SPIFFED with EPIC
2.2.2 The impact of data imbalance on model performance
2.2.3 Integrate positive predictions across data via SPIFFED's ensemble model
2.3 Discussion
Chapter 3. Complete end-to-end learning from protein feature representation to protein interactome inference
3.1 Materials and Methods
3.1.1 CF-MS datasets curation and data pre-processing
3.1.2 Protein complexes collection and protein pairs labelling
3.1.3 Protein sequences collection and numerical representation
3.1.4 Model architecture
3.1.5 Model training and evaluation on PPIs
3.1.6 Comparison with other CF-MS analysis tools
3.1.7 Architecture of models in ablation study
3.1.8 Visualization of feature representations of proteins labelled by protein complexes
3.1.9 Visualize feature hotspots for classifying each PPI by computing saliency maps
3.1.10 Generating clusters using predicted PPI scores
3.1.11 Gene function annotations and semantic similarity measurement of GO-terms
3.1.12 Co-localization within protein complexes
3.1.13 Structure similarity between predicted clusters and reference protein complex dataset
3.2 Results
3.2.1 Analysis pipeline of FREEPII
3.2.2 FREEPII achieves best performance in PPIs classification
3.2.3 The discriminative power of FREEPII from both CF-MS data and protein sequences
3.2.4 Protein embedding transfer network-level information in labels to protein feature representation
3.2.5 FREEPII achieves the best performance in cluster quality evaluation
3.2.6 Co-training greatly improves the prediction generality of FREEPII(-)
3.3 Discussion
Chapter 4. Conclusion
References
Supplementary Materials
Supplementary Tables
Supplementary Figures | - |
dc.language.iso | en | - |
dc.title | 從蛋白質表徵到蛋白質交互作用組推斷的端到端學習 | zh_TW |
dc.title | Complete end-to-end learning from protein feature representation to protein interactome inference | en |
dc.type | Thesis | - |
dc.date.schoolyear | 112-2 | - |
dc.description.degree | Doctoral | - |
dc.contributor.coadvisor | 呂俊毅;阮雪芬 | zh_TW |
dc.contributor.coadvisor | Jun-Yi Leu;Hsueh-Fen Juan | en |
dc.contributor.oralexamcommittee | 黃宣誠;陳倩瑜 | zh_TW |
dc.contributor.oralexamcommittee | Hsuan-Cheng Huang;Chien-Yu Chen | en |
dc.subject.keyword | 液相層析串聯式質譜儀分析,蛋白質-蛋白質交互作用關係,蛋白質交互作用組推斷,卷積神經網路,端到端學習,表徵學習 | zh_TW |
dc.subject.keyword | Co-fractionation coupled with mass spectrometry analysis, Protein-protein interactions, Protein interactome inference, Convolutional neural network, End-to-end learning, Representation learning | en |
dc.relation.page | 70 | - |
dc.identifier.doi | 10.6342/NTU202401653 | - |
dc.rights.note | Authorization granted (access restricted to campus) | - |
dc.date.accepted | 2024-07-11 | - |
dc.contributor.author-college | 電機資訊學院 | - |
dc.contributor.author-dept | 生物資訊學國際研究生博士學位學程 | - |
Appears in Collections: | 生物資訊學國際研究生博士學位學程 |
Files in This Item:
File | Size | Format |
---|---|---|
ntu-112-2.pdf (Restricted Access) | 3.94 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.