Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74416
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 洪士灝(Shih-Hao Hung) | |
dc.contributor.author | Chih-Han Yang | en |
dc.contributor.author | 楊植翰 | zh_TW |
dc.date.accessioned | 2021-06-17T08:34:36Z | - |
dc.date.available | 2024-08-13 | |
dc.date.copyright | 2019-08-13 | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019-08-09 | |
dc.identifier.citation | [1] Green, E.D., Rubin, E.M. and Olson, M.V. The future of DNA sequencing. Nature, 550(7675):179-181, 2017.
[2] Stephens, Z.D., et al. Big Data: Astronomical or Genomical? PLoS Biology, 13(7):e1002195, 2015.
[3] Ashley, E.A. Towards precision medicine. Nature Reviews Genetics, 17(9):507-522, 2016.
[4] Dey, N., et al. Mutation matters in precision medicine: A future to believe in. Cancer Treatment Reviews, 55:136-149, 2017.
[5] Park, J.Y., et al. Next-generation sequencing in the clinic. Nature Biotechnology, 31:990-992, 2013.
[6] McKenna, A., et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9):1297-1303, 2010.
[7] Goodwin, S., McPherson, J.D. and McCombie, W.R. Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics, 17(6):333-351, 2016.
[8] Sandmann, S., et al. Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data. Scientific Reports, 7:43169, 2017.
[9] Poplin, R., et al. A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, 36(10):983-987, 2018.
[10] Google. DeepVariant. https://github.com/google/deepvariant.
[11] Parabricks. https://www.parabricks.com/. Accessed: 2019-06-11.
[12] Liu, C.-Y. SOFA. https://github.com/cyliustack/sofa.
[13] Uber. Pyflame. https://github.com/uber/pyflame.
[14] ZeroMQ. ZMQ C library. https://github.com/zeromq/libzmq.
[15] Google. CLIF. https://github.com/google/clif.
[16] MXNet. https://mxnet.apache.org/.
[17] Patrick Wieschollek. ZMQ operation. https://github.com/PatWie/tf_zmq.
[18] Tensorpack. ZMQ operation. https://github.com/tensorpack/zmq_ops.
[19] Amazon EC2 pricing. https://aws.amazon.com/ec2/pricing/on-demand/?nc1=h_ls. Accessed: 2019-07-11. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74416 | - |
dc.description.abstract | 隨著次世代定序 (next generation sequencing) 的快速發展,我們可以用低廉的價格取得個人基因體的數十億的片段,這些片段中會有許多的錯誤,我們必須藉由變異偵測 (variant calling) 的技術,才可以確定每一個基因位點的鹼基種類。本論文探討的案例—DeepVariant,利用深度神經網路來對定序資料作變異偵測,曾在2016舉辦的 PrecisionFDA Truth Challenge 中贏得SNP performance的獎項,然而,整個DeepVariant需要數個小時才能完成。
在此論文中,我們提出優化DeepVariant效能的方法,首先,利用SOFA觀察程式的執行特徵,發現到DeepVariant分為兩個階段的執行,先將全部的基因轉為圖片,才用神經網路進行圖片推論,我們實作新的資料流方式,將兩個階段的執行重疊以達到加速的效果。接著,我們用Vtune分析第一個階段,觀察到程式花了許多時間在Python與C++之間的資料轉換,以及Python本身沒有效率的函式呼叫的實作方法,因此,我們將整支程式重新以C++改寫,減少了因為使用Python而產生的不必要的時間。最後,我們實作了分散式版本的DeepVariant,使得DeepVariant可以用多個CPU伺服器以及多個GPU平行計算,並客製化TensorFlow從網路收取資料的操作,減少了不必要的資料複製與轉換,提高了GPU的使用率。 藉由以上的優化方法,我們將DeepVariant的執行時間從4個小時降到1小時左右,並且,在8台CPU伺服器以及8台GPU的環境下,達到接近線性的加速,只需要少於8分鐘的執行時間,以成本效益來看表現得比Parabricks好。 | zh_TW |
dc.description.abstract | As next-generation sequencing (NGS) rapidly evolves, an individual's genome can be determined at an ever-decreasing price from billions of short, error-prone sequence reads by identifying the genetic variants present in that genome (variant calling). DeepVariant, the case study in this thesis, is an open-source software package that calls genetic variants with a deep neural network (DNN) and won the PrecisionFDA Truth Challenge for best SNP performance in 2016. Even with a high-performance GPU to accelerate the DNN, variant calling still took four hours on our workstation, so we analyzed the performance of DeepVariant to find ways to further reduce the time and cost of the NGS variant-calling pipeline. | en |
In this thesis work, we used SOFA (Swarms of Functions Analysis) to characterize the performance of DeepVariant. The original DeepVariant program executed its tasks in two stages: in the first stage, all the sequencing data were converted into images, and in the second stage, those images were passed through a DNN for inference. Based on this observation, our first optimization restructured the program to overlap the two stages of execution, shortening the execution time by 26%. Next, we used the Intel VTune Amplifier to profile the first stage, which revealed substantial overhead in the Python-based main program from calling into C++ functions and converting data between Python and C++. We therefore re-implemented the main program in C++, which yielded a 68% reduction in execution time. Finally, we built a distributed version of DeepVariant to further scale its performance in the datacenter by distributing the tasks of both stages onto multiple CPU servers and multiple GPUs. In addition, we developed a customized TensorFlow operation to handle the data received from the ZeroMQ network socket, effectively reducing unnecessary data copying and data conversion and improving GPU utilization. As a result, we reduced the execution time of DeepVariant to 7 minutes and 39 seconds with a near-linear speedup on 8 CPU servers and 8 GPUs, outperforming the industrial solution provided by Parabricks in terms of cost-performance. | en |
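The dataflow optimization summarized in the abstract, overlapping the image-generation stage with the DNN-inference stage, can be illustrated with a small producer/consumer pipeline. This is a hedged sketch, not DeepVariant's actual code: a `queue.Queue` stands in for the ZeroMQ PUSH/PULL socket pair used in the thesis, and the names `make_examples`, `call_variants`, and the payload format are illustrative assumptions only.

```python
# Sketch of overlapping the two DeepVariant stages as a pipeline.
# Stage 1 (converting reads into pileup images) and stage 2 (DNN
# inference) run concurrently instead of back-to-back. A queue.Queue
# stands in here for the ZeroMQ PUSH/PULL sockets described in the
# thesis; payloads and function names are illustrative only.
import queue
import threading

SENTINEL = None  # marks the end of the stream


def make_examples(out_q, n_regions):
    """Stage 1: emit one (mock) pileup image per genomic region."""
    for region in range(n_regions):
        out_q.put({"region": region, "image": bytes(16)})
    out_q.put(SENTINEL)


def call_variants(in_q, results):
    """Stage 2: consume images as they arrive and run (mock) inference."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        results.append(item["region"])  # stand-in for a DNN forward pass


def run_pipeline(n_regions=8):
    # A bounded queue keeps stage 1 from running arbitrarily far ahead,
    # much like socket backpressure in the ZeroMQ-based design.
    q = queue.Queue(maxsize=4)
    results = []
    consumer = threading.Thread(target=call_variants, args=(q, results))
    consumer.start()
    make_examples(q, n_regions)
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline())  # regions flow through the queue in order
```

In the distributed version described above, the in-process queue would be replaced by ZeroMQ sockets spanning machines, with the custom TensorFlow operation pulling serialized examples directly off the socket to avoid extra copies.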
dc.description.provenance | Made available in DSpace on 2021-06-17T08:34:36Z (GMT). No. of bitstreams: 1 ntu-108-R06922123-1.pdf: 1980480 bytes, checksum: a272dd7461eb6db363e9af6017f5f1dd (MD5) Previous issue date: 2019 | en |
dc.description.tableofcontents | Acknowledgements i
摘要 ii
Abstract iii
Chapter 1 Introduction 1
Chapter 2 Background 4
2.1 Variant Calling 4
2.2 DeepVariant 5
2.2.1 DeepVariant Inputs 5
2.2.2 DeepVariant Workflow 7
2.3 Accelerated DeepVariant by Parabricks 10
Chapter 3 Methodology 11
3.1 Improving Dataflow 11
3.1.1 DeepVariant: Original Dataflow 11
3.1.2 DeepVariant with Improved Dataflow 14
3.2 Improving Make_examples.py 15
3.2.1 Analyzing Make_examples.py with Intel VTune Amplifier 15
3.2.2 Analyzing Make_examples.py with Pyflame 18
3.2.3 Reimplementation of Make_examples.py in C++ 19
3.3 Customizing TensorFlow Operation 20
3.4 Distributed DeepVariant 21
Chapter 4 Evaluation 23
4.1 Experimental Setup 23
4.1.1 Hardware Configurations 23
4.1.2 Software and Input Data 24
4.2 Effects of Optimization 25
4.2.1 Improved Dataflow 25
4.2.2 Improved Dataflow & C++ Implementation 26
4.2.3 Improved Dataflow & C++ Implementation & ZMQPullOp 27
4.3 Scalability in a Distributed System 28
Chapter 5 Conclusion and Future Work 32
5.1 Conclusion 32
5.2 Future Work 32
Bibliography 34 | |
dc.language.iso | en | |
dc.title | 加速深度學習系統──以DeepVariant為案例研究 | zh_TW |
dc.title | Accelerating Deep Learning Systems: A Case Study with DeepVariant | en |
dc.type | Thesis | |
dc.date.schoolyear | 107-2 | |
dc.description.degree | 碩士 (Master) | |
dc.contributor.oralexamcommittee | 郭大維,涂嘉恒 | |
dc.subject.keyword | 次世代定序,基因體,變異偵測,深度神經網路,DeepVariant | zh_TW |
dc.subject.keyword | Next-generation sequencing (NGS),Genome,Variant calling,Deep neural network,DeepVariant | en |
dc.relation.page | 35 | |
dc.identifier.doi | 10.6342/NTU201902523 | |
dc.rights.note | 有償授權 (paid-license access) | |
dc.date.accepted | 2019-08-12 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-108-1.pdf (currently not authorized for public access) | 1.93 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.