Acceleration of Monte-Carlo Simulation on High Performance Computing Platforms (以高效能計算平台加速蒙地卡羅模擬)

Pei-Jen Wang; 王培任

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20373

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	洪士灝(Shih-Hao Hung)
dc.contributor.author	Pei-Jen Wang	en
dc.contributor.author	王培任	zh_TW
dc.date.accessioned	2021-06-08T02:46:38Z	-
dc.date.copyright	2017-08-25
dc.date.issued	2017
dc.date.submitted	2017-08-22
dc.identifier.citation	[1] AVX-512. https://en.wikipedia.org/wiki/AVX-512.
[2] clGetEventProfilingInfo. https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clGetEventProfilingInfo.html.
[3] icc option -fast. https://software.intel.com/en-us/node/522804.
[4] Intel Acquisition of Altera | Intel Newsroom. https://newsroom.intel.com/press-kits/intel-acquisition-of-altera/.
[5] Intel AVX-512 instructions. https://software.intel.com/en-us/blogs/2013/avx-512-instructions.
[6] Intel Xeon Phi. https://www.intel.com.tw/content/www/tw/zh/products/processors/xeon-phi.html.
[7] Intel® Cilk™ Plus. https://www.cilkplus.org/.
[8] Intel® Xeon Phi™ Processor 7210. https://ark.intel.com/products/94033/Intel-Xeon-Phi-Processor-7210-16GB-1_30-GHz-64-core.
[9] Intel® Xeon® Processor E5-2630 v2. https://ark.intel.com/products/75790/Intel-Xeon-Processor-E5-2630-v2-15M-Cache-2_60-GHz.
[10] Knights Landing (KNL): 2nd Generation Intel® Xeon Phi™ Processor. https://www.hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday-Epub/HC27.25.70-Processors-Epub/HC27.25.710-Knights-Landing-Sodani-Intel.pdf.
[11] Monte Carlo method. https://en.wikipedia.org/wiki/Monte_Carlo_method.
[12] OpenCL. https://www.khronos.org/opencl/.
[13] OpenMP. http://www.openmp.org/.
[14] What public disclosures has Intel made about Knights Landing? https://software.intel.com/en-us/articles/what-disclosures-has-intel-made-about-knights-landing.
[15] Xeon Phi. https://en.wikipedia.org/wiki/Xeon_Phi.
[16] Xeon+FPGA Platform for the Data Center. https://www.ece.cmu.edu/~calcm/carl/lib/exe/fetch.php?media=carl15-gupta.pdf.
[17] E. Alerstam, W. C. Y. Lo, T. D. Han, J. Rose, S. Andersson-Engels, and L. Lilge. Next-generation acceleration and code optimization for light transport in turbid media using GPUs. Biomedical Optics Express, 1(2):658–675, 2010.
[18] K. Binder, D. Heermann, L. Roelofs, A. J. Mallinckrodt, S. McKay, et al. Monte Carlo simulation in statistical physics. Computers in Physics, 7(2):156–157, 1993.
[19] F. B. Brown and W. R. Martin. Monte Carlo methods for radiation transport analysis on vector computers. Progress in Nuclear Energy, 14(3):269–299, 1984.
[20] S. H. Cho. Estimation of tumour dose enhancement due to gold nanoparticles during typical radiation treatments: a preliminary Monte Carlo study. Physics in Medicine and Biology, 50(15):N163, 2005.
[21] C. De Schryver, I. Shcherbakov, F. Kienle, N. Wehn, H. Marxen, A. Kostiuk, and R. Korn. An energy efficient FPGA accelerator for Monte Carlo option pricing with the Heston model. In Reconfigurable Computing and FPGAs (ReConFig), 2011 International Conference on, pages 468–474. IEEE, 2011.
[22] M. K. Fix, P. Manser, D. Frei, W. Volken, R. Mini, and E. J. Born. An efficient framework for photon Monte Carlo treatment planning. Physics in Medicine and Biology, 52(19):N425, 2007.
[23] P. Glasserman. Monte Carlo Methods in Financial Engineering, volume 53. Springer Science & Business Media, 2013.
[24] X. Hu, W. L. Hase, and T. Pirraglia. Vectorization of the general Monte Carlo classical trajectory program VENUS. Journal of Computational Chemistry, 12(8):1014–1024, 1991.
[25] S.-H. Hung, M.-Y. Tsai, B.-Y. Huang, and C.-H. Tu. A platform-oblivious approach for heterogeneous computing: A case study with Monte Carlo-based simulation for medical applications. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pages 42–47. ACM, 2016.
[26] A. L. (Intel). Getting the most from OpenCL™ 1.2: How to increase performance by minimizing buffer copies on Intel® processor graphics. https://software.intel.com/en-us/articles/getting-the-most-from-opencl-12-how-to-increase-performance-by-minimizing-buffer-copies-on-intel-processor-graphics, September 2014.
[27] D. P. Landau and K. Binder. A Guide to Monte Carlo Simulations in Statistical Physics. Cambridge University Press, 2014.
[28] T. Liu, X. G. Xu, and C. D. Carothers. Comparison of two accelerators for Monte Carlo radiation transport calculations, NVIDIA Tesla M2090 GPU and Intel Xeon Phi 5110P coprocessor: A case study for X-ray CT imaging dose calculation. Annals of Nuclear Energy, 82:230–239, 2015.
[29] W. C. Y. Lo, T. D. Han, J. Rose, and L. Lilge. GPU-accelerated Monte Carlo simulation for photodynamic therapy treatment planning. In European Conference on Biomedical Optics, page 7373_13. Optical Society of America, 2009.
[30] W. C. Y. Lo, K. Redmond, J. Luu, P. Chow, J. Rose, and L. Lilge. Hardware acceleration of a Monte Carlo simulation for photodynamic therapy treatment planning. Journal of Biomedical Optics, 14(1):014019, 2009.
[31] W. R. Martin. Successful vectorization-reactor physics Monte Carlo code. Computer Physics Communications, 57(1-3):68–77, 1989.
[32] T. Mori, M. Nakagawa, and M. Sasaki. Vectorization of continuous energy Monte Carlo method for neutron transport calculation. Journal of Nuclear Science and Technology, 29(4):325–336, 1992.
[33] N. Ren, J. Liang, X. Qu, J. Li, B. Lu, and J. Tian. GPU-based Monte Carlo simulation for light propagation in complex heterogeneous tissues. Optics Express, 18(7):6811–6823, 2010.
[34] J. Sempau, A. Sanchez-Reyes, F. Salvat, H. O. ben Tahar, S. Jiang, and J. Fernández-Varea. Monte Carlo simulation of electron beams from an accelerator head using PENELOPE. Physics in Medicine and Biology, 46(4):1163, 2001.
[35] A. Sodani, R. Gramunt, J. Corbal, H.-S. Kim, K. Vinod, S. Chinthamani, S. Hutsell, R. Agarwal, and Y.-C. Liu. Knights Landing: Second-generation Intel Xeon Phi product. IEEE Micro, 36(2):34–46, 2016.
[36] S. Thrun, D. Fox, W. Burgard, and F. Dellaert. Robust Monte Carlo localization for mobile robots. Artificial Intelligence, 128(1-2):99–141, 2001.
[37] E. Troubetzkoy, H. Steinberg, and M. Kalos. Monte Carlo radiation penetration calculations on a parallel computer. Trans. Amer. Nucl. Soc., 17, 1973.
[38] M.-Y. Tsai and S.-H. Hung. Hardware acceleration for proton beam Monte Carlo simulation. In Proceedings of the 2013 Research in Adaptive and Convergent Systems, pages 495–496. ACM, 2013.
[39] N. A. Woods and T. VanCourt. FPGA acceleration of quasi-Monte Carlo in finance. In Field Programmable Logic and Applications, 2008. FPL 2008. International Conference on, pages 335–340. IEEE, 2008.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20373	-
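dc.description.abstract	Monte Carlo methods are widely used to simulate problems with inherent randomness, such as those in nuclear physics, using computing power to simulate the stochastic processes within a problem directly. In radiation therapy for cancer, the dose distribution must first be simulated in order to improve treatment efficacy and reduce the impact on normal cells. However, Monte Carlo simulation requires a large amount of repeated computation to produce accurate results. Treatment planning further requires adjusting the simulation parameters and running the simulation many times in order to select an appropriate treatment, which makes the execution efficiency of the simulation program all the more important. In existing work, research teams have used general-purpose graphics processing units, vector processors, and field-programmable gate arrays to accelerate Monte Carlo simulation programs. In this thesis, building on previous work, we port MCML, a photon simulation program based on the Monte Carlo method, to high-performance computing platforms to accelerate the simulation. We accelerate the program on two different platforms, the Intel Xeon Phi processor and the Intel CPU + FPGA Software Development Platform, and compare the resulting speedups. On the Xeon Phi processor, we parallelize the program with OpenMP and vectorize it with the AVX-512 vector instruction set the processor provides; on the Intel CPU + FPGA Software Development Platform, we use OpenCL to perform the computation on the FPGA and exploit the physical memory shared between the CPU and the FPGA for further speedup. Compared with a conventional CPU, we achieve speedups of about 47.8 times on the Xeon Phi processor and 3.1 times on the Intel CPU + FPGA Software Development Platform.	zh_TW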
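Why accurate results demand so much repeated computation can be made concrete with the standard Monte Carlo error estimate (a textbook result, not a derivation taken from the thesis). If $X_1, \dots, X_N$ are independent samples, for example per-photon tallies, with variance $\sigma^2$, the estimator and its standard error are:

```latex
\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^{N} X_i,
\qquad
\operatorname{SE}\!\left(\hat{\mu}_N\right) = \frac{\sigma}{\sqrt{N}},
\qquad
\frac{\operatorname{SE}\!\left(\hat{\mu}_{k^2 N}\right)}{\operatorname{SE}\!\left(\hat{\mu}_N\right)} = \frac{1}{k}.
```

Reducing the statistical error by a factor $k$ therefore needs $k^2$ times as many samples: halving it takes 4 times the photons, and one more significant digit ($k = 10$) takes 100 times. Repeating that cost for every candidate treatment configuration is what makes execution speed matter.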
dc.description.abstract	Monte Carlo methods are widely used to solve problems with stochastic properties, such as radiation from atoms in nuclear physics, because they allow the stochastic processes to be simulated directly. In radiation oncology, the dose distribution must be simulated in advance to improve treatment efficacy and reduce side effects on healthy tissue. However, Monte Carlo simulation requires intensive computation to deliver accurate results, and treatment planning further requires simulating many different configurations and using the results to select a proper treatment plan. Execution efficiency is therefore an important issue for Monte Carlo simulation. Building on our previous work, this thesis further explores the performance of MCML, a Monte Carlo photon simulation program, on high-performance computing platforms. We accelerate the program on two different platforms, the Intel Xeon Phi processor and the Intel CPU + FPGA Software Development Platform, and compare the results. On the Xeon Phi processor, we parallelize the program with OpenMP and vectorize it with the AVX-512 vector instruction set. On the Intel CPU + FPGA Software Development Platform, we use OpenCL to accelerate the program on the FPGA and leverage the physical memory shared between the CPU and the FPGA to further speed up the simulation. As a result, we obtain speedups of about 47.8 times on the Xeon Phi and 3.1 times on the Intel CPU + FPGA Software Development Platform relative to the CPU.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T02:46:38Z (GMT). No. of bitstreams: 1 ntu-106-R04922130-1.pdf: 1262369 bytes, checksum: 03fff8d2651b011f227f470bf38ec43c (MD5) Previous issue date: 2017	en
dc.description.tableofcontents	1 Introduction
2 Background and Related Work
2.1 Intel Xeon Phi
2.2 Intel CPU + FPGA Software Development Platform
2.3 Related Work
3 Implementation
3.1 Acceleration on Intel Xeon Phi x200 Processor
3.1.1 Acceleration by Parallelization
3.1.2 Acceleration by Vectorization
3.2 Acceleration on Intel CPU + FPGA Software Development Platform
3.2.1 Data Structures
3.2.2 Baseline
3.2.3 Double Buffering
3.2.4 Using MapBuffer for Zero Copy
3.2.5 Common Code Extraction
3.2.6 Sectioning the Thread Execution
4 Evaluation
4.1 Experiment Environments
4.2 Original MCML
4.3 Acceleration on CPUs
4.4 Acceleration on SDP
4.5 Comparison of Performance
5 Conclusion and Future Work
Bibliography
dc.language.iso	en
dc.title	以高效能計算平台加速蒙地卡羅模擬	zh_TW
dc.title	Acceleration of Monte-Carlo Simulation on High Performance Computing Platforms	en
dc.type	Thesis
dc.date.schoolyear	105-2
dc.description.degree	碩士 (Master's)
dc.contributor.oralexamcommittee	廖世偉(Shih-wei Liao),涂嘉恒(Chia-Heng Tu)
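dc.subject.keyword	蒙地卡羅 (Monte Carlo), AVX-512 指令集 (AVX-512 instruction set), 現場可程式邏輯閘陣列 (field-programmable gate array), 開放式計算語言 (Open Computing Language), 共用實體記憶體 (shared physical memory), 零複製 (zero copy)	zh_TW
dc.subject.keyword	Monte Carlo, AVX-512, FPGA, OpenCL, shared physical memory, zero copy	en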
dc.relation.page	26
dc.identifier.doi	10.6342/NTU201703099
dc.rights.note	未授權 (not authorized for public access)
dc.date.accepted	2017-08-23
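dc.contributor.author-college	電機資訊學院 (College of Electrical Engineering and Computer Science)	zh_TW
dc.contributor.author-dept	資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering)	zh_TW
Appears in Collections:	資訊工程學系 (Department of Computer Science and Information Engineering)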

Files in This Item:

File	Size	Format
ntu-106-1.pdf (Restricted Access)	1.23 MB	Adobe PDF
