Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81868

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 洪士灝(Shih-Hao Hung) | |
| dc.contributor.author | Yu-Jen Huang | en |
| dc.contributor.author | 黃昱仁 | zh_TW |
| dc.date.accessioned | 2022-11-25T03:05:25Z | - |
| dc.date.available | 2026-09-01 | |
| dc.date.copyright | 2021-11-12 | |
| dc.date.issued | 2021 | |
| dc.date.submitted | 2021-09-02 | |
| dc.identifier.citation | [1] Official documentation: https://developer.nvidia.com/blog/gpudirect-storage/. [2] Official documentation: https://docs.python.org/3/library/multiprocessing.html. [3] Official GitHub discussion: https://github.com/NVIDIA/DALI/issues/2588#issuecomment-756101353. [4] Official GitHub issue replies: https://github.com/NVIDIA/DALI/issues/2255#issuecomment-758511816. [5] Official GitHub reference: https://github.com/NVIDIA/apex. [6] Official release blog: https://www.nvidia.com/en-us/geforce/news/rtx-io-gpu-accelerated-storage-technology/. [7] PyTorch official documentation: https://pytorch.org/docs/stable/cuda.html#memory-management. [8] PyTorch official documentation: https://pytorch.org/docs/stable/data.html#single-and-multi-process-data-loading. [9] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org. [10] G. Campanella, M. G. Hanna, L. Geneslaw, A. Miraflor, V. Werneck Krauss Silva, K. J. Busam, E. Brogi, V. E. Reuter, D. S. Klimstra, and T. J. Fuchs. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine, 25(8):1301–1309, Aug. 2019. [11] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer. cuDNN: Efficient primitives for deep learning, 2014. [12] J. Choquette, W. Gandhi, O. Giroux, N. Stam, and R. Krashinsky. NVIDIA A100 tensor core GPU: Performance and innovation. IEEE Micro, 41(02):29–35, Mar. 2021. [13] D. Ciresan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber. Flexible, high performance convolutional neural networks for image classification. pages 1237–1242, 07 2011. [14] T. Gale, P. Tredak, S. Layton, A. Ivanov, and S. Panev. Official NVIDIA DALI GitHub. https://github.com/nvidia/dali. [15] L. Hou, D. Samaras, T. M. Kurc, Y. Gao, J. E. Davis, and J. H. Saltz. Patch-based convolutional neural network for whole slide tissue image classification, 2016. [16] J. D. Ianni, R. E. Soans, S. Sankarapandian, R. V. Chamarthi, D. Ayyagari, T. G. Olsen, M. J. Bonham, C. C. Stavish, K. Motaparthi, C. J. Cockerell, T. A. Feeser, and J. B. Lee. Tailored for Real-World: A Whole Slide Image Classification System Validated on Uncurated Multi-Site Data Emulating the Prospective Pathology Workload. Scientific Reports, 10(1):3217, Dec. 2020. [17] M. James, M. Tom, P. Groeneveld, and V. Kibardin. ISPD 2020 physical mapping of neural networks on a wafer-scale deep learning accelerator. In Proceedings of the 2020 International Symposium on Physical Design, ISPD '20, pages 145–149, New York, NY, USA, 2020. Association for Computing Machinery. [18] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014. [19] S. Kirk, Y. Lee, P. Kumar, J. Filippini, B. Albertina, M. Watson, K. Rieger-Christ, and J. Lemmerman. Radiology Data from The Cancer Genome Atlas Lung Squamous Cell Carcinoma [TCGA-LUSC] collection, 2016. Type: dataset. [20] H. Mikami, H. Suganuma, P. Uchupala, Y. Tanaka, and Y. Kageyama. ImageNet/ResNet-50 training in 224 seconds. 11 2018. [21] J. Mohan, A. Phanishayee, A. Raniwala, and V. Chidambaram. Analyzing and mitigating data stalls in DNN training. CoRR, abs/2007.06775, 2020. [22] J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable parallel programming with CUDA: Is CUDA the parallel programming model that application developers have been waiting for? Queue, 6(2):40–53, Mar. 2008. [23] R. Okuta, Y. Unno, D. Nishino, S. Hido, and C. Loomis. CuPy: A NumPy-compatible library for NVIDIA GPU calculations. In Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), 2017. [24] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019. [25] H. Pinckaers, W. Bulten, J. van der Laak, and G. Litjens. Detection of prostate cancer in whole-slide images through end-to-end training with image-level labels, 2020. [26] H. Pinckaers, B. van Ginneken, and G. Litjens. Streaming convolutional neural networks for end-to-end learning with multi-megapixel images. arXiv e-prints, page arXiv:1911.04432, Nov. 2019. [27] M. Satyanarayanan, A. Goode, B. Gilbert, J. Harkes, and D. Jukic. OpenSlide: A vendor-neutral software foundation for digital pathology. Journal of Pathology Informatics, 4(1):27, 2013. [28] S. Tokui, R. Okuta, T. Akiba, Y. Niitani, T. Ogawa, S. Saito, S. Suzuki, K. Uenishi, B. Vogel, and H. Yamazaki Vincent. Chainer: A deep learning framework for accelerating the research cycle. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2002–2011. ACM, 2019. [29] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, 13(4):600–612, Apr. 2004. [30] C. Yang and G. Cong. Accelerating data loading in deep neural network training. CoRR, abs/1910.01196, 2019. [31] Y. You, Z. Zhang, C. Hsieh, and J. Demmel. 100-epoch ImageNet training with AlexNet in 24 minutes. CoRR, abs/1709.05011, 2017. [32] Q. Zhang, Z. Han, F. Yang, Y. Zhang, Z. Liu, M. Yang, and L. Zhou. Retiarii: A deep learning exploratory-training framework. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 919–936. USENIX Association, Nov. 2020. [33] M. Zolnouri, X. Li, and V. P. Nia. Importance of Data Loading Pipeline in Training Deep Neural Networks. arXiv:2005.02130 [cs], Apr. 2020. arXiv: 2005.02130. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81868 | - |
| dc.description.abstract | "In recent years, high-resolution raw medical slide images have become increasingly popular in deep learning. On one hand, such high-resolution images provide a considerable level of detail, allowing trained models to reach very good accuracy; on the other hand, they supply an abundant amount of training data. However, high-resolution images cause poor training performance, because the CPU spends a large amount of time on image augmentation. We tried replacing the CPU with the GPU for this work, but this creates another difficulty: rotating a high-resolution image requires a very large amount of memory, a major pain point for GPUs, whose memory is scarce. For this reason, such workloads could not previously be entrusted to the GPU with confidence. To address these difficulties, we propose a rotation algorithm that is both very fast and memory-efficient. The core idea is to split the original large image into many small tiles and operate on those tiles, such that the result is identical to operating directly on the large image. In our experiments, when rotating an image of size (40000, 40000, 3), we saved 90% of the memory and achieved a 60x speedup compared with the CPU-based baseline." | zh_TW |
| dc.description.provenance | Made available in DSpace on 2022-11-25T03:05:25Z (GMT). No. of bitstreams: 1 U0001-0306202117542500.pdf: 6238888 bytes, checksum: 1dad183b99cc41d880b80486bae0dfc3 (MD5) Previous issue date: 2021 | en |
| dc.description.tableofcontents | Acknowledgements i; Abstract (Chinese) ii; Abstract iii; 1 Introduction 1; 2 Background and Related Work 4; 2.1 Training in Whole-Slide on the GPU 4; 2.2 Image Augmentation on the GPU 5; 3 Methodology 7; 3.1 Tile-based Augmentation on the GPU 7; 3.2 Implementation Details with Python and C++ 9; 3.2.1 The Python Version 10; 3.2.2 The C++ Version 12; 3.3 Tile-Based Rotation Algorithm 13; 3.4 Keeping the GPU Busy 15; 4 Experimental Results 19; 4.1 Experimental Setup 19; 4.2 Implementation Methods 20; 4.3 Latency Comparison 22; 4.4 GPU Memory Usage Comparison 24; 4.5 Image Correctness 26; 5 Conclusion and Future Work 30; Appendices 30; Bibliography 31 | |
| dc.language.iso | en | |
| dc.subject | 效能調校 | zh_TW |
| dc.subject | 串流小圖 | zh_TW |
| dc.subject | 圖形處理器平行計算 | zh_TW |
| dc.subject | Performance Tuning | en |
| dc.subject | Streaming-Tile | en |
| dc.subject | GPU Parallel Programming | en |
| dc.title | 記憶體節約之超高解析度圖形旋轉演算法及其效能優化 | zh_TW |
| dc.title | Memory-Saving Streaming Tile Rotation Algorithm on Large Scale Medical Image | en |
| dc.date.schoolyear | 109-2 | |
| dc.description.degree | 碩士 (Master's) | |
| dc.contributor.oralexamcommittee | 施吉昇(Hsin-Tsai Liu),梁文耀(Chih-Yang Tseng),張原豪,葉肇元 | |
| dc.subject.keyword | 串流小圖,圖形處理器平行計算,效能調校, | zh_TW |
| dc.subject.keyword | Streaming-Tile,GPU Parallel Programming,Performance Tuning, | en |
| dc.relation.page | 34 | |
| dc.identifier.doi | 10.6342/NTU202100950 | |
| dc.rights.note | Authorization granted (open access worldwide) | |
| dc.date.accepted | 2021-09-06 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| dc.date.embargo-lift | 2026-09-01 | - |
| Appears in Collections: | Department of Computer Science and Information Engineering | |
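The abstract describes a tile-based rotation whose per-tile results assemble into exactly the same image as rotating the whole image at once. The thesis targets arbitrary-angle rotation on the GPU; as a minimal illustration of the tiling idea, here is a CPU sketch for the exact 90-degree case (the function name, NumPy usage, and 256-pixel default tile size are our assumptions, not taken from the thesis):

```python
import numpy as np

def rotate90_tiled(image, tile=256):
    """Hypothetical sketch: rotate an (H, W, C) image 90 degrees
    counter-clockwise tile by tile, so only one small tile needs to be
    processed (e.g. resident on a memory-scarce GPU) at any moment.
    The assembled output equals np.rot90(image) exactly."""
    h, w = image.shape[:2]
    out = np.empty((w, h) + image.shape[2:], dtype=image.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            block = image[y:y + tile, x:x + tile]  # edge tiles may be smaller
            bh, bw = block.shape[:2]
            # A 90-degree CCW rotation maps source pixel (y+dy, x+dx) to
            # output pixel (w-1-(x+dx), y+dy), so the rotated tile lands
            # in this contiguous output window:
            out[w - x - bw:w - x, y:y + bh] = np.rot90(block)
    return out
```

In a streaming GPU version, each tile would be uploaded, rotated, and written back in turn, bounding peak device memory to one tile rather than the full (40000, 40000, 3) image; this is only a sketch of the tiling principle, not the thesis's implementation.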
Files in This Item:
| File | Size | Format |
|---|---|---|
| U0001-0306202117542500.pdf (available online after 2026-09-01) | 6.09 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
