Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85832
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 洪士灝 | zh_TW |
dc.contributor.advisor | Shih-Hao Hung | en |
dc.contributor.author | 呂承翰 | zh_TW |
dc.contributor.author | Cheng-Han Lu | en |
dc.date.accessioned | 2023-03-19T23:25:47Z | - |
dc.date.available | 2023-11-10 | - |
dc.date.copyright | 2022-03-07 | - |
dc.date.issued | 2022 | - |
dc.date.submitted | 2002-01-01 | - |
dc.identifier.citation | [1] B. Babenko. Multiple instance learning: algorithms and applications, pages 1–19, 2008. [2] L. M. Ballestar and V. Vilaplana. Brain tumor segmentation using 3D-CNNs with uncertainty estimation. arXiv preprint arXiv:2009.12188, 2020. [3] L. Barisoni, K. J. Lafata, S. M. Hewitt, A. Madabhushi, and U. G. Balis. Digital pathology and computational image analysis in nephropathology. Nature Reviews Nephrology, 16(11):669–685, 2020. [4] P. D. Bevan and A. Atapour-Abarghouei. Skin deep unlearning: Artefact and instrument debiasing in the context of melanoma classification. arXiv preprint arXiv:2109.09818, 2021. [5] J. Braatz, P. Rajpurkar, S. Zhang, A. Y. Ng, and J. Shen. Deep learning-based sparse whole-slide image analysis for the diagnosis of gastric intestinal metaplasia. arXiv preprint arXiv:2201.01449, 2022. [6] M. B. Bueno, X. Giró-i-Nieto, F. Marqués, and J. Torres. Hierarchical object detection with deep reinforcement learning. Deep Learning for Image Processing Applications, 31(164):3, 2017. [7] J. C. Caicedo and S. Lazebnik. Active object localization with deep reinforcement learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 2488–2496, 2015. [8] G. Campanella, M. G. Hanna, L. Geneslaw, A. Miraflor, V. W. K. Silva, K. J. Busam, E. Brogi, V. E. Reuter, D. S. Klimstra, and T. J. Fuchs. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine, 25(8):1301–1309, 2019. [9] Cancer Genome Atlas Research Network, J. N. Weinstein, E. A. Collisson, G. B. Mills, K. R. Shaw, B. A. Ozenberger, K. Ellrott, I. Shmulevich, C. Sander, and J. M. Stuart. The Cancer Genome Atlas pan-cancer analysis project. Nature Genetics, 45(10):1113–1120, Oct. 2013. [10] C. Chen, C. Chen, W. Yu, S. Chen, Y. Chang, T. Hsu, M. Hsiao, C. Yeh, and C. Chen. An annotation-free whole-slide training approach to pathological classification of lung cancer types using deep learning. Nature Communications, 12(1), Dec. 2021. [11] T. Chen, B. Xu, C. Zhang, and C. Guestrin. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016. [12] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009. [13] B. Ehteshami Bejnordi, M. Veta, P. Johannes van Diest, B. van Ginneken, N. Karssemeijer, G. Litjens, J. A. W. M. van der Laak, and the CAMELYON16 Consortium. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA, 318(22):2199–2210, Dec. 2017. [14] M. Folk, G. Heber, Q. Koziol, E. Pourmal, and D. Robinson. An overview of the HDF5 technology suite and its applications. In Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases, AD '11, pages 36–47, New York, NY, USA, 2011. Association for Computing Machinery. [15] K. Hatch, T. Yu, R. Rafailov, and C. Finn. Example-based offline reinforcement learning without rewards. Proceedings of Machine Learning Research, 144:117, 2022. [16] E. Hazan, S. Kakade, K. Singh, and A. Van Soest. Provably efficient maximum entropy exploration. In International Conference on Machine Learning, pages 2681–2691. PMLR, 2019. [17] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015. [18] Y.-J. Huang. Memory-saving streaming tile rotation algorithm on large scale medical image, 2021. [19] Z. Jie, X. Liang, J. Feng, X. Jin, W. Lu, and S. Yan. Tree-structured reinforcement learning for sequential object localization. In Advances in Neural Information Processing Systems, pages 127–135, 2016. [20] A. Katharopoulos and F. Fleuret. Processing megapixel images with deep attention-sampling models. In International Conference on Machine Learning, pages 3282–3291. PMLR, 2019. [21] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization, 2017. [22] F. Kong and R. Henao. Efficient classification of very large images with tiny objects. arXiv preprint arXiv:2106.02694, 2021. [23] A. Kumar, K. Subramanian, S. Venkataraman, and A. Akella. Doing more by doing less: How structured partial backpropagation improves deep learning clusters. In Proceedings of the 2nd ACM International Workshop on Distributed Machine Learning, Distributed ML '21, pages 15–21, New York, NY, USA, 2021. Association for Computing Machinery. [24] M. G. Lagoudakis and R. Parr. Reinforcement learning as classification: Leveraging modern classifiers. In Proceedings of the Twentieth International Conference on Machine Learning, ICML '03, pages 424–431. AAAI Press, 2003. [25] E. Lin, Q. Chen, and X. Qi. Deep reinforcement learning for imbalanced classification. Applied Intelligence, 50(8):2488–2502, 2020. [26] M. Y. Lu, D. F. Williamson, T. Y. Chen, R. J. Chen, M. Barbieri, and F. Mahmood. Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering, 5(6):555–570, 2021. [27] P. Micikevicius, S. Narang, J. Alben, G. F. Diamos, E. Elsen, D. García, B. Ginsburg, M. Houston, O. Kuchaiev, G. Venkatesh, and H. Wu. Mixed precision training. CoRR, abs/1710.03740, 2017. [28] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. International Conference on Learning Representations (ICLR), 2017. [29] A. Pedersen, M. Valla, A. M. Bofin, J. P. De Frutos, I. Reinertsen, and E. Smistad. FastPathology: An open-source platform for deep learning-based research and decision support in digital pathology. IEEE Access, 9:58216–58229, 2021. [30] H. Pinckaers, B. van Ginneken, and G. Litjens. Streaming convolutional neural networks for end-to-end learning with multi-megapixel images. arXiv preprint arXiv:1911.04432, 2019. [31] F. B. Schmuck and R. L. Haskin. GPFS: A shared-disk file system for large computing clusters. In FAST, 2002. [32] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. [33] B. Uzkent, C. Yeh, and S. Ermon. Efficient object detection in large images using deep reinforcement learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1824–1833, 2020. [34] M. A. Wiering, H. Van Hasselt, A.-D. Pietersma, and L. Schomaker. Reinforcement learning algorithms for solving classification problems. In 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 91–96. IEEE, 2011. [35] M. Zarella, D. Bowman, F. Aeffner, N. Farahani, A. Xthona, S. Absar, A. Parwani, M. Bui, and D. Hartman. A practical guide to whole slide imaging: A white paper from the Digital Pathology Association. Archives of Pathology & Laboratory Medicine, 143, Oct. 2018. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85832 | - |
dc.description.abstract | 越來越多的研究人員將深度學習技術應用於數位病理學。然而主流的深度學習方法都是為尺寸在 224 x 224 到 600 x 600 像素之間的典型圖片所設計,相較於擁有百億像素的全玻片影像圖 (whole-slide image, WSI) 來說非常小。直接使用一般的深度學習會導致運算效率低落,因為大量的記憶體消耗會超過圖形顯示卡 (GPU) 的容量而無法做批次 (batch) 運算。除此之外,載入一張全玻片影像圖會嚴重拖慢訓練和預測的速度。不只如此,因為位置標註的工作非常耗時,且只能由經驗豐富的病理學家進行,所以大部分的玻片都只有標註一個單一標籤。在全玻片影像圖中,感興趣區域 (region of interest, ROI),例如: 癌症、腫瘤和細菌,通常只佔玻片的一小部分。在本篇論文中,藉著來自顯微鏡操作的靈感,我們提出了一個靈活且有效率的方法來精準找出 ROI 並加速全玻片影像辨識,稱之為 FEZ。FEZ 會先用策略網絡 (policy network, PN) 從一張低解析度全玻片影像圖中檢測出可能的 ROI,並放大該區域再用另一個策略網絡繼續找尋 ROI。最終 FEZ 會載入那些位置最高解析度的小區域圖片 (patch),並以價值網絡 (value network, VN) 來做出預測。實際上,FEZ 的訓練過程十分有效率,因為策略網絡和價值網絡是套用在相對較小的圖片上;同時也只需要載入少數幾張高解析度的小區域圖片,可以大幅減少圖片載入與計算時間。實驗結果顯示,我們的方法相比於多物件訓練 (multiple instance learning, MIL),在 Camelyon16 和 TCGA 的肺癌檢測上分別快了 42 與 22 倍;同時在準確程度上也比 CLAM (一種加速 MIL 的技術) 高了 7.4% 與 8.5%。為了再進一步加速,我們可以將低解析度的全玻片影像圖解壓縮為 HDF5 格式,以增加 3% 的儲存空間為代價再減少 58% 的訓練時間。由於 FEZ 的速度非常快,因此它可以與全玻片訓練或 MIL 結合使用,以滿足各種應用需求。 | zh_TW |
dc.description.abstract | While deep learning technologies are widely used in digital pathology, mainstream deep learning algorithms are designed for typical images ranging from 224 x 224 to 600 x 600 pixels, which are tiny compared to whole-slide images (WSIs) with tens of billions of pixels. Directly applying mainstream deep learning algorithms is inefficient, as the large memory consumption exceeds the capacity of graphics processing units (GPUs) and prohibits batch execution. In addition, loading an ultra-large WSI from storage can substantially slow down training and inference. Furthermore, most WSIs carry only a single slide-level label, as detailed annotation must be done by experienced pathologists and is time-consuming. Meanwhile, the regions of interest (ROIs), e.g., where cancer, tumors, or bacteria reside, usually occupy only a small fraction of a slide. In this thesis, we propose Flexible-and-Efficient Zoom-in (FEZ), a method inspired by microscopy that precisely locates ROIs to accelerate training and inference on ultra-large images. FEZ first examines a low-resolution WSI with a policy network (PN) to select potential ROIs, then zooms into those ROIs at medium resolution and selects finer-grained ROIs with another PN. Finally, FEZ loads the high-resolution patches in the finest ROIs and classifies them with a value network (VN) to make a prediction. In practice, the training process is very efficient because the PNs and the VN are trained on relatively small images in batches on the GPU. Moreover, since FEZ loads only a few selected patches instead of the full high-resolution WSI, it dramatically reduces both the image loading time and the computation time needed to analyze the patches. Experimental results show that the proposed method is 42x and 22x faster than Multiple Instance Learning (MIL) on the Camelyon16 and TCGA/Lung Cancer datasets, respectively, while its prediction accuracy is 7.4% to 8.5% higher than that of Clustering-Constrained Attention Multiple Instance Learning (CLAM), a technique proposed to accelerate MIL. For further acceleration, the low-resolution whole-slide images can be stored uncompressed in the HDF5 format, which reduces the training time by 58% in the Lung Cancer case at the cost of 3% extra storage space. Since FEZ is very fast, it can also be used in conjunction with existing methods such as whole-slide training or MIL to meet application demands. | en |
dc.description.provenance | Made available in DSpace on 2023-03-19T23:25:47Z (GMT). No. of bitstreams: 1 U0001-2202202201204300.pdf: 16346596 bytes, checksum: ae2e7a09a7fcd865087a15a46df094a2 (MD5) Previous issue date: 2022 | en |
dc.description.tableofcontents | Acknowledgements (致謝) v; Abstract (摘要) vii; Abstract ix; Contents xi; List of Figures xiii; List of Tables xv; Chapter 1 Introduction 1; Chapter 2 Related Works 5; 2.1 Training with Whole-Slide Images 5; 2.2 Patch Sampling 6; 2.3 Training with Multiple Resolution Images 7; 2.4 Deep Reinforcement Learning and Computer Vision 8; Chapter 3 Methodology and Design 9; 3.1 The Proposed Architecture 9; 3.2 Hierarchical Search Tree 12; 3.3 Policy Networks and Value Networks 15; 3.4 Deep Reinforcement Learning 15; 3.5 Special Mechanism: Peep 18; 3.6 Additional Optimization for Speed 19; Chapter 4 Evaluation 21; 4.1 Experimental Setup 21; 4.2 Camelyon16 Challenge 22; 4.2.1 Comparison 23; 4.2.2 Profiling 24; 4.3 Lung Cancer Classification 24; 4.3.1 Comparison 26; 4.3.2 Profiling 26; 4.4 Peep Mechanism 27; 4.5 Impact of k 29; 4.6 Trade-off of Using HDF5 29; 4.7 Graphical Visualization 31; 4.7.1 Similarity 31; 4.7.2 Peep Mechanism 32; 4.7.3 Advantages and Disadvantages 32; Chapter 5 Conclusion 39; References 41 | - |
dc.language.iso | en | - |
dc.title | 有效率使用深度強化式學習進行大尺寸圖像分類的靈活多階層式架構 | zh_TW |
dc.title | Flexible Hierarchical Structures for Efficient Classification of Ultra Large Images with Deep Reinforcement Learning | en |
dc.type | Thesis | - |
dc.date.schoolyear | 110-2 | - |
dc.description.degree | Master | - |
dc.contributor.oralexamcommittee | 郭大維;施吉昇;吳毅成;葉肇元 | zh_TW |
dc.contributor.oralexamcommittee | Tei-Wei Kuo;Chi-Sheng Shih;I-Chen Wu; | en |
dc.subject.keyword | 全玻片影像辨識,多解析度訓練,強化式學習,階層式搜尋,策略網絡和價值網絡 | zh_TW |
dc.subject.keyword | whole-slide image classification,training with multiple resolutions,reinforcement learning,hierarchical search,policy network and value network | en |
dc.relation.page | 45 | - |
dc.identifier.doi | 10.6342/NTU202200603 | - |
dc.rights.note | Authorized for release (open access worldwide) | - |
dc.date.accepted | 2022-03-04 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
dc.date.embargo-lift | 2024-03-01 | - |
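The hierarchical coarse-to-fine search described in the abstract can be illustrated with a minimal, self-contained sketch. Everything below is invented for illustration: the "policy network" is replaced by a mean-intensity score and the "value network" by a threshold, whereas the thesis trains small CNNs with deep reinforcement learning; the thesis also keeps the top-k tiles per level rather than a single best tile.

```python
import numpy as np

def split_into_tiles(image, grid):
    """Split a 2-D array into grid x grid tiles, returning (offset, tile) pairs."""
    h, w = image.shape
    th, tw = h // grid, w // grid
    return [((i * th, j * tw), image[i*th:(i+1)*th, j*tw:(j+1)*tw])
            for i in range(grid) for j in range(grid)]

def policy_scores(tiles):
    """Stand-in policy network (PN): score tiles by mean intensity.
    The thesis trains small CNNs with reinforcement learning instead."""
    return np.array([tile.mean() for tile in tiles])

def value_predict(patch, threshold=0.5):
    """Stand-in value network (VN): binary decision on the final patch."""
    return int(patch.mean() > threshold)

def zoom_search(image, levels, grid):
    """Coarse-to-fine search: at each level, split the current region into a
    grid, score the tiles with the PN, and zoom into the best-scoring tile."""
    region, offset = image, (0, 0)
    for _ in range(levels):
        tiles = split_into_tiles(region, grid)
        scores = policy_scores([tile for _, tile in tiles])
        (dy, dx), region = tiles[int(np.argmax(scores))]
        offset = (offset[0] + dy, offset[1] + dx)
    return offset, region

# Synthetic "slide": mostly background with one bright 32 x 32 ROI.
slide = np.zeros((256, 256))
slide[192:224, 64:96] = 1.0
offset, patch = zoom_search(slide, levels=3, grid=4)
label = value_predict(patch)
print(offset, patch.shape, label)  # prints: (192, 64) (4, 4) 1
```

On this toy slide the search narrows a 256 x 256 image down to a 4 x 4 patch in three zoom steps, scoring only 16 tiles per level instead of the full image, which is the intuition behind FEZ's reported speedups on gigapixel slides.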
Appears in Collections: | Department of Computer Science and Information Engineering
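The HDF5 trade-off mentioned in the abstract (spending a little extra disk space so the low-resolution images can be read without per-epoch decompression) might be set up roughly as follows with `h5py`. The file name, dataset name, and chunk shape are illustrative assumptions, not the thesis's actual layout.

```python
import h5py
import numpy as np

# A stand-in low-resolution slide level (in practice decoded once from the
# compressed WSI pyramid).
low_res = np.random.default_rng(0).integers(
    0, 256, size=(512, 512, 3), dtype=np.uint8)

# Omitting the `compression=` argument stores the pixels uncompressed:
# the file is larger on disk, but tile reads during training skip the
# decompression step entirely.
with h5py.File("slide_lowres.h5", "w") as f:
    f.create_dataset("level0", data=low_res, chunks=(128, 128, 3))

# Training-time access: read only the tile the policy network asks for.
with h5py.File("slide_lowres.h5", "r") as f:
    tile = f["level0"][128:256, 128:256, :]
print(tile.shape)  # prints: (128, 128, 3)
```

Chunked storage means each 128 x 128 tile maps to a contiguous block in the file, so random tile access stays cheap even for much larger arrays.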
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-110-2.pdf | 15.96 MB | Adobe PDF | View/Open |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.