Flexible Hierarchical Structures for Efficient Classification of Ultra Large Images with Deep Reinforcement Learning

呂承翰; Cheng-Han Lu

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85832

Title:	Flexible Hierarchical Structures for Efficient Classification of Ultra Large Images with Deep Reinforcement Learning (original title: 有效率使用深度強化式學習進行大尺寸圖像分類的靈活多階層式架構)
Authors:	呂承翰 Cheng-Han Lu
Advisor:	洪士灝 Shih-Hao Hung
Keyword:	whole-slide image classification, training with multiple resolutions, reinforcement learning, hierarchical search, policy network and value network
Publication Year:	2022
Degree:	Master's
Abstract:	Deep learning is increasingly applied to digital pathology, but mainstream deep learning algorithms are designed for typical images of 224 x 224 to 600 x 600 pixels, which are tiny compared with whole-slide images (WSIs) of tens of billions of pixels. Applying these algorithms directly is inefficient: the memory consumption exceeds the capacity of a graphics processing unit (GPU) and rules out batch execution. Loading an ultra-large WSI from storage also substantially slows training and inference. Furthermore, because location-level annotation is time-consuming and must be done by experienced pathologists, most WSIs carry only a single slide-level label, while the regions of interest (ROIs), e.g., where cancer, tumors, or bacteria reside, usually occupy only a small part of the slide. In this thesis, we propose Flexible-and-Efficient Zoom-in (FEZ), a method inspired by the way a microscope is operated, which precisely locates ROIs to accelerate training and inference on ultra-large images.
FEZ first examines a low-resolution WSI with a policy network (PN) to select potential ROIs, then zooms in on those ROIs at medium resolution and uses a second PN to select finer-grained ROIs. Finally, FEZ loads the high-resolution patches within the finest ROIs and classifies them with a value network (VN) to make a prediction. Training is efficient because the PNs and the VN operate on relatively small images that can be batched on the GPU. Because FEZ loads only a few selected patches rather than the full high-resolution WSI, it also greatly reduces the image loading time and the computation needed to analyze the patches. Experimental results show that FEZ is 42x and 22x faster than Multiple Instance Learning (MIL) on the Camelyon16 and TCGA lung cancer datasets, respectively. At the same time, its prediction accuracy is 7.4% and 8.5% higher than that of Clustering-Constrained Attention Multiple Instance Learning (CLAM), a technique proposed to accelerate MIL. For further acceleration, the low-resolution WSIs can be decompressed into the HDF5 format, which reduces the training time by 58% on the lung cancer dataset at the cost of 3% extra storage space. Because FEZ is very fast, it can also be combined with existing methods such as whole-slide training or MIL to meet different application demands.
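The coarse-to-fine search described in the abstract can be illustrated with a toy sketch. This is not the thesis implementation: the policy networks are replaced by a simple top-k selection over cell intensities, the value network by a per-patch score with max aggregation, and the three resolution levels are simulated by average pooling. All function names and parameters (`downsample`, `top_k_cells`, `children`, `zoom_in_classify`, `factor`, `k`, `threshold`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]
Cell = Tuple[int, int]  # (row, col) index of a cell at some resolution level


def downsample(grid: Grid, factor: int) -> Grid:
    """Average-pool a square grid by `factor` to simulate a lower-resolution level."""
    n = len(grid) // factor
    return [
        [
            sum(grid[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(n)
        ]
        for r in range(n)
    ]


def top_k_cells(scores: Grid, candidates: List[Cell], k: int) -> List[Cell]:
    """Assumed stand-in for a policy network: keep the k highest-scoring candidates."""
    return sorted(candidates, key=lambda rc: scores[rc[0]][rc[1]], reverse=True)[:k]


def children(cell: Cell, factor: int) -> List[Cell]:
    """Cells at the next finer level that are covered by `cell`."""
    r, c = cell
    return [(r * factor + i, c * factor + j)
            for i in range(factor) for j in range(factor)]


def zoom_in_classify(high: Grid, factor: int = 2, k: int = 2,
                     threshold: float = 0.5) -> Tuple[bool, List[Cell]]:
    """Low-res selection, medium-res refinement, then high-res patch scoring."""
    mid = downsample(high, factor)
    low = downsample(mid, factor)

    # Stage 1: choose candidate ROIs on the low-resolution level.
    all_low = [(r, c) for r in range(len(low)) for c in range(len(low))]
    low_rois = top_k_cells(low, all_low, k)

    # Stage 2: zoom in and refine only inside the selected regions.
    mid_candidates = [ch for cell in low_rois for ch in children(cell, factor)]
    mid_rois = top_k_cells(mid, mid_candidates, k)

    # Stage 3: visit only the high-resolution patches inside the finest ROIs.
    patches = [ch for cell in mid_rois for ch in children(cell, factor)]

    # Assumed stand-in for the value network: the raw patch value is its score,
    # and the slide-level score is the maximum over the visited patches.
    slide_score = max(high[r][c] for r, c in patches)
    return slide_score >= threshold, patches


# Usage: a 16 x 16 "slide" with a single positive patch at row 5, column 9.
slide = [[0.0] * 16 for _ in range(16)]
slide[5][9] = 1.0
is_positive, patches = zoom_in_classify(slide, factor=2, k=2)
print(is_positive, len(patches), "of", 16 * 16, "patches visited")
```

In this toy setting only a handful of the high-resolution patches are ever read, which mirrors the source of the loading and computation savings the abstract reports; the actual speedups depend on the real networks and data.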
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85832
DOI:	10.6342/NTU202200603
Fulltext Rights:	Authorization granted (open access worldwide)
Embargo Lift Date:	2024-03-01
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-110-2.pdf	15.96 MB	Adobe PDF
