Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83150
Title: | SASPA:Swinblock與空洞卷積金字塔結合強化乳房腫瘤超音波影像的分割表現 SASPA: The combination of Swinblock and Atrous convolutional pyramid enhances the segmentation performance of breast tumor ultrasound images |
Other Titles: | SASPA: The combination of Swinblock and Atrous convolutional pyramid enhances the segmentation performance of breast tumor ultrasound images |
Authors: | 蔡適陽 Shih-Yang Tsai |
Advisor: | 林文澧 Win-li Lin |
Keyword: | Deep learning, convolutional neural network (CNN), attention mechanism, sliding-window attention, Swin-Transformer |
Publication Year: | 2022 |
Degree: | Master's |
Abstract: | Background and Objective
In computer vision today, deep learning models for image recognition, semantic segmentation, and object detection usually rely on convolutional neural networks (CNNs) as the main technique for feature extraction.
In recent years, as attention mechanisms have matured in computer vision, many models have adopted full attention as their main architecture. Although attention offers a wide receptive field and a small parameter count on images, learning global features and training such models consume substantial computing resources, which greatly raises the barrier to use. Attention is also inherently data-driven: compared with CNNs, it needs larger and higher-quality datasets, which are hard to obtain in medical imaging. This study therefore combines the strengths of CNNs in feature extraction, training speed, and inference speed with a sliding-window (Swin) attention mechanism that strengthens the image feature learning module, and builds a breast ultrasound tumor segmentation model. The goal is to pair attention-based feature extraction with the local feature learning of CNNs and to surpass the CNN-based models commonly used for medical image segmentation.
Materials and Methods
This study proposes a model called SASPA (Swin Atrous Spatial Pyramid Assembly), designed on the basis of the ASPP module of DeeplabV3+ and tailored to breast tumor segmentation in ultrasound images. It improves how features are combined and adds a sliding-window attention module (Swinblock) to raise segmentation performance. Two versions of the architecture are proposed: SASPA-S and SASPA-P. Breast tumor segmentation is the main task. The training and test data come from different sources; through K-fold and dataset splitting, testing is divided into internal and external data tests, which measure how well the model generalizes to data from the same source and from different sources. The study also evaluates SASPA's segmentation performance on colorectal polyp data, both to examine performance on cross-domain data and to explore whether transfer learning from another medical imaging task can improve breast tumor segmentation in ultrasound images.
Dice score, AUC, and model inference speed are used as indicators of model performance, and convergence speed is observed to compare training efficiency with other models.
Results
Across the test data, SASPA achieves the best Dice score for breast tumor segmentation in ultrasound images on both same-source and different-source data. Accuracy generally drops to some degree on data from a different source, and the drop is more pronounced for the CNN-based models than for SASPA. The segmentation examples show that SASPA delineates the details of the segmented region more finely. In terms of training stability, SASPA effectively reduces variability during training: the standard deviation of the Dice score is only 0.08% in the internal data test and only 0.03% in the external data test. This is close to the 0.02% of ResNet34-UNet++ in the external data test and far better than the stability of the other models. The two proposed SASPA structures are also compared; they offer, respectively, higher Dice score segmentation performance and higher stability. In the experiment that optimizes SASPA with medical image transfer learning, the Dice score improved by 0.34% compared with the configuration without medical image transfer learning.
In the ablation test, we compared the effect of placing Swinblock at different positions. Placing Swinblock after the encoder output features with the largest spatial scale (16 × 16) and the smallest number of channels (128) performs best. At small spatial scales, using Swinblock for feature enhancement does not yield an effective performance gain and only adds computation.
The contributions of this research are as follows:
1. The encoder feature output structure is improved on the basis of the DeeplabV3+ architecture, and SAS Core is proposed. The original ASPP feature combination method is improved and combined with Swinblock, and different ways of linking it to the decoder are explored. The resulting tumor segmentation surpasses CNN-based and Transformer-based models, and the model is significantly more stable than the other models.
2. Using Swinblock for feature enhancement effectively improves the performance and generalization of CNN-based models, and the improvement is larger when it is applied to feature maps with a larger spatial scale.
3. When SASPA-S (the series structure) uses transfer learning from medical images, transferring from the colorectal polyp segmentation task to the breast tumor ultrasound segmentation task improves model performance further than transferring from a general image segmentation task. |
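For readers unfamiliar with the metric, the Dice score and its spread across repeated runs, as reported in the abstract, can be sketched as follows. This is a minimal illustration and not the thesis's evaluation code: the function names `dice_score` and `summarize_scores`, the nested-list mask format, the convention for two empty masks, and the use of the population standard deviation are all assumptions made for this example.

```python
from statistics import mean, pstdev


def dice_score(pred_mask, true_mask):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two binary masks.

    Masks are nested lists of 0/1 values (hypothetical format for this sketch).
    """
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels marked as tumor in both the prediction and the ground truth.
    intersection = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


def summarize_scores(scores):
    """Return (mean, population standard deviation) of per-run Dice scores."""
    return mean(scores), pstdev(scores)
```

A smaller standard deviation across folds or runs corresponds to the "stability" discussed in the Results; whether the thesis uses the population or the sample standard deviation is not stated in the abstract.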
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83150 |
DOI: | 10.6342/NTU202210098 |
Fulltext Rights: | Authorization granted (public access limited to campus) |
Appears in Collections: | Institute of Biomedical Engineering (醫學工程學研究所) |
Files in This Item:
File | Access | Size | Format |
---|---|---|---|
U0001-0340221201308043.pdf | Access limited to NTU IP range | 3.09 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.