Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89920

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李明穗 | zh_TW |
| dc.contributor.advisor | Ming-Sui Lee | en |
| dc.contributor.author | 王檡翔 | zh_TW |
| dc.contributor.author | Ze-Siang Wang | en |
| dc.date.accessioned | 2023-09-22T16:40:56Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-09-22 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-08-11 | - |
| dc.identifier.citation | I. Bello, B. Zoph, A. Vaswani, J. Shlens, and Q. V. Le. Attention augmented convolutional networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3286–3295, 2019.
H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, and M. Wang. Swin-Unet: Unet-like pure transformer for medical image segmentation. In European Conference on Computer Vision, pages 205–218. Springer, 2022.
N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko. End-to-end object detection with transformers. In European Conference on Computer Vision, pages 213–229. Springer, 2020.
J. Chen, Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A. L. Yuille, and Y. Zhou. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306, 2021.
L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.
L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018.
J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2625–2634, 2015.
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2021.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q. V. Le, and H. Adam. Searching for MobileNetV3. In ICCV, 2019.
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.
Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
F. Milletari, N. Navab, and S.-A. Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 565–571. IEEE, 2016.
O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR, 2018.
X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-k. Wong, and W.-c. Woo. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015.
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou. Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning, pages 10347–10357. PMLR, 2021.
H. Touvron, M. Cord, A. Sablayrolles, G. Synnaeve, and H. Jégou. Going deeper with image transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 32–42, 2021.
L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool. Temporal segment networks: Towards good practices for deep action recognition. In European Conference on Computer Vision, pages 20–36. Springer, 2016.
洪商荃. Artificial intelligence-based analysis of hyaluronic acid volume degradation after injection laryngoplasty. 2021. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89920 | - |
| dc.description.abstract | 與他人溝通是我們日常生活中的一項基本能力。 然而,患有聲帶萎縮的人在與他人溝通方面存在困難。值得慶幸的是,一種稱為注射增強的治療方法被創造來解決這種情況並在多年來被證明是有效的,且廣泛應用於許多聲帶疾病。在大多數情況下,醫生會將玻尿酸(Hyaluronic Acid)注射到患者的聲帶中,以改善聲門間隙並幫助聲帶正常閉合。過去,醫生必須從病人的發聲去判斷聲帶恢復情況以及是否需要補充玻尿酸。近來,使用超音波影像來分析玻尿酸在人體內殘留情況和作用位置是可行的。隨著電腦視覺領域的發展,可以使用電腦去幫助醫生追蹤玻尿酸在人體中降解作用以及估算出玻尿酸殘留體積。儘管基於CNN的模型在圖像分割任務中取得了優異的性能,但由於卷積運算的局部性,使得它們仍然無法學習全局和遠程信息。 此外,當前大多數分割模型只關注分割任務中的空間特徵,忽略時間特徵。然而,時間特徵對於醫生推斷玻尿酸體積也很重要。 因此,我們認為時間信息對於模型正確預測玻尿酸也是很重要的。在本研究中,我們提出了 AFTNet(注意力特徵時間網絡),其中包含基於注意力機制的特徵提取器和時間模組。借助基於注意力的特徵提取器和時間模組,我們的模型不僅可以更有效的學習全局和遠程信息,還可以更好地學習目標影片的時間特徵。我們將此模型應用於我們提出的患者喉嚨數據集,不僅能協助醫生解決難以判斷的鈣化以及雜訊案例,其性能優於基於 CNN 的模型和基於 Transformer 的模型。 | zh_TW |
| dc.description.abstract | Communicating with other people is a basic ability in our daily life. However, those suffering from vocal cord atrophy have trouble communicating with others. Thankfully, a treatment called injection laryngoplasty was developed to address this condition; it has proved effective over the years and is widely applied to many vocal cord disorders. In most cases, doctors inject hyaluronic acid (HA) into the patient's vocal cords to reduce the glottal gap and help the vocal cords close properly. Previously, doctors had to judge recovery from the patient's voice and decide whether to replenish HA. Recently, it has become feasible to analyze ultrasound image sequences to observe where HA remains and takes effect. With the development of computer vision, computer-assisted methods can help doctors track the degradation of HA and estimate the HA volume remaining in the body. Although CNN-based models have achieved excellent performance on image segmentation tasks, they still cannot learn global, long-range information due to the locality of the convolution operation. Besides, most current segmentation models focus only on spatial features and ignore temporal features. However, temporal features are also important for doctors to infer the HA position, so we believe temporal information is likewise critical for a model to predict the HA position correctly. In this study, we propose AFTNet (Attention Feature Temporal Network), which contains an attention-based feature extractor and a temporal module. With the benefit of these two components, our model learns not only global and long-range dependencies but also the temporal features of the target videos. We apply this model to our proposed Patient Throat Dataset; it not only assists doctors in difficult-to-diagnose calcified and noisy cases but also outperforms both CNN-based and Transformer-based models. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T16:40:56Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-09-22T16:40:56Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 iv
Abstract vi
Contents viii
List of Figures x
List of Tables xii
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 CNN-based Models 5
2.2 Self-attention/Transformer to Complement CNNs 6
2.3 Transformer-based Vision Backbones 8
Chapter 3 Method 10
3.1 System Overview 10
3.2 Data Preprocessing 11
3.3 Model 11
3.3.1 Feature Extractor 12
3.3.2 Temporal Module 13
3.3.3 Refining Module 14
3.4 Data Postprocessing 14
3.5 HA Volume Analysis 15
Chapter 4 Experiments 16
4.1 Datasets 16
4.1.1 Patient Throat Dataset 16
4.1.2 Image Phantom Dataset 17
4.2 Implementation Detail 18
4.3 Comparison with Other Segmentation Models 20
4.4 HA Volume Estimation 22
4.5 Ablation Study 23
4.5.1 Temporal Module 24
4.5.2 Postprocessing 24
Chapter 5 Conclusion 29
References 30 | - |
| dc.language.iso | en | - |
| dc.subject | 卷積神經網路 | zh_TW |
| dc.subject | 注射喉成形術 | zh_TW |
| dc.subject | 循環神經網絡 | zh_TW |
| dc.subject | Transformer | zh_TW |
| dc.subject | 超音波影像分割 | zh_TW |
| dc.subject | Convolutional Neural Network | en |
| dc.subject | Transformer | en |
| dc.subject | Ultrasound Image Segmentation | en |
| dc.subject | Injection Laryngoplasty | en |
| dc.subject | Recurrent Neural Network | en |
| dc.title | 時序模組輔助基於注意力機制特徵提取器用於超音波影像分割 | zh_TW |
| dc.title | Attention-based Feature Extractor with Temporal Module for Ultrasound Image Sequence Segmentation | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | Master | - |
| dc.contributor.oralexamcommittee | 楊佳玲;曾文萱 | zh_TW |
| dc.contributor.oralexamcommittee | Chia-Lin Yang;Wen-Hsuan Tseng | en |
| dc.subject.keyword | 超音波影像分割,Transformer,卷積神經網路,循環神經網絡,注射喉成形術 | zh_TW |
| dc.subject.keyword | Ultrasound Image Segmentation,Transformer,Convolutional Neural Network,Recurrent Neural Network,Injection Laryngoplasty | en |
| dc.relation.page | 33 | - |
| dc.identifier.doi | 10.6342/NTU202302561 | - |
| dc.rights.note | Authorization granted (open access worldwide) | - |
| dc.date.accepted | 2023-08-12 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
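The abstract above pairs two architectural ideas: self-attention over spatial features for global context, and a temporal module that carries information across frames of an ultrasound sequence. The following is a minimal, hypothetical NumPy sketch of how those two pieces compose, not the authors' AFTNet implementation; all names, shapes, and the simple gating scheme are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # tokens: (N, d) spatial tokens from one frame.
    # Every token attends to every other, giving global context
    # that a local convolution cannot capture.
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V           # (N, d)

def temporal_update(h_prev, x, Wh, Wx):
    # A toy gated recurrent step: blend the current frame's
    # features with the history carried from earlier frames.
    z = 1.0 / (1.0 + np.exp(-(x @ Wx + h_prev @ Wh)))  # update gate
    return z * np.tanh(x) + (1.0 - z) * h_prev

rng = np.random.default_rng(0)
T, N, d = 4, 16, 8                                # frames, tokens, channels
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
Wh = rng.standard_normal((d, d)) * 0.1
Wx = rng.standard_normal((d, d)) * 0.1

video = rng.standard_normal((T, N, d))            # stand-in frame features
h = np.zeros((N, d))                              # temporal state
for t in range(T):
    feats = self_attention(video[t], Wq, Wk, Wv)  # global spatial context
    h = temporal_update(h, feats, Wh, Wx)         # fuse with frame history
mask_logits = h.sum(axis=-1)                      # toy per-token score
print(mask_logits.shape)                          # one score per token
```

The point of the composition is that the segmentation score for each spatial token depends on both the whole frame (via attention) and the preceding frames (via the recurrent state), mirroring the abstract's argument that spatial-only models miss temporal cues.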
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-111-2.pdf | 5.5 MB | Adobe PDF | View/Open |

All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
