Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85755

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳信希(Hsin-Hsi Chen) | |
| dc.contributor.author | Hsin-Yu Tsai | en |
| dc.contributor.author | 蔡欣妤 | zh_TW |
| dc.date.accessioned | 2023-03-19T23:23:24Z | - |
| dc.date.copyright | 2022-07-05 | |
| dc.date.issued | 2022 | |
| dc.date.submitted | 2022-05-20 | |
| dc.identifier.citation | Kundan Krishna, Sopan Khosla, Jeffrey P Bigham, and Zachary C Lipton. Generating soap notes from doctor-patient conversations using modular summarization techniques. arXiv preprint arXiv:2005.01795, 2020. Jessica Lopez. Automatic summarization of medical conversations, a review. In TALN-RECITAL 2019-PFIA 2019, pages 487–498. ATALA, 2019. Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. mt5: A massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934, 2020. Marti A Hearst. Texttiling: A quantitative approach to discourse segmentation. Technical report, Citeseer, 1993. Freddy YY Choi. Advances in domain independent linear text segmentation. arXiv preprint cs/0003083, 2000. Matthew Purver, Konrad P Körding, Thomas L Griffiths, and Joshua B Tenenbaum. Unsupervised topic modelling for multi-party spoken discourse. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 17–24, 2006. Linzi Xing and Giuseppe Carenini. Improving unsupervised dialogue topic segmentation with utterance-pair coherence scoring. arXiv preprint arXiv:2106.06719, 2021. Ryuichi Takanobu, Minlie Huang, Zhongzhou Zhao, Feng-Lin Li, Haiqing Chen, Xiaoyan Zhu, and Liqiang Nie. A weakly supervised method for topic segmentation and labeling in goal-oriented dialogues via reinforcement learning. In IJCAI, pages 4403–4410, 2018. Omri Koshorek, Adir Cohen, Noam Mor, Michael Rotman, and Jonathan Berant. Text segmentation as a supervised learning task. arXiv preprint arXiv:1803.09337, 2018. Leilan Zhang and Qiang Zhou. Topic segmentation for dialogue stream. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pages 1036–1043. IEEE, 2019. Marti A Hearst. Text tiling: Segmenting text into multi-paragraph subtopic passages. 
Computational linguistics, 23(1):33–64, 1997. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017. Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461, 2019. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683, 2019. Yunfan Shao, Zhichao Geng, Yitao Liu, Junqi Dai, Fei Yang, Li Zhe, Hujun Bao, and Xipeng Qiu. Cpt: A pre-trained unbalanced transformer for both chinese language understanding and generation. arXiv preprint arXiv:2109.05729, 2021. Jianlin Su. T5 pegasus - zhuiyiai. Technical report, 2021. https://github.com/ZhuiyiTechnology/t5-pegasus Xinyuan Zhang, Ruiyi Zhang, Manzil Zaheer, and Amr Ahmed. Unsupervised abstractive dialogue summarization for tete-a-tetes. arXiv preprint arXiv:2009.06851, 2020. Yicheng Zou, Bolin Zhu, Xingwu Hu, Tao Gui, and Qi Zhang. Low-resource dialogue summarization with domain-agnostic multi-source pretraining. arXiv preprint arXiv:2109.04080, 2021. Jiaao Chen and Diyi Yang. Multi-view sequence-to-sequence models with conversational structure for abstractive dialogue summarization. arXiv preprint arXiv:2010.01672, 2020. Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using Siamese bert-networks. arXiv preprint arXiv:1908.10084, 2019. 
Yan Song, Yuanhe Tian, Nan Wang, and Fei Xia. Summarizing medical conversations via identifying important utterances. In Proceedings of the 28th International Conference on Computational Linguistics, pages 717–729, 2020. Guangtao Zeng, Wenmian Yang, Zeqian Ju, Yue Yang, Sicheng Wang, Ruisi Zhang, Meng Zhou, Jiaqi Zeng, Xiangyu Dong, Ruoyu Zhang, et al. Meddialog: A large-scale medical dialogue dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9241–9250, 2020. Abigail See, Peter J Liu, and Christopher D Manning. Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368, 2017. Wen-wai Yim and Meliha Yetisgen-Yildiz. Towards automating medical scribing: Clinic visit dialogue2note sentence alignment and snippet summarization. In Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations, pages 10–20, 2021. Steven Bird, Ewan Klein, and Edward Loper. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc., 2009. Wenge Liu, Jianheng Tang, Jinghui Qin, Lin Xu, Zhen Li, and Xiaodan Liang. Meddg: A large-scale medical consultation dataset for building medical dialogue system. arXiv preprint arXiv:2010.07497, 2020. 衛生福利部 (Ministry of Health and Welfare). 台灣 e 院 (Taiwan e-Hospital), 2002. https://sp1.hso.mohw.gov.tw/doctor/Index1.php, accessed May 14, 2020. KingNet 國家網路醫藥. 醫學辭典 (Medical Dictionary), 2020. https://www.kingnet.com.tw/diagnose/, accessed January 4, 2022. Olivier Bodenreider. The unified medical language system (umls): integrating biomedical terminology. Nucleic acids research, 32(suppl_1):D267–D270, 2004. Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, and Guoping Hu. Revisiting pre-trained models for Chinese natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 657–668, Online, November 2020. Association for Computational Linguistics. 
Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, Joe Davison, Mario Šaško, Gunjan Chhablani, Bhavitvya Malik, Simon Brandeis, Teven Le Scao, Victor Sanh, Canwen Xu, Nicolas Patry, Angelina McMillan-Major, Philipp Schmid, Sylvain Gugger, Clément Delangue, Théo Matussière, Lysandre Debut, Stas Bekman, Pierric Cistac, Thibault Goehringer, Victor Mustar, François Lagunas, Alexander Rush, and Thomas Wolf. Datasets: A community library for natural language processing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 175–184, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain, July 2004. Association for Computational Linguistics. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318, 2002. Tianyi Zhang*, Varsha Kishore*, Felix Wu*, Kilian Q. Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert. In International Conference on Learning Representations, 2020. Chin-Yew Lin and Franz Josef Och. ORANGE: a method for evaluating automatic evaluation metrics for machine translation. In COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, pages 501–507, Geneva, Switzerland, aug 23–aug 27 2004. COLING. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85755 | - |
| dc.description.abstract | 現今,人工智慧在醫療領域的應用隨著醫療需求的增加也越加受到重視。在各種醫療行為中,門診對話是大多數患者在尋求醫療幫助時都會經歷的過程。由於患者隱私問題,門診對話和患者醫療記錄的收集受到許多限制。此外,研究門診對話的研究人員通常無法公開他們的數據集。因此,以往的工作大多以在線醫療社區的諮詢對話作為研究資料,但這些諮詢對話與門診對話仍有不小的差別。我們與台大醫院(NTUH)家庭醫學部合作,為研究獲取了一些關於門診對話和患者醫療記錄的數據。我們使用基於Transformer的模型進行自動化的門診對話摘要。在訓練的過程中,我們引入了很多外部的醫學數據集來幫助模型學習醫學的術語和知識。由於我們提出的方法是透過分段對話再進行摘要,因此模型可以處理比較長的門診對話。此外,我們還使用NTUH的資料集來訓練文本風格轉換模型來模仿醫師做的醫療筆記。實驗結果顯示我們方法生成的門診對話摘要具有一定的參考價值。 | zh_TW |
| dc.description.abstract | Nowadays, applications of artificial intelligence in the medical field are receiving increasing attention as the demand for medical care grows. Among various medical practices, the outpatient conversation is a process that most patients go through when seeking medical assistance. Due to patient privacy concerns, the collection of outpatient conversations and patient medical records is subject to many restrictions. Furthermore, researchers studying outpatient conversations are often unable to release their datasets publicly. Therefore, most previous work used consultation dialogues from online medical communities as research material, but such consultations still differ considerably from outpatient conversations. We collaborated with the Department of Family Medicine of National Taiwan University Hospital (NTUH) to obtain outpatient conversations and patient medical records for this study. We use Transformer-based models for automatic summarization of outpatient conversations. During training, we introduce several external medical datasets to help the model learn medical terminology and knowledge. Since our method first segments a dialogue and then summarizes each segment, the model can handle relatively long outpatient conversations. Additionally, we use the NTUH dataset to train a writing-style conversion model that mimics the medical notes written by physicians. Experimental results show that the outpatient conversation summaries generated by our method are of practical reference value. | en |
| dc.description.provenance | Made available in DSpace on 2023-03-19T23:23:24Z (GMT). No. of bitstreams: 1 U0001-2204202222573600.pdf: 9361734 bytes, checksum: 5c62d62ed98293815fe02f43eb6d1251 (MD5) Previous issue date: 2022 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i Acknowledgements ii 摘要 iii Abstract iv Contents vi List of Figures x List of Tables xi Chapter 1 Introduction 1 1.1 Background 1 1.2 Motivation 2 1.3 Thesis Organization 6 Chapter 2 Related Work 7 2.1 Topic Segmentation 7 2.1.1 Overview 7 2.1.2 TextTiling 8 2.1.3 Dialogue Topic Segmentation 9 2.2 Text Summarization 11 2.2.1 Overview 11 2.2.2 The Multilingual T5 (mT5) Model 11 2.2.3 Multiview Dialogue 12 2.3 Dialogue in the Medical Domain 13 2.3.1 Overview 13 2.3.2 HETMC 13 2.3.3 MedDialog 14 2.3.4 CLUSTER2SENT 14 2.3.5 Dialogue2Note 15 Chapter 3 Methodology 16 3.1 Overview 16 3.1.1 Task Definition 16 3.1.2 Overall Framework 17 3.2 Dialogue Segmentation 18 3.2.1 BERT-based Coherence Scoring Model 18 3.2.2 Boundary Setting for Segmentation 21 3.3 Summarization 23 3.3.1 Dialogue Summarization 23 3.3.2 Writing Style Conversion 26 Chapter 4 Experiments 27 4.1 Datasets 27 4.1.1 NTUH Dataset 27 4.1.2 HETMC Dataset 28 4.1.3 MedDG Dataset 30 4.1.4 The tweH Dataset 30 4.1.5 KingNet Dataset 31 4.1.6 The UMLS Dataset 31 4.2 Experimental Setup 32 4.2.1 Training BERT-based Coherence Scoring Model 32 4.2.2 Fine-tuning the mT5 Pretrained Model 33 4.2.3 Fine-tuning Writing Style Conversion Model 34 4.3 Evaluation Metrics 35 4.3.1 Evaluation of the Coherence Scoring Model 35 4.3.2 Evaluation of mT5 Models 35 4.4 Results 36 4.4.1 Coherence Scoring Model over HETMC[P] Data 36 4.4.2 Fine-tuning mT5 Model with HETMC[S] Data 37 4.4.3 Fine-tuning mT5 Model with tweH Data 37 4.4.4 Medical Dialogue Summarization on NTUH Dataset 38 4.4.4.1 Using the A1 Model for Dialogue Summarization 39 4.4.4.2 Using the A2 Model for Dialogue Summarization 40 4.4.4.3 Using the A3 Model for Dialogue Summarization 42 4.4.4.4 Using the AB Model for Dialogue Summarization 46 Chapter 5 Discussion 49 5.1 Discussion on Experiment Results of Our Methods 49 5.2 Effects of Our Method 51 5.2.1 The Effect of Dialogue Segmentation 51 
5.2.2 The Effect of mT5 Dialogue Summarization Model 54 5.2.3 The Effect of Domain Adaptation 54 5.2.4 The Effect of Input Prefix 56 5.3 Case Study on NTUH Dataset 58 Chapter 6 Conclusion & Future Work 63 6.1 Conclusion 63 6.2 Future Work 64 References 66 | |
| dc.language.iso | zh-TW | |
| dc.subject | 對話總結 | zh_TW |
| dc.subject | 醫療 | zh_TW |
| dc.subject | 門診對話 | zh_TW |
| dc.subject | Dialogue Summarization | en |
| dc.subject | Medical | en |
| dc.subject | Outpatient Conversation | en |
| dc.title | 門診談話的病史總結 | zh_TW |
| dc.title | Patient History Summarization on Outpatient Conversation | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 110-2 | |
| dc.description.degree | 碩士 (Master's) | |
| dc.contributor.oralexamcommittee | 鄭卜壬(Pu-Jen Cheng),蘇家玉(Chia-Yu Su),黃瀚萱(Hen-Hsen Huang) | |
| dc.subject.keyword | 醫療,門診對話,對話總結, | zh_TW |
| dc.subject.keyword | Medical,Outpatient Conversation,Dialogue Summarization, | en |
| dc.relation.page | 71 | |
| dc.identifier.doi | 10.6342/NTU202200718 | |
| dc.rights.note | Authorization granted (open access worldwide) | |
| dc.date.accepted | 2022-05-20 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| dc.date.embargo-lift | 2022-07-05 | - |
| Appears in Collections: | 資訊工程學系 | |
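The abstract above notes that the proposed method segments a long outpatient dialogue before summarizing it, and the table of contents lists a BERT-based coherence scoring model (§3.2.1) and boundary setting (§3.2.2), with TextTiling covered in related work. A minimal illustrative sketch only, assuming coherence scores between adjacent utterances have already been produced by such a scorer; the TextTiling-style depth cutoff and all function names here are hypothetical, not taken from the thesis:

```python
from statistics import mean, stdev

def set_boundaries(coherence, threshold_k=0.5):
    """Pick segment boundaries at gaps whose "depth" (how far the
    coherence dips below its neighboring peaks) is unusually large.

    coherence[i] scores how well utterance i coheres with utterance
    i + 1; a low score suggests a topic boundary after utterance i.
    """
    depths = []
    for i, score in enumerate(coherence):
        left_peak = max(coherence[:i + 1])   # highest score at or before gap i
        right_peak = max(coherence[i:])      # highest score at or after gap i
        depths.append((left_peak - score) + (right_peak - score))
    # TextTiling-style cutoff: mean depth plus k standard deviations.
    cutoff = mean(depths) + threshold_k * stdev(depths)
    return [i for i, d in enumerate(depths) if d > cutoff]

def split_dialogue(utterances, boundaries):
    """Split the utterance list into segments at the chosen gaps."""
    segments, start = [], 0
    for b in boundaries:
        segments.append(utterances[start:b + 1])
        start = b + 1
    segments.append(utterances[start:])
    return segments
```

For example, a sharp dip in the coherence sequence `[0.9, 0.8, 0.1, 0.85, 0.9]` yields a single boundary after the third utterance, splitting a six-utterance dialogue into two segments.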
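The table of contents also lists fine-tuning an mT5 model for per-segment summarization (§4.2.2) and analyzing the effect of an input prefix (§5.2.4). A rough sketch of how segment inputs might be assembled and the partial summaries joined; `summarize_fn` is a stand-in for the fine-tuned seq2seq model, and the exact prefix string is an assumption for illustration:

```python
def summarize_dialogue(segments, summarize_fn, prefix="summarize: "):
    """Summarize each dialogue segment independently, then join the
    per-segment summaries into one patient-history summary.

    summarize_fn stands in for a fine-tuned seq2seq model; `prefix`
    mirrors a T5-style task prefix, though the exact string used in
    the thesis is not reproduced here.
    """
    partial_summaries = []
    for segment in segments:
        # Flatten the segment's utterances into one model input string.
        model_input = prefix + " ".join(segment)
        partial_summaries.append(summarize_fn(model_input))
    return "".join(partial_summaries)
```

With a real model, `summarize_fn` would tokenize the input, call the model's generation routine, and decode the output; segmenting first is what lets each model call stay within the encoder's input-length limit on long outpatient conversations.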
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| U0001-2204202222573600.pdf | 9.14 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
