Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84845
Title: Multimodal Assessment of Schizophrenia Symptoms Severity from Linguistic, Acoustic and Visual Cues (透過語言、語音和視覺之多模態線索評估思覺失調之症狀嚴重程度)
Author: Chih-Yuan Chuang (莊志淵)
Advisor: Li-Chen Fu (傅立成)
Keywords: Schizophrenia, Deep learning, Large pre-trained model, Second-stage pre-training, Multimodal fusion, Parameter-efficient fine-tuning
Publication Year: 2022
Degree: Master
Abstract: Schizophrenia is a severe mental illness that affects a person's thoughts and feelings and causes a range of emotional, communicative, and behavioral abnormalities. Assessing a patient requires a professionally trained psychiatrist to interview the patient, observe these signs, and rate the severity of each symptom, which serves as the basis for diagnosis and for judging treatment effectiveness. Because this assessment depends on lengthy interviews conducted by trained clinicians, it places a heavy burden on medical resources, and an automatic symptom-severity rating system for schizophrenia is therefore urgently needed.

In this study, we propose a multimodal deep neural network that rates the symptom severity of schizophrenia patients based on the linguistic (semantic and syntactic), acoustic, and visual cues observed in psychiatric interviews. The proposed model consists of four large pre-trained transformer-based unimodal backbone networks and a multimodal fusion framework. First, second-stage pre-training is conducted on each unimodal pre-trained model so that it learns the patterns of schizophrenia interview data and extracts the desired features. Next, the pre-trained parameters are frozen and lightweight trainable modules are inserted, reducing the number of parameters that must be fine-tuned. Finally, the four adapted pre-trained backbones are fused into the final assessment model through the proposed multimodal fusion framework. Given the textual transcription, audio, and video recording of an interview, the model is fine-tuned in a parameter-efficient manner and outputs the patient's symptom-severity ratings.

To validate the model, we conduct complete experiments on our collected schizophrenia interview dataset, predicting the TLC and PANSS scales commonly used to assess schizophrenia. The model achieves 0.534 MAE and 0.685 MSE, outperforming related works and confirming the complementary effect of the linguistic, acoustic, and visual modalities; in the future, it can serve as an auxiliary tool for diagnosing schizophrenia.
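The two mechanisms the abstract summarizes, inserting lightweight trainable modules into frozen pre-trained backbones and fusing the adapted unimodal features into a severity regressor, can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration only: the module names, bottleneck size, feature dimensions, and number of predicted rating items are hypothetical and do not reflect the thesis's actual implementation.

```python
# A minimal sketch of (1) parameter-efficient fine-tuning via residual
# bottleneck adapters inside frozen backbones and (2) late fusion of
# per-modality features into a symptom-severity regressor.
# All names and sizes are illustrative assumptions, not the thesis's code.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Lightweight bottleneck module trained while the backbone stays frozen."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the frozen backbone's features.
        return x + self.up(self.act(self.down(x)))


class FusionRegressor(nn.Module):
    """Concatenates adapted per-modality embeddings and regresses severity."""

    def __init__(self, dims: list[int], num_items: int):
        super().__init__()
        self.adapters = nn.ModuleList(Adapter(d) for d in dims)
        self.head = nn.Sequential(
            nn.Linear(sum(dims), 256), nn.GELU(), nn.Linear(256, num_items)
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        adapted = [a(f) for a, f in zip(self.adapters, feats)]
        return self.head(torch.cat(adapted, dim=-1))


# In practice, each pre-trained backbone would be frozen first, e.g.:
#   for p in backbone.parameters():
#       p.requires_grad = False
# Hypothetical usage: four backbones (semantic, syntactic, acoustic, visual)
# each yield a pooled embedding; only the adapters and the head are trained.
dims = [768, 768, 1024, 768]                 # assumed per-modality feature sizes
model = FusionRegressor(dims, num_items=8)   # number of rating items is assumed
feats = [torch.randn(2, d) for d in dims]    # batch of 2 interview samples
scores = model(feats)                        # shape: (2, 8) severity ratings
```

The residual bottleneck adapter shown here is the standard parameter-efficient fine-tuning pattern the abstract alludes to; the fusion mechanism actually proposed in the thesis may differ substantially from this simple concatenation head.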
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84845 |
DOI: | 10.6342/NTU202202479 |
Full-Text Permission: Authorized (campus access only)
Electronic Full-Text Release Date: 2025-08-24
Appears in Collections: Department of Electrical Engineering
Files in This Item:

File | Size | Format
---|---|---
ntu-110-2.pdf (not authorized for public access) | 7.14 MB | Adobe PDF
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.