Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90542
Title: | 基於深度學習之多模態自發語言早期認知障礙檢測系統 (Multi-modal Early Cognitive Impairment Detection System for Spontaneous Speech using Deep Learning) |
Author: | 張禾姈 Ho-Ling Chang |
Advisor: | 傅立成 Li-Chen Fu |
Keywords: | Multi-modal learning, Mild cognitive impairment, Spontaneous speech, Longitudinal analysis, Screening system |
Publication Year: | 2023 |
Degree: | Master's |
Abstract: | With the increasing global elderly population, healthcare systems face the growing burden of addressing the rising number of individuals affected by Alzheimer's disease. Given the significant demand for treatment and early diagnosis, extensive research has been conducted on cognitive impairment screening systems to assist healthcare professionals in accurately diagnosing Alzheimer's disease. This thesis proposes a multi-modal early cognitive impairment detection system that leverages automatically extracted pre-defined acoustic features and self-designed embeddings to enhance linguistic representation. The proposed system uses spontaneous, unstructured speech data from autobiographical memory (AM) tests, which serve as neuropsychological assessments for evaluating individuals' cognitive states. In particular, our focus lies in detecting mild cognitive impairment (MCI), the intermediate stage between healthy individuals and those with Alzheimer's disease (AD). By addressing MCI detection, we aim to facilitate early treatment intervention. Given the subtle symptoms exhibited by individuals with MCI, integrating multi-modal data can effectively enrich features and aid model learning.
Considering the unstructured and implicit nature of spontaneous speech, we introduce two additional embeddings, namely a speaker embedding and a conversation embedding, to augment the information available for model learning. To assess the efficacy of the proposed approach, we conducted experiments on a Chinese dataset, attaining an average accuracy of 78%. Moreover, we conducted a set of ablation studies to evaluate the individual contribution of each module in our system. Furthermore, we extend our investigation to the longitudinal analysis of MCI detection using unstructured speech data from AM tests, a research area that has yet to be extensively explored. To facilitate longitudinal analysis, we design a system incorporating a direction encoder that learns temporal information between different visits. This approach yields an accuracy improvement of 3% on the subset of the dataset comprising subjects with at least two visits. |
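As a rough illustration of the embedding-augmentation idea described in the abstract — not the thesis's actual implementation — the sketch below adds speaker and conversation embeddings to a token embedding, and computes a simple difference-based "direction" feature between two visit representations. The table sizes, embedding dimension, additive fusion rule, and difference-based direction encoder are all illustrative assumptions:

```python
import random

EMB_DIM = 8  # illustrative embedding size, not the thesis's setting


def make_embedding_table(vocab_size: int, dim: int = EMB_DIM):
    """Randomly initialized lookup table: one vector per id."""
    rng = random.Random(0)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


# Hypothetical tables: word pieces, speakers, and conversation/session ids.
token_table = make_embedding_table(vocab_size=100)
speaker_table = make_embedding_table(vocab_size=2)       # e.g. 0 = subject, 1 = interviewer
conversation_table = make_embedding_table(vocab_size=5)  # e.g. session id within a visit


def augmented_embedding(token_id: int, speaker_id: int, conv_id: int):
    """Element-wise sum of token, speaker, and conversation vectors (additive fusion)."""
    return [t + s + c for t, s, c in zip(token_table[token_id],
                                         speaker_table[speaker_id],
                                         conversation_table[conv_id])]


def direction_feature(visit_a, visit_b):
    """Illustrative temporal 'direction' between two visit representations: their difference."""
    return [b - a for a, b in zip(visit_a, visit_b)]


vec = augmented_embedding(token_id=7, speaker_id=0, conv_id=2)
print(len(vec))  # → 8, the shared embedding dimensionality
```

In this sketch the three vectors must share one dimensionality so they can be summed, mirroring how auxiliary embeddings are typically added to token embeddings before being fed to a sequence encoder; the real system's fusion and encoder design are described in the thesis itself.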
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90542 |
DOI: | 10.6342/NTU202302644 |
Full-text authorization: | Authorized (campus access only) |
Appears in collections: | Graduate Institute of Networking and Multimedia |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (currently not authorized for public access) | 8.97 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.