Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99662

| Title: | TiGRU: A Dual-Modal Longitudinal Model for MCI Detection Using Autobiographical Memory |
| Author: | Ting-Yun Liao (廖庭筠) |
| Advisor: | Li-Chen Fu (傅立成) |
| Keywords: | Dual-modal learning, Mild cognitive impairment, Longitudinal analysis, Irregular time series modeling, Unstructured spontaneous speech, Cognitive classification task, Screening system |
| Publication Year: | 2025 |
| Degree: | Master |
| Abstract: | As populations around the world grow older, dementia has become increasingly common, placing mounting pressure on both public health and medical infrastructure. Mild Cognitive Impairment (MCI), an intermediate stage between normal aging and dementia, represents a critical window for early detection and intervention. However, conventional diagnostic tools such as MRI and biomarkers are costly and invasive, limiting their scalability to large populations. Recent studies suggest that autobiographical memory (AM) speech may serve as a non-invasive early indicator of cognitive decline. In this study, we propose a temporal-aware dual-modal longitudinal framework for MCI detection that integrates acoustic and linguistic features from participants' AM speech collected across multiple visits. To address the challenges of unstructured speech, modality misalignment, and temporal modeling, we design a Cross-Visit Encoder and a novel sequential model, TiGRU (Temporal-infused GRU). This architecture aligns the data collected at the current and previous visits via cross attention, captures cognitive shifts, and feeds time-interval embeddings into the TiGRU to enhance sensitivity to irregular visit spacing and nonlinear cognitive changes. Weakly aligned acoustic and text inputs from the speech are processed through wav2vec2, OpenSMILE, multilingual-e5, and lexical feature extractors, and are then fused via Bidirectional Cross Attention with residual connections for robust multimodal integration. Finally, the model predicts MCI status at the last visit by leveraging information across all visits, enabling early detection and longitudinal tracking of cognitive decline. Experiments on the NTU-AM dataset demonstrate that our model achieves F1-scores of 87% and 88%, and AUROCs of 90% and 95%, on the recall and probing data respectively, outperforming traditional GRU-based and temporal-agnostic baselines. Ablation experiments further validate how each proposed component contributes to the model's overall performance. Our results highlight the potential of combining temporal modeling and multimodal learning to effectively capture long-term cognitive shifts and nonlinear cognitive trajectories, offering a scalable and clinically promising approach to early MCI detection. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99662 |
| DOI: | 10.6342/NTU202500828 |
| Full-Text License: | Not authorized |
| Electronic Full-Text Release Date: | N/A |
| Appears in Collections: | Department of Computer Science and Information Engineering |
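The abstract's key mechanism — injecting an embedding of the irregular inter-visit interval into a GRU update so that visit spacing modulates the gates — can be sketched as follows. This is a minimal NumPy illustration only: the sinusoidal interval embedding, the gate layout, and all dimensions and parameter names are assumptions for demonstration, not the thesis's actual TiGRU design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tigru_step(x, dt, h, params):
    """One step of a time-infused GRU: the visit feature x is
    concatenated with an embedding of the inter-visit gap dt
    (e.g. months), so irregular spacing influences the gates."""
    # Hypothetical sinusoidal embedding of the time gap.
    freqs = np.array([1.0, 0.5, 0.25, 0.125])
    t_emb = np.concatenate([np.sin(dt * freqs), np.cos(dt * freqs)])
    xt = np.concatenate([x, t_emb])            # temporal-infused input
    Wz, Wr, Wh, Uz, Ur, Uh = params
    z = sigmoid(xt @ Wz + h @ Uz)              # update gate
    r = sigmoid(xt @ Wr + h @ Ur)              # reset gate
    h_tilde = np.tanh(xt @ Wh + (r * h) @ Uh)  # candidate state
    return (1 - z) * h + z * h_tilde           # gated state update

rng = np.random.default_rng(0)
d_x, d_t, d_h = 6, 8, 5                        # assumed toy dimensions
d_in = d_x + d_t
params = [rng.normal(scale=0.1, size=(d_in, d_h)) for _ in range(3)] + \
         [rng.normal(scale=0.1, size=(d_h, d_h)) for _ in range(3)]

h = np.zeros(d_h)
# Three visits with irregular gaps (months since the previous visit).
for x, dt in [(rng.normal(size=d_x), 0.0),
              (rng.normal(size=d_x), 6.0),
              (rng.normal(size=d_x), 14.0)]:
    h = tigru_step(x, dt, h, params)
print(h.shape)  # (5,)
```

Because the interval embedding is part of the gate inputs, the same visit feature arriving after a 6-month gap produces a different hidden state than after a 14-month gap — the property the abstract attributes to TiGRU's sensitivity to irregular visit spacing.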
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (Restricted Access) | 4.79 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their license terms.
