Learning From Irregular Multivariate Time Series Data with Scalable Numerical Embedding: A Case Study in Electronic Health Record

黃俊愷; Chun-Kai Huang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92205

Title:	Learning From Irregular Multivariate Time Series Data with Scalable Numerical Embedding: A Case Study in Electronic Health Record
Author:	黃俊愷 Chun-Kai Huang
Advisor:	林澤 Che Lin
Keywords:	deep learning, multivariate time series data, missing value, representation learning
Year of publication:	2024
Degree:	Master's
Abstract:	Multivariate time series (MTS) data arise in numerous domains, such as energy monitoring, environment, and healthcare. Many deep-learning-based methods have been proposed to learn effective representations of MTS data. However, these works commonly take all variables at the same timestamp as a single model input, which emphasizes only the temporal relation. This study focuses on electronic health record (EHR) data, which contains a large share of missing values due to irregular sampling and asynchronous measurement. Such irregular MTS data poses additional challenges for effective representation learning. To tackle these challenges, we propose "SCAlable Numerical Embedding" (SCANE). SCANE is based on the concept of "value as a token" and embeds each value independently as an input vector for the model. With SCANE, the feature extractor can learn not only the temporal relation but also the feature-wise relation between variables. We further integrate SCANE with the Transformer encoder to form TranSCANE. With the masking mechanism of the Transformer encoder and SCANE, TranSCANE can avoid paying attention to missing values. That is, TranSCANE is an imputation-free model for fragmentary MTS data. Moreover, we propose a revised rollout attention tailored for TranSCANE, which improves the interpretability of our model. Experimental results show that TranSCANE performs best on three different EHR datasets. It has the potential to learn more feature-wise relations between variables. Furthermore, it is robust against different imputations due to its "imputation-free" nature. As a result, we believe TranSCANE is a powerful representation learning model for irregular MTS data.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92205
DOI:	10.6342/NTU202400619
Full-text authorization:	Not authorized
Appears in collections:	Graduate Institute of Communication Engineering
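
The abstract's "value as a token" idea and its masking of missing values can be illustrated with a small sketch. This is not the thesis implementation: the abstract does not give the embedding formula, so the code assumes that each observed value scales a per-variable vector (value times vector). The names `embed_values`, `masked_attention_weights`, and `feature_vectors` are hypothetical, and the vectors are fixed numbers here rather than learned parameters.

```python
import math


def embed_values(series, feature_vectors):
    """Turn a T x F grid of values (None = missing) into one token per cell.

    Assumption (not from the abstract): a token is the observed value
    multiplied by a per-variable vector. Missing cells get a zero vector
    and are flagged False in the returned mask.
    """
    tokens, observed = [], []
    for row in series:
        for f, value in enumerate(row):
            if value is None:
                tokens.append([0.0] * len(feature_vectors[f]))
                observed.append(False)
            else:
                tokens.append([value * w for w in feature_vectors[f]])
                observed.append(True)
    return tokens, observed


def masked_attention_weights(query, keys, observed):
    """Scaled dot-product attention weights with missing keys masked out."""
    if not any(observed):
        raise ValueError("at least one key must be observed")
    scale = math.sqrt(len(query))
    # Masked keys get a score of -inf, so their softmax weight is exactly 0.
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if ok else -math.inf
        for key, ok in zip(keys, observed)
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two time steps, two variables; each variable is measured only once.
series = [[0.5, None], [None, 2.0]]
feature_vectors = [[1.0, 0.0], [0.0, 1.0]]  # illustrative, not learned

tokens, observed = embed_values(series, feature_vectors)
weights = masked_attention_weights(tokens[0], tokens, observed)
```

Because every cell becomes its own token, attention can relate values across both time steps and variables, while the mask keeps the missing cells at zero weight, so no imputed value is needed for them.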

Files in this item:

File	Size	Format
ntu-112-1.pdf (not authorized for public access)	2.21 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.