Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99895
| Title: | 利用彈性距離建構分群方法對未標記之身體活動資料進行波形探索 Clustering-based Motif Discovery Method in Unlabeled Physical Activity Data with Elastic Distance |
| Author: | 吳亞璇 Ya-Xuan Wu |
| Advisor: | 王彥雯 Charlotte Wang |
| Keywords: | activity pattern, elastic distance, fuzzy clustering, functional data analysis, physical activity, wearable device data |
| Publication year: | 2025 |
| Degree: | Master's |
| Abstract: | Time series data arise in many real-world applications. They typically exhibit complex temporal dynamics and high variability, which makes analysis and modeling challenging. Functional data analysis (FDA) transforms discrete time points into continuous curves, providing a structured representation. These curves often contain recurring waveform segments, known as motifs, within a single individual or across individuals. Motif discovery is important for uncovering latent structure, reducing data complexity, and improving interpretability. In biomedical applications, identifying repeated movement patterns in human activity can help detect abnormal activity or motor impairments, thereby facilitating early diagnosis and intervention.
This work focuses on physical activity data collected from wearable devices in free-living environments. To address the challenges of uncontrolled recording conditions and the absence of annotated labels, we propose a clustering-based motif discovery method that combines functional data analysis, fuzzy c-means (FCM) clustering, and elastic distance to extract candidate motifs. We first apply functional smoothing to reduce noise and variation, then segment each long time series into shorter subsequences using a fixed-length sliding window. To account for temporal shifts within subsequences, we use elastic distance, which quantifies both amplitude and phase differences and thus gives a more accurate assessment of shape similarity. Finally, a modified FCM algorithm clusters the subsequences. Each cluster center represents a candidate motif that appears repeatedly in the original curves, and the membership degrees from FCM are used to filter out ambiguous subsequences that do not clearly belong to any cluster, enabling motif-based pattern recognition.
We applied the method to two real-world physical activity datasets: the PSYKOSE Study and NHANES. The results show that it can discover motifs shared across multiple samples. We identified 5 and 10 distinct motifs in the PSYKOSE Study and NHANES datasets, respectively, and analyzed how these motifs are distributed over the course of a day.
This study proposes an analytical method for unlabeled physical activity data. The identified motifs are potential digital biomarkers and can support further analyses, such as providing insight into behavioral patterns and predicting or testing associations with health outcomes. The framework can also be extended to motif discovery in other time-series domains. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99895 |
| DOI: | 10.6342/NTU202502934 |
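| Full-text license: | Authorized (on-campus access only) |
| Full-text release date: | 2030-07-29 |
| Appears in collections: | 健康數據拓析統計研究所 (Institute of Health Data Analytics and Statistics) |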
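
The pipeline outlined in the abstract (sliding-window segmentation, fuzzy clustering, membership-based filtering) might be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: it uses plain squared Euclidean distance in place of the elastic distance, standard FCM rather than the modified variant, and omits the functional smoothing step. The function names, the fuzzifier `m = 2`, the evenly spaced initial centers, and the 0.8 membership threshold are all assumptions chosen for illustration.

```python
def sliding_windows(series, width, step=1):
    """Cut one long series into fixed-length subsequences."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def sq_dist(a, b):
    # Squared Euclidean distance; a stand-in (assumption) for elastic distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(data, centers, m):
    """Standard FCM membership update: each row sums to 1."""
    rows = []
    for x in data:
        # Clamp at a tiny value so a point sitting on a center does not divide by zero.
        inv = [max(sq_dist(x, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centers]
        total = sum(inv)
        rows.append([w / total for w in inv])
    return rows


def fuzzy_c_means(data, c, m=2.0, n_iter=50):
    """Plain FCM with evenly spaced initial centers (hypothetical choice)."""
    centers = [list(data[(k * len(data)) // c]) for k in range(c)]
    for _ in range(n_iter):
        u = memberships(data, centers, m)
        for k in range(c):
            w = [row[k] ** m for row in u]
            centers[k] = [
                sum(wi * x[j] for wi, x in zip(w, data)) / sum(w)
                for j in range(len(data[0]))
            ]
    # Recompute memberships so they match the final centers.
    return centers, memberships(data, centers, m)


def confident_assignments(u, threshold=0.8):
    """Keep (index, cluster) pairs whose top membership reaches the threshold."""
    kept = []
    for i, row in enumerate(u):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            kept.append((i, best))
    return kept


# Toy usage: a short made-up series with a repeated bump.
series = [0, 0, 1, 3, 1, 0, 0, 1, 3, 1, 0, 0]
windows = sliding_windows(series, width=5, step=1)
centers, u = fuzzy_c_means(windows, c=2)
motif_hits = confident_assignments(u, threshold=0.8)
```

In this sketch each cluster center plays the role of a candidate motif, and subsequences whose largest membership falls below the threshold are discarded as ambiguous. Swapping `sq_dist` for a phase-aware distance would bring it closer to the approach the abstract describes.
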
Files in this item:
| File | Size | Format | |
|---|---|---|---|
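| ntu-113-2.pdf Not authorized for public access | 5.9 MB | Adobe PDF | View/Open |
Unless their copyright terms are specifically stated otherwise, all items in the system are protected by copyright, with all rights reserved.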
