A Multi-Stream Framework Integrating Joint and Optical-Flow Representations for Pet Action Recognition

張璟榮; Ching-Jung Chang

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101408

Title:	A Multi-Stream Framework Integrating Joint and Optical-Flow Representations for Pet Action Recognition (融合關節與光流特徵之多流寵物動作辨識架構)
Authors:	張璟榮 Ching-Jung Chang
Advisor:	李明穗 Ming-Sui Lee
Keyword:	Pet Action Recognition, Multi-modal Learning, Optical Flow, Graph Convolutional Networks, Abnormal Behavior Detection
Publication Year:	2026
Degree:	Master's
Abstract:	This thesis addresses the technical and data-related challenges of pet action recognition, focusing on a critical gap in automated health-monitoring technology for domestic dogs and cats. Its primary contribution is PetAction, a curated video dataset that, unlike prior animal behavior benchmarks, includes clinically significant abnormal behaviors such as seizures, vomiting, and movement disorders. To model these behaviors under large appearance variation and complex body deformation, the thesis develops a complete recognition framework. Its JOFF (Joint-Optical Flow Fusion) recognition module uses a multi-stream architecture that integrates sparse skeletal geometry with dense optical-flow dynamics: a Joint Stream captures anatomical structure, a Local Flow Stream encodes subtle tissue movements, and an I3DGCN Stream preserves global temporal consistency. Together, these streams overcome the inherent limitations of relying on keypoints alone. Experiments show that the framework achieves competitive performance, reaching 78.01% accuracy on PetAction and generalizing well to the wild-animal dataset KABR with 85.60% accuracy. Ablation studies further confirm that explicit motion modeling via optical flow is critical for distinguishing visually ambiguous behaviors that skeletal structure alone cannot separate.
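The abstract does not state how JOFF samples local motion or how the three streams are combined, so the following is only an illustrative sketch under stated assumptions, not the thesis's implementation. It assumes (1) that a Local Flow Stream-style feature can be formed by averaging the optical-flow vectors in a small window around each keypoint, and (2) that the Joint, Local Flow, and I3DGCN streams are merged by score-level (late) fusion, i.e. a weighted average of per-stream class probabilities. The function names `local_flow_at_joints` and `late_fuse`, the stream keys, and the fusion weights are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: an H x W grid of (dx, dy) optical-flow vectors.
Flow = List[List[Tuple[float, float]]]


def local_flow_at_joints(flow: Flow, joints: Sequence[Tuple[int, int]],
                         radius: int = 1) -> List[Tuple[float, float]]:
    """Assumed local-flow feature: mean flow in a (2r+1)x(2r+1) window
    around each (x, y) keypoint, with the window clipped to the frame."""
    h, w = len(flow), len(flow[0])
    feats = []
    for x, y in joints:
        sx = sy = 0.0
        n = 0
        for yy in range(max(0, y - radius), min(h, y + radius + 1)):
            for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                dx, dy = flow[yy][xx]
                sx += dx
                sy += dy
                n += 1
        # A keypoint whose window falls entirely outside the frame gets zero motion.
        feats.append((sx / n, sy / n) if n else (0.0, 0.0))
    return feats


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one stream's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fuse(stream_logits: Dict[str, List[float]],
              weights: Dict[str, float]) -> List[float]:
    """Assumed score-level fusion: weighted average of per-stream probabilities."""
    weight_sum = sum(weights[name] for name in stream_logits)
    n_classes = len(next(iter(stream_logits.values())))
    fused = [0.0] * n_classes
    for name, logits in stream_logits.items():
        probs = softmax(logits)
        for c in range(n_classes):
            fused[c] += weights[name] * probs[c] / weight_sum
    return fused
```

For example, `late_fuse({"joint": [2.0, 0.0], "local_flow": [0.0, 2.0], "i3dgcn": [1.0, 1.0]}, {"joint": 1.0, "local_flow": 1.0, "i3dgcn": 1.0})` returns two probabilities of about 0.5 each, because the two disagreeing streams cancel out and the third is uninformative. Raising the weight of one stream tips the decision toward it. Late fusion is chosen here only because it is the simplest way to combine independently trained streams; the thesis may instead fuse features at an intermediate layer.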
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101408
DOI:	10.6342/NTU202600044
Fulltext Rights:	Authorized (worldwide open access)
Embargo Lift Date:	2026-01-28
Appears in Collections:	Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:

File	Size	Format
ntu-114-1.pdf	160.53 MB	Adobe PDF
