Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101408

| Title: | 融合關節與光流特徵之多流寵物動作辨識架構 A Multi-Stream Framework Integrating Joint and Optical-Flow Representations for Pet Action Recognition |
| Author: | 張璟榮 Ching-Jung Chang |
| Advisor: | 李明穗 Ming-Sui Lee |
| Keywords: | Pet Action Recognition, Multi-modal Learning, Optical Flow, Graph Convolutional Networks, Abnormal Behavior Detection |
| Publication Year: | 2026 |
| Degree: | Master |
| Abstract: | This thesis addresses the technical and data-related challenges in pet action recognition, specifically targeting the critical gap in automated health monitoring technologies for domestic cats and dogs. The primary contribution of this work is the construction of PetAction, a curated video dataset that uniquely includes clinically relevant abnormal behaviors, such as seizures, vomiting, and movement disorders, which were largely absent from prior animal behavior benchmarks. To effectively model these complex behaviors under significant appearance variations and complex body deformations, a comprehensive framework was developed. The proposed JOFF (Joint–Optical Flow Fusion) module introduces a multi-stream architecture that synergistically integrates sparse skeletal geometry with dense optical flow dynamics. By incorporating a Joint Stream to capture anatomical structure, a Local Flow Stream to encode subtle tissue movements, and an I3DGCN Stream to preserve global temporal consistency, the model overcomes the inherent limitations of using keypoints alone. Experimental results demonstrate that the proposed framework achieves competitive performance, recording 78.01% accuracy on PetAction and showing strong generalization with 85.60% accuracy on the wild-animal dataset KABR. The ablation studies further confirm that explicit motion modeling via optical flow is critical for distinguishing visually ambiguous behaviors where skeletal structure alone is insufficient. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101408 |
| DOI: | 10.6342/NTU202600044 |
| Full-Text License: | Authorized (open access worldwide) |
| Electronic Full-Text Release Date: | 2026-01-28 |
| Appears in Collections: | Department of Computer Science and Information Engineering |
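The thesis implementation is not part of this record, so the following is only a minimal, hypothetical sketch of the multi-stream fusion idea the abstract describes: per-stream class scores from the Joint, Local Flow, and I3DGCN streams are combined by a weighted average before a softmax. The function names, the late-fusion strategy, and the weights are all assumptions for illustration; the actual JOFF module may fuse at the feature level instead.

```python
# Hypothetical late-fusion sketch (pure stdlib); NOT the thesis's actual JOFF code.
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(joint_logits, local_flow_logits, global_logits,
                 weights=(0.4, 0.3, 0.3)):
    """Combine per-class scores from three streams by a weighted average
    (assumed weights), then normalize to class probabilities."""
    fused = [weights[0] * j + weights[1] * lf + weights[2] * g
             for j, lf, g in zip(joint_logits, local_flow_logits, global_logits)]
    return softmax(fused)

# Toy example with 3 action classes; scores are made up.
probs = fuse_streams([2.0, 0.5, 0.1], [1.5, 1.0, 0.2], [1.8, 0.4, 0.3])
```

In this sketch all three streams agree that class 0 is most likely, so the fused distribution peaks there; in the actual architecture, the complementary streams matter precisely when they disagree (e.g., skeleton-ambiguous behaviors resolved by flow).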
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-114-1.pdf | 160.53 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
