Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93945
Title: Dynamic Ensemble-Trees for Video Prediction and Applications (original title: 動態集成樹於視訊預測及其應用)
Authors: Everett Fall (艾弗里)
Advisor: Liang-Gee Chen (陳良基)
Keyword: Ensemble-tree, video prediction, action-conditional, video coding, subtask extraction, 3D printing, error detection
Publication Year: 2024
Degree: Doctoral
Abstract: This dissertation explores strategies for applying neural network ensembles to video prediction tasks. Each chapter focuses on a separate task or application, and therefore includes an introduction and problem formulation in the context of that task or application, as well as an evaluation and discussion of results.

We begin in Chapter 1 by using prediction to enhance the quality of video encoding, employing an ensemble of small convolutional neural networks (CNNs) to predict the errors introduced by a standard encoding/decoding algorithm. The predicted error can then be subtracted from the decoded video to improve quality. We partition the ensemble, assigning different networks or groups of parameters to specific regions of the video. We show that, at a given bit rate, image quality improves when our method is paired with H.265 encoding.
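
As a rough illustration of the idea described above (not the dissertation's implementation), the following sketch subtracts a per-block predicted error from a decoded frame, with each block assigned to one ensemble member. The names `make_constant_predictor`, `correct_frame`, and `assignment` are hypothetical, and the constant-offset predictors are trivial stand-ins for the small CNNs.

```python
def make_constant_predictor(offset):
    """Hypothetical stand-in for a small CNN: predicts the same error everywhere."""
    def predict(block):
        return [[offset for _ in row] for row in block]
    return predict


def correct_frame(decoded, block_size, assignment, predictors):
    """Subtract the predicted coding error from each block of a decoded frame.

    assignment[(block_row, block_col)] gives the index of the ensemble
    member responsible for that region of the frame.
    """
    height, width = len(decoded), len(decoded[0])
    corrected = [row[:] for row in decoded]
    for by in range(0, height, block_size):
        for bx in range(0, width, block_size):
            block = [row[bx:bx + block_size] for row in decoded[by:by + block_size]]
            member = predictors[assignment[(by // block_size, bx // block_size)]]
            error = member(block)
            for dy, err_row in enumerate(error):
                for dx, err in enumerate(err_row):
                    corrected[by + dy][bx + dx] = decoded[by + dy][bx + dx] - err
    return corrected


# Toy 2x4 frame split into two 2x2 blocks, each handled by a different member.
decoded = [[10.0, 10.0, 20.0, 20.0], [10.0, 10.0, 20.0, 20.0]]
predictors = [make_constant_predictor(1.0), make_constant_predictor(-2.0)]
assignment = {(0, 0): 0, (0, 1): 1}
corrected = correct_frame(decoded, 2, assignment, predictors)
```

In this toy setup the left block is lowered by 1.0 and the right block raised by 2.0; in practice the predictors would be trained networks and the assignment would come from the partitioning scheme.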

In Chapter 2 we turn to video prediction as it pertains to the state of an agent taking actions in a virtual environment. We propose a technique that uses an ensemble of CNNs to identify the subtasks an agent completes while performing a larger task. We show that by incorporating a range of different prediction time horizons within the ensemble, the networks can learn to predict the state of the environment after a subtask has been completed from the state at the moment the subtask is initiated. This prediction can in turn be used to predict when a subtask begins and ends, and thus provides a way to extract the subtask from a longer video that may contain many other subtasks.
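
One way to picture the role of multiple prediction horizons is the minimal sketch below. It assumes a hypothetical mapping `predictors` from horizon length to a model that predicts the post-subtask state, and it picks the horizon whose prediction best matches the observed state. The selection rule, the function names, and the one-dimensional toy states are all assumptions for illustration, not the procedure used in the dissertation.

```python
def mse(a, b):
    """Mean squared error between two equal-length state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def extract_segment(states, start, predictors):
    """Return (start, end) for the horizon whose prediction fits best.

    predictors maps a horizon h to a function that takes the state at
    `start` and predicts the state at `start + h`. Raises ValueError if
    no horizon fits inside the sequence.
    """
    errors = {}
    for horizon, predict in predictors.items():
        end = start + horizon
        if end < len(states):
            errors[horizon] = mse(predict(states[start]), states[end])
    best = min(errors, key=errors.get)
    return start, start + best


# Toy sequence: the "subtask" moves the state from 0 to 5 in three steps.
states = [[0.0], [1.0], [2.0], [5.0], [5.0]]


def completed_state(state):
    """Hypothetical predictor: the subtask raises the state by 5."""
    return [state[0] + 5.0]


predictors = {1: completed_state, 2: completed_state, 3: completed_state}
segment = extract_segment(states, 0, predictors)
```

Here the three-step horizon matches the observed state exactly, so the extracted segment spans frames 0 to 3.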

In Chapter 3 we further develop the methods for video prediction of agents, adding complexity by conditioning the prediction on the action taken by the agent. This action-conditional video prediction can be used to predict an agent's trajectory within an environment. We evaluate the accuracy of long-term predictions of the agent's location and show improvement over state-of-the-art methods. We also develop a novel, high-level metric to quantify video predictions in stochastic environments and show that this metric aligns better with qualitative results and therefore better distinguishes model performance.
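
To make the notion of action-conditional, long-term prediction concrete, here is a small sketch that rolls a one-step predictor forward over a sequence of actions and scores the predicted positions against observed ones. The grid world, the `MOVES` table, and the deterministic `predict_next` function are hypothetical stand-ins for a learned model, and the mean Euclidean error is a generic measure, not the novel metric mentioned above.

```python
# Hypothetical action set for a toy grid world (x grows right, y grows down).
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def predict_next(position, action):
    """Stand-in for a learned model: next position given position and action."""
    dx, dy = MOVES[action]
    return (position[0] + dx, position[1] + dy)


def rollout(start, actions, model=predict_next):
    """Predict a trajectory by feeding each prediction back into the model."""
    trajectory = [start]
    for action in actions:
        trajectory.append(model(trajectory[-1], action))
    return trajectory


def mean_position_error(predicted, actual):
    """Average Euclidean distance between predicted and observed positions."""
    distances = [
        ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
        for (px, py), (ax, ay) in zip(predicted, actual)
    ]
    return sum(distances) / len(distances)


predicted = rollout((0, 0), ["right", "right", "down"])
observed = [(0, 0), (1, 0), (2, 0), (2, 1)]
error = mean_position_error(predicted, observed)
```

Because each step consumes the previous prediction, errors in a real learned model would compound over the rollout, which is why long-horizon accuracy is a demanding test.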

Chapter 4 applies our proposed ensemble-tree to predicting errors in 3D printing. We demonstrate that this technique can predict error artifacts on the surface of printed parts of varying geometry, and that it can learn to predict errors even when the training data contains no error-free parts. We calculate the theoretical time and material savings achievable through early detection of errors. Finally, we design and implement a novel Auto-Correcting Printer that detects errors and uses a milling tool to correct them.
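
The kind of savings estimate mentioned above can be sketched with simple arithmetic. The example below assumes, purely for illustration, that time and material are consumed uniformly over a print and that a failed print is aborted as soon as the error is detected; the function name and the numbers are hypothetical and are not taken from the dissertation.

```python
def early_stop_savings(total_time_h, total_material_g, detect_fraction):
    """Time and material saved by aborting a failed print early.

    detect_fraction is the share of the print completed when the error is
    detected (0.0 = at the start, 1.0 = only at the end). Assumes uniform
    consumption of time and material over the print.
    """
    if not 0.0 <= detect_fraction <= 1.0:
        raise ValueError("detect_fraction must be between 0 and 1")
    remaining = 1.0 - detect_fraction
    return total_time_h * remaining, total_material_g * remaining


# Hypothetical print: 10 hours, 200 g of filament, error caught at 25%.
saved_time_h, saved_material_g = early_stop_savings(10.0, 200.0, 0.25)
```

With these made-up numbers, stopping at the 25% mark avoids 7.5 hours of printing and 150 g of material that would otherwise be spent on a part that has already failed.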

The goal of this dissertation is to present novel techniques for partitioning a problem space and for constructing and accessing the networks in an ensemble. We intend for the cases we have studied to serve as examples of theoretical and practical applications of network ensembles for video prediction.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93945
DOI: 10.6342/NTU202403008
Fulltext Rights: Authorization granted (access limited to campus)
Appears in Collections: Department of Electrical Engineering (電機工程學系)

Files in This Item:
File: ntu-112-2.pdf (access limited to NTU IP range)
Size: 38.07 MB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
