Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7609
Title: | Hardware Architecture and Implementation of Tensor Approximation for Multi-Dimensional Visual Data |
Author: | Chi-Yun Yang (楊其昀) |
Advisor: | Yi-Chang Lu (盧奕璋) |
Keywords: | tensor approximation, data compression, multidimensional data, hardware architecture |
Publication Year: | 2018 |
Degree: | Master |
Abstract: | In the fields of computer vision and computer graphics, multidimensional visual data must frequently be analyzed and processed. As the amount of data grows, representing and storing it compactly has become an important research topic. Unlike traditional dimensionality-reduction algorithms such as principal component analysis (PCA), tensor approximation reduces dimensionality while retaining the multidimensional structure of the data, which allows the correlations within that structure to be better exploited. Reducing each mode separately also makes the compression process more flexible. However, for applications that require rapid image rendering, the computational cost of reconstructing data compressed by tensor approximation is too high to meet real-time requirements, so modified tensor approximation algorithms that support fast reconstruction have been proposed.
In this thesis, we discuss tensor approximation algorithms suited to fast image rendering and propose a hardware accelerator for one of them, clustered tensor approximation (CTA). Because of the enormous data size and computational cost, tensor approximation suffers from long computation times; parallel-processing techniques in hardware can speed it up. We use 10 SRAMs to compose a high-bandwidth memory array as internal storage, so that all data corresponding to the inputs can be fetched and processed in parallel. We also implement a singular value decomposition (SVD) processor based on the Hestenes-Jacobi algorithm, which is suitable for decomposing large rectangular matrices. The proposed architecture can apply clustered tensor approximation to a four-dimensional tensor of size 128×128×128×128.
Using TSMC 40 nm technology, the hardware operates at 476 MHz, and the approximation process runs 9.41 times faster than the software implementation. The chip area is 3.151 mm² and the power consumption is 744.8 mW. |
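The one-sided Hestenes-Jacobi SVD named in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis's hardware implementation; it is an assumed software rendition of the classical algorithm, with illustrative function and parameter names, showing why the method suits tall rectangular matrices: it sweeps over column pairs of the input and orthogonalizes each pair with a plane rotation, so the work is organized column-wise and parallelizes naturally.

```python
import numpy as np

def hestenes_jacobi_svd(A, tol=1e-10, max_sweeps=30):
    """One-sided (Hestenes-)Jacobi SVD of an m x n matrix A, m >= n.

    Columns of U are orthogonalized in place by plane rotations; V
    accumulates the rotations. Returns U, s, V with A ~ U @ diag(s) @ V.T.
    """
    U = A.astype(float).copy()
    m, n = U.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]          # ||u_p||^2
                beta = U[:, q] @ U[:, q]           # ||u_q||^2
                gamma = U[:, p] @ U[:, q]          # inner product to annihilate
                if abs(gamma) > tol * np.sqrt(alpha * beta):
                    converged = False
                    # Rotation angle that makes columns p and q orthogonal
                    zeta = (beta - alpha) / (2.0 * gamma)
                    if zeta == 0.0:
                        t = 1.0                    # 45-degree rotation
                    else:
                        t = np.sign(zeta) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                    c = 1.0 / np.sqrt(1.0 + t * t)
                    s = c * t
                    rot = np.array([[c, s], [-s, c]])
                    U[:, [p, q]] = U[:, [p, q]] @ rot
                    V[:, [p, q]] = V[:, [p, q]] @ rot
        if converged:
            break
    sing = np.linalg.norm(U, axis=0)               # singular values = column norms
    nz = sing > 0
    U[:, nz] /= sing[nz]                           # normalize the left vectors
    return U, sing, V
```

Because each rotation touches only two columns, disjoint column pairs can be processed concurrently, which is the property a hardware design with a high-bandwidth memory array can exploit.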
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7609 |
DOI: | 10.6342/NTU201800667 |
Full-Text Access: | Authorized (open access worldwide) |
Appears in Collections: | Graduate Institute of Electronics Engineering |
Files in This Item:
File | Size | Format | Access |
---|---|---|---|
ntu-107-1.pdf | 6.39 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.