Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40635
Title: | 超高畫質H.264/AVC編碼器之預測核心與快取架構設計 VLSI Architecture Design of Prediction Core and Cache in Super High Definition H.264/AVC Encoder |
Author: | Wei-Yin Chen 陳威尹 |
Advisor: | 陳良基 (Liang-Gee Chen) |
Keywords: | H.264, video processing, video encoding, cache, VLSI architecture |
Publication Year: | 2008 |
Degree: | Master's |
Abstract: | With the progress of video technology, image resolution keeps getting finer, which directly improves video quality. From VCD to high-definition (HD) content, video quality has grown rapidly. In the foreseeable future, camcorders and display devices with HD or super HD capability will sustain this trend. Moreover, the vividness and immersive experience brought by multi-view and stereo video are compelling. In recent video coding standards, multi-view video coding (MVC) extends H.264/AVC and supports inter-view prediction to further reduce the data redundancy between views. Since most functional blocks in MVC resemble those in H.264/AVC, a multi-standard video encoder that handles both super high definition video and multi-view video of similar throughput can be implemented without much overhead. However, greater video throughput places a greater burden on the encoder, and this burden is difficult to handle with currently available architectures. To address this challenge, this thesis proposes a VLSI architecture for a super high definition H.264/AVC encoder with an ultra-high-throughput prediction core and an efficient cache system.
In a video encoder, the prediction core accounts for most of the computation and bandwidth requirements, and integer motion estimation (IME) alone consumes more than half of the resources. In our target specification (Super HD 4k x 2k), the computation and bandwidth are orders of magnitude beyond the acceptable range, and the silicon area needed for on-chip SRAM is far from affordable. First, a hardware-oriented fast IME algorithm with sophisticated data reuse and refinement center decision is proposed; it saves 96% of the computation at the expense of an average PSNR drop of only 0.013 dB. Second, simplified half-pel interpolation reduces the memory bandwidth of fractional motion estimation (FME) by 31% with a quality drop of only 0.03 dB. Third, an interleaved double-current-frame scheme exploits thread-level parallelism in the entropy coder, which removes the throughput bottleneck in CABAC and achieves 1.2 G symbols per second.
An efficient cache system is proposed to replace the conventional reference frame buffer; it occupies less on-chip memory and consumes less external bandwidth. The main challenges of cache design for a super high definition H.264 encoder are the miss rate, the miss penalty, the overhead of data prefetching, and the requirement for high throughput. In the proposed prefetching algorithm, rapid prefetching patterns and a priority-based replacement policy reduce both the miss rate of data reads and the overhead of data prefetching. The proposed 4-way non-blocking cache architecture supports concurrent data reads and prefetching, which further reduces the miss penalty, and it sustains a throughput of 5 words per cycle with no penalty when a word is split across cache lines. In the experimental results, the average cache miss rate is 93% lower than that of a conventional architecture, and the average cache hit rate exceeds 99.7%. Compared with the prior art in ISSCC '08, the proposed cache architecture requires 82% less chip area and 39% less external memory bandwidth while supporting a video resolution more than four times higher.
The proposed design is implemented as an ASIC in TSMC 90 nm technology and operates at 300 MHz. It is the first H.264/AVC video encoding chip that supports Super HD 4096 x 2160p resolution at 24 fps in real time. Furthermore, it can be reconfigured to support stereo and multi-view coding, and it is the first design to encode three-view 1920 x 1080p or six-view 1280 x 720p video at 30 fps in real time, a world-record throughput on a single chip. Video coding technology is therefore one step closer to its ultimate goal. |
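The throughput figures quoted in the abstract can be cross-checked with some simple arithmetic. The following is a minimal sketch, not part of the thesis: the function names are hypothetical, the 16 x 16 macroblock size comes from the H.264/AVC standard, and the cycle budget is a plain average that ignores pipelining, stalls, and any per-frame overhead.

```python
import math

MB_SIZE = 16             # H.264/AVC macroblocks cover 16 x 16 luma samples
CLOCK_HZ = 300_000_000   # operating frequency stated in the abstract


def pixel_rate(width, height, fps, views=1):
    """Luma samples per second for one operating mode."""
    return width * height * fps * views


def macroblocks_per_second(width, height, fps, views=1):
    """Macroblocks per second; partial macroblocks are rounded up."""
    mbs_per_frame = math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)
    return mbs_per_frame * fps * views


def cycles_per_macroblock(width, height, fps, clock_hz, views=1):
    """Average clock cycles available per macroblock (idealized)."""
    return clock_hz / macroblocks_per_second(width, height, fps, views)


# The three modes named in the abstract: (width, height, fps, views).
MODES = {
    "Super HD, single view": (4096, 2160, 24, 1),
    "1080p, three views": (1920, 1080, 30, 3),
    "720p, six views": (1280, 720, 30, 6),
}

for name, (w, h, fps, views) in MODES.items():
    rate = pixel_rate(w, h, fps, views)
    budget = cycles_per_macroblock(w, h, fps, CLOCK_HZ, views)
    print(f"{name}: {rate / 1e6:.1f} Msamples/s, {budget:.1f} cycles/MB")
```

Under these assumptions, the 4096 x 2160 mode at 24 fps comes to about 212 million luma samples per second and leaves an average of roughly 362 clock cycles per macroblock at 300 MHz. The two multi-view modes have somewhat lower sample rates, which is consistent with the abstract's remark that they are of similar throughput.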
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40635 |
Full-Text License: | Paid license |
Appears in Collections: | Graduate Institute of Electronics Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-97-1.pdf (not currently authorized for public access) | 4.03 MB | Adobe PDF |
Unless their copyright terms are specifically indicated, all items in this system are protected by copyright, with all rights reserved.