Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/10324
Title: | 應用於超高畫質液晶顯示系統之多畫面插補演算法與硬體架構設計 Algorithm and Architecture Design of Multi-rate Frame Rate Up-conversion in Quad High Definition Liquid Crystal Display System |
Author: | Fu-Chen Chen 陳復禎 |
Advisor: | 簡韶逸 (Shao-Yi Chien) |
Keywords: | frame rate up-conversion, Quad HD, true motion estimation, Markov random field, artifact detection, ping-pong two-way scheduling, motion vector grouping, inverse motion compensation scheduling, hardware efficiency |
Year of publication: | 2010 |
Degree: | Master's |
Abstract: | Frame rate up-conversion raises the frame rate of a video sequence by analyzing it and interpolating additional frames. Multi-rate frame rate up-conversion interpolates two or more frames between two existing frames. The technique was first discussed in the context of video compression; in recent years it has been applied to LCDs, converting the frame rate to 120 Hz or higher to address the LCD motion blur problem. It roughly consists of four steps: the first finds the motion vectors between two successive frames, the second analyzes and optimizes those motion vectors, the third interpolates additional frames according to the new motion vectors, and the fourth corrects regions with artifacts in the interpolated frames. As LCD resolution keeps increasing, the main challenges are the heavy demands on computation, bandwidth, and on-chip SRAM, which make this kind of chip more expensive than other video processing chips.
We develop a multi-rate frame rate up-conversion algorithm and hardware architecture that meet the standards of current LCD systems. The algorithm first performs predictive square search motion estimation, which exploits the spatial coherence of the motion vector field to find the approximate motion of the existing frames quickly. We then apply motion vector processing based on a Markov random field, with a very low-cost minimization method, to find the true motion. There are three general methods for mapping motion vectors onto the interpolated frames, but none of them is perfect; we combine the advantages of each and propose block-based through motion compensation for interpolating the in-between frames. Next, we present a simple and precise technique that is guaranteed to detect regions with artifacts. For these regions, we divide the blocks into sub-blocks, search bilaterally for new motion vectors with the least artifact, and interpolate with overlapped block motion compensation for better visual quality.
On the hardware side, the supported motion vector search range is ±128 × ±128, so we use a special SRAM arrangement that greatly reduces the SRAM size. To eliminate the pipeline bubbles caused by dependencies between steps, we propose ping-pong two-way scheduling to fill them. For distortion computation, we devise an adder tree built from only 85 flexible adders. The adder tree and the SRAM are shared by all modules of the architecture. For Markov random field motion vector correction, we develop a motion vector grouping algorithm and architecture so that similar motion vectors can share the data they need; we also reuse previously computed distortion results to reduce computation further, saving a large number of cycles and a large amount of bandwidth. For multi-frame interpolation, we propose inverse motion compensation scheduling, which reaches the minimum requirement of computation and bandwidth; this hardware is also shared with block-based through motion vector mapping. For post-processing of the sub-blocks, we analyze the parallelism by simulation and meet the speed requirement without additional area overhead. To make full use of the bandwidth, we also use a special SRAM interleaving scheme so that data can be read or written flexibly.
For the experiments, we select three published methods for algorithm comparison. In the subjective evaluation, subjects choose the frame they consider best among frames interpolated by the different algorithms. In the objective evaluation, the frame rate of the original video sequence is halved, new sequences are interpolated by the different algorithms, and the PSNR against the original sequence is computed. The results indicate that our algorithm outperforms the other three algorithms in both the subjective and the objective evaluation. The hardware is implemented in Verilog HDL and synthesized with Synopsys Design Compiler using a UMC 90 nm cell library. The total gate count is 274K, and the on-chip SRAM is 9,984 bytes, single-port. The design runs at 300 MHz, provides 24 Hz to 120 Hz and 60 Hz to 120 Hz multi-rate up-conversion, and supports the 3840x2160 resolution of next-generation LCDs. Its hardware efficiency is also the best among the previous implementations compared. |
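The objective evaluation described in the abstract (halve the frame rate, re-interpolate the dropped frames, compare against the original by PSNR) can be sketched as follows. This is only an illustration under stated assumptions: frames are 8-bit grayscale images held as lists of rows, the function names `psnr`, `average_frames`, and `evaluate` are hypothetical, and the per-pixel averaging interpolator is a placeholder, not the thesis's motion-compensated algorithm.

```python
import math


def psnr(ref, test, peak=255):
    """PSNR in dB between two equal-sized grayscale frames (lists of rows)."""
    sq_err = 0
    count = 0
    for ref_row, test_row in zip(ref, test):
        for r, t in zip(ref_row, test_row):
            sq_err += (r - t) ** 2
            count += 1
    mse = sq_err / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


def average_frames(prev, nxt):
    """Placeholder interpolator (assumption): per-pixel average of neighbours."""
    return [[(a + b) // 2 for a, b in zip(p_row, n_row)]
            for p_row, n_row in zip(prev, nxt)]


def evaluate(frames, interpolate=average_frames):
    """Drop odd-indexed frames, rebuild them, return mean PSNR over rebuilt frames."""
    if len(frames) < 3:
        raise ValueError("need at least three frames")
    scores = []
    # Each dropped frame i is rebuilt from its kept neighbours i-1 and i+1.
    for i in range(1, len(frames) - 1, 2):
        rebuilt = interpolate(frames[i - 1], frames[i + 1])
        scores.append(psnr(frames[i], rebuilt))
    return sum(scores) / len(scores)
```

Any interpolator with the same two-frame signature can be passed as `interpolate`, so several algorithms can be scored on the same sequence for comparison.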
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/10324 |
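Full-text authorization: | Authorized (open to the public worldwide) |
Appears in collections: | Graduate Institute of Networking and Multimedia |
Files in this item:
File | Size | Format | |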
---|---|---|---|
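ntu-99-1.pdf | 2.4 MB | Adobe PDF | View/Open |
Unless their copyright terms are specifically stated otherwise, all items in the system are protected by copyright, with all rights reserved.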