Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66759

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 簡韶逸(Shao-Yi Chien) | |
| dc.contributor.author | Po-Hao Hsu | en |
| dc.contributor.author | 許博豪 | zh_TW |
| dc.date.accessioned | 2021-06-17T00:56:01Z | - |
| dc.date.available | 2011-10-21 | |
| dc.date.copyright | 2011-10-21 | |
| dc.date.issued | 2011 | |
| dc.date.submitted | 2011-09-21 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66759 | - |
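| dc.description.abstract | Semiconductor technology advances rapidly; Moore's Law holds that microprocessor performance doubles every 18 months. Memory speed, by contrast, improves by only about 7% per year on average, so a huge gap has opened between microprocessor and memory speeds. Cache memory serves as a buffer between the processor and off-chip dynamic random-access memory (DRAM); it is a very small memory with high-speed read and write capability. A cache keeps frequently used data on chip to reduce the memory latency of accesses between the processor and DRAM. However, the read/write mechanism of a cache is ill-suited to algorithms with stream-processing behavior, such as the well-known integral image and integral histogram. With an integral image or integral histogram, the sum or histogram over a region of arbitrary shape and size can be obtained easily to speed up computation. These algorithms usually exhibit stream-processing characteristics, and our analysis shows that the cache hit rate readily hits a bottleneck for this kind of data flow, which prevents further performance gains. In this thesis we propose a reconfigurable cache memory mechanism, called RBSP memory, that supports both general data access and dedicated stream processing of integral images and integral histograms. It has two operating modes: cache mode and RBSP mode. In cache mode it behaves like a set-associative cache. In RBSP mode it is dedicated to integral image and integral histogram applications: it first fetches from DRAM the integral image or integral histogram data that the algorithm will need next and stores it in the RBSP memory, after which the processor accesses the RBSP memory directly, reducing unnecessary memory latency. We apply this memory to two integral image and integral histogram applications: Speeded-Up Robust Features (SURF) and the center-surround histogram. We discuss how each image row is reused when the RBSP memory is read. In addition, to place the row elements that the algorithm will use next into the memory, we propose a mapping algorithm to assist memory access; we implement it in both hardware and software and discuss its performance and its impact on chip area. Finally, we discuss the Memory Dividing Technique (MDT), which partitions the blocks being read to reduce the word length that must be stored in memory. The hardware is modeled in electronic system level (ESL) simulation software. For a VGA 640x480 input image, the RBSP memory outperforms a conventional cache of the same size by 38.31% on SURF and is 48.29% faster on the center-surround histogram. Finally, the design is synthesized with Synopsys Design Compiler in a TSMC 180 nm process, giving a gate count of 514.6K; the control circuitry needed in RBSP mode to run the mapping algorithm in hardware or in software accounts for only 7.61% or 5.28% of the total, respectively. | zh_TW |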
| dc.description.abstract | With advances in semiconductor technology, microprocessor performance doubles roughly every 18 months, following Moore's Law, whereas the speed of off-chip DRAM grows by only 7% per year. The result is a huge gap between CPU and DRAM speed. Cache memory is a high-speed memory that reduces the access latency between the processor and off-chip DRAM, and it usually occupies a large share of the system area. Integral images and integral histograms are well known for producing the sum or histogram of an arbitrarily sized block in constant time and are widely used in many applications. However, the read and write mechanisms of a cache are not well suited to such algorithms, which have stream-processing characteristics. A larger cache reduces the cycle count further, but the analysis results show that the cache hit rate reaches a bottleneck and the cycle-count reduction saturates. For these reasons, this thesis proposes a reconfigurable cache memory mechanism that supports both general data access and stream processing. The proposed memory has two modes: normal cache mode and Row-Based Stream Processing (RBSP) mode, a mechanism specific to data access for integral images and integral histograms. RBSP mode reduces the cycle count because all the data needed next is prefetched precisely, using an image row as the basic access unit. Two integral image and integral histogram applications, the SURF algorithm and the center-surround histogram of a salience map, are implemented to verify the proposed mechanism. Moreover, two data reuse schemes, intra-filter-size sharing and inter-filter-size sharing, across different filter sizes and different filtering stripes, are considered to further reduce accesses to off-chip DRAM. A mapping algorithm, implemented in both hardware and software versions, is proposed to help the RBSP memory read and write data. In addition, a method called the Memory Dividing Technique (MDT) is proposed to further reduce the word length. The whole system is built in CoWare Platform Architect to verify the design. The target image size is VGA 640 x 480, and the experimental results show that the proposed reconfigurable RBSP memory saves 38.31% and 48.29% of the memory cycle count for the two applications compared with a traditional data cache of the same size. The hardware is implemented in Verilog HDL and synthesized with Synopsys Design Compiler in TSMC 180 nm technology. The total gate count of the RBSP memory is 557.0K. The overhead of the proposed RBSP memory is small: just 7.61% or 5.28% relative to the set-associative cache, for the hardware-based or software-based implementation respectively. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-17T00:56:01Z (GMT). No. of bitstreams: 1 ntu-100-R98943018-1.pdf: 3867763 bytes, checksum: 684a4b6e38af81f33f4061aa4c7292af (MD5) Previous issue date: 2011 | en |
| dc.description.tableofcontents | 1 Introduction and Motivation; 1.1 Introduction to Cache memory; 1.2 Introduction to Integral Image and Integral Histogram; 1.3 Motivation and Design Target; 1.4 Thesis Organization; 2 Related Application and Algorithms; 2.1 Speeded-Up Robust Features; 2.1.1 Integral Image Construction; 2.1.2 SURF Detector; 2.1.3 SURF Descriptor; 2.2 Center-Surround Histogram of Salience Map; 3 Algorithm Analysis and Proposed mechanism; 3.1 Analysis of Cache Memory; 3.2 Analysis for Stream Processing of Integral Image and Integral Histogram; 3.3 Row-Based Stream Processing; 3.3.1 Row-Based Stream Processing for SURF; 3.3.2 Row-Based Stream Processing for Center-surround Histogram; 3.4 Memory Dividing Technique; 3.5 Hardware Analysis for RBSP Mode and Cache Mode; 4 Proposed Hardware Design; 4.1 Design Overview; 4.2 Cache Mode; 4.3 RBSP Mode for Writing Data; 4.4 RBSP Mode for Reading Data; 5 Experimental Results and Comparison; 5.1 Simulation Environment; 5.2 AMBA AHB Protocol; 5.3 Implementation Results and Comparison; 5.3.1 Comparison between RBSP Memory and Cache in SURF Descriptor; 5.3.2 Comparison between RBSP Memory and Cache in SURF Detector; 5.3.3 Comparison between RBSP Memory and Cache in Center-surround Histogram; 5.3.4 Synthesis Result and Area Overhead Discussion; 6 Conclusion | |
| dc.language.iso | zh-TW | |
| dc.subject | Stream Processing (串流處理) | zh_TW |
| dc.subject | Integral Image (積分圖形) | zh_TW |
| dc.subject | Integral Histogram (積分直方圖) | zh_TW |
| dc.subject | Reconfigurable Cache Memory (可重組快取記憶體) | zh_TW |
| dc.subject | Integral Image | en |
| dc.subject | Integral Histogram | en |
| dc.subject | Reconfigurable Cache Memory | en |
| dc.subject | Stream Processing | en |
| dc.title | 基於積分圖形與積分直方圖應用之可重組快取記憶體之硬體機制設計 | zh_TW |
| dc.title | Reconfigurable Cache Memory Mechanism for Integral Image and Integral Histogram Applications | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 100-1 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 賴永康(Yeong-Kang Lai),盧奕璋(Yi-Chang Lu),張天烜(Tian-Sheuan Chang),洪士灝(Shih-Hao Hung) | |
| dc.subject.keyword | Integral Image (積分圖形), Integral Histogram (積分直方圖), Reconfigurable Cache Memory (可重組快取記憶體), Stream Processing (串流處理) | zh_TW |
| dc.subject.keyword | Integral Image, Integral Histogram, Reconfigurable Cache Memory, Stream Processing | en |
| dc.relation.page | 70 | |
| dc.rights.note | Paid license (有償授權) | |
| dc.date.accepted | 2011-09-21 | |
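| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Electronics Engineering | zh_TW |
| Appears in collections: | Graduate Institute of Electronics Engineering | |
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-100-1.pdf (not authorized for public access) | 3.78 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.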
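
The abstract notes that an integral image yields the sum of an arbitrarily sized block in constant time. A minimal sketch of that general technique follows; it is the textbook summed-area table, not the thesis's hardware design. The function names `integral_image`, `box_sum`, and `worst_case_bits` are hypothetical and chosen only for illustration, and 8-bit pixels (maximum value 255) are an assumption.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Build a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current image row
        for x in range(w):
            row_sum += img[y][x]
            # Entry = sum of everything above (previous table row) plus this row so far.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[int]], top: int, left: int, bottom: int, right: int) -> int:
    """Sum of pixels in rows top..bottom-1 and columns left..right-1, via four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def worst_case_bits(width: int, height: int, max_pixel: int = 255) -> int:
    """Bits needed for the largest possible table entry (assumes 8-bit pixels by default)."""
    return (width * height * max_pixel).bit_length()


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ii = integral_image(img)
    print(box_sum(ii, 0, 0, 3, 3))    # whole image
    print(box_sum(ii, 1, 1, 3, 3))    # bottom-right 2x2 block
    print(worst_case_bits(640, 480))  # word length for an unpartitioned VGA table
```

Each `box_sum` call reads entries from only two rows of the table (`top` and `bottom`), and `worst_case_bits` shows why the stored word length grows with the area the table covers. How the thesis's RBSP mode and Memory Dividing Technique exploit these properties is described in the thesis itself, not in this sketch.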
