Supporting Lightweight Row Migration for Asymmetric-Subarray DRAM

Ying-Chen Lin; 林映辰

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18200

Full metadata record

DC field	Value	Language
dc.contributor.advisor	楊佳玲(Chia-Lin Yang)
dc.contributor.author	Ying-Chen Lin	en
dc.contributor.author	林映辰	zh_TW
dc.date.accessioned	2021-06-08T00:54:39Z	-
dc.date.copyright	2015-08-11
dc.date.issued	2014
dc.date.submitted	2015-03-12
dc.identifier.citation	[1] Donghyuk Lee, Yoongu Kim, Vivek Seshadri, Jamie Liu, Lavanya Subramanian, and Onur Mutlu. Tiered-latency DRAM: A low latency and low cost DRAM architecture. In High Performance Computer Architecture (HPCA 2013), 2013 IEEE 19th International Symposium on, pages 615–626, Feb 2013.
[2] Young Hoon Son, O. Seongil, Yuhwan Ro, Jae W. Lee, and Jung Ho Ahn. Reducing memory access latency with asymmetric DRAM bank organizations. In Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA ’13, pages 380–391, New York, NY, USA, 2013. ACM.
[3] Tae-Young Oh, Hoeju Chung, Young-Chul Cho, Jang-Woo Ryu, Kiwon Lee, Changyoung Lee, Jin-Il Lee, Hyoung-Joo Kim, Min Soo Jang, Gong-Heum Han, Kihan Kim, Daesik Moon, Seungjun Bae, Joon-Young Park, Kyung-Soo Ha, Jaewoong Lee, Su-Yeon Doo, Jung-Bum Shin, Chang-Ho Shin, Kiseok Oh, Doohee Hwang, Taeseong Jang, Chulsung Park, Kwangil Park, Jung-Bae Lee, and Joo Sun Choi. 25.1 A 3.2Gb/s/pin 8Gb 1.0V LPDDR4 SDRAM with integrated ECC engine for sub-1V DRAM core operation. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International, pages 430–431, Feb 2014.
[4] Ricki Dee Williams, Theresa Sze, Dawei Huang, Sreemala Pannala, and Clement Fang. Server memory road map. 2012.
[5] Micron. RLDRAM 2 SIO. 2004.
[6] T. Takahashi, T. Sekiguchi, R. Takemura, S. Narui, H. Fujisawa, S. Miyatake, M. Morino, K. Arai, S. Yamada, S. Shukuri, M. Nakamura, Y. Tadaki, Kazuhiko Kajigaya, Katsutaka Kimura, and B.S. Kiyoo Itoh. A multigigabit DRAM technology with 6F2 open-bitline cell, distributed overdriven sensing, and stacked-flash fuse. Solid-State Circuits, IEEE Journal of, 36(11):1721–1727, Nov 2001.
[7] Y. Sato, T. Suzuki, T. Aikawa, S. Fujioka, W. Fujieda, H. Kobayashi, H. Ikeda, T. Nagasawa, A. Funyu, Y. Fuji, K. Kawasaki, M. Yamazaki, and M. Taguchi. Fast cycle RAM (FCRAM); a 20-ns random row access, pipe-lined operating DRAM. In VLSI Circuits, 1998. Digest of Technical Papers. 1998 Symposium on, pages 22–25, June 1998.
[8] J. Thomas Pawlowski. Hybrid memory cube (HMC). In Hot Chips, volume 23, pages 1–24, 2011.
[9] T. Schloesser, F. Jakubowski, J. v.Kluge, A. Graham, S. Slesazeck, M. Popp, P. Baars, K. Muemmler, P. Moll, K. Wilson, A. Buerke, D. Koehler, J. Radecker, E. Erben, U. Zimmermann, T. Vorrath, B. Fischer, G. Aichmayr, R. Agaiby, W. Pamler, T. Schuster, W. Bergner, and W. Mueller. 6F2 buried wordline DRAM cell for 40nm and beyond. In Electron Devices Meeting, 2008. IEDM 2008. IEEE International, pages 1–4, Dec 2008.
[10] Samsung. 2Gb D-die DDR3 SDRAM. 2011.
[11] Avadh Patel, Furat Afram, Shunfei Chen, and Kanad Ghose. MARSS: A full system simulator for multicore x86 CPUs. In Proceedings of the 48th Design Automation Conference, DAC ’11, pages 1050–1055, New York, NY, USA, 2011. ACM.
[12] P. Rosenfeld, E. Cooper-Balis, and B. Jacob. DRAMSim2: A cycle accurate memory system simulator. Computer Architecture Letters, 10(1):16–19, Jan 2011.
[13] Aamer Jaleel. Memory characterization of workloads using instrumentation-driven simulation. 2010.
[14] Renesas. 1.1G-bit low latency DRAM-III. 2013.
[15] Micron. 2Gb: x4, x8, x16 DDR2 SDRAM. 2006.
[16] Aniruddha N. Udipi, Naveen Muralimanohar, Niladrish Chatterjee, Rajeev Balasubramonian, Al Davis, and Norman P. Jouppi. Rethinking DRAM design and organization for energy-constrained multi-cores. In Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA ’10, pages 175–186, New York, NY, USA, 2010. ACM.
[17] Yoongu Kim, V. Seshadri, Donghyuk Lee, J. Liu, and O. Mutlu. A case for exploiting subarray-level parallelism (SALP) in DRAM. In Computer Architecture (ISCA), 2012 39th Annual International Symposium on, pages 368–379, June 2012.
[18] F.A. Ware and C. Hampel. Improving power and data efficiency with threaded memory modules. In Computer Design, 2006. ICCD 2006. International Conference on, pages 417–424, Oct 2006.
[19] Hongzhong Zheng, Jiang Lin, Zhao Zhang, E. Gorbatov, H. David, and Zhichun Zhu. Mini-rank: Adaptive DRAM architecture for improving memory power efficiency. In Microarchitecture, 2008. MICRO-41. 2008 41st IEEE/ACM International Symposium on, pages 210–221, Nov 2008.
[20] Doe Hyun Yoon, Min Kyu Jeong, and Mattan Erez. Adaptive granularity memory systems: A tradeoff between storage efficiency and throughput. SIGARCH Comput. Archit. News, 39(3):295–306, June 2011.
[21] Min Kyu Jeong, Doe Hyun Yoon, Dam Sunwoo, M. Sullivan, Ikhwan Lee, and M. Erez. Balancing DRAM locality and parallelism in shared memory CMP systems. In High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on, pages 1–12, Feb 2012.
[22] B. Black, M. Annavaram, N. Brekelbaum, J. DeVale, Lei Jiang, G.H. Loh, D. McCauley, P. Morrow, D.W. Nelson, D. Pantuso, P. Reed, J. Rupley, Sadasivan Shankar, J. Shen, and C. Webb. Die stacking (3D) microarchitecture. In Microarchitecture, 2006. MICRO-39. 39th Annual IEEE/ACM International Symposium on, pages 469–479, Dec 2006.
[23] Gabriel H. Loh. 3D-stacked memory architectures for multi-core processors. SIGARCH Comput. Archit. News, 36(3):453–464, June 2008.
[24] G. Dhiman, R. Ayoub, and T. Rosing. PDRAM: A hybrid PRAM and DRAM main memory system. In Design Automation Conference, 2009. DAC ’09. 46th ACM/IEEE, pages 664–669, July 2009.
[25] Li Zhao, R. Iyer, R. Illikkal, and D. Newell. Exploring DRAM cache architectures for CMP server platforms. In Computer Design, 2007. ICCD 2007. 25th International Conference on, pages 55–62, Oct 2007.
[26] Xiaowei Jiang, N. Madan, Li Zhao, M. Upton, R. Iyer, S. Makineni, D. Newell, D. Solihin, and R. Balasubramonian. CHOP: Adaptive filter-based DRAM caching for CMP server platforms. In High Performance Computer Architecture (HPCA), 2010 IEEE 16th International Symposium on, pages 1–12, Jan 2010.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18200	-
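dc.description.abstract	Over the past decade, DRAM (dynamic random access memory) technology has continually pursued greater capacity and transfer bandwidth, while DRAM access time has not improved significantly over the same period. Using shorter bitlines in DRAM products effectively reduces access time, but it also reduces array density, and in the mainstream market manufacturers are unwilling to trade density for access time. Prior research [1, 2] therefore proposed hybrid-bitline designs to address this problem: within one DRAM chip, long and short bitlines form two regions that differ in speed and density, which offsets the density cost of the short bitlines. However, we observe several major drawbacks in this prior work: the design used in TL-DRAM [1] is intrusive to the array, and asymmetric-subarray DRAM [2] has no mechanism for migrating rows between its regions. In this thesis, we propose a novel, fast method for migrating rows between two subarrays and apply it to asymmetric-subarray DRAM [2]. We also explore a range of management policies and propose a simple management mechanism. Our experimental results show that, with row migration and the management mechanism in place, we achieve average performance improvements of 7.25% and 11.77% in single-program and multi-program workloads, respectively. These gains exceed 80% of the performance improvement of an ideal DRAM built entirely from short bitlines.	zh_TW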
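Both abstracts rest on the premise that shorter bitlines reduce access latency. One standard first-order way to see why (general DRAM circuit background, not an analysis taken from this thesis) is the charge-sharing relation when a wordline connects a cell to a bitline precharged to V_DD/2. Here C_s is the cell capacitance and C_BL the bitline capacitance, which grows roughly with the number of cells on the bitline, i.e. with its length. The capacitance values below are assumed purely for illustration.

```latex
\Delta V_{BL} = \pm\frac{V_{DD}}{2}\cdot\frac{C_s}{C_s + C_{BL}}
\qquad
C_s = 25\,\mathrm{fF}:\quad
\frac{V_{DD}}{2}\cdot\frac{25}{25+100} = 0.10\,V_{DD},
\qquad
\frac{V_{DD}}{2}\cdot\frac{25}{25+50} \approx 0.17\,V_{DD}
```

Halving the bitline capacitance (100 fF to 50 fF in this illustration) raises the initial signal from 0.10 V_DD to about 0.17 V_DD, so the sense amplifier starts from a larger swing and drives a lighter bitline during sensing and restore. The cost is that fewer cells share each sense amplifier, which is the array-density penalty both abstracts describe.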
dc.description.abstract	The evolution of DRAM technology over the last decade has been driven by capacity and bandwidth. In contrast, DRAM access latency has remained relatively constant over the same period and is even trending upward. Shorter bitlines reduce a DRAM device's access latency, but they also lower array efficiency, and in the mainstream market manufacturers are not willing to trade capacity for latency. Prior works [1, 2] proposed hybrid-bitline DRAM designs to overcome this problem: they combine long and short bitlines on the same chip to form fast and slow levels, amortizing the capacity loss. However, these methods either are intrusive to the circuit design [1] or provide no direct way to migrate data between the fast and slow levels [2]. In this thesis, we propose a novel, low-cost way to migrate data between subarrays. Applying this design to asymmetric-subarray DRAM, we propose a simple management mechanism and explore several management-related policies. We show that this design and management technique achieve average performance improvements of 7.25% and 11.77% in single- and multi-programming workloads, respectively, over a system with traditional homogeneous DRAM. These gains exceed 80% of the potential performance gain of a system based on a hypothetical DRAM built entirely from short bitlines.	en
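The abstract describes a simple mechanism for managing rows across the fast and slow levels, and the thesis outline lists row swapping, a translation table, a filtering policy for row promotion, and a replacement policy. The Python sketch below is only a hypothetical model of how such pieces can fit together, not the thesis's actual design: the class `MigrationGroup`, the parameter `promote_threshold`, the counter-based promotion filter, and LRU victim selection are all assumptions made for illustration.

```python
from collections import OrderedDict


class MigrationGroup:
    """Hypothetical model of one migration group: a few fast (short-bitline)
    rows paired with slow (long-bitline) rows.

    Assumptions for illustration only: a row is promoted after
    `promote_threshold` slow-level accesses, the victim is the least recently
    used fast row, and promotion swaps the two rows' physical locations.
    """

    def __init__(self, n_fast: int, n_slow: int, promote_threshold: int = 2):
        if n_fast < 1 or n_slow < 1 or promote_threshold < 1:
            raise ValueError("need >= 1 fast row, >= 1 slow row, threshold >= 1")
        self.n_fast = n_fast
        self.promote_threshold = promote_threshold
        # Translation table: logical row -> physical row. Physical rows
        # 0..n_fast-1 are in the fast level, the rest in the slow level.
        self.table = {row: row for row in range(n_fast + n_slow)}
        # Logical rows currently in the fast level, least recently used first.
        self.fast_lru = OrderedDict((row, None) for row in range(n_fast))
        # Slow-level access counts used by the promotion filter.
        self.counters = {}
        self.swaps = 0

    def is_fast(self, logical_row: int) -> bool:
        return self.table[logical_row] < self.n_fast

    def access(self, logical_row: int) -> bool:
        """Serve one access and return True if it hit the fast level."""
        if self.is_fast(logical_row):
            self.fast_lru.move_to_end(logical_row)  # now most recently used
            return True
        # Slow-level access: count it, and promote once the threshold is met.
        count = self.counters.get(logical_row, 0) + 1
        if count >= self.promote_threshold:
            self.counters.pop(logical_row, None)
            self._swap_with_lru_victim(logical_row)
        else:
            self.counters[logical_row] = count
        return False  # this access itself was still served by the slow level

    def _swap_with_lru_victim(self, hot_row: int) -> None:
        victim, _ = self.fast_lru.popitem(last=False)  # LRU fast row
        # Exchange the two rows' physical locations in the translation table.
        self.table[victim], self.table[hot_row] = (
            self.table[hot_row],
            self.table[victim],
        )
        self.fast_lru[hot_row] = None  # promoted row becomes most recently used
        self.swaps += 1


group = MigrationGroup(n_fast=2, n_slow=6, promote_threshold=2)
hits = [group.access(row) for row in [5, 5, 5, 0, 5]]
print(hits)  # [False, False, True, False, True]
```

In this toy trace, logical row 5 is promoted on its second slow-level access and then hits the fast level, while the displaced row 0 moves to the slow level. A real controller would also have to account for the latency of the swap itself and for caching the translation table, both of which this sketch ignores.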
dc.description.provenance	Made available in DSpace on 2021-06-08T00:54:39Z (GMT). No. of bitstreams: 1 ntu-103-R01922017-1.pdf: 1373083 bytes, checksum: a1a2ea09a4045b17e1e0021013ff66db (MD5) Previous issue date: 2014	en
dc.description.tableofcontents	1 Introduction
2 Background
  2.1 DRAM Device Organization
  2.2 Subarray Structure
  2.3 DRAM Operations
3 Hybrid-Bitline DRAM
  3.1 TL-DRAM
  3.2 Asymmetric-subarray DRAM
4 Proposed Hybrid-DRAM Design with Row Migration Mechanism
  4.1 Migration Circuit Design
  4.2 Row Migration Procedure
  4.3 Silicon Area Overhead and Migration Path Length
5 Data Management
  5.1 Row Swapping
  5.2 Translation Table
  5.3 Management Mechanism
6 Experimental Setup
7 Results and Discussion
  7.1 Single-Programming Evaluation
  7.2 Multi-Programming Evaluation
  7.3 Filtering Policy for Row Promotion
  7.4 Translation Cache Capacity Sensitivity
  7.5 Migration Group Size Sensitivity
  7.6 Fast Level Capacity and Replacement Policy
8 Related Works
9 Conclusion
Bibliography
dc.language.iso	en
dc.title	快捷資料列搬移機制在非對稱式次陣列動態隨機存取記憶體中的應用 (Application of a Fast Row Migration Mechanism in Asymmetric-Subarray DRAM)	zh_TW
dc.title	Supporting Lightweight Row Migration for Asymmetric-Subarray DRAM	en
dc.type	Thesis
dc.date.schoolyear	103-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	呂士漣(Shih-Lien Lu),洪士灝(Shih-Hao Hung),徐慰中(Wei-Chung Hsu)
dc.subject.keyword	Dynamic random access memory, asymmetric subarray, row migration	zh_TW
dc.subject.keyword	DRAM, Asymmetric Subarray, Data Migration	en
dc.relation.page	37
dc.rights.note	Not authorized
dc.date.accepted	2015-03-12
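dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW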
Appears in collections:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-103-1.pdf (not authorized for public access)	1.34 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.