Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/10491
Title: | System Synthesis for Multi-Processor System-on-Chips (多核心系統晶片之系統合成方法) |
Author: | Yi-Jung Chen (陳依蓉) |
Advisor: | Chia-Lin Yang (楊佳玲) |
Keywords: | Processing Elements, Memory Subsystem, Multi-Processor System-on-Chip, Synthesis, 3-Dimensional Integration, Distributed Memory Interface |
Publication Year: | 2010 |
Degree: | Doctoral |
Abstract: | Advances in process technology have made multi-core architectures, which support the concurrent execution of multiple tasks, the mainstream of chip design. A multi-core architecture is attractive to applications with significant parallelism because multiple processing elements (PEs) are placed on a single die to support parallel execution. However, it also stresses the memory system with concurrent memory accesses from different PEs, and as the number of cores on a chip increases, the main memory bandwidth requirement grows as well. Memory-aware design is therefore important when designing Multi-Processor System-on-Chips (MPSoCs). In this thesis, we propose memory-aware synthesis methods for MPSoCs with two different architectures: (a) MPSoCs with the traditional 2-Dimensional (2D) CPU-DRAM connection, and (b) MPSoCs with 3-Dimensional (3D) stacked DRAMs.
For MPSoCs with the traditional 2D CPU-DRAM connection, the main memory bandwidth is limited by pin limitations. To maximize system performance under limited on-chip resources, the PE and on-chip memory architectures must be designed together and tailored to the target applications. On one hand, we want to allocate as many PEs as possible to fully exploit the task parallelism available in the target applications; on the other hand, we need to incorporate a significant amount of on-chip memory so that the memory system does not become the main performance bottleneck. However, in a traditional MPSoC design flow, memory and computation components are usually allocated independently, so the effect of trading one resource for the other cannot be taken into account. To tackle this problem, we develop the first PE and memory co-synthesis framework for MPSoCs with 2D CPU-DRAM connections, PE and Memory Co-Synthesis (PM-COSYN). The goal of the algorithm is to synthesize the allocation of PEs and on-chip memory modules simultaneously so that system performance is maximized subject to the resource constraint.
In MPSoCs with stacked DRAMs, 3D die-stacking technology uses vertical buses composed of Through-Silicon Vias (TSVs) to integrate the processing cores and DRAMs on the same chip. Moreover, TSVs can be placed densely and thus provide high DRAM bandwidth to the system. To exploit this bandwidth, a PE with a high memory bandwidth demand can have a local DRAM memory controller (DMC) so that it can directly access the DRAM module stacked on top of it. The DMCs together form a distributed memory interface for the CPU-DRAM connection in MPSoCs with stacked DRAMs. However, a DMC occupies a significant share of the transistor budget, which could instead be used to enlarge the capacity of the high-speed local Static Random Access Memory (SRAM). Moreover, TSVs incur extra manufacturing cost and have an adverse impact on chip yield. Therefore, the distributed memory interface, including the number of allocated DMCs and the vertical bus width of each DMC, should be designed carefully. To tackle this problem, we propose the first algorithm to synthesize the DMC allocation and vertical bus allocation for MPSoCs with stacked DRAMs (Distributed Memory Interface Synthesis). Based on the memory access behavior and demands of the system, the algorithm finds a distributed memory interface design for the given task set that minimizes the total number of TSVs in the system while meeting the user-defined performance constraint. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/10491 |
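Full-Text Authorization: | Authorized (open to the public worldwide) |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |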
---|---|---|---|
ntu-99-1.pdf | 843.13 kB | Adobe PDF | View/Open |
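Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.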