Memory-Bound Proof-of-Work Implementation of Autolykos Algorithm on FPGA

張凱茗; Kai-Ming Chang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88599

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	李致毅	zh_TW
dc.contributor.advisor	Jri Lee	en
dc.contributor.author	張凱茗	zh_TW
dc.contributor.author	Kai-Ming Chang	en
dc.date.accessioned	2023-08-15T17:00:19Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-08-15	-
dc.date.issued	2023	-
dc.date.submitted	2023-07-24	-
dc.identifier.citation	[1] W.-K. Chang. A high-performance memory subsystem tailored to applications with heavy bandwidth requirement. Master’s thesis, National Taiwan University, Taiwan, Dec. 2021. [2] G. Coley. BeagleBone Black system reference manual. [3] S.-N. Huang. Hardware implementation of high-efficiency memory controller applied to high bandwidth memory. Master’s thesis, National Taiwan University, Taiwan, Dec. 2021. [4] JEDEC. High Bandwidth Memory (HBM) DRAM. [5] TweakTown. NVIDIA GeForce RTX 3070 Ti teased: 8GB GDDR6X. [6] Xilinx. AXI High Bandwidth Memory Controller v1.0. [7] Xilinx. UltraScale architecture and product data sheet: Overview.	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88599	-
dc.description.abstract	本論文的重點在於深入探討Ergo虛擬貨幣的工作證明演算法Autolykos。其中工作證明(Proof of Work)是區塊鏈中廣為使用的一種共識機制，人們可透過解決複雜的數學難題來獲得加密貨幣的獎勵，目前普遍使用GPU、FPGA、ASIC來尋找更高的算力卻又更省能源的方法。傳統的PoW演算法，例如比特幣的SHA-256，是compute hard的算法，這意味著它主要受到處理器速度的限制。這種設計使得那些有能力購買和運營能源消耗大且昂貴的特殊硬體的礦工（例如ASIC）具有顯著的優勢。這導致了大規模礦工和礦場的出現，並引發了中心化和公平性的問題，因為個人和小規模礦工很難與這些大型勢力競爭。相比之下，記憶體束縛的PoW演算法是設計來限制這種優勢的。內存受限演算法的工作需求大部分基於記憶體，因此，它鼓勵礦工使用具有大量可用記憶體的硬體，而這種硬體通常比高速處理器更便宜、更容易獲得。此次研究之Autolykos算法為memory bound演算法。其特性為會對內存進行大量讀取與寫入，故整體計算速度會受限於記憶體頻寬與記憶體容量大小。而與乙太坊所使用的Ethash算法相比，Autolykos提高了計算資源的使用量，使ASIC和FPGA所需面積大為增加，以此提高兩者的成本。並且提高記憶體讀取在整體計算的比例，因此相較於Ethash又更強調memory hard的特性，如何有效運用記憶體頻寬成為提高算力最大的討論點。本論文將在Xilinx的VU35P FPGA晶片上做硬體實現，論文中將詳細描述軟體架構、舊版硬體架構及其瓶頸，再提出一種全新的架構以大幅超越同記憶體頻寬的GPU算力，並可透過調整HBM reference clock至130MHz以上，提升記憶體頻寬上限，最終算力達到266.4MH/s，為高階GPU算力的1.5倍。	zh_TW
dc.description.abstract	This thesis presents an in-depth study of Autolykos, the Proof of Work (PoW) algorithm used in the Ergo cryptocurrency. PoW is a widely used consensus mechanism in blockchains, in which participants earn cryptocurrency rewards by solving computationally hard puzzles. Miners currently use GPUs, FPGAs, and ASICs in search of higher hash rates at lower energy cost. Traditional PoW algorithms, such as Bitcoin's SHA-256, are compute-bound, meaning they are limited mainly by processor speed. This design gives a significant advantage to miners who can afford to buy and operate expensive, power-hungry specialized hardware such as ASICs. It has led to the emergence of large-scale miners and mining farms, raising concerns about centralization and fairness, because individual and small-scale miners find it hard to compete with them. In contrast, memory-bound PoW algorithms are designed to curb this advantage. Their workload is dominated by memory accesses, which encourages miners to use hardware with a large amount of available memory; such hardware is generally cheaper and more accessible than high-speed processors. The Autolykos algorithm studied here is memory-bound. It performs a large number of memory reads and writes, so its overall speed is limited by memory bandwidth and memory capacity. Compared with Ethereum's Ethash algorithm, Autolykos uses more computing resources, which significantly increases the area required on ASICs and FPGAs and thus raises their cost. It also increases the share of memory reads in the overall computation, so it is even more memory-hard than Ethash. Making effective use of memory bandwidth is therefore the key to raising the hash rate.
This thesis presents a hardware implementation on the Xilinx VU35P FPGA. It describes the software architecture and the previous hardware architecture with its bottlenecks, then proposes a new architecture whose hash rate significantly surpasses that of a GPU with the same memory bandwidth. Raising the HBM reference clock above 130 MHz increases the memory bandwidth ceiling, and the final hash rate reaches 266.4 MH/s, 1.5 times that of a high-end GPU.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T17:00:19Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-08-15T17:00:19Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Verification Letter from the Oral Examination Committee, p. i; Acknowledgements, p. iii; 摘要 (Chinese abstract), p. v; Abstract, p. vii; Contents, p. xi; List of Figures, p. xv; List of Tables, p. xvii; Chapter 1 Introduction, p. 1; 1.1 Motivation, p. 1; 1.2 Organization of the Thesis, p. 2; Chapter 2 Preliminaries, p. 3; 2.1 Introduction to Autolykos Algorithm, p. 3; 2.1.1 The overall mining process, p. 3; 2.1.2 Property of Autolykos algorithm, p. 4; 2.1.3 High Bandwidth Memory Overview, p. 6; 2.2 Problem statement, p. 7; 2.2.1 Software architecture of Autolykos algorithm, p. 7; 2.2.2 Introduction to FPGA VU35P, p. 9; 2.2.3 The problem of FPGA Resource Utilization, p. 10; Chapter 3 Circuit improvement of Autolykos algorithm, p. 13; 3.1 Design methodologies on FPGA, p. 13; 3.2 Design consideration, p. 15; 3.2.1 LUT utilization, p. 15; 3.2.2 HBM Bandwidth, p. 15; 3.3 Analysis of Mining Steps, p. 15; 3.4 Problem of the HBM Bandwidth, p. 19; 3.5 Building Block of the Engine, p. 20; 3.5.1 Overview of the engine, p. 20; 3.5.2 Architecture of the engine, p. 24; 3.6 Building Block of the hbm_mgr, p. 30; 3.6.1 Overview of the hbm_mgr, p. 30; 3.6.2 Architecture of the write_cmd_top, p. 31; 3.6.3 Architecture of the cmd_mgr_top, p. 32; 3.6.4 Architecture of the fabric_top, p. 33; 3.6.5 Architecture of the mix_mgr_top, p. 37; 3.7 Building Block of system, p. 38; 3.8 Improvement of Hash Rate, p. 39; 3.8.1 Address mapping, p. 39; 3.8.2 Memory refresh command, p. 40; 3.8.3 Theoretical maximum hash rate, p. 42; Chapter 4 Measurement Results, p. 45; 4.1 Methodology of hash rate estimation, p. 45; 4.2 Configure design on FPGA, p. 46; 4.2.1 Address mapping, p. 46; 4.2.2 Memory refresh command, p. 47; 4.2.3 Clock frequency, p. 47; 4.3 Final result, p. 48; 4.4 Extra, p. 49; Chapter 5 Conclusion, p. 51; 5.1 Review, p. 51; 5.2 Future work, p. 52; References, p. 53	-
dc.language.iso	en	-
dc.subject	Autolykos	zh_TW
dc.subject	區塊鏈	zh_TW
dc.subject	ERGO	zh_TW
dc.subject	FPGA (現場可程式化邏輯閘陣列)	zh_TW
dc.subject	Autolykos硬體架構分析	zh_TW
dc.subject	Blockchain	en
dc.subject	ERGO	en
dc.subject	Analysis of hardware architecture for Autolykos	en
dc.subject	Autolykos	en
dc.subject	FPGA (Field Programmable Gate Array)	en
dc.title	基於FPGA上實現記憶體受限之Autolykos演算法	zh_TW
dc.title	Memory-Bound Proof-of-Work Implementation of Autolykos Algorithm on FPGA	en
dc.type	Thesis	-
dc.date.schoolyear	111-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	劉宗德; 廖世偉	zh_TW
dc.contributor.oralexamcommittee	Tsung-Te Liu; Shih-wei Liao	en
dc.subject.keyword	區塊鏈,ERGO,FPGA (現場可程式化邏輯閘陣列),Autolykos,Autolykos硬體架構分析	zh_TW
dc.subject.keyword	Blockchain,ERGO,FPGA (Field Programmable Gate Array),Autolykos,Analysis of hardware architecture for Autolykos	en
dc.relation.page	53	-
dc.identifier.doi	10.6342/NTU202301940	-
dc.rights.note	Not authorized	-
dc.date.accepted	2023-07-26	-
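dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Graduate Institute of Electronics Engineering	-
Appears in Collections:	Graduate Institute of Electronics Engineering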

Files in This Item:

File	Size	Format
ntu-111-2.pdf (not authorized for public access)	6.02 MB	Adobe PDF

Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.