Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88599
Full metadata record
DC field (language): value
dc.contributor.advisor (zh_TW): 李致毅
dc.contributor.advisor (en): Jri Lee
dc.contributor.author (zh_TW): 張凱茗
dc.contributor.author (en): Kai-Ming Chang
dc.date.accessioned: 2023-08-15T17:00:19Z
dc.date.available: 2023-11-09
dc.date.copyright: 2023-08-15
dc.date.issued: 2023
dc.date.submitted: 2023-07-24
dc.identifier.citation:
[1] W.-K. Chang. A high-performance memory subsystem tailored to applications with heavy bandwidth requirement. Master's thesis, National Taiwan University, Taiwan, Dec. 2021.
[2] G. Coley. BeagleBone Black system reference manual.
[3] S.-N. Huang. Hardware implementation of high-efficiency memory controller applied to high bandwidth memory. Master's thesis, National Taiwan University, Taiwan, Dec. 2021.
[4] JEDEC. High bandwidth memory (HBM) DRAM.
[5] TweakTown. NVIDIA GeForce RTX 3070 Ti teased: 8GB GDDR6X.
[6] Xilinx. AXI high bandwidth memory controller v1.0.
[7] Xilinx. UltraScale architecture and product data sheet: Overview.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88599
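dc.description.abstract (zh_TW): This thesis examines Autolykos, the proof-of-work algorithm of the Ergo cryptocurrency. Proof of Work (PoW) is a consensus mechanism widely used in blockchains, in which participants earn cryptocurrency rewards by solving hard mathematical puzzles. GPUs, FPGAs, and ASICs are now commonly used in the search for higher hash rates at lower energy cost.
Traditional PoW algorithms, such as Bitcoin's SHA-256, are compute-hard, meaning they are limited mainly by processor speed. This design gives a significant advantage to miners who can afford to buy and operate expensive, power-hungry specialized hardware such as ASICs. The result has been the rise of large-scale miners and mining farms, along with concerns about centralization and fairness, because individuals and small-scale miners can hardly compete with them.
Memory-bound PoW algorithms, by contrast, are designed to limit this advantage. Most of their workload depends on memory, so they encourage miners to use hardware with a large amount of available memory, which is usually cheaper and easier to obtain than high-speed processors.
The Autolykos algorithm studied here is memory-bound. It performs a large number of memory reads and writes, so overall computation speed is limited by memory bandwidth and memory capacity. Compared with the Ethash algorithm used by Ethereum, Autolykos uses more computing resources, which greatly increases the area required on ASICs and FPGAs and therefore raises the cost of both. It also increases the share of memory reads in the overall computation, so it emphasizes the memory-hard property even more than Ethash does. Using memory bandwidth effectively therefore becomes the central question in raising the hash rate.
This thesis implements the algorithm in hardware on the Xilinx VU35P FPGA. It describes the software architecture, the previous hardware architecture and its bottlenecks, and then proposes a new architecture that substantially exceeds the hash rate of a GPU with the same memory bandwidth. Raising the HBM reference clock above 130 MHz increases the memory bandwidth ceiling, and the final hash rate reaches 266.4 MH/s, 1.5 times that of a high-end GPU.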
dc.description.abstract (en): This thesis examines Autolykos, the Proof of Work (PoW) algorithm used by the Ergo cryptocurrency. PoW is a widely used blockchain consensus mechanism in which participants earn cryptocurrency rewards by solving hard mathematical puzzles. General-purpose devices such as GPUs, reconfigurable hardware such as FPGAs, and specialized ASICs are currently used in the search for higher hash rates at lower energy cost.
Traditional PoW algorithms, such as Bitcoin's SHA-256, are compute-bound, meaning they are limited mainly by processor speed. This design gives a significant advantage to miners who can afford to buy and operate expensive, power-hungry specialized hardware such as ASICs. The result has been the emergence of large-scale miners and mining farms, raising concerns about centralization and fairness, because individual and small-scale miners find it hard to compete with them.
In contrast, memory-bound PoW algorithms are designed to curb this advantage. Their workload depends primarily on memory, which encourages miners to use hardware with a large amount of available memory; such hardware is generally cheaper and more accessible than high-speed processors.
The Autolykos algorithm investigated in this study is memory-bound. It performs a large number of memory read and write operations, so overall computation speed is limited by memory bandwidth and memory size. Compared with Ethereum's Ethash algorithm, Autolykos uses more computing resources, which significantly increases the area required on ASICs and FPGAs and thus raises their cost. It also increases the proportion of memory reads in the overall computation, emphasizing the memory-hard property even more than Ethash does. Using memory bandwidth effectively therefore becomes the key to raising the hash rate.
This thesis presents a hardware implementation on the Xilinx VU35P FPGA. It describes the software architecture, the previous hardware architecture and its bottlenecks, and then proposes a new architecture that significantly surpasses the hash rate of a GPU with the same memory bandwidth. Raising the HBM reference clock above 130 MHz increases the upper limit of memory bandwidth, and the final hash rate reaches 266.4 MH/s, which is 1.5 times that of high-end GPUs.
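As a rough illustration of the memory-bound idea described in the abstract, the Python sketch below shows a toy proof-of-work in which every nonce attempt forces several pseudo-random reads from a precomputed table. It is not the Autolykos algorithm and not the thesis's implementation: the table construction, the index derivation, the use of BLAKE2b, and all names (`build_table`, `pick_indices`, `attempt`, `mine`) and parameter values are assumptions chosen only to make the idea concrete.

```python
import hashlib
from typing import List, Optional


def h(data: bytes) -> bytes:
    """32-byte BLAKE2b digest (hash choice is an assumption for this sketch)."""
    return hashlib.blake2b(data, digest_size=32).digest()


def build_table(seed: bytes, n: int) -> List[bytes]:
    """Precompute n table elements; a real miner would keep these in DRAM/HBM."""
    return [h(seed + i.to_bytes(4, "big")) for i in range(n)]


def pick_indices(msg: bytes, nonce: int, k: int, n: int) -> List[int]:
    """Derive k pseudo-random table indices from the message and nonce."""
    digest = h(msg + nonce.to_bytes(8, "big"))
    return [
        int.from_bytes(h(digest + j.to_bytes(4, "big"))[:4], "big") % n
        for j in range(k)
    ]


def attempt(table: List[bytes], msg: bytes, nonce: int, k: int) -> int:
    """One nonce attempt: k scattered table reads, a sum, and a final hash."""
    indices = pick_indices(msg, nonce, k, len(table))
    total = sum(int.from_bytes(table[i], "big") for i in indices)
    return int.from_bytes(h(total.to_bytes(40, "big")), "big")


def mine(table: List[bytes], msg: bytes, target: int, k: int,
         max_nonce: int) -> Optional[int]:
    """Return the first nonce whose attempt value is below target, if any."""
    for nonce in range(max_nonce):
        if attempt(table, msg, nonce, k) < target:
            return nonce
    return None


if __name__ == "__main__":
    table = build_table(b"toy-seed", 1024)   # hypothetical table size
    target = 1 << 252                        # hypothetical, very easy target
    print(mine(table, b"block-header", target, k=32, max_nonce=10_000))
```

Because the indices depend on the nonce, the reads cannot be predicted ahead of time, so in a design of this kind throughput tends to be set by how fast the table can be read rather than by how fast the hash can be computed.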
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T17:00:19Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2023-08-15T17:00:19Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee
Acknowledgements
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Organization of the Thesis
Chapter 2 Preliminaries
2.1 Introduction to Autolykos Algorithm
2.1.1 The overall mining process
2.1.2 Property of Autolykos algorithm
2.1.3 High Bandwidth Memory Overview
2.2 Problem statement
2.2.1 Software architecture of Autolykos algorithm
2.2.2 Introduction to FPGA VU35P
2.2.3 The problem of FPGA Resource Utilization
Chapter 3 Circuit improvement of Autolykos algorithm
3.1 Design methodologies on FPGA
3.2 Design consideration
3.2.1 LUT utilization
3.2.2 HBM Bandwidth
3.3 Analysis of Mining Steps
3.4 Problem of the HBM Bandwidth
3.5 Building Block of the Engine
3.5.1 Overview of the engine
3.5.2 Architecture of the engine
3.6 Building Block of the hbm_mgr
3.6.1 Overview of the hbm_mgr
3.6.2 Architecture of the write_cmd_top
3.6.3 Architecture of the cmd_mgr_top
3.6.4 Architecture of the fabric_top
3.6.5 Architecture of the mix_mgr_top
3.7 Building Block of system
3.8 Improvement of Hash Rate
3.8.1 Address mapping
3.8.2 Memory refresh command
3.8.3 Theoretical maximum hash rate
Chapter 4 Measurement Results
4.1 Methodology of hash rate estimation
4.2 Configure design on FPGA
4.2.1 Address mapping
4.2.2 Memory refresh command
4.2.3 Clock frequency
4.3 Final result
4.4 Extra
Chapter 5 Conclusion
5.1 Review
5.2 Future work
References
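The abstract states that the hash rate of a memory-bound algorithm is limited by memory bandwidth. A minimal back-of-the-envelope sketch of that relationship follows, assuming each hash needs a fixed number of table reads of a fixed size and that nothing else limits throughput. The function names and the example values for reads per hash and bytes per read are hypothetical placeholders, not figures taken from the thesis.

```python
def max_hash_rate(bandwidth_bytes_per_s: float,
                  reads_per_hash: int,
                  bytes_per_read: int) -> float:
    """Upper bound on hashes per second if memory bandwidth is the only limit."""
    if reads_per_hash <= 0 or bytes_per_read <= 0:
        raise ValueError("reads_per_hash and bytes_per_read must be positive")
    return bandwidth_bytes_per_s / (reads_per_hash * bytes_per_read)


def implied_bandwidth(hash_rate_per_s: float,
                      reads_per_hash: int,
                      bytes_per_read: int) -> float:
    """Memory traffic in bytes per second needed to sustain a given hash rate."""
    if reads_per_hash <= 0 or bytes_per_read <= 0:
        raise ValueError("reads_per_hash and bytes_per_read must be positive")
    return hash_rate_per_s * reads_per_hash * bytes_per_read


if __name__ == "__main__":
    # Placeholder assumptions: 32 reads per hash, 32 bytes per read.
    reads, size = 32, 32
    # Traffic implied by the 266.4 MH/s figure quoted in the abstract,
    # under the placeholder assumptions above.
    print(implied_bandwidth(266.4e6, reads, size) / 1e9, "GB/s")
```

Under this simple model, raising the achievable memory bandwidth raises the hash-rate ceiling proportionally, which is consistent with the abstract's remark that a higher HBM reference clock lifts the bandwidth limit.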
dc.language.iso: en
dc.subject (zh_TW): Autolykos
dc.subject (zh_TW): Blockchain
dc.subject (zh_TW): ERGO
dc.subject (zh_TW): FPGA (Field Programmable Gate Array)
dc.subject (zh_TW): Analysis of hardware architecture for Autolykos
dc.subject (en): Blockchain
dc.subject (en): ERGO
dc.subject (en): Analysis of hardware architecture for Autolykos
dc.subject (en): Autolykos
dc.subject (en): FPGA (Field Programmable Gate Array)
dc.title (zh_TW): Implementation of the Memory-Bound Autolykos Algorithm on FPGA
dc.title (en): Memory-Bound Proof-of-Work Implementation of Autolykos Algorithm on FPGA
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: Master
dc.contributor.oralexamcommittee (zh_TW): 劉宗德; 廖世偉
dc.contributor.oralexamcommittee (en): Tsung-Te Liu; Shih-wei Liao
dc.subject.keyword (zh_TW): Blockchain, ERGO, FPGA (Field Programmable Gate Array), Autolykos, Analysis of hardware architecture for Autolykos
dc.subject.keyword (en): Blockchain, ERGO, FPGA (Field Programmable Gate Array), Autolykos, Analysis of hardware architecture for Autolykos
dc.relation.page: 53
dc.identifier.doi: 10.6342/NTU202301940
dc.rights.note: Not authorized
dc.date.accepted: 2023-07-26
dc.contributor.author-college: College of Electrical Engineering and Computer Science
dc.contributor.author-dept: Graduate Institute of Electronics Engineering
Appears in collections: Graduate Institute of Electronics Engineering

Files in this item:
File: ntu-111-2.pdf (not authorized for public access)
Size: 6.02 MB
Format: Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
