NTU Theses and Dissertations Repository
Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101269
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 洪士灝 | zh_TW
dc.contributor.advisor | Shih-Hao Hung | en
dc.contributor.author | 王傳啓 | zh_TW
dc.contributor.author | CHUAN CHI WANG | en
dc.date.accessioned | 2026-01-13T16:09:33Z | -
dc.date.available | 2026-01-14 | -
dc.date.copyright | 2026-01-13 | -
dc.date.issued | 2026 | -
dc.date.submitted | 2026-01-05 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101269 | -
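dc.description.abstract | Quantum circuit simulation plays an indispensable role in developing and evaluating new quantum algorithms on classical computing resources. Although full-state quantum circuit simulation is widely used for prototyping and debugging, its simulation time rises steeply with the total number of qubits, which poses a major challenge for large distributed systems. This dissertation proposes a scalable simulation framework that improves the efficiency of full-state quantum circuit simulation by fully exploiting the architectural characteristics of parallel hardware and by optimizing tensor product operations. For scalability, it also proposes two simulation techniques, one based on storage devices and one on remote direct memory access (RDMA), to lay the foundation for large-scale simulation at minimal cost. For high performance, the framework consists of two core modules: the Swarm Optimization module and the Flow Simulation module. The Swarm Optimization module comprises several circuit-level optimization techniques, including cache execution, gate fusion, unrestricted diagonal gate fusion, and tensor product acceleration. The Flow Simulation module dynamically switches to the simulation method best suited to the circuit's characteristics; it was developed entirely by the research team without relying on commercial libraries, so that the measured performance is attributable to the framework itself. In addition, the dissertation further analyzes and optimizes diagonal matrices for Quantum Approximate Optimization Algorithm (QAOA) circuits. In the experimental evaluation, gate-level and circuit-level benchmarks were run on DGX-A100 and DGX-H100 workstations equipped with eight NVIDIA A100 and eight NVIDIA H100 GPUs, respectively. Compared with state-of-the-art simulators such as QuEST, Aer Simulator (with CUDA Thrust and cuQuantum backends), and HyQuas, the framework achieves up to 45.3× speedup in gate-level tests and 12.7× speedup in circuit-level tests. Ablation experiments systematically evaluate the actual contribution of each optimization to overall performance. The simulator embodies three core elements: a high-performance simulator, a three-level optimization mechanism, and fine-grained performance tuning, which together deliver its simulation speed. In sum, the dissertation advances the acceleration of full-state quantum circuit simulation and is expected to facilitate the continued development of new quantum algorithms. | zh_TW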
dc.description.abstract | Quantum circuit simulation is essential for developing and assessing new quantum algorithms on classical computing resources. Although full-state quantum circuit simulation has been widely adopted for prototyping and debugging, its execution time grows exponentially with the qubit count, posing significant challenges for large-scale systems. This study proposes a scalable simulation framework that fully leverages parallel hardware architectures and optimizes tensor product operations to enhance simulation efficiency. Two simulation paradigms, storage-based and RDMA-based, are introduced to achieve scalability at minimal cost. The framework consists of two major components: the Swarm Optimization Module, which performs circuit-level optimizations, and the Flow Simulation Module, which dynamically selects the most suitable simulation strategy. The optimization module integrates multiple techniques, including cache execution, gate fusion, unrestricted diagonal gate fusion, and tensor product acceleration. The Flow Simulation Module is developed entirely in-house without reliance on third-party libraries, so that the measured performance is attributable to the framework itself. In the experimental evaluation, benchmarks were conducted on DGX-A100 and DGX-H100 workstations, equipped with eight NVIDIA A100 and eight NVIDIA H100 GPUs, respectively. Compared with industry-leading simulators, including QuEST, Aer Simulator (with CUDA Thrust and cuQuantum backends), and HyQuas, the proposed framework achieves up to 45.3× speedup in gate-level benchmarks and 12.7× speedup in circuit-level benchmarks. Ablation experiments were performed to systematically assess the practical contribution of each optimization technique, and a Quantum Approximate Optimization Algorithm (QAOA) case study shows the performance improvements brought by diagonal matrix optimizations.
Collectively, these results demonstrate that the proposed framework represents an important advancement in accelerating full-state quantum circuit simulation, paving the way for more efficient exploration of innovative quantum algorithms. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-01-13T16:09:33Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2026-01-13T16:09:33Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee
Acknowledgements
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Background
2.1 Quantum Computing
2.1.1 Quantum Bit and Quantum State
2.1.2 Quantum Gate Operation
2.1.3 Quantum Measurement
2.2 Quantum Circuit Simulation
2.2.1 Foundation of Quantum Circuit Simulation
2.2.2 Simulation of Single-qubit Gates
2.2.3 Simulation of Multi-qubit Gates
2.2.4 Swapping Operation
2.2.5 Swap Gate
2.2.6 K-on-K Swap Gate
2.2.7 Intra-Rank Swapping (IRS)
2.2.8 Cross-Rank Swapping (XRS)
2.2.9 Tensor Product
2.2.10 Multithreaded Simulation
2.2.10.1 Multithreaded Implementation Technique
2.3 Related Work
2.3.1 Parallel Simulation
2.3.2 Quantum Circuit Optimization
2.3.3 Scalable Simulation
Chapter 3 Methodology
3.1 Preliminary Configuration
3.2 Scalability
3.2.1 SSD-Enabled Simulation
3.2.2 RDMA-Enabled Simulation
3.3 High Performance Quantum Circuit Simulation
3.3.1 Quantum Circuit Simulation Scheme
3.3.1.1 Gate-by-Gate Simulation Scheme
3.3.1.2 Block-by-Block Simulation Scheme
3.3.1.3 Hybrid Simulation Scheme
3.3.1.4 All-in-one Simulation Scheme
3.3.1.5 Flow Simulation Scheme
3.3.1.6 Simulation Workflow
3.3.1.7 Simulation Guidelines
3.3.2 Quantum Circuit Optimization
3.3.2.1 Overview
3.3.2.2 Gate Block Finding Algorithm (GBFA)
3.3.2.3 Merge Boosting Optimization
3.3.2.4 Diagonal Detecting Optimization
3.3.2.5 Discussion
3.4 QAOA-Specialized Simulation Framework
Chapter 4 Evaluation
4.1 Experimental Configurations
4.1.1 Benchmarks
4.1.2 Experimental Setup for Scalable Simulation
4.1.3 Experimental Setup for High-Performance Simulation
4.1.4 Experimental Setup for QAOA-Specialized Simulation
4.2 Experimental Evaluation of Scalable Simulation
4.2.1 Performance Results of SSD-enabled Simulation
4.2.2 Performance Results of RDMA-enabled Simulation
4.3 Experimental Evaluation of High-Performance Simulation
4.3.1 Performance Results on DGX-A100 Workstation
4.3.1.1 Overall Performance Speedup
4.3.1.2 Qubit Benchmark
4.3.1.3 Strong Scaling
4.3.1.4 Quantum Gate Benchmark
4.3.1.5 Quantum Circuit Benchmark
4.3.2 Performance Results on DGX-H100 Workstation
4.3.2.1 Overall Performance Speedup
4.3.2.2 Qubit Benchmark
4.3.2.3 Strong Scaling
4.3.2.4 Quantum Gate Benchmark
4.3.2.5 Quantum Circuit Benchmark
4.3.2.6 Diagonal Gate Benchmark
4.3.2.7 Circuit Benchmark of Diagonal Gate Fusion
4.3.2.8 Ablation Experiment
4.3.3 Comparison of NVIDIA A100 and H100
4.4 Performance Results of QAOA-Specialized Simulation
4.5 Performance Analysis
4.5.1 GBFA Efficiency
4.5.2 Breakdown Analysis
4.5.3 Roofline Model
4.5.4 Caching Efficiency
Chapter 5 Conclusion
References
Appendix A: OpenQASM Codes
A.1 An example of raw and optimized quantum circuits
A.2 An example quantum circuit illustrating the optimization process | -
dc.language.iso | en | -
dc.subject | quantum computing | -
dc.subject | quantum circuit optimization | -
dc.subject | quantum circuit simulation | -
dc.subject | parallel programming | -
dc.subject | high-performance computation | -
dc.title | 一套適用於超級運算之快速、可擴展且完善的量子電路模擬器 | zh_TW
dc.title | A quick, scalable, and comprehensive quantum circuit simulation for supercomputing | en
dc.type | Thesis | -
dc.date.schoolyear | 114-1 | -
dc.description.degree | Doctoral | -
dc.contributor.oralexamcommittee | 江介宏; 黃鐘揚; 李建模; 邱大維; 施吉昇; 涂嘉恆 | zh_TW
dc.contributor.oralexamcommittee | Jie-Hong Jiang; Chung-Yang Huang; Chien-Mo Li; Dah-Wei Chiou; Chi-Sheng Shih; Chia-Heng Tu | en
dc.subject.keyword | quantum computing, quantum circuit optimization, quantum circuit simulation, parallel programming, high-performance computation | en
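dc.relation.page | 152 | -
dc.identifier.doi | 10.6342/NTU202504792 | -
dc.rights.note | Not authorized | -
dc.date.accepted | 2026-01-05 | -
dc.contributor.author-college | College of Electrical Engineering and Computer Science | -
dc.contributor.author-dept | Department of Computer Science and Information Engineering | -
dc.date.embargo-lift | N/A | -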
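Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File | Size | Format
ntu-114-1.pdf (not authorized for public access) | 12.6 MB | Adobe PDF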
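Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.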
