Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2701
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 洪士灝 | |
dc.contributor.author | Tzu-Hsien Kao | en |
dc.contributor.author | 高至賢 | zh_TW |
dc.date.accessioned | 2021-05-13T06:48:42Z | - |
dc.date.available | 2017-08-25 | |
dc.date.available | 2021-05-13T06:48:42Z | - |
dc.date.copyright | 2017-08-25 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-08-22 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2701 | - |
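dc.description.abstract | In recent years, driven by the demand for big data analytics, the performance of high-speed networks inside datacenters and clusters has grown rapidly toward higher bandwidth and lower latency. Sharing resources among servers over high-speed networks has become more efficient than ever before. In this thesis, we study techniques for sharing memory remotely between servers over a high-speed interconnect, using the memory of a remote server as swap space through the virtual memory system. We propose a highly portable swap memory framework built from industry-standard open-source system software and hardware. It requires only configuring and installing software and hardware, with no special modifications to the operating system, and we believe such a framework can be applied widely across many domains.
We validate the remote memory mechanism with simple test programs and real applications, including an in-memory cache (memcached), machine learning model training (TensorFlow), and genome sequencing (MUMmer), on a 50 Gbps high-speed Ethernet setup. The experimental results show that our mechanism provides efficient remote virtual memory access. First, compared with swapping to a traditional hard disk, it avoids the high cost of purchasing additional memory to prevent virtual memory thrashing. For example, with our approach, training a deep learning model in TensorFlow is only 1.24 times slower than with enough physical memory to hold all the data, and 16 times faster than with a traditional hard disk. Finally, because it uses an RDMA over Converged Ethernet (RoCE) network, our mechanism imposes very little overhead on either server. | zh_TW |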
dc.description.abstract | In recent years, driven by the demand for big data analytics, the performance of datacenter interconnection networks has improved vastly, with higher bandwidth and lower latency. With these high-speed network technologies, sharing resources among servers has become more efficient than ever. In this thesis, we study remote swap memory technologies, which allow one server to use the memory of a remote server as swap space for its virtual memory system over a high-speed interconnection network. We propose a portable remote memory swap mechanism built from reliable open-source system software and industry-standard hardware components. The mechanism is constructed by configuring software and hardware, without operating-system-level or vendor-specific modifications, so we believe the methodology is generic and useful for a wide range of applications.
To evaluate the performance of the proposed mechanism, we run microbenchmarks as well as realistic applications, including an in-memory cache (memcached), machine learning model training (TensorFlow), and genome sequence alignment (MUMmer), on a setup with two servers connected via 50 Gbps Ethernet. The experimental results show the efficiency of our mechanism. First, the remote memory swap mechanism is faster than a traditional swap mechanism backed by hard disks, and it saves the cost of adding physical memory to avoid virtual memory thrashing. For example, with our remote swap mechanism, TensorFlow training of deep learning models ran 16 times faster than when swapping to a local disk, and only 1.24 times slower than on a server with enough physical memory to hold the entire data set. Finally, because it uses RDMA over Converged Ethernet (RoCE), our remote memory swap mechanism imposes little overhead on either server. | en |
dc.description.provenance | Made available in DSpace on 2021-05-13T06:48:42Z (GMT). No. of bitstreams: 1 ntu-106-R01944033-1.pdf: 798757 bytes, checksum: 21fc1a1b88a9a443ae61b362bfdad635 (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | Acknowledgements i; Abstract (in Chinese) ii; Abstract iii; Chapter 1 Introduction 1; Chapter 2 Background 4; 2.1 High-Speed Network and RDMA 4; 2.2 Linux Memory Swap Mechanism 5; 2.3 Linux SCSI Target 6; Chapter 3 Related Work 7; Chapter 4 Remote Memory Swap Framework 9; 4.1 Designing the Remote Memory Swap Framework 9; 4.2 Performance Issues 10; Chapter 5 Performance Experiments and Evaluation 12; 5.1 Experiment Setup 13; 5.2 Micro-benchmark Performance Results 13; 5.3 Application Performance Results 14; 5.3.1 Memcached 14; 5.3.2 TensorFlow 14; 5.3.3 MUMmer 17; Chapter 6 Conclusion and Future Work 19; Bibliography 21 | |
dc.language.iso | en | |
dc.title | 使用高效能遠端虛擬記憶體聚合未使用之記憶體空間 | zh_TW |
dc.title | Aggregating Unused Memory with Efficient Remote Swapping | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 廖世偉,涂嘉恆 | |
dc.subject.keyword | Swap space,Remote swap space,RDMA,RoCE | zh_TW |
dc.subject.keyword | Swap,Remote Swap,RDMA,RoCE | en |
dc.relation.page | 23 | |
dc.identifier.doi | 10.6342/NTU201704017 | |
dc.rights.note | Authorization granted (open access worldwide) | |
dc.date.accepted | 2017-08-22 | |
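dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | zh_TW |
Appears in Collections: | Graduate Institute of Networking and Multimedia |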
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-106-1.pdf | 780.04 kB | Adobe PDF | View/Open |
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.