Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38781
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 郭斯彥 | |
dc.contributor.author | Chung-Ching Jiang | en |
dc.contributor.author | 江忠卿 | zh_TW |
dc.date.accessioned | 2021-06-13T16:45:57Z | - |
dc.date.available | 2006-07-05 | |
dc.date.copyright | 2005-07-05 | |
dc.date.issued | 2005 | |
dc.date.submitted | 2005-06-28 | |
dc.identifier.citation | [1] Message Passing Interface Forum. “MPI: A Message-Passing Interface Standard,” 1994.
[2] Michael J. Quinn. “Parallel Programming in C with MPI and OpenMP.” McGraw-Hill, 2004.
[3] J. Xu and R.H.D. Netzer. “Adaptive independent checkpointing for reducing rollback propagation,” In Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993.
[4] M. Chandy and L. Lamport. “Distributed snapshots: Determining global states of distributed systems,” ACM Transactions on Computer Systems, vol. 3(1), pp. 63-75, Feb. 1985.
[5] L. Alvisi and K. Marzullo. “Message logging: Pessimistic, optimistic, and causal,” In Proceedings of the 15th International Conference on Distributed Computing Systems (ICDCS 1995), pp. 229-236, IEEE CS Press, May-June 1995.
[6] Anton Selikhov, George Bosilca, Cecile Germain, Gilles Fedak, and Franck Cappello. “MPICH-CM: A Communication Library Design for a P2P MPI Implementation,” In 9th European PVM/MPI Users' Group Meeting, Linz, Austria, 2002.
[7] George Bosilca, Aurelien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fedak, Cecile Germain, and Thomas Herault. “MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes,” In Proceedings of the ACM/IEEE International Conference on Supercomputing, 2002.
[8] Y.M. Wang and K. Fuchs. “Optimistic message logging for independent checkpointing in message passing systems,” In Proceedings of the IEEE Symposium on Reliable Distributed Systems, pp. 147-154, Oct. 1992.
[9] Jim Basney and Miron Livny. “Deploying a High Throughput Computing Cluster,” High Performance Cluster Computing, Rajkumar Buyya, Editor, Vol. 1, Chapter 5, Prentice Hall PTR, May 1999.
[10] Aurélien Bouteiller, Franck Cappello, Thomas Hérault, Géraud Krawezik, Pierre Lemarinier, and Frédéric Magniette. “MPICH-V2: a fault tolerant MPI for volatile nodes based on pessimistic sender based message logging,” In High Performance Networking and Computing (SC2003), Phoenix, USA, IEEE/ACM, November 2003.
[11] Aurélien Bouteiller, Pierre Lemarinier, Géraud Krawezik, and Franck Cappello. “Coordinated checkpoint versus message log for fault tolerant MPI,” In Proceedings of the 2003 IEEE International Conference on Cluster Computing, pp. 242-250, 2003.
[12] L. Alvisi, B. Hoppe, and K. Marzullo. “Nonblocking and orphan-free message logging protocols,” In Proceedings of the Twenty-Third International Symposium on Fault-Tolerant Computing (FTCS-23), pp. 145-154, Jun. 1993.
[13] E. Strom and S. Yemini. “Optimistic recovery in distributed systems,” ACM Transactions on Computer Systems, vol. 3(3), pp. 204-226, Aug. 1985.
[14] David Bailey, Tim Harris, William Saphir, Rob Van Der Wijngaart, Alex Woo, and Maurice Yarrow. “The NAS Parallel Benchmarks 2.0.” Report NAS-95-020, Numerical Aerodynamic Simulation Facility, NASA Ames Research Center, 1995.
[15] MPI home page http://www-unix.mcs.anl.gov/mpi/
[16] MPICH-V home page http://www.lri.fr/~bouteill/MPICH-V/
[17] NAS Parallel Benchmark home page http://www.nas.nasa.gov/Software/NPB/
[18] SETI@home http://setiathome.ssl.berkeley.edu/ | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38781 | - |
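dc.description.abstract | In recent years, parallel computing has become one of the main ways to raise computing performance. High-performance parallel computers are used in fields such as commerce, defense, and science. In science, high-performance computing is a major aid to numerical simulation, which in turn is an important driver of contemporary scientific progress.
Many researchers have begun to study and develop distributed systems for parallel computing. Designing a distributed system is complex and difficult. Among the many properties that deserve careful planning and design, fault tolerance is an important goal. Any computer in a distributed system may fail, and fault tolerance is the ability to handle the failures that occur within the system. How to keep a running system unaffected by failures is therefore a topic worth studying. Fault-tolerance techniques fall into two basic categories, checkpointing and message logging, and each has given rise to different algorithms. So far, no single algorithm is recognized as the most efficient; different environments or conditions call for different algorithms to obtain the best efficiency. This thesis analyzes the differences among fault-tolerance approaches on current MPI-based distributed systems. We implement a non-blocking message-logging fault-tolerant middleware for the MPI environment, measure its performance, and share our implementation experience. | zh_TW |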
dc.description.abstract | In recent years, parallel computing has become one of the main ways to increase computing performance. High-performance parallel computers are applied in commerce, defense, and science, where high-performance computing benefits numerical simulation, a major means of advancing current science.
Many researchers have begun to study and develop distributed systems that perform parallel computing. Designing a distributed system is complicated and difficult. Among the many characteristics that deserve careful design, fault tolerance is an important goal. Every computer in a distributed system may fail, and fault tolerance is the capability to deal with such failures. How to keep a system unaffected by failures during execution is thus an important topic in fault tolerance. Rollback-recovery methods are divided into checkpointing and message logging, and each has given rise to different algorithms. So far, no algorithm is acknowledged to be the most efficient, so a different algorithm must be chosen for different environments or circumstances to get the best efficiency. The goal of this thesis is to analyze the differences among fault-tolerance methods in MPI-based distributed systems. We implement an MPI-based fault-tolerant middleware with a non-blocking message logging protocol, measure its performance, and share our practical experience. | en |
dc.description.provenance | Made available in DSpace on 2021-06-13T16:45:57Z (GMT). No. of bitstreams: 1 ntu-94-R92921082-1.pdf: 435744 bytes, checksum: a85d506a0501dd2aa6299384285d53f6 (MD5) Previous issue date: 2005 | en |
dc.description.tableofcontents | List of Figures 2
Abstract 3
Chapter 1 Introduction 4
1.1 Fault Tolerance in Parallel Computing 4
1.2 Family-Based Logging 5
Chapter 2 Background 7
2.1 Parallel Computing 7
2.2 Parallel Architectures 9
2.3 MPI (Message Passing Interface) 16
2.4 MPICH 18
2.5 Fault Tolerance 19
2.6 Checkpoint 21
2.7 Message Log 25
Chapter 3 Related Works 27
3.1 MPICH-CM 27
3.2 MPICH-V 30
3.3 MPICH-V2 36
Chapter 4 Non-Blocking Message Logging Protocols 42
4.1 Message Logging Model and Design 42
4.2 Abstract Message Logging Protocol 44
4.3 Family-Based Logging 45
Chapter 5 Implementation 51
5.1 The Architecture of MPICH-FBL 51
5.2 Benchmark 53
5.3 Performance 54
Chapter 6 Conclusion 57
Reference 58 | |
dc.language.iso | en | |
dc.title | 以非阻礙式訊息紀錄協定實作MPI-Based容錯中介軟體 | zh_TW |
dc.title | Implementation of MPI-Based Fault Tolerant Middleware with Non-Blocking Message Logging Protocol | en |
dc.type | Thesis | |
dc.date.schoolyear | 93-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 蔡一鳴,顏嗣鈞,陳英一,雷欽隆 | |
dc.subject.keyword | parallel computing, fault tolerance, checkpoint, message log | zh_TW |
dc.subject.keyword | parallel computing, fault tolerance, MPI, checkpoint, message log | en |
dc.relation.page | 59 | |
dc.rights.note | Paid license | |
dc.date.accepted | 2005-06-29 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electrical Engineering | zh_TW |
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
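ntu-94-1.pdf Not currently authorized for public access | 425.53 kB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.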