Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38781
Title: | 以非阻礙式訊息紀錄協定實作MPI-Based容錯中介軟體 Implementation of MPI-Based Fault Tolerant Middleware with Non-Blocking Message Logging Protocol |
Author: | Chung-Ching Jiang 江忠卿 |
Advisor: | 郭斯彥 |
Keywords: | parallel computing, fault tolerance, MPI, checkpoint, message log |
Year of publication: | 2005 |
Degree: | Master's |
Abstract: | In recent years, parallel computing has become one of the main ways to increase computing performance. High-performance parallel computers are applied in fields such as commerce, defense, and science. In science, high-performance computing greatly benefits numerical simulation, which is in turn an important means of advancing contemporary science.
Many researchers have begun to study and develop distributed systems for parallel computing. Designing a distributed system is complicated and difficult. Among the many properties that deserve careful planning and design, fault tolerance is an important goal. Every computer in a distributed system may fail, and fault tolerance is the ability to handle the failures that occur within the system. How to keep a running system unaffected by failures is therefore an important research topic in fault tolerance.
Rollback-recovery methods fall into two basic categories, checkpointing and message logging, and each has given rise to a variety of algorithms. To date, no single algorithm is recognized as the most efficient, so a different algorithm must be chosen for each environment or circumstance to obtain the best efficiency.
The goal of this thesis is to analyze the differences among fault tolerance methods in current MPI-based distributed systems. We implement an MPI-based fault-tolerant middleware with a non-blocking message logging protocol, measure its performance, and share our implementation experience. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38781 |
Full-text license: | Paid license |
Appears in collections: | Department of Electrical Engineering |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-94-1.pdf Not currently authorized for public access | 425.53 kB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms specifically state otherwise.