Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53668
Title: | Design and Implementation of a Timing-Error Detection and Correction Mechanism on ARM1136 |
Authors: | Cheng-Hao Luo 羅晟豪 |
Advisor: | 陳少傑(Sao-Jie Chen) |
Keyword: | Design Margin Elimination, Error Detection, Error Correction |
Publication Year: | 2015 |
Degree: | Master's |
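Abstract: | Traditional digital design must guarantee that a circuit operates without error even in the worst case. Errors can arise from various design uncertainties, such as process, voltage, and temperature variations. As process technology advances and performance demands grow, the impact of design uncertainty on chip design keeps increasing. In advanced process nodes, designers often have to reserve large design margins to raise overall yield, which wastes design cost. Consequently, a variety of design-margin elimination techniques have been proposed to improve performance and reduce this waste; error tolerance is one of the most direct and effective of them. By adding error detection and recovery mechanisms to a design, the reserved margins can be eliminated entirely, yielding a better-than-worst-case design that withstands design uncertainty and further increases overall reliability.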
In this thesis, we design and implement an error detection and recovery mechanism on an ARM processor so that it can effectively withstand these variations and achieve higher efficiency and reliability. We build a hybrid error detection mechanism based on Razor and Surger that combines global and local timing information to detect errors. Error recovery is implemented at the architectural level through instruction replay, supplemented by a frequency control mechanism that prevents repeated errors from causing deadlock. In addition, to make the error-tolerance mechanism operate more efficiently, we take potential timing errors into account and optimize the design according to path activation probability, which lowers the error rate and thereby improves efficiency. We propose a complete, systematic solution that integrates the error-tolerance mechanism into the processor design to eliminate design margins.
With the growing popularity of mobile devices, the focus of system-on-chip design has shifted from high performance to low-power operation. However, traditional design methodology is limited by the design margins reserved for process, voltage, and temperature variations: the operating point is chosen under a worst-case scenario of variations so that the circuit always operates correctly. This methodology is the so-called worst-case design. If timing errors are detectable and recoverable, however, the design margins can be reduced, which yields significant energy savings and better reliability than the worst-case design. In this thesis, a systematic solution that enables real-time timing-error detection and correction is proposed to eliminate redundant design margins, and it is implemented on an ARM microprocessor. We build a hybrid error detection mechanism that combines global and local timing information to detect errors. The error correction mechanism is implemented at the architectural level based on instruction replay, and a frequency control mechanism is added to prevent possible deadlocks caused by repeated errors. To better utilize the underlying error-tolerance mechanism, an activity-driven optimization procedure is proposed to reshape the slack distribution based on path activity. As a result, the design becomes more robust against process, voltage, and temperature variations. Moreover, power efficiency increases due to the reduction of design margins, making it a better-than-worst-case design. |
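The detect, replay, and slow-down loop described in the abstract can be illustrated with a minimal behavioral model. The Python sketch below rests on assumptions that do not come from the thesis: each instruction is reduced to a single critical-path arrival time; detection follows the general Razor shadow-latch idea, where an arrival after the clock edge but within a hypothetical shadow window is flagged; recovery replays the instruction; and a hypothetical frequency control stretches the clock period by a fixed factor after repeated failures so that a replay cannot fail forever. The names `razor_check`, `run_with_replay`, `expected_error_rate`, and `Path`, as well as the window and stretch values, are illustrative only. The Surger component and the global timing information are not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical timing path: delay and per-cycle activation probability."""
    delay_ns: float
    activation_prob: float


def razor_check(arrival_ns: float, period_ns: float, shadow_window_ns: float) -> bool:
    """Razor-style check (assumed model): the main flip-flop samples at the
    clock edge and a shadow latch samples shadow_window_ns later. Returns True
    when the two disagree (a detectable timing error), False when the main
    flip-flop captured the correct value."""
    if arrival_ns <= period_ns:
        return False
    if arrival_ns <= period_ns + shadow_window_ns:
        return True
    # Data arrived after the shadow latch closed: neither sample is correct.
    raise RuntimeError("arrival beyond shadow window: error not detectable")


def run_with_replay(arrivals_ns, period_ns, shadow_window_ns,
                    max_retries=1, stretch=1.25):
    """Execute one instruction per arrival time. A detected error triggers a
    replay of that instruction; once it has failed more than max_retries times
    in a row, the clock period is stretched on every further failure, so the
    replay eventually passes instead of failing forever (the deadlock the
    frequency control is meant to avoid). This sketch never restores the
    original period."""
    cycles = replays = 0
    for arrival in arrivals_ns:
        failures = 0
        while True:
            cycles += 1
            if not razor_check(arrival, period_ns, shadow_window_ns):
                break  # instruction commits
            replays += 1
            failures += 1
            if failures > max_retries:
                period_ns *= stretch  # lower the clock frequency
    return cycles, replays, period_ns


def expected_error_rate(paths, period_ns):
    """Per-cycle error probability, assuming paths are activated independently:
    1 - prod(1 - p) over the paths whose delay exceeds the clock period."""
    return 1.0 - math.prod(1.0 - p.activation_prob
                           for p in paths if p.delay_ns > period_ns)


if __name__ == "__main__":
    print(run_with_replay([0.8, 1.05, 0.9], period_ns=1.0, shadow_window_ns=0.3))
    paths = [Path(1.05, 0.1), Path(0.9, 0.5), Path(1.1, 0.2)]
    print(expected_error_rate(paths, period_ns=1.0))
```

With arrival times `[0.8, 1.05, 0.9]` ns, a 1.0 ns period, and a 0.3 ns window, the second instruction fails twice, the period is stretched to 1.25 ns, and the run finishes in 5 cycles with 2 replays. `expected_error_rate` hints at why an activity-driven optimization can pay off: under the independence assumption, only violating paths contribute, weighted by how often they are activated, so recovering slack on frequently activated paths lowers the error rate more than recovering it on rarely activated ones.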
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53668 |
Fulltext Rights: | Paid license |
Appears in Collections: | Graduate Institute of Electronics Engineering |
Files in This Item:
File | Access | Size | Format |
---|---|---|---|
ntu-104-1.pdf | Restricted Access | 2.21 MB | Adobe PDF |