Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68011
| Title: | Analyzing Application Performance for Heterogeneous Platforms (利用虛擬平台分析異質計算系統與應用之效能) |
| Author: | Chih-Wei Yeh (葉志威) |
| Advisor: | Shih-Hao Hung (洪士灝) |
| Keywords: | Timing Simulation, Performance Analysis, Heterogeneous Platforms, Machine Learning, Performance Prediction |
| Publication year: | 2017 |
| Degree: | Doctoral |
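| Abstract: | To achieve better application performance, today's processors are moving toward heterogeneity and custom accelerators, serving workloads that range from traditional data processing to smart applications such as deep learning, the Internet of Things, edge computing, and Industry 4.0. System design has become correspondingly more complex, and its focus is shifting toward exploring design spaces and hardware parameters to find a suitable combination of processors and accelerators. Designers must also consider how software performance varies across processor architectures: a program's performance depends on the design of its algorithms, and an algorithm's performance in turn changes with hardware parameters and across hardware platforms. To obtain the best performance at a reasonable hardware cost, understanding how programs and algorithms behave becomes an important issue. Performance analysis that is vertically integrated, from software behavior down to hardware changes, is the key technique for this problem and a key part of optimizing programs in the heterogeneous era.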
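Traditional simulators such as gem5 provide accurate performance analysis and timing estimates at the microprocessor level, but adding a timing model for a new hardware component is not easy. These simulators also cannot analyze software behavior or show how changes in software behavior affect individual hardware components, and their complex timing models make simulation slow. To address simulation speed and program analysis, we integrated timing simulators into a fast hybrid simulator, Snippit, which lets users run and simulate a complete system without modification and can be used for prototype system development and design. The proposed fast timing simulation with hot-pluggable hardware simulators, together with a just-in-time dynamic model selector, greatly reduces simulation time, allowing execution at 40 to 70 MIPS. We also integrated the Multi2Sim GPGPU simulator to support heterogeneous simulation and implemented a shared-variable tracking mechanism for simple race-condition detection, so that the heterogeneous simulator can help users detect problems with data movement and shared variables in heterogeneous environments. Finally, we implemented a program phase detection algorithm that collects program behavior information and, for each phase, performance data under the different hardware parameters the user specifies. With these features, Snippit serves as a simulation environment that supports hardware/software design and performance analysis and achieves vertical integration: changes in a program's algorithmic behavior are mapped to hardware changes and then to performance metrics. It also incorporates machine learning to automatically analyze and predict how different parts of a program perform on different hardware platforms and to suggest optimization directions to the user.
Today's state-of-the-art processing systems often require heterogeneous computing and special-purpose accelerators to deliver efficient performance for mixed application workloads, which include not only traditional data processing algorithms but also smart applications such as deep learning, the Internet of Things, edge computing, and Industry 4.0. The complexity of such systems has therefore been increasing, and the focus of design has been shifting to exploring the design space of mixed processing cores and accelerators and to the performance impact of application behavior on hardware resources. To obtain the best performance at an acceptable hardware cost, understanding program behavior and the underlying algorithms is critical, but the performance of each algorithm varies across accelerators and hardware parameters. Vertically integrated analysis, from high-level program behavior down to changes in hardware parameters, is the key technique for optimization in the heterogeneous era.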
Traditional simulation tools may offer accurate performance estimation at the micro-architectural level, but combining simulators for various components to run complex applications is highly complicated, and they fall short in profiling application behavior and the performance impact of hardware changes. Furthermore, such complex simulation is slow with cycle-accurate heterogeneous emulation frameworks such as gem5. To address simulation speed and performance analysis, we developed a rapid hybrid emulation/simulation framework, Snippit, that allows the user to execute a full-blown system and plug in emulators, simulators, and timing models for various components of the prototype system. With the proposed scalable, hot-pluggable timing simulation scheme and a just-in-time model selection mechanism that reduces the simulation time of regular patterns, the framework can run at 40-70 MIPS. Integrating the Multi2Sim GPGPU emulator, we further implemented a shared-variable tracking mechanism to trace race conditions as well as the throughput of data copies among processing units. In addition, we implemented a phase detection algorithm to track and collect application behavior and its performance data under user-specified hardware parameters. With these functionalities, Snippit is an emulation tool that helps both system and application developers analyze performance from software to hardware according to how the program behaves. Finally, we incorporated machine learning to help analyze and predict the performance of optimizing and running target applications on other accelerators. As a vertical integration from software to hardware, Snippit can classify the execution of an application into program phases and suggest optimizations for each phase along with its predicted performance. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68011 |
| DOI: | 10.6342/NTU201800094 |
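| Full-text authorization: | Paid authorization |
| Appears in collections: | Department of Computer Science and Information Engineering |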
Files in this item:
| File | Size | Format | Access |
|---|---|---|---|
| ntu-106-1.pdf | 2.24 MB | Adobe PDF | Not authorized for public access |
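Unless their copyright terms are specifically indicated otherwise, all items in this system are protected by copyright, with all rights reserved.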
