A Portable and Efficient User Dispatching Mechanism for Multicore Systems (一套於多核心系統上具可攜性且有效率的使用者分配資源機制)

Tang-Hsun Tu; 涂堂訓

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43866

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	薛智文
dc.contributor.author	Tang-Hsun Tu	en
dc.contributor.author	涂堂訓	zh_TW
dc.date.accessioned	2021-06-15T02:31:01Z	-
dc.date.available	2010-08-19
dc.date.copyright	2009-08-19
dc.date.issued	2009
dc.date.submitted	2009-08-14
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43866	-
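dc.description.abstract	On multicore systems, using multiple threads is a common way to improve performance. Nevertheless, in many simple applications, increasing the number of threads degrades performance, contrary to expectation. Users usually attribute this to the overhead of creating and terminating threads. In our observation, however, the most significant cause is how threads are dispatched. In this thesis we therefore discuss thread-related problems and propose a novel User Dispatching Mechanism (UDispatch) to address them. Because threads run in user space, they cannot directly control system resources in kernel space. We therefore use a virtual device as a bridge between the two spaces, which avoids modifying the kernel to add system calls and allows efficient, portable communication with the operating system. We also provide a UDispatch application programming interface (API) that users can call directly in application source code. Although the API helps, users are sometimes unwilling or unable to modify source code, so the benefits of UDispatch go unrealized. We therefore also propose a command-line interface, the UDispatch Loader (UDLoader), which lets users operate UDispatch without modifying code. We evaluate UDispatch and UDLoader on two multimedia applications: a skip-line binary image compression application and an H.264/AVC decoder. The results show improvements of 171.8% and 111.6% for the skip-line application on 4-core and 8-core machines, respectively, and an improvement of 20.1% for the H.264/AVC decoder on a 4-core machine.	zh_TW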
dc.description.abstract	In multicore environments, using multiple threads is a common and useful approach to improving application performance. Nevertheless, even in many simple applications, performance may degrade as the number of threads increases. Users usually attribute this phenomenon to the overhead of creating and terminating threads. In our observation, however, the more significant factor is how threads are dispatched. We discuss the problems of using threads and present a novel User Dispatching Mechanism (UDispatch) that provides controllability in user space to improve application performance. Since user threads cannot directly control system resources, a virtual device is placed between user space and the operating system for portability and efficiency, instead of adding new system calls through kernel modification. We provide an application programming interface (API) that lets users manipulate UDispatch by modifying application source code. To avoid source code modification, a command-line UDispatch Loader (UDLoader) is also provided to help users bind threads to specific cores directly. We apply UDispatch to two multithreaded multimedia applications. The results show that a skip-line application improves by 171.8% and 111.6% on a 4-core machine and an 8-core machine, respectively, and an optimized H.264/AVC decoder improves by 20.1% on a 4-core machine.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T02:31:01Z (GMT). No. of bitstreams: 1 ntu-98-R96944013-1.pdf: 762654 bytes, checksum: f927085073a84e9f9bb3c3947ff5291c (MD5) Previous issue date: 2009	en
dc.description.tableofcontents	1 Introduction
2 Related Work and Background
  2.1 Threading Anomaly
  2.2 Scheduling Anomaly
  2.3 Main-Thread Anomaly
  2.4 Why We Need a User Dispatching Mechanism?
3 User Dispatching Mechanism
  3.1 UDispatch Architecture
  3.2 Examples of Using UDispatch
    3.2.1 Binding Threads to Cores
    3.2.2 Finding the Core with the Least Load
  3.3 Comparison with Other Mechanisms
    3.3.1 sched_set/getaffinity() System Calls
    3.3.2 proc Filesystem
4 Shell Commands in UDispatch
  4.1 UDispatch Module Manager
  4.2 UDispatch Loader
  4.3 Customized UDispatch Loader
5 Experiments
  5.1 Skip-line Algorithms
    5.1.1 Anomaly Occurrence Frequency in Skip-line Applications
    5.1.2 Skip-line Application with UDL
  5.2 H.264 Decoder
    5.2.1 Anomaly Occurrence Frequency in H.264 Applications
    5.2.2 H.264 Decoder with UDL
  5.3 Comparison between UDL and UDA
6 Conclusion
7 Future Work
Appendix
A UDispatch API
  A.1 open_UDispatch()
  A.2 close_UDispatch()
  A.3 bind_to_cpu()
  A.4 auto_bind()
  A.5 get_bind_info_len()
  A.6 get_bind_info()
  A.7 get_nr_threads()
  A.8 get_pids_on_task
  A.9 get_nr_cpus()
  A.10 get_cpuload()
  A.11 get_taskload()
  A.12 get_nr_tasks_on_cpu()
  A.13 get_pids_on_cpu()
  A.14 UDispatch_ctl()
  A.15 set_UDL_pids()
  A.16 enable_UDLswitch()
  A.17 disable_UDLswitch()
B Low-Level Kernel API
  B.1 set_cpus_allowed_ptr()
  B.2 num_online_cpus()
C Shell Commands in UDispatch
  C.1 UDispatch Module Manager
  C.2 UDispatch Loader
Bibliography
dc.language.iso	en
dc.subject	載入器	zh_TW
dc.subject	執行緒	zh_TW
dc.subject	排程	zh_TW
dc.subject	分配	zh_TW
dc.subject	異常現象	zh_TW
dc.subject	多核心	zh_TW
dc.subject	系統呼叫	zh_TW
dc.subject	虛擬裝置	zh_TW
dc.subject	Scheduling	en
dc.subject	Loader	en
dc.subject	Virtual Device	en
dc.subject	System Call	en
dc.subject	Multicore	en
dc.subject	Anomaly	en
dc.subject	Threading	en
dc.subject	Dispatching	en
dc.title	一套於多核心系統上具可攜性且有效率的使用者分配資源機制	zh_TW
dc.title	A Portable and Efficient User Dispatching Mechanism for Multicore Systems	en
dc.type	Thesis
dc.date.schoolyear	97-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	張榮貴,羅習五
dc.subject.keyword	執行緒,排程,分配,異常現象,多核心,系統呼叫,虛擬裝置,載入器	zh_TW
dc.subject.keyword	Threading,Scheduling,Dispatching,Anomaly,Multicore,System Call,Virtual Device,Loader	en
dc.relation.page	89
dc.rights.note	Paid license (有償授權)
dc.date.accepted	2009-08-17
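dc.contributor.author-college	College of Electrical Engineering and Computer Science (電機資訊學院)	zh_TW
dc.contributor.author-dept	Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所)	zh_TW
Appears in Collections:	Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所)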

Files in This Item:

File	Size	Format
ntu-98-1.pdf (not authorized for public access)	744.78 kB	Adobe PDF


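Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.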