適用於快取記憶體的封裝暨安置物件方法

Chun-Chieh Lin; 林俊杰

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9420

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	陳俊良(Chuen-Liang Chen)
dc.contributor.author	Chun-Chieh Lin	en
dc.contributor.author	林俊杰	zh_TW
dc.date.accessioned	2021-05-20T20:21:46Z	-
dc.date.available	2010-02-10
dc.date.available	2021-05-20T20:21:46Z	-
dc.date.copyright	2009-02-10
dc.date.issued	2009
dc.date.submitted	2009-02-02
dc.identifier.citation	Bibliography [1] Raman E. and August D.I., “Chapter 5. Optimizations for memory hierarchies,” in the Handbook of Compiler Design Optimizations and Machine Code Generation, 2nd Ed. CRC Press, 2008. [2] Gloy N., Blackwell T., Smith M.D., and Calder B., “Procedure placement using temporal ordering information,” in Proceedings of the 30th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'97), pp. 303, IEEE, 1997. [3] Bovet D. and Cesati M., Understanding the Linux Kernel, Third Edition, Chapter 18, O'Reilly, 2005. [4] Micron Technology, Inc, “Design and Use Considerations for NAND Flash Memory,” Micron Technology, Inc, 2006. [5] Hennessy J.L. and Patterson D.A., Computer Architecture: A quantitative Approach, 3rd Ed, Morgan Kaufmann Publishers, 2003. [6] Silberschatz A., Galvin P.B., and Gagne G., Operating System Concepts, 7th Ed, Wiley, 2004. [7] Burger D.C, Goodman J.R., and Sohi G.S., “Memory System”, in Computer Science Handbook, 2nd Ed., CRC Press, 2004. [8] Hill M.D., 'A case for direct-mapped caches,' Computer, Vol.21, No.12, pp.25-40, IEEE, Dec 1988. [9] Belady L.A., “A study of replacement algorithms for a virtual-storage computer,” IBM System Journal, Vol. 5, Number 2, pp. 78, IBM, 1966. [10] Smith A. J., “Cache memories,” ACM Computing Survey. Vol. 14, 3, pp. 473-530, ACM, 1982. [11] Coffman E.G. and Denning P.J., Operating System Theory, Prentice-Hall. 1973. [12] Intel, Intel Architecture Optimization Manual, Intel Corporation, 1997. [13] Goodman J.R., “Using cache memory to reduce processor-memory traffic,” in 25 Years of the International Symposia on Computer Architecture, ACM, 1998. [14] Przybylski S.A., Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers Inc., 1990. [15] Al-Sayed H.S., 'Cache memory application to microcomputers,' Tech. Rep. 78-6, Dep. of Computer Science, Iowa State Umv, Ames, Iowa, 1978. [16] Anacker W. and Wang C.P., “Performance evaluation of computing systems with memory hmrarchles,” IEEE Transactions on Computer. TC-16, 6 (Dec. 1967), pp. 764-773, IEEE, 1967. [17] Gibson D.H., 'Consideration in block oriented systems design,' in Proceedings of 1967 Spring Joint Computer Conference, Vol. 30, pp. 75-80, Thompson Books 1967. [18] Kaplan K.R. and Winder, R.O. “Cache-based computer systems,” IEEE Computer, Vol. 6, 3 (March 1973), pp. 30-36, IEEE, 1973. [19] Mattson R.L., “Evaluation of multilevel memories,” IEEE Transactions on Magnetics, Vol. 7, 4 (Dec. 1971), pp. 814-819, IEEE, 1971. [20] Meade R.M., 'On memory system design,' in Proceedings of Fall Joint Computer Conference, Vol. 37, pp. 33-43, AFIPS Press, 1970. [21] Strecker W.D., “Cache memories for PDP-11 family computers,” in Proceedings of the 3rd annual symposium on Computer architecture, pp.155-158, ACM, 1976. [22] Wolf M.E. and Lam M.S., “A data locality optimizing algorithm,” ACM SIGPLAN Notices, Vol. 26, 6 (June 1991), pp. 30 - 44, ACM, 1991. [23] Mowry T.C., Lam M.S., and Gupta A., “Design and evaluation of a compiler algorithm for prefetching.” in Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-V), pp 62-73, ACM. 1992. [24] Park J.S., Penner M., and Prasanna V.K., 'Optimizing graph algorithms for improved cache performance,' IEEE Transactions on Parallel and Distributed Systems, Vol. 15, 9, pp. 769-782, IEEE, 2004. [25] Caceres R., Douglis F., Li K., and Marsh B., “Operating system implications of solid-state mobile computers,” in Proceedings of the Fourth Workshop on Workstation Operating Systems, pp. 21-27, IEEE, 1993. [26] Verneer D., “eXecute-In-Place,” in Memory Card Magazine, March/April, Lippincott & Peto Inc., 1991. [27] Santarini M., “NAND versus NOR-Which flash is best for bootin’ your next system?” EDN October 2005, pp. 41-48. Reed Business Information, 2005. [28] Park C., Seo J., Bae S., Kim H., Kim S., and Kim B., “A low-cost memory architecture with NAND XIP for mobile embedded systems,” in ISSS+CODES 2003: First IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, pp. 138-143, ACM, 2003. [29] Garey M.R. and Johnson D.S., Computer and Intractability - A Guide to the Theory of NP-Completeness. Bell Telephone Laboratories, 1979. [30] Wang M., Lim S.K., Cong J., and Sarrafzadeh M., “Multi-way partitioning using bi-partition heuristics,” in Proceedings of the 2000 IEEE/ACM Asia South Pacific Design Automation Conference, pp. 667–672, ACM, 2000. [31] Kernighan B.W. and Lin S., “An effective heuristic procedure for partitioning graphs”, Bell System Technology Journal, Vol. 49, pp. 291-307, 1970. [32] Hendrickson B. and Leland R., “A multilevel algorithm for partitioning graphs,” in Proceedings of the 1995 ACM/IEEE Conference on Supercomputing, ACM, 1995. [33] Pardalos P.M. and Hearn D., Aspects of Semidefinite Programming, Kluwer Academic Publishers, 2004. [34] Papadimitriou C.H. and Yannakakis M., “Optimization, approximation and complexity classes,” Journal of Computer and System Science, Vol. 43, pp. 425-440, Elsevier, 1991. [35] Karp R.M., “Reducibility among combinatorial problems,” in Complexity of Computer Computations, pp. 85–104, Plenum Press, 1972. [36] Aho A.V., Lam M.S., Sethi R., and Ullman J.D., “Chapter 8. Code Generation,” in Compilers: Principles, Techniques, and Tools 2nd Ed., pp. 505-582, Addison-Wesley Longman Publishing Co., Inc., 2006. [37] Goemans M.X. and Williamson D.P., “.879-approximation algorithms for MAX CUT and MAX 2SAT,” in Proceedings of the Twenty-Sixth Annual ACM Symposium on theory of Computing, pp. 422-431, ACM, 1994. [38] Goemans M.X. and Williamson D.P., “Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming,” Journal of the ACM, Vol. 42, 6 (Nov. 1995), pp. 1115-1145, ACM, 1995. [39] Frieze A.M. and Jerrum M., “Improved approximation algorithms for MAX k-CUT and MAX BISECTION”, in the Proceedings of 4th International IPCO Conference on Integer Programming and Combinatorial Optimization, Springer, 1995. [40] Motwani R. and Raghavan P, Randomized algorithms. Cambridge University Press, 1995. [41] Klerk E.D., Pasechnik D.V., and Warners J.P., “On approximate graph colouring and MAX-k-CUT algorithms based on the θ-function,” Journal of Combinatorial Optimization, Vol. 8, 3, pp. 267-294, Springer, 2004. [42] Kann V., Khanna S., Lagergren J., and Panconesi A., “On the hardness of approximating MAX k-CUT and its dual,” Chicago Journal of Theoretical Computer Science, 1997. [43] Kann V., Khanna S., Lagergren J., and Panconesi A., “On the hardness of approximating MAX k-CUT and its dual,” in Proceedings of Fourth Israel Symposium on Theory of Computing and Systems (ISTCS), pp. 61-67, IEEE Computer Society, 1996. [44] Coja-Oghlan A., Moore C., and Sanwalani V., “MAX k-CUT and approximating the chromatic number of random graphs,” Random Structures and Algorithms, Vol. 28, 3, pp. 289-322, Wiley, 2005. [45] Ghaddar B., Anjos M., and Liers F., “A branch-and-cut algorithm based on semidefinite programming for the minimum k-partition problem,” Optimization Online, 2007. [46] Laurent M. and Rendl F., “Semidefinite programming and integer programming,” Report PNA-R0210, CWI, Amsterdam, April 2002. [47] Kahruman S., Kolotoglu E., Butenko S., and Hicks I.V., “On greedy construction heuristics for the MAX-CUT problem,” International Journal on Computational Science and Engineering, Vol. 3, 3, pp. 211-218, Inderscience, 2007. [48] Cho J.D., Raje S., and Sarrafzadeh M., 'Fast approximation algorithms on MAXCUT, k-coloring, and k-color ordering for VLSI applications,' IEEE Transactions on Computers, Vol.47, 11, pp.1253-1266, IEEE, 1998. [49] Hwu W.W. and Chang P.P., “Achieving high instruction cache performance with an optimizing compiler,” in Proceedings of the 16th Annual International Symposium on Computer Architecture, pp. 242-251, IEEE Computer Society Press, 1989. [50] Chang P. P. and Hwu W.W., “Trace selection for compiling large C application programs to microcode,” in the Proceedings of the 21st annual workshop on Microprogramming and microarchitecture, pp. 21-29, ACM, 1988. [51] McFarling S., “Program optimization for instruction caches,” in Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 183–191, ACM, 1989. [52] Pettis K. and Hansen R.C., “Profile-guided code positioning,” in Proceedings of the ACM SIGPLAN Conference on Programming Languages Design and Implementation, pp. 16–27, ACM, 1990. [53] Gloy N. and Smith M.D., “Procedure placement using temporal ordering information,” ACM Transactions on Programming Languages and Systems (TOPLAS), Vol 21, 5, pp. 977–1027, ACM, 1999. [54] Calder B., Krintz C., John S., and Austin T., “Cache-conscious data placement,” in Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 139-149, ACM, 1998. [55] Sherwood T., Calder B., and Emer J., “Reducing cache misses using hardware and software page placement,” in Proceedings of the 13th International Conference on Supercomputing, pp. 155-164, ACM, 1999. [56] Guillon C., Rastello F., Bidault T., and Bouchez F, “Procedure placement using temporal-ordering information: dealing with code size expansion,” in Proceedings of the 2004 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pp. 268-279, ACM, 2004. [57] Hashemi A.H., Kaeli D.R., and Calder B., “Efficient procedure mapping using cache line coloring,” ACM SIGPLAN Notices Vol. 32, 5, pp. 171-182, ACM, 1997. [58] Kalamatianos J. and Kaeli D., “Temporal-based procedure reordering for improved instruction cache performance,” in the Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, pp. 244, ACM, 1998. [59] Janapsatya A., Parameswaran S., and Henkel J., “REMcode: relocating embedded code for improving system efficiency,” In IEE Proceedings of Computers and Digital Techniques, Vol. 151, 6, pp. 457-465, IEE, 2004. [60] Tomiyama H. and Yasuura H., 'Optimal code placement of embedded software for instruction caches,' in Proceedings of European Design and Test Conference 1996. pp.96-101, IEEE, 1996. [61] Tomiyama H. and Yasuura H., “Code placement techniques for cache miss rate reduction,” ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol. 2, 4, pp. 410-429, ACM, 1997. [62] Um J. and Kim T., “Code placement with selective cache activity minimization for embedded real-time software design,” in Proceedings of the International Conference on Computer Aided Design 2003 (ICCAD’03), pp. 197-200, ACM, 2003. [63] Chilimbi T.M., Hill M.D., and Larus J.R., “Cache-conscious structure layout,” in Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation (PLDI), pp. 1 - 12, ACM, 1999. [64] Panda P.R., Semeria L., and De Micheli G., “Cache-efficient memory layout of aggregate data structures,” in Proceedings of the 14th International symposium on Systems synthesis, pp. 101-106, ACM, 2001. [65] Rabbah R.M. and Palem K.V., “Data remapping for design space optimization of embedded memory systems,” ACM Transactions on Embedded Computing Systems, Vol. 2, 2, pp. 186-218, ACM, 2003. [66] Palem K.V., Rabbah R.M., Mooney V.J., Korkmaz P., and Puttaswamy K., “Design space optimization of embedded memory systems via data remapping,” in Proceedings of the Joint Conference on Languages, Compilers and Tools For Embedded Systems: Software and Compilers For Embedded Systems, LCTES/SCOPES '02, pp. 28-37, ACM, 2002. [67] Chilimbi T.M., Davidson B., and Larus J.R., “Cache-conscious structure definition,” in Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation (PLDI '99), pp. 13-24, ACM, 1999. [68] Petrank E. and Rawitz D., “The hardness of cache conscious data placement,” in Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 101 – 112, ACM, 2002. [69] Petrank E. and Rawitz D., “The hardness of cache conscious data placement,” Nordic Journal of Computing, Vol. 12, 3, pp. 275 – 307, Publishing Association Nordic Journal of Computing, 2005. [70] Panda P.R., Dutt, N.D., and Nicolau A., “Memory data organization for improved cache performance in embedded processor applications,” ACM Transactions on Design Automation of Electronic Systems, Vol. 2, 4, pp. 384–409, ACM 1997. [71] Panda P.R., Catthoor F., Dutt N.D., Danckaert K., Brockmeyer E., Kulkarni C., Vandercappelle A., and Kjeldsberg P.G., “Data and memory optimization techniques for embedded systems,” ACM Transactions on Design Automation of Electronic Systems Vol. 6, 2, pp. 149-206, ACM, 2001. [72] Parameswaran S. and Henkel J., 'Instruction code mapping for performance increase and energy reduction in embedded computer systems,' IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol.13, 4, pp. 498-502, IEEE, 2005. [73] Choi Y. and Kim T., “Memory layout techniques for variables utilizing efficient DRAM access modes in embedded system design,” in Proceedings of the 40th Conference on Design Automation, pp 881-886, ACM, 2003. [74] Choi Y. and Kim T., 'Memory access driven storage assignment for variables in embedded system design,' in Proceedings of the Asia and South Pacific Design Automation Conference 2004, pp. 478-481, IEEE, 2004. [75] Hettiaratchi S. and Cheung P.Y.K., 'Mesh partitioning approach to energy efficient data layout,' in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition 2003, pp. 1076-1081, IEEE Computer Society, 2003. [76] Kulkarni C., Ghez C., Miranda M., Catthoor F., and De Man H., 'Cache conscious data layout organization for embedded multimedia applications,' in Proceedings of Design, Automation and Test in Europe 2001, pp. 686-691, IEEE, 2001. [77] Samsung Electronics, OneNAND Features & Performance, Samsung Electronics, November 4, 2005. [78] Park C., Lim J., Kwon K., Lee J., and Sang L.M., “Compiler assisted demand paging for embedded systems with flash memory,” in Proceedings of the 4th ACM International Conference on Embedded software (EMSOFT’04), pp. 114-124 , ACM, 2004. [79] Denning P.J., “The working set model for program behavior,” Communications of the ACM, Vol. 11, 5, pp. 323-333, ACM, 1968. [80] Denning P.J., “The locality principle,” Communications of the ACM, Vol. 48, 7, pp. 19-24, ACM, 2005. [81] Denning P.J., “Working sets past and present,” IEEE Transactions on Software Engineering, Vol. 6, 1, pp. 64–84, IEEE, 1980. [82] Rubin S., Bodik R., and Chilimbi T.M., “An efficient profile-analysis framework for data-layout optimizations,” in Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of programming language, pp. 140 - 153, ACM, 2002. [83] Nevill-Manning C. and Witten I., “Identifying hierarchical structure in sequences: A linear-time algorithm,” Journal of Artificial Intelligence Research, Vol. 7, pp. 67-82, 1997. [84] Chilimbi T.M., “Efficient representations and abstractions for quantifying and exploiting data reference locality,” in Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation, pp. 191–202, ACM, 2001. [85] Chilimbi T.M. and Shaham R., “Cache-conscious coallocation of hot data streams,” ACM SIGPLAN Notices, Vol. 41, 6, pp. 252-262, ACM, 2006. [86] Ryder K., “Optimizing program placement in virtual systems,” IBM Systems Journal, Vol. 13, 4, pp. 292, IBM, 1974. [87] Hatfield D.J. and Gerald J., “Program restructuring for virtual memory,” IBM Systems Journal, Vol. 10, 3, pp. 168, IBM, 1971. [88] Xu R. and Li Z., “Using cache mapping to improve memory performance handheld devices,” in Proceedings of 2004 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS'04), pp. 106-114, IEEE, 2004. [89] Stamos J.W., “Static grouping of small objects to enhance performance of a paged virtual memory,” in ACM Transactions on Computer Systems (TOCS), Vol. 2, 2, pp. 155–180, ACM, 1984. [90] Hirzel M., “Data layouts for object-oriented programs,” in Proceedings of SIGMETRICS '07 Conference, pp. 265 – 276, ACM, 2007. [91] Zhao Q., Rabbah R., and Wong W., “Dynamic memory optimization using pool allocation and prefetching”, ACM SIGARCH Computer Architecture News, Vol. 33, 5, pp. 27-32, ACM, 2005. [92] Chilimbi T.M. and Larus J.R., 'Using generational garbage collection to implement cache-conscious data placement,' in Proceedings of the International Symposium on Memory Management (ISMM-98) of ACM SIGPLAN Notices, Vol. 34, 3, pp. 37–48, ACM, 1998. [93] LaMarca A. and Ladner R., “The influence of caches on the performance of heaps,” Journal of Experimental Algorithmic, Vol. 1, ACM, 1996. [94] Seidl M.L. and Zorn B.G., “Segregating heap objects by reference behavior and lifetime,” in Proceedings of the Eighth International Conference on Architectural Support For Programming Languages and Operating Systems, pp. 12-23, ACM, 1998. [95] Truong D.N., Bodin F., and Seznec A., 'Improving cache behavior of dynamically allocated data structures,' in Proceedings of PACT’98, Conference on Parallel Architectures and Compilation Techniques, pp. 322–329, ACM, 1998. [96] Allen R., and Kennedy K., Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 1st Ed., Morgan Kaufmann, 2001. [97] Smith J¬.E. and Goodman J.R., “A study of instruction cache organizations and replacement policies,” in Proceedings of the 10th Annual International Symposium on Computer Architecture, pp. 132-137, IEEE Computer Society, 1983. [98] Steinke S., Grunwald N., Wehmeyer L., Banakar R., Balakrishnan M., and Marwedel P., “Reducing energy consumption by dynamic copying of instructions onto onchip memory,” in Proceedings of the 15th International Symposium on System Synthesis, pp. 213-218, ACM, 2002. [99] Blazewicz J., Kubiak W., Morzy T., and Rusinkiewicz M., Handbook on Data Management in Information Systems, Springer, 2003. [100] METIS, Karypis Lab, Department of Computer Science & Engineering, University of Minnesota, http://glaros.dtc.umn.edu/gkhome/software. [101] Govindarajan R., “Chapter 19. Instruction Scheduling,” in the Handbook of Compiler Design Optimizations and Machine Code Generation, 2nd Ed. CRC Press, 2008. [102] Ball T. and Larus J. R., “Optimally profiling and tracing programs,” ACM Transactions on Programming Languages and Systems (TOPLAS), Vol. 16,4, pp. 1319-1360, ACM, 1994. [103] Sun Microsystems, J2ME Building Blocks for Mobile Devices, Sun Microsystems, May 19, 2000. [104] Lafond S. and Lilius J., “An energy consumption model for java virtual machine,” Turku Centre for Computer Science TUCS Technical Report No 597, TUCS, March 2004. [105] CaffeineMark 3.0, Pendragon Software Corp, http://www.benchmarkhq.ru/cm30. [106] Fuber S., ARM System-on-Chip Architecture, 2nd Ed., Addison-Wesley, 2000. [107] Huang X., Lewis B.T., and McKinley K.S, “Dynamic code management: improving whole program code locality in managed runtimes,” in Proceedings of the 2nd International Conference on Virtual execution environments, pp. 133-143, ACM, 2006.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9420	-
dc.description.abstract	快取記憶體在分層式記憶體架構中扮演加速存取動作的角色。程式與資料物件在記憶體中的排列順序是影響快取錯失率的重要原因之一。先賢們的研究都集中在找出適用於直接映射式快取記憶體的方法，其方法是把這些物件錯置排列至各個快取關連組。錯置排列的方法有助於減少衝突性錯失。然而某些系統的記憶區塊大到可以容納許多程式或資料物件，而且被讀進快取記憶體的最小單位是記憶區塊，而非物件，因此所有物件只得互搶快取記憶體內有限的空間。本論文提出一套方法論，旨於利用調整物件在記憶體中排列的方法來改善快取記憶體的效率。這套方法包含兩項重點：探索物件的組織，以及針對各種快取架構產生專屬的物件排列法。探索物件組織的方法涵蓋了解構資料或程式的成份，以及產生物件存取活動的軌跡。此外本文亦特別提出一個適用於虛擬機器（例如爪哇虛擬機器）的探索技術，其著眼點是基於此類程式具有特殊的軟體架構。本論文的重點是尋求產生適用於各類快取記憶體的物件排列之法。前提是假定物件的尺寸小於記憶區塊。這意味著賦予物件地址編號必須統合兩項動作，一則是用於快取關連組錯置物件法，二則是將物件合併到記憶體區塊。本論文提出的辦法是藉由存取活動的軌跡來建立物件關連模型。在發展的過程中，文本分析了物件關連模型、快取架構、及快取錯失之間的因果關係。然而這些因果關係實際上十分困難而無法算出最佳解。因此本文也提出的頗具實用性的技術，用以產生適用於不同快取架構的物件排列法。本論文亦實作了相當繁複的實驗來驗證所提出來的方法。實驗的結果頗具有說服力，可以支持本文提出的技術的有效性。	zh_TW
dc.description.abstract	The cache provides acceleration in access through the memory hierarchy. The order of arranging code and data objects in the main memory is an important factor that affects cache miss rates. Prior related researches focus on arranging objects interleaved between cache sets for the direct mapped cache. Interleaving the address of each items helps to resolve conflict misses. However, there are computer systems that a memory block can be large to collect a number of code and data objects, and the unit to be loaded to the cache is a memory block, not an object. Therefore, objects contend spaces within cache blocks as well. This dissertation provides a methodology for optimizing cache memory utilization of applications in various fields by arranging their relocatable objects within the main memory. The methodology includes the exploration of object space and generation of object layouts for all kinds of cache organization. The object space exploration involves techniques in inspecting the data and program integrant and acquiring the profile of objects accesses. The exploration also contains a technique particular for the virtual machine, e.g., the Java virtual machine, because of its unique program structure. Generating object layout adapted for cache memory is the keystone in this dissertation. The presumption is that objects are smaller than a memory block. That means assigning addresses to objects must incorporate two movements into one. The first is interleaving objects to cache sets. The second is gathering these objects to fit one cache block. Our study suggests creating the object affinity model by profile information. This study analyzes the relationship between the object affinity model, cache configurations, and cache misses. The packing and placement problem turns to be hard to find an optimal solution. Thereafter, this study proposes practical techniques of generating object layouts for different cache organizations. This dissertation also includes experiments to evaluate the proposed techniques. The experiments provide convincible results and support the effectiveness of the proposed approaches.	en
dc.description.provenance	Made available in DSpace on 2021-05-20T20:21:46Z (GMT). No. of bitstreams: 1 ntu-98-D93922020-1.pdf: 3044383 bytes, checksum: 15481de3b44aca24d1c4eba2aaa3596a (MD5) Previous issue date: 2009	en
dc.description.tableofcontents	CHAPTER 1 INTRODUCTION 1 1.1 Motivation 1 1.2 Usefulness 4 1.3 Scope and Organization 6 CHAPTER 2 BACKGROUND 9 2.1 Memory Hierarchy 9 2.1.1 Cache Organization 10 2.1.2 XIP and NAND Flash 15 2.2 Graph and Combinatorial Algorithms 17 2.3 Related Works 21 2.3.1 Placements 21 2.3.2 XIP and NAND Flash 26 2.3.3 Locality 27 2.3.4 Other Related Topics 27 CHAPTER 3 PROBLEM MODELING 31 3.1 Object Access Trace 31 3.2 One Page Cache Model 38 3.3 Direct Mapped Cache 42 3.4 Fully Associative Cache 48 CHAPTER 4 PRACTICAL APPROACHES 57 4.1 Hardness of Packing and Placement for Direct Mapped Cache 57 4.2 Approaches for Direct Mapped Cache 63 4.2.1 Packing Followed by Placement 64 4.2.2 Placement Followed by Packing 69 4.3 Approaches for Fully Associative Cache 70 4.3.1 One-Page Cache Method 70 4.3.2 Two-Pass Partitioning Method 71 4.4 Approaches for Set Associative Cache 73 CHAPTER 5 EXPLORATIONS OF OBJECTS AND TRACES 75 5.1 Generic Data Objects 76 5.2 Generic Code Objects 77 5.2.1 Motivation 77 5.2.2 Control Flow Analysis and Basic Blocks 80 5.2.3 Benchmark Overview 85 5.3 Partial Arrangement on Performance Bottleneck 92 5.4 Virtual Machine Interpreters 94 5.4.1 KVM Internal 95 5.4.2 Analyzing Control Flow 100 5.4.2.1 Indirect Control Flow Graph 100 5.4.2.2 Tracing the Locality of the Interpreter 101 5.5 Discussion on Effectiveness and Impact of Profiling 105 CHAPTER 6 EVALUATIONS AND EXPERIMENTS 107 6.1 Experimental Setup 107 6.2 Direct Mapped Cache: Experimental Analysis 109 6.3 Fully Associative Cache: Experimental Analysis 129 6.4 Set Associative Cache: Experimental Analysis 139 6.5 Experiments on Partial Arrangement 148 6.5.1 Direct Mapped Cache Experiment 148 6.5.2 Fully Associative Cache Experiment 153 6.6 Virtual Machine Experiment 158 6.6.1 Evaluation Environment 158 6.6.2 Virtual Machine Modification Procedures 159 6.6.3 Experimental Result 163 CHAPTER 7 CONCLUSIONS 167 BIBLIOGRAPHY 171
dc.language.iso	en
dc.title	適用於快取記憶體的封裝暨安置物件方法	zh_TW
dc.title	Cache Conscious Object Packing and Placement	en
dc.type	Thesis
dc.date.schoolyear	97-1
dc.description.degree	博士
dc.contributor.oralexamcommittee	陳健輝(Gen-Huey Chen),賴飛羆(Feipei Lai),洪士灝(Shih-Hao Hung),金仲達(Chung-Ta King),鍾崇斌(Chung-Ping Chung)
dc.subject.keyword	快取記憶體,記憶體最佳化,程式碼排列,資料排列,虛擬機器,	zh_TW
dc.subject.keyword	cache memory,memory optimization,code layout,data layout,virtual machine,	en
dc.relation.page	184
dc.rights.note	同意授權(全球公開)
dc.date.accepted	2009-02-02
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
顯示於系所單位：	資訊工程學系

文件中的檔案：

檔案	大小	格式
ntu-98-1.pdf	2.97 MB	Adobe PDF	檢視/開啟

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。