Methods and apparatus for dynamic register scratching

USPTO Application #: 20070234012
Abstract: Apparatus and methods of reducing dynamic memory stack by a register stack engine are disclosed. An example apparatus and method identifies a local parameter of a caller function. A scratch register corresponding to the local parameter is moved to the top of a register stack, and a local parameter of a callee function is assigned to the scratch register.
Agent: Hanley, Flight & Zimmerman, LLC - Chicago, IL, US
Inventors: Gerolf Hoflehner, Mark Davis
Class: 712217000 (USPTO)
Related Patent Categories: Electrical Computers And Digital Processing Systems: Processing Architectures And Instruction Processing (e.g., Processors), Dynamic Instruction Dependency Checking, Monitoring Or Conflict Resolution, Scoreboarding, Reservation Station, Or Aliasing

FIELD OF THE DISCLOSURE

[0001] This disclosure relates generally to processor memory register management and, more particularly, to systems and methods to decrease memory traffic on dynamic register stacks.

BACKGROUND

[0002] Microprocessors use registers to hold values of variables that are used in connection with the execution of instructions. The speed of instruction execution is, at least in part, dependent on the speed of access to data (e.g., variable values) stored in registers. Microprocessors typically have a number of physical on-chip registers, which can be accessed much more rapidly than memory. Generally, it is desirable to use the physical on-chip registers for executing instructions because such on-chip registers can be accessed more quickly, thereby decreasing instruction execution times.

[0003] In certain processors such as the Intel®
Itanium® processor, the on-chip registers are divided into static registers and stacked registers. A register stack engine defines a register stack as a limited number of stacked registers (e.g., ninety-six in the case of the Itanium® processor) referred to as architectural registers. The register stack engine thus maps architectural stacked registers to physical registers. The physical registers allocated in the stack may be written to and then overwritten by subsequent instructions. The register stack engine may store and load the values of stacked registers to and from memory at function entries and exits.

[0004] At a function entry to the processor, a special instruction (e.g., "alloc") allocates the registers on the register stack for incoming parameters, temporal or local parameters, and outgoing parameters that are needed for function calls. The incoming, local and outgoing parameters are used to store variables needed to execute the function and are referred to as architectural registers used by machine instructions. A result register is used by the alloc instruction to store the previous function state register. When the function exits, the previous function state register is used to restore the original values in the stacked registers for further use. The restoration of data to registers from memory increases bus traffic and slows instruction execution.

[0005] Processors such as the Intel® Itanium® processor have a finite number of stacked registers. The Itanium® processor may allocate 96 stacked registers for immediate access at a function entry. However, this quantity of registers may be insufficient for executing complex applications with many instructions. Thus, the register stack engine must save the contents of stacked registers to memory and restore the contents of such registers from memory. However, access to memory is time consuming and slows instruction execution.
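The save and restore traffic described in paragraph [0005] can be illustrated with a toy model. The sketch below is not the actual register stack engine: it assumes that registers are saved eagerly at function entry and restored at function return, it ignores any overlap between caller and callee frames, and the class and method names are hypothetical. Only the figure of 96 stacked registers comes from the text.

```python
PHYSICAL_STACKED_REGISTERS = 96  # figure quoted in the text for the Itanium processor


class ToyRegisterStack:
    """Toy model that counts registers saved to and restored from memory."""

    def __init__(self, physical=PHYSICAL_STACKED_REGISTERS):
        self.physical = physical
        self.frames = []     # frame sizes of active functions, oldest first
        self.in_memory = 0   # oldest registers currently saved in memory
        self.stores = 0      # registers written to memory so far
        self.loads = 0       # registers read back from memory so far

    def enter(self, frame_size):
        """Model an alloc at function entry (assumption: eager saving)."""
        if frame_size > self.physical:
            raise ValueError("a single frame cannot exceed the physical registers")
        self.frames.append(frame_size)
        resident = sum(self.frames) - self.in_memory
        if resident > self.physical:
            spill = resident - self.physical  # oldest registers go to memory
            self.in_memory += spill
            self.stores += spill

    def leave(self):
        """Model a function return; the caller's frame must be fully resident."""
        self.frames.pop()
        if not self.frames:
            return
        resident = sum(self.frames) - self.in_memory
        missing = self.frames[-1] - resident
        if missing > 0:
            self.in_memory -= missing
            self.loads += missing


stack = ToyRegisterStack()
stack.enter(60)   # first function: 60 registers, all fit
stack.enter(60)   # second function: 120 needed, 24 of the oldest are saved
stack.leave()     # back in the first function: the 24 saved registers are restored
print(stack.stores, stack.loads)  # 24 24
```

Under these assumptions, two nested functions that each need 60 registers already cost 24 stores and 24 loads, which is the kind of memory traffic the disclosure aims to reduce.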
[0006] In operation, processor functions execute the alloc instruction to allocate registers for a function. The register stack engine first allocates stack registers and uses memory to store stacked registers from previous instructions when the stack registers have been exhausted. In practice, many applications are complex and the stack registers are frequently exhausted, resulting in many memory store and restore actions. Thus, instruction execution is slowed by the register stack engine's access to memory.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] FIG. 1 is a block diagram of an example processor system that uses an example scratch register method to allocate registers.

[0008] FIGS. 2A-2C are block diagrams of an example allocation instruction for an initial instruction and a subsequent instruction.

[0009] FIG. 3 is a block diagram illustrating the typical allocation of registers by a processor and using the example scratch register allocation method.

[0010] FIG. 4 is a flow diagram representation of example machine readable instructions that may be executed to allocate registers.

[0011] FIG. 5 is a flow diagram illustrating the application of the example register scratching process in an interference graph register allocation method.

DETAILED DESCRIPTION

[0012] In general, the methods and apparatus described herein include identifying a local parameter of a caller function and moving a scratch register corresponding to the local parameter to the top of a register stack. A local parameter of a callee function is then assigned to the scratch register.

[0013] FIG. 1 is a block diagram showing an example processor system 10 with a register stack engine 12 ("RSE"). In the example of FIG. 1, the register stack engine 12 is implemented within a semiconductor package that includes, among other things, one or more of any variety of processor cores, one of which is shown at reference numeral 14, and one or more hardware blocks 16.
The processor core(s) 14 may be any type of processing unit such as, for example, a microprocessor core from any of the Intel® families of microprocessors (e.g., the Itanium® family). In the illustrated example, the hardware blocks 16 include circuits, circuit blocks, logic, etc. that implement functionality commonly provided by one or more chips located external to the processor system 10. Example hardware blocks 16 are memory controllers, video graphic adapters, input/output (I/O) controller hubs (ICH), network interfaces, etc. Additionally or alternatively, the hardware blocks 16 may include circuits, circuit blocks, logic, etc. implemented by and/or within any of the processor cores. The processor core(s) 14 and the hardware blocks 16 may be implemented on a common substrate or may be implemented on one or more substrates and then combined, using any of a variety of techniques, into a multi-chip module (MCM). Alternatively, some of the structure of FIG. 1, including, by way of example, but not limited to, the hardware blocks 16, may be located off-chip and coupled to the processor core(s) 14 via a bus or other connection device.

[0014] As is conventional, the processor core(s) 14 execute machine readable instructions to implement an operating system (OS) 18 or basic input/output system (BIOS) 20. For instance, in an example processor system 10, the multiple processor cores 14 may collectively execute machine readable instructions to implement the OS 18. As such, the relationship shown in FIG. 1 between the processor core(s) 14, the OS 18 and the BIOS 20 is merely illustrative of one example implementation. The BIOS 20 is typically implemented using firmware and, thus, the term FW will be used herein to refer to the BIOS 20.

[0015] In the example of FIG. 1, the FW 20 handles configuration and/or control of the hardware blocks 16 using any of a variety of techniques.
Alternatively, other machine readable instructions executed by the processor core(s) 14 may configure and/or control the hardware blocks 16. In some implementations, the FW 20 may hide some or all aspects of the hardware blocks 16 from the OS 18. For example, the OS 18 may operate without specific knowledge of the implementation details (e.g., configuration registers, status registers, etc.) of the hardware blocks 16. The OS 18 and the FW 20 may optionally implement an interface 22 that allows the OS 18 to, for example, access registers and/or data normally only accessible to the FW 20. The example interface 22 of FIG. 1 may be implemented by, for example, an extended firmware interface (EFI), a FW runtime service, any variety of virtual machine monitor (VMM) and/or hypervisor executing between the OS 18 and the FW 20, etc.

[0016] The processor system 10 includes on-chip registers 30 that may be used to store variables for instruction execution by the OS 18 and/or the FW 20. The on-chip registers include static registers 32 and stacked registers 34, which are partitioned into two different register types, (caller) scratch registers 36 and (callee) preserved registers 38. The processor system 10 also includes a memory 40. In the illustrated example, the memory 40 is random access memory (RAM) or cache memory. Of course, other types of memory devices may be used for the memory 40. The (caller) scratch registers 36 are volatile over function calls. For example, on the Intel® Itanium® processor, the maximum of eight incoming parameters and outgoing parameters on the register stack are of the scratch register type. In this example, the processor system 10 has eight incoming scratch registers. However, different numbers of caller scratch registers may be used.

[0017] Allocation of the register stack is managed by the register stack engine 12. The registers of the register stack are accessible by the application run by the processor core 14 for the purpose of executing instructions.
Although shown as a separate block in FIG. 1, persons of ordinary skill in the art will appreciate that the register stack is actually representative of a set of registers physically located in the registers 36 and 38.

[0018] FIG. 2A shows an example allocation instruction 50 that may be used by the register stack engine 12 to allocate registers on the register stack. The allocation instruction 50 allocates registers for incoming parameters 52, registers for temporal or local parameters 54 and outgoing parameters 56. The outgoing parameters 56 are needed for function calls from the processor system 10. Such an allocation instruction may be coded as:

[0019] Alloc r=ar.pfs, in, loc, out

where "in" is the number of registers for the incoming parameters 52, "loc" is the number of temporal or local parameters 54 and "out" is the number of registers for outgoing parameters 56. The result register, "r", is used to store the state register of the previous function. The result register is used to restore the original values of the register (ar.pfs) and is a writable general register.

[0020] The stacked parameter registers of the calling function and the previous function overlap. For example, the following functions use overlapping registers for up to the eight caller scratch registers in the Itanium® processor. Of course, any number of caller scratch registers may be used in different types of processors:

    Foo ( ) {
        Alloc 0, 78, 2;
        ...
        Bar (a, b);
    }

    Bar (int p1, int p2) {
        Alloc 2, 38, 0;
        ...
        Return;
    }

[0021] In this example, 80 registers are allocated at the entry of the foo function: 78 local parameters and 2 outgoing parameters. A total of 118 registers are allocated at the entry of the bar function: 2 incoming parameters and 38 additional local parameters. The two incoming parameter registers of the bar function are the same as the 2 outgoing parameter registers associated with the foo function.
After the processor system 10 returns from executing the bar function, 80 registers are allocated, representing the registers allocated by the foo function.
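The register counts in the foo and bar example of paragraphs [0020] and [0021] can be checked with a short sketch. It assumes that the callee's frame begins at the caller's first outgoing register, so that the callee's incoming registers are the caller's outgoing registers; the `Frame` type and the `depth_after_call` helper are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    """Operands of an alloc instruction: incoming, local and outgoing counts."""
    ins: int
    locs: int
    outs: int

    @property
    def size(self):
        return self.ins + self.locs + self.outs


def depth_after_call(caller, callee):
    """Registers allocated once the callee has executed its alloc.

    Assumption: the callee's frame starts at the caller's first outgoing
    register, so the caller's outgoing registers are not counted twice.
    """
    return caller.ins + caller.locs + callee.size


foo = Frame(ins=0, locs=78, outs=2)   # Alloc 0, 78, 2
bar = Frame(ins=2, locs=38, outs=0)   # Alloc 2, 38, 0

print(foo.size)                    # 80 registers at the entry of foo
print(depth_after_call(foo, bar))  # 118 registers at the entry of bar
```

After bar returns, only foo's frame remains, which is again `foo.size`, i.e. 80 registers. Since 118 exceeds the 96 stacked registers mentioned in the background, a call like this one is where the register stack engine would have to involve memory.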