| Analyzing and transforming a computer program for executing on asymmetric multiprocessing systems -> Monitor Keywords |
|
Analyzing and transforming a computer program for executing on asymmetric multiprocessing systemsUSPTO Application #: 20080098208Title: Analyzing and transforming a computer program for executing on asymmetric multiprocessing systems Abstract: A method is disclosed for transforming a portion of a computer program comprising a list of sequential instructions comprising control code and data processing code and a program separation indicator indicating a point where said sequential instructions may be divided to form separate sections that are capable of being separately executed and that each comprise different data processing code. The m method comprises the steps of: (i) analysing said portion of said program to determine if said sequential instructions can be divided at said point indicated by said program separation indicator and in response to determining that it can: (iia) providing data communication between said separate sections indicated by said program separation indicator, such that said separate sections can be decoupled from each other, such that at least one of said sections is capable of being separately executed by an execution mechanism that is separate from an execution mechanism executing another of said separate sections, said at least one of said sections being capable of generating data and communicating said data to at least one other of said separate sections; and in response to determining it can not: (iib) not performing step (iia). If step (iia) is not performed then a warning may be output, or the program may be amended so it can be separated at that point, or the program separation indicator may be removed and the sections that were to be separated merged. (end of abstract) Agent: Nixon & Vanderhye, PC - Arlington, VA, US Inventors: Alastair David Reid, Simon Andrew Ford, Yuan Lin USPTO Applicaton #: 20080098208 - Class: 712234 (USPTO) The Patent Description & Claims data below is from USPTO Patent Application 20080098208. Brief Patent Description - Full Patent Description - Patent Application Claims [0001]This application claims the benefit of U.S. Provisional Application No. 60/853,756, filed Oct. 24, 2006, the entire content of which is hereby incorporated by reference in this application. BACKGROUND OF THE INVENTION [0002]1. Field of the Invention [0003]The field of the invention relates to data processing and in particular to improving the performance of program execution. [0004]2. Description of the Prior Art [0005]It is becoming increasingly difficult to provide programs that are simple to write and yet perform efficiently on complex systems. Such complex systems may comprise a number of different processing or execution units and they may be heterogeneous or asymmetric, with specialised processing units being used to increase energy efficiency and lower gate count. In particular, the programming of embedded systems with their hardware restriction, demand for efficiency and the ever decreasing time to market is becoming a real problem. [0006]It is known in high level computing systems to provide multiple processors that are identical and to split processing between them to increase efficiency. In such systems a loop that executes say 100 times, may be split so that a first processor processes the loop for the first fifty times and the second for the second fifty. [0007]It is also known to address the problem of programming multiprocessing systems by sending individual instructions to be processed to separate execution units where the processors communicate with each other via FIFO queues. This is effective in allowing a sequential program to be processed using many different execution units on a complex system, however, there is a need for low latency high throughput data communication mechanism to enable this to function efficiently. Clearly such mechanisms have an overhead of area and power. [0008]Decoupling programs to produce a number of threads communicating via FIFO pipelines has been used many times before: Smith (James E. Smith, "Decoupled access/execute computer architectures", ACM Transactions Computing Systems, 2(4), 289-308, 1984) applies the technique manually to Cray assembly code; Palacharla and Smith (S. Palacharla and J. E. Smith, "Decoupling integer execution in superscalar processors", in MICRO 28: Proc. of International Symposium on Microarchitecture, 285-290, 1995) describe the use of program slicing to automate the separation. These uses of decoupling were targeted at hiding memory latency by having one thread perform all load-store operations while the other thread performs all arithmetic operations. Decoupling has experienced a revival in some very interesting recent work: Ottoni et al. (G. Ottoni, R. Rangan, A. Stoler and D. August, "Automatic Thread Extraction with Decoupled Software Pipelining", in MICRO '05: Proc. Int. Symposium on Microarchitecture, 2005), use decoupling to parallelize inner loops for multiprocessors, and Dai et al. (J. Dai, B. Huang, L. Li and L. Harrison, "Automatically partitioning packet processing applications for pipelined architectures", in Proc. 2005 Conf. on Programming language design and implementation, 237-248, ACM Press, 2005), use decoupling to parallelize packet processing code for multithreaded, multiprocessor packet processing processors. In both papers, the boundary between threads is automatically chosen to optimize performance through load balancing and reducing communication cost. [0009]The majority of prior work on decoupling uses what we call "fine-grained decoupling": the operations being split between threads are individual machine instructions and the data passed between threads consists of scalar values. In contrast, Ziegler et al. (H. Ziegler, B. So, M. Hall and P. Diniz, "Coarse-Grain Pipelining on Multiple {FPGA} Architectures", in FCCM '02: Proc. of Symposium on Field-Programmable Custom Computing Machines, 2002) and Du et al. (W. Du, R. Ferreira and G. Agrawal, "Compiler Support for Exploiting Coarse-Grained Pipelined Parallelism", in Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003) perform what we call "coarse-grained decoupling": the operations being split between threads are procedure calls and the data passed between threads consists of arrays. An example is Ottoni et al.'s work where a special queue is needed to support access by a speculative, out-of-order processor. [0010]The architectures of system on chip (SoC) platforms found in high end consumer devices are getting more and more complex as designers strive to deliver compute intensive applications on ever shrinking power budgets. Workloads running on these platforms require the exploitation of large amounts of heterogeneous parallelism and increasingly irregular memory hierarchies. The conventional approach to programming such hardware is very low level. The consequence of this is that when the core functionality of an application is mapped to the platform, its logic becomes obscured by the transformations and the glue code used during the mapping process. This yields software which is intimately and inseparably tied to the details of the platform it was originally designed for, limiting the software's portability and ultimately the architectural choices available to future generations of platform architects. [0011]It would be desirable to provide a sequential program which could be automatically or at least semi-automatically transformed such that sections of it could be processed independently by different execution mechanisms, thereby improving performance. Many of the problems experienced in mapping applications onto SoC platforms come not from deciding how to map a program onto the hardware but from the fragmentation of the application that results from implementing these decisions. This is particularly so when the system itself decides how to parallelise the program as this becomes extremely complex and means that future analysis of the program by a programmer is very difficult. In particular, although sequential programs can be fairly readily understood by a human being, once they are parallelised it becomes increasingly difficult for the human mind to understand them. Thus, it would be desirable if some control could be provided as to where to divide such programs in order to generate a system that is more readily understandable by a human and yet can execute efficiently on a parallel system. SUMMARY OF THE INVENTION [0012]A first aspect of the present invention provides a method of transforming a portion of a computer program comprising a list of sequential instructions comprising control code and data processing code and a program separation indicator indicating a point where said sequential instructions may be divided to form separate sections that are capable of being separately executed and that each comprise different data processing code, said method comprising the steps of: (i) analysing said portion of said program to determine if said sequential instructions can be divided at said point indicated by said program separation indicator and in response to determining that it can: (iia) providing data communication between said separate sections indicated by said program separation indicator, such that said separate sections can be decoupled from each other, such that at least one of said sections is capable of being separately executed by an execution mechanism that is separate from an execution mechanism executing another of said separate sections, said at least one of said sections being capable of generating data and communicating said data to at least one other of said separate sections; and in response to determining it can not: (iib) not performing step (iia). [0013]As noted above it would be desirable to be able to split a program having a list of instructions to be processed sequentially into sections that can be processed independently. The list of sequential instructions that is to be split comprises a single thread and does not provide parallel/concurrent operations. There may however, be branches or loops within this portion of the program. Thus, if the program is to be run on a system that can handle parallel execution of instructions, either by virtue of it having several execution mechanisms or a single processing device that can handle parallel execution, these sections could be split up to be executed separately. [0014]The present method provides a tool for analysing a portion of the program to determine if the instructions can be divided at a point indicated by a separation indicator. These separation indicators are provided within at least a section of the program and indicate where it is desirable to divide the program. Thus, the division of the program is determined to some degree by these separation indicators and can thus, be controlled by a programmer. The method of the present invention forms an analysis of a program that actually includes the separation indicators and decides if the program can indeed be separated at these points. If it is decided that it can be it provides data communication between the two sections to allow them to be decoupled from each other. Thus, if appropriate the program can be split into sections suitable for separate execution allowing a program to be efficiently processed by a variety of different, often complex devices. If it decides it cannot be divided at this point then it does not perform the data communication step. [0015]In some embodiments, if it decides it cannot be separated at a point indicated by a separation indicator then a warning indicating an error in the computer program is output. Providing the programmer with a warning may be the most appropriate thing to do if the separation indicators are not in the correct position. [0016]In some embodiments, said step (iib) comprises amending said computer program such that said sequential instructions can be divided at said point and then performing step (iia). [0017]Alternatively, to outputting a warning the method can amend the computer program so that the sequential instructions can be divided at this point and then the data communication can be divided between the different sections. It may be that it is a relatively simple matter to amend the computer program so that it can be divided at the point indicated and if this is the case then the method can perform this step rather than outputting a warning. [0018]In some embodiments, said step of amending said computer program comprises inserting data transfer instructions at said point indicated by said program separation indicator. [0019]The step required to amend the computer program may be one of inserting data transfer instructions at the point indicated by the program separation indicator. [0020]In some embodiments, said step (iib) comprises merging said two sections together and removing said program separation indicator. [0021]One way of dealing with discovering that two sections are not suitable for being executed on separate execution mechanisms is to simply remove the program separation indicator and merge the two sections together. [0022]In some embodiments, said program separation indicator comprises at least one data transfer instruction, said data communication between said separate sections being provided in dependence upon said at least one data transfer instruction. Continue reading... Full patent description for Analyzing and transforming a computer program for executing on asymmetric multiprocessing systems Brief Patent Description - Full Patent Description - Patent Application Claims Click on the above for other options relating to this Analyzing and transforming a computer program for executing on asymmetric multiprocessing systems patent application. ### 1. Sign up (takes 30 seconds). 2. Fill in the keywords to be monitored. 3. Each week you receive an email with patent applications related to your keywords. Start now! - Receive info on patent apps like Analyzing and transforming a computer program for executing on asymmetric multiprocessing systems or other areas of interest. ### Previous Patent Application: Analyzing diagnostic data generated by multiple threads within an instruction stream Next Patent Application: System and method for dynamic page classification for memory dumping Industry Class: Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors) ### FreshPatents.com Support Thank you for viewing the Analyzing and transforming a computer program for executing on asymmetric multiprocessing systems patent info. IP-related news and info Results in 3.09012 seconds Other interesting Feshpatents.com categories: Software: Finance , AI , Databases , Development , Document , Navigation , Error |
||