08/30/07 - USPTO Class 706 | #20070203871

Method and apparatus for reward-based learning of improved systems management policies

USPTO Application #: 20070203871
Title: Method and apparatus for reward-based learning of improved systems management policies
Abstract: In one embodiment, the present invention is a method for reward-based learning of improved systems management policies. One embodiment of the inventive method involves supplying a first policy and a reward mechanism. The first policy maps states of at least one component of a data processing system to selected management actions, while the reward mechanism generates numerical measures of value responsive to particular actions (e.g., management actions) performed in particular states of the component(s). The first policy and the reward mechanism are applied to the component(s), and results achieved through this application (e.g., observations of corresponding states, actions and rewards) are processed in accordance with reward-based learning to derive a second policy having improved performance relative to the first policy in at least one state of the component(s).



Agent: Patterson & Sheridan LLP IBM Corporation - Shrewsbury, NJ, US
Inventors: Gerald James Tesauro, Rajarshi Das, Nicholas K. Jong, Jeffrey O. Kephart
USPTO Application #: 20070203871 - Class: 706053000 (USPTO)

Related Patent Categories: Data Processing: Artificial Intelligence, Knowledge Processing System, Knowledge Representation And Reasoning Technique, Frame-based Reasoning System



The Patent Description & Claims data below is from USPTO Patent Application 20070203871, Method and apparatus for reward-based learning of improved systems management policies.


BACKGROUND

[0001] The present invention relates generally to data processing systems, and relates more particularly to autonomic computing (i.e., automated management of hardware and software components of data processing systems). Specifically, the present invention provides a method and apparatus for reward-based learning of improved systems management policies.

[0002] Due to the increasing complexity of modern computing systems and of interactions of such systems over networks, there is an urgent need to enable such systems to rapidly and effectively perform self-management functions (e.g., self-configuration, self-optimization, self-healing or self-protection) responsive to rapidly changing conditions and/or circumstances. This entails the development of effective policies pertaining to, for example, dynamic allocation of computational resources, performance tuning of system control parameters, dynamic configuration management, automatic repair or remediation of system faults and actions to mitigate or avoid observed or predicted malicious attacks or cascading system failures.

[0003] Devising such policies typically entails the development of explicit models of system behavior (e.g., based on queuing theory or control theory) and interactions with external components or processes (e.g., users submitting jobs to the system). Given such a model, an analysis is performed that predicts the consequences of various potential management actions on future system behavior and interactions and then selects the action resulting in the best predicted behavior. A common problem with such an approach is that devising the necessary models is often a knowledge-intensive, labor-intensive and time-consuming task. These drawbacks are magnified as the systems become more complex. Moreover, the models are imperfect, so the policies derived therefrom are also imperfect to some degree and can be improved.

[0004] Thus, there is a need in the art for a method and apparatus for reward-based learning of improved systems management policies.

SUMMARY OF THE INVENTION

[0005] In one embodiment, the present invention is a method for reward-based learning of improved systems management policies. One embodiment of the inventive method involves supplying a first policy and a reward mechanism. The first policy maps states of at least one component of a data processing system to selected management actions, while the reward mechanism generates numerical measures of value responsive to particular actions (e.g., management actions) performed in particular states of the component(s). The first policy and the reward mechanism are applied to the component(s), and results achieved through this application (e.g., observations of corresponding states, actions and rewards) are processed in accordance with reward-based learning to derive a second policy having improved performance relative to the first policy in at least one state of the component(s).
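The flow summarized above (apply a first policy and a reward mechanism, record observations of states, actions and rewards, then derive a second policy) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the application: the two states, the two actions, the reward table, and the 20% random action rate in the first policy. The sketch also simplifies by using only the immediate reward of each action, ignoring delayed effects.

```python
import random
from collections import defaultdict

STATES = ["low_load", "high_load"]          # hypothetical component states
ACTIONS = ["add_server", "remove_server"]   # hypothetical management actions

# Hypothetical reward mechanism: a numerical value for each (state, action) pair.
REWARDS = {
    ("low_load", "add_server"): -1.0,
    ("low_load", "remove_server"): 1.0,
    ("high_load", "add_server"): 2.0,
    ("high_load", "remove_server"): -2.0,
}

def first_policy(state, rng):
    """Naive initial policy: usually removes a server, with 20% random actions."""
    if rng.random() < 0.2:
        return rng.choice(ACTIONS)
    return "remove_server"

def collect_observations(n_steps, seed=0):
    """Apply the first policy and log (state, action, reward) observations."""
    rng = random.Random(seed)
    log = []
    for _ in range(n_steps):
        state = rng.choice(STATES)
        action = first_policy(state, rng)
        log.append((state, action, REWARDS[(state, action)]))
    return log

def derive_second_policy(log):
    """Pick, per state, the observed action with the highest mean reward."""
    totals, counts = defaultdict(float), defaultdict(int)
    for state, action, reward in log:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    policy = {}
    for state in STATES:
        seen = [a for a in ACTIONS if counts[(state, a)] > 0]
        if seen:
            policy[state] = max(seen, key=lambda a: totals[(state, a)] / counts[(state, a)])
    return policy

second_policy = derive_second_policy(collect_observations(2000))
```

In this toy setting the derived policy agrees with the first policy's usual choice under low load and differs from it under high load, which is one way a second policy can improve on the first "in at least one state".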

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] So that the manner in which the above recited embodiments of the invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be obtained by reference to the embodiments thereof which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.

[0007] FIG. 1 is a diagram of a networked data processing system in which the present invention may be implemented;

[0008] FIG. 2 is a high level block diagram of a single general purpose computing device that has been advantageously adapted to implement the method of the present invention;

[0009] FIG. 3 is a schematic illustration of one embodiment of a data center for executing the method of the present invention;

[0010] FIG. 4 is a flow chart illustrating a method for deriving a policy for making resource allocation decisions in a computing system; and

[0011] FIG. 5 is a schematic illustration of the basic operations and functionality of one embodiment of an application environment module according to the present invention.

[0012] To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

DETAILED DESCRIPTION

[0013] In one embodiment, the present invention is a method for automatically learning a policy for managing a data processing system or at least one component thereof. The method may be implemented, for example, within a data processing system such as a network, a server, or a client computer, as well as in a data processing system component such as a network router, a storage device, an operating system, a database management program or a web application software platform.

[0014] Embodiments of the present invention employ reward-based learning methodologies, including well-known Reinforcement Learning (RL) techniques, in order to generate effective policies (i.e., deterministic or non-deterministic behavioral rules or mappings of computing system states to management actions) for management of a computing system. Within the context of the present invention, the term "reward-based learning" refers to machine learning methods that directly or indirectly learn policies based on one or more temporally related observations of an environment's current state, an action taken in the state, and an instantaneous "reward" (e.g., a scalar measure of value) obtained as a consequence of performing the given action in the given state. Further, within the context of the present invention, "Reinforcement Learning" refers to a general set of trial-and-error reward-based learning methods whereby an agent can learn to make good decisions in an environment through a sequence of interactions. Known Reinforcement Learning methods that may be implemented in accordance with the present invention include value-function learning methods (such as Temporal Difference Learning, Q-Learning or Sarsa), actor-critic methods and direct policy methods (e.g., policy gradient methods).
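Two of the value-function methods named above, Sarsa and Q-Learning, differ only in the target used to update the value estimate for a state-action pair. The following Python sketch shows the standard tabular form of both updates. The table layout, the function names, and the step size `alpha` and discount factor `gamma` defaults are illustrative assumptions, not details taken from the application.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """Sarsa: bootstrap from the action actually taken in the next state."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Q-Learning: bootstrap from the best-valued action in the next state."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def make_table():
    """Value table with two known entries for a hypothetical next state 's1'."""
    Q = defaultdict(float)
    Q[("s1", "x")] = 1.0
    Q[("s1", "y")] = 3.0
    return Q

# Same observation (s0, x, reward 1.0, s1), processed by each rule.
Q_sarsa = make_table()
sarsa_update(Q_sarsa, "s0", "x", 1.0, "s1", "x")

Q_qlearn = make_table()
q_learning_update(Q_qlearn, "s0", "x", 1.0, "s1", ["x", "y"])
```

Sarsa evaluates the policy that generated the data, because its target uses the next action that policy chose. Q-Learning's target uses the maximum over next actions, so it estimates the value of acting greedily regardless of which policy produced the observations.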

[0015] FIG. 1 is a schematic illustration of one embodiment of a network data processing system 100 comprising a network of computers (e.g., clients) in which the present invention may be implemented. The network data processing system 100 includes a network 102, a server 104, a storage unit 106 and a plurality of clients 108, 110 and 112. The network 102 is the medium used to provide communications links between the server 104, storage unit 106 and clients 108, 110, 112 connected together within network data processing system 100. The network 102 may include connections, such as wired or wireless communication links or fiber optic cables.

[0016] In the embodiment illustrated, the server 104 provides data, such as boot files, operating system images, and applications to the clients 108, 110, 112 (i.e., the clients 108, 110, and 112 are clients to server 104). The clients 108, 110, and 112 may be, for example, personal computers or network computers. Although the network data processing system 100 depicted in FIG. 1 comprises a single server 104 and three clients 108, 110, 112, those skilled in the art will recognize that the network data processing system 100 may include additional servers, clients, and other devices not shown in FIG. 1.

[0017] In one embodiment, the network data processing system 100 is the Internet, with the network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. In further embodiments, the network data processing system 100 is implemented as an intranet, a local area network (LAN), or a wide area network (WAN). Furthermore, although FIG. 1 illustrates a network data processing system 100 in which the method of the present invention may be implemented, those skilled in the art will realize that the present invention may be implemented in a variety of other data processing systems, including servers (e.g., server 104) and client computers (e.g., clients 108, 110, 112). Thus, FIG. 1 is intended as an example, and not as an architectural limitation for the present invention.

[0018] For example, FIG. 2 is a high level block diagram of a single general purpose computing device 200 that has been advantageously adapted to implement the method of the present invention. In one embodiment, the general purpose computing device 200 comprises a processor 202, a memory 204, a system management module 205 and various input/output (I/O) devices 206 such as a display, a keyboard, a mouse, a modem, and the like. In one embodiment, at least one I/O device is a storage device (e.g., a disk drive, an optical disk drive, a floppy disk drive). It should be understood that the system management module 205 can be implemented as a physical device or subsystem that is coupled to a processor through a communication channel.

[0019] FIG. 3 is a schematic illustration of one embodiment of a data center 300 for executing the method of the present invention. The data center 300 comprises a plurality of application environment modules 301, 302, and 303, one or more resource arbiters 304 and a plurality of resources 305, 306, 307, 308 and 309. Each application environment module 301-303 is responsible for handling respective demands 313, 314 and 315 (e.g., requests for information processing services) that may arrive from a particular customer or set of clients (e.g., clients 108-112 in FIG. 1). Example client types include: online shopping services, online trading services, and online auction services.
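The components of the data center 300 can be pictured as a small data model. The Python sketch below is only an illustration: the class names, fields and methods are hypothetical, and the sketch records which resource is assigned to which application environment without modeling how an arbiter decides on an assignment, which the passage above does not specify.

```python
from dataclasses import dataclass, field

@dataclass
class ApplicationEnvironment:
    ref: int                                   # reference numeral, e.g. 301
    pending_demands: list = field(default_factory=list)

    def receive(self, demand):
        """Queue an incoming demand (e.g., a request for a processing service)."""
        self.pending_demands.append(demand)

@dataclass
class ResourceArbiter:
    resources: set                             # reference numerals, e.g. 305-309
    assignment: dict = field(default_factory=dict)  # resource -> environment ref

    def assign(self, resource, env):
        """Record that a resource is assigned to an application environment."""
        if resource not in self.resources:
            raise ValueError(f"unknown resource {resource}")
        self.assignment[resource] = env.ref

    def in_use_by(self, env):
        """List the resources currently assigned to the given environment."""
        return sorted(r for r, e in self.assignment.items() if e == env.ref)

env_301 = ApplicationEnvironment(ref=301)
env_302 = ApplicationEnvironment(ref=302)
arbiter = ResourceArbiter(resources={305, 306, 307, 308, 309})

env_301.receive("demand 313")
arbiter.assign(305, env_301)
arbiter.assign(306, env_302)
arbiter.assign(307, env_302)
```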

[0020] In order to process client demands 313, 314 or 315, the application environments 301-303 may utilize the resources 305-309 within the data center 300. As each application environment 301-303 is independent from the others and provides different services, each application environment 301-303 has its own set of resources 305-309 at its disposal, the use of which must be optimized to maintain the appropriate quality of service (QoS) level for the application environment's clients. An arrow from an application environment 301-303 to a resource 305-309 denotes that the resource 305-309 is currently in use by the application environment 301-303 (e.g., in FIG. 3, resource 305 is currently in use by application environment 301). An application environment 301-303 also makes use of data or software objects, such as respective Service Level Agreements (SLAs) 310, 311 and 312 with its clients, in order to determine its service-level utility function U(S,D). An example SLA 310-312 may specify payments to be made by the client based on mean end-to-end response time averaged over, say, a five-minute time interval. Additionally the client workload may be divided into a number of service classes (e.g., Gold, Silver and Bronze), and the SLA 310-312 may specify payments based on details of response time characteristics within each service class.
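An SLA of the kind described above, with payments tied to mean response time over a five-minute interval and differing by service class, can be sketched in Python as follows. The payment schedule is entirely hypothetical: the target response times, payments and penalties are invented for illustration, as is the simple met-or-missed payment rule. The sketch does not attempt to define the utility function U(S,D) itself.

```python
from statistics import mean

# Hypothetical terms per service class:
# (target mean response time in seconds, payment if met, penalty if missed)
SLA_TERMS = {
    "Gold": (0.5, 100.0, -50.0),
    "Silver": (1.0, 50.0, -20.0),
    "Bronze": (2.0, 20.0, -5.0),
}

def window_mean(samples, window_start, window_len=300.0):
    """Mean response time of requests timestamped in [start, start + len)."""
    in_window = [rt for ts, rt in samples
                 if window_start <= ts < window_start + window_len]
    return mean(in_window) if in_window else None

def sla_payment(samples_by_class, window_start):
    """Sum the per-class payments for one five-minute (300 s) window."""
    total = 0.0
    for service_class, samples in samples_by_class.items():
        target, payment, penalty = SLA_TERMS[service_class]
        observed = window_mean(samples, window_start)
        if observed is None:
            continue  # no requests from this class in the window
        total += payment if observed <= target else penalty
    return total

# Each sample is (timestamp in seconds, response time in seconds).
samples_by_class = {
    "Gold": [(0.0, 0.4), (100.0, 0.5), (400.0, 9.0)],  # last one is outside the window
    "Bronze": [(10.0, 3.0)],
}
payment = sla_payment(samples_by_class, window_start=0.0)
```

Here the Gold class meets its hypothetical target within the window and the Bronze class misses its target, so the total is the Gold payment plus the Bronze penalty.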
