Media content analysis system and method / Yahoo! Inc.






Disclosed herein is an intelligent agent to analyze a media object. The agent comprises a trained model comprising a number of state layers for storing a history of actions taken by the agent in each of a number of previous iterations performed by the agent in analyzing a media object. The stored state may be used by the agent in a current iteration to determine whether or not to make, or abstain from making, a prediction from output generated by the model, identify...





USPTO Application #: 20170046598
Inventors: Simon Osindero


The Patent Description & Claims data below is from USPTO Patent Application 20170046598, Media content analysis system and method.


FIELD OF THE DISCLOSURE



The present disclosure relates to analyzing media content, such as and without limitation photographs, audio recordings, video, etc., which media content analysis may be used, for example, to tag or label the media content, to estimate aesthetic quality of the media content, to identify important element(s) and intelligently crop or resize an image to preserve the important element(s), as well as other applications.

BACKGROUND



There is a vast number of media objects available in digital form. A media object is typically stored in one or more media files. While the media files may be accessible to computer users, it is difficult for them to discern the content of the media files and/or to locate the digital media of interest to them. Additionally, the quality and diversity of content of such digital media objects vary, which makes it even more difficult for the computer users to locate digital media objects of interest to them.

SUMMARY



The disclosed systems and methods remedy such shortcomings in the art and provide improved computer systems and methods instituting increased functionality to the computer executing the disclosed systems and methods by automatically identifying media content. As discussed below, the accurate characterization of media content can yield improvements in numerous technological fields, such as for example image search, content promotion and recommendation, image monetization, ad monetization, and/or content selection from a set of captured imagery, to name a few.

The present disclosure seeks to address failings in the art and to analyze a media object in a targeted and efficient way. By way of a non-limiting example, an input image that is 3000 pixels by 3000 pixels is too large to process efficiently at full resolution with current technologies. In accordance with one or more embodiments of the present application, areas of the large input image, or other media object, e.g., a 300-pixel by 300-pixel sub-window, can be identified and analyzed, and one or more semantic predictions can be made about the large image using the sub-window. Rather than downscaling the input image and trying to recognize an object of interest at a much lower resolution than the input image's actual resolution, embodiments of the present disclosure can analyze the original image using a number of sub-windows at the input image's actual resolution. In accordance with one or more embodiments, a low-resolution, subsampled scan, saliency map, or other low-resolution indicator of regions can be used to identify a region, or area, of interest in a media object, and the identified region can be analyzed at the original resolution.
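
A minimal sketch of the low-resolution-indicator idea follows. It assumes a grayscale image held as nested Python lists and uses block-averaged intensity as a stand-in for a saliency map (the disclosure does not say how the indicator is computed); it then cuts a full-resolution sub-window centered on the highest-scoring block. The function names `block_mean_map` and `select_window` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image: rows of pixel intensities


def block_mean_map(image: Image, block: int) -> Image:
    """Low-resolution indicator: mean of each non-overlapping block x block tile.

    Mean intensity is only a placeholder for a real saliency map.
    Trailing rows/columns that do not fill a whole tile are ignored.
    """
    h, w = len(image), len(image[0])
    low_res = []
    for top in range(0, h - block + 1, block):
        row = []
        for left in range(0, w - block + 1, block):
            total = sum(image[y][x]
                        for y in range(top, top + block)
                        for x in range(left, left + block))
            row.append(total / (block * block))
        low_res.append(row)
    return low_res


def select_window(image: Image, block: int, window: int) -> Tuple[int, int, Image]:
    """Return (top, left, crop): a full-resolution window x window crop
    centered on the most salient low-resolution cell, clamped to the image."""
    h, w = len(image), len(image[0])
    if window > h or window > w:
        raise ValueError("window must fit inside the image")
    low_res = block_mean_map(image, block)
    cells = [(value, r, c)
             for r, row in enumerate(low_res)
             for c, value in enumerate(row)]
    _, r, c = max(cells, key=lambda cell: cell[0])
    # Map the low-resolution cell back to full-resolution coordinates.
    center_y, center_x = r * block + block // 2, c * block + block // 2
    top = min(max(center_y - window // 2, 0), h - window)
    left = min(max(center_x - window // 2, 0), w - window)
    crop = [row[left:left + window] for row in image[top:top + window]]
    return top, left, crop
```

For the 3000 by 3000 example, a call such as `select_window(image, block=100, window=300)` would analyze only one 300 by 300 region at the original resolution; the block size of 100 is an illustrative choice, not a value from the disclosure.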

While embodiments of the present application are described with reference to an input image, it should be apparent that any type of media object is contemplated with one or more such embodiments. By way of some non-limiting examples, a media object may be an audio media object, and a collection of snippets can be analyzed to identify a portion of the audio media object; the media object may be a video media object, and a selection of low-resolution frames from the video may be used to identify a portion of the video media object; the media object may be a combination of audio and video; etc.
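
For the audio case, a hedged sketch: split the signal into fixed-length snippets, score each one cheaply, and return the span worth analyzing in detail. Root-mean-square energy is an assumed stand-in for whatever relevance indicator a trained agent would use, and `loudest_snippet` is a hypothetical name.

```python
import math
from typing import List, Tuple


def loudest_snippet(samples: List[float], snippet_len: int) -> Tuple[int, int]:
    """Return (start, end) sample indices of the snippet with the highest RMS energy.

    RMS energy is an assumed, cheap stand-in for a learned relevance indicator.
    Trailing samples that do not fill a whole snippet are ignored.
    """
    if snippet_len <= 0 or len(samples) < snippet_len:
        raise ValueError("need at least one full snippet")
    best_start, best_rms = 0, -1.0
    for start in range(0, len(samples) - snippet_len + 1, snippet_len):
        chunk = samples[start:start + snippet_len]
        rms = math.sqrt(sum(s * s for s in chunk) / snippet_len)
        if rms > best_rms:
            best_start, best_rms = start, rms
    return best_start, best_start + snippet_len
```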


In accordance with one or more embodiments, a media object analysis agent, or media analyzer, comprises a model that is trained to decide which aspects of a media object to analyze in detail. By way of a non-limiting example, a large image may be analyzed without incurring the cost of analyzing all of the image's high-resolution pixels; the media analyzer can identify the portion(s) of the media object, such as small objects within the image, on which to focus its analysis. In so doing, the media analyzer may ignore other portions of the media object that it determines are less useful for the analysis, thereby making the analysis more efficient. In accordance with one or more embodiments, a computational budget may be defined and used in optimizing the agent to perform within the defined budget.
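
The savings can be made concrete with a little arithmetic. This sketch assumes the computational budget is expressed in pixels examined at full resolution and that glimpses do not overlap; the disclosure does not specify the budget's units, and both helper names are hypothetical.

```python
def glimpses_within_budget(budget_pixels: int, window: int) -> int:
    """How many window x window glimpses fit in a pixel budget."""
    return budget_pixels // (window * window)


def fraction_analyzed(glimpses: int, window: int, height: int, width: int) -> float:
    """Fraction of full-resolution pixels examined, assuming non-overlapping glimpses."""
    return min(1.0, glimpses * window * window / (height * width))


full = 3000 * 3000                          # 9,000,000 pixels
per_glimpse = 300 * 300                     # 90,000 pixels, i.e. 1% of the image
budget = 10 * per_glimpse                   # hypothetical budget of 900,000 pixels
n = glimpses_within_budget(budget, 300)     # 10 glimpses
print(n, fraction_analyzed(n, 300, 3000, 3000))  # 10 0.1
```

Under these assumptions, ten 300 by 300 glimpses touch only 10% of the pixels of a 3000 by 3000 image.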

In accordance with one or more embodiments, a method is provided, the method comprising using, by a computing device, a trained model as an agent to analyze a media object using a number of rounds of analysis, the trained model comprising a number of state layers to store an outcome from each round of analysis of the media object by the agent; making, by the computing device and using the agent, a determination in a current round of analysis of a next action to take in the analysis of the media object, the determination being made using a stored state from the number of state layers and results of the current round of analysis by the agent; and providing, by the computing device and using the agent, an output from the number of rounds of analysis of the media object, the output comprising a plurality of labels corresponding to the media object's content.
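
The three method steps (analyze over several rounds, decide the next action from the stored state plus the current round's results, output a plurality of labels) can be sketched as a loop. In this hypothetical sketch a plain list stands in for the model's state layers, `score_fn` stands in for the trained model, and the stopping rule (`threshold`, `min_evidence`) is an assumption made for illustration; in the disclosure, that decision is made by the trained agent itself.

```python
from typing import Callable, Dict, List, Sequence

Scores = Dict[str, float]  # label -> score produced in one round


def run_agent(regions: Sequence[str],
              score_fn: Callable[[str], Scores],
              threshold: float = 1.0,
              min_evidence: float = 0.1,
              max_rounds: int = 10) -> List[str]:
    """Multi-round analysis; returns the predicted labels, sorted."""
    history: List[Scores] = []  # stand-in for the state layers: one outcome per round
    predicted: List[str] = []
    for region in list(regions)[:max_rounds]:
        current = score_fn(region)   # results of the current round
        history.append(current)      # store this round's outcome
        # Aggregate the stored state: total evidence per label so far.
        totals: Scores = {}
        for outcome in history:
            for label, score in outcome.items():
                totals[label] = totals.get(label, 0.0) + score
        predicted = [label for label, total in totals.items() if total >= threshold]
        # Next action: end once something is predicted and this round added
        # little new evidence; otherwise continue with the next region.
        if predicted and max(current.values(), default=0.0) < min_evidence:
            break
    return sorted(predicted)
```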

In accordance with one or more embodiments, a system is provided, which system comprises at least one computing device, each computing device comprising a processor and a storage medium for tangibly storing thereon program logic for execution by the processor, the stored program logic comprising using logic executed by the processor for using a trained model as an agent to analyze a media object using a number of rounds of analysis, the trained model comprising a number of state layers to store an outcome from each round of analysis of the media object by the agent; making logic executed by the processor for making, using the agent, a determination in a current round of analysis of a next action to take in the analysis of the media object, the determination being made using a stored state from the number of state layers and results of the current round of analysis by the agent; and providing logic executed by the processor for providing, using the agent, an output from the number of rounds of analysis of the media object, the output comprising a plurality of labels corresponding to the media object's content.

In accordance with yet another aspect of the disclosure, a computer readable non-transitory storage medium is provided, the medium for tangibly storing thereon computer readable instructions that when executed cause at least one processor to use a trained model as an agent to analyze a media object using a number of rounds of analysis, the trained model comprising a number of state layers to store an outcome from each round of analysis of the media object by the agent; make a determination, using the agent, in a current round of analysis of a next action to take in the analysis of the media object, the determination being made using a stored state from the number of state layers and results of the current round of analysis by the agent; and provide, using the agent, an output from the number of rounds of analysis of the media object, the output comprising a plurality of labels corresponding to the media object's content.

In accordance with one or more embodiments, a system is provided that comprises one or more computing devices configured to provide functionality in accordance with such embodiments. In accordance with one or more embodiments, functionality is embodied in steps of a method performed by at least one computing device. In accordance with one or more embodiments, program code to implement functionality in accordance with one or more such embodiments is embodied in, by and/or on a computer-readable medium.

DRAWINGS

The above-mentioned features and objects of the present disclosure will become more apparent with reference to the following description taken in conjunction with the accompanying drawings wherein like reference numerals denote like elements and in which:

FIG. 1 provides a process flow example in accordance with one or more embodiments of the present disclosure.

FIG. 2 provides an example of layers of an illustrative convolutional neural network which can be trained by supervised learning.

FIG. 3 provides an example of an expanded model for use in accordance with one or more embodiments of the present disclosure.

FIG. 4 provides an agent process flow for use in accordance with one or more embodiments of the present disclosure.

FIG. 5 illustrates some components that can be used in connection with one or more embodiments of the present disclosure.

FIG. 6 is a detailed block diagram illustrating an internal architecture of a computing device in accordance with one or more embodiments of the present disclosure.

DETAILED DESCRIPTION



Subject matter will now be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific example embodiments. Subject matter may, however, be embodied in a variety of different forms and, therefore, covered or claimed subject matter is intended to be construed as not being limited to any example embodiments set forth herein; example embodiments are provided merely to be illustrative. Likewise, a reasonably broad scope for claimed or covered subject matter is intended. Among other things, for example, subject matter may be embodied as methods, devices, components, or systems. Accordingly, embodiments may, for example, take the form of hardware, software, firmware or any combination thereof (other than software per se). The following detailed description is, therefore, not intended to be taken in a limiting sense.

Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. Likewise, the phrase “in one embodiment” as used herein does not necessarily refer to the same embodiment and the phrase “in another embodiment” as used herein does not necessarily refer to a different embodiment. It is intended, for example, that claimed subject matter include combinations of example embodiments in whole or in part.

In general, terminology may be understood at least in part from usage in context. For example, terms, such as “and”, “or”, or “and/or,” as used herein may include a variety of meanings that may depend at least in part upon the context in which such terms are used. Typically, “or” if used to associate a list, such as A, B or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B or C, here used in the exclusive sense. In addition, the term “one or more” as used herein, depending at least in part upon context, may be used to describe any feature, structure, or characteristic in a singular sense or may be used to describe combinations of features, structures or characteristics in a plural sense. Similarly, terms, such as “a,” “an,” or “the,” again, may be understood to convey a singular usage or to convey a plural usage, depending at least in part upon context. In addition, the term “based on” may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context.

The detailed description provided herein is not intended as an extensive or detailed discussion of known concepts, and as such, details that are known generally to those of ordinary skill in the relevant art may have been omitted or may be handled in summary fashion. Certain embodiments of the present disclosure will now be discussed with reference to the aforementioned figures, wherein like reference numerals refer to like components.


The present disclosure includes a media content analysis system, method and architecture. In accordance with one or more embodiments, an action-decision selection agent comprises a model that is trained to decide what action, or actions, to take given a current state. By way of a non-limiting example, the agent may be used to predict one or more labels, or tags, for an input image, and possible actions that the agent may decide to take include, without limitation, making a label prediction, abstaining from making a label prediction, selecting a new, or next, location of the image to analyze, analyzing the new, or next, portion of the image at a higher or lower resolution than one or more previously-analyzed image portions, ending the analysis, etc.
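
The action vocabulary listed above can be written down as an enumeration. The `choose_actions` rule below is a hand-written placeholder for the trained model's decision, with assumed confidence cut-offs; it only illustrates that one step may yield more than one action.

```python
from enum import Enum, auto
from typing import List


class Action(Enum):
    PREDICT_LABEL = auto()
    ABSTAIN = auto()
    SELECT_NEXT_LOCATION = auto()
    CHANGE_RESOLUTION = auto()   # re-examine at a higher or lower resolution
    END_ANALYSIS = auto()


def choose_actions(confidence: float, rounds_left: int,
                   predict_at: float = 0.8, zoom_at: float = 0.5) -> List[Action]:
    """Hand-written placeholder for the trained policy's decision."""
    if rounds_left <= 0:
        return [Action.END_ANALYSIS]
    actions = [Action.PREDICT_LABEL if confidence >= predict_at else Action.ABSTAIN]
    # Moderately confident: look again at the same place at another resolution;
    # otherwise move on to a new location.
    if zoom_at <= confidence < predict_at:
        actions.append(Action.CHANGE_RESOLUTION)
    else:
        actions.append(Action.SELECT_NEXT_LOCATION)
    return actions
```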

In accordance with one or more embodiments, the agent uses an internal state, which may comprise a history of its observations, e.g., observations made prior to the current one, as well as its history of actions and predictions. In its analysis of the input image, the agent can use its internal state to determine what action, or actions, to take, e.g., the agent may determine whether or not to continue analyzing the image and, if so, what action should be taken to continue the analysis, e.g., what action is to be taken in the next iteration of the analysis. In addition to the agent's internal state and in accordance with one or more embodiments of the present disclosure, the agent may use an indicator, or indicators, of potential regions of interest in the input image. By way of a non-limiting example, the agent may use a saliency map or other low-resolution indicator of possible regions of interest to identify the next portion, or area, of the input image to analyze. By way of a further non-limiting example, the model may be trained to identify the next portion, or area, of the input image to analyze.
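
One way to combine the internal state with a saliency map is sketched below, under the assumption that the relevant part of the state is simply the set of cells already visited: the agent picks the most salient cell it has not yet examined. A trained model could learn a richer rule; `next_location` is a hypothetical name.

```python
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, column) in the low-resolution saliency map


def next_location(saliency: List[List[float]], visited: Set[Cell]) -> Optional[Cell]:
    """Most salient cell not yet visited, or None when every cell has been seen."""
    candidates = [(value, (r, c))
                  for r, row in enumerate(saliency)
                  for c, value in enumerate(row)
                  if (r, c) not in visited]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


saliency = [[0.1, 0.9],
            [0.5, 0.3]]
visited: Set[Cell] = set()   # part of the agent's internal state
order = []
while (cell := next_location(saliency, visited)) is not None:
    visited.add(cell)
    order.append(cell)
print(order)  # [(0, 1), (1, 0), (1, 1), (0, 0)]
```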

FIG. 1 provides a process flow example in accordance with one or more embodiments of the present disclosure. At step 102, training data comprising a plurality of media objects, e.g., images, and metadata associated with the media objects is used to train a label prediction model and to map symbolic labels to a semantic vector space. By way of a non-limiting example, a symbolic label can correspond to one or more characters, words, etc., and can comprise a number of features representing, e.g., the meaning of the character(s), word(s), etc. By way of a further non-limiting example, a word, such as sky, dog, car, etc., used to annotate an image may be expressed as a vector of values representing the meaning of the word.
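
Once labels live in a semantic vector space, a predicted vector can be turned back into symbolic labels by similarity search. This sketch assumes cosine similarity and uses tiny made-up embeddings; real label vectors would be learned at step 102, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest_labels(predicted: Vector, label_vectors: Dict[str, Vector], k: int = 1) -> List[str]:
    """Symbolic labels whose vectors are most similar to a predicted vector."""
    ranked = sorted(label_vectors,
                    key=lambda label: cosine(predicted, label_vectors[label]),
                    reverse=True)
    return ranked[:k]


# Toy 3-dimensional embeddings (made up for illustration, not learned).
LABELS = {
    "sky":   [1.0, 0.0, 0.0],
    "cloud": [0.9, 0.1, 0.0],
    "dog":   [0.0, 1.0, 0.0],
    "car":   [0.0, 0.0, 1.0],
}
print(nearest_labels([0.8, 0.2, 0.0], LABELS, k=2))  # ['cloud', 'sky']
```

Because semantically related words sit near each other, a prediction that falls between "sky" and "cloud" still yields sensible labels.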

In accordance with one or more embodiments, the label prediction model can comprise a neural network, e.g., a convolutional neural network, which is trained, at step 102, using supervised learning, e.g., using the training data comprising a plurality of media objects and associated metadata. By way of a non-limiting example, the convolutional neural network can be trained using a gradient descent algorithm, which learns the network's parameters, such as the weights and biases associated with the nodes, together with backpropagation, which can be used to determine the gradients for the gradient descent algorithm. The convolutional neural network can comprise a network of connected nodes and a set of parameters comprising a connection strength, or weight, between each pair of connected nodes and a bias associated with each node. By way of a non-limiting example, each input to a node can have an associated weight, and the output of a node can be determined using each weighted input and the bias associated with the node.
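
The node computation and the gradient-descent/backpropagation pairing can be illustrated on a single node rather than a full convolutional network. This sketch assumes a sigmoid activation, a cross-entropy loss, and the logical OR function as toy training data; a practical CNN would be trained with a deep-learning framework that computes the gradients automatically.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (inputs, target in {0, 1})


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def node_output(weights: Sequence[float], bias: float, inputs: Sequence[float]) -> float:
    """One node: weighted sum of its inputs plus its bias, passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(data: List[Example], lr: float = 1.0, epochs: int = 2000) -> Tuple[List[float], float]:
    """Full-batch gradient descent on cross-entropy loss for a single node."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for inputs, target in data:
            # Backward pass: for sigmoid + cross-entropy, dLoss/dz = output - target;
            # the chain rule then gives dLoss/dw_i = delta * x_i and dLoss/db = delta.
            delta = node_output(weights, bias, inputs) - target
            for i, x in enumerate(inputs):
                grad_w[i] += delta * x
            grad_b += delta
        # Gradient-descent update with the batch-averaged gradients.
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


OR_DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w, b = train(OR_DATA)
print([round(node_output(w, b, x)) for x, _ in OR_DATA])  # [0, 1, 1, 1]
```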





Patent Info
Application #: US 20170046598 A1
Publish Date: 02/16/2017
Document #: 14824561
File Date: 08/12/2015
Drawings: 7

