| Fast microarray expression data analysis method for network exploration -> Monitor Keywords |
|
Fast microarray expression data analysis method for network explorationRelated Patent Categories: Data Processing: Measuring, Calibrating, Or Testing, Measurement System, Statistical Measurement, Probability DeterminationFast microarray expression data analysis method for network exploration description/claimsThe Patent Description & Claims data below is from USPTO Patent Application 20070192061, Fast microarray expression data analysis method for network exploration. Brief Patent Description - Full Patent Description - Patent Application Claims CROSS-REFERENCE TO RELATED APPLICATIONS [0001] This application is a continuation-in-part to co-pending U.S. Published Application No. 2005/0209838 entitled "FAST MICROARRAY EXPRESSION DATA ANALYSIS METHOD FOR NETWORK EXPLORATION," which is a divisional application of commonly assigned U.S. Pat. No. 6,909,970 entitled "FAST MICROARRAY EXPRESSION DATA ANALYSIS METHOD FOR NETWORK EXPLORATION," the disclosures of which are hereby incorporated herein by reference. FIELD OF THE INVENTION [0002] This invention relates to the field of bio-informatics and more particularly toward a method and algorithm for network exploration and reconstruction. The invention also has application to information, computer, software and data processing systems and more specifically to a method for facilitating the multiple correlation and comparisons of gene, protein and feature selection data. BACKGROUND OF THE INVENTION [0003] The micro-array was developed in 1995 to measure gene expression data in a massively parallel fashion. Since that time a significant increase in the amount of data per experiment has occurred (See http://www-binf.bio.uu.nl/.about.dutilh/gene-networks/thesis.html). In the case of gene exploration, this extensive data is important for use in assessing genes and their influence on expressed states in organisms. In particular, it is necessary to assess the function and operation of a particular gene; a gene being defined as a prescribed area on a nucleic acid molecule that is necessary for the production of a final expressed phenotype of an individual. On a more complex and broader scale, the interaction network is also of interest due to its influence in regulating higher cellular, physiological and behavioral properties. Recent attempts are being made to reconstruct the precise interaction network or its fragments based on large-scale array experiments for a condition-specific database, e.g., melanoma (Bittner et al., 2000). The critical first step in these efforts is to find the smallest subset of (predictors), within a desirable degree of precision, related to an arbitrary target. Based on such set of predictors, computed for every target of interest, it is possible to find the smallest set that can explain or predict behavior of any target in terms of expression. In the case of genes, finding the smallest set to predict a prescribed behavior could be a very complicated and arduous task given the massive amount of data that results from analyzing a complete organism's genome. [0004] Most important to scientists is the ability to select a minimal cardinality set that can represent a whole set of expressed information. In the pattern recognition literature, this is known as feature selection or dimensionality reduction, depending on the context. [0005] The issue at hand now is a question of mathematics and computation rather than pure biology. In particular, the specific problem at focus has been addressed from a computational standpoint. A number of algorithms can be applied from other fields and areas of study to help solve this arduous task. The specific problem at focus, from a computation standpoint, is to find the best (with respect to a given quality function) k-tuples, from a set of n features, for many values of k. One method to find the best k-tuple predictor subset is to conduct an exhaustive search through all possible k-tuples. Although this approach always leads to the best solution, it becomes intractable for even moderate values of k (the computational time grows exponentially with k). [0006] Also important to bio-informatics will be the methods developed for pattern recognition. In the context of pattern recognition, machine learning, data mining and their applications to various subject areas, e.g., medical diagnostics, manufacturing test design, image recognition etc., a similar problem of subset selection, known as feature selection is important. A number of approaches have been proposed and designed to address these problems or issues. The approaches include and are not limited to, sequential (backward and forward) search techniques, floating search techniques and genetic algorithms, etc. However, methods based on the sequential search techniques suffer from the nesting effect, i.e., they overlook good feature sets consisting of individually poor quality features. A second method called the floating search methods (Pudil et al., 2000; Somol et al., 2000) attempt to avoid the nesting problem by successively adding the best and removing the worst subsets of features from the candidate set of features. This introduces an exponential complexity in the search when the size of a subset grows. A significant drawback of these methods is that they become slow for large dimensional data sets as is the case with biological expression data. Genetic algorithms also do not have well defined stopping criteria and, in principle, can be exponentially complex. [0007] Most importantly, the above methods and algorithms are intended to be applied in the field of array data processing to enable computationally efficient searches for the smallest subset of features that predict a target's expression levels within a given level of confidence or accuracy. [0008] It would, therefore, be desirable to develop a method and algorithm that determines "good" solution sets with high probability in linear time, with respect to total number of features in a predictor set. For this reason, "sequential forward selection" (SFS) (Bishop, 1997; Pudil et al., 2000; Somol & Pudil, 2000) was developed to add the best (the one that leads to the largest improvement in the value of the quality function) new feature, at each successive stage of the algorithm, to the current set of features until the needed number of features is reached. It follows from construction that SFS suffers from the nesting problem and always overlooks better solutions sets whose features are of mediocre or poor quality. This is one of the shortcomings addressed by the present invention. While "sequential floating forward selection" (SFFS) also addresses the nesting problem, it maintains exponential time complexity for large data sets. The second shortcoming that this invention addresses is the exponential time complexity to find "good" solutions. The proposed method and invention finds a "good" solution set with high probability in linear time with respect to number of predictors. One of the floating search algorithms, called "oscillating search", (Somol & Pudil, 2000) can also find approximate solutions in linear time. However, the present invention and method guarantees an equal or better quality solution while maintaining the linear time complexity. In addition, the same generic method or algorithm can be used not only for gene network reconstruction, but also can be applied to protein data, feature selection for classification and other biological data that is very large and complex to organize and analyze. BRIEF SUMMARY OF THE INVENTION [0009] The invention is a method for determining a predictor set of features associated with a target. The method comprises the steps of selecting a predictor set of features, adding a complement to the predictor set based on a quality of prediction, checking to see if all of the features of the predictor set are repeated and then removing one feature from the predictor set. The algorithm and method repeats the steps of adding, checking and removing features until the features of the predictor set are repeated. If the features of the predictor set are repeated, the algorithm and method terminate. [0010] More specifically, the invention is a method for probabilistically determining a subset of features of size k that are closely related to a given target in terms of a selected quality function. The method of the invention operates by allowing a user to select a target of choice and the size (k) of the predictor set. Once a target has been selected, the method starts by selecting an arbitrary (ordered) subset of features of size k-1 and iteratively adds and removes single features (in order) from the selected subset. This process is iterated until a subset of features of size k is found whose quality of prediction of the target can no longer be improved by the process of deletion followed by addition of a feature. The algorithm terminates at this stage. In each iteration, the comparisons are based on a quality function that determines a quality of prediction associated between the predictors and the target. The method of invention can easily be extended to probabilistically determine the smallest (in size) subset of features that are closely related to a given target within an a priori set tolerance level in terms of a selected quality function. The method then takes as input a given target (user selected) and iteratively applies the method of invention for subsets of size 1, 2, 3, . . . , k, etc., until a predictor set that is closely related to the target expression, within the a priori set threshold, is found. The method can also be used for classification of experiments. The method in this case defines a target in terms of a vector of numbers representing the class of experiments under consideration. The method can then be used to identify a subset of features which can classify the data. [0011] The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention. BRIEF DESCRIPTION OF THE DRAWINGS [0012] For a more complete understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawing, in which: [0013] FIG. 1 illustrates a schematic view of the present invention in vector format showing the target and the predictors. [0014] FIG. 2 shows a block diagram of the method of the invention. [0015] FIG. 3 shows how the GSSA algorithm makes comparisons of data. [0016] FIG. 4 shows a simulated plot of the number of attractors v. the number of experiments. [0017] FIG. 5 shows a simulated plot of the probability of finding an optimal attractor v. number of genes. [0018] FIG. 6 shows a simulated plot of the execution time v. number of genes. [0019] FIG. 7 shows a simulated plot of the execution time v. number of predictors. Continue reading about Fast microarray expression data analysis method for network exploration... Full patent description for Fast microarray expression data analysis method for network exploration Brief Patent Description - Full Patent Description - Patent Application Claims Click on the above for other options relating to this Fast microarray expression data analysis method for network exploration patent application. ### 1. Sign up (takes 30 seconds). 2. Fill in the keywords to be monitored. 3. Each week you receive an email with patent applications related to your keywords. Start now! - Receive info on patent apps like Fast microarray expression data analysis method for network exploration or other areas of interest. ### Previous Patent Application: Method and system for monitoring component consumption Next Patent Application: Web-based system of product performance assessment and quality control using adaptive pdf fitting Industry Class: Data processing: measuring, calibrating, or testing ### FreshPatents.com Support Thank you for viewing the Fast microarray expression data analysis method for network exploration patent info. IP-related news and info Results in 0.3144 seconds Other interesting Feshpatents.com categories: Daimler Chrysler , DirecTV , Exxonmobil Chemical Company , Goodyear , Intel , Kyocera Wireless , 174 |
* Protect your Inventions * US Patent Office filing
PATENT INFO |
|