Computer-implemented methods and systems for modeling and recognition of speech
The Patent Description & Claims data below is from USPTO Patent Application 20090271182, Computer-implemented methods and systems for modeling and recognition of speech.

This application is a continuation of and claims priority under 35 U.S.C. § 120 to U.S. patent application Ser. No. 11/090,728, filed Mar. 25, 2005, and entitled “Computer-Implemented Methods and Systems for Modeling and Recognition of Speech,” which is a continuation of U.S. patent application Ser. No. 11/000,874, filed Dec. 1, 2004, which claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application Nos. 60/525,947, filed Dec. 1, 2003, and 60/578,985, filed Jun. 10, 2004, which are hereby incorporated by reference herein in their entireties.

The government may have certain rights in the present invention pursuant to grants from the Effective, Affordable, Reusable Speech-to-Text (EARS-NA) program at the Defense Advanced Research Projects Agency (DARPA), Contract No. MDA972-02-1-0024.

The present invention generally relates to sound recognition. More particularly, the present invention relates to modeling audio signals for speech recognition, sound encoding and decoding, and artificial sound synthesis.

In recent years, automatic speech recognition (ASR) systems have been employed in a wide variety of areas, such as, for example, telephone dialing, directory assistance, order entry, home banking, database inquiry, and dictation. For example, cellular telephones commonly employ ASR systems to simplify the user interface. Using ASR systems, many cellular telephones recognize and execute commands to initiate an outgoing phone call or answer an incoming phone call.
For example, a cellular telephone having an ASR system may recognize a spoken name from a phone book or a contact list and automatically initiate a phone call to the phone number associated with the spoken name.

In an ASR system, a user speaks into a microphone (i.e., inputs a speech signal). The inputted analog signal is digitized, and the blocks of digital data are then transformed from the time domain into the frequency domain using a digital signal processing (DSP) chip. Once the ASR system has digitized the signal and calculated certain parameters, the system compares the signal to a library of known phrases and finds the closest match.

To extract the features from the signal for comparison with data in the library, such ASR systems generally use short-term spectral features, such as mel-frequency cepstral (i.e., frequency-related) coefficients (MFCC). MFCCs are based on a Fast Fourier Transform (FFT), which converts the inputted signal from a time domain representation to a frequency domain representation. The MFCC representation is an example of an approach that further analyzes the FFT of the signal. The MFCC representation is generated by using a mathematical transformation called the cepstrum, which computes the inverse Fourier transform of the log-spectrum of the speech signal.

These ASR systems uniformly employ short-time spectral analysis, usually over windows of about 10 to 30 milliseconds, as the basis for acoustic representations. It should be noted, however, that the detailed time structure below this timescale is lost and the time structure above this level is weakly represented in the form of deltas. The temporal structure in sub-10 millisecond transient segments contains important cues for both the perception of natural sounds and the understanding of stop bursts in speech.
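
The cepstrum mentioned above (the inverse Fourier transform of the log-spectrum) can be illustrated with a minimal sketch for a single short frame. This is not the patent's implementation: the function names `dft` and `real_cepstrum` and the `floor` parameter are hypothetical, a naive O(N^2) transform stands in for the FFT, and the mel-frequency warping used by MFCCs is omitted.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log-magnitude spectrum of one frame.

    `floor` (an assumption of this sketch) avoids log(0) for empty bins.
    """
    n = len(frame)
    log_mag = [math.log(max(abs(v), floor)) for v in dft(frame)]
    # The log-magnitude spectrum of a real frame is real and even, so its
    # inverse DFT is real; keeping the real part drops rounding noise.
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]
```

Because the whole frame is reduced to one spectrum before the logarithm is taken, the ordering of events inside the frame no longer appears in the resulting coefficients, which is the loss of fine time structure described above.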
The gross temporal distribution of acoustic energy in windows of up to 1 second is a successful domain for the recognition of complete phonemes and the description of their dynamics. Thus, while the spectral structures resulting from the spectral analysis convey important linguistic information, they are only a partial representation of speech signals.

Other feature extraction techniques, such as, for example, dynamic (delta) features and the relative spectra (RASTA) processing technique, have been adopted as post-processing techniques that operate on sequences of the short-term feature vectors. Such techniques provide a “locally-global” view in which features to be used in classification are based upon a speech segment of about one syllable's length.

Accordingly, it is desirable to provide systems and methods that overcome these and other deficiencies of the prior art.

In accordance with the present invention, computer-implemented methods and systems are provided for representing and modeling the temporal structure of audio signals.

In accordance with some embodiments of the present invention, computer-implemented methods and systems of extracting speech features from signals for use in performing automatic speech recognition are provided. In response to receiving a signal, a time-to-frequency domain transformation is performed on at least a portion of the received signal to generate a frequency domain representation. The time-to-frequency domain transformation converts the signal from a time domain representation to the frequency domain representation. A frequency domain linear prediction (FDLP) is performed on the frequency domain representation to estimate a temporal envelope of the frequency domain representation. Based on the temporal envelope, one or more speech features are generated.
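
The sequence of steps just described (transform to the frequency domain, apply linear prediction there, read off a temporal envelope) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed method: the names `dct2`, `lpc`, and `fdlp_envelope` are hypothetical, an unnormalized DCT-II is assumed as the transform, the linear prediction uses the autocorrelation method with the Levinson-Durbin recursion, the model order is fixed by the caller, and the small `loading` term is added only for numerical safety.

```python
import cmath
import math


def dct2(x):
    """Unnormalized DCT-II: time samples -> frequency-domain coefficients."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def lpc(seq, order, loading=1e-9):
    """Autocorrelation-method linear prediction (Levinson-Durbin).

    Returns (a, err) with a[0] == 1, so that A(z) = sum_k a[k] * z**-k.
    """
    r = [sum(seq[i] * seq[i + m] for i in range(len(seq) - m))
         for m in range(order + 1)]
    r[0] *= 1.0 + loading  # tiny diagonal loading (assumption of this sketch)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(signal, order):
    """Estimate a temporal envelope by linear prediction on the DCT of `signal`."""
    n = len(signal)
    a, err = lpc(dct2(signal), order)
    env = []
    for t in range(n):
        theta = math.pi * (t + 0.5) / n  # time index mapped onto (0, pi)
        denom = sum(a[k] * cmath.exp(-1j * theta * k) for k in range(order + 1))
        env.append(err / abs(denom) ** 2)
    return env
```

Because the prediction runs over frequency-domain coefficients rather than time samples, the all-pole model that would ordinarily describe spectral peaks describes peaks in time instead: each pole pair of the model corresponds to a region of concentrated energy within the analysis window.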
In some embodiments, the time-to-frequency domain transformation is performed by applying a discrete cosine transform (DCT) or a discrete Fourier transform on the portion of the received signal. In some embodiments, the frequency domain linear prediction may include selecting a temporal window to apply the linear prediction and automatically determining a pole rate to distribute poles for modeling the temporal envelope. The poles generally characterize the temporal peaks of the temporal envelope. The pole rate may be automatically determined to capture both gross variation and stop burst transients of the signal. In some embodiments, an index of sharpness may be extracted from each of the poles. The index of sharpness of the FDLP poles {ρi} is defined as

Industry Class: Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression