10/29/09 - USPTO Class 704 | #20090271182

Computer-implemented methods and systems for modeling and recognition of speech

USPTO Application #: 20090271182
Title: Computer-implemented methods and systems for modeling and recognition of speech
Abstract: In accordance with the present invention, computer-implemented methods and systems are provided for representing and modeling the temporal structure of audio signals. In response to receiving a signal, a time-to-frequency domain transformation is performed on at least a portion of the received signal to generate a frequency domain representation. The time-to-frequency domain transformation converts the signal from a time domain representation to the frequency domain representation. A frequency domain linear prediction (FDLP) is performed on the frequency domain representation to estimate a temporal envelope of the frequency domain representation. Based on the temporal envelope, one or more speech features are generated.



Agent: WilmerHale/Columbia University - New York, NY, US
Inventors: Marios Athineos, Daniel P.W. Ellis
USPTO Application #: 20090271182 - Class: 704205 (USPTO)


The Patent Description & Claims data below is from USPTO Patent Application 20090271182, Computer-implemented methods and systems for modeling and recognition of speech.

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation of and claims priority under 35 U.S.C. §120 to U.S. patent application Ser. No. 11/090,728, filed Mar. 25, 2005, and entitled “Computer-Implemented Methods and Systems for Modeling and Recognition of Speech,” which is a continuation of U.S. patent application Ser. No. 11/000,874, filed Dec. 1, 2004, which claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application Nos. 60/525,947, filed Dec. 1, 2003, and 60/578,985, filed Jun. 10, 2004, which are hereby incorporated by reference herein in their entireties.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

The government may have certain rights in the present invention pursuant to grants from the Effective, Affordable, Reusable Speech-to-Text (EARS-NA) program at the Defense Advanced Research Projects Agency (DARPA), Contract No. MDA972-02-1-0024.

FIELD OF THE INVENTION

The present invention generally relates to sound recognition. More particularly, the present invention relates to modeling audio signals for speech recognition, sound encoding and decoding, and artificial sound synthesis.

BACKGROUND OF THE INVENTION

In recent years, automatic speech recognition (ASR) systems have been employed in a wide variety of areas, such as, for example, telephone dialing, directory assistance, order entry, home banking, database inquiry, and dictation. For example, cellular telephones commonly employ ASR systems to simplify the user interface. Using ASR systems, many cellular telephones recognize and execute commands to initiate an outgoing phone call or answer an incoming phone call. For example, a cellular telephone having an ASR system may recognize a spoken name from a phone book or a contact list and automatically initiate a phone call to the phone number associated with the spoken name.

In an ASR system, a user speaks into a microphone (i.e., inputs a speech signal). The inputted analog signal is digitized and the blocks of digital data are then transformed from the time domain into the frequency domain using a digital signal processing (DSP) chip. Once the ASR system has digitized the signal and calculated certain parameters, the system compares the signal to a library of known phrases and finds the closest match.
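The pipeline described above (transform a block of samples to the frequency domain, then find the closest entry in a library) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function names, the use of a plain magnitude spectrum as the feature, the Euclidean distance, and the two synthetic "library" tones. It is not the matching procedure of any particular ASR system.

```python
import cmath
import math

def dft_magnitude(block):
    """Magnitude spectrum of one block via a direct O(N^2) DFT (bins 0..N/2)."""
    n = len(block)
    return [
        abs(sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def closest_match(features, library):
    """Return the label of the library entry nearest in Euclidean distance."""
    return min(library, key=lambda label: math.dist(features, library[label]))

def tone(freq_bin, n=32):
    """Hypothetical test signal: a sinusoid that falls exactly on one DFT bin."""
    return [math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]

# A toy "library" of two known patterns, stored as frequency-domain features.
library = {"low": dft_magnitude(tone(2)), "high": dft_magnitude(tone(9))}

# A quieter version of the high tone should still match the "high" entry.
query = [0.8 * s for s in tone(9)]
best = closest_match(dft_magnitude(query), library)
```

Real systems use far richer features and statistical models; the sketch only shows the shape of the "transform, then compare" step.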

To extract the features from the signal for comparison with data in the library, such ASR systems generally use short-term spectral features, such as mel-frequency cepstral (i.e., frequency-related) coefficients (MFCC). MFCCs are based on a Fast Fourier Transform (FFT), which converts the inputted signal from a time domain representation to a frequency domain representation. The MFCC representation is an example of an approach that further analyzes the FFT of the signal. The MFCC representation is generated by using a mathematical transformation called the cepstrum, which computes the inverse Fourier transform of the log-spectrum of the speech signal.
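As a rough illustration of the cepstrum as described here (the inverse Fourier transform of the log-spectrum), the following sketch computes a real cepstrum for one short frame. It uses a direct DFT rather than an FFT for brevity, and it omits the mel-frequency warping that a full MFCC front end would apply. The function name and the small constant added before the logarithm are assumptions for illustration.

```python
import cmath
import math

def real_cepstrum(frame):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    n = len(frame)
    # Forward DFT (direct O(N^2) form; an FFT would be used in practice).
    spectrum = [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Log magnitude; the tiny offset (an assumption) guards against log(0).
    log_mag = [math.log(abs(s) + 1e-12) for s in spectrum]
    # Inverse DFT of the log spectrum; the result is real for real input.
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
        for q in range(n)
    ]

# A scaled impulse has a flat spectrum of height 2, so only the zeroth
# cepstral coefficient is non-zero, and it equals log(2).
cep = real_cepstrum([2.0] + [0.0] * 7)
```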

These ASR systems uniformly employ short-time spectral analysis, usually over windows of about 10 to 30 milliseconds, as the basis for acoustic representations. It should be noted, however, that the detailed time structure below this timescale is lost and the time structure above this level is weakly represented in the form of deltas. The temporal structure in sub-10 millisecond transient segments contains important cues for both the perception of natural sounds and the understanding of stop bursts in speech. The gross temporal distribution of acoustic energy in windows of up to 1 second is a successful domain for the recognition of complete phonemes and the description of their dynamics. Thus, while the spectral structures resulting from the spectral analysis convey important linguistic information, they are only a partial representation of speech signals.
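To make the short-time framing and the delta representation concrete, the sketch below cuts a signal into 25 millisecond windows and computes regression-style deltas over a feature track. The 25 ms window lies within the 10 to 30 ms range mentioned above; the 10 ms hop, the two-frame delta span, the edge handling, and the function names are assumptions chosen for illustration.

```python
def frame_starts(num_samples, sample_rate, window_ms=25, hop_ms=10):
    """Start indices of all full analysis windows, plus the window length."""
    window = sample_rate * window_ms // 1000
    hop = sample_rate * hop_ms // 1000
    return list(range(0, num_samples - window + 1, hop)), window

def deltas(track, span=2):
    """Regression deltas of a per-frame feature track (edges are clamped)."""
    denom = 2 * sum(n * n for n in range(1, span + 1))
    last = len(track) - 1
    out = []
    for t in range(len(track)):
        num = sum(
            n * (track[min(t + n, last)] - track[max(t - n, 0)])
            for n in range(1, span + 1)
        )
        out.append(num / denom)
    return out

# One second of audio at 16 kHz yields 98 full 25 ms windows at a 10 ms hop.
starts, window = frame_starts(16000, 16000)

# For a linear ramp, the interior deltas recover the slope of 1 per frame.
ramp_deltas = deltas([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
```

Each delta summarizes only a few neighboring frames, which is why time structure above the window length is described as weakly represented.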

Other feature extraction techniques, such as, for example, dynamic (delta) features and relative spectra processing technique (RASTA), have been adopted as post-processing techniques that operate on sequences of the short-term feature vectors. Such techniques provide a “locally-global” view in which features to be used in classification are based upon a speech segment of about one syllable's length.

Accordingly, it is desirable to provide systems and methods that overcome these and other deficiencies of the prior art.

SUMMARY OF THE INVENTION

In accordance with the present invention, computer-implemented methods and systems are provided for representing and modeling the temporal structure of audio signals.

In accordance with some embodiments of the present invention, computer-implemented methods and systems of extracting speech features from signals for use in performing automatic speech recognition are provided. In response to receiving a signal, a time-to-frequency domain transformation is performed on at least a portion of the received signal to generate a frequency domain representation. The time-to-frequency domain transformation converts the signal from a time domain representation to the frequency domain representation. A frequency domain linear prediction (FDLP) is performed on the frequency domain representation to estimate a temporal envelope of the frequency domain representation. Based on the temporal envelope, one or more speech features are generated.
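The following is a minimal sketch of the general idea, not the claimed implementation: transform a segment with a DCT, fit a linear predictor to the DCT coefficients, and read the resulting all-pole model as a smooth envelope over time. The choices made here are assumptions for illustration: an unnormalized DCT-II, the autocorrelation method with a Levinson-Durbin recursion, a fixed model order, the whole segment treated as a single window, and all function names.

```python
import cmath
import math

def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]

def autocorrelation(seq, max_lag):
    """Autocorrelation of seq for lags 0..max_lag."""
    return [
        sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))
        for lag in range(max_lag + 1)
    ]

def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... and the residual prediction error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def fdlp_envelope(signal, order):
    """All-pole estimate of the temporal envelope, one value per sample."""
    n = len(signal)
    a, err = levinson_durbin(autocorrelation(dct2(signal), order), order)
    env = []
    for t in range(n):
        # With a DCT-II, sample t corresponds to angle pi * (t + 0.5) / N.
        w = math.pi * (t + 0.5) / n
        denom = abs(sum(a[j] * cmath.exp(-1j * w * j) for j in range(order + 1))) ** 2
        env.append(err / denom)
    return env

# A single click at sample 20 should produce an envelope that peaks there.
signal = [0.0] * 64
signal[20] = 1.0
envelope = fdlp_envelope(signal, order=2)
peak_index = max(range(len(envelope)), key=envelope.__getitem__)
```

The roles of time and frequency are swapped relative to conventional linear prediction: the predictor runs along the frequency axis, so the peaks of the model land on instants in time rather than on spectral resonances.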

In some embodiments, the time-to-frequency domain transformation is performed by applying a discrete cosine transform (DCT) or a discrete Fourier transform on the portion of the received signal.

In some embodiments, the frequency domain linear prediction may include selecting a temporal window to apply the linear prediction and automatically determining a pole rate to distribute poles for modeling the temporal envelope. The poles generally characterize the temporal peaks of the temporal envelope. The pole rate may be automatically determined to capture both gross variation and stop burst transients of the signal.
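To illustrate how a pole can characterize a temporal peak, the sketch below takes a second-order predictor, finds its pole, and maps the pole angle back to an approximate sample position. The mapping assumes the FDLP was computed on an unnormalized DCT-II, so that sample t corresponds to angle pi * (t + 0.5) / N. The function names and numeric values are hypothetical. The sketch does not model automatic pole-rate selection, and the pole radius it reports is a generic property of all-pole models, not the index of sharpness defined in this application.

```python
import cmath
import math

def second_order_pole(a1, a2):
    """Upper-half-plane root of z^2 + a1*z + a2, i.e. a pole of 1/A(z)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    roots = [(-a1 + disc) / 2.0, (-a1 - disc) / 2.0]
    return max(roots, key=lambda z: z.imag)

def pole_to_time(pole, num_samples):
    """Approximate sample index of the envelope peak modeled by this pole."""
    return cmath.phase(pole) * num_samples / math.pi - 0.5

# Build a predictor whose pole is placed, by construction, at sample 20
# of a 64-sample window with radius 0.98 (hypothetical values).
radius, centre, n = 0.98, 20, 64
theta = math.pi * (centre + 0.5) / n
a1, a2 = -2.0 * radius * math.cos(theta), radius * radius

pole = second_order_pole(a1, a2)
peak_time = pole_to_time(pole, n)   # close to 20
pole_radius = abs(pole)             # nearer to 1 means a narrower peak
```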

In some embodiments, an index of sharpness may be extracted from each of the poles. The index of sharpness of the FDLP poles {ρi} is defined as



Industry Class:
Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression
