| Separating multiple audio signals recorded as a single mixed signal -> Monitor Keywords |
|
Separating multiple audio signals recorded as a single mixed signalUSPTO Application #: 20060056647Title: Separating multiple audio signals recorded as a single mixed signal Abstract: A method according to the invention separates multiple audio signals recorded as a mixed signal via a single channel. The mixed signal is A/D converted and sampled. A sliding window is applied to the samples to obtain frames. The logarithms of the power spectra of the frames are determined. From the spectra, the a posteriori probabilities of pairs of spectra are determined. The probabilities are used to obtain Fourier spectra for each individual signal in each frame. The invention provides a minimum-mean-squared error metho or a soft mask method for making this determination. The Fourier spectra are inverted to obtain corresponding signals, which are concatenated to recover the individual signals. (end of abstract)
Agent: Patent Department Mitsubishi Electric Research Laboratories, Inc. - Cambridge, MA, US Inventors: Bhiksha Ramakrishnan, Aarthi M. Reddy USPTO Applicaton #: 20060056647 - Class: 381119000 (USPTO) Related Patent Categories: Electrical Audio Signal Processing Systems And Devices, With Mixer The Patent Description & Claims data below is from USPTO Patent Application 20060056647. Brief Patent Description - Full Patent Description - Patent Application Claims FIELD OF THE INVENTION [0001] This invention relates generally separating audio speech signals, and more particularly to separating signals from multiple sources recorded via a single channel. BACKGROUND OF THE INVENTION [0002] In a natural setting, speech signals are usually perceived against a background of many other sounds. The human ear has the uncanny ability to efficiently separate speech signals from a plethora of other auditory signals, even if the signals have similar overall frequency characteristics, and are coincident in time. However, it is very difficult to achieve similar results with automated means. [0003] Most prior art methods use multiple microphones. This allows one to obtain sufficient information about the incoming speech signals to perform effective separation. Typically, no prior information about the speech signals is assumed, other than that the multiple signals that have been combined are statistically independent, or are uncorrelated with each other. [0004] The problem is treated as one of blind source separation (BSS). BSS can be performed by techniques such as deconvolution, decorrelation, and independent component analysis (ICA). BSS works best when the number of microphones is at least as many as the number of signals. [0005] A more challenging, and potentially far more interesting problem is that of separating signals from a single channel recording, i.e., when the multiple concurrent speakers and other sources of sound have been recorded by only a single microphone. Single channel signal separation attempts to extract a speech signal from a signal containing a mixture of audio signals. Most prior art methods are based on masking, where reliable components from the mixed signal spectrogram are inversed to obtain the speech signal. The mask is usually estimated in a binary fashion. This results in a hard mask. [0006] Because the problem is inherently underspecified, prior knowledge, either of the physical nature, or the signal or statistical properties of the signals, is assumed. Computational auditory scene analysis (CASA) based solutions are based on the premise that human-like performance is achievable through processing that models the mechanisms of human perception, e.g., via signal representations that are based on models of the human auditory system, the grouping of related phenomena in the signal, and the ability of humans to comprehend speech even when several components of the signal have been removed. [0007] In one signal-based method, basis functions are extracted from training instances of the signals. The basis functions are used to identify and separate the component signals of signal mixtures. [0008] Another method uses a combination of detailed statistical models and Weiner filtering to separate the component speech signals in a mixture. The method is largely founded on the following assumptions. Any time-frequency component of a mixed recording is dominated by only one of the components of the independent signals. This assumption is sometimes called the log-max assumption. Perceptually acceptable signals for any speaker can be reconstructed from only a subset of the time-frequency components, suppressing others to a floor value. [0009] The distributions of short-time Fourier transform (STFT) representations of signals from the individual speakers can be modeled by hidden Markov models (HMMs). Mixed signals can be modeled by factorial HMMs that combine the HMMs for the individual speakers. Speaker separation proceeds by first identifying the most likely combination of states to have generated each short-time spectral vector from the mixed signal. The means of the states are used to construct spectral masks that identify the time-frequency components that are estimated as belonging to each of the speakers. The time-frequency components identified by the masks are used to reconstruct the separated signals. [0010] The above technique has been extended by modeling narrow and wide-band spectral representations separately for the speakers. The overall statistical model for each speaker is thus a factorial HMM that combines the two spectral representations. The mixed speech signal is further augmented by visual features representing the speakers' lip and facial movements. Reconstruction is performed by estimating a target spectrum for the individual speakers from the factorial HMM apparatus, estimating a Weiner filter that suppresses undesired time-frequency components in the mixed signal, and reconstructing the signal from the remaining spectral components. [0011] The signals can also be decomposed into multiple frequency bands. In this case, the overall distribution for any speaker is a coupled HMM in which each spectral band is separately modeled, but the permitted trajectories for each spectral band are governed by all spectral bands. The statistical model for the mixed signal is a larger factorial HMM derived from the coupled HMMs for the individual speakers. Speaker separation is performed using the re-filtering technique. [0012] All of the above methods make simplifying approximations, e.g., utilizing the log-max assumption to describe the relationship of the log power spectrum of the mixed signal to that of the component signals. In conjunction with the log-max assumption, it is assumed that the distribution of the log of the maximum of two log-normal random variables is well defined by a normal distribution whose mean is simply the largest of the means of the component random variables. In addition, only the most likely combination of states from the HMMs for the individual speakers is used to identify the spectral masks for the speakers. [0013] If the power spectrum of the mixed signal is modeled as the sum of the power spectra of the component signals, the distribution of the sum of log-normal random variables is approximated as a log-normal distribution whose moments are derived as combinations of the statistical moments of the component random variables. [0014] In all of these techniques, speaker separation is achieved by suppressing time-frequency components that are estimated as not representing the speaker, and reconstructing signals from only the remaining time-frequency components. SUMMARY OF THE INVENTION [0015] A method according to the invention separates multiple audio signals recorded as a mixed signal via a single channel. The mixed signal is A/D converted and sampled. [0016] A sliding window is applied to the samples to obtain frames. The logarithms of the power spectra of the frames are determined. From the spectra, the a posteriori probabilities of pairs of spectra are determined. [0017] The probabilities are used to obtain Fourier spectra for each individual signal in each frame. The invention provides a minimum-mean-squared error metho or a soft mask method for making this determination. The Fourier spectra are inverted to obtain corresponding signals, which are concatenated to recover the individual signals. BRIEF DESCRIPTION OF THE DRAWINGS [0018] FIG. 1 is a block diagram of a method for separating multiple audio signals recorded as a mixed signal via a single channel; [0019] FIG. 2 is a graph of individual mixed signals to be separated from a mixed signal according to the invention; [0020] FIG. 3 is a block diagram of a first embodiment to determine Fourier spectra; and Continue reading... Full patent description for Separating multiple audio signals recorded as a single mixed signal Brief Patent Description - Full Patent Description - Patent Application Claims Click on the above for other options relating to this Separating multiple audio signals recorded as a single mixed signal patent application. ### 1. Sign up (takes 30 seconds). 2. Fill in the keywords to be monitored. 3. Each week you receive an email with patent applications related to your keywords. Start now! - Receive info on patent apps like Separating multiple audio signals recorded as a single mixed signal or other areas of interest. ### Previous Patent Application: Phase equalization for multi-channel loudspeaker-room responses Next Patent Application: Method for manufacturing a carrier element for a hearing aid and a carrier element for a hearing aid Industry Class: Electrical audio signal processing systems and devices ### FreshPatents.com Support Thank you for viewing the Separating multiple audio signals recorded as a single mixed signal patent info. IP-related news and info Results in 0.83628 seconds Other interesting Feshpatents.com categories: Electronics: Semiconductor , Audio , Illumination , Connectors , Crypto , |
||