FIELD OF THE INVENTION
The present invention is related to processing of a sound signal. In particular, this invention is related to the modification of a sound signal such that even if the low-frequency portion (i.e., the bass portion) of the modified sound is absent, a human listener can still psychologically perceive the presence of such low-frequency portion when listening to the modified sound.
BACKGROUND OF THE INVENTION
Music can be enjoyed live in front of the stage in a theatre. However, it is more common that one enjoys music through radios, televisions, DVD home theatres, MP3 players, multimedia personal computers, etc. In these entertainment devices, transducers such as loudspeakers, which convert electric signals into physical sound waves, are used to reproduce the music. However, the quality of music reproduction is typically poor in the low-frequency portion, so that there are continual demands for bass improvement. The quality problem is due to the physical limitations of the electro-acoustic transducer in dimension and structure. For instance, the pipe organ (Werkprinzip) requires use of an open pipe of length 32 feet (around 10 meters) to produce the C0 tone at 16.35 Hz, as reported in Eargle, J. M., Music, Sound, and Technology, Second edition, Van Nostrand Reinhold, 1995, the disclosure of which is incorporated by reference herein. Therefore, it is difficult to satisfy the requirements for good low-frequency reproduction in small churches or in general applications.
There have been some techniques in the art to improve the low-frequency response of music reproduction devices. In one example, the difficulty that smaller pianos with shorter strings have in reproducing the C0 tone and other low-frequency tones can be overcome by using strings that are thicker and stiffer, and that are stretched less tightly. However, this results in a certain degree of inharmonic distortion. In another example, a moving-coil loudspeaker can use a stronger magnetic field and a bass-reflex tube to extend its low-frequency range by around ⅓ octave below its low cut-off frequency. This method has been employed in commercial products, e.g., in a product produced by TOA Corporation with model number SW-46S-UL2, which can provide a 30 Hz low-frequency response using an 18 inch woofer and bass-reflex design.
Owing to the cost and space of the bass implementation for pipe organs of small churches, a technique called “acoustic bass” was used and known as early as the 1700's. The organ manufacturers made two pipes sounding together to get lower notes. For instance, C4 and G4 pipes are used together to get the C3 note. A similar method can be applied to pianos with shortened strings.
Note that in the above-mentioned approach, the sound of the intended frequency is not present, but human listeners can still perceive the presence of this frequency. This phenomenon is known as the residue pitch effect, referred to also as the phenomenon of the missing fundamental. Moore, B. C. J., An Introduction to the Psychology of Hearing, Chapter 5, Fourth edition, Academic Press, 1997, provides background information on this phenomenon, the disclosure of which is incorporated by reference herein. Basically, the residue pitch effect is a psycho-acoustic effect in that the residue pitch (harmonics) of a tone can be perceived by human listeners as the presence of the fundamental frequency even if the fundamental frequency is missing or masked by other noise. The residue pitch effect has been used in U.S. Pat. No. 5,930,373 and U.S. Pat. No. 6,285,767 to enhance bass, resulting in the extension of the low cut-off frequency of a speaker by 1 to 1.5 octaves.
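The C4 and G4 example given above can be checked numerically. The following is a minimal sketch, assuming equal-tempered tuning with A4 = 440 Hz (a tuning reference not stated in this document); the function name note_freq is illustrative only. It shows that C4 and G4 lie close to the second and third harmonics of C3, so that their spacing is approximately the C3 frequency that the listener perceives.

```python
def note_freq(semitones_from_a4: int) -> float:
    """Equal-tempered frequency, assuming A4 = 440 Hz (an assumed reference)."""
    return 440.0 * 2.0 ** (semitones_from_a4 / 12.0)

c3 = note_freq(-21)  # C3, the note that is perceived but not sounded
c4 = note_freq(-9)   # C4, close to the 2nd harmonic of C3
g4 = note_freq(-2)   # G4, close to the 3rd harmonic of C3

# Spacing between the two sounded pipes, in Hz.
spacing = g4 - c4

print(f"C3 = {c3:.2f} Hz, C4 = {c4:.2f} Hz, G4 = {g4:.2f} Hz")
print(f"G4 - C4 = {spacing:.2f} Hz")
```

In equal temperament the match is approximate rather than exact, because the tempered fifth is slightly narrower than the 3:2 ratio of a pure harmonic series.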
In U.S. Pat. No. 5,930,373, a method for introducing residue harmonics of low-frequency signal components into a sound signal is disclosed. In this method, the sound signal is partitioned into a high frequency signal and a low frequency signal. The low frequency signal is further partitioned into a number of signal components in different frequency bands. Residue harmonics are generated for each of these signal components. The residue harmonics are weighted and added to the original sound signal. In the generation of residue harmonics, this reference suggests that nonlinear transformation may be used.
In U.S. Pat. No. 6,285,767, a sound enhancement system that enhances the perception of low-frequency signal components in a sound signal is disclosed. In this patent, the fact that low-frequency signal components can give rise to harmonics generated by the nonlinearity of human ears is recognized. To emphasize the presence of these harmonics such that the original low-frequency signal components are more easily perceived, the disclosed sound enhancement system de-emphasizes the mid-frequency components originally in the sound by purposefully reducing their power levels.
In U.S. Pat. No. 6,410,838, a musical signal synthesizer for synthesizing complex musical sound waveforms rich in harmonics is disclosed. The waveforms are generated by means of a feedback loop and a simple nonlinearity is used to introduce the harmonics into the signal.
However, there remains a need in the art for improved methods for bass enhancement that take human physiology into consideration. In one aspect, there is a need for improved methods that are based on the nonlinear response of the human ear to enhance the perception of bass frequencies. Music with bass enhancement by such improved methods has the advantage that it sounds more natural to human beings. In contrast, music enhanced by non-ear-based methods appears more artificial in perception. In another aspect, there is a need for improved methods that allow the removal of the low-frequency signal components in a sound while human listeners can still psychologically perceive the presence of such low-frequency signal components. The absence of low-frequency signal components implies that the bass quality achieved by a sound generator has nothing to do with the quality of the reproduced sound or music. Therefore, a cheaper sound generator can be used instead of a more-expensive, bass-enhanced sound generator, thereby lowering the material cost. Moreover, the removal of such signal components prevents human ears from duplicating the generation of the same or similar residue harmonics, so that listeners can listen to heavy-bass music for longer with less demand on the ears. Finally, removing the low-frequency portion is also a means for combating unauthorized copying of the original sound signal via tapping an analog output of a device, without degrading the perceived sound quality of the bass content. The problem of unauthorized copying of a sound signal by tapping at the analog output of the device is known as the analog loophole problem.
SUMMARY OF THE INVENTION
The present invention discloses a method for enhancing the perceptibility of the low-frequency portion of a sound signal by means of a nonlinear function that emulates the middle-ear response of a human being. The resultant sound signal incorporates residue harmonics of the low-frequency signal components of the original sound signal. Upon reproduction of the resultant sound signal into the physical sound wave form, the low-frequency portion is perceivable to a human listener even when the low-frequency signal components are removed from the resultant sound signal.
The invention further discloses a plurality of compressive amplitude distortion units, each of which generates residue harmonics by nonlinear distortion of the low-frequency portion of a sound signal, wherein the input-output relationship of the nonlinear distortion is based on the middle-ear response of a human being. Even when the low-frequency portion is removed through filtering, a human listener who listens to the reproduced sound can perceive the original low-frequency portion due to the generated residue harmonics.
In addition, the present invention discloses two anti-piracy methods that utilize the absence of low-frequency signal components in a sound signal after the aforementioned processing, one for convenient detection of an unauthorized copy of the processed sound signal, and another for discouraging people from making such an unauthorized copy. These anti-piracy methods are for combating the analog loophole problem.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows equal loudness contours versus frequency of sound.
FIG. 2 shows compressive amplitude distortion generated by the middle ear.
FIG. 3 illustrates the case in which a sound incorporating residue harmonics is presented to the ear of a listener.
FIG. 4 depicts a block diagram of a compressive amplitude distortion unit according to the present invention.
FIG. 5 depicts a block diagram of a compressive amplitude distortion unit according to another aspect of the present invention.
FIG. 6 shows an application of a compressive amplitude distortion unit.
FIG. 7 shows an additional application of a compressive amplitude distortion unit.
FIG. 8 shows another application of a compressive amplitude distortion unit.
FIG. 9 shows a further application of a compressive amplitude distortion unit.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Human ears can judge the amplitude of the input sound pressure level (SPL). FIG. 1 is a graph showing the equal loudness contours versus frequency. Each contour indicates equal perceived loudness against a reference loudness level at 1 kHz. FIG. 1 demonstrates that a nonlinear response is present at both ends of the audible frequency band. The high frequency band from 1 kHz to 15 kHz shows a much higher consistency in the shift of the pressure level as the pressure level increases. The shift is quite independent of the pressure level in the high frequency range and the equal-loudness contour only varies within a +/−10 dB region, except in the cases of 110 dB and 120 dB. However, for the low frequency band from 20 Hz to 300 Hz, FIG. 1 indicates that human ears are quite inefficient in responding to low frequency, as the required SPL is 10 dB at 1 kHz but increases to 78 dB at 20 Hz (68 dB difference). Nevertheless, the situation is better at a high loudness level. For example, a 100 dB sound at 1 kHz gives the same loudness as a 128 dB sound at 20 Hz (28 dB difference). Therefore, the perceptible response in that range depends on both frequency and amplitude, and the efficiency or sensitivity of perception increases with the input amplitude level.
Physiologically, the human auditory system consists of three major parts. The outer ear comprises the pinna, the concha and the canal. At the end of the outer ear is the eardrum that vibrates according to the received sound, creating pressure changes in the middle ear. The middle ear consists of three tiny bones (the malleus, the incus, and the stapes) for converting the air pressure into fluid motion from the eardrum to the inner ear via the oval window. In the inner ear, the snail-shell-shaped cochlea contains the basilar membrane, which is about 35 mm in length, is attached at one end at the oval window, has balanced fluid pressure on both sides, and is connected with about 30,000 nerve fibers.
The outer ear has a simple structure, comprising the pinna, the concha and the canal. The canal is only 2.5 mm in diameter, like a tuned port to collect sound energy from the air. Its frequency response is similar to a band-pass filter with a pass band from about 1 kHz to 6 kHz. At the end of the canal, there is an eardrum. The response of the whole outer ear is indifferent to different intensity levels of the sound.
The bones of the middle ear convert the air pressure into fluid motion from the eardrum to the inner ear via the oval window. The pressure on the oval window is increased by around 20 to 30 times with respect to the surface pressure of the eardrum, whose surface area is greater than that of the oval window. In the transfer of pressure, the bones do not magnify the pressure or movement. In contrast, the muscles attached to the malleus and the stapes involuntarily contract to attenuate the level of sound entering the inner ear when the incoming sound is intense (about 75 dB SPL) in the low frequency range, a phenomenon known as auditory reflex.
The basilar membrane is the key part of the inner ear. One end, called the base, is attached next to the oval window, and the other end, called the apex, is freely suspended in fluid. The nerve sensors along the basilar membrane are dedicated to detecting sound energy of different frequencies, from high at the base to low at the apex. The input sound propagates from the base to the apex in a manner similar to a traveling wave. Each place on the basilar membrane is responsive to only one characteristic frequency with maximal vibration amplitude; this phenomenon supports the place theory. A description of the place theory is given in Plack, C. J., The Sense of Hearing, Lawrence Erlbaum Associates, Inc., 2005, the disclosure of which is incorporated by reference herein. Although the sensing is very nonlinear and complicated for different input levels and for different frequencies, up to now there is no evidence showing that it is more efficient in detecting an intense low-frequency sound so as to support our perception of loudness. In the presence of very intense low-frequency sound, the place theory cannot explain why every place in the basilar membrane vibrates irrespective of the characteristic frequency, as indicated in Plack.
In the description that follows, it will be shown that the distortion of the intense low-frequency sound in the middle ear can help to enhance perception when the sound is below around a Half Loudness Frequency. The Half Loudness Frequency, as used herein, refers to an audible frequency at which a person perceives that the loudness level of this audible frequency is one half that of a reference frequency (e.g., 4 kHz). The Half Loudness Frequency depends on the individual. The major cause of inter-individual differences is probably psychological rather than physiological, as reported by de Barbenza, C. M., Bryan, M. E., and Tempest, W., “Individual loudness functions,” Journal of Sound and Vibration, volume 11, pages 399-419, April 1970, the disclosure of which is incorporated by reference herein. Although the Half Loudness Frequency can be any frequency, a rule of thumb is that it is usually between 150 Hz and 300 Hz. The type of distortion resulting from the intense low-frequency sound entering the middle ear is termed “compressive amplitude distortion” in that it limits the dynamic range of the intense input sound according to the mechanism of the middle ear. The distortion generates overtones or residue harmonics of a fundamental frequency. The human auditory system can use any two consecutive harmonics of the sequence of residue harmonics to perceive the presence of the fundamental frequency. Therefore, the distortion generates additional information about the fundamental frequency in a more responsive band (between 300 Hz and 5 kHz) on the basilar membrane, allowing human listeners to perceive a low-frequency sound as louder.
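The role of consecutive harmonics can be illustrated with a short calculation. The following is a minimal sketch, assuming an example fundamental of 60 Hz (chosen here for illustration and lying below the 150 Hz to 300 Hz range quoted for the Half Loudness Frequency); the function name harmonics_in_band is hypothetical. It lists the harmonics that fall in the more responsive band between 300 Hz and 5 kHz and shows that any two consecutive ones are spaced by the fundamental.

```python
def harmonics_in_band(f0_hz, band_lo_hz=300.0, band_hi_hz=5000.0):
    """Return the harmonics k * f0 that lie inside [band_lo_hz, band_hi_hz]."""
    found = []
    k = 1
    while k * f0_hz <= band_hi_hz:
        if k * f0_hz >= band_lo_hz:
            found.append(k * f0_hz)
        k += 1
    return found

f0 = 60.0  # assumed example fundamental, in Hz
band = harmonics_in_band(f0)

# Spacing between each pair of consecutive in-band harmonics.
spacings = [b - a for a, b in zip(band, band[1:])]

print("first in-band harmonics:", band[:4])
print("all spacings equal f0:", all(s == f0 for s in spacings))
```

The fundamental itself (60 Hz) lies outside the band, yet every pair of neighbouring in-band harmonics carries its value as their frequency difference.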
FIG. 2 illustrates the compressive amplitude distortion generated by the middle ear. An intense low-frequency sound with over 75 dB SPL and with a single frequency below the Half Loudness Frequency is presented to the outer ear. The undistorted sound is passed to the middle ear. The muscle contracts so that compressive amplitude distortion is generated. The distorted sound is passed to the inner ear for frequency interpretation.
It is possible to emulate the generation of residue harmonics for low-frequency signal components based on the same mechanism employed by the middle ear, i.e., compressive amplitude distortion, regardless of whether the low-frequency part of the sound is intense or not. This use is illustrated in FIG. 3. If the generated residue harmonics are intentionally added to the sound signal before it is presented to the outer ear of the listener, the resultant sound enhances the listener's perception of the low-frequency part of the sound. By the phenomenon of the missing fundamental, which has been described above, the listener is also able to perceive the presence of such low-frequency components even if the fundamental frequency is removed from the sound. This perception enhancement method is employed in the embodiments disclosed hereafter.
A first embodiment of the present invention is a method for enhancing the perceptibility of the low-frequency portion of a sound signal. Optionally, the sound signal may contain a direct-current (DC) component. Depending on the application, the sound signal can be represented in a suitable form appropriate for such application. Such suitable forms for representing the sound signal include, but are not limited to: an analog electrical signal; a digital signal; and a physical sound wave propagating in a medium such as air. In the disclosed method, the low-frequency signal components of the sound signal are first extracted. The signal that contains the extracted signal components is then processed by a nonlinear function with its input-output relationship emulating the middle-ear response of a human being. Residue harmonics are generated as a result, and are incorporated in the output signal of the nonlinear function. This output signal also contains the extracted low-frequency signal components. The amplitude of this output signal is adjusted such that the power of the signal after adjustment is in the same range as the power of the signal presented to the nonlinear function. The amplitude-adjusted signal is then added to the original sound signal, followed by filtering out all the low-frequency signal components. Enhanced perceptibility is obtained for the low-frequency portion of the resultant signal. Even if this portion is removed from the resultant signal, a human listener can still perceive the presence of such low-frequency portion in the reproduced sound.
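The sequence of steps in the first embodiment can be sketched in code. The following is a minimal illustration under several assumptions that are not part of this disclosure: the sound signal is a list of digital samples, the extraction and removal filters are simple first-order filters, and the nonlinear function is a tanh curve used purely as a placeholder for a compressive, symmetric characteristic (it is not the middle-ear response itself). All function names, the 200 Hz cut-off and the drive value are hypothetical.

```python
import math

def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order IIR low-pass; a crude stand-in for the extraction filter."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, state = [], 0.0
    for s in x:
        state = (1.0 - a) * s + a * state
        out.append(state)
    return out

def one_pole_highpass(x, cutoff_hz, fs):
    """Complementary first-order high-pass: input minus its low-passed copy."""
    low = one_pole_lowpass(x, cutoff_hz, fs)
    return [s - l for s, l in zip(x, low)]

def rms(x):
    return math.sqrt(sum(s * s for s in x) / len(x))

def tone_level(x, freq_hz, fs):
    """Amplitude estimate of one frequency by correlation (a single DFT-like bin)."""
    re = sum(s * math.cos(2.0 * math.pi * freq_hz * n / fs) for n, s in enumerate(x))
    im = sum(s * math.sin(2.0 * math.pi * freq_hz * n / fs) for n, s in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

def enhance_bass(x, fs, cutoff_hz=200.0, drive=4.0):
    low = one_pole_lowpass(x, cutoff_hz, fs)          # 1. extract low-frequency components
    distorted = [math.tanh(drive * s) for s in low]   # 2. placeholder nonlinearity (assumption)
    d_rms = rms(distorted)
    gain = rms(low) / d_rms if d_rms > 0.0 else 0.0   # 3. match power to the nonlinearity input
    adjusted = [gain * s for s in distorted]
    mixed = [a + b for a, b in zip(x, adjusted)]      # 4. add to the original signal
    return one_pole_highpass(mixed, cutoff_hz, fs)    # 5. filter out the low-frequency components

fs = 8000
f0 = 50.0  # assumed bass tone for the demonstration
x = [0.8 * math.sin(2.0 * math.pi * f0 * n / fs) for n in range(fs)]
y = enhance_bass(x, fs)
print("3rd harmonic before:", tone_level(x, 3 * f0, fs))
print("3rd harmonic after: ", tone_level(y, 3 * f0, fs))
```

Because tanh is an odd function, a sinusoidal input yields odd harmonics only, so the sketch reports the third harmonic. A first-order high-pass merely attenuates the fundamental; a practical implementation would presumably use a much steeper filter to remove it.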
The nonlinear function emulates the middle-ear response. Denote w(u) as the output of the nonlinear function for an input u, and f(x) as the amplitude of the middle-ear response for an input sound with amplitude x. As indicated above, it is intended to generate the residue harmonics in the nonlinear function regardless of whether the low-frequency part of the sound is intense or not. Therefore, w(u) is given by
w(u)=A·f(u/B) (EQN. 1)
where A is a factor determining the output range of the nonlinear function, and B is another factor determining the input value u where nonlinear distortion is prominent and residue harmonics are generated. The value of A provides a gain to the nonlinear function and hence does not affect the range of u where prominent nonlinear distortion occurs. It can be determined according to, e.g., the range of output values acceptable to the post-processing functions connected to this nonlinear function. The determination of B is illustrated by the following example. Suppose that residue harmonics are generated in the middle ear for an input sound pressure of 75 dB SPL (corresponding to x = 10^{75/20} × threshold sound pressure). It is now intended that the nonlinear function generates residue harmonics at a reference condition of u = 1, which corresponds to the condition that the input sound pressure is, say, 30 dB SPL. Then B is set to B = 10^{−75/20}.
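The example for B can be worked through numerically. The following is a minimal sketch, assuming that x is expressed in units of the threshold sound pressure (0 dB SPL); the helper names db_spl_to_ratio and w are illustrative, and an identity function stands in for f only to expose the argument passed to it.

```python
def db_spl_to_ratio(level_db: float) -> float:
    """Sound pressure relative to the threshold pressure (0 dB SPL)."""
    return 10.0 ** (level_db / 20.0)

ONSET_DB = 75.0                      # level at which residue harmonics arise in the middle ear
x_onset = db_spl_to_ratio(ONSET_DB)  # 10^(75/20), about 5623 times the threshold pressure
B = 1.0 / x_onset                    # B = 10^(-75/20), about 1.78e-4

def w(u, f, A=1.0, B=B):
    """EQN. 1: w(u) = A * f(u / B)."""
    return A * f(u / B)

# With an identity f (a stand-in only), w(1) returns the argument seen by f.
argument_at_reference = w(1.0, lambda x: x)
print(f"B = {B:.6e}")
print(f"f is evaluated at x = {argument_at_reference:.2f} when u = 1")
```

With this choice of B, the reference input u = 1 is mapped to the 75 dB SPL operating point of f, which is where the compressive distortion is stated to occur.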
The middle-ear response used to develop the input-output relationship of the nonlinear function can be determined experimentally by, e.g., the method given in Aerts, J. R. M., and Dirckx, J. J. J., “Nonlinearity in eardrum vibration as a function of frequency and sound pressure,” Hearing Research, Volume 263, Pages 26-32, 2010, the disclosure of which is incorporated by reference herein.
Alternatively, the middle-ear response can be mathematically determined by theoretical means. From EQN. 1, it is seen that the nonlinear function differs from the middle-ear response only by a factor and a scaling of the input. It follows that, apart from a physical middle-ear response, the nonlinear function can also be obtained from a prototype middle-ear response, which embeds the essential feature of the physical response, i.e., compressive amplitude distortion, but exhibits this distortion at a certain reference condition, e.g., x=1. It is first observed that the amplitude of sound can take on a positive or a negative value, depending on the direction of the sound wave's force acting on the middle ear. Therefore, a function to model a prototype middle-ear response is a two-sided function having a property of symmetry. Hence, f(x) is given by