Preprint No. 2998
Presented at the 89th AES Convention
1990 September 21-25
Los Angeles
SOUND FUSION AND THE
ACOUSTIC PRESENCE EFFECT
by
Arthur M. Noxon
Acoustic Sciences Corp.
Eugene, OR 97402 U.S.A.
ABSTRACT
In the perception of sound, early reflections
are correlated with the direct signal by the listener. Comb coloration
effects arise when there are too few specular, coherent reflections.
Masking develops with random phase, incoherent reflections. An early
arriving, statistically diffuse group composed of coherent reflections
with random time offsets produces excellent sound fusion. This is essentially
an acoustic presence effect; applications include digital sampling,
instrument and vocal recording, and speech therapy for the hearing impaired.
0 INTRODUCTION
A clean, direct signal is the most common
"signal of choice" in the recording world. The rationale
is that any desired effect can always be added later with processing.
Even the most primitive one-man jingle shop has a tiny closet,
its interior covered with sound absorptive foam or fiberglass. This
"box" is a basic vocal booth, holding a mic, a windscreen and,
eventually, the talent.
An acoustic system has been developed to
saturate the sound fusion (Haas effect) time period with a group
of statistically diffuse, coherent reflections. Three years ago,
the design strategy, mechanical configurations and acoustic
signatures for this technique were introduced at the AES as a digital
sampling booth. This acoustic condition has since been named QSF,
which stands for "Quick Sound Field". Presented here is
a follow-up report covering some of the applications for this acoustic
technique that have developed since its introduction.
1 BACKGROUND
An anechoic recording space may seem simple
in concept, but it is difficult in practice. Early reflections usually
do exist: off the script stand, paper, window, light fixtures,
the floor and other patches of sound reflecting surface. A real-world
vocal booth has any number of discrete reflections and resonance
problems that add to and color the direct signal. A highly absorptive
space that is somewhat acoustically dirty is the most difficult for
the engineer to mic and for talent to work in.
Mic placement is very sensitive to the coloration
effects of discrete early reflections and resonance. The sound of
the talent is colored by the effects of the mic position. Often,
setting up means no more than choosing the best coloration effects.
Since a consistent sound across an audio track is very important to the
engineer, dubs take an inordinate amount of time as the engineer
fishes for mic and talent positions in the room, trying to recapture
the coloration of the prior day's work.
A dead vocal booth provides little to no
acoustic feedback for the talent. Talent suffers sensory deprivation
while in the box. A monitor system is essential for talent to be
able to adjust intonation in real time. Electronics and earphones
are resorted to in the absence of a natural acoustic return. This
further isolates the talent, because the earphones also cut off the
direct sound path of their own voice. By the time traditional
recording techniques have been applied, the only natural acoustic
feedback left for the talent is conduction through the jawbone.
Sensory deprivation and coloration effects
found in a typical vocal booth limit its effectiveness. Time is
a scarce commodity in the studio, and wasted time in any business,
especially the recording studio, is to be avoided. The typical vocal
booth wastes studio time. Setting up a mic is a delicate, time-consuming
balancing act of talent and mic position vs. room color. Retakes
due to a lack of real-time acoustic monitoring for the talent take
up additional studio time. A dub is very difficult to set up in
order to recapture the original sound. And then there is the
post-processing time spent in the effects rack trying to convert the
track into a lifelike, naturally bright and open sound.
It is to be expected that the traditional
vocal booth will eventually be redefined, with steps taken to bolster
its positive features and reduce its negative effects. One way
to accomplish this is to put to work the Haas effect, in which
early reflections are correlated with the direct signal. By arranging
for a diffuse group of coherent early reflections, the room coloration
effects that appear when there are too few reflections are averaged
out. Any low-level discrete reflections that might remain are overwhelmed
by the diffuse reflections. The diffusion must also be rapidly attenuated
so that it does not stretch into the echo effect time period, beyond
50 ms. Therefore, in addition to a strong diffusing function,
this new class of vocal booth must retain a very fast decay rate.
2 ETC - VO BOOTH
The generally recommended ETC (energy-time curve) for control
rooms is a direct signal followed by an early time gap (ETG) due
to a reflection-free zone. Beyond this is found a diffuse room
ambience with an RT60 of about 1/5 to 1/2 second. The purpose of
the ETG is to allow the engineer to hear local colorations of the
signal at the mic. It is therefore 40 to 50 ms long, the duration of
the Haas or sound fusion effect.
The ETC for a voice over (VO) booth has to
fit inside the ETG of the control room. The VO booth has to decay
at least 50 dB within the 55 ms ETG, so its RT60 ought to
be on the order of 70 ms.
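The link between the 50 dB / 55 ms requirement and the RT60 figure can be checked with a short sketch. It assumes a straight-line (single exponential) decay, so that the level in dB falls linearly with time; the function names are illustrative only.

```python
def rt60_from_decay_ms(decay_db: float, time_ms: float) -> float:
    """Extrapolate RT60 from a measured decay, assuming the level in dB
    falls linearly with time (a single exponential decay)."""
    return time_ms * 60.0 / decay_db


def level_at_ms(t_ms: float, rt60_ms: float) -> float:
    """Level in dB relative to the direct sound at t_ms for a given RT60."""
    return -60.0 * t_ms / rt60_ms


# 50 dB of decay within the 55 ms ETG implies an RT60 of 66 ms.
print(rt60_from_decay_ms(50.0, 55.0))
# A 70 ms RT60 reaches about -47 dB by the end of the 55 ms ETG.
print(round(level_at_ms(55.0, 70.0), 1))
```

Under this assumption, meeting the full 50 dB strictly calls for an RT60 at or just below 66 ms, so "on the order of 70 ms" is best read as a round figure.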
The only remaining detail is to establish
the content of the decay envelope of the VO Booth. There are two
phases to the very early reflections. Echolocation cues occur within
the first 5 ms. Ambience and coloration effects occupy the balance
of the time period.
The direct signal needs to have a 5 ms very
early time gap (VETG). This allows time delay phase pan techniques
to be used by the engineer. Beyond the echolocation time gap lies
the rapidly decaying ambience signal.
If there are just a few discrete reflections,
mic ambience is colored due to phase add and cancel effects. If
there are no reflections, we have the dead room sound and no ambience.
We could have many reflections at the mic. If they are orderly,
as with a flutter echo, they would produce coloration. If disorderly,
they would create colorless ambience. However, the quality of these
reflections needs to be carefully specified.
Fig. 1 - ETC Control Room
Fig. 2 - ETC VO Booth
3 COHERENT OR INCOHERENT REFLECTIONS
The ear/brain system is a sound processor,
but so is a mic/spectrum analyzer. While they both recognize the
spectral character of sound, there are important differences. The
ear/brain acts as a correlation type signal detector. The very early
reflections are correlated with the direct signal. By this process
the early reflections are additive to and enhance the definition
of the perceived signal. This is not news - it is the well known
Haas, precedence, or sound fusion effect.
On the other hand, a correlation signal processor
differentiates between two types of echo. The coherent reflection
has a simple time delay offset but otherwise is a phase aligned
representation of the direct signal. An incoherent reflection can
also be time delayed but is a phase scrambled representation of
the direct signal.
A coherent reflection can have the same spectral
content as an incoherent reflection; they would look identical to
a spectrum analyzer. However, the isolated coherent reflection would
produce comb filter (phase add and cancel) effects when added to
the direct signal, while the single incoherent reflection would simply
add sound power to the direct signal. In correlation signal enhancement,
only coherent signals are processed into a spectral display. Incoherent
signals, such as noise, reverberation and random phase
reflections, mask the spectral detail of the direct signal. (This
is easily auditioned by listening to the harmonic detail of a plucked guitar
string with and without random phase reflections in the rearfield.)
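The difference between one coherent and one incoherent reflection can be made concrete with a minimal sketch. The coherent reflection is modeled as a delayed, scaled copy of the direct signal, so the power response is |1 + g·exp(-j2πfτ)|², which swings between (1 - g)² and (1 + g)² with notches spaced 1/τ apart. The incoherent reflection is modeled as having a phase unrelated to the direct signal, so on average only the powers add, giving a flat 1 + g². The gain g = 0.5 and delay τ = 1 ms are hypothetical values.

```python
import cmath
import math


def coherent_power(g: float, tau_s: float, f_hz: float) -> float:
    """Power of the direct sound plus one phase-aligned reflection
    (amplitude g, delay tau_s) at frequency f_hz: a comb filter."""
    h = 1.0 + g * cmath.exp(-2j * math.pi * f_hz * tau_s)
    return abs(h) ** 2


def incoherent_power(g: float) -> float:
    """Average power of the direct sound plus one random-phase reflection:
    the cross term averages to zero, so only the powers add."""
    return 1.0 + g * g


g, tau = 0.5, 0.001  # hypothetical reflection: half amplitude, 1 ms late
for f in (500.0, 1000.0, 1500.0, 2000.0):
    print(f"{f:6.0f} Hz  coherent {coherent_power(g, tau, f):.2f}"
          f"  incoherent {incoherent_power(g):.2f}")
```

Both reflections carry the same energy g², which is why a spectrum analyzer looking at the reflection alone cannot tell them apart; only the coherent one interferes with the direct signal, here alternating between 0.25 and 2.25 (about 9.5 dB peak to notch) every 500 Hz.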
An envelope of statistically diffuse but
coherent early reflections that lies within the 50 ms time window
of the Haas effect comprises a near-field ambience effect that adds
to the quality of the direct signal. The composite signal has more
top end and is brighter and more natural. It is a more open sound,
with air. Statistically diffuse Haas effect ambience is an acoustic
enhancement technique that puts onto tape the signal that engineers
prefer.
4 THE HAAS BOX
This class of vocal booth must retain a very
fast decay rate and in addition develop a strong diffusion function.
It typically has an RT60 decay time of 80 to 100 ms and a diffusion
rate of over 1000 reflections per second. The booth has absorbers
and reflectors distributed over its entire interior surface. The
component of direct sound that hits a reflector is backscattered,
partially back towards the mic, partially into an absorptive strip
and partially onto other reflectors. This process uses only specular
and diffractive diffusion to maintain the coherent quality in its
early reflections.
The mean free path in these small rooms is
about 4 feet, and the broadband absorption coefficient is about 50%.
That means the expanding wave front loses about 3 dB every 4 ms,
which pencils out to a 60 dB decay in 80 ms. The walls of such a vocal
booth would likely have reflectors alternating with absorption on
about 9 inch centers. A 5 foot wide wall would splinter a flat wave
front into maybe 7 separately expanding reflections. This sound
scattering process continues throughout the decay. The result is
easily counted in the ETC: the diffusion rate is one to two separate
reflections per millisecond. For all practical purposes, the mic
receives a direct signal followed by 4 to 5 ms of no sound; then, as
the first arrivals hit, the controlled decay/diffusion process in the
room begins.
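The decay arithmetic above can be reproduced in a few lines. A 50% absorption coefficient removes half the energy at each reflection, about 3 dB, and the number of reflections needed for 60 dB of decay, times the mean free time between them, gives RT60; written this way it is Eyring's reverberation formula expressed per reflection. The speed of sound (about 1130 ft/s) is an assumed room-temperature value, and the helper names are hypothetical.

```python
import math

SPEED_OF_SOUND_FT_PER_MS = 1.13  # assumed: ~1130 ft/s at room temperature


def mean_free_time_ms(mean_free_path_ft: float) -> float:
    """Average time between reflections."""
    return mean_free_path_ft / SPEED_OF_SOUND_FT_PER_MS


def loss_per_reflection_db(alpha: float) -> float:
    """Energy lost at each bounce for broadband absorption coefficient alpha."""
    return -10.0 * math.log10(1.0 - alpha)


def rt60_ms(mean_free_path_ft: float, alpha: float) -> float:
    """Reflections needed for 60 dB of decay times the time per reflection."""
    bounces = 60.0 / loss_per_reflection_db(alpha)
    return bounces * mean_free_time_ms(mean_free_path_ft)


def reflectors_across(wall_width_in: float, center_spacing_in: float) -> int:
    """Rough count of reflectors across a wall at a given center spacing."""
    return round(wall_width_in / center_spacing_in)


print(f"{loss_per_reflection_db(0.5):.1f} dB per reflection")     # 3.0
print(f"{mean_free_time_ms(4.0):.1f} ms between reflections")     # 3.5
print(f"RT60 about {rt60_ms(4.0, 0.5):.0f} ms")                   # 71
print(reflectors_across(60.0, 9.0), "reflectors on a 5 ft wall")  # 7
```

With the mean free time rounded up to 4 ms, the same relation gives 20 reflections × 4 ms = 80 ms; with the unrounded 3.5 ms it gives about 71 ms.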
A typical vocal booth has a window. In designer
studios it would be tilted so as not to reflect signal into the mic. In
a highly diffuse/absorptive room, there should not be a large area
of untreated reflective surface, regardless of its angle. Current practice
in these rooms is to use tall, absorptive/reflective, wall-mounted acoustic
units with narrow strips of wall space between them. The free wall space
between the acoustic control units can easily be glass or plexiglass
strips, which provide a more open feeling in an otherwise small
room. Visual openness contributes to the talent's comfort
in long recording sessions.
The statistically populated envelope of very
early, coherent reflections is essential to the stability of the
acoustic space inside the booth. Engineers report a wide and smooth
acoustic space. They even lose track of which mic is open and have
to mark the faders, whereas in a more traditional room an engineer
simply hears which mic is where. In a statistically diffuse space,
the mic position can be changed without changing the envelope. It
is the envelope that is distinguishable, not its internal detail.
Moving the mic only changes the fine structure, that is, which reflection
arrives when and how strongly. This does not change the statistical
envelope or the quality of sound. In a room 4 by 6 feet, there
would be a 2 by 4 foot central area in which the sound remains uniform,
regardless of mic or sound source location.
The floor plane is a large reflecting surface.
It is left untreated, to be an acoustic mirror effectively doubling
the height of the room. Ceiling treatment must accordingly be more
severe to keep the vertical decay and diffusion rates in step with those
of the walls.
Fig. 3 - QSF Vocal Booth
5 VOICE OVER GOBO
The Haas ambience effect can be approximated
out in the open room or field. Of course, it does not reach the degree
available in an iso booth format, but this QSF gobo setup boosts the
signal-to-noise ratio at the mic by 5 to 7 dBA. This is accomplished by
increasing the "direct" signal strength by 1 to 2 dB while
reducing the room noise by 4 to 5 dB.
This "gobo" is not the large,
flat, rug-covered plywood gobo of years past. The present method
is to use a set of 7 to 9 sound control units, typically placed
on 18" centers in a horseshoe pattern. The mic is located in
the middle and the talent occupies the open heel end of the pattern.
These Traps have two sides. The broadband absorptive side faces
outward to intercept inbound room noise and reflections. The membrane
reflective side, effective from 400 Hz up, faces inward to produce
the statistical group of early coherent reflections. In this system,
absorption is replaced by transmission: sound is not absorbed between
the reflectors but leaks out of the space. In either case, controlled
decay and diffusion take place.
Fig. 4 - QSF Vocal Gobo
Gain of the "direct" signal
is accomplished by adding very early multiple reflections of the
direct signal to the direct signal. This is completed within the
first 50 ms of the sound fusion time period. Although sound fusion
generally lasts 50 to 60 ms, a "smearing" accompanies
the presence of strong, late high frequency reflections. This is
undesirable for the recording engineer. The end of the sound fusion
period marks the onset of echo detection. For lower frequencies
the echo onset time is later and for highs, sooner than 50 ms.
In the QSF method of developing the statistical
ambience, the comb filter effect associated with any individual
reflection does not occur due to the large number of random time
offset reflections. With 20 to 50 reflections occupying a time span
of 20 to 25 ms, the comb filter effect that would arise with any
one reflection is obscured by the averaging effect of the other
reflections.
Fig. 5 - QSF ETC, 0-20 ms
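Whether a group of reflections "averages out" the comb filter of any single one of them can be explored numerically. The sketch below compares one discrete 1 ms reflection (such as a script stand might produce) against 30 coherent reflections with the same total energy and random delays spread over 5 to 25 ms. Each power response is averaged over a sliding 200 Hz band, an assumed and crude stand-in for the ear's frequency smoothing, and the peak-to-trough range of the result is reported. All values (delays, amplitudes, band width, random seed) are hypothetical; the code is a way to probe the claim, not a reproduction of the author's measurements.

```python
import cmath
import math
import random


def power_response(reflections, freq_hz):
    """|H(f)|^2 for a unit direct path plus coherent (phase-aligned)
    reflections, given as (amplitude, delay_s) pairs."""
    h = 1.0 + sum(a * cmath.exp(-2j * math.pi * freq_hz * tau)
                  for a, tau in reflections)
    return abs(h) ** 2


def smoothed_ripple_db(reflections, f_lo=1000.0, f_hi=8000.0,
                       band_hz=200.0, step_hz=5.0):
    """Peak-to-trough range (dB) of the power response after a sliding
    average over band_hz (assumed stand-in for the ear's smoothing)."""
    n_band = max(1, round(band_hz / step_hz))
    n_pts = int((f_hi - f_lo) / step_hz) + 1
    p = [power_response(reflections, f_lo + i * step_hz) for i in range(n_pts)]
    smoothed = [sum(p[i:i + n_band]) / n_band
                for i in range(n_pts - n_band + 1)]
    return 10.0 * math.log10(max(smoothed) / min(smoothed))


# One discrete reflection: amplitude 0.5, 1 ms after the direct sound.
single = [(0.5, 0.001)]

# 30 coherent reflections with the same total energy (0.5 ** 2) and
# random delays between 5 and 25 ms (hypothetical values).
rng = random.Random(1)
n = 30
diffuse = [(0.5 / math.sqrt(n), rng.uniform(0.005, 0.025)) for _ in range(n)]

print(f"single reflection: {smoothed_ripple_db(single):.1f} dB")
print(f"diffuse group:     {smoothed_ripple_db(diffuse):.1f} dB")
```

The single reflection keeps its regular 1 kHz notch spacing through the smoothing (a range of roughly 8.4 dB here); one would expect the diffuse group to show a much smaller range, and changing the seed, band width or number of reflections shows how sensitive that result is.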
A good signal at the mic can be time delayed
for stereo phase pan positioning. The echolocation process occurs
within the first 5 ms following the direct signal. Because of the
distance between the mic and the reflecting side of the gobo, no
reflections arrive within the first 5 ms. The direct signal is well
isolated for control in the mix.
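The 5 ms reflection-free gap translates into a minimum distance to the reflecting surfaces. A reflection arriving t ms after the direct sound has traveled c·t farther. In the simplest case, with source and mic at the same point (an assumption; in practice the talent stands a little way from the mic), the reflected path runs out and back, so the reflector must be at least c·t/2 away. The speed-of-sound constant and the function names are illustrative.

```python
SPEED_OF_SOUND_FT_PER_MS = 1.13  # assumed: ~1130 ft/s at room temperature


def reflection_delay_ms(direct_path_ft: float, reflected_path_ft: float) -> float:
    """Arrival time of a reflection relative to the direct sound."""
    return (reflected_path_ft - direct_path_ft) / SPEED_OF_SOUND_FT_PER_MS


def min_reflector_distance_ft(gap_ms: float) -> float:
    """Closest a reflector can be while keeping a reflection-free gap of
    gap_ms, assuming source and mic coincide (out-and-back path)."""
    return gap_ms * SPEED_OF_SOUND_FT_PER_MS / 2.0


print(f"{min_reflector_distance_ft(5.0):.1f} ft for a 5 ms gap")  # 2.8
# Talent 1 ft from the mic; a 7 ft reflected path arrives about 5.3 ms late.
print(f"{reflection_delay_ms(1.0, 7.0):.1f} ms")
```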
Not only is the direct signal enhanced but
the ambient noise floor is reduced at the mic by this technique.
The backside of each Trap is broadband absorptive and faces outward,
toward the room. Sound in the room is absorbed before it gets to
the mic.
Sound that does penetrate the perimeter is
weakened because the wavelet expands due to diffractive edge effects.
A noise level reduction of 5 dBA is easily noted inside the gobo. There
may be times when a stronger signal-to-room-noise ratio is required. The
closer the Traps are to each other, the less outside noise they will
let in, so the direct signal becomes stronger relative to the room noise.
Fig. 6 - QSF Gobo Isolation
Noise in a room also originates with the
talent. Sound does leak out between the traps. Some of this is attenuated
by the absorptive half of the trap and the remainder expands rapidly
due to edge diffraction effects. The sound leaked to the room is
rapidly diffusing. The important feature is that sound leaking from such
a gobo produces no flutter effect. Sound that does bounce off a
wall is absorbed by the backside of the gobo traps. The system can
also be used near walls with minimal impact.
Fig. 7 - A Diffusive 'Source'
Incidentally, another application of such
a gobo system takes advantage of its reversibility. If all the Traps
are turned around, the full-bandwidth absorptive side faces the mic.
This creates the traditional dead sounding vocal booth. By angling
a pair of reflectors slightly inward, the diffusive top end inside
the gobo can be brought back up. This is best done in pairs to take advantage
of the diffusive multiple scattering available from facing reflecting
surfaces.
Fig. 8 - Dead Configuration