Twenty Years of Ircam Spat: Looking Back, Looking Forward
ABSTRACT
Ircam Spat is a software suite dedicated to sound spatialization and artificial reverberation. This paper traces the evolution of the software during the last few years and presents its current state with a focus on the newly available features.
Ircam’s Spatialisateur (commonly referred to as Spat∼) is a tool dedicated to real-time sound spatialization and artificial reverberation. It comes as a library of audio signal processors whose core consists of an algorithmic reverberation engine [1] and a panning module. Since its origins in the early 1990s, Spat∼ has been conceived as a modular and scalable framework [2, 3], and it is integrated into the Max/MSP(*1) environment. Thanks to this modular architecture and design, the tool can be seamlessly adapted to various use cases (live concerts, mixing, post-production, virtual reality, interactive installations, etc.) and configured according to the reproduction system (headphones or loudspeaker arrangement) and the available processing power.
The Spatialisateur can be controlled by a high-level interface which includes perceptual features derived from psychoacoustic studies [4, 5]. This allows the user to intuitively specify the characteristics of the desired room effect and to interpolate continuously towards a different acoustic quality, going through natural-sounding transformations.
In terms of software engineering, the Spatialisateur was originally developed as a set of Max/MSP patches and abstractions. During the years 1990–2000 the most critical elements (filtering and reverberation stages) were progressively ported to C code as external objects. However this architecture (Max abstractions and a few C externals) was not flexible enough for configuring the Spat∼ modules: the Spatialisateur was typically restricted to one input source (mono or stereo) and a dozen output channels. In 2008–2010 it was therefore decided to completely refactor the software. The refactoring further aimed at taking better advantage of the massively multichannel audio interfaces and computing power now available.
The Spatialisateur has been completely rewritten from scratch in C++ (and marginally in Objective-C), as object-oriented programming is well suited to the modular and scalable design of the Spatialisateur. The most critical signal processing blocks are highly optimized and vectorized (e.g. with the Accelerate framework(*2) under macOS).
The source code is split into two main libraries: one is dedicated to the signal processing tasks while the other contains the graphical user interfaces. These two libraries are cross-platform, independent of the host application (i.e. they do not rely on Max/MSP) and autonomous: the only dependency is the Juce framework(*3), which is used for developing the graphical interfaces.
The benefits from this complete refactoring are manifold:
• the signal processing algorithms show better performance than the patch-based implementation,
• the algorithms can be flexibly configured: the number of input and output channels of each object is defined by the @numinputs and @numoutputs attributes (see Figure 4), and it typically varies from 1 to 512 depending on the setup,
• the Spatialisateur no longer relies on Max/MSP and can spread to new environments: several bindings have been developed in order to cover various use cases (see Figure 1). Max/MSP externals are used for concerts or interactive installations. Audio plugins(*4) can be inserted into digital audio workstations for mixing and post-production applications. Several tools have been integrated into the OpenMusic environment [6] for spatialization authoring and synthesis. Finally, MEX functions are generated for Matlab and are used (internally) for research and numerical simulation purposes.
In the remainder of this paper, we will focus on the new features of Spat∼ running in Max/MSP as this is the most commonly employed environment and it contains the latest innovations.
Figure 1. Spatialisateur bindings in various working environments.
3. REVERBERATION ENGINE
Spat∼ is built around a multichannel algorithmic reverberation engine. The generated room effect consists of four temporal segments (Figure 2): the direct sound, a few early reflections generated by a delay line with multiple tap delays, a set of dense late reflections (referred to as cluster) generated by multiple delay lines feeding a decorrelation matrix, and a late reverb tail synthesized by a feedback delay network (FDN) [1].
Figure 2. Archetype of an impulse response generated by the Spatialisateur reverberation engine.
図2.Spatialisateurのリバーブ・エンジンが生成するインパルス応答の典型。
OD: direct sound. R1: early reflections. R2: late reflections. R3: reverberation tail.
With version 4.x of Spat∼, several changes have been applied to the FDN code compared to the original implementation of Jot [1]: the number of internal feedback channels, which was limited to 8, can now be increased to 16 or 32. A higher number of feedback channels increases the modal and echo densities; this proves useful when creating long reverberation effects (several seconds long) or when processing percussive sounds with sharp transients. A set of filters simulating air absorption has further been inserted into the FDN loop. These filters have a lowpass response which shapes the reverberation profile in a more realistic way. Finally, the control over the decay relief (i.e. the reverberation time), which was originally operated in two or three frequency bands, has been extended to an arbitrary number of bands (see e.g. the spat.multiverb∼ object).
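As an illustration of the structure described above, the following C++ sketch shows a minimal feedback delay network with a lowpass absorption filter in each feedback path. All delay lengths, gains and filter coefficients are illustrative placeholders chosen by us, not Spat∼ values, and the code is a sketch of the general FDN principle rather than the Spat∼ implementation.

#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int N = 8;                              // number of feedback channels (8, 16 or 32 in Spat~)

struct OnePoleLowpass {                           // crude stand-in for the air-absorption filters
    float a = 0.3f, state = 0.f;                  // illustrative coefficient
    float process(float x) { state += a * (x - state); return state; }
};

struct SimpleFDN {
    std::array<std::vector<float>, N> delays;     // circular delay lines (lengths should be mutually prime)
    std::array<std::size_t, N> writePos{};
    std::array<OnePoleLowpass, N> absorption;
    float feedbackGain = 0.85f;                   // governs the broadband decay time (illustrative value)

    explicit SimpleFDN(const std::array<std::size_t, N>& lengths) {
        for (int i = 0; i < N; ++i) delays[i].assign(lengths[i], 0.f);
    }

    // entries of a normalized Sylvester-Hadamard feedback matrix (N must be a power of two)
    static float hadamard(int i, int j) {
        int bits = i & j, parity = 0;
        while (bits) { parity ^= (bits & 1); bits >>= 1; }
        return (parity ? -1.f : 1.f) / std::sqrt(static_cast<float>(N));
    }

    float process(float input) {
        std::array<float, N> outs;
        for (int i = 0; i < N; ++i)               // read the oldest sample of each delay line
            outs[i] = delays[i][writePos[i]];

        std::array<float, N> mixed{};
        for (int i = 0; i < N; ++i)               // mix the channels through the feedback matrix
            for (int j = 0; j < N; ++j)
                mixed[i] += hadamard(i, j) * outs[j];

        float wet = 0.f;
        for (int i = 0; i < N; ++i) {
            wet += outs[i];
            float v = input + feedbackGain * absorption[i].process(mixed[i]);
            delays[i][writePos[i]] = v;           // write back and advance the circular buffer
            writePos[i] = (writePos[i] + 1) % delays[i].size();
        }
        return wet / N;
    }
};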
Besides these evolutions, the global architecture of the reverberation processor (and its link with the panning modules) remains similar to the original design [2, 7]; as a reminder, Figure 3 presents a synoptic view of the audio processing chain.
Figure 3. Global architecture of Spat∼. The “Input” module handles the source pre-processing such as filtering and simulating the Doppler effect and air absorption. The “Room” module generates the reverberation effect split in four temporal segments, each filtered independently. These four sections are later panned according to a simple space-time model: early reflections R1 “surround” the direct sound while sections R2 and R3 are spatially diffuse. Finally the “Output” module decodes or transcodes the signals according to the playback system, and it includes loudspeaker equalization and delay/gain alignment.
The panning stage is now implemented as a single external, spat.pan∼. The object is polymorphic: according to its attributes it configures the appropriate number of input and output channels as well as the type of spatialization to use. A wide range of spatialization algorithms is supported: traditional stereo techniques (AB, XY, MS), binaural synthesis, amplitude panning (e.g. VBAP in 2D or 3D [8], VBIP [9], DBAP [10], SPCAP [11]), etc.
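To give a concrete idea of one of the amplitude panning techniques listed above, the sketch below computes constant-power gains for a single loudspeaker pair in 2D in the spirit of VBAP [8]. It illustrates the principle only; the function name and angle conventions are ours and do not describe the spat.pan∼ implementation.

#include <cmath>
#include <utility>

// Gains for a source at azimuth 'sourceAz' reproduced by two loudspeakers at
// azimuths 'spkAz1' and 'spkAz2' (all angles in radians, counter-clockwise).
std::pair<float, float> vbap2dGains(float sourceAz, float spkAz1, float spkAz2)
{
    // Solve p = g1*l1 + g2*l2, where l1, l2 and p are the unit direction
    // vectors of the loudspeakers and of the source.
    float l11 = std::cos(spkAz1), l12 = std::sin(spkAz1);
    float l21 = std::cos(spkAz2), l22 = std::sin(spkAz2);
    float p1  = std::cos(sourceAz), p2 = std::sin(sourceAz);

    float det = l11 * l22 - l12 * l21;            // assumes the two loudspeakers are not colinear
    float g1  = (p1 * l22 - p2 * l21) / det;
    float g2  = (p2 * l11 - p1 * l12) / det;

    float norm = std::sqrt(g1 * g1 + g2 * g2);    // constant-power normalization
    return { g1 / norm, g2 / norm };
}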
The remainder of this section details some panning families that include the most substantial innovations.
4.1 Higher Order Ambisonics
First order Ambisonic (B-format) encoding and decoding have been supported in the Spatialisateur for many years. In the early 2000s, Daniel [12] extended and formalized the Ambisonic paradigm for higher orders (HOA). These scientific advances have been integrated into Spat∼ in the last few years: the spat.pan∼ object now supports 2D and 3D HOA encoding for any order (the algorithms are all implemented with recursive formulae). For instance, pieces created in the performance hall of Ircam (Espace de Projection [13]) typically use 7th order 3D HOA, thus encoding the sound field on 64 spherical harmonics. Near-Field Compensated coding (NFC-HOA, see [14]) is also available in Spat∼, and order-dependent high-pass compensation filters are applied in order to avoid the problematic bass boost that arises when synthesizing finite-distance sources. The spat.hoatransform∼ object is used to apply transformations in the spatial Fourier domain; one can e.g. weight the different spherical harmonic components in order to create spatial blur effects, or apply a transformation matrix to efficiently rotate the 3D sound scene. Finally, spat.decoder∼ performs HOA decoding in 2D or 3D. Traditional decoding approaches consist of sampling the spherical harmonic excitation at the loudspeaker positions or matching the excitation modes of a continuous sound field to those of the loudspeakers. As these techniques perform unsatisfactorily with non-uniformly distributed loudspeaker arrays, a new decoder design has been proposed [15] and implemented in Spat∼. This so-called energy-preserving decoder avoids sudden variations in the decoded energy and preserves the apparent width of the virtual sources even with sparse loudspeaker coverage. Furthermore spat.decoder∼ can perform dual-band decoding, with an adjustable crossover frequency, and in-phase or max-rE optimizations can be applied in each band [12]. Figure 4 depicts the standard workflow for creating synthetic HOA scenes.
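As a minimal sketch of the encoding side, the function below spreads a mono sample onto the 2*order+1 channels of a 2D HOA representation using real-valued circular harmonics (the 3D case uses spherical harmonics, e.g. 64 of them at 7th order). The channel ordering and normalization are simplified for illustration and do not necessarily match the conventions used by spat.pan∼.

#include <cmath>
#include <vector>

std::vector<float> hoa2dEncode(float sample, float azimuth, int order)
{
    std::vector<float> channels;
    channels.reserve(2 * order + 1);
    channels.push_back(sample);                   // order 0: omnidirectional component
    for (int m = 1; m <= order; ++m) {            // each order adds one cosine/sine pair
        channels.push_back(sample * std::cos(m * azimuth));
        channels.push_back(sample * std::sin(m * azimuth));
    }
    return channels;                              // 2*order + 1 encoded channels
}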
In addition, several utility tools have been developed in order to ease frequent operations and to increase the uptake of Ambisonics: spat.hoaconverter∼ converts between various existing formats (N3D, SN3D, FuMa, etc.), spat.eigenencode∼ encodes in the HOA domain the signals captured by a spherical microphone array, spat.hoasorting∼ sorts the HOA channels according to different conventions, etc.
Realtime binaural and transaural synthesis have been supported in Spat∼ since its first versions [7]. Such processing techniques rely on filtering the input signals with a set of HRTFs(*5) which characterize the transfer path of sound from a position in space to the ears of a listener. HRTFs are usually measured for a large number of positions on a surrounding sphere in the far field of the listener's head. However, it is well known from the literature that HRTFs are significantly different for sources located in the proximal region of the subject. Virtual reality environments constitute a typical use case where binaural synthesis of nearby sources is required. On the one hand, a novel technique has been developed for the radial extrapolation of HRTFs [16]; on the other hand, the binaural synthesis renderer (spat.pan∼) has been adapted to sources located in the proximal region. The solution adopted for the realtime implementation is based on non-individualized filters [17] for correcting the ILD(*6), cross-ear selection of HRTF filters, and geometrical corrections of monaural gains and delays [18].
*5 Head-Related Transfer Function *6 Interaural Level Differences
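The core of binaural synthesis can be sketched as the convolution of a source signal with a pair of head-related impulse responses (HRIRs), as below. A realtime renderer such as spat.pan∼ additionally interpolates between measured positions and uses fast convolution, both of which are omitted here; the function shown is an illustrative stand-in rather than actual Spat∼ code.

#include <cstddef>
#include <vector>

struct BinauralFrame { std::vector<float> left, right; };

// Assumes non-empty input and HRIR buffers.
BinauralFrame renderBinaural(const std::vector<float>& input,
                             const std::vector<float>& hrirLeft,
                             const std::vector<float>& hrirRight)
{
    BinauralFrame out;
    out.left.assign(input.size() + hrirLeft.size() - 1, 0.f);
    out.right.assign(input.size() + hrirRight.size() - 1, 0.f);
    for (std::size_t n = 0; n < input.size(); ++n) {   // plain time-domain convolution
        for (std::size_t k = 0; k < hrirLeft.size(); ++k)
            out.left[n + k] += input[n] * hrirLeft[k];
        for (std::size_t k = 0; k < hrirRight.size(); ++k)
            out.right[n + k] += input[n] * hrirRight[k];
    }
    return out;
}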
4.3 TransPan
TransPan [19] is a new tool dedicated (mainly) to 5.1 mixing and conceived in partnership with the CNSMDP(*7). It aims at overcoming some of the weaknesses of the 5.1 standard in terms of stability and precision of lateral sources. To that purpose, TransPan combines several spatialization layers: 1) a traditional surround panning (constant-power panning and/or multichannel microphone recordings) and 2) a binaural/transaural [7] processing layer using two loudspeaker pairs (L/R and Ls/Rs). The binaural/transaural layer provides the spatial precision that lacks for the lateral images, and can also be employed for creating virtual sources in elevation (outside of the 5.1 horizontal plane).
The spat.transpan∼ object comes as a mixing engine capable of combining the different spatialization layers. The balance (levels, delays and equalization) between the layers can be adjusted manually or automatically based on empirical mixing curves that aim at maximizing the spatial impressions while minimizing the risk of tonal coloration (inherent to any transaural processing). Figure 5 illustrates one view of the spat.transpan∼ control interface.
spat.viewer (see Figure 6) is a tool for the visualization and editing of spatial sound scenes. The object can display and manipulate a variable number of sources and loudspeakers in a 2D representation (view from the top, from the bottom, or both side by side). The look and feel is highly customizable through Max/MSP attributes. The position and orientation of the scene entities (sources or loudspeakers) can be controlled by a set of Max/MSP messages with a clear, human-readable syntax, and many coordinate systems are supported. Furthermore, several utility objects are provided for efficient manipulation of the geometrical data (translation, rotation, scaling, nearest-neighbor search, source grouping, conversion between absolute and relative coordinate systems, etc.). It is also worth noting that spat.viewer is not tied to the Spatialisateur rendering engine, and it could be used to operate other spatialization processors.
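As an example of the kind of geometrical utility mentioned above, the sketch below converts a position given as azimuth/elevation/distance into Cartesian coordinates. The axis and angle conventions chosen here (azimuth measured clockwise from the front, elevation from the horizontal plane) are one common choice and are not necessarily those expected by the spat.viewer messages.

#include <cmath>

struct Xyz { float x, y, z; };

Xyz sphericalToCartesian(float azimuthDeg, float elevationDeg, float distance)
{
    const float deg2rad = 3.14159265358979f / 180.f;
    const float az = azimuthDeg * deg2rad;
    const float el = elevationDeg * deg2rad;
    return { distance * std::cos(el) * std::sin(az),    // x: towards the right
             distance * std::cos(el) * std::cos(az),    // y: towards the front
             distance * std::sin(el) };                  // z: upwards
}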
spat.oper (see Figures 7 and 8) is the main control interface for the Spatialisateur (i.e. for the spat.spat∼ rendering object). spat.oper is a “perceptual operator” which provides high-level control over the acoustical quality of the virtual room [4, 5]. It allows the user to navigate along different perceptual axes called “perceptual factors”. These axes are mutually independent and they correlate with physical criteria observed in real concert halls. Amongst the nine perceptual factors, three are related to the source itself (presence, warmth, brilliance), three characterize the room itself (reverberance, heaviness and liveness), and three account for the interaction between the source and the room (room presence, running reverberance and envelopment). spat.oper can be set to imitate the acoustics of an existing room and allows the user to interpolate naturally (and continuously) between different acoustic qualities.
Figure 7. Screenshot of spat.oper: the reverb tab.
Besides the perceptual factors, spat.oper provides control over the filtering parameters (filtering of the direct and reverberant sounds), the radiation of the sources (aperture and orientation of the radiation pattern), the characteristics of the distance attenuation law, as well as miscellaneous options. spat.oper is organized as a tabbed component: there is one tab for each source (Figure 8) and one tab for each virtual room (Figure 7). Finally, spat.oper embeds a spat.viewer component for a schematic view of the sound scene.
Many other graphical user interfaces have been developed (and are progressively enhanced) for controlling, monitoring and authoring sound spatialization. It is beyond the scope of this paper to describe these interfaces in detail. A quick overview is presented in Figure 9.
Figure 8. spat.oper: source tab, showing the selection of the source and room tabs, the perceptual factors for the source, the localization and radiation panel, the direct and reverberated sound filters, a visualization of the sound scene, and other options.
Figure 9. Several user interfaces included in Spat∼: Jitter interface for spat.viewer, parametric equalizer, HRTF inspection (magnitude/phase responses), multichannel level meter, geometrical model with image sources, binary matrix, matrix control, authoring of spatial trajectories, and 3D sound field visualization.
Due to the increase in available processing power, convolution-based reverberation processing has become widely used over the last few decades. Several new objects have therefore been added to the Spatialisateur suite for manipulating and processing room impulse responses (IRs).
spat.conv∼ is a multichannel convolution engine based on overlap-save block-partitioned FFT algorithms [20]. The computation of the high-latency blocks is handled in a background thread in order to take advantage of modern multi-core processors. The user can adjust the minimum block size, which offers a tradeoff between audio latency (which can be set to zero) and computational load.
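The latency/cost tradeoff mentioned above comes from how the impulse response is partitioned: small initial partitions keep the input-output latency low, while later, larger partitions (whose FFTs can be computed ahead of time in a background thread) reduce the overall cost. The sketch below builds one possible partitioning plan; the doubling scheme and the function name are illustrative and do not describe the actual spat.conv∼ scheduling.

#include <cstddef>
#include <vector>

// Assumes minBlock > 0 and maxBlock >= minBlock.
std::vector<std::size_t> partitionPlan(std::size_t irLength, std::size_t minBlock, std::size_t maxBlock)
{
    std::vector<std::size_t> sizes;
    std::size_t covered = 0, block = minBlock;
    while (covered < irLength) {
        sizes.push_back(block);                   // one frequency-domain partition of this size
        covered += block;
        if (block < maxBlock) block *= 2;         // grow until the maximum block size is reached
    }
    return sizes;                                 // e.g. 64, 128, 256, ..., maxBlock, maxBlock, ...
}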
spat.converb∼ is another convolution processor based on the same DSP algorithms. spat.converb∼ further truncates the impulse response into four temporal segments (with adjustable lengths) according to the Spatialisateur paradigm (Figure 2); additionally a parametric filter is applied to each segment of the IR. The user can control the four filters either at the low level (gains and cutoff frequencies) or by means of the high-level perceptual approach of spat.oper [21].
spat.smk∼ is a tool for measuring the impulse response of linear time-invariant systems using the swept-sine technique [22]. After configuration of the sweep signal (duration, waveform, frequency range, etc.), the object performs the measurement of the (possibly multichannel) system, deconvolves the raw signals, and saves the data as well as meaningful metadata to disk. After each measurement, the object estimates various criteria in realtime (signal-to-noise ratio, harmonic distortion ratio, reverberation time, etc.) for controlling the quality of the measurement.
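The swept-sine technique [22] excites the system with an exponential (logarithmic) sine sweep and retrieves the impulse response by deconvolution with an inverse filter. The sketch below only generates the sweep, following Farina's classic expression; the deconvolution step is omitted and the parameter choices are left to the caller, so this is an illustration of the method rather than the exact signal used by spat.smk∼.

#include <cmath>
#include <cstddef>
#include <vector>

// Exponential sweep from f1 to f2 Hz over 'durationSec' seconds at 'sampleRate' Hz.
std::vector<float> exponentialSweep(float f1, float f2, float durationSec, float sampleRate)
{
    const std::size_t numSamples = static_cast<std::size_t>(durationSec * sampleRate);
    const float T = durationSec;
    const float L = T / std::log(f2 / f1);                      // rate of the exponential frequency growth
    const float K = 2.f * 3.14159265358979f * f1 * L;           // phase scaling factor
    std::vector<float> sweep(numSamples);
    for (std::size_t n = 0; n < numSamples; ++n) {
        const float t = n / sampleRate;
        sweep[n] = std::sin(K * (std::exp(t / L) - 1.f));       // instantaneous frequency grows from f1 to f2
    }
    return sweep;
}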
In addition to that, spat.smk∼ can be linked to other acoustical tools such as spat.edc which computes and displays the energy decay curve of the IR, spat.mixingtime which estimates and displays the mixing time, or spat.ir.infos which derives many acoustical criteria (clarity, central time, early decay time, etc.).
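For reference, the energy decay curve mentioned above is classically obtained by Schroeder backward integration of the squared impulse response; the sketch below shows this standard computation for a broadband IR, in dB. Whether spat.edc follows exactly this procedure is not detailed here, so the function illustrates the textbook method rather than the Spat∼ code.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<float> energyDecayCurveDb(const std::vector<float>& ir)
{
    std::vector<float> edc(ir.size());
    double cumulative = 0.0;
    for (std::size_t n = ir.size(); n-- > 0; ) {  // integrate the squared IR backwards in time
        cumulative += static_cast<double>(ir[n]) * ir[n];
        edc[n] = static_cast<float>(cumulative);
    }
    const float total = edc.empty() ? 1.f : edc.front();
    for (float& v : edc)                          // normalize and convert to decibels
        v = 10.f * std::log10(std::max(v / total, 1e-12f));
    return edc;
}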
Measured room impulse responses can be further analyzed with the spat.rat object (Room Acoustics Toolbox). This external implements a time-frequency analysis of the energy decay relief, i.e. the spectral energy density remaining after any given time [23]. From this analysis it is possible to extract all the (frequency-dependent) characteristics of the decay: noise level, reverberation time and initial power spectrum.
The benefits of such an estimation are twofold: 1) it is possible to denoise the measured IRs (the part corrupted by noise is subtracted and replaced by a synthetic late reverberation conforming to the original energy decay relief) [23]; such denoising is crucial if the IR is to be used in a convolution processor; 2) it is also possible to estimate the perceptual factors corresponding to the IR; in other words, the spat.rat object can derive a set of parameters for spat.oper such that the Spat∼ FDN seamlessly mimics the measured IR.
Convolution-based approaches (such as spat.conv∼, see paragraph 6.1) guarantee an authentic and natural listening experience as they preserve the acoustical signature of existing rooms. However convolution processors are often difficult to parametrize: the control over the reverberation effect is, in general, limited to a few low-level parameters such as the early-to-reverb ratio. Also, the computational cost of such engines depends on the length of the processed IR. On the other hand, parametric reverberators (such as FDNs) are scalable and allow for a continuous tuning of the time and frequency behavior of the room response. A commonly reported drawback of FDN rendering is the lack of authenticity in the early part of the room response (coloration artifacts, insufficient echo and/or modal densities).
spat.hybrid∼ is a new realtime audio signal processing unit that proposes a hybrid approach [24]: it combines a convolution reverb, recreating the early reflections of a measured impulse response, with a feedback delay network synthesizing the reverberation tail. The FDN is automatically adjusted so as to match the energy decay profile of the measured IR, and the generated room effect is perceptually indistinguishable from a pure convolution reverb. As a result, the hybrid approach benefits both from the authenticity of the convolution technique and from the flexibility of an FDN-based renderer. The spat.hybrid∼ object can further be controlled with a high-level model [24, 21].
When working with massively multichannel data, users often face inappropriate or inefficient tools, even for basic operations. The Spatialisateur library has thus been supplemented by several simple tools for dealing with these common tasks. It is beyond the scope of this paper to examine them exhaustively and we just mention here a few examples:
• spat.sfplay∼ and spat.sfrecord∼ serve as replacements for the conventional Max/MSP sfplay∼ and sfrecord∼ objects. They exhibit similar functionality and messaging, but they can handle up to 250 channels(*8) and they support the WAV RF64 format, which allows the usual 4 GB file size limit to be exceeded.
• spat.virtualspeakers∼ relies on a virtual-speakers paradigm [25] to down-mix any multichannel stream into a binaural stereo track while preserving the spatial image of the original sound scene.
• Similarly, spat.delaycalibration∼ and spat.gaincalibration∼ can be used to quickly and automatically calibrate a loudspeaker setup and adjust the respective gain and delay of each channel. Unlike spat.align∼, these objects rely on in-situ acoustical measurements; a sketch of the underlying principle is given below.
*8 This corresponds to the current limitation of Max/MSP.
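A sketch of the principle referred to above: from an in-situ measurement of each loudspeaker (represented here by one measured impulse response per channel), a relative delay can be derived from the arrival time of the main peak and a compensating gain from the channel energy. This only illustrates the idea and makes no claim about the actual algorithms of spat.delaycalibration∼ and spat.gaincalibration∼; the function and structure names are ours.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Alignment { float delaySamples; float gainLinear; };

std::vector<Alignment> alignLoudspeakers(const std::vector<std::vector<float>>& measuredIrs)
{
    if (measuredIrs.empty()) return {};
    std::vector<float> arrival, rms;
    for (const auto& ir : measuredIrs) {
        std::size_t peak = 0;
        double energy = 0.0;
        for (std::size_t n = 0; n < ir.size(); ++n) {
            if (std::fabs(ir[n]) > std::fabs(ir[peak])) peak = n;    // arrival time of the main peak
            energy += static_cast<double>(ir[n]) * ir[n];
        }
        arrival.push_back(static_cast<float>(peak));
        rms.push_back(ir.empty() ? 0.f : std::sqrt(static_cast<float>(energy / ir.size())));
    }
    const float latest   = *std::max_element(arrival.begin(), arrival.end());
    const float quietest = *std::min_element(rms.begin(), rms.end());
    std::vector<Alignment> result;
    for (std::size_t i = 0; i < measuredIrs.size(); ++i)
        result.push_back({ latest - arrival[i],                       // delay the closer loudspeakers
                           rms[i] > 0.f ? quietest / rms[i] : 1.f }); // attenuate the louder ones
    return result;
}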
7.2 Multichannel audio effects
It is common practice to apply various audio effects to the output channels of a reproduction setup. Using standard Max/MSP objects in this context is quite impractical as they (generally) process monophonic signals only. The Spatialisateur library thus offers multichannel versions of a wide range of classic audio effects: spat.compressor∼ (compressor/expander), spat.eq∼ (parametric equalizer based on bi-quadratic filters), spat.graphiceq∼ (graphic equalizer with an adjustable number of bands), spat.limiter∼, spat.noisegate∼, spat.softclipping∼, spat.dcfilter∼, etc.
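The sketch below illustrates the general pattern behind these multichannel effects: a classic single-channel building block (here one bi-quadratic filter section, as used by a parametric equalizer band) instantiated once per channel, with identical coefficients but independent state. The coefficient design formulas and the actual Spat∼ object structure are not shown.

#include <cstddef>
#include <vector>

struct Biquad {
    float b0 = 1.f, b1 = 0.f, b2 = 0.f, a1 = 0.f, a2 = 0.f;   // transfer-function coefficients
    float z1 = 0.f, z2 = 0.f;                                  // state (transposed direct form II)
    float process(float x) {
        const float y = b0 * x + z1;
        z1 = b1 * x - a1 * y + z2;
        z2 = b2 * x - a2 * y;
        return y;
    }
};

// Apply one equalizer band to every channel of a multichannel block:
// same coefficients across channels, one filter state per channel.
void processBlockMultichannel(std::vector<std::vector<float>>& channels,
                              std::vector<Biquad>& filters)
{
    for (std::size_t c = 0; c < channels.size() && c < filters.size(); ++c)
        for (float& sample : channels[c])
            sample = filters[c].process(sample);
}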
The Spatialisateur package includes a large set of measured HRTFs which can be used in the binaural synthesis modules. Originally, HRTF data were stored in Max/MSP “coll” files. This textual format was not very flexible and could not embed metadata. Later, another storage format, based on SDIF(*9), was developed. Although fully functional, this proprietary format severely restricted interchange with other institutions and third-party applications.
In the latest versions of the Spatialisateur, these formats (coll and SDIF) are deprecated and superseded by the “Spatially Oriented Format for Acoustics” (SOFA). SOFA [26] is a new format for the storage and exchange of acoustical data, notably HRTFs. The SOFA format is open, extensible, self-described, machine-independent and network-transparent, it supports data compression, and it was recently approved by the AES committee as the “AES69-2015 standard for file exchange – Spatial acoustic data file format”. Since 2013, Spat∼ uses libsofa(*10), a C++ implementation of the SOFA specifications which allows HRTFs to be loaded from the many available databases(*11).
This paper presented the latest developments of Ircam Spatialisateur since version 4.x, which constituted a complete refactoring. Substantial improvements and new functionalities have been introduced in every module of the software: the reverberation module, panning module and control interfaces. All in all, more than 150 external objects are shipped with the current public release(*12).
Recent research has focused on the analysis, transformation and re-synthesis of room impulse responses, with an emphasis on directional room impulse responses measured with spherical microphone arrays. Such directional IRs capture rich space-time-frequency features and they lend themselves to efficient manipulation in the HOA domain. Future work will consider the potential exploitation of directional room impulse responses using a Spatialisateur-like approach. To that purpose, the FDN processing architecture might be substituted with a convolutional or hybrid framework using an HOA-like underlying space-time representation format, and further studies on the spatial perception of sound fields are needed to upgrade the perceptual factors model accordingly.
The authors would like to thank everyone (researchers, composers, sound artists, sound engineers, etc.) who has contributed, closely or remotely, and still contributes to improving Spat∼ with invaluable ideas and discussions.
[1] J.-M. Jot and A. Chaigne, “Digital Delay Networks for Designing Artificial Reverberators,” in Proceedings of the 90th Convention of the Audio Engineering Society, Paris, 1991.
[2] J.-M. Jot, “Real-time spatial processing of sounds for music, multimedia and interactive human-computer interfaces,” ACM Multimedia Systems Journal (Special issue on Audio and Multimedia), vol. 7, no. 1, pp. 55–69, 1997.
[3] J.-M. Jot and O. Warusfel, “A Real-Time Spatial Sound Processor for Music and Virtual Reality Applications,” in Proceedings of the International Computer Music Conference, Banff, 1995.
[4] J.-P. Jullien, “Structured model for the representation and the control of room acoustic quality,” in Proceedings of the 15th International Congress on Acoustics (ICA), Trondheim, 1995, pp. 517–520.
[5] E. Kahle and J.-P. Jullien, “Subjective Listening Tests in Concert Halls: Methodology and Results,” in Proceedings of the 15th International Congress on Acoustics (ICA), Trondheim, June 1995, pp. 521–524.
[6] J. Bresson, C. Agon, and G. Assayag, “OpenMusic. Visual Programming Environment for Music Composition, Analysis and Research,” in Proceedings of ACM MultiMedia (OpenSource Software Competition), Scottsdale, 2011.
[7] J.-M. Jot, V. Larcher, and O. Warusfel, “Digital Signal Processing Issues in the Context of Binaural and Transaural Stereophony,” in Proceedings of the 98th Convention of the Audio Engineering Society, Paris, Feb. 1995.
[8] V. Pulkki, “Virtual Sound Source Positioning Using Vector Base Amplitude Panning,” Journal of the Audio Engineering Society, vol. 45, no. 6, pp. 456–466, June 1997.
[9] J.-M. Jot, V. Larcher, and J.-M. Pernaux, “A comparative study of 3-D audio encoding and rendering techniques,” in Proceedings of the 16th Audio Engineering Society International Conference on Spatial Sound Reproduction, Rovaniemi, 1999.
[10] T. Lossius, P. Balthazar, and T. de la Hogue, “DBAP - Distance-Based Amplitude Panning,” in Proceedings of the International Computer Music Conference, Montréal, 2009.
[11] R. Sadek and C. Kyriakakis, “A Novel Multichannel Panning Method for Standard and Arbitrary Loudspeaker Configurations,” in Proceedings of the 117th Audio Engineering Society Convention, San Francisco, 2004.
[12] J. Daniel, “Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia,” Ph.D. dissertation, Université de Paris VI, 2001.
[13] M. Noisternig, T. Carpentier, and O. Warusfel, “Espro 2.0 – Implementation of a surrounding 350-loudspeaker array for sound field reproduction.” in Proceedings of the 25th Audio Engineering Society UK Conference, 2012.
[14] J. Daniel and S. Moreau, “Further Study of Sound Field Coding with Higher Order Ambisonics,” in Proceedings of the 116th Audio Engineering Society Convention, Berlin, 2004.
[15] F. Zotter, H. Pomberger, and M. Noisternig, “Energy-Preserving Ambisonic Decoding,” Acta Acustica united with Acustica, vol. 98, 2012.
[16] M. Pollow, K.-V. Nguyen, O. Warusfel, T. Carpentier, M. Müller-Trapet, M. Vorländer, and M. Noisternig, “Calculation of Head-Related Transfer Functions for Arbitrary Field Points Using Spherical Harmonics Decomposition,” Acta Acustica united with Acustica, vol. 98, 2012.
[17] R. O. Duda and W. L. Martens, “Range dependence of the response of a spherical head model,” Journal of the Acoustical Society of America, vol. 104, no. 5, pp. 3048–3058, 1998.
[18] D. Romblom and B. Cook, “Near-Field Compensation for HRTF Processing,” in Proceedings of the 125th Audio Engineering Society Convention, San Francisco, 2008.
[19] A. Baskind, T. Carpentier, M. Noisternig, O. Warusfel, and J.-M. Lyzwa, “Binaural and transaural spatialization techniques in multichannel 5.1 production,” in Proceedings of the 27th Tonmeistertagung – VDT International Convention, Köln, November 2012.
[20] W. G. Gardner, “Efficient Convolution without Input-Output Delay,” in Proceedings of the 97th Convention of the Audio Engineering Society, San Francisco, 1994.
[21] M. Noisternig, T. Carpentier, and O. Warusfel, “Perceptual Control of Convolution Based Room Simulators,” in The Acoustics 2012 Hong Kong Conference, Hong Kong, 2012.
[22] A. Farina, “Simultaneous measurement of impulse response and distortion with a swept-sine technique,” in Proceedings of the 108th Convention of the Audio Engineering Society, Paris, 2000.
[23] J.-M. Jot, L. Cerveau, and O. Warusfel, “Analysis and Synthesis of Room Reverberation Based on a Statistical Time- Frequency Model,” in Proceedings of the 103rd Convention of the Audio Engineering Society, New York, 1997.
[24] T. Carpentier, M. Noisternig, and O. Warusfel, “Hybrid Reverberation Processor with Perceptual Control,” in Proceedings of the 17th Int. Conference on Digital Audio Effects (DAFx-14), Erlangen, Sept. 2014.
[25] H. Møller, “Fundamentals of Binaural Technology,” Applied Acoustics, vol. 36, pp. 171–218, 1992.
[26] P. Majdak, Y. Iwaya, T. Carpentier, R. Nicol, M. Parmentier, A. Roginska, Y. Suzuki, K. Watanabe, H. Wierstorf, H. Ziegelwanger, and M. Noisternig, “Spatially Oriented Format for Acoustics: A Data Exchange Format Representing Head-Related Transfer Functions,” in Proceedings of the 134th Convention of the Audio Engineering Society, Roma, May 2013.