Sparse Representation of a Spatial Sound Field in a Reverberant Environment. Koyama, S., and L. Daudet. IEEE Journal of Selected Topics in Signal Processing 13, no. 1 (2019): 172–184.
Abstract: © 2007–2012 IEEE. This paper investigates sound-field modeling in a realistic reverberant setting. Starting from a few point-like microphone measurements, the goal is to estimate the direct source field within a whole three-dimensional (3D) space around these microphones. Previous sparse sound field decompositions assumed only a spatial sparsity of the source distribution, but could generally not handle reverberation. We here add an explicit model of the reverberant sound field that has two components: the first component is sparse in the plane-wave domain, and the other is low-rank as a multiplication of transfer functions and source signals. We derive the corresponding decomposition algorithm based on the alternating direction method of multipliers. We furthermore provide empirical rules for tuning the two parameters to be set in the algorithm. Numerical and experimental results indicate that the decomposition and reconstruction performances are significantly improved in the case of reverberant environments.
Keywords: reverberation; Sound field decomposition; sound field recording; source identification; sparse representation


Joint Source and Sensor Placement for Sound Field Control Based on Empirical Interpolation Method. Koyama, S., G. Chardon, and L. Daudet. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 501–505. Vol. 2018-April, 2018.
Abstract: © 2018 IEEE. This study proposes a principled method to jointly determine the placement of acoustic sources (loudspeakers) and sensors (control points/microphones) in sound field control. The goal of this setup is to efficiently produce a sound field using multiple loudspeakers, approximately matching a target sound field over a region of interest. Therefore, the loudspeaker and control-point placement problem can be seen as the problem of finding interpolating functions (associated with individual loudspeaker sound fields) and sampling points (corresponding to control points or microphones) to approximate the target sound field in the given domain. We here solve this problem using the empirical interpolation method, originally developed for the numerical analysis of partial differential equations. The proposed method enables a joint determination of loudspeaker and control-point placement, from a large set of candidate locations, independently of the desired sound field. Numerical simulation results indicate that accurate and stable sound field control can be achieved by the proposed method, with significantly better results than with random and regular placements.
Keywords: Interpolation; Magic points; Sound field control; Sound field reproduction; Source and sensor placement


Compressive acoustic holography with block-sparse regularization. Fernandez-Grande, E., and L. Daudet. Journal of the Acoustical Society of America 143, no. 6 (2018): 3737–3746.
Abstract: © 2018 Acoustical Society of America. Sparse reconstruction methods, such as Compressive Sensing, are powerful methods in acoustic array processing, as they make wideband reconstruction possible. However, when addressing sound fields that are not necessarily sparse (e.g., in acoustic near-fields, reflective environments, extended sources, etc.), the methods can lead to a poor reconstruction of the sound field. This study examines the use of sparse analysis priors to promote block-sparse solutions. In particular, a Fused Total Generalized Variation (FTGV) method is developed to analyze the sound field in the near-field of acoustic sources. The method promotes sparsity both on the spatial derivatives of the solution and on the solution itself, thus seeking solutions where the non-zero coefficients are grouped together. The performance of the method is examined numerically and experimentally, and compared with established methods. The results indicate that the FTGV method is suitable to examine both compact and spatially extended sources. The method is promising for its generality, robustness to noise, and the capability to provide a wideband reconstruction of sound fields that are not necessarily sparse.


Comparison of reverberation models for sparse sound field decomposition. Koyama, S., and L. Daudet. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 214–218. Vol. 2017-October, 2017.
Abstract: © 2017 IEEE. Sparse representations of sound fields have become popular in various acoustic inverse problems. The simplest models assume spatial sparsity, where a small number of sound sources are located in the near-field. However, the performance of these models deteriorates in the presence of strong reverberation. To properly treat the reverberant components, we introduce three types of reverberation models: a low-rank model, a sparse model in the plane-wave domain, and a combined low-rank+sparse model. We discuss corresponding decomposition algorithms based on ADMM convex optimization. Numerical simulations indicate that the decomposition accuracy is significantly improved by the additive model of low-rank and sparse plane-wave models.
Keywords: convex optimization; inverse problems; reverberation; sound field analysis; Sound field decomposition; sparse representations


Robust source localization from wavefield separation including prior information. Nowakowski, T., J. De Rosny, and L. Daudet. Journal of the Acoustical Society of America 141, no. 4 (2017): 2375–2386.
Abstract: © 2017 Acoustical Society of America. Strong reverberation is a challenge for narrowband source localization, as most of the existing methods are based on times-of-arrival measurements, which are affected by boundaries. Amongst the methods that explicitly take into account the reverberation, wavefield separation projector processing (WSPP) splits the acoustic wave field into the direct path of the sources and the reverberation. However, WSPP requires a very large number of microphones, making this method impractical. This article studies three ways of alleviating this constraint, extending WSPP by adding different prior information on the wavefield. The first method is based on using the knowledge of the critical distance of the room to decrease the selectivity of the field separation. The second method adds constraints called “virtual measurements” when the room geometry is partially known. Finally, the last method requires a simple calibration step to estimate the Green's functions between each pair of microphones; this also extends the model to weakly inhomogeneous propagation media. It is shown numerically and experimentally that these methods allow a precise source localization, with a reduced number of microphones.


Intensity-only measurement of partially uncontrollable transmission matrix: demonstration with wavefield shaping in a microwave cavity. Del Hougne, P., B. Rajaei, L. Daudet, and G. Lerosey. Optics Express 24, no. 16 (2016): 18631–18641.


Fast Phase Retrieval for High Dimensions: A Block-Based Approach. Rajaei, B., S. Gigan, F. Krzakala, and L. Daudet. IEEE Signal Processing Letters 23, no. 8 (2016): 1179–1182.
Keywords: Convex optimization; inverse problems; phase retrieval (PR)


Compressive Sensing in Acoustic Imaging. Bertin, N., L. Daudet, V. Emiya, and R. Gribonval. In Applied and Numerical Harmonic Analysis, 169–192, 2015.


Geometric-based reverberator using acoustic rendering networks. Bai, H., G. Richard, and L. Daudet. In 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2015, 2015.
Abstract: © 2015 IEEE. Many virtual reality applications incorporate realistic room acoustic simulation to provide increased immersiveness and realism. Traditional geometric methods, although providing modeling accuracy, are usually impractical for use in interactive applications. At the same time, artificial reverberators, with feedback rendering structure, are widely used as a low-cost alternative. This paper presents the design of a geometric-based artificial reverberator inspired by the acoustic rendering equation (ARE) and the feedback delay networks (FDN). The simplified acoustic rendering equation, which models both specular and diffuse reflections, is incorporated with the FDN structure. Our reverberator, in addition to modeling the diffuse and late reverberation, is also capable of simulating the early/specular reflections with accuracy. This novel work is among the very few works that are capable of simulating early reflections using feedback delay networks.
Keywords: acoustic rendering equation; feedback delay networks; reverberation; room acoustics


Localization of acoustic sensors from passive Green's function estimation. Nowakowski, T., L. Daudet, and J. De Rosny. Journal of the Acoustical Society of America 138, no. 5 (2015): 3010–3018.
Abstract: © 2015 Acoustical Society of America. A number of methods have recently been developed for passive localization of acoustic sensors, based on the assumption that the acoustic field is diffuse. This article presents the more general case of equipartition fields, which takes into account reflections off boundaries and/or scatterers. After a thorough discussion on the fundamental differences between the diffuse and equipartition models, it is shown that the method is more robust when dealing with wideband noise sources. Finally, experimental results show, for two types of boundary conditions, that this approach is especially relevant when acoustic sensors are close to boundaries.


Microphone array position calibration in the frequency domain using a single unknown source. Nowakowski, T., L. Daudet, and J. De Rosny. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 330–334. Vol. 2015-August, 2015.
Abstract: © 2015 IEEE. We study the problem of microphone array localization in a strongly reverberant room, where times of arrival (TOA) or time differences of arrival (TDOA) cannot always be measured precisely. Instead, we use frequency-domain measurements to calibrate the array position, based on the modes of the room, excited by a single wideband source that can be unknown. By using the fact that each measured mode can be decomposed as a sum of model-based polynomials, we build a cost function whose minimum indicates the positions of the microphones. A simple Block Coordinate Descent algorithm can be used to minimize this cost function. Numerical results indicate that this algorithm converges to the right solution, and therefore that using frequency measurements for position calibration is a valid concept for dense arrays, as an alternative to time-domain methods in reverberant domains.
Keywords: Array position calibration; modal interpolation; reverberation


Late Reverberation Synthesis: From Radiance Transfer to Feedback Delay Networks. Bai, H., G. Richard, and L. Daudet. IEEE/ACM Transactions on Audio, Speech, and Language Processing 23, no. 12 (2015): 2260–2271.
Abstract: © 2014 IEEE. In room acoustic modeling, feedback delay networks (FDN) are known to efficiently model late reverberation due to their capacity to generate exponentially decaying dense impulses. However, this method relies on a careful tuning of the different synthesis parameters, either estimated from a pre-recorded impulse response from the real acoustic scene, or set manually from experience. In this paper, we present a new method, which still inherits the efficiency of the FDN structure, but aims at linking the parameters of the FDN directly to the geometry setting. This relation is achieved by studying the sound energy exchange between each delay line using the acoustic radiance transfer method (RTM). Experimental results show that the late reverberation modeled by this method is in good agreement with the virtual geometry setting.
Keywords: Acoustic radiance transfer; feedback delay networks (FDNs); reverberation; room acoustics


A Blind Dereverberation Method for Narrowband Source Localization. Chardon, G., T. Nowakowski, J. De Rosny, and L. Daudet. IEEE Journal of Selected Topics in Signal Processing 9, no. 5 (2015): 815–824.
Keywords: Source localization; microphone array; reverberation


Referenceless measurement of the transmission matrix of a highly scattering material using a DMD and phase retrieval techniques. Dremeau, A., A. Liutkus, D. Martina, O. Katz, C. Schuelke, F. Krzakala, S. Gigan, and L. Daudet. Optics Express 23, no. 9 (2015): 11898–11911.


Investigation of the Harpist/Harp Interaction. Chadefaux, D., J.-L. Le Carrou, B. Fabre, and L. Daudet. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3–19. Vol. 8905, 2014.
Abstract: © Springer International Publishing Switzerland 2014. This paper presents a contribution to the field of musician/instrument interaction analysis. This study aims at investigating the mechanical parameters that govern the harp plucking action as well as the gestural strategies set up by harpists to control a musical performance. Two specific experimental procedures have been designed to accurately describe the harpist's motion in realistic playing contexts. They consist in filming the plucking action and the harpist's gestures using a high-speed camera and a motion capture system, respectively. Simultaneously, acoustical measurements are performed to relate the kinematic investigation to sound features. Results describe the musical gesture characteristics. Mechanical parameters governing the finger/string interaction are highlighted and their influence on the produced sound is discussed. In addition, the relationship between non-sound-producing gestures and musical intent is pointed out. Finally, the way energy is shared between the harpist's arm joints according to various playing techniques is analyzed.
Keywords: Acoustics; Data mining; Harp; High-speed video analysis; Motion capture


A general framework for dictionary-based audio fingerprinting. Moussallam, M., and L. Daudet. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 3077–3081, 2014.
Abstract: Fingerprint-based audio recognition systems must address concurrent objectives. Indeed, fingerprints must be both robust to distortions and discriminative, while their dimension must remain small to allow fast comparison. This paper proposes to restate these objectives as a penalized sparse representation problem. On top of this dictionary-based approach, we propose a structured sparsity model in the form of a probabilistic distribution for the sparse support. A practical suboptimal greedy algorithm is then presented and evaluated on robustness and recognition tasks. We show that some existing methods can be seen as particular cases of this algorithm and that the general framework allows reaching other points of a Pareto-like continuum. © 2014 IEEE.
Keywords: Audio Fingerprinting; Sparse Representation


Image transmission through a scattering medium: Inverse problem and sparsity-based imaging. Gigan, S., S. M. Popoff, A. Liutkus, D. Martina, O. Katz, G. Chardon, R. Carminati, G. Lerosey, M. A. Fink, A. C. Boccara et al. In 2014 13th Workshop on Information Optics, WIO 2014, 2014.
Abstract: © 2014 IEEE. We demonstrate how to measure accurately the transmission matrix of a complex medium. With this information, we show how to focus light, recover an image, and even perform efficient reconstruction of a sparse object.


Random calibration for accelerating MR-ARFI guided ultrasonic focusing in transcranial therapy. Liu, N., A. Liutkus, J. F. Aubry, L. Marsac, M. Tanter, and L. Daudet. Physics in Medicine and Biology 60, no. 3 (2015): 1069–1085.
Abstract: © 2015 Institute of Physics and Engineering in Medicine. Transcranial focused ultrasound is a promising therapeutic modality. It consists of placing transducers around the skull and emitting shaped ultrasound waves that propagate through the skull and then concentrate on one particular location within the brain. However, the skull bone is known to distort the ultrasound beam. In order to compensate for such distortions, a number of techniques have been proposed recently, for instance using Magnetic Resonance Imaging feedback. In order to fully determine the focusing distortion due to the skull, such methods usually require as many calibration signals as transducers, resulting in a lengthy calibration process. In this paper, we investigate how the number of calibration sequences can be significantly reduced, based on random measurements and optimization techniques. Experimental data with six human skulls demonstrate that the number of measurements can be up to three times lower than with the standard methods, while restoring 90% of the focusing efficiency.
Keywords: brain; calibration; focused ultrasound; MR-ARFI; therapeutic; transcranial; ultrasound


Blind Denoising with Random Greedy Pursuits. Moussallam, M., A. Gramfort, L. Daudet, and G. Richard. IEEE Signal Processing Letters 21, no. 11 (2014): 1341–1345.


Convex Optimization Approaches for Blind Sensor Calibration Using Sparsity. Bilen, C., G. Puy, R. Gribonval, and L. Daudet. IEEE Transactions on Signal Processing 62, no. 18 (2014): 4847–4856.
Keywords: Compressed sensing; blind calibration; phase estimation; convex optimization; gain calibration


Imaging with nature: compressive imaging using a multiply scattering medium. Liutkus, A., D. Martina, S. Popoff, G. Chardon, O. Katz, G. Lerosey, S. Gigan, L. Daudet, and I. Carron. Scientific Reports 4 (2014): 5552.


Low Frequency Interpolation of Room Impulse Responses Using Compressed Sensing. Mignot, R., G. Chardon, and L. Daudet. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22, no. 1 (2014): 205–216.
Keywords: Compressed sensing; room impulse responses; wavefield reconstruction; plane waves; interpolation; sparsity


An overview of informed audio source separation. Liutkus, A., J.-L. Durrieu, L. Daudet, and G. Richard. In International Workshop on Image Analysis for Multimedia Interactive Services, 2013.
Abstract: Audio source separation consists in recovering different unknown signals called sources by filtering their observed mixtures. In music processing, most mixtures are stereophonic songs and the sources are the individual signals played by the instruments, e.g. bass, vocals, guitar, etc. Source separation is often achieved through a classical generalized Wiener filtering, which is controlled by parameters such as the power spectrograms and the spatial locations of the sources. For an efficient filtering, those parameters need to be available and their estimation is the main challenge faced by separation algorithms. In the blind scenario, only the mixtures are available and performance strongly depends on the mixtures considered. In recent years, much research has focused on informed separation, which consists in using additional available information about the sources to improve the separation quality. In this paper, we review some recent trends in this direction. © 2013 IEEE.


Gestural strategies in the harp performance. Chadefaux, D., J.-L. Le Carrou, M. M. Wanderley, B. Fabre, and L. Daudet. Acta Acustica united with Acustica 99, no. 6 (2013): 986–996.
Abstract: This paper describes an experimentally-based analysis of the interaction between musician and instrument in the case of the classical concert harp. The study highlights gestural strategies used by three harpists while performing a short musical excerpt. As a result of years of practicing, a trained musician has developed the ability to deal with a number of trade-offs among simultaneous objectives while playing. She/he obviously has to set the instrument into vibration, but also to convey some musical intention to the audience and to communicate with other musicians, while keeping a safe posture with respect to articular and muscle pain. In order to precisely describe the motion strategies carried out by trained harpists, an experiment has been designed using a motion capture system and corresponding video and audio recordings. This provides accurate three-dimensional positioning of several markers placed on the harpist and on the harp during the execution of a musical piece. From the acquired gestural and acoustical signals, a set of kinematic and dynamic descriptors was extracted. The investigation shows that while each musician uses their own specific and repeatable upper-limb movements, the global sound-producing gesture is mostly controlled by the shoulders. Sound-facilitating hand gestures are highlighted for their supporting role to the musician throughout the musical piece. © S. Hirzel Verlag · EAA.


Room reverberation reconstruction: Interpolation of the early part using compressed sensing. Mignot, R., L. Daudet, and F. Ollivier. IEEE Transactions on Audio, Speech and Language Processing 21, no. 11 (2013): 2301–2312.
Abstract: This paper deals with the interpolation of the Room Impulse Responses (RIRs) within a whole volume, from as few measurements as possible, and without the knowledge of the geometry of the room. We focus on the early reflections of the RIRs, which have the key property of being sparse in the time domain: this can be exploited in a framework of model-based Compressed Sensing. Starting from a set of RIRs randomly sampled in the spatial domain of interest by a 3D microphone array, we propose a modified Matching Pursuit algorithm to estimate the position of a small set of virtual sources. Then, the reconstruction of the RIRs at interpolated positions is performed using a projection onto a basis of monopoles, which correspond to the estimated virtual sources. An extension of the proposed algorithm allows the interpolation of the positions of both source and receiver, using the acquisition of four different source positions. This approach is validated both by numerical examples, and by experimental measurements using a 3D array with up to 120 microphones. © 2006–2012 IEEE.
Keywords: Compressed sensing; interpolation; microphone arrays; room impulse responses; source localization


Auditory sketches: Sparse representations of sounds based on perceptual models. Suied, C., A. Drémeau, D. Pressnitzer, and L. Daudet. Vol. 7900 LNCS, 2013.
Abstract: An important question for both signal processing and auditory science is to understand which features of a sound carry the most important information for the listener. Here we approach the issue by introducing the idea of “auditory sketches”: sparse representations of sounds, severely impoverished compared to the original, which nevertheless afford good performance on a given perceptual task. Starting from biologically-grounded representations (auditory models), a sketch is obtained by reconstructing a highly under-sampled selection of elementary atoms. Then, the sketch is evaluated with a psychophysical experiment involving human listeners. The process can be repeated iteratively. As a proof of concept, we present data for an emotion recognition task with short non-verbal sounds. We investigate (1) the type of auditory representation that can be used for sketches; (2) the selection procedure to sparsify such representations; (3) the smallest number of atoms that can be kept; and (4) the robustness to noise. Results indicate that it is possible to produce recognizable sketches with a very small number of atoms per second. Furthermore, at least in our experimental setup, a simple and fast under-sampling method based on selecting local maxima of the representation seems to perform as well as or better than a more traditional algorithm aimed at minimizing the reconstruction error. Thus, auditory sketches may be a useful tool for choosing sparse dictionaries, and also for identifying the minimal set of features required in a specific perceptual task. © 2013 Springer-Verlag.


A parametric model and estimation techniques for the inharmonicity and tuning of the piano. Rigaud, F., B. David, and L. Daudet. Journal of the Acoustical Society of America 133, no. 5 (2013): 3107–3118.
Abstract: Inharmonicity of piano tones is an essential property of their timbre that strongly influences the tuning, leading to the so-called octave stretching. It is proposed in this paper to jointly model the inharmonicity and tuning of pianos on the whole compass. While using a small number of parameters, these models are able to reflect both the specificities of instrument design and tuners' practice. An estimation algorithm is derived that can run either on a set of isolated note recordings or on chord recordings, assuming that the played notes are known. It is applied to extract parameters highlighting some tuners' choices on different piano types and to propose tuning curves for out-of-tune pianos or piano synthesizers. © 2013 Acoustical Society of America.
Keywords: Estimation algorithm; Estimation techniques; Instrument designs; Parametric models; Tuning curve; Harmonic analysis; Tuners; Musical instruments


Low-complexity computation of plate eigenmodes with Vekua approximations and the method of particular solutions. Chardon, G., and L. Daudet. Computational Mechanics (2013): 1–10.
Abstract: This paper extends the method of particular solutions (MPS) to the computation of eigenfrequencies and eigenmodes of thin plates, in the framework of the Kirchhoff-Love plate theory. Specific approximation schemes are developed, with plane waves (MPS-PW) or Fourier-Bessel functions (MPS-FB). This framework also requires a suitable formulation of the boundary conditions. Numerical tests, on two plates with various boundary conditions, demonstrate that the proposed approach provides results competitive with standard numerical schemes such as the finite element method, at reduced complexity, and with large flexibility in the implementation choices. © 2013 Springer-Verlag Berlin Heidelberg.
Keywords: Algorithms; Biharmonic equation; Eigenvalues; Kirchhoff plate theory; Numerical methods


Informed source separation using iterative reconstruction. Sturmel, N., and L. Daudet. IEEE Transactions on Audio, Speech and Language Processing 21, no. 1 (2013): 176–183.
Abstract: This paper presents a technique for Informed Source Separation (ISS) of a single channel mixture, based on the Multiple Input Spectrogram Inversion (MISI) phase estimation method. The reconstruction of the source signals is iterative, alternating between a time-frequency consistency enforcement and a re-mixing constraint. A dual resolution technique is also proposed, for sharper reconstruction of transients. The two algorithms are compared to a state-of-the-art Wiener-based ISS technique, on a database of fourteen monophonic mixtures, with standard source separation objective measures. Experimental results show that the proposed algorithms outperform both this reference technique and the oracle Wiener filter by up to 3 dB in distortion, at the cost of a significantly heavier computation. © 2012 IEEE.
Keywords: Adaptive Wiener filtering; informed source separation; phase reconstruction; spectrogram inversion; Consistency enforcement; Iterative reconstruction; Multiple inputs; Objective measure; Phase estimation; Phase reconstruction; Reference technique; Resolution techniques; Single channels; Source signals; Spectrograms; Standard sources; Time frequency; Wiener filtering; Wiener filters; Algorithms; Spectrographs; Iterative methods


DReaM: A novel system for joint source separation and multitrack coding. Marchand, S., R. Badeau, C. Baras, L. Daudet, D. Fourer, L. Girin, S. Gorlow, A. Liutkus, J. Pinel, G. Richard et al. In 133rd Audio Engineering Society Convention 2012, AES 2012, 749–758. Vol. 2, 2012.
Abstract: Active listening consists in interacting with the music playing, has numerous applications from pedagogy to gaming, and involves advanced remixing processes such as generalized karaoke or respatialization. To get this new freedom, one might use the individual tracks that compose the mix. While multitrack formats lose backward compatibility with popular stereo formats and increase the file size, classic source separation from the stereo mix is not of sufficient quality. We propose a coder/decoder scheme for informed source separation. The coder determines the information necessary to recover the tracks and embeds it inaudibly in the mix, which is stereo and has a size comparable to the original. The decoder enhances the source separation with this information, enabling active listening.
Keywords: Active listening; Backward compatibility; File sizes; Karaoke; Engineering; Industrial engineering; Cryptography


Phase-based informed source separation of music. Sturmel, N., L. Daudet, and L. Girin. In 15th International Conference on Digital Audio Effects, DAFx 2012 Proceedings, 2012.
Abstract: This paper presents an informed source separation technique for monophonic mixtures. Although the vast majority of the separation methods are based on the time-frequency energy of each source, we introduce a new approach using solely phase information to perform the separation. The sources are iteratively reconstructed using an adaptation of the Multiple Input Spectrogram Inversion (MISI) algorithm from Gunawan and Sen. The proposed method is then tested against conventional MISI and Wiener filtering on monophonic signals and oracle conditions. Results show that, at the cost of a larger computation time, our method outperforms both MISI and Wiener filtering in oracle conditions with much higher objective quality even with phase quantization.
Keywords: Computation time; Monophonic signals; Multiple inputs; Phase information; Phase quantization; Separation methods; Separation techniques; Spectrograms; Time frequency; Wiener filtering; Adaptive filtering; Iterative methods; Separation; Source separation


Piano sound analysis using Non-negative Matrix Factorization with inharmonicity constraint. Rigaud, F., B. David, and L. Daudet. In European Signal Processing Conference, 2462–2466, 2012.
Abstract: This paper presents a method for estimating the tuning and the inharmonicity coefficient of piano tones, from single notes or chord recordings. It is based on the Non-negative Matrix Factorization (NMF) framework, with a parametric model for the dictionary atoms. The key point here is to include as a relaxed constraint the inharmonicity law modelling the frequencies of transverse vibrations for stiff strings. Applications show that this can be used to finely estimate the tuning and the inharmonicity coefficient of several notes, even in the case of high polyphony. The use of NMF makes this method relevant when tasks like music transcription or source/note separation are targeted. © 2012 EURASIP.
Keywords: inharmonicity coefficient estimation; non-negative matrix factorization; piano tuning; Keypoints; Music transcription; Nonnegative matrix factorization; Parametric models; Piano sounds; Piano tuning; Transverse vibrations; Estimation; Factorization; Musical instruments; Signal processing; Harmonic analysis


Informed audio source separation: A comparative study. Liutkus, A., S. Gorlow, N. Sturmel, S. Zhang, L. Girin, R. Badeau, L. Daudet, S. Marchand, and G. Richard. In European Signal Processing Conference, 2397–2401, 2012.
Abstract: The goal of source separation algorithms is to recover the constituent sources, or audio objects, from their mixture. However, blind algorithms still do not yield estimates of sufficient quality for many practical uses. Informed Source Separation (ISS) is a solution to make separation robust when the audio objects are known during a so-called encoding stage. During that stage, a small amount of side information is computed and transmitted with the mixture. At a decoding stage, when the sources are no longer available, the mixture is processed using the side information to recover the audio objects, thus greatly improving the quality of the estimates at the cost of an additional bitrate that depends on the size of the side information. In this study, we compare six methods from the state of the art in terms of quality versus bitrate, and show that a good separation performance can be attained at competitive bitrates. © 2012 EURASIP.
Keywords: Audio source separation; Bit rates; Blind algorithms; Comparative studies; Separation algorithms; Separation performance; Side information; State of the art; Algorithms; Mixtures; Separation; Source separation


Audio source separation informed by redundancy with greedy multiscale decompositions. Moussallam, M., G. Richard, and L. Daudet. In European Signal Processing Conference, 2644–2648., 2012.
Résumé: This paper describes a greedy algorithm for audio source separation of repeated musical patterns. The problem is understood as retrieving from a set of mixtures the part that is redundant among them and the parts that are specific to only one mixture. The key assumption is the sparsity of all the sources in the same multiscale dictionary. Synthetic and real-life examples of source separation of hand-cut repeated musical patterns are presented. Results show that the proposed method succeeds in simultaneously providing a sparse approximant of the mixtures and a separation of the sources. © 2012 EURASIP.
MotsClés: audio source separation; greedy decompositions; Simultaneous sparse approximation; Approximants; Audio source separation; Greedy algorithms; Multiscale Decomposition; Multiscales; Sparse approximations; Mixtures; Source separation


A framework for fingerprint-based detection of repeating objects in multimedia streams. Fenet, S., M. Moussallam, Y. Grenier, G. Richard, and L. Daudet. In European Signal Processing Conference, 1464–1468., 2012.
Résumé: We present an original framework for the detection of repeating objects in multimedia streams. This framework is designed so that it can work with any fingerprint model. A fingerprint is extracted for each incoming frame of the multimedia stream. The framework then manages this fingerprint so that if one similar frame comes later in the stream, it will be identified as a repetition. The framework has been tested with two distinct fingerprint models on simulated and real-world data. The results show that the framework performs well with both presented models and that it is suitable for industrial use cases. © 2012 EURASIP.
MotsClés: Fingerprint; framework; indexing; repeating objects; Fingerprint; framework; Multimedia stream; Real world data; repeating objects; Computer simulation; Indexing (of information); Industrial applications; Media streaming; Signal processing; Pattern recognition


Narrowband source localization in an unknown reverberant environment using wavefield sparse decomposition. Chardon, G., and L. Daudet. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 9–12., 2012.
Résumé: We propose a method for narrowband localization of sources in an unknown reverberant field. A sparse model for the wavefield is introduced, derived from the physical equations. We compare two localization algorithms that take advantage of the structured sparsity naturally present in the model: a greedy iterative scheme, and an ℓ1 minimization method. Both methods are validated in 2D on numerical simulations, and on experimental data with a chaotic-shaped plate. These results, robust with respect to the specific sampling of the field and to noise, show that this approach may be an interesting alternative to traditional approaches of source localization, when a large number of narrowband sensors are deployed. © 2012 IEEE.
MotsClés: acoustic waves; plate vibrations; room acoustics; source localization; sparsity; Iterative schemes; Localization algorithm; Localization of sources; Minimization methods; Narrow bands; Physical equations; Plate vibration; Reverberant environment; Room acoustics; Source localization; Sparse decomposition; sparsity; Wavefields; Acoustic waves; Acoustics; Architectural acoustics; Signal processing; Iterative methods


Blind calibration for compressed sensing by convex optimization. Gribonval, R., G. Chardon, and L. Daudet. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 2713–2716., 2012.
Résumé: We consider the problem of calibrating a compressed sensing measurement system under the assumption that the decalibration consists in unknown gains on each measure. We focus on blind calibration, using measures performed on a few unknown (but sparse) signals. A naive formulation of this blind calibration problem, using ℓ1 minimization, is reminiscent of blind source separation and dictionary learning, which are known to be highly nonconvex and riddled with local minima. In the considered context, we show that in fact this formulation can be exactly expressed as a convex optimization problem, and can be solved using off-the-shelf algorithms. Numerical simulations demonstrate the effectiveness of the approach even for highly uncalibrated measures, when a sufficient number of (unknown, but sparse) calibrating signals is provided. We observe that the success/failure of the approach seems to obey sharp phase transitions. © 2012 IEEE.
MotsClés: blind signal separation; calibration; compressed sensing; dictionary learning; sparse recovery; Blind Signal Separation; Calibration problems; Compressive sensing; Convex optimization problems; Dictionary learning; Local minimums; Measurement system; Sparse recovery; Blind source separation; Convex optimization; Signal reconstruction; Calibration


Random time-frequency subdictionary design for sparse representations with greedy algorithms. Moussallam, M., L. Daudet, and G. Richard. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 3577–3580., 2012.
Résumé: Sparse signal approximation can be used to design efficient low bitrate coding schemes. It heavily relies on the ability to design appropriate dictionaries and corresponding decomposition algorithms. The size of the dictionary, and therefore its resolution, is a key parameter that handles the tradeoff between sparsity and tractability. This work proposes the use of a non-adaptive random sequence of subdictionaries in a greedy decomposition process, thus browsing a larger dictionary space in a probabilistic fashion with no additional projection cost or parameter estimation. This technique leads to very sparse decompositions, at a controlled computational complexity. Experimental evaluation is provided as proof of concept for low bit rate compression of audio signals. © 2012 IEEE.
MotsClés: Matching Pursuits; Random Subdictionaries; Sparse Audio Coding; Audio Coding; Audio signal; Bitrate coding; Decomposition algorithm; Decomposition process; Experimental evaluation; Greedy algorithms; Key parameters; Low Bit Rate; Matching pursuit; Proof of concept; Random sequence; Random Subdictionaries; Sparse decomposition; Sparse representation; Sparse signals; Time frequency; Algorithms; Parameter estimation; Signal processing; Design


Structured Bayesian orthogonal matching pursuit. Drémeau, A., C. Herzet, and L. Daudet. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 3625–3628., 2012.
Résumé: Taking advantage of the structures inherent in many sparse decompositions constitutes a promising research axis. In this paper, we address this problem from a Bayesian point of view. We exploit a Boltzmann machine, which allows a large variety of structures to be taken into account, and focus on the resolution of a joint maximum a posteriori problem. The proposed algorithm, called Structured Bayesian Orthogonal Matching Pursuit (SBOMP), is a structured extension of the Bayesian Orthogonal Matching Pursuit algorithm (BOMP) introduced in our previous work [1]. In numerical tests involving a recovery problem, SBOMP is shown to have good performance over a wide range of sparsity levels while keeping a reasonable computational complexity. © 2012 IEEE.
MotsClés: Boltzmann machine; greedy algorithm; Structured sparse representation; Boltzmann machines; Greedy algorithms; Maximum a posteriori; Numerical tests; Orthogonal matching pursuit; Sparse decomposition; Sparse representation; Signal processing; Algorithms


Iterative phase reconstruction of Wiener filtered signals. Sturmel, N., and L. Daudet. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 101–104., 2012.
Résumé: This paper deals with phase estimation in the framework of underdetermined blind source separation, using an estimated spectrogram of the source and its associated Wiener filter. By thresholding the Wiener mask, two domains are defined on the spectrogram: a confidence domain where the phase is kept as the phase of the mixture, and its complement where the phase is updated with a projection similar to the widely used Griffin and Lim technique. We show that with this simple technique, the choice of parameters results in a simple tradeoff between distortion and interference. Experiments show that this technique brings significant improvements over the classical Wiener filter, while being much faster than other iterative methods. © 2012 IEEE.
MotsClés: Blind source separation; Phase reconstruction; Spectrogram; STFT; Wiener filter; Choice of parameters; Confidence domain; Filtered signals; Phase estimation; Phase reconstruction; Spectrograms; STFT; Thresholding; Two domains; WIENER filters; Blind source separation; Signal processing; Spectrographs; Iterative methods


Dynamic strategy for window splitting, parameters estimation and interpolation in spatial parametric audio coders. Capobianco, J., G. Pallone, and L. Daudet. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 397–400., 2012.
Résumé: In most parametric stereo audio coders, sets of spatial parameters are extracted from the audio channels in a time-frequency domain. In order to reduce the amount of data, the parameter plane is highly downsampled, and transmitted together with a mono downmix. Then, in the decoding process, it is necessary to interpolate the upmix matrix computed from these parameters. Usually, this is done in the same way for each portion of signal, regardless of its nature. In this article, we propose a dynamic strategy of window splitting, estimation of the parameters and interpolation of the upmix matrix based on transient detection in the audio signal. Subjective tests show an improvement when applied to the new stereo parametric tool from MPEG USAC. © 2012 IEEE.
MotsClés: Parametric audio coding; stereo; Audio channels; Audio coders; Audio signal; Decoding process; Dynamic strategies; Parameters estimation; Parametric audio coding; Parametric stereo; Spatial parameters; stereo; Subjective tests; Time frequency domain; Transient detection; Interpolation; Motion Picture Experts Group standards; Signal processing; Speech coding; Parameter estimation


Linear mixing models for active listening of music productions in realistic studio conditions. Sturmel, N., A. Liutkus, J. Pinel, L. Girin, S. Marchand, G. Richard, R. Badeau, and L. Daudet. In 132nd Audio Engineering Society Convention 2012, 780–789., 2012.
Résumé: The mixing/demixing of audio signals as addressed in the signal processing literature (the “source separation” problem) and the music production in studio remain quite separate worlds. Scientific audio scene analysis rather focuses on “natural” mixtures and most often uses linear (convolutive) models of point sources placed in the same acoustic space. In contrast, the sound engineer can mix musical signals of very different nature and belonging to different acoustic spaces, and exploits many audio effects including nonlinear processes. In the present paper we discuss these differences within the strongly emerging framework of active music listening, which is precisely at the crossroads of these two worlds: it consists in giving the listener the ability to manipulate the different musical sources while listening to a musical piece. We propose a model that allows the description of a general studio mixing process as a linear stationary process of “generalized source image signals” considered as individual tracks. Such a model can be used to allow the recovery of the isolated tracks while preserving the professional sound quality of the mixture. A simple addition of these recovered tracks enables the end user to recover the full-quality stereo mix, while these tracks can also be used for, e.g., basic remix / karaoke / soloing and re-orchestration applications.
MotsClés: Audio effects; Audio scenes; Audio signal; End users; Karaoke; Linear mixing models; Mixing process; Music production; Musical pieces; Musical signals; Nonlinear process; Point sources; Sound Quality; Source images; Stationary process; Recovery; Signal analysis; Studios; Audio acoustics


Matching Pursuits with random sequential subdictionaries. Moussallam, M., L. Daudet, and G. Richard. Signal Processing 92, no. 10 (2012): 2532–2544.
Résumé: Matching Pursuits are a class of greedy algorithms commonly used in signal processing, for solving the sparse approximation problem. They rely on an atom selection step that requires the calculation of numerous projections, which can be computationally costly for large dictionaries and burdens their competitiveness in coding applications. We propose using a non-adaptive random sequence of subdictionaries in the decomposition process, thus parsing a large dictionary in a probabilistic fashion with no additional projection cost or parameter estimation. A theoretical modeling based on order statistics is provided, along with experimental evidence showing that the novel algorithm can be efficiently used on sparse approximation problems. An application to audio signal compression with multiscale time-frequency dictionaries is presented, along with a discussion of the complexity and practical implementations. © 2012 Elsevier B.V. All rights reserved.
MotsClés: Audio signal compression; Matching Pursuits; Random dictionaries; Sparse approximation; Audio signal compression; Decomposition process; Experimental evidence; Greedy algorithms; Matching pursuit; Multiscales; Novel algorithm; Order statistics; Practical implementation; Random sequence; Sparse approximations; Theoretical modeling; Time frequency; Competition; Parameter estimation; Signal encoding; Approximation algorithms


Near-field acoustic holography using sparse regularization and compressive sampling principles. Chardon, G., L. Daudet, A. Peillot, F. Ollivier, N. Bertin, and R. Gribonval. Journal of the Acoustical Society of America 132, no. 3 (2012): 1521–1534.
Résumé: Regularization of the inverse problem is a complex issue when using near-field acoustic holography (NAH) techniques to identify the vibrating sources. This paper shows that, for convex homogeneous plates with arbitrary boundary conditions, alternative regularization schemes can be developed based on the sparsity of the normal velocity of the plate in a well-designed basis, i.e., the possibility to approximate it as a weighted sum of few elementary basis functions. In particular, these techniques can handle discontinuities of the velocity field at the boundaries, which can be problematic with standard techniques. This comes at the cost of a higher computational complexity to solve the associated optimization problem, though it remains easily tractable with out-of-the-box software. Furthermore, this sparsity framework allows us to take advantage of the concept of compressive sampling; under some conditions on the sampling process (here, the design of a random array, which can be numerically and experimentally validated), it is possible to reconstruct the sparse signals with significantly fewer measurements (i.e., microphones) than classically required. After introducing the different concepts, this paper presents numerical and experimental results of NAH with two plate geometries, and compares the advantages and limitations of these sparsity-based techniques over standard Tikhonov regularization. © 2012 Acoustical Society of America.
MotsClés: Arbitrary boundary conditions; Basis functions; Compressive sampling; Homogeneous plates; Near-field Acoustic Holography; Optimization problems; Random array; Regularization schemes; Sampling process; Sparse signals; Tikhonov regularization; Two plates; Velocity field; Weighted Sum; Acoustic holography; Inverse problems; Velocity; Signal sampling; acoustics; algorithm; article; computer simulation; holography; instrumentation; mathematical computing; methodology; regression analysis; reproducibility; sound; theoretical model; transducer; vibration; Acoustics; Algorithms; Computer Simulation; Holography; Least-Squares Analysis; Models, Theoretical; Numerical Analysis, Computer-Assisted; Reproducibility of Results; Sound; Transducers; Vibration


Boltzmann machine and mean-field approximation for structured sparse decompositions. Dremeau, A., C. Herzet, and L. Daudet. IEEE Transactions on Signal Processing 60, no. 7 (2012): 3425–3438.
Résumé: Taking advantage of the structures inherent in many sparse decompositions constitutes a promising research axis. In this paper, we address this problem from a Bayesian point of view. We exploit a Boltzmann machine, which allows a large variety of structures to be taken into account, and focus on the resolution of a marginalized maximum a posteriori problem. To solve this problem, we resort to a mean-field approximation and the “variational Bayes expectation maximization” algorithm. This approach results in a soft procedure making no hard decision on the support or the values of the sparse representation. We show that this characteristic leads to an improvement of the performance over state-of-the-art algorithms. © 2012 IEEE.
MotsClés: Bernoulli-Gaussian model; Boltzmann machine; mean-field approximation; structured sparse representation; Boltzmann machines; Expectation Maximization; Hard decisions; Maximum a posteriori; Mean field approximation; Sparse decomposition; Sparse representation; State-of-the-art algorithms; Variational bayes; Algorithms; Problem solving


Experimentally based description of harp plucking. Chadefaux, D., J.-L. Le Carrou, B. Fabre, and L. Daudet. Journal of the Acoustical Society of America 131, no. 1 (2012): 844–855.
Résumé: This paper describes an experimental study of string plucking for the classical harp. Its goal is to characterize the playing parameters that play the most important roles in expressivity, and in the way harp players recognize each other, even on isolated notes, what we call the acoustical signature of each player. We have designed a specific experimental setup using a high-speed camera that tracks some markers on the fingers and on the string. This provides accurate three-dimensional positioning of the finger and of the string throughout the plucking action, in different musical contexts. From measurements of ten harp players, combined with measurements of the soundboard vibrations, we extract a set of parameters that finely control the initial conditions of the string's free oscillations. Results indicate that these initial conditions are typically a complex mix of displacement and velocity, with additional rotation. Although remarkably reproducible by a single player, and the more so for professional players, we observe that some of these control parameters vary significantly from one player to another. © 2012 Acoustical Society of America.
MotsClés: Control parameters; Experimental setup; Experimental studies; Free oscillation; Initial conditions; Acoustics; Physics; acoustics; article; finger; human; motion; motor performance; music; physiology; sound detection; tensile strength; touch; vibration; Acoustics; Fingers; Humans; Motion; Motor Skills; Music; Sound Spectrography; Tensile Strength; Touch; Vibration


Compressed sensing for acoustic response reconstruction: Interpolation of the early part. Mignot, R., L. Daudet, and F. Ollivier. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 225–228., 2011.
Résumé: The goal of this paper is to interpolate Room Impulse Responses (RIRs) within a whole volume, from a few measurements. We here focus on the early reflections, that have the key property of being sparse in the time domain: this can be exploited in a framework of model-based Compressed Sensing. Starting from a set of RIRs randomly sampled in space by a 3D microphone array, we use a modified Matching Pursuit algorithm to estimate the position of a small set of virtual sources. Then, the reconstruction of the RIRs at interpolated positions is performed using a projection onto a basis of monopoles. This approach is validated both by numerical and experimental measurements using a 120-microphone 3D array. © 2011 IEEE.
MotsClés: Compressed Sensing; Interpolation; Microphone Arrays; Room Impulse Responses; Source Localization; 3D arrays; Acoustic response; Compressed sensing; Experimental measurements; Matching pursuit algorithms; Microphone Arrays; Room impulse response; Source localization; Time domain; Virtual sources; Audio signal processing; Interpolation; Microphones; Signal reconstruction; Three dimensional; Audio acoustics


A parametric model of piano tuning. Rigaud, F., B. David, and L. Daudet. In Proceedings of the 14th International Conference on Digital Audio Effects, DAFx 2011, 393–400., 2011.
Résumé: A parametric model of aural tuning of acoustic pianos is presented in this paper. From a few parameters, a whole tessitura model is obtained, that can be applied to any kind of piano. Because the tuning of a piano is strongly linked to the inharmonicity of its strings, a 2-parameter model for the inharmonicity coefficient along the keyboard is introduced. Constrained by piano string design considerations, its estimation requires only a few notes in the bass range. Then, from tuning rules, we propose a 4-parameter model for the fundamental frequency evolution on the whole tessitura, taking into account the model of the inharmonicity coefficient. The global model is applied to 5 different pianos (4 grand pianos and 1 upright piano) to control the quality of the tuning. Besides the generation of tuning reference curves for non-professional tuners, potential applications could include the parametrization of synthesizers, or its use in transcription / source separation algorithms as a physical constraint to increase robustness.
MotsClés: Acoustic pianos; Fundamental frequencies; Global models; Grand piano; Parametric models; Parametrizations; Physical constraints; Piano strings; Piano tuning; Potential applications; Reference curves; Separation algorithms; Tuning rules; Algorithms; Models; Musical instruments


Decompositions in sound elements and musical applications. Lagrange, M., R. Badeau, B. David, N. Bertin, O. Derrien, S. Marchand, and L. Daudet. Traitement du Signal 28, no. 6 (2011): 665–689.
Résumé: This paper presents the DESAM project, which was divided into two parts. The first one was devoted to the theoretical and experimental study of parametric and nonparametric techniques for decomposing audio signals into sound elements. The second part focused on some musical applications of these decompositions. Most aspects that have been considered in this project have led to the proposal of new methods which have been grouped together into the so-called DESAM Toolbox, a set of Matlab® functions dedicated to the estimation of widely used spectral models for music signals. Although those models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. It is rather aimed at providing a range of state-of-the-art signal processing tools that decompose music recordings according to different signal models, giving rise to different “mid-level” representations. © 2011 Lavoisier.
MotsClés: Audio processing; Sound modeling; Spectral models; Audio processing; Core functions; Music information retrieval; Music recording; Nonparametric techniques; Sound modeling; Spectral models; Theoretical and experimental; Audio signal processing; Decomposition; Audio acoustics


Localization and identification of sound sources using “compressive sampling” techniques. Peillot, A., F. Ollivier, G. Chardon, and L. Daudet. In 18th International Congress on Sound and Vibration 2011, ICSV 2011, 2713–2720. Vol. 4., 2011.
Résumé: “Compressive sampling” (CS) is a new signal acquisition strategy that intends to reduce significantly the amount of recorded data by picking only a limited number of samples. CS theory asserts that one can reconstruct a given signal from a few randomly distributed samples, provided that the signal is sparse in a proper basis. CS ensures a minimum loss of information but requires, for the reconstruction of the signal, the use of dedicated sparsity-promoting algorithms. In this paper, CS is applied to the source localization problem using an array of randomly distributed microphones. In this case, the signal of interest is sparse in the spatial domain, i.e., only a few positions in space contain sources. We focus on near-field beamforming, where the array of sensors is sensitive to the sources' directivity. The localization method is extended to complex sources and we attempt to identify them in terms of multipoles. Numerical simulations and experimental results prove this sparsity-promoting method to be powerful for source localization. However, the identification step, quite successful on ideal data, is not sufficiently robust when applied to experimental data and needs further investigation.
MotsClés: Array of sensors; Compressive sampling; Directivity; Localization and identification; Localization method; Multipoles; Nearfield; Number of samples; Randomly distributed; Signal acquisitions; Signal of interests; Sound source; Source localization; Spatial domains; Safety engineering; Vibrations (mechanical); Signal processing


Signal reconstruction from STFT magnitude: A state of the art. Sturmel, N., and L. Daudet. In Proceedings of the 14th International Conference on Digital Audio Effects, DAFx 2011, 375–386., 2011.
Résumé: This paper presents a review of techniques for signal reconstruction without phase, i.e. when only the spectrogram (the squared magnitude of the Short Time Fourier Transform) of the signal is known. The now standard Griffin and Lim algorithm will be presented, and compared to more recent blind techniques. Two important issues are raised and discussed: first, the definition of relevant criteria to evaluate the performances of different algorithms, and second the question of the uniqueness of the solution. Some ways of reducing the complexity of the problem are presented with the injection of additional information in the reconstruction. Finally, issues that prevent optimal reconstruction are examined, leading to a discussion on what seem the most promising approaches for future research.
MotsClés: Blind technique; Short time Fourier transforms; Spectrograms; State of the art; Algorithms; Signal reconstruction; Signal analysis


Recursive nearest neighbor search in a sparse and multiscale domain for comparing audio signals. Sturm, B. L., and L. Daudet. Signal Processing 91, no. 12 (2011): 2836–2851.
Résumé: We investigate recursive nearest neighbor search in a sparse domain at the scale of audio signals. Essentially, to approximate the cosine distance between the signals we make pairwise comparisons between the elements of localized sparse models built from large and redundant multiscale dictionaries of time-frequency atoms. Theoretically, error bounds on these approximations provide efficient means for quickly reducing the search space to the nearest neighborhood of a given data point; but we demonstrate here that the best bound defined thus far involving a probabilistic assumption does not provide a practical approach for comparing audio signals with respect to this distance measure. Our experiments show, however, that regardless of these nondiscriminative bounds, we only need to make a few atom pair comparisons to reveal, e.g., the origin of an excerpted signal, or melodies with similar time-frequency structures. © 2011 Elsevier B.V. All rights reserved.
MotsClés: Audio similarity; Multiscale decomposition; Sparse approximation; Time-frequency dictionary; Audio signal; Audio similarity; Distance measure; Error bound; Multiscale Decomposition; Multiscales; Nearest Neighbor search; Nearest neighborhood; Pair comparisons; Pairwise comparison; Probabilistic assumptions; Search spaces; Sparse approximation; Time-frequency dictionary; Time frequency; Time-frequency atoms; Error analysis


Plate impulse response spatial interpolation with sub-Nyquist sampling. Chardon, G., A. Leblanc, and L. Daudet. Journal of Sound and Vibration 330, no. 23 (2011): 5678–5689.
Résumé: Impulse responses of vibrating plates are classically measured on a fine spatial grid satisfying the Shannon-Nyquist spatial sampling criterion, and interpolated between measurement points. For homogeneous and isotropic plates, this study proposes a more efficient sampling and interpolation process, inspired by the recent paradigm of compressed sensing. Remarkably, this method can accommodate any star-convex shape and unspecified boundary conditions. Here, impulse responses are first decomposed as sums of damped sinusoids, using the Simultaneous Orthogonal Matching Pursuit algorithm. Finally, modes are interpolated using a plane wave decomposition. As a beneficial side effect, these algorithms can also be used to obtain the dispersion curve of the plate with a limited number of measurements. Experimental results are given for three different plates of different shapes and boundary conditions, and compared to classical Shannon interpolation. © 2011 Elsevier Ltd. All rights reserved.
MotsClés: Aplane; Compressed sensing; Damped sinusoids; Dispersion curves; Efficient sampling; Interpolation process; Isotropic plates; Measurement points; Orthogonal matching pursuit; Side effect; Spatial grids; Spatial interpolation; Spatial sampling; Sub-Nyquist sampling; Vibrating plate; Algorithms; Boundary conditions; Impulse response; Interpolation


Compressively sampling the plenacoustic function. Mignot, R., G. Chardon, and L. Daudet. In Proceedings of SPIE – The International Society for Optical Engineering. Vol. 8138., 2011.
Résumé: Directly measuring the full set of acoustic impulse responses within a room would require an unreasonably large number of measurements. Considering that the acoustic wavefield is sparse in some dictionaries, Compressed Sensing allows the recovery of the full wavefield with a reduced set of measurements, but raises challenging computational and memory issues. Two practical algorithms are presented and compared: one that exploits the structured sparsity of the sound field, with projections of the modes onto plane waves sharing the same wavenumber, and one that computes a sparse decomposition on a dictionary of independent plane waves with time/space variable separation. © 2011 Copyright Society of Photo-Optical Instrumentation Engineers (SPIE).
MotsClés: Compressed Sensing; Interpolation; Plane waves; Room Impulse Responses; Sparsity; Compressed sensing; Plane wave; Practical algorithms; Room impulse response; Sparse decomposition; Sparsity; Variable separation; Wave numbers; Wavefields; Signal reconstruction; Elastic waves


Soft Bayesian pursuit algorithm for sparse representations. Drémeau, A., C. Herzet, and L. Daudet. In IEEE Workshop on Statistical Signal Processing Proceedings, 341–344., 2011.
Résumé: This paper deals with sparse representations within a Bayesian framework. For a Bernoulli-Gaussian model, we here propose a method based on a mean-field approximation to estimate the support of the signal. In numerical tests involving a recovery problem, the resulting algorithm is shown to have good performance over a wide range of sparsity levels, compared to various state-of-the-art algorithms. © 2011 IEEE.
MotsClés: Bernoulli-Gaussian model; mean-field approximation; Sparse representations; Bayesian frameworks; Bernoulli-Gaussian model; Mean field approximation; Numerical tests; Sparse representation; State-of-the-art algorithms; Signal processing; Algorithms


Audio signal representations for factorization in the sparse domain. Moussallam, M., L. Daudet, and G. Richard. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 513–516., 2011.
Résumé: In this paper, a new class of audio representations is introduced, together with a corresponding fast decomposition algorithm. The main feature of these representations is that they are both sparse and approximately shift-invariant, which allows similarity search in a sparse domain. The common sparse support of detected similar patterns is then used to factorize their representations. The potential of this method for simultaneous structural analysis and compressing tasks is illustrated by preliminary experiments on simple musical data. © 2011 IEEE.
MotsClés: Audio Signal Decomposition; Audio Similarity; Factorization; Matching Pursuit; Sparse Representation; Audio representation; Audio signal; Audio Similarity; Fast decomposition; Matching pursuit; Shift invariant; Similar pattern; Similarity search; Sparse representation; Audio acoustics; Factorization; Speech communication; Structural analysis; Audio signal processing


Sturm, B. L., and L. Daudet. On similarity search in audio signals using adaptive sparse approximations. Vol. 6535 LNCS., 2011.
Résumé: We explore similarity search in data compressed and described by adaptive methods of sparse approximation, specifically audio signals. The novelty of this approach is that one circumvents the need to compute and store a database of features since sparse approximation can simultaneously provide a description and compression of data. We investigate extensions to a method previously proposed for similarity search in a homogeneous image database using sparse approximation, but which has limited applicability to search heterogeneous databases with variable-length queries – necessary for any useful audio signal search procedure. We provide a simple example as a proof of concept, and show that similarity search within adapted sparse domains can provide fast and efficient ways to search for data similar to a given query. © 2011 Springer-Verlag Berlin Heidelberg.
MotsClés: Adaptive methods; Audio signal; Heterogeneous database; Image database; Proof of concept; Search procedures; Similarity search; Sparse approximations; Query languages; Data compression


Optimal subsampling of multichannel damped sinusoids. Chardon, G., and L. Daudet. In 2010 IEEE Sensor Array and Multichannel Signal Processing Workshop, SAM 2010, 25–28., 2010.
Résumé: In this paper, we investigate the optimal ways to sample multichannel impulse responses, composed of a small number of exponentially damped sinusoids, under the constraint that the total number of samples is fixed – for instance with limited storage / computational power. We compute Cramér-Rao bounds for multichannel estimation of the parameters of a damped sinusoid. These bounds provide the length during which the signals should be measured to get the best results, roughly at 2 times the typical decay time of the sinusoid. Due to bandwidth constraints, the signals are best sampled irregularly, and variants of Matching Pursuit and MUSIC adapted to the irregular sampling and multichannel data are compared to the Cramér-Rao bounds. In practical situations, this method leads to savings in terms of memory, data throughput and computational complexity. © 2010 IEEE.
MotsClés: Array signal processing; Compressed sensing; Spectral analysis; Array signal processing; Bandwidth constraint; Compressed sensing; Computational power; Damped sinusoids; Data throughput; Decay time; Exponentially damped sinusoids; Irregular sampling; Limited storage; Matching pursuit; Multichannel; Multichannel data; Multichannel estimation; Number of samples; Spectral analysis; Optimization; Sensor arrays; Signal processing; Signal reconstruction; Spectrum analysis; Computational complexity


The DESAM toolbox: Spectral analysis of musical audio. Lagrange, M., R. Badeau, B. David, N. Bertin, J. Echeveste, O. Derrien, S. Marchand, and L. Daudet. In 13th International Conference on Digital Audio Effects, DAFx 2010 Proceedings., 2010.
Résumé: In this paper is presented the DESAM Toolbox, a set of Matlab functions dedicated to the estimation of widely used spectral models for music signals. Although those models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. It is rather aimed at providing a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different “mid-level” representations. After motivating the need for such a toolbox, this paper offers an overview of the overall organization of the toolbox, and describes all available functionalities.
MotsClés: Core functions; Matlab functions; Music files; Music information retrieval; Music signals; Musical audio; Signal models; Spectral models; Signal processing; Spectrum analysis; Audio acoustics


Hybrid coding/indexing strategy for informed source separation of linear instantaneous underdetermined audio mixtures. Parvaix, M., L. Girin, L. Daudet, J. Pinel, and C. Baras. In 20th International Congress on Acoustics 2010, ICA 2010 – Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society, 3987–3994. Vol. 5., 2010.
Résumé: We present a system for underdetermined source separation of nonstationary audio signals from a stereo 2-channel linear instantaneous mixture. This system is dedicated to isolate the different instruments/voices of a piece of music, so that an end-user can separately manipulate those source signals. The problem is addressed with a specific informed approach, that is implemented with a coder corresponding to the step of music production, and a separate decoder corresponding to the step of signal restitution. At the coder, source signals are assumed to be available, and are used to i) generate the stereo 2-channel mix signal, and ii) extract a small amount of distinctive features embedded into the mix signal using an inaudible watermarking technique. At the decoder, extracting and exploiting the watermark from the transmitted mix signal enables an end-user who has no direct access to the original source signals to separate these source signals from the mix signal. In the present study, we propose a new hybrid system that merges two techniques of informed source separation: a subset of the source signals are encoded using a “sources-channel coding” approach, and another subset are selected for local inversion of the mixture. The respective codes and indexes are transmitted to the decoder using a new high-capacity watermarking technique. At the decoder, the encoded source signals are decoded and then subtracted from the mixture signal, before local inversion of the remaining submixture signal leads to the estimation of the second subset of source signals. This hybrid separation technique enables to efficiently combine the advantages of both coding and inversion approaches. We report experiments with 5 different source signals separated from stereo mixtures, with a remarkable quality, enabling separate manipulation during music restitution.
MotsClés: Audio signal; End users; High-capacity; Hybrid coding; Instantaneous mixtures; Mixture signals; Music production; Nonstationary; Separation techniques; Source signals; Underdetermined; Watermarking techniques; Audio watermarking; Hybrid systems; Separation; Signal analysis; Mixtures


Musical instrument identification using multiscale mel-frequency cepstral coefficients. Sturm, B. L., M. Morvidone, and L. Daudet. In European Signal Processing Conference, 477–481., 2010.
Résumé: We investigate the benefits of evaluating Mel-frequency cepstral coefficients (MFCCs) over several time scales in the context of automatic musical instrument identification for signals that are monophonic but derived from real musical settings. We define several sets of features derived from MFCCs computed using multiple time resolutions, and compare their performance against other features that are computed using a single time resolution, such as MFCCs, and derivatives of MFCCs. We find that in each task – pairwise discrimination, and one vs. all classification – the features involving multiscale decompositions perform significantly better than features computed using a single time-resolution. © EURASIP, 2010.
MotsClés: Mel-frequency cepstral coefficients; Multiscale Decomposition; Multiscales; Musical instrument identification; Musical setting; Time resolution; Timescales; Signal processing; Decomposition


How sparsely can a signal be approximated while keeping its class identity? Moussallam, M., T. Fillon, G. Richard, and L. Daudet. In MML'10 – Proceedings of the 3rd ACM International Workshop on Machine Learning and Music, Co-located with ACM Multimedia 2010, 25–28., 2010.
Résumé: This paper explores the degree of sparsity of a signal approximation that can be reached while ensuring that a sufficient amount of information is retained, so that its main characteristics remain. Here, sparse approximations are obtained by decomposing the signals on an overcomplete dictionary of multiscale time-frequency “atoms”. The resulting representation is highly dependent on the choice of dictionary, decomposition algorithm and depth of the decomposition. The class identity is measured by indirect means as the speech/music discrimination power of features derived from the sparse representation compared to those of classical PCM-based features. Evaluation is performed on French Broadcast TV and Radio recordings from the QUAERO project database with two different statistical classifiers.
MotsClés: Algorithms; Experimentation; Amount of information; Broadcast TV; Decomposition algorithm; Experimentation; Main characteristics; Multiscales; Overcomplete dictionaries; Project database; Signal approximation; Sparse approximations; Sparse representation; Speech/music discrimination; Statistical classifier; Time frequency; Classification (of information); Learning systems; Speech recognition; Algorithms


Structured and incoherent parametric dictionary design. Yaghoobi, M., L. Daudet, and M. E. Davies. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 5486–5489., 2010.
Résumé: A new dictionary selection approach for sparse coding, called parametric dictionary design, has recently been introduced. The aim is to choose a dictionary from a class of admissible dictionaries which can be presented parametrically. The designed dictionary satisfies a constraint, here the incoherence property, which can help conventional sparse coding methods to find sparser solutions on average. In this paper, an extra constraint will be applied on the parametric dictionaries to find a structured dictionary. Various structures can be imposed on dictionaries to promote a correlation between the atoms. We intentionally choose a structure to implement the dictionary using a set of filter banks. This indeed helps to implement the dictionary-signal multiplications more efficiently. The price we pay for the extra structure is that the designed dictionary is not as incoherent as unstructured parametric designed dictionaries. © 2010 IEEE.
MotsClés: Dictionary selection; Parametric dictionary design; Sparse approximation; Structured dictionary; Dictionary selection; Parametric dictionary design; Sparse approximations; Sparse coding; Structured dictionary; Filter banks; Signal processing; Design


Incorporating scale information with cepstral features: Experiments on musical instrument recognition. Morvidone, M., B. L. Sturm, and L. Daudet. Pattern Recognition Letters 31, no. 12 (2010): 1489–1497.
Résumé: We present two sets of novel features that combine multiscale representations of signals with the compact timbral description of Mel-frequency cepstral coefficients (MFCCs). We define one set of features, OverCs, from overcomplete transforms at multiple scales. We define the second set of features, SparCs, from a signal model found by sparse approximation. We compare the descriptiveness of our features against that of MFCCs by performing two simple tasks: pairwise musical instrument discrimination, and musical instrument classification. Our tests show that both OverCs and SparCs characterize the global timbre and local stationarity of an audio signal better than do mean MFCCs with respect to these tasks. © 2009 Elsevier B.V. All rights reserved.
MotsClés: Audio signal classification; Musical instrument recognition; Sparse decompositions; Time-frequency/time-scale features; Audio signal; Audio signal classification; Cepstral features; Mel-frequency cepstral coefficients; Multiple scale; Multiscale representations; Overcomplete; Signal models; Sparse approximations; Sparse decomposition; Stationarity; Audio acoustics; Instruments; Signal analysis; Speech recognition; Statistical tests; Wavelet transforms; Electronic musical instruments


Pattern recognition of non-speech audio. Aucouturier, J.-J., and L. Daudet. Pattern Recognition Letters 31, no. 12 (2010): 1487–1488.


Editorial for the special issue on signal models and representations of musical and environmental sounds. David, B., M. Goto, L. Daudet, and P. Smaragdis. IEEE Transactions on Audio, Speech and Language Processing 18, no. 3 (2010): 417–419.


Parametric Dictionary Design for Sparse Coding. Yaghoobi, M., L. Daudet, and M. E. Davies. IEEE Transactions on Signal Processing 57, no. 12 (2009): 4800–4810.
Résumé: This paper introduces a new dictionary design method for sparse coding of a class of signals. It has been shown that one can sparsely approximate some natural signals using an overcomplete set of parametric functions. A problem in using these parametric dictionaries is how to choose the parameters. In practice, these parameters have been chosen by an expert or through a set of experiments. In the sparse approximation context, it has been shown that an incoherent dictionary is appropriate for the sparse approximation methods. In this paper, we first characterize the dictionary design problem, subject to a constraint on the dictionary. Then we briefly explain that equiangular tight frames have minimum coherence. The complexity of the problem does not allow it to be solved exactly. We introduce a practical method to approximately solve it. Some experiments show the advantages one gets by using these dictionaries. © 2009 IEEE.
MotsClés: Dictionary design; Exact sparse recovery; Gammatone filter banks; Incoherent dictionary; Parametric dictionary; Sparse approximation; Design method; Design problems; Exact sparse recovery; Incoherent dictionary; Overcomplete; Parametric functions; Practical method; Sparse approximations; Sparse coding; Sparse recovery; Tight frame; Approximation theory; Filter banks; Design

