Institut Langevin - Ondes et Images (Waves and Images): Laurent DAUDET


Université Paris Diderot - Paris 7

Research topics

Signal processing
Acoustic imaging by compressive sampling
Signal processing for optics through multiply scattering media

Website of the Non-Conventional Imaging and Sensing (NCIS) theme

Teaching for the academic year 2016-2017

I am currently on leave from teaching activities

The news

Sept 2016: Starting a new challenge: I am officially on leave to become full-time CTO of the LightOn startup! Let us make Machine Learning scalable and sustainable thanks to new optical components.

Keynote speaker at ELM’2015, Hangzhou (Dec 2015).

Sept 2015: My 5-year IUF fellowship has ended, a great time with plenty of exciting projects!

June 2015: Completed the stunning IHEST program.

Postdoc position on Compressed Sensing Imaging through multiply scattering materials: this position has been filled and applications are closed.

A short video without equations (but in French, sorry!) about our study of Nearfield Acoustic Holography using Compressed Sensing, on the INRIA video library.

Dec 2013: Field trip to the Democratic Republic of the Congo to record bat trajectories!

Video presentation (in French): La vérité si je m’embrouille, annual Scientific Days (Journées Scientifiques) of the Institut Universitaire de France, Toulouse (April 2013).

Who’s hiding in the picture?

Keynote speaker at CMMR’2012, London (June 2012).

Best peer-reviewed paper award for the article "Linear mixing models for active listening of music productions in realistic studio conditions", by Sturmel N., Liutkus A., Pinel J., Girin L., Marchand S., Richard G., Badeau R., and Daudet L., at the AES 132nd Convention, Budapest (April 2012).

April 2012: Named Visiting Professor at NII, Tokyo, for a collaboration with Prof. Nobutaka Ono.

The gang

Shoichi Koyama (Visiting Lecturer from University of Tokyo), JSPS grant.

Ivan Dokmanic (Postdoc, joint project with Martin Vetterli @ EPFL and Stéphane Mallat @ ENS)

Former members

Hequn Bai
Julien Capobianco
Delphine Chadefaux
Gilles Chardon
Laure Cornu
Angélique Drémeau
Pierre Leveau
Na Liu
Antoine Liutkus
Rémi Mignot
Manuel Moussallam
Thibault Nowakowski
Antoine Peillot
Boshra Rajaei
Emmanuel Ravelli
François Rigaud
David Sodoyer
Bob Sturm
Nicolas Sturmel

The press

List of publications (Sept. 2016)

Please email me for PDFs that you cannot find below or on my Google Scholar profile.

Publications at the Institut Langevin

Optimizing Source and Sensor Placement for Sound Field Control: An Overview Koyama, S., G. Chardon, and L. Daudet IEEE/ACM Transactions on Audio, Speech, and Language Processing 28, 696-714 (2020) Abstract: In order to control an acoustic field inside a target region, it is important to choose suitable positions of secondary sources (loudspeakers) and sensors (control points/microphones). This article provides an overview of state-of-the-art source and sensor placement methods in sound field control. Although the placement of both sources and sensors greatly affects control accuracy and filter stability, their joint optimization has not been thoroughly investigated in the acoustics literature. In this context, we reformulate five general source and/or sensor placement methods that can be applied to sound field control. We compare the performance of these methods through extensive numerical simulations in both narrowband and broadband scenarios. Keywords: interpolation; sound field control; sound field reproduction; Source and sensor placement; subset selection
Sparse Representation of a Spatial Sound Field in a Reverberant Environment Koyama, S., and L. Daudet IEEE Journal of Selected Topics in Signal Processing 13, no. 1, 172-184 (2019) Abstract: This paper investigates sound-field modeling in a realistic reverberant setting. Starting from a few point-like microphone measurements, the goal is to estimate the direct source field within a whole three-dimensional (3-D) space around these microphones. Previous sparse sound field decompositions assumed only a spatial sparsity of the source distribution, but generally could not handle reverberation. We here add an explicit model of the reverberant sound field, which has two components: one that is sparse in the plane-wave domain, and another that is low-rank, as the product of transfer functions and source signals. We derive the corresponding decomposition algorithm based on the alternating direction method of multipliers. We furthermore provide empirical rules for tuning the two parameters to be set in the algorithm. Numerical and experimental results indicate that decomposition and reconstruction performance is significantly improved in reverberant environments. Keywords: reverberation; Sound field decomposition; sound field recording; source identification; sparse representation
Joint Source and Sensor Placement for Sound Field Control Based on Empirical Interpolation Method Koyama, S., G. Chardon, and L. Daudet ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2018-April, 501-505 (2018) Abstract: This study proposes a principled method to jointly determine the placement of acoustic sources (loudspeakers) and sensors (control points/microphones) in sound field control. The goal of this setup is to efficiently produce a sound field using multiple loudspeakers, approximately matching a target sound field over a region of interest. Therefore, the loudspeaker and control-point placement problem can be seen as the problem of finding interpolating functions (associated with individual loudspeaker sound fields) and sampling points (corresponding to control points or microphones) to approximate the target sound field in the given domain. We here solve this problem using the empirical interpolation method, originally developed for the numerical analysis of partial differential equations. The proposed method enables a joint determination of loudspeaker and control-point placement, from a large set of candidate locations, independently of the desired sound field. Numerical simulation results indicate that accurate and stable sound field control can be achieved by the proposed method, with significantly better results than with random and regular placements. Keywords: Interpolation; Magic points; Sound field control; Sound field reproduction; Source and sensor placement
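For readers unfamiliar with the empirical interpolation method, here is a minimal sketch of its textbook greedy form (not the paper's code). It assumes each column of `snapshots` is a candidate field (for instance, one candidate loudspeaker's field) sampled on a grid of candidate locations; under that reading, the selected columns play the role of loudspeakers and the "magic points" the role of control points. The function name and data layout are illustrative assumptions.

```python
import numpy as np

def empirical_interpolation(snapshots, n_basis):
    """Greedy empirical interpolation (magic points), generic textbook form.

    snapshots: (n_points, n_snapshots) array; each column is one candidate
    field sampled on a grid of candidate locations (assumed layout).
    Returns selected snapshot indices, magic-point indices and basis Q.
    """
    U = np.asarray(snapshots, dtype=float)
    Q = np.zeros((U.shape[0], 0))
    points, chosen = [], []
    residual = U.copy()
    for _ in range(n_basis):
        # Snapshot that the current interpolant reproduces worst (max norm)
        j = int(np.argmax(np.max(np.abs(residual), axis=0)))
        r = residual[:, j]
        # Location of its largest error becomes the next magic point
        p = int(np.argmax(np.abs(r)))
        Q = np.column_stack([Q, r / r[p]])  # normalise so Q[p, k] == 1
        points.append(p)
        chosen.append(j)
        # Interpolate every snapshot from its values at the magic points
        coeffs = np.linalg.solve(Q[points, :], U[points, :])
        residual = U - Q @ coeffs
    return chosen, points, Q
```

By construction, `Q[points, :]` is unit lower triangular, so the interpolation system stays well posed at each step.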
Compressive acoustic holography with block-sparse regularization Fernandez-Grande, E., and L. Daudet Journal of the Acoustical Society of America 143, no. 6, 3737-3746 (2018) Abstract: Sparse reconstruction methods, such as Compressive Sensing, are powerful methods in acoustic array processing, as they make wideband reconstruction possible. However, when addressing sound fields that are not necessarily sparse (e.g., in acoustic near-fields, reflective environments, extended sources, etc.), the methods can lead to a poor reconstruction of the sound field. This study examines the use of sparse analysis priors to promote block-sparse solutions. In particular, a Fused Total Generalized Variation (F-TGV) method is developed to analyze the sound field in the near-field of acoustic sources. The method promotes sparsity both on the spatial derivatives of the solution and on the solution itself, thus seeking solutions where the non-zero coefficients are grouped together. The performance of the method is examined numerically and experimentally, and compared with established methods. The results indicate that the F-TGV method is suitable to examine both compact and spatially extended sources. The method is promising for its generality, robustness to noise, and capability to provide a wideband reconstruction of sound fields that are not necessarily sparse.
Comparison of reverberation models for sparse sound field decomposition Koyama, S., and L. Daudet IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2017-October, 214-218 (2017) Abstract: Sparse representations of sound fields have become popular in various acoustic inverse problems. The simplest models assume spatial sparsity, where a small number of sound sources are located in the near-field. However, the performance of these models deteriorates in the presence of strong reverberation. To properly treat the reverberant components, we introduce three types of reverberation models: a low-rank model, a sparse model in the plane-wave domain, and a combined low-rank+sparse model. We discuss corresponding decomposition algorithms based on ADMM convex optimization. Numerical simulations indicate that the decomposition accuracy is significantly improved by the additive model of low-rank and sparse plane-wave components. Keywords: convex optimization; inverse problems; reverberation; sound field analysis; Sound field decomposition; sparse representations
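To illustrate the "low-rank + sparse" idea solved with ADMM, the sketch below implements the generic robust-PCA split M = L + S (nuclear norm plus l1 norm) on a real matrix. This is an assumption-laden simplification: the paper's models act on complex sound-field data and place the sparse part in a plane-wave dictionary, which is omitted here. The default `lam` and `mu` follow common robust-PCA heuristics, not values from the paper.

```python
import numpy as np

def soft_threshold(x, tau):
    """Entrywise shrinkage: proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_threshold(x, tau):
    """Singular-value shrinkage: proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def low_rank_plus_sparse(M, lam=None, mu=None, n_iter=500):
    """ADMM for min ||L||_* + lam * ||S||_1 subject to L + S = M."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam  # heuristic
    mu = m * n / (4.0 * np.abs(M).sum()) if mu is None else mu  # heuristic
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint L + S = M
    for _ in range(n_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)   # low-rank update
        S = soft_threshold(M - L + Y / mu, lam / mu)  # sparse update
        Y = Y + mu * (M - L - S)                      # dual ascent step
    return L, S
```

Each ADMM step only needs the two proximal operators, which is why this family of models stays cheap to solve.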
Robust source localization from wavefield separation including prior information Nowakowski, T., J. De Rosny, and L. Daudet Journal of the Acoustical Society of America 141, no. 4, 2375-2386 (2017) Abstract: Strong reverberation is a challenge for narrowband source localization, as most existing methods are based on time-of-arrival measurements, which are affected by boundaries. Among the methods that explicitly take reverberation into account, wavefield separation projector processing (WSPP) splits the acoustic wave field into the direct path of the sources and the reverberation. However, WSPP requires a very large number of microphones, making this method impractical. This article studies three ways of alleviating this constraint, extending WSPP by adding different prior information on the wavefield. The first method is based on using the knowledge of the critical distance of the room to decrease the selectivity of the field separation. The second method adds constraints called “virtual measurements” when the room geometry is partially known. Finally, the third method requires a simple calibration step to estimate the Green's functions between each pair of microphones; this also extends the model to weakly inhomogeneous propagation media. It is shown numerically and experimentally that these methods allow precise source localization with a reduced number of microphones.
Intensity-only measurement of partially uncontrollable transmission matrix: demonstration with wave-field shaping in a microwave cavity Del Hougne, P., B. Rajaei, L. Daudet, and G. Lerosey Optics Express 24, no. 16, 18631-18641 (2016)
Fast Phase Retrieval for High Dimensions: A Block-Based Approach Rajaei, B., S. Gigan, F. Krzakala, and L. Daudet IEEE Signal Processing Letters 23, no. 8, 1179-1182 (2016) Keywords: Convex optimization; inverse problems; phase retrieval (PR)
Compressive Sensing in Acoustic Imaging Bertin, N., L. Daudet, V. Emiya, and R. Gribonval Applied and Numerical Harmonic Analysis, 169-192 (2015)
Geometric-based reverberator using acoustic rendering networks Bai, H., G. Richard, and L. Daudet 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2015 (2015) Abstract: Many virtual reality applications incorporate realistic room acoustic simulation to provide increased immersiveness and realism. Traditional geometric methods, although accurate, are usually impractical for use in interactive applications. At the same time, artificial reverberators with a feedback rendering structure are widely used as a low-cost alternative. This paper presents the design of a geometric-based artificial reverberator inspired by the acoustic rendering equation (ARE) and feedback delay networks (FDN). The simplified acoustic rendering equation, which models both specular and diffuse reflections, is incorporated into the FDN structure. In addition to modeling the diffuse and late reverberation, our reverberator is also capable of simulating the early/specular reflections accurately. This work is among the very few that can simulate early reflections using feedback delay networks. Keywords: acoustic rendering equation; feedback delay networks; reverberation; room acoustics
Localization of acoustic sensors from passive Green's function estimation Nowakowski, T., L. Daudet, and J. De Rosny Journal of the Acoustical Society of America 138, no. 5, 3010-3018 (2015) Abstract: A number of methods have recently been developed for passive localization of acoustic sensors, based on the assumption that the acoustic field is diffuse. This article presents the more general case of equipartition fields, which takes into account reflections off boundaries and/or scatterers. After a thorough discussion of the fundamental differences between the diffuse and equipartition models, it is shown that the method is more robust when dealing with wideband noise sources. Finally, experimental results show, for two types of boundary conditions, that this approach is especially relevant when acoustic sensors are close to boundaries.
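The basic principle behind passive sensor localization is that cross-correlating ambient noise recorded at two sensors reveals the propagation time between them. The sketch below shows only that principle, in an idealized single-path, free-field case with a synthetic white-noise signal. It does not model the equipartition fields or boundary reflections studied in the paper. The sampling rate, speed of sound and 70-sample delay are hypothetical values.

```python
import numpy as np

def travel_time_from_noise(x_a, x_b, fs):
    """Delay of sensor b relative to sensor a (seconds), taken from the
    peak of the cross-correlation of their noise recordings."""
    c = np.correlate(x_b, x_a, mode="full")
    lag = int(np.argmax(c)) - (len(x_a) - 1)  # index of zero lag is len(x_a)-1
    return lag / fs

fs = 48000.0           # sampling rate in Hz (assumed)
sound_speed = 343.0    # speed of sound in air, m/s (assumed)
rng = np.random.default_rng(1)
noise = rng.standard_normal(4800)
delay = 70             # hypothetical propagation delay, in samples
x_a = noise
x_b = np.concatenate([np.zeros(delay), noise[:-delay]])  # delayed copy at b
tau = travel_time_from_noise(x_a, x_b, fs)
distance = sound_speed * tau  # inter-sensor distance along the single path
```

In a real room, reflections add secondary correlation peaks. This is why a diffuse or equipartition model of the noise field matters in practice.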
Microphone array position calibration in the frequency domain using a single unknown source Nowakowski, T., L. Daudet, and J. De Rosny ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2015-August, 330-334 (2015) Abstract: We study the problem of microphone array localization in a strongly reverberant room, where times of arrival (TOA) or time differences of arrival (TDOA) cannot always be measured precisely. Instead, we use frequency-domain measurements to calibrate the array position, based on the modes of the room excited by a single wide-band source, which can be unknown. Using the fact that each measured mode can be decomposed as a sum of model-based polynomials, we build a cost function whose minimum indicates the positions of the microphones. A simple Block Coordinate Descent algorithm can be used to minimize this cost function. Numerical results indicate that this algorithm converges to the right solution, and therefore that using frequency measurements for position calibration is a valid concept for dense arrays, as an alternative to time-domain methods in reverberant environments. Keywords: Array position calibration; modal interpolation; reverberation
Late Reverberation Synthesis: From Radiance Transfer to Feedback Delay Networks Bai, H., G. Richard, and L. Daudet IEEE/ACM Transactions on Audio, Speech, and Language Processing 23, no. 12, 2260-2271 (2015) Abstract: In room acoustic modeling, feedback delay networks (FDN) are known to efficiently model late reverberation due to their capacity to generate exponentially decaying dense impulses. However, this method relies on careful tuning of the synthesis parameters, either estimated from an impulse response pre-recorded in the real acoustic scene, or set manually from experience. In this paper, we present a new method, which retains the efficiency of the FDN structure but links the parameters of the FDN directly to the room geometry. This relation is achieved by studying the sound energy exchange between delay lines using the acoustic radiance transfer method (RTM). Experimental results show that the late reverberation modeled by this method is in good agreement with the virtual geometry setting. Keywords: Acoustic radiance transfer; feedback delay networks (FDNs); reverberation; room acoustics
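To make the FDN structure mentioned above concrete, here is a minimal sketch of a generic four-line feedback delay network driven by a unit impulse. The delay lengths, the scaled Householder feedback matrix and the unit input/output gains are illustrative assumptions. They are not the radiance-transfer-derived parameters proposed in the paper.

```python
import numpy as np

def fdn_impulse_response(delays, gain, n_samples):
    """Impulse response of a minimal feedback delay network (FDN).

    delays: delay-line lengths in samples; gain: attenuation (< 1) applied
    on each pass through a scaled orthogonal (Householder) feedback matrix.
    """
    delays = np.asarray(delays)
    n = len(delays)
    # I - (2/n) 11^T is orthogonal, so gain < 1 makes the loop decay
    A = gain * (np.eye(n) - 2.0 / n * np.ones((n, n)))
    b = np.ones(n)        # input gains (assumed)
    c = np.ones(n) / n    # output gains (assumed)
    buffers = [np.zeros(d) for d in delays]  # circular delay lines
    idx = np.zeros(n, dtype=int)
    x = np.zeros(n_samples)
    x[0] = 1.0            # unit impulse input
    y = np.zeros(n_samples)
    for t in range(n_samples):
        # Read each line's output: the value written d_i samples ago
        outs = np.array([buffers[i][idx[i]] for i in range(n)])
        y[t] = c @ outs
        # Mix the outputs through A, add the input, write back into the lines
        ins = A @ outs + b * x[t]
        for i in range(n):
            buffers[i][idx[i]] = ins[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return y

y = fdn_impulse_response([149, 211, 263, 293], gain=0.9, n_samples=4000)
```

Mutually prime delay lengths are a common choice because they spread the echoes and increase echo density over time.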
A Blind Dereverberation Method for Narrowband Source Localization Chardon, G., T. Nowakowski, J. De Rosny, and L. Daudet IEEE Journal of Selected Topics in Signal Processing 9, no. 5, 815-824 (2015) Keywords: Source localization; microphone array; reverberation
Reference-less measurement of the transmission matrix of a highly scattering material using a DMD and phase retrieval techniques Drémeau, A., A. Liutkus, D. Martina, O. Katz, C. Schuelke, F. Krzakala, S. Gigan, and L. Daudet Optics Express 23, no. 9, 11898-11911 (2015)
Investigation of the Harpist/Harp Interaction Chadefaux, D., J.-L. Le Carrou, B. Fabre, and L. Daudet Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8905, 3-19 (2014) Abstract: This paper presents a contribution to the analysis of musician/instrument interaction. This study aims at investigating the mechanical parameters that govern the harp plucking action as well as the gestural strategies set up by harpists to control a musical performance. Two specific experimental procedures have been designed to accurately describe the harpist's motion in realistic playing contexts. They consist of filming the plucking action with a high-speed camera and recording the harpists' gestures with a motion capture system. Simultaneously, acoustical measurements are performed to relate the kinematic investigation to sound features. Results describe the characteristics of the musical gesture. Mechanical parameters governing the finger/string interaction are highlighted and their influence on the produced sound is discussed. In addition, the relationship between non-sound-producing gestures and musical intent is pointed out. Finally, the way energy is shared between the harpist's arm joints according to various playing techniques is analyzed. Keywords: Acoustics; Data mining; Harp; High-speed video analysis; Motion capture
A general framework for dictionary based audio fingerprinting Moussallam, M., and L. Daudet ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 3077-3081 (2014) Abstract: Fingerprint-based audio recognition systems must address competing objectives: fingerprints must be both robust to distortions and discriminative, while their dimension must remain small to allow fast comparison. This paper proposes to restate these objectives as a penalized sparse representation problem. On top of this dictionary-based approach, we propose a structured sparsity model in the form of a probabilistic distribution for the sparse support. A practical suboptimal greedy algorithm is then presented and evaluated on robustness and recognition tasks. We show that some existing methods can be seen as particular cases of this algorithm and that the general framework allows reaching other points of a Pareto-like continuum. Keywords: Audio Fingerprinting; Sparse Representation
Image transmission through a scattering medium: Inverse problem and sparsity-based imaging Gigan, S., S. M. Popoff, A. Liutkus, D. Martina, O. Katz, G. Chardon, R. Carminati, G. Lerosey, M. A. Fink, A. C. Boccara, I. Carron, and L. Daudet 2014 13th Workshop on Information Optics, WIO 2014 (2014) Abstract: We demonstrate how to accurately measure the transmission matrix of a complex medium. With this information, we show how to focus light, recover an image, and even perform efficient reconstruction of a sparse object.
Random calibration for accelerating MR-ARFI guided ultrasonic focusing in transcranial therapy Liu, N., A. Liutkus, J. F. Aubry, L. Marsac, M. Tanter, and L. Daudet Physics in Medicine and Biology 60, no. 3, 1069-1085 (2015) Abstract: Transcranial focused ultrasound is a promising therapeutic modality. It consists of placing transducers around the skull and emitting shaped ultrasound waves that propagate through the skull and then concentrate on one particular location within the brain. However, the skull bone is known to distort the ultrasound beam. In order to compensate for such distortions, a number of techniques have been proposed recently, for instance using Magnetic Resonance Imaging feedback. In order to fully determine the focusing distortion due to the skull, such methods usually require as many calibration signals as transducers, resulting in a lengthy calibration process. In this paper, we investigate how the number of calibration sequences can be significantly reduced, based on random measurements and optimization techniques. Experimental data with six human skulls demonstrate that the number of measurements can be up to three times lower than with the standard methods, while restoring 90% of the focusing efficiency. Keywords: brain; calibration; focused ultrasound; MR-ARFI; therapeutic; transcranial; ultrasound
Blind Denoising with Random Greedy Pursuits Moussallam, M., A. Gramfort, L. Daudet, and G. Richard IEEE Signal Processing Letters 21, no. 11, 1341-1345 (2014)
Convex Optimization Approaches for Blind Sensor Calibration Using Sparsity Bilen, C., G. Puy, R. Gribonval, and L. Daudet IEEE Transactions on Signal Processing 62, no. 18, 4847-4856 (2014) Keywords: Compressed sensing; blind calibration; phase estimation; convex optimization; gain calibration
Imaging with nature: compressive imaging using a multiply scattering medium Liutkus, A., D. Martina, S. Popoff, G. Chardon, O. Katz, G. Lerosey, S. Gigan, L. Daudet, and I. Carron Scientific Reports 4, 5552 (2014)
Room reverberation reconstruction: Interpolation of the early part using compressed sensing Mignot, R., L. Daudet, and F. Ollivier IEEE Transactions on Audio, Speech and Language Processing 21, no. 11, 2301-2312 (2013) Abstract: This paper deals with the interpolation of Room Impulse Responses (RIRs) within a whole volume, from as few measurements as possible, and without knowledge of the geometry of the room. We focus on the early reflections of the RIRs, which have the key property of being sparse in the time domain: this can be exploited in a framework of model-based Compressed Sensing. Starting from a set of RIRs randomly sampled in the spatial domain of interest by a 3D microphone array, we propose a modified Matching Pursuit algorithm to estimate the position of a small set of virtual sources. Then, the reconstruction of the RIRs at interpolated positions is performed using a projection onto a basis of monopoles, which correspond to the estimated virtual sources. An extension of the proposed algorithm allows the interpolation of the positions of both source and receiver, using the acquisition of four different source positions. This approach is validated both by numerical examples and by experimental measurements using a 3D array with up to 120 microphones. Keywords: Compressed sensing; interpolation; microphone arrays; room impulse responses; source localization
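As background for the modified Matching Pursuit mentioned above, the sketch below is plain (unmodified) matching pursuit over a generic dictionary. In the RIR setting, one could imagine each column being the response of a candidate virtual source, but that mapping is an assumption here. The paper's specific modifications are not reproduced.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Plain matching pursuit: greedily pick the atom most correlated
    with the residual and subtract its contribution.

    Columns of `dictionary` are normalised internally; the returned
    coefficients refer to those unit-norm atoms.
    """
    D = dictionary / np.linalg.norm(dictionary, axis=0)  # unit-norm atoms
    residual = np.asarray(signal, dtype=float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual
        k = int(np.argmax(np.abs(corr)))   # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]      # remove its contribution
    return coeffs, residual
```

With an orthonormal dictionary such as the identity, a k-sparse signal is recovered exactly in k iterations. With a redundant dictionary, atoms can be revisited and convergence is only asymptotic.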
An overview of informed audio source separation Liutkus, A., J.-L. Durrieu, L. Daudet, and G. Richard International Workshop on Image Analysis for Multimedia Interactive Services (2013) Abstract: Audio source separation consists of recovering different unknown signals called sources by filtering their observed mixtures. In music processing, most mixtures are stereophonic songs and the sources are the individual signals played by the instruments, e.g. bass, vocals, guitar, etc. Source separation is often achieved through classical generalized Wiener filtering, which is controlled by parameters such as the power spectrograms and the spatial locations of the sources. For efficient filtering, those parameters need to be available, and their estimation is the main challenge faced by separation algorithms. In the blind scenario, only the mixtures are available and performance strongly depends on the mixtures considered. In recent years, much research has focused on informed separation, which consists of using additional available information about the sources to improve the separation quality. In this paper, we review some recent trends in this direction.
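The generalized Wiener filtering described above can be sketched in its simplest single-channel form. Each source estimate is the mixture STFT weighted by that source's share of the total power in each time-frequency bin. This sketch assumes the power spectrograms are already known (the "informed" case). It ignores the spatial (multichannel) parameters the abstract also mentions.

```python
import numpy as np

def wiener_separate(mixture_stft, source_psds, eps=1e-12):
    """Single-channel generalized Wiener filtering.

    mixture_stft: complex array (freq, time).
    source_psds: nonnegative array (n_sources, freq, time), assumed known.
    Returns complex source estimates of shape (n_sources, freq, time).
    """
    psds = np.asarray(source_psds, dtype=float)
    total = psds.sum(axis=0) + eps      # eps avoids division by zero
    masks = psds / total                # soft masks summing to ~1 per bin
    return masks * mixture_stft[np.newaxis, :, :]
```

Because the masks sum to one in every bin, the estimates add back up to the mixture, a property often called mixture consistency.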
Gestural strategies in the harp performance Chadefaux, D., J.-L. Le Carrou, M. M. Wanderley, B. Fabre, and L. Daudet Acta Acustica united with Acustica 99, no. 6, 986-996 (2013) Abstract: This paper describes an experimentally-based analysis of the interaction between musician and instrument in the case of the classical concert harp. The study highlights gestural strategies used by three harpists while performing a short musical excerpt. As a result of years of practice, a trained musician has developed the ability to deal with a number of trade-offs among simultaneous objectives while playing. She or he obviously has to set the instrument into vibration, but also to convey some musical intention to the audience and to communicate with other musicians, while keeping a safe posture with respect to joint and muscle pain. In order to precisely describe the motion strategies carried out by trained harpists, an experiment has been designed using a motion capture system and corresponding video and audio recordings. This provides accurate three-dimensional positioning of several markers placed on the harpist and on the harp during the execution of a musical piece. From the acquired gestural and acoustical signals, a set of kinematic and dynamic descriptors was extracted. The investigation shows that while each musician uses their own specific and repeatable upper-limb movements, the global sound-producing gesture is mostly controlled by the shoulders. Sound-facilitating hand gestures are highlighted for their supporting role to the musician throughout the musical piece.
A parametric model and estimation techniques for the inharmonicity and tuning of the piano Rigaud, F., B. David, and L. Daudet Journal of the Acoustical Society of America 133, no. 5, 3107-3118 (2013) Abstract: Inharmonicity of piano tones is an essential property of their timbre that strongly influences the tuning, leading to the so-called octave stretching. It is proposed in this paper to jointly model the inharmonicity and tuning of pianos on the whole compass. While using a small number of parameters, these models are able to reflect both the specificities of instrument design and tuners' practice. An estimation algorithm is derived that can run either on a set of isolated-note recordings or on chord recordings, assuming that the played notes are known. It is applied to extract parameters highlighting some tuners' choices on different piano types and to propose tuning curves for out-of-tune pianos or piano synthesizers. Keywords: Estimation algorithm; Estimation techniques; Instrument designs; Parametric models; Tuning curve; Harmonic analysis; Tuners; Musical instruments
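Piano inharmonicity is usually described by the stiff-string law f_n = n f0 sqrt(1 + B n^2), where B is the inharmonicity coefficient. The sketch below shows that law and a deliberately simple way to estimate (f0, B) from already-measured partial frequencies. It uses the fact that (f_n / n)^2 = f0^2 + f0^2 B n^2 is linear in n^2. This least-squares fit is an illustration only, not the estimation algorithm of the paper, which works on recordings and models the whole compass. The A2 string values in the usage check are hypothetical.

```python
import numpy as np

def partial_frequencies(f0, B, n_partials):
    """Partials of a stiff string: f_n = n * f0 * sqrt(1 + B * n^2)."""
    n = np.arange(1, n_partials + 1)
    return n * f0 * np.sqrt(1.0 + B * n**2)

def fit_inharmonicity(freqs):
    """Least-squares estimate of (f0, B) from partials f_1..f_N.

    Fits (f_n / n)^2 = f0^2 + (f0^2 * B) * n^2, a straight line in n^2.
    """
    freqs = np.asarray(freqs, dtype=float)
    n = np.arange(1, len(freqs) + 1)
    slope, intercept = np.polyfit(n**2, (freqs / n) ** 2, 1)
    return np.sqrt(intercept), slope / intercept

# Hypothetical string: f0 = 110 Hz, B = 4e-4, first 20 partials
freqs = partial_frequencies(110.0, 4e-4, 20)
f0_hat, B_hat = fit_inharmonicity(freqs)
```

Since B > 0, each partial lies above n f0. This is what pushes tuners to stretch octaves so that they sound beat-free.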
Low-complexity computation of plate eigenmodes with Vekua approximations and the method of particular solutions Chardon, G., and L. Daudet Computational Mechanics, 1-10 (2013) Abstract: This paper extends the method of particular solutions (MPS) to the computation of eigenfrequencies and eigenmodes of thin plates, in the framework of the Kirchhoff-Love plate theory. Specific approximation schemes are developed, with plane waves (MPS-PW) or Fourier-Bessel functions (MPS-FB). This framework also requires a suitable formulation of the boundary conditions. Numerical tests, on two plates with various boundary conditions, demonstrate that the proposed approach provides results competitive with standard numerical schemes such as the finite element method, at reduced complexity, and with large flexibility in the implementation choices. Keywords: Algorithms; Biharmonic equation; Eigenvalues; Kirchhoff plate theory; Numerical methods
Informed source separation using iterative reconstruction Sturmel, N., and L. Daudet IEEE Transactions on Audio, Speech and Language Processing 21, no. 1, 176-183 (2013) Abstract: This paper presents a technique for Informed Source Separation (ISS) of a single-channel mixture, based on the Multiple Input Spectrogram Inversion (MISI) phase estimation method. The reconstruction of the source signals is iterative, alternating between a time-frequency consistency enforcement and a re-mixing constraint. A dual-resolution technique is also proposed, for sharper reconstruction of transients. The two algorithms are compared to a state-of-the-art Wiener-based ISS technique, on a database of fourteen monophonic mixtures, with standard source separation objective measures. Experimental results show that the proposed algorithms outperform both this reference technique and the oracle Wiener filter by up to 3 dB in distortion, at the cost of a significantly heavier computation. Keywords: Adaptive Wiener filtering; informed source separation; phase reconstruction; spectrogram inversion; Consistency enforcement; Iterative reconstruction; Multiple inputs; Objective measure; Phase estimation; Reference technique; Resolution techniques; Single channels; Source signals; Spectrograms; Standard sources; Time frequency; Wiener filtering; WIENER filters; Algorithms; Spectrographs; Iterative methods
DReaM: A novel system for joint source separation and multi-track coding Marchand, S., R. Badeau, C. Baras, L. Daudet, D. Fourer, L. Girin, S. Gorlow, A. Liutkus, J. Pinel, G. Richard, N. Sturmel, and S. Zang 133rd Audio Engineering Society Convention 2012, AES 2012 2, 749-758 (2012) Abstract: Active listening consists of interacting with the music being played. It has numerous applications from pedagogy to gaming, and involves advanced remixing processes such as generalized karaoke or respatialization. To get this new freedom, one might use the individual tracks that compose the mix. However, multi-track formats lose backward compatibility with popular stereo formats and increase the file size, while classic source separation from the stereo mix is not of sufficient quality. We propose a coder/decoder scheme for informed source separation. The coder determines the information necessary to recover the tracks and embeds it inaudibly in the mix, which is stereo and has a size comparable to the original. The decoder enhances the source separation with this information, enabling active listening. Keywords: Active listening; Backward compatibility; File sizes; Karaoke; Engineering; Industrial engineering; Cryptography
Phase-based informed source separation of music Sturmel, N., L. Daudet, and L. Girin 15th International Conference on Digital Audio Effects, DAFx 2012 Proceedings (2012) Abstract: This paper presents an informed source separation technique for monophonic mixtures. Although the vast majority of separation methods are based on the time-frequency energy of each source, we introduce a new approach using solely phase information to perform the separation. The sources are iteratively reconstructed using an adaptation of the Multiple Input Spectrogram Inversion (MISI) algorithm from Gunawan and Sen. The proposed method is then tested against conventional MISI and Wiener filtering on monophonic signals and in oracle conditions. Results show that, at the cost of a longer computation time, our method outperforms both MISI and Wiener filtering in oracle conditions, with much higher objective quality, even with phase quantization. Keywords: Computation time; Monophonic signals; Multiple inputs; Phase information; Phase quantization; Separation methods; Separation techniques; Spectrograms; Time frequency; Wiener filtering; Adaptive filtering; Iterative methods; Separation; Source separation
Piano sound analysis using Non-negative Matrix Factorization with inharmonicity constraint Rigaud, F., B. David, and L. Daudet European Signal Processing Conference, 2462-2466 (2012) Abstract: This paper presents a method for estimating the tuning and the inharmonicity coefficient of piano tones, from single notes or chord recordings. It is based on the Non-negative Matrix Factorization (NMF) framework, with a parametric model for the dictionary atoms. The key point here is to include, as a relaxed constraint, the inharmonicity law modelling the frequencies of transverse vibrations for stiff strings. Applications show that this can be used to finely estimate the tuning and the inharmonicity coefficient of several notes, even in the case of high polyphony. The use of NMF makes this method relevant when tasks like music transcription or source/note separation are targeted. Keywords: inharmonicity coefficient estimation; non-negative matrix factorization; piano tuning; Keypoints; Music transcription; Nonnegative matrix factorization; Parametric models; Piano sounds; Transverse vibrations; Estimation; Factorization; Musical instruments; Signal processing; Harmonic analysis
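As a baseline for the NMF framework mentioned above, here is a minimal sketch of unconstrained NMF with the standard Lee-Seung multiplicative updates for the Euclidean cost. In the paper, the spectral atoms (columns of W) are constrained by a parametric inharmonic model. In this sketch, they are free nonnegative vectors, so it only illustrates the generic factorization V ≈ W H.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """NMF V ~ W @ H with Lee-Seung multiplicative updates (Euclidean cost).

    V: nonnegative (n_freq, n_frames) magnitude or power spectrogram.
    W: spectral templates (atoms); H: their activations over time.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Multiplicative updates do not increase the Euclidean cost, which makes them a convenient starting point before adding model-based constraints on the atoms.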
Informed audio source separation: A comparative study
Liutkus, A., S. Gorlow, N. Sturmel, S. Zhang, L. Girin, R. Badeau, L. Daudet, S. Marchand, and G. Richard. European Signal Processing Conference, 2397-2401 (2012)
Abstract: The goal of source separation algorithms is to recover the constituent sources, or audio objects, from their mixture. However, blind algorithms still do not yield estimates of sufficient quality for many practical uses. Informed Source Separation (ISS) is one way to make separation robust when the audio objects are known during a so-called encoding stage. During that stage, a small amount of side information is computed and transmitted with the mixture. At a decoding stage, when the sources are no longer available, the mixture is processed using the side information to recover the audio objects, thus greatly improving the quality of the estimates at the cost of an additional bitrate that depends on the size of the side information. In this study, we compare six state-of-the-art methods in terms of quality versus bitrate, and show that good separation performance can be attained at competitive bitrates. © 2012 EURASIP.
Keywords: Audio source separation; Bit rates; Blind algorithms; Comparative studies; Separation algorithms; Separation performance; Side information; State of the art; Algorithms; Mixtures; Separation; Source separation
Audio source separation informed by redundancy with greedy multiscale decompositions
Moussallam, M., G. Richard, and L. Daudet. European Signal Processing Conference, 2644-2648 (2012)
Abstract: This paper describes a greedy algorithm for audio source separation of repeated musical patterns. The problem is understood as retrieving, from a set of mixtures, the part that is redundant among them and the parts that are specific to only one mixture. The key assumption is that all the sources are sparse in the same multiscale dictionary. Synthetic and real-life examples of source separation of hand-cut repeated musical patterns are presented. Results show that the proposed method succeeds in simultaneously providing a sparse approximant of the mixtures and a separation of the sources. © 2012 EURASIP.
Keywords: audio source separation; greedy decompositions; Simultaneous sparse approximation; Approximants; Audio source separation; Greedy algorithms; Multi-scale Decomposition; Multiscales; Sparse approximations; Mixtures; Source separation
A framework for fingerprint-based detection of repeating objects in multimedia streams
Fenet, S., M. Moussallam, Y. Grenier, G. Richard, and L. Daudet. European Signal Processing Conference, 1464-1468 (2012)
Abstract: We present an original framework for the detection of repeating objects in multimedia streams. This framework is designed to work with any fingerprint model. A fingerprint is extracted for each incoming frame of the multimedia stream. The framework then manages this fingerprint so that, if a similar frame appears later in the stream, it will be identified as a repetition. The framework has been tested with two distinct fingerprint models on simulated and real-world data. The results show that the framework performs well with both models and that it is suitable for industrial use cases. © 2012 EURASIP.
Keywords: Fingerprint; framework; indexing; repeating objects; Fingerprint; framework; Multimedia stream; Real world data; repeating objects; Computer simulation; Indexing (of information); Industrial applications; Media streaming; Signal processing; Pattern recognition
Narrowband source localization in an unknown reverberant environment using wavefield sparse decomposition
Chardon, G., and L. Daudet. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 9-12 (2012)
Abstract: We propose a method for narrowband localization of sources in an unknown reverberant field. A sparse model for the wavefield is introduced, derived from the physical equations. We compare two localization algorithms that take advantage of the structured sparsity naturally present in the model: a greedy iterative scheme and an ℓ1 minimization method. Both methods are validated in 2D on numerical simulations and on experimental data with a chaotic-shaped plate. These results, robust with respect to the specific sampling of the field and to noise, show that this approach may be an interesting alternative to traditional approaches to source localization when a large number of narrowband sensors are deployed. © 2012 IEEE.
Keywords: acoustic waves; plate vibrations; room acoustics; source localization; sparsity; Iterative schemes; Localization algorithm; Localization of sources; Minimization methods; Narrow bands; Physical equations; Plate vibration; Reverberant environment; Room acoustics; Source localization; Sparse decomposition; sparsity; Wavefields; Acoustic waves; Acoustics; Architectural acoustics; Signal processing; Iterative methods
Blind calibration for compressed sensing by convex optimization
Gribonval, R., G. Chardon, and L. Daudet. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2713-2716 (2012)
Abstract: We consider the problem of calibrating a compressed sensing measurement system under the assumption that the decalibration consists of unknown gains on each measurement. We focus on blind calibration, using measurements performed on a few unknown (but sparse) signals. A naive formulation of this blind calibration problem, using ℓ1 minimization, is reminiscent of blind source separation and dictionary learning, which are known to be highly non-convex and riddled with local minima. In the considered context, we show that this formulation can in fact be exactly expressed as a convex optimization problem, and can be solved using off-the-shelf algorithms. Numerical simulations demonstrate the effectiveness of the approach even for highly uncalibrated measurements, when a sufficient number of (unknown, but sparse) calibrating signals is provided. We observe that the success/failure of the approach seems to obey sharp phase transitions. © 2012 IEEE.
Keywords: blind signal separation; calibration; compressed sensing; dictionary learning; sparse recovery; Blind Signal Separation; Calibration problems; Compressive sensing; Convex optimization problems; Dictionary learning; Local minimums; Measurement system; Sparse recovery; Blind source separation; Convex optimization; Signal reconstruction; Calibration
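The key step behind a convex reformulation of this kind can be sketched as follows. This is a reading of the abstract under stated assumptions (positive gains d_i, L calibrating signals, a known measurement matrix A with m rows); in particular, the normalization constraint on δ is an assumption of this sketch, included only to exclude the trivial all-zero solution.

```latex
% L unknown sparse calibrating signals x_\ell, measured through a known matrix A
% (m rows) with unknown per-measurement gains d_i > 0:
y_\ell = D\,A\,x_\ell, \qquad D = \operatorname{diag}(d_1,\dots,d_m), \qquad \ell = 1,\dots,L.
% The naive problem is bilinear in (D, x_\ell), hence non-convex.
% Change of variables: \delta_i = 1/d_i, \; \Delta = \operatorname{diag}(\delta_1,\dots,\delta_m):
\Delta\, y_\ell = A\,x_\ell \qquad \text{(jointly linear in } (\delta, x_\ell)\text{)}.
% Resulting convex program (\ell_1 objective, linear constraints):
\min_{\delta,\,x_1,\dots,x_L} \; \sum_{\ell=1}^{L} \lVert x_\ell \rVert_1
\quad \text{s.t.} \quad \operatorname{diag}(\delta)\,y_\ell = A\,x_\ell \;\; (\ell = 1,\dots,L),
\qquad \sum_{i=1}^{m} \delta_i = m.
% The last (assumed) constraint rules out \delta = 0, \; x_\ell = 0.
```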
Random time-frequency subdictionary design for sparse representations with greedy algorithms
Moussallam, M., L. Daudet, and G. Richard. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 3577-3580 (2012)
Abstract: Sparse signal approximation can be used to design efficient low bit-rate coding schemes. It relies heavily on the ability to design appropriate dictionaries and corresponding decomposition algorithms. The size of the dictionary, and therefore its resolution, is a key parameter that controls the trade-off between sparsity and tractability. This work proposes the use of a non-adaptive random sequence of subdictionaries in a greedy decomposition process, thus browsing a larger dictionary space in a probabilistic fashion with no additional projection cost or parameter estimation. This technique leads to very sparse decompositions at a controlled computational complexity. Experimental evaluation is provided as a proof of concept for low bit-rate compression of audio signals. © 2012 IEEE.
Keywords: Matching Pursuits; Random Subdictionaries; Sparse Audio Coding; Audio Coding; Audio signal; Bit-rate coding; Decomposition algorithm; Decomposition process; Experimental evaluation; Greedy algorithms; Key parameters; Low Bit Rate; Matching pursuit; Proof of concept; Random sequence; Random Subdictionaries; Sparse decomposition; Sparse representation; Sparse signals; Time frequency; Algorithms; Parameter estimation; Signal processing; Design
Structured Bayesian orthogonal matching pursuit
Drémeau, A., C. Herzet, and L. Daudet. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 3625-3628 (2012)
Abstract: Taking advantage of the structures inherent in many sparse decompositions constitutes a promising research axis. In this paper, we address this problem from a Bayesian point of view. We exploit a Boltzmann machine, which allows a large variety of structures to be taken into account, and focus on the resolution of a joint maximum a posteriori problem. The proposed algorithm, called Structured Bayesian Orthogonal Matching Pursuit (SBOMP), is a structured extension of the Bayesian Orthogonal Matching Pursuit algorithm (BOMP) introduced in our previous work [1]. In numerical tests involving a recovery problem, SBOMP is shown to perform well over a wide range of sparsity levels while keeping a reasonable computational complexity. © 2012 IEEE.
Keywords: Boltzmann machine; greedy algorithm; Structured sparse representation; Boltzmann machines; Greedy algorithms; Maximum a posteriori; Numerical tests; Orthogonal matching pursuit; Sparse decomposition; Sparse representation; Signal processing; Algorithms
Iterative phase reconstruction of Wiener filtered signals
Sturmel, N., and L. Daudet. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 101-104 (2012)
Abstract: This paper deals with phase estimation in the framework of underdetermined blind source separation, using an estimated spectrogram of the source and its associated Wiener filter. By thresholding the Wiener mask, two domains are defined on the spectrogram: a confidence domain, where the phase is kept equal to the phase of the mixture, and its complement, where the phase is updated with a projection similar to the widely used Griffin and Lim technique. We show that, with this simple technique, the choice of parameters results in a simple trade-off between distortion and interference. Experiments show that this technique brings significant improvements over the classical Wiener filter, while being much faster than other iterative methods. © 2012 IEEE.
Keywords: Blind source separation; Phase reconstruction; Spectrogram; STFT; Wiener filter; Choice of parameters; Confidence domain; Filtered signals; Phase estimation; Phase reconstruction; Spectrograms; STFT; Thresholding; Two domains; WIENER filters; Blind source separation; Signal processing; Spectrographs; Iterative methods
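The thresholding step described above can be sketched in a few lines of Python. This minimal sketch only computes the Wiener mask of one source and splits the time-frequency plane into the confidence domain (mask above a threshold `tau`) and its complement; the Griffin-Lim-like phase update on the complement is not shown. The input layout (nested lists indexed `[source][frequency][time]`), the function names, and the `eps` guard are assumptions made for illustration.

```python
def wiener_mask_domains(powers, j, tau, eps=1e-12):
    """Wiener mask of source j and the confidence domain obtained by thresholding it.

    powers: estimated power spectrograms, powers[k][f][t] >= 0 (hypothetical layout)
    j:      index of the target source
    tau:    threshold on the mask (assumed 0 < tau < 1)
    Returns (mask, confident): mask[f][t] in [0, 1], confident[f][t] True where mask > tau.
    """
    n_freq, n_time = len(powers[j]), len(powers[j][0])
    mask = [[0.0] * n_time for _ in range(n_freq)]
    confident = [[False] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            total = sum(p[f][t] for p in powers)
            m = powers[j][f][t] / (total + eps)  # eps avoids 0/0 in silent bins
            mask[f][t] = m
            confident[f][t] = m > tau
    return mask, confident


def wiener_estimate(mix, mask):
    """Masked mixture STFT: with a real mask, the estimate keeps the mixture phase."""
    return [[m * x for m, x in zip(mrow, xrow)] for mrow, xrow in zip(mask, mix)]


powers = [[[3.0, 1.0], [0.0, 4.0]],   # source 0
          [[1.0, 3.0], [0.0, 0.0]]]   # source 1
mask, confident = wiener_mask_domains(powers, j=0, tau=0.5)
print(confident)  # [[True, False], [False, True]]
```

Bins outside the confidence domain are the ones whose phase an iterative projection step would then re-estimate, while confident bins keep the mixture phase.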
Dynamic strategy for window splitting, parameters estimation and interpolation in spatial parametric audio coders
Capobianco, J., G. Pallone, and L. Daudet. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 397-400 (2012)
Abstract: In most parametric stereo audio coders, sets of spatial parameters are extracted from the audio channels in a time-frequency domain. In order to reduce the amount of data, the parameter plane is heavily downsampled and transmitted together with a mono downmix. Then, in the decoding process, the upmix matrix computed from these parameters must be interpolated. Usually, this is done in the same way for each portion of the signal, regardless of its nature. In this article, we propose a dynamic strategy for window splitting, parameter estimation and interpolation of the upmix matrix, based on transient detection in the audio signal. Subjective tests show an improvement when this strategy is applied to the new parametric stereo tool of MPEG USAC. © 2012 IEEE.
Keywords: Parametric audio coding; stereo; Audio channels; Audio coders; Audio signal; Decoding process; Dynamic strategies; Parameters estimation; Parametric audio coding; Parametric stereo; Spatial parameters; stereo; Subjective tests; Time frequency domain; Transient detection; Interpolation; Motion Picture Experts Group standards; Signal processing; Speech coding; Parameter estimation
Linear mixing models for active listening of music productions in realistic studio conditions
Sturmel, N., A. Liutkus, J. Pinel, L. Girin, S. Marchand, G. Richard, R. Badeau, and L. Daudet. 132nd Audio Engineering Society Convention 2012, 780-789 (2012)
Abstract: The mixing/demixing of audio signals as addressed in the signal processing literature (the "source separation" problem) and music production in the studio remain quite separate worlds. Scientific audio scene analysis focuses instead on "natural" mixtures and most often uses linear (convolutive) models of point sources placed in the same acoustic space. In contrast, the sound engineer can mix musical signals of very different natures, belonging to different acoustic spaces, and exploits many audio effects, including non-linear processes. In the present paper we discuss these differences within the strongly emerging framework of active music listening, which lies precisely at the crossroads of these two worlds: it consists of giving the listener the ability to manipulate the different musical sources while listening to a musical piece. We propose a model that describes a general studio mixing process as a linear stationary process of "generalized source image signals" considered as individual tracks. Such a model can be used to recover the isolated tracks while preserving the professional sound quality of the mixture. A simple addition of these recovered tracks enables the end-user to recover the full-quality stereo mix, while these tracks can also be used for, e.g., basic remix / karaoke / soloing and re-orchestration applications.
Keywords: Audio effects; Audio scenes; Audio signal; End users; Karaoke; Linear mixing models; Mixing process; Music production; Musical pieces; Musical signals; Nonlinear process; Point sources; Sound Quality; Source images; Stationary process; Recovery; Signal analysis; Studios; Audio acoustics
Matching Pursuits with random sequential subdictionaries
Moussallam, M., L. Daudet, and G. Richard. Signal Processing 92, no. 10, 2532-2544 (2012)
Abstract: Matching Pursuits are a class of greedy algorithms commonly used in signal processing for solving the sparse approximation problem. They rely on an atom selection step that requires the calculation of numerous projections, which can be computationally costly for large dictionaries and limits their competitiveness in coding applications. We propose using a non-adaptive random sequence of subdictionaries in the decomposition process, thus parsing a large dictionary in a probabilistic fashion with no additional projection cost or parameter estimation. A theoretical model based on order statistics is provided, along with experimental evidence showing that the novel algorithm can be efficiently used on sparse approximation problems. An application to audio signal compression with multiscale time-frequency dictionaries is presented, along with a discussion of the complexity and practical implementations. © 2012 Elsevier B.V. All rights reserved.
Keywords: Audio signal compression; Matching Pursuits; Random dictionaries; Sparse approximation; Audio signal compression; Decomposition process; Experimental evidence; Greedy algorithms; Matching pursuit; Multiscales; Novel algorithm; Order statistics; Practical implementation; Random sequence; Sparse approximations; Theoretical modeling; Time frequency; Competition; Parameter estimation; Signal encoding; Approximation algorithms
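To make the idea concrete, here is a minimal Python sketch of Matching Pursuit in which each iteration searches only a randomly drawn subdictionary. It is a simplification, not the authors' implementation: the paper uses structured multiscale time-frequency subdictionaries, whereas here a subdictionary is assumed to be a random subset of `sub_size` atoms from a generic unit-norm dictionary. All names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def mp_random_subdicts(signal, atoms, sub_size, n_iter, seed=0):
    """Matching Pursuit where each iteration searches a random subdictionary only.

    atoms:    full dictionary, a list of unit-norm atoms (lists of floats)
    sub_size: number of atoms drawn without replacement at each iteration
    Returns (selected, residual, norms): selected is a list of
    (atom_index, coefficient) pairs, norms the residual norm after each step.
    """
    rng = random.Random(seed)
    residual = list(signal)
    selected = []
    norms = [math.sqrt(dot(residual, residual))]
    for _ in range(n_iter):
        # Non-adaptive random subdictionary: only these atoms are projected on.
        candidates = rng.sample(range(len(atoms)), sub_size)
        best = max(candidates, key=lambda k: abs(dot(residual, atoms[k])))
        coef = dot(residual, atoms[best])
        # Standard MP update: subtract the projection onto the selected atom.
        residual = [r - coef * a for r, a in zip(residual, atoms[best])]
        selected.append((best, coef))
        norms.append(math.sqrt(dot(residual, residual)))
    return selected, residual, norms


# Toy example: random Gaussian dictionary, signal built from 3 of its atoms.
gen = random.Random(1)
atoms = [normalize([gen.gauss(0.0, 1.0) for _ in range(16)]) for _ in range(64)]
signal = [2.0 * a - 1.5 * b + 0.7 * c for a, b, c in zip(atoms[3], atoms[10], atoms[42])]
selected, residual, norms = mp_random_subdicts(signal, atoms, sub_size=16, n_iter=20)
print(f"residual norm: {norms[0]:.3f} -> {norms[-1]:.3f}")
```

Because each atom has unit norm, removing the projection `coef * atom` lowers the squared residual norm by exactly `coef**2`, so the residual norm never increases, whichever subdictionary is drawn. Note that `rng.sample` requires `sub_size <= len(atoms)`.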
Near-field acoustic holography using sparse regularization and compressive sampling principles
Chardon, G., L. Daudet, A. Peillot, F. Ollivier, N. Bertin, and R. Gribonval. Journal of the Acoustical Society of America 132, no. 3, 1521-1534 (2012)
Abstract: Regularization of the inverse problem is a complex issue when using near-field acoustic holography (NAH) techniques to identify the vibrating sources. This paper shows that, for convex homogeneous plates with arbitrary boundary conditions, alternative regularization schemes can be developed based on the sparsity of the normal velocity of the plate in a well-designed basis, i.e., the possibility of approximating it as a weighted sum of a few elementary basis functions. In particular, these techniques can handle discontinuities of the velocity field at the boundaries, which can be problematic with standard techniques. This comes at the cost of a higher computational complexity to solve the associated optimization problem, though it remains easily tractable with out-of-the-box software. Furthermore, this sparsity framework allows us to take advantage of the concept of compressive sampling; under some conditions on the sampling process (here, the design of a random array, which can be numerically and experimentally validated), it is possible to reconstruct the sparse signals with significantly fewer measurements (i.e., microphones) than classically required. After introducing the different concepts, this paper presents numerical and experimental results of NAH with two plate geometries, and compares the advantages and limitations of these sparsity-based techniques with those of standard Tikhonov regularization. © 2012 Acoustical Society of America.
Keywords: Arbitrary boundary conditions; Basis functions; Compressive sampling; Homogeneous plates; Nearfield Acoustic Holography; Optimization problems; Random array; Regularization schemes; Sampling process; Sparse signals; Tikhonov regularization; Two plates; Velocity field; Weighted Sum; Acoustic holography; Inverse problems; Velocity; Signal sampling; acoustics; algorithm; article; computer simulation; holography; instrumentation; mathematical computing; methodology; regression analysis; reproducibil
Boltzmann machine and mean-field approximation for structured sparse decompositions
Drémeau, A., C. Herzet, and L. Daudet. IEEE Transactions on Signal Processing 60, no. 7, 3425-3438 (2012)
Abstract: Taking advantage of the structures inherent in many sparse decompositions constitutes a promising research axis. In this paper, we address this problem from a Bayesian point of view. We exploit a Boltzmann machine, which allows a large variety of structures to be taken into account, and focus on the resolution of a marginalized maximum a posteriori problem. To solve this problem, we resort to a mean-field approximation and the "variational Bayes expectation-maximization" algorithm. This approach results in a soft procedure that makes no hard decision on the support or the values of the sparse representation. We show that this characteristic leads to improved performance over state-of-the-art algorithms. © 2012 IEEE.
Keywords: Bernoulli-Gaussian model; Boltzmann machine; mean-field approximation; structured sparse representation; Boltzmann machines; Expectation Maximization; Hard decisions; Maximum a posteriori; Mean field approximation; Sparse decomposition; Sparse representation; State-of-the-art algorithms; Variational bayes; Algorithms; Problem solving
Experimentally based description of harp plucking
Chadefaux, D., J.-L. Le Carrou, B. Fabre, and L. Daudet. Journal of the Acoustical Society of America 131, no. 1, 844-855 (2012)
Abstract: This paper describes an experimental study of string plucking for the classical harp. Its goal is to characterize the playing parameters that play the most important roles in expressivity, and in the way harp players recognize each other, even on isolated notes (what we call the acoustical signature of each player). We have designed a specific experimental setup using a high-speed camera that tracks markers on the fingers and on the string. This provides accurate three-dimensional positioning of the finger and of the string throughout the plucking action, in different musical contexts. From measurements of ten harp players, combined with measurements of the soundboard vibrations, we extract a set of parameters that finely control the initial conditions of the string's free oscillations. Results indicate that these initial conditions are typically a complex mix of displacement and velocity, with additional rotation. Although these control parameters are remarkably reproducible for a single player (all the more so for professional players), we observe that some of them vary significantly from one player to another. © 2012 Acoustical Society of America.
Keywords: Control parameters; Experimental setup; Experimental studies; Free oscillation; Initial conditions; Acoustics; Physics; acoustics; article; finger; human; motion; motor performance; music; physiology; sound detection; tensile strength; touch; vibration; Acoustics; Fingers; Humans; Motion; Motor Skills; Music; Sound Spectrography; Tensile Strength; Touch; Vibration
Compressed sensing for acoustic response reconstruction: Interpolation of the early part
Mignot, R., L. Daudet, and F. Ollivier. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 225-228 (2011)
Abstract: The goal of this paper is to interpolate Room Impulse Responses (RIRs) within a whole volume from a few measurements. Here we focus on the early reflections, which have the key property of being sparse in the time domain: this can be exploited in a framework of model-based Compressed Sensing. Starting from a set of RIRs randomly sampled in space by a 3D microphone array, we use a modified Matching Pursuit algorithm to estimate the positions of a small set of virtual sources. Then, the RIRs at interpolated positions are reconstructed using a projection onto a basis of monopoles. This approach is validated both by numerical simulations and by experimental measurements using a 120-microphone 3D array. © 2011 IEEE.
Keywords: Compressed Sensing; Interpolation; Microphone Arrays; Room Impulse Responses; Source Localization; 3D arrays; Acoustic response; Compressed sensing; Experimental measurements; Matching pursuit algorithms; Microphone Arrays; Room impulse response; Source localization; Time domain; Virtual sources; Audio signal processing; Interpolation; Microphones; Signal reconstruction; Three dimensional; Audio acoustics
A parametric model of piano tuning
Rigaud, F., B. David, and L. Daudet. Proceedings of the 14th International Conference on Digital Audio Effects, DAFx 2011, 393-400 (2011)
Abstract: A parametric model of aural tuning of acoustic pianos is presented in this paper. From a few parameters, a model of the whole tessitura is obtained, which can be applied to any kind of piano. Because the tuning of a piano is strongly linked to the inharmonicity of its strings, a 2-parameter model for the inharmonicity coefficient along the keyboard is introduced. Constrained by piano string design considerations, its estimation requires only a few notes in the bass range. Then, from tuning rules, we propose a 4-parameter model for the evolution of the fundamental frequency over the whole tessitura, taking into account the model of the inharmonicity coefficient. The global model is applied to 5 different pianos (4 grand pianos and 1 upright piano) to check the quality of the tuning. Besides the generation of tuning reference curves for non-professional tuners, potential applications could include the parametrization of synthesizers, or its use in transcription/source separation algorithms as a physical constraint to increase robustness.
Keywords: Acoustic pianos; Fundamental frequencies; Global models; Grand piano; Parametric models; Parametrizations; Physical constraints; Piano strings; Piano tuning; Potential applications; Reference curves; Separation algorithms; Tuning rules; Algorithms; Models; Musical instruments
Decompositions in sound elements and musical applications
Lagrange, M., R. Badeau, B. David, N. Bertin, O. Derrien, S. Marchand, and L. Daudet. Traitement du Signal 28, no. 6, 665-689 (2011)
Abstract: This paper presents the DESAM project, which was divided into two parts. The first was devoted to the theoretical and experimental study of parametric and non-parametric techniques for decomposing audio signals into sound elements. The second part focused on some musical applications of these decompositions. Most aspects considered in this project led to new methods, which have been grouped together into the so-called DESAM Toolbox, a set of Matlab® functions dedicated to the estimation of widely used spectral models for music signals. Although these models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. Rather, the toolbox aims to provide a range of state-of-the-art signal processing tools that decompose music recordings according to different signal models, giving rise to different "mid-level" representations. © 2011 Lavoisier.
Keywords: Audio processing; Sound modeling; Spectral models; Audio processing; Core functions; Music information retrieval; Music recording; Non-parametric techniques; Sound modeling; Spectral models; Theoretical and experimental; Audio signal processing; Decomposition; Audio acoustics
Localization and identification of sound sources using "compressive sampling" techniques
Peillot, A., F. Ollivier, G. Chardon, and L. Daudet. 18th International Congress on Sound and Vibration 2011, ICSV 2011 4, 2713-2720 (2011)
Abstract: "Compressive sampling" (CS) is a new signal acquisition strategy that aims to significantly reduce the amount of recorded data by acquiring only a limited number of samples. CS theory asserts that a given signal can be reconstructed from a few randomly distributed samples, provided that the signal is sparse in a suitable basis. CS ensures a minimum loss of information but requires dedicated sparsity-promoting algorithms to reconstruct the signal. In this paper, CS is applied to the source localization problem using an array of randomly distributed microphones. In this case, the signal of interest is sparse in the spatial domain, i.e., only a few positions in space contain sources. We focus on near-field beamforming, where the sensor array is sensitive to the directivity of the sources. The localization method is extended to complex sources, and we attempt to identify them in terms of multipoles. Numerical simulations and experimental results show this sparsity-promoting method to be effective for source localization. However, the identification step, although quite successful on ideal data, is not sufficiently robust when applied to experimental data and needs further investigation.
Keywords: Array of sensors; Compressive sampling; Directivity; Localization and identification; Localization method; Multipoles; Near-field; Number of samples; Randomly distributed; Signal acquisitions; Signal of interests; Sound source; Source localization; Spatial domains; Safety engineering; Vibrations (mechanical); Signal processing
Signal reconstruction from STFT magnitude: A state of the art
Sturmel, N., and L. Daudet. Proceedings of the 14th International Conference on Digital Audio Effects, DAFx 2011, 375-386 (2011)
Abstract: This paper presents a review of techniques for signal reconstruction without phase, i.e., when only the spectrogram (the squared magnitude of the Short-Time Fourier Transform) of the signal is known. The now standard Griffin and Lim algorithm is presented and compared to more recent blind techniques. Two important issues are raised and discussed: first, the definition of relevant criteria to evaluate the performance of different algorithms, and second, the question of the uniqueness of the solution. Some ways of reducing the complexity of the problem by injecting additional information into the reconstruction are presented. Finally, issues that prevent optimal reconstruction are examined, leading to a discussion of what seem to be the most promising approaches for future research.
Keywords: Blind technique; Short time Fourier transforms; Spectrograms; State of the art; Algorithms; Signal reconstruction; Signal analysis
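As a reference point for the review above, the standard Griffin and Lim iteration alternates between imposing the known STFT magnitude and going back to the closest time-domain signal. The Python sketch below is self-contained: it uses a naive O(N^2) DFT for readability rather than speed, and the periodic Hann window, hop size, test signal, and helper names (`dft`, `idft`, `stft`, `istft`, `griffin_lim`) are arbitrary choices for illustration, not taken from the paper.

```python
import cmath
import math
import random


def dft(a):
    n = len(a)
    return [sum(a[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]


def stft(x, win, hop):
    n = len(win)
    return [dft([win[k] * x[s + k] for k in range(n)])
            for s in range(0, len(x) - n + 1, hop)]


def istft(frames, win, hop, length):
    """Least-squares inverse STFT (weighted overlap-add, as in Griffin and Lim)."""
    n = len(win)
    num, den = [0.0] * length, [0.0] * length
    for m, frame in enumerate(frames):
        y = idft(frame)
        for k in range(n):
            num[m * hop + k] += win[k] * y[k].real
            den[m * hop + k] += win[k] ** 2
    # Samples only seen through zero window values are unconstrained: set to 0.
    return [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]


def griffin_lim(mag, win, hop, length, n_iter=30, seed=0):
    rng = random.Random(seed)
    # Initial estimate: known magnitudes with random phases.
    spec = [[m * cmath.exp(2j * math.pi * rng.random()) for m in row] for row in mag]
    errors = []
    for _ in range(n_iter):
        x = istft(spec, win, hop, length)   # closest signal to the current estimate
        X = stft(x, win, hop)
        errors.append(math.sqrt(sum((abs(c) - m) ** 2
                                    for row_c, row_m in zip(X, mag)
                                    for c, m in zip(row_c, row_m))))
        # Keep the phase of the re-analysed STFT, impose the known magnitude.
        spec = [[m * cmath.exp(1j * cmath.phase(c)) for c, m in zip(row_c, row_m)]
                for row_c, row_m in zip(X, mag)]
    return x, errors


N, HOP, LENGTH = 16, 4, 64
win = [0.5 - 0.5 * math.cos(2 * math.pi * k / N) for k in range(N)]  # periodic Hann
target = [math.sin(2 * math.pi * 3 * k / LENGTH) + 0.5 * math.sin(2 * math.pi * 7.3 * k / LENGTH)
          for k in range(LENGTH)]
mag = [[abs(c) for c in frame] for frame in stft(target, win, HOP)]
x_hat, errors = griffin_lim(mag, win, HOP, LENGTH)
print(f"magnitude error: first iteration {errors[0]:.4f}, last {errors[-1]:.4f}")
```

Because the inverse transform is the least-squares weighted overlap-add estimate, each iteration cannot increase the magnitude error stored in `errors`; this is the classical convergence property of the algorithm, which guarantees a non-increasing error but not recovery of the original phase.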
Recursive nearest neighbor search in a sparse and multiscale domain for comparing audio signals
Sturm, B. L., and L. Daudet. Signal Processing 91, no. 12, 2836-2851 (2011)
Abstract: We investigate recursive nearest neighbor search in a sparse domain at the scale of audio signals. Essentially, to approximate the cosine distance between the signals, we make pairwise comparisons between the elements of localized sparse models built from large and redundant multiscale dictionaries of time-frequency atoms. Theoretically, error bounds on these approximations provide efficient means for quickly reducing the search space to the nearest neighborhood of a given data point; but we demonstrate here that the best bound defined thus far involving a probabilistic assumption does not provide a practical approach for comparing audio signals with respect to this distance measure. Our experiments show, however, that regardless of these non-discriminative bounds, we only need to make a few atom-pair comparisons to reveal, e.g., the origin of an excerpted signal, or melodies with similar time-frequency structures. © 2011 Elsevier B.V. All rights reserved.
Keywords: Audio similarity; Multiscale decomposition; Sparse approximation; Time - frequency dictionary; Audio signal; Audio similarity; Distance measure; Error bound; Multi-scale Decomposition; Multiscales; Nearest Neighbor search; Nearest neighborhood; Pair comparisons; Pair-wise comparison; Probabilistic assumptions; Search spaces; Sparse approximation; Time - frequency dictionary; Time frequency; Time-frequency atoms; Error analysis
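The atom-pair comparison idea can be illustrated with a small Python sketch: if x ≈ sum_i a_i phi_i and y ≈ sum_j b_j psi_j, then <x, y> ≈ sum_i sum_j a_i b_j <phi_i, psi_j>, and keeping only the pairs with the largest |a_i b_j| gives a cheaper estimate. This sketch does not implement the recursive search or the error bounds discussed in the paper; the `(coefficient, atom)` data layout and the function names are assumptions, and generic vectors stand in for time-frequency atoms.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def approx_inner(model_x, model_y, max_pairs=None):
    """Approximate <x, y> from sparse models by summing atom-pair terms.

    model_x, model_y: lists of (coefficient, atom) pairs (hypothetical layout).
    max_pairs: if given, keep only the pairs with the largest |a_i * b_j|.
    """
    terms = [(abs(a * b), a * b, p, q) for a, p in model_x for b, q in model_y]
    terms.sort(key=lambda t: t[0], reverse=True)
    if max_pairs is not None:
        terms = terms[:max_pairs]
    return sum(w * dot(p, q) for _, w, p, q in terms)


def sparse_cosine(model_x, model_y, max_pairs=None):
    """Cosine similarity estimated from the sparse models only."""
    norm_x = math.sqrt(approx_inner(model_x, model_x))
    norm_y = math.sqrt(approx_inner(model_y, model_y))
    return approx_inner(model_x, model_y, max_pairs) / (norm_x * norm_y)


e1, e2, e3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
mx = [(3.0, e1), (1.0, e2)]
my = [(2.0, e1), (0.1, e3)]
print(approx_inner(mx, my))               # 6.0
print(approx_inner(mx, my, max_pairs=1))  # 6.0: the largest pair carries everything here
```

In this toy case a single atom-pair comparison already gives the exact inner product; with redundant, non-orthogonal dictionaries the truncated sum is only an approximation.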
Plate impulse response spatial interpolation with sub-Nyquist sampling
Chardon, G., A. Leblanc, and L. Daudet. Journal of Sound and Vibration 330, no. 23, 5678-5689 (2011)
Abstract: Impulse responses of vibrating plates are classically measured on a fine spatial grid satisfying the Shannon-Nyquist spatial sampling criterion, and interpolated between measurement points. For homogeneous and isotropic plates, this study proposes a more efficient sampling and interpolation process, inspired by the recent paradigm of compressed sensing. Remarkably, this method can accommodate any star-convex shape and unspecified boundary conditions. Here, impulse responses are first decomposed as sums of damped sinusoids, using the Simultaneous Orthogonal Matching Pursuit algorithm. Then, modes are interpolated using a plane-wave decomposition. As a beneficial side effect, these algorithms can also be used to obtain the dispersion curve of the plate with a limited number of measurements. Experimental results are given for three plates with different shapes and boundary conditions, and compared to classical Shannon interpolation. © 2011 Elsevier Ltd. All rights reserved.
Keywords: A-plane; Compressed sensing; Damped sinusoids; Dispersion curves; Efficient sampling; Interpolation process; Isotropic plates; Measurement points; Orthogonal matching pursuit; Side effect; Spatial grids; Spatial interpolation; Spatial sampling; Sub-Nyquist sampling; Vibrating plate; Algorithms; Boundary conditions; Impulse response; Interpolation
Compressively sampling the plenacoustic function
Mignot, R., G. Chardon, and L. Daudet. Proceedings of SPIE - The International Society for Optical Engineering 8138 (2011)
Abstract: Directly measuring the full set of acoustic impulse responses within a room would require an unreasonably large number of measurements. Since the acoustic wavefield is sparse in some dictionaries, Compressed Sensing allows the full wavefield to be recovered from a reduced set of measurements, but raises challenging computational and memory issues. Two practical algorithms are presented and compared: one that exploits the structured sparsity of the soundfield, with projections of the modes onto plane waves sharing the same wavenumber, and one that computes a sparse decomposition on a dictionary of independent plane waves with time/space variable separation. © 2011 Copyright Society of Photo-Optical Instrumentation Engineers (SPIE).
Keywords: Compressed Sensing; Interpolation; Plane waves; Room Impulse Responses; Sparsity; Compressed sensing; Plane wave; Practical algorithms; Room impulse response; Sparse decomposition; Sparsity; Variable separation; Wave numbers; Wavefields; Signal reconstruction; Elastic waves
Soft Bayesian pursuit algorithm for sparse representations. Drémeau, A., C. Herzet, and L. Daudet. IEEE Workshop on Statistical Signal Processing Proceedings, 341-344 (2011).
Abstract: This paper deals with sparse representations within a Bayesian framework. For a Bernoulli-Gaussian model, we propose a method based on a mean-field approximation to estimate the support of the signal. In numerical tests on a recovery problem, the resulting algorithm performs well over a wide range of sparsity levels compared to various state-of-the-art algorithms. © 2011 IEEE.
Keywords: Bernoulli-Gaussian model; Mean-field approximation; Sparse representations; Bayesian frameworks; Numerical tests; State-of-the-art algorithms; Signal processing; Algorithms
Audio signal representations for factorization in the sparse domain. Moussallam, M., L. Daudet, and G. Richard. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 513-516 (2011).
Abstract: In this paper, a new class of audio representations is introduced, together with a corresponding fast decomposition algorithm. The main feature of these representations is that they are both sparse and approximately shift-invariant, which allows similarity search in a sparse domain. The common sparse support of detected similar patterns is then used to factorize their representations. The potential of this method for simultaneous structural analysis and compression tasks is illustrated by preliminary experiments on simple musical data. © 2011 IEEE.
Keywords: Audio signal decomposition; Audio similarity; Factorization; Matching pursuit; Sparse representation; Audio representation; Audio signal; Fast decomposition; Shift invariant; Similar pattern; Similarity search; Audio acoustics; Speech communication; Structural analysis; Audio signal processing
Optimal subsampling of multichannel damped sinusoids. Chardon, G., and L. Daudet. 2010 IEEE Sensor Array and Multichannel Signal Processing Workshop, SAM 2010, 25-28 (2010).
Abstract: In this paper, we investigate optimal ways to sample multichannel impulse responses composed of a small number of exponentially damped sinusoids, under the constraint that the total number of samples is fixed (for instance, because of limited storage or computational power). We compute Cramér-Rao bounds for multichannel estimation of the parameters of a damped sinusoid. These bounds give the measurement duration that yields the best results: roughly twice the typical decay time of the sinusoid. Because of bandwidth constraints, the signals are best sampled irregularly. Variants of Matching Pursuit and MUSIC adapted to irregular sampling and multichannel data are compared to the Cramér-Rao bounds. In practical situations, this method leads to savings in memory, data throughput and computational complexity. © 2010 IEEE.
Keywords: Array signal processing; Compressed sensing; Spectral analysis; Bandwidth constraint; Computational power; Damped sinusoids; Data throughput; Decay time; Exponentially damped sinusoids; Irregular sampling; Limited storage; Matching pursuit; Multi-channel; Multichannel data; Multichannel estimation; Number of samples; Optimization; Sensor arrays; Signal processing; Signal reconstruction; Spectrum analysis; Computational complexity
The DESAM toolbox: Spectral analysis of musical audio. Lagrange, M., R. Badeau, B. David, N. Bertin, J. Echeveste, O. Derrien, S. Marchand, and L. Daudet. 13th International Conference on Digital Audio Effects, DAFx 2010 Proceedings (2010).
Abstract: This paper presents the DESAM Toolbox, a set of Matlab functions dedicated to the estimation of widely used spectral models for music signals. Although these models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. Rather, the toolbox aims to provide a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different "mid-level" representations. After motivating the need for such a toolbox, this paper gives an overview of its overall organization and describes all available functionalities.
Keywords: Core functions; Matlab functions; Music files; Music information retrieval; Music signals; Musical audio; Signal models; Spectral models; Signal processing; Spectrum analysis; Audio acoustics
Hybrid coding/indexing strategy for informed source separation of linear instantaneous under-determined audio mixtures. Parvaix, M., L. Girin, L. Daudet, J. Pinel, and C. Baras. 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society 5, 3987-3994 (2010).
Abstract: We present a system for under-determined source separation of non-stationary audio signals from a two-channel (stereo) linear instantaneous mixture. The system is designed to isolate the different instruments/voices of a piece of music, so that an end-user can manipulate those source signals separately. The problem is addressed with a specific informed approach. It is implemented with a coder, corresponding to the music production stage, and a separate decoder, corresponding to the signal restitution stage. At the coder, the source signals are assumed to be available and are used to i) generate the two-channel stereo mix signal, and ii) extract a small set of distinctive features, which are embedded into the mix signal using an inaudible watermarking technique. At the decoder, an end-user who has no direct access to the original source signals extracts and exploits the watermark from the transmitted mix signal to separate those sources from the mix. In the present study, we propose a new hybrid system that merges two techniques of informed source separation. A subset of the source signals is encoded using a "sources-channel coding" approach, and another subset is selected for local inversion of the mixture. The respective codes and indexes are transmitted to the decoder using a new high-capacity watermarking technique. At the decoder, the encoded source signals are decoded and subtracted from the mixture signal. Local inversion of the remaining sub-mixture then yields estimates of the second subset of source signals. This hybrid separation technique efficiently combines the advantages of the coding and inversion approaches. We report experiments with 5 different source signals separated from stereo mixtures with remarkable quality, enabling separate manipulation during music restitution.
Keywords: Audio signal; End users; High-capacity; Hybrid coding; Instantaneous mixtures; Mixture signals; Music production; Nonstationary; Separation techniques; Source signals; Under-determined; Watermarking techniques; Audio watermarking; Hybrid systems; Separation; Signal analysis; Mixtures
Musical instrument identification using multiscale mel-frequency cepstral coefficients. Sturm, B. L., M. Morvidone, and L. Daudet. European Signal Processing Conference, 477-481 (2010).
Abstract: We investigate the benefits of evaluating Mel-frequency cepstral coefficients (MFCCs) over several time scales for automatic musical instrument identification. The signals are monophonic but derived from real musical settings. We define several sets of features derived from MFCCs computed using multiple time resolutions. We compare their performance against other features computed using a single time resolution, such as MFCCs and derivatives of MFCCs. We find that in both tasks (pairwise discrimination and one-vs-all classification), the features involving multiscale decompositions perform significantly better than features computed using a single time resolution. © EURASIP, 2010.
Keywords: Mel-frequency cepstral coefficients; Multi-scale decomposition; Multiscales; Musical instrument identification; Musical setting; Time resolution; Time-scales; Signal processing; Decomposition
How sparsely can a signal be approximated while keeping its class identity? Moussallam, M., T. Fillon, G. Richard, and L. Daudet. MML'10 - Proceedings of the 3rd ACM International Workshop on Machine Learning and Music, Co-located with ACM Multimedia 2010, 25-28 (2010).
Abstract: This paper explores the degree of sparsity of a signal approximation that can be reached while still retaining enough information to preserve the signal's main characteristics. Here, sparse approximations are obtained by decomposing the signals on an overcomplete dictionary of multiscale time-frequency "atoms". The resulting representation is highly dependent on the choice of dictionary, the decomposition algorithm and the depth of the decomposition. Class identity is measured indirectly, as the speech/music discrimination power of features derived from the sparse representation, compared with that of classical PCM-based features. Evaluation is performed on French broadcast TV and radio recordings from the QUAERO project database with two different statistical classifiers.
Keywords: Algorithms; Experimentation; Amount of information; Broadcast TV; Decomposition algorithm; Main characteristics; Multiscales; Overcomplete dictionaries; Project database; Signal approximation; Sparse approximations; Sparse representation; Speech/music discrimination; Statistical classifier; Time frequency; Classification (of information); Learning systems; Speech recognition
Structured and incoherent parametric dictionary design. Yaghoobi, M., L. Daudet, and M. E. Davies. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 5486-5489 (2010).
Abstract: A new dictionary selection approach for sparse coding, called parametric dictionary design, has recently been introduced. The aim is to choose a dictionary from a class of admissible dictionaries that can be represented parametrically. The designed dictionary satisfies a constraint, here the incoherence property, which can help conventional sparse coding methods find sparser solutions on average. In this paper, an extra constraint is applied to the parametric dictionaries to obtain a structured dictionary. Various structures can be imposed on dictionaries to promote correlation between the atoms. We deliberately choose a structure that allows the dictionary to be implemented as a set of filter banks, so that the dictionary-signal multiplications can be computed more efficiently. The price paid for this extra structure is that the designed dictionary is not as incoherent as dictionaries obtained by unstructured parametric design. ©2010 IEEE.
Keywords: Dictionary selection; Parametric dictionary design; Sparse approximation; Structured dictionary; Sparse coding; Filter banks; Signal processing; Design
Incorporating scale information with cepstral features: Experiments on musical instrument recognition. Morvidone, M., B. L. Sturm, and L. Daudet. Pattern Recognition Letters 31, no. 12, 1489-1497 (2010).
Abstract: We present two sets of novel features that combine multiscale representations of signals with the compact timbral description of Mel-frequency cepstral coefficients (MFCCs). We define one set of features, OverCs, from overcomplete transforms at multiple scales. We define the second set of features, SparCs, from a signal model found by sparse approximation. We compare the descriptiveness of our features against that of MFCCs on two simple tasks: pairwise musical instrument discrimination and musical instrument classification. Our tests show that, for these tasks, both OverCs and SparCs characterize the global timbre and local stationarity of an audio signal better than mean MFCCs do. © 2009 Elsevier B.V. All rights reserved.
Keywords: Audio signal classification; Musical instrument recognition; Sparse decompositions; Time-frequency/time-scale features; Audio signal; Cepstral features; Mel-frequency cepstral coefficients; Multiple scale; Multiscale representations; Over-complete; Signal models; Sparse approximations; Stationarity; Audio acoustics; Instruments; Signal analysis; Speech recognition; Statistical tests; Wavelet transforms; Electronic musical instruments
Pattern recognition of non-speech audio. Aucouturier, J.-J., and L. Daudet. Pattern Recognition Letters 31, no. 12, 1487-1488 (2010).
Editorial for the special issue on signal models and representations of musical and environmental sounds. David, B., M. Goto, L. Daudet, and P. Smaragdis. IEEE Transactions on Audio, Speech and Language Processing 18, no. 3, 417-419 (2010).
Parametric Dictionary Design for Sparse Coding. Yaghoobi, M., L. Daudet, and M. E. Davies. IEEE Transactions on Signal Processing 57, no. 12, 4800-4810 (2009).
Abstract: This paper introduces a new dictionary design method for sparse coding of a class of signals. It has been shown that some natural signals can be sparsely approximated using an overcomplete set of parametric functions. A problem in using these parametric dictionaries is how to choose the parameters. In practice, these parameters have been chosen by an expert or through a set of experiments. In the sparse approximation context, it has been shown that an incoherent dictionary is well suited to sparse approximation methods. In this paper, we first characterize the dictionary design problem, subject to a constraint on the dictionary. We then briefly explain that equiangular tight frames have minimum coherence. The problem is too complex to be solved exactly, so we introduce a practical method to solve it approximately. Experiments show the advantages of using these dictionaries. © 2009 IEEE.
Keywords: Dictionary design; Exact sparse recovery; Gammatone filter banks; Incoherent dictionary; Parametric dictionary; Sparse approximation; Design method; Design problems; Over-complete; Parametric functions; Practical method; Sparse coding; Sparse recovery; Tight frame; Approximation theory; Filter banks; Design
Low Frequency Interpolation of Room Impulse Responses Using Compressed Sensing. Mignot, R., G. Chardon, and L. Daudet. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22, no. 1, 205-216 (2014).
Keywords: Compressed sensing; room impulse responses; wavefield reconstruction; plane waves; interpolation; sparsity
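Two entries above, "Parametric Dictionary Design for Sparse Coding" and "Structured and incoherent parametric dictionary design", use dictionary incoherence as the design constraint. They also note that equiangular tight frames have minimum coherence. The sketch below illustrates that quantity only; it is not the authors' design method. It computes the mutual coherence mu = max over i != j of |<d_i, d_j>| for a small set of unit-normalized atoms. It then compares mu with the Welch lower bound sqrt((N - d) / (d (N - 1))) for N atoms in dimension d, a bound that equiangular tight frames meet with equality. The helper names and the toy frames are illustrative assumptions.

```python
import math


def _normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def mutual_coherence(atoms):
    """Largest |<d_i, d_j>| over pairs of distinct, unit-normalized atoms."""
    normed = [_normalize(a) for a in atoms]
    mu = 0.0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            g = abs(sum(x * y for x, y in zip(normed[i], normed[j])))
            mu = max(mu, g)
    return mu


def welch_bound(n_atoms, dim):
    """Lower bound on the coherence of n_atoms unit vectors in dimension dim (n_atoms > dim)."""
    return math.sqrt((n_atoms - dim) / (dim * (n_atoms - 1)))


# Three unit vectors at 120 degrees in R^2 (the "Mercedes-Benz" frame),
# a small equiangular tight frame.
mercedes = [
    [1.0, 0.0],
    [-0.5, math.sqrt(3) / 2],
    [-0.5, -math.sqrt(3) / 2],
]
# A non-equiangular set of three vectors in R^2, for comparison.
skewed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(mutual_coherence(mercedes), welch_bound(3, 2))
print(mutual_coherence(skewed))
```

For the equiangular frame, both printed values are approximately 0.5, so the frame sits on the bound. The skewed set has a coherence of about 0.707, well above it.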

Older publications

2010

Plumbley M., Blumensath T., Daudet L., Gribonval R. and Davies M.E., Sparse Representations in Audio and Music: From Coding to Source Separation, Proceedings of the IEEE, vol. 98 (6), pp. 995-1005, (June 2010).

Daudet L., Audio sparse decompositions in parallel: let the greed be shared!, IEEE Signal Processing Magazine, vol. 27 (2), pp. 90-96, (Mar. 2010).

Ravelli E., Richard G. and Daudet L., Audio signal representations for indexing in the transform domain, IEEE Transactions on Audio, Speech, and Language Processing, vol. 18 (3), pp. 434-446, (Mar. 2010).

2009

Defrance G., Daudet L. and Polack J.-D., Using Matching Pursuit for estimating mixing time within Room Impulse Responses, Acta Acustica, vol. 95 (6), pp. 1071-1081 (Nov./Dec. 2009).

2008

Ravelli E., Richard G. and Daudet L., Union of MDCT bases for audio coding, IEEE Transactions on Audio, Speech, and Language Processing, vol. 16 (8), pp. 1361-1372, (Nov. 2008).

Defrance G., Daudet L. and Polack J.-D., Finding the onset of a room impulse response: straightforward?, Journal of the Acoustical Society of America - Express Letters, vol. 124 (4), pp. EL248-254 (Oct. 2008).

Sturm B., Shynk J., Daudet L. and Roads C., Dark energy in sparse atomic decompositions, IEEE Transactions on Audio, Speech, and Language Processing, vol. 16 (3), pp. 671-676, (Mar. 2008).

Leveau P., Vincent E., Richard G. and Daudet L., Instrument-specific harmonic atoms for mid-level music representations, IEEE Transactions on Audio, Speech, and Language Processing, vol. 16 (1), pp. 116-128, (Jan. 2008).

Févotte C., Torrésani B., Daudet L. and Godsill S.J., Denoising of musical audio using sparse linear regression and structured priors, IEEE Transactions on Audio, Speech, and Language Processing, vol. 16 (1), pp. 174-185, (Jan. 2008).

2007

Ravelli M. and Daudet L., Embedded polar quantization, IEEE Signal Processing Letters, vol. 14 (10), pp. 657-660, (Oct. 2007).

2006

Bello J.-P., Daudet L. and Sandler M., Automatic Piano Transcription Using Frequency and Time-Domain Information, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14 (6), pp. 2242-2251, (Nov. 2006).

Daudet L., Sparse and structured decompositions of signals with the Molecular Matching Pursuit, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14 (5), pp. 1808-1816, (Sept. 2006).

Davies M. and Daudet L., Sparse audio representations using the MCLT, Signal Processing, special issue: “Sparse Approximations in Signal and Image Processing”, vol. 86 (3), pp. 457-470, (March 2006).

Daudet L., A review on techniques for the extraction of transients in musical signals, Computer Music Modeling and Retrieval, Springer Lecture Notes in Computer Science, vol. 3902, pp. 219-232 (2006).
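Several entries above build on matching pursuit. Examples include "Sparse and structured decompositions of signals with the Molecular Matching Pursuit" and "Using Matching Pursuit for estimating mixing time within Room Impulse Responses". The sketch below is a minimal, generic version of the plain greedy algorithm. It is not Molecular Matching Pursuit or any other method from these papers. At each step, it picks the unit-norm atom most correlated with the current residual and subtracts that atom's projection. The function names and the toy dictionary are illustrative assumptions.

```python
import math


def _normalize(v):
    """Scale a vector to unit Euclidean norm (atoms must be unit-norm for MP)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_iter, tol=1e-12):
    """Plain (non-orthogonal) matching pursuit.

    atoms: list of unit-norm vectors with the same length as signal.
    Returns the selected (atom_index, coefficient) pairs and the final residual.
    """
    residual = list(signal)
    decomposition = []
    for _ in range(n_iter):
        # Correlate the residual with every atom and keep the best match.
        scores = [_dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        c = scores[k]
        if abs(c) <= tol:
            break  # residual is (numerically) orthogonal to every atom
        # Remove that atom's contribution; the residual energy drops by c**2.
        residual = [r - c * a for r, a in zip(residual, atoms[k])]
        decomposition.append((k, c))
    return decomposition, residual


# Toy dictionary: the canonical basis of R^4 plus one "flat" atom.
atoms = [_normalize(v) for v in (
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 1, 1],
)]
print(matching_pursuit([0.0, 0.0, 3.0, 0.0], atoms, 3))
# → ([(2, 3.0)], [0.0, 0.0, 0.0, 0.0])
```

Now consider the signal [2, 2, 5, 2], which is exactly 4 times the flat atom plus 3 times the third basis vector. The greedy choice picks the flat atom first and then keeps correcting, leaving a nonzero residual after three iterations. This is a simple illustration that plain matching pursuit need not find the sparsest representation.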
