Résumé
À l’heure actuelle, nous disposons d’une quantité d’informations audio à la fois importante et grandissante, par le biais des bases de données publiques ou privées (sites Internet, cédéroms, ina, sacem) et des contenus télé- et radiodiffusés. La description par mots-clés, jusqu’ici utilisée, est peu adaptée à la richesse de cette information, puisqu’elle entraîne une indexation subjective et coûteuse (à cause de l’importante intervention humaine). Le domaine de l’indexation audio tente donc de répondre au besoin d’outils (semi-)automatiques de description de contenus audio afin d’en améliorer l’accès. Cet article propose un état de l’art de l’indexation audio, à travers la description de techniques liées à la discrimination en classes (plus ou moins grossières), ainsi que la présentation des analyses spécifiques aux deux grandes classes que sont la parole et la musique (cette dernière étant largement privilégiée). Des comparatifs concernant les performances des systèmes existants y sont présentés, ainsi que l’adresse de sites Internet proposant des démonstrations.
Abstract
Nowadays, a large and growing quantity of audio information is available through public or private databases (Internet sites, CD-ROMs, the French National Audiovisual Institute: ina, musical copyright protection associations such as sacem) and TV/radio broadcasts. Keyword description, used until now, is poorly suited to the richness of this information, because of its subjectivity and cost (both due to the substantial human intervention required). Research in audio indexation therefore aims to fulfil the need for (semi-)automatic tools for audio content description, in order to improve access to audio documents. This article reviews the state of the art in audio indexation, by describing techniques related to the discrimination of (more or less broad) classes, and by reviewing the specific analyses applied to the two main classes: speech and music (with more focus on the latter). Comparisons between the performances of existing systems are presented, as well as the addresses of Internet sites offering demonstrations.
Cite this article
Carré, M., Philippe, P. Indexation audio : un état de l’art. Ann. Télécommun. 55, 507–525 (2000). https://doi.org/10.1007/BF02995205
Mots clés
- État actuel technique
- Indexation
- Base donnée multimédia
- Parole
- Musique
- Extraction forme
- Classification automatique
- Analyse signal
- Reconnaissance son