Résumé
À l’heure actuelle, nous disposons d’une quantité d’informations audio à la fois importante et grandissante, par le biais des bases de données publiques ou privées (sites Internet, cédéroms, ina, sacem) et des contenus télé- et radiodiffusés. La description par mots-clés, jusqu’ici utilisée, est peu adaptée à la richesse de cette information, puisqu’elle entraîne une indexation subjective et coûteuse (à cause de l’importante intervention humaine). Le domaine de l’indexation audio tente donc de répondre au besoin d’outils (semi-)automatiques de description de contenus audio afin d’en améliorer l’accès. Cet article propose un état de l’art de l’indexation audio, à travers la description de techniques liées à la discrimination en classes (plus ou moins grossières), ainsi que la présentation des analyses spécifiques aux deux grandes classes que sont la parole et la musique (cette dernière étant largement privilégiée). Des comparatifs concernant les performances des systèmes existants y sont présentés, ainsi que l’adresse de sites Internet proposant des démonstrations.
Abstract
Nowadays, a large and growing quantity of audio information is available through public or private databases (Internet sites, CD-ROMs, the French National Audiovisual Institute: ina, musical copyright protection associations such as sacem) and TV/radio broadcasts. Keyword description, used until now, is poorly suited to the richness of this information, because of its subjectivity and cost (both due to the substantial human intervention required). Research in audio indexation therefore aims to fulfil the need for (semi-)automatic tools for audio content description, in order to improve access to audio documents. This article reviews the state of the art in audio indexation, by describing techniques related to the discrimination of (more or less broad) classes, and by reviewing the specific analyses applied to the two main classes: speech and music (with more focus on the latter). Comparisons between the performances of existing systems are presented, as well as the addresses of Internet sites offering demonstrations.
Cite this article
Carré, M., Philippe, P. Indexation audio : un état de l’art. Ann. Télécommun. 55, 507–525 (2000). https://doi.org/10.1007/BF02995205
Mots clés
- État actuel technique
- Indexation
- Base donnée multimédia
- Parole
- Musique
- Extraction forme
- Classification automatique
- Analyse signal
- Reconnaissance son