Authors:
Chuanming Dong (1, 2), Philippe Gambette (3) and Catherine Dominguès (2)
Affiliations:
(1) ADEME, Agence de l’Environnement et de la Maîtrise de l’Énergie, F-49004 Angers, France
(2) LASTIG, Univ. Gustave Eiffel, ENSG, IGN, F-77420 Champs-sur-Marne, France
(3) LIGM, Univ. Gustave Eiffel, CNRS, ESIEE Paris, F-77454 Marne-la-Vallée, France
Keyword(s):
Information Extraction, Deep Learning, Word Embedding, Semantic Annotation, Industrial Pollution.
Abstract:
We study the extraction and reorganization of event-related information in texts about industrial pollution. The objective is to build a memory of polluted sites that gathers information about industrial events from various databases and corpora. An industrial event is described by several features, such as the event trigger, the industrial activity, the institution and the pollutant. To collect information efficiently from a large corpus, the information extraction process must be automated. To this end, we manually annotated part of a corpus about industrial soil pollution and used it to train information extraction models with deep learning methods. The models we trained achieve an F-score of 0.76 on event feature extraction. We intend to improve these models and then apply them to other text resources, in order to enrich the polluted sites memory with extracted information about industrial events.
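The abstract reports an F-score for event feature extraction without stating how it is computed. As an illustration only, the sketch below assumes that event features are annotated as token-level BIO tags and scored by exact span match, a common convention for this kind of task. The tag names (`Pollutant`, `Activity`, `Institution`) and the functions `bio_to_spans` and `f_score` are hypothetical and are not taken from the authors' setup.

```python
from typing import List, Set, Tuple

# Hypothetical span representation: (feature label, start token, end token exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of labeled feature spans.

    Assumption: an I- tag whose label differs from the open span starts a new span.
    """
    spans: Set[Span] = set()
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.add((label, start, i))
                label = None
            if tag.startswith("B-") or tag.startswith("I-"):
                label, start = tag[2:], i
    return spans


def f_score(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exactly matching spans."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)   # spans found with the right label and bounds
        fp += len(pred_spans - gold_spans)   # predicted spans absent from the gold standard
        fn += len(gold_spans - pred_spans)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: the prediction truncates the Activity span, so it counts as one miss
# and one false positive.
gold = [["B-Pollutant", "O", "B-Activity", "I-Activity", "O", "B-Institution"]]
pred = [["B-Pollutant", "O", "B-Activity", "O", "O", "B-Institution"]]
print(f_score(gold, pred))
```

Under exact matching, a partially recovered feature earns no credit, which makes this a strict measure; a lenient variant would count overlapping spans with the same label as matches.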