ANR CABeRneT Project

Presentation

Project funded by the French National Research Agency (ANR) under grant number ANR-13-JS02-0009-01.

Presentation at the ANR kick-off meeting in January 2014
Presentation at the Digitéo-STIC forum (Université Paris Saclay) in November 2014
Presentation at the innovation forum at TALN in June 2015

Dates

2013 - 2017

Summary

Much clinical and biomedical knowledge is contained in the text of published articles, Electronic Health Records (EHRs) or online patient forums and is not directly accessible for automatic computation. Natural Language Processing (NLP) techniques have been successfully developed to extract information from text and convert it to machine-readable representations. The most advanced applications have focused on identifying clinically relevant entities and concepts from English text. However, for many biomedical informatics tasks it is necessary to go beyond the identification of isolated instances in single documents – the context of concept occurrences and the nature of the relationships between co-occurring concepts are often crucial for a specific understanding of the analyzed text. Furthermore, while most of the literature is available in English, EHRs in French hospitals are written in French. Therefore, it is important to develop advanced methods for French that will provide structured representations of clinical text compatible with existing representations for English.

This research project will focus on the following aims:

Providing material for text analysis in a specialized domain (i.e. the biomedical domain) in French
Adapting NLP tools developed for general language to a specialized domain
Applying these tools to the automatic detection of links between clinical characteristics and the medical history of patients described in EHRs

The proposed research is innovative and will provide an in-depth study of multiple biomedical texts in French (EHRs) and in English (literature). It will be guided by linguistic principles and by the application to personalized medicine. A global approach should ensure that the methods used can be generalized to other biomedical applications.

Participants

People involved: Aurélie Névéol (PI), Louise Deléger, Cyril Grouin, Thomas Lavergne, Anne-Laure Ligozat, Pierre Zweigenbaum (LIMSI), Anita Burgun, Anne-Sophie Jannot, Bastien Rance (INSERM), Guergana Savova, Pei J. Chen (Boston Children's Hospital)

Publications

Rabary CT, Lavergne T, Névéol A. Étiquetage morpho-syntaxique en domaine de spécialité: le domaine médical. Traitement Automatique de la Langue Naturelle - TALN. 2015.
Tapi Nzali MD, Névéol A, Tannier X. Analyse d'expressions temporelles dans les dossiers électroniques patients. Traitement Automatique de la Langue Naturelle - TALN. 2015.
Tapi Nzali MD, Tannier X, Névéol A. Automatic Extraction of Time Expressions Accross Domains in French Narratives. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP. 2015:492-498 [acceptance rate: 24%]
D'hondt E, Tannier X, Névéol A. Redundancy in French Electronic Health Records: A preliminary study. Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis, LOUHI. 2015:21-30 [acceptance rate: 49%]
Grouin C, Griffon N, Névéol A. Étude des risques de réidentification des patients à partir d'un corpus désidentifié de comptes-rendus cliniques en français. Atelier ETeRNAL - TALN. 2015.
Grouin C, Griffon N, Névéol A. Is it possible to recover personal health information from an automatically de-identified corpus of French EHRs? Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis, LOUHI. 2015:31-39 [acceptance rate: 49%]
Névéol A, Grouin C, Tannier X, Hamon T, Kelly L, Goeuriot L, Zweigenbaum P. CLEF eHealth Evaluation Lab 2015 Task 1b: clinical named entity recognition. CLEF 2015, Online Working Notes, CEUR-WS 1391. 2015.
Goeuriot L, Kelly L, Suominen H, Hanlen L, Névéol A, Grouin C, Palotti J, Zuccon G. Overview of the CLEF eHealth Evaluation Lab 2015. Lecture Notes in Computer Science, vol 9283. Information Access Evaluation. Multilinguality, Multimodality, and Interaction. Springer International Publishing. 2015:429-443.
Deléger L, Grouin C, Névéol A. Automatic Content Extraction for Designing a French Clinical Corpus. Proc AMIA Annu Symp. 2014.
Grouin C, Deléger L, Escudié JB, Groisy G, Jannot AS, Rance B, Tannier X, Névéol A. How to de-identify a large clinical corpus in 10 days. Proc AMIA Annu Symp. 2014.
Névéol A, Dalianis HK, Savova G, Zweigenbaum P. Didactic Panel: Clinical Natural Language Processing in Languages Other Than English. Proc AMIA Annu Symp. 2014.
Grouin C, Lavergne T, Névéol A. Optimizing annotation efforts to build reliable annotated corpora for training statistical models. 8th Linguistic Annotation Workshop - LAW VIII. 2014. [acceptance rate: 35%]
Pham AD, Névéol A, Lavergne T, Yasunaga D, Clément O, Meyer G, Morello R, Burgun A. Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings. BMC Bioinformatics. 2014 Aug 7;15:266. [impact factor 2014: 2.67]
Deléger L, Grouin C, Ligozat AL, Zweigenbaum P, Névéol A. Annotation of specialized corpora using a comprehensive entity and relation scheme. LREC 2014. 2014.
Névéol A, Grosjean J, Darmoni SJ, Zweigenbaum P. Language Resources for French in the Biomedical Domain. LREC 2014. 2014.
Deléger L, Névéol A. Identification automatique de zones dans des documents pour la constitution d'un corpus médical en français. Traitement Automatique de la Langue Naturelle - TALN. 2014:568-573
Grouin C, Névéol A. De-Identification of Clinical Notes in French: towards a Protocol for Reference Corpus Development. J Biomed Inform. 2014 Aug;50:151-61. [impact factor 2013: 2.131]

Software