Project funded by the French National Research Agency, under grant number ANR-13-JS02-0009-01.


2013 - 2017


Much clinical and biomedical knowledge is contained in the text of published articles, Electronic Health Records (EHRs) or online patient forums and is not directly accessible for automatic computation. Natural Language Processing (NLP) techniques have been successfully developed to extract information from text and convert it to machine-readable representations. The most advanced applications have focused on identifying clinically relevant entities and concepts from English text. However, for many biomedical informatics tasks it is necessary to go beyond the identification of isolated instances in single documents – the context of concept occurrences and the nature of the relationships between co-occurring concepts are often crucial for a specific understanding of the analyzed text. Furthermore, while most of the literature is available in English, EHRs in French hospitals are written in French. Therefore, it is important to develop advanced methods for French that will provide structured representations of clinical text compatible with existing representations for English.

This research project will focus on the following aims:

  1. Providing material for text analysis in a specialized domain (i.e. the biomedical domain) in French
  2. Adapting NLP tools developed for general language to a specialized domain
  3. Applying these tools to the automatic detection of links between the clinical characteristics and medical history of patients described in EHRs, predictive biomarkers identified by immunologic or genetic studies, and evidence of such associations reported in the literature

The proposed research is innovative and will provide an in-depth study of multiple biomedical texts in French (EHRs) and in English (literature). It will be guided by linguistic principles and by the application to personalized medicine. A global approach should ensure that the methods used can be generalized to other biomedical applications.



People involved: Aurélie Névéol (PI), Louise Deléger, Cyril Grouin, Thomas Lavergne, Anne-Laure Ligozat, Pierre Zweigenbaum (LIMSI), Anita Burgun, Anne-Sophie Jannot, Bastien Rance (INSERM), Guergana Savova, Pei J. Chen (Boston Children's Hospital)



Aurélie Névéol (LIMSI)