INRA GABI Unit

GABI : Génétique Animale et Biologie Intégrative
Unité Mixte de Recherche INRA - AgroParisTech

Dissertation Defense of Audrey Hulot

26 November 2020

Audrey Hulot defended her dissertation on November 23: "Omics data analysis: clustering and network inference"

The PhD dissertation defense was held on November 26 at 2 pm, by videoconference.
Jury Composition:
Henri-Jean GARCHON, PU-PH, Université Paris-Saclay, PhD advisor
Florence JAFFREZIC, Directrice de recherche, Université Paris-Saclay - INRAE Jouy-en-Josas, co-supervisor
Julien CHIQUET, Directeur de recherche, Université Paris-Saclay - AgroParisTech, INRAE, co-supervisor
Nathalie VIALANEIX, Directrice de recherche, MIAT - INRAE Toulouse, Reviewer
Grégory NUEL, Directeur de recherche, CNRS - Sorbonne Université, Reviewer
Guillaume ASSIE, PU-PH, Université de Paris - INSERM, Examiner
Guillemette MAROT, Maître de conférences, Université de Lille, Examiner
Marie-Laure MARTIN-MAGNIETTE, Directrice de recherche, INRAE - Institut des Sciences des Plantes Paris Saclay, Examiner

Titre : Analyse de données -omiques : clustering et inférence de réseaux

Mots clés :  Données -omiques, Clustering, Inférence de réseaux, Grande dimension, Biomarqueurs, Intégration de données

Résumé :  Le développement des méthodes de biologie haut-débit (séquençage et spectrométrie de masse) a permis de générer de grandes masses de données, dites -omiques, qui nous aident à mieux comprendre les processus biologiques.
Cependant, isolément, chaque source -omique ne permet d'expliquer que partiellement ces processus. Mettre en relation les différentes sources de données -omiques devrait permettre de mieux comprendre les processus biologiques mais constitue un défi considérable.
Dans cette thèse, nous nous intéressons particulièrement aux méthodes de clustering et d’inférence de réseaux, appliquées aux données -omiques.
La première partie du manuscrit présente trois méthodes. Les deux premières méthodes sont applicables dans un contexte où les données peuvent être de nature hétérogène.
La première concerne un algorithme d’agrégation d’arbres, permettant la construction d’un clustering hiérarchique consensus. La complexité sous-quadratique de cette méthode a fait l’objet d’une démonstration, et permet son application dans un contexte de grande dimension. Cette méthode est disponible dans le package R mergeTrees, accessible sur le CRAN.
La seconde méthode concerne l’intégration de données provenant d’arbres ou de réseaux, en transformant les objets via la distance cophénétique ou via le plus court chemin, en matrices de distances. Elle utilise le Multidimensional Scaling et l’Analyse Factorielle Multiple et peut servir à la construction d’arbres et de réseaux consensus.
Enfin, dans une troisième méthode, nous nous plaçons dans le contexte des modèles graphiques gaussiens, et cherchons à estimer un graphe, ainsi que des communautés d’entités, à partir de plusieurs tables de données. Cette méthode est basée sur la combinaison d’un Stochastic Block Model, d’un Latent Block Model et du Graphical Lasso.
Cette thèse présente en deuxième partie les résultats d’une étude de données transcriptomiques et métagénomiques, réalisée dans le cadre d’un projet appliqué, sur des données concernant la Spondylarthrite ankylosante.

Title: Omics data analysis: clustering and network inference

Keywords: Omics data, Clustering, Network Inference, High dimension, Biomarkers, Data Integration

Abstract: The development of high-throughput technologies in biology (next-generation sequencing and mass spectrometry) has provided researchers with large amounts of data, known as -omics data, that help us better understand biological processes.
However, each data source taken separately explains only part of a given process. Linking the different -omics sources to one another should improve our understanding of these processes, but doing so is a considerable challenge.
In this manuscript, we focus on two approaches, clustering and network inference, applied to -omics data.
The first part of the manuscript presents three methodological developments on this topic. The first two methods are applicable in a situation where the data are heterogeneous.
The first method is an algorithm for aggregating trees, which builds a consensus hierarchical clustering from a set of trees. The complexity of the method is proven to be sub-quadratic, which makes it applicable in high-dimensional settings where the trees have a large number of leaves. This algorithm is available in the R package mergeTrees on CRAN.
The second method deals with the integration of data from trees and networks, by transforming these objects into distance matrices using cophenetic and shortest path distances, respectively. This method relies on Multidimensional Scaling and Multiple Factor Analysis and can also be used to build consensus trees or networks.
Finally, the third method is set in the framework of Gaussian Graphical Models and seeks to estimate a graph, as well as communities in the graph, from several data tables. This method is based on a combination of the Stochastic Block Model, the Latent Block Model and the Graphical Lasso.
The second part of the manuscript presents analyses of transcriptomics and metagenomics data, conducted as part of an applied project, to identify targets and gain insight into the predisposition to ankylosing spondylitis.
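
The transformation used by the second method, turning a tree into cophenetic distances and a network into shortest-path distances, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the tree encoding as nested `(height, left, right)` tuples, the function names, and the toy inputs `g1` to `g4` are all assumptions made for illustration. The subsequent Multidimensional Scaling and Multiple Factor Analysis steps are not shown.

```python
from itertools import combinations


def leaves(tree):
    """Leaf labels of a tree encoded as a label (str) or (height, left, right)."""
    if isinstance(tree, str):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)


def cophenetic_distances(tree, dist=None):
    """Cophenetic distance of two leaves = height of the node where they first merge."""
    if dist is None:
        dist = {}
    if isinstance(tree, str):
        return dist
    height, left, right = tree
    # Leaves on opposite sides of this node are first merged at this height.
    for a in leaves(left):
        for b in leaves(right):
            dist[frozenset((a, b))] = height
    cophenetic_distances(left, dist)
    cophenetic_distances(right, dist)
    return dist


def shortest_path_distances(nodes, edges):
    """All-pairs shortest path lengths (Floyd-Warshall), unweighted undirected graph."""
    inf = float("inf")
    d = {(u, v): (0 if u == v else inf) for u in nodes for v in nodes}
    for u, v in edges:
        d[u, v] = d[v, u] = 1
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return {frozenset((u, v)): d[u, v] for u, v in combinations(nodes, 2)}


# Hypothetical toy inputs defined on the same four entities.
tree = (3.0, (1.0, "g1", "g2"), (2.0, "g3", "g4"))
nodes = ["g1", "g2", "g3", "g4"]
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]

tree_dist = cophenetic_distances(tree)
net_dist = shortest_path_distances(nodes, edges)

for pair in combinations(nodes, 2):
    key = frozenset(pair)
    print(pair, tree_dist[key], net_dist[key])
```

Both objects end up as distances over the same pairs of entities, which is what makes them comparable and lets them be combined in a later step.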