Innovative training in methods for future data

Research Projects

The network covers a spectrum of innovative research projects, ranging from the collection of omics data and the development of omics technologies to the integrated analysis of omics datasets in relation to complex diseases and ageing, and the development of biostatistical and integrative analysis methods.



Structure of the research programme – the blue areas are work package 1 (wet lab), and the green areas are work package 2 (data science).



The ESRs will work in one of the two research work packages (as in the above figure) but will need to incorporate knowledge and innovative aspects from the other work package as well. This will be accomplished by assigning supervisors from different disciplines, by inter-disciplinary secondments, by extensive courses in basic scientific skills, and by working on the same studies and datasets. To enhance translational aspects, training will take place in academia as well as at our industrial partners.

List of individual research projects:

ESR Project Title Host Objectives
ESR 1 High-throughput glycoproteomic analysis LUMC 1. Development of a high-throughput MALDI-TOF-MS approach for protein- and site-specific glycosylation profiling. Protein glycosylation is known to modulate protein functions in a protein-specific and site-specific manner. Therefore, glycoanalytical methods will be developed which examine protein glycosylation in a site-specific manner. We plan to affinity-purify glycoproteins and generate glycopeptides by proteolysis. Importantly, sialic acids will be subjected to linkage-specific derivatization. This overcomes the lability of sialic acids, allowing the robust analysis of sialylated glycopeptides by MALDI-TOF-MS. In addition, the linkage-specific derivatization will allow the detection of disease-associated changes in sialic acid linkages.
2. The newly developed methods will be applied for large-scale glycomics.
[Deliverable D1.1]
ESR 2 In-depth glycoproteomic analysis glyXera This project will be complementary to the ESR 1 project and will address in-depth glycoproteomics profiling by LC-ESI-MS. Compared to MALDI-TOF-MS, the throughput of the LC-ESI-MS approach will be approximately one to two orders of magnitude lower. On the other hand, in-depth glycoproteomics only becomes feasible by combining efficient separation techniques with high-dynamic-range, ultrahigh-resolution ESI mass spectrometry.
1. We will develop a workflow for the analysis of multiple glycopeptide clusters in a single run.
2. The methods will be applied to study glycosylation changes in IBD and controls. This will result in a dataset on protein-specific and site-specific plasma protein N-glycosylation. Data will be analysed in collaboration with WP3.
[Deliverable D1.2]
ESR 3 Epigenetic regulation of protein glycosylation and quality control mechanisms in protein synthesis University of Zagreb 1. To study the functional role of the genes identified by GWAS as the most relevant for IgG glycosylation and showing pleiotropy with autoimmune and inflammatory diseases such as inflammatory bowel disease (IBD) or systemic lupus erythematosus (SLE).
2. To study epigenetic regulation of the GWAS hits relevant for IgG glycosylation by analysis of promoter methylation (using bisulfite-sequencing), as well as by analyses of histone modifications (using chromatin immunoprecipitation analysis, ChIP-qPCR and ChIP-seq).
3. To study how compromised accuracy of protein synthesis influences epigenetic regulation of the GWAS hits relevant for IgG glycosylation.
[Deliverable D1.3]
ESR 4 Complex genetics of protein glycosylation Genos Glycans are coded in a complex dynamic network containing hundreds of genes. Standard genetic approaches that study associations with individual genetic loci are therefore not optimal. In this project, we will:
1. develop algorithms to study effects of gene-gene interactions in the complex pathway of protein glycosylation;
2. develop methods for complex analysis of multiple omics datasets (genetics and glycans);
3. generate, normalize and analyse new glycomics datasets.
[Deliverable D1.4]
ESR 5 Statistical methods for integrative analysis of omics data UMC Utrecht In this project we will consider latent variable regression methods for the integration of multiple novel datasets (DNA sequencing, glycomics, epigenetics, metagenomics). We will consider O2PLS methods, which decompose two datasets into three subspaces: a common, an independent and a residual subspace. We will extend the methods to a probabilistic approach in order to include biological information via penalization (priors), to deal with missing data and to model the measurement error. An important question to address is how to determine the size of the common subspace to ensure that all important information is included. Methods will be implemented in open source software. The methods will be validated through the proof of principle on ageing.
[Deliverable D2.2]
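For orientation, the O2PLS decomposition mentioned for ESR 5 is commonly written as below. The notation is an assumption borrowed from the general O2PLS literature rather than from the project description: X and Y are the two datasets, T and U the joint scores, W and C the joint loadings, the terms with subscript ⊥ the dataset-specific parts, and E, F and H the residuals.

```latex
\begin{aligned}
X &= T W^{\top} + T_{\perp} P_{\perp}^{\top} + E,\\
Y &= U C^{\top} + U_{\perp} Q_{\perp}^{\top} + F,\\
U &= T B + H.
\end{aligned}
```

In this notation, the size of the common subspace is the number of columns of T and U, which is the quantity the project proposes to determine.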
ESR 6 Network-based methods for analysis of multiple omics data University of Bologna This project aims to develop network and multiple-network (multiplex) methods for the analysis, integration and modeling of multi-omics data.
1. Network reconstruction will be based on two methods: (i) an unsupervised (data-driven) procedure using a correlation approach; (ii) a supervised method using so-called “a priori biological knowledge” as a prior for assessing the network structure, drawn from KEGG (Kyoto Encyclopedia of Genes and Genomes), PPI (protein-protein interaction network) and Recon-X (metabolic network).
2. To extend the network reconstruction procedure to multiplex and multi-layered networks by combining networks obtained with different omics measurements. To accommodate the correlation among strata, we will develop an adequate multi-link approach.
3. To construct a network propagation procedure, capable of identifying “interesting nodes” (such as genes), for a multiplex architecture. The propagation algorithm is a diffusive process on a network, described as a Markov process. All the methods will be implemented in open source software such as R and Python, although some modules will be preliminarily tested with commercial software such as Matlab and Mathematica.
[Deliverable D2.1]
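The diffusive propagation described for ESR 6 can be illustrated with a random walk with restart, one common Markov-process formulation of network propagation. This is a minimal single-layer sketch, not the project's algorithm: the function name `propagate`, the restart probability, and the toy network with node labels `g1` to `g4` are all hypothetical, and a multiplex version would additionally need links between layers.

```python
def propagate(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart on a weighted, undirected network.

    adjacency: dict mapping node -> {neighbour: edge weight}
    seeds:     dict mapping seed node -> non-negative starting weight
    Returns a dict node -> stationary visiting probability.
    """
    nodes = sorted(adjacency)
    total = sum(seeds.values())
    if total <= 0:
        raise ValueError("at least one seed needs a positive weight")
    # Restart distribution: all mass sits on the seed nodes.
    p0 = {v: seeds.get(v, 0.0) / total for v in nodes}
    strength = {v: sum(adjacency[v].values()) for v in nodes}

    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the seeds.
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            if strength[u] == 0:
                # Isolated node: send its mass back to the seeds.
                for v in nodes:
                    nxt[v] += (1 - restart) * p[u] * p0[v]
                continue
            # Otherwise the walker moves to a neighbour, proportionally to edge weight.
            for v, w in adjacency[u].items():
                nxt[v] += (1 - restart) * p[u] * w / strength[u]
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a chain g1 - g2 - g3 - g4 with unit weights.
toy_network = {
    "g1": {"g2": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g2": 1.0, "g4": 1.0},
    "g4": {"g3": 1.0},
}

scores = propagate(toy_network, {"g1": 1.0})
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

Nodes close to the seed receive higher scores than distant ones, which is the sense in which propagation can flag “interesting nodes” near a set of known genes.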
ESR 7 Mixed models for measurement error and overdispersion University of Leeds This project uses mixed models to correct for the presence of latent structure in the data, namely measurement error in the covariate of interest and overdispersion in count data. Examples of covariates measured with error are principal components (PCs) used to reduce the dimension of large datasets (e.g. glycomics), biomarkers measured with high-throughput omics platforms instead of well-controlled lab techniques, and metagenomics datasets. A lot of research has been performed for data from unrelated individuals, from which it is known that ignoring measurement errors causes regression coefficient estimates to attenuate towards the null. Little is known about the effect of measurement errors in covariates in random effects models for correlated data, e.g. in twins and sibling pairs. The second topic is to adjust for overdispersion in multinomial models for count data using flexible random effects. Typically, metagenomics datasets are analysed at phylum level and overdispersion is modeled by using gamma distributed random effects. However, these models assume that all correlations are negative, which might not be true for metagenomic data. In our datasets we observe that some of the categories are positively correlated. We will therefore consider more flexible mixed models. The methods will be applied to data from the centenarians. The methods will be implemented in R packages.
[Deliverable D2.3]
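The attenuation effect mentioned for unrelated individuals can be checked with a small simulation. This is a sketch under classical measurement-error assumptions (independent, additive Gaussian error); the function names and all parameter values are hypothetical and chosen only for illustration. With equal variances for the true covariate and the error, the naive slope is expected to shrink to about half of the true slope.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=1.0, sd_x=1.0, sd_error=1.0, seed=1):
    """Return (slope using the true covariate, slope using the noisy covariate)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]           # true covariate
    w = [xi + rng.gauss(0.0, sd_error) for xi in x]        # covariate measured with error
    y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]      # outcome depends on the true covariate
    return ols_slope(x, y), ols_slope(w, y)


slope_true, slope_naive = simulate()
# Expected attenuation factor under these assumptions: sd_x^2 / (sd_x^2 + sd_error^2)
print("slope with true covariate: ", round(slope_true, 3))
print("slope with noisy covariate:", round(slope_naive, 3))
```

The simulation covers only independent observations; the open question raised in the project is how this bias behaves in random effects models for correlated data such as twins and sibling pairs.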
ESR 8 Flexible survival models University of Leeds This project aims to develop flexible statistical methods for the analysis of (correlated) survival data. The assumption of proportional hazards made by the Cox model is often violated in omics research. The ESR will work with data from the Leiden Longevity Study, which comprises multiple omics datasets in 420 families with at least two nonagenarian siblings. The study design requires adjustments for correlation, delayed entry and ascertainment (at least two nonagenarians). We have developed a method based on inverse probability weighting, but the model assumes a parametric hazard. Here we will extend the method to semi-parametric Cox models by using splines for the hazard. In addition, flexible models for the covariates will be considered. Such a model is highly relevant since it is observed that the effect of covariates on mortality changes over time. Finally, we will assess how much omics datasets add to the prediction of mortality. The methods will be applied to the Leiden Longevity Study and to survival data in ORCADES and TwinsUK. The methods will be implemented in software (R).
[Deliverable D2.4]
ESR 9 Application of omics to data on centenarians University of Bologna A systems medicine approach to ageing will be used. Integrative analysis methods will be applied to analyse the genetic, epigenetic, metagenomic and glycomics datasets available for centenarians and controls. Specific objectives are:
1. To identify epigenetic biomarkers for ageing by comparing epigenome datasets between the centenarians and controls;
2. To identify N-glycan biomarkers for ageing by comparing glycomics datasets between the centenarians and controls;
3. To analyse the composition of the human gut microbiome at the phylum level as a function of host age by using ecological models;
4. To link the results from above with genomic datasets.
[Deliverable D2.5]
ESR 10 Application of omics to TwinsUK King’s College London Large dataset analysis methods will be applied to the study of ageing and its relationship with chronic widespread musculoskeletal pain (CWP), a highly prevalent condition largely overlooked by research efforts. TwinsUK has phenotyped twins for CWP (n=6000). The objectives are:
1. To study epigenetic differences in pain-discordant monozygotic twins and their relationship with ageing;
2. To apply pathway analysis to further explore the underlying genetic architecture of the most highly associated epigenetic CpG sites;
3. To study glycomic associations with chronic pain syndromes and their relationship with ageing.
[Deliverable D2.6]
ESR 11 Genetic variants in protein glycosylation University of Edinburgh In this project we will investigate the contribution of low-frequency and rare genetic variants to glycomic and glycoproteomic variation. We will approach this in two ways. (a) First, we will use new advances in imputation: with an imputation reference panel of ~35,000 European whole genome sequences, it is now possible to impute genotypes down to about 0.1% minor allele frequency with reasonable accuracy. Through such efforts and the availability of sequence data from the cohorts themselves, a whole new class of genetic variation will be assayed for effects on glycomic traits. (b) Second, populations with high kinship, as found in many of our platform resources, allow the application of diverse methods based on genomic sharing, such as regional heritability approaches and updated implementations of linkage, which complement association designs in their ability to detect rarer variants even in the face of heterogeneity. The latest methods for testing all rare variants jointly (burden tests) will also be applied to multiple glycans. We will apply and develop these tools to better define the genetic architecture underlying human glycomic variation.
[Deliverable D2.7]