Research Day 16 September 2020

Group photo from the Research Day Sept 2020


A Research Day took place virtually via MS Teams on 16 September 2020. The ESRs gave presentations on the work they have done in their research projects, and they also discussed their results in a broader context with the aim of showing the multidisciplinary aspects within the network.

Discussions took place after each session, and the Scientific Advisory Board also provided valuable feedback to the group.



09.00 – 09.10  Welcome, Jeanine Houwing-Duistermaat
09.10 – 10.30  ESR research presentations – Work Package 1, Chair Manfred Wuhrer

  • ESR1 Tamás Pongrácz
    “The structure and role of lactone intermediates in linkage-specific sialic acid derivatization reactions”
  • ESR2 Frania Zuñiga Bañuelos
    “In-depth N-glycoproteomic analysis of plasma proteins”
  • ESR3 Samira Smajlović
    “Modulation of the HNF1A and FOXA2 genes, using CRISPR/dCas9 molecular tools, in studies of their role in glucose stimulated insulin secretion”
  • ESR4 Azra Frkatović
    “Genetic regulation of IgG N-glycosylation”

10.30 – 11.00  Discussions on WP1 presentations
11.00 – 11.30  Break
11.30 – 12.30  ESR research presentations – Work Package 2, Chair Hae-Won Uh

  • ESR5 Zhujie Gu
    “Statistical Integration of TwinsUK Methylation and Glycomics Data Using GO2PLS”
  • ESR6 Iva Budimir
    “Characterization of DNA methylation correlation structure in Down Syndrome”
  • ESR7 Maarten van Schaik
    “A random effect model for high-dimensional regression of count categories”

12.30 – 13.00  Discussions on WP2 presentations
13.00 – 14.00  Lunch
14.00 – 15.20  ESR research presentations – Work Package 2, Chair Gastone Castellani

  • ESR8 Annah Muli
    “The effect of misspecification of frailty distribution on survival probability estimates when analysing survival data in twins”
  • ESR10 Md Shafiqur Rahman
    “Genome-wide association study identifies RNF123 associated with chronic widespread musculoskeletal pain”
  • ESR11 Arianna Landini
    “GWAS of transferrin N-glycans: one step closer to understanding the genetics of protein glycosylation”

15.20 – 15.50  Discussions on WP2 presentations
15.50 – 16.20  Break
16.20 – 16.45  Session on how ESRs should leave their data and information
16.45 – 17.15  Feedback from Scientific Advisory Board, Frederique Lisacek and Ivo Ugrina


o   ESR1 Tamás Pongrácz
“The structure and role of lactone intermediates in linkage-specific sialic acid derivatization reactions”
Abstract: Sialic acids occur ubiquitously throughout vertebrate glycomes and often endcap glycans in either α2,3- or α2,6-linkage with diverse biological roles. Linkage-specific sialic acid characterization is increasingly performed by mass spectrometry, aided by differential sialic acid derivatization to discriminate between linkage isomers. Typically, during the first step of such derivatization reactions, in the presence of a carboxyl group activator and a catalyst, α2,3-linked sialic acids condense with the subterminal monosaccharides to form lactones, while α2,6-linked sialic acids form amide or ester derivatives. In a second step, the lactones are converted into amide derivatives. Notably, the structure and role of the lactone intermediates in the reported reactions remained ambiguous, leaving it unclear to what extent the amidation of α2,3-linked sialic acids depended on direct aminolysis of the lactone rather than on lactone hydrolysis and subsequent amidation. In this report, we used mass spectrometry to unravel the role of the lactone intermediate in the amidation of α2,3-linked sialic acids by applying controlled reaction conditions to simple and complex glycan standards. The results unambiguously show that, in common sialic acid derivatization protocols, prior lactone formation is a prerequisite for the efficient, linkage-specific amidation of α2,3-linked sialic acids, which proceeds predominantly via direct aminolysis. Furthermore, nuclear magnetic resonance spectroscopy confirmed that exclusively the C2 lactone intermediate is formed on a sialyllactose standard. These insights allow a more rationalized method development for linkage-specific sialic acid derivatization in the future.

o   ESR2 Frania Zuñiga Bañuelos
“In-depth N-glycoproteomic analysis of plasma proteins”
Abstract: A workflow has been developed to perform an in-depth N-glycoproteomic analysis. This analysis will provide insights into the occupancy of N-glycosylation sites per protein (macroheterogeneity), including the diversity and abundance of the N-glycans bound to each N-glycosylation site (microheterogeneity). The workflow comprises critical steps such as fractionation of low-abundance proteins and glycopeptide enrichment. These steps were optimized during the last year of the project. The optimization led towards a balance among protein identification, time, effort, and workflow cost, which is currently being assessed.

o   ESR3 Samira Smajlović
“Modulation of the HNF1A and FOXA2 genes, using CRISPR/dCas9 molecular tools, in studies of their role in glucose stimulated insulin secretion”
Introduction. HNF1A (hepatocyte nuclear factor 1A), together with FOXA2, is a master regulator of a network of genes responsible for proper glucose stimulated insulin secretion (GSIS) in mouse pancreatic β cells. These genes regulate, in a coordinated manner, N-acetylglucosaminyltransferase (Gnt)-IV, which is responsible for proper glycosylation of the glucose transporter receptors GLUT1 and GLUT2 present on the surface of β cells, as well as for proper glucose uptake. Here, I wanted to examine whether these two genes have the same key role in GSIS in human pancreatic β cells.

Aims of the study. The first aim of this study was to see whether the inactivation of HNF1A and FOXA2 would alter proper GSIS. For this purpose, I manipulated HNF1A promoter methylation in a model cell line for human pancreatic β cells (1.1B4 cell line), where I targeted 4 CpG sites previously established to have a putative role in the regulation of this gene. Additionally, I manipulated gene expression directly using dCas9-VPR for gene reactivation. I also examined whether a greater effect on gene expression could be achieved if both fusion constructs, dSaCas9-TET1 and dSpCas9-VPR, were used simultaneously in the same cell. The effect of the dCas9 constructs in 1.1B4 cells was monitored over time through a time course experiment. The change in gene expression level was confirmed by analysis of the HNF1A protein level in the same cells. As the first genome-wide association study of the human plasma N-glycome revealed that HNF1A is a major regulator of protein fucosylation, my second aim was to analyze whether the change in HNF1A expression would affect the expression of downstream genes, including glycosyltransferases, fucosyltransferases and fucose biosynthesis genes.

Results. The change in transcriptional activity of the HNF1A gene was highest when the dSpCas9-VPR fusion was used alone. On the other hand, when both constructs (dSaCas9-TET1 and dSpCas9-VPR) were used simultaneously, the increase in HNF1A expression was followed by significant downregulation of most fucosyltransferases on the 8th day post-transfection. Analysis of the methylation level at the 4 CpG sites in the first exon of HNF1A and of the gene expression level revealed an inverse association between the two, confirming the regulatory role of the 4 CpG sites in the transcription of this gene in the 1.1B4 cell line.

Targeting of the FOXA2 gene in 1.1B4 cells using the dSpCas9-KRAB fusion construct was followed by an approximately 4-fold decrease in the level of the gene transcript compared to the non-targeting control. In the ongoing experiments, I will focus on targeting HNF1A and FOXA2 simultaneously, using dSpCas9-KRAB for gene repression, and monitor the expression of downstream genes as well as the effect on glucose uptake in 1.1B4 cells.

My final goal is to monitor changes in the glycan phenotype, both in the whole N-glycome and in the glycosylation of GLUT receptors. Proper glycosylation of GLUT receptors is considered crucial for their proper localization on the cell surface and for proper GSIS. Therefore, I will examine the localization of GLUT receptors and monitor GSIS following epigenetic manipulation of the HNF1A and FOXA2 genes using CRISPR/dCas9 molecular tools.

o   ESR4 Azra Frkatović
“Genetic regulation of IgG N-glycosylation”
Abstract: The complex regulatory network behind IgG glycosylation comprises numerous components, including enzymes, transcription factors, transporters and other proteins, which makes our understanding of this process very limited. A genome-wide association study (GWAS) of IgG N-glycosylation was performed using both LCMS- and UPLC-measured IgG glycan samples, with the aim of increasing the power to identify novel genetic variants involved in this biosynthetic pathway. After a discovery GWAS of eleven glycan traits in 13705 individuals of European descent, which yielded a total of 43 genomic loci, replication analysis was performed (n=2840). With the aim of prioritizing gene candidates, a set of strategies was employed: prediction of variant effects in associated genomic loci, colocalization analysis with expression QTLs in blood and relevant immune cell types, and function exploration. Gene set and tissue enrichment analyses were performed using FUMA (Functional Mapping and Annotation of GWAS). After finalizing the list of candidate genes, we aim to propose a functional network of genes involved in the regulation of IgG glycosylation that will be functionally tested in a relevant cell culture.

o   ESR5 Zhujie Gu
“Statistical Integration of TwinsUK Methylation and Glycomics Data Using GO2PLS”
Abstract: Integration of correlated omics datasets is an open research topic. Various methods have been proposed for this purpose, such as PLS-related approaches, which decompose datasets into joint and residual parts. Omics data are heterogeneous (e.g. differences in scale, dimensionality, etc.) and the joint parts estimated in PLS contain data-specific variations. O2PLS was proposed to capture the heterogeneity using data-specific parts and to better estimate the joint parts. However, O2PLS does not identify relevant features. Selection of a small subset of relevant features is needed for further investigation and better interpretation.

We extended O2PLS to Group Sparse O2PLS (GO2PLS), which performs feature selection while integrating two heterogeneous omics datasets. We utilized the group structures in the omics data (e.g., CpG islands in the methylation data) to improve the reliability of the selection procedure. The simulation study showed that GO2PLS improved performance in feature selection, joint score estimation, and joint loading estimation compared to O2PLS. The method was applied to integrate the methylation and glycomics datasets from the TwinsUK study. The selected methylation groups turned out to be relevant to the immune system, in which glycans play an important role.

Finally, we will reflect on the method development process of GO2PLS from a multidisciplinary perspective. More discussion and input from domain experts are needed based on peer reviewers’ comments. The datasets and challenges of the next project will be introduced. Examples of how multidisciplinary thinking has changed the way in which we approach projects will be discussed.

o   ESR6 Iva Budimir
“Characterization of DNA methylation correlation structure in Down Syndrome”
Abstract: Cytosine methylation in the human genome is an important and well-studied epigenetic mark which has the potential to regulate gene expression. Since the process of methylation predominantly occurs at CG dinucleotide sequences, studies target these so-called CpG sites. One of the available tools, the Infinium 450k assay, measures the level of methylation of ~450,000 CpG sites spread across the genome [1]. Even though there are general rules by which CpG methylation, or the lack of it, regulates the expression of genes, there are many exceptions and we are far from understanding the exact regulatory mechanisms.

Studies of methylation patterns suggest bimodal behaviour, where different clusters of CpG sites tend to be either hyper-methylated or hypo-methylated, rarely existing in intermediate states [2]. This group behaviour suggests that CpG sites are not independent, but rather that the methylation profile is guided by the complicated network structure of CpG sites. The aim of this study was to reconstruct and characterize the correlation network of DNA methylation in blood in the context of Down syndrome. To reconstruct the network, we considered a publicly available data set that included DNA methylation data of 728 healthy control blood samples [3]. For computational reasons, we focused on a smaller subset of CpG sites. Specifically, we studied only the ~4000 CpG sites located on chromosome 21. We investigated the ability of penalization methods (lasso and ridge) to estimate partial correlations among CpGs. However, the resulting networks contained only a few non-zero edges, possibly because of the presence of (larger) groups of CpGs which are mutually strongly correlated. Hence we estimated the relationships between CpG methylation levels by computing the mean of 100 Pearson correlation matrices obtained using the bootstrap approach. On the reconstructed control network, we performed a cluster analysis and calculated different network measurements.
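The bootstrap step described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the data are held as a list of samples with one methylation value per CpG; the function names `pearson` and `bootstrap_mean_correlation` are hypothetical and are not taken from the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_mean_correlation(rows, n_boot=100, seed=0):
    """Average the Pearson correlation matrix over bootstrap resamples.

    rows: list of samples, each a list of methylation values (one per CpG).
    Samples are resampled with replacement n_boot times (an assumption about
    how the bootstrap was set up).
    """
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    mean_corr = [[0.0] * p for _ in range(p)]
    for _ in range(n_boot):
        # Draw n samples with replacement, then switch to per-CpG columns.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        cols = list(zip(*sample))
        for i in range(p):
            for j in range(i, p):
                r = 1.0 if i == j else pearson(cols[i], cols[j])
                mean_corr[i][j] += r / n_boot
                mean_corr[j][i] = mean_corr[i][j]
    return mean_corr
```

The pairwise loop is quadratic in the number of CpGs, so for ~4000 sites a vectorised implementation would be preferable in practice.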

Our results show that not only nearby CpGs were highly correlated, but also some CpG sites which are far apart in chromosomal position. Moreover, not all nearby CpGs (located in the same CpG island) showed a strong correlation. Lastly, we compared the control network obtained on the 728 samples with the networks obtained from the Down syndrome data set consisting of methylation measurements for Down syndrome patients (29), their unaffected siblings (29) and their mothers (29) [4].

1. Bibikova M, Barnes B, Tsan C, et al. High density DNA methylation array with single CpG site resolution. Genomics. 2011;98(4):288–295. doi:10.1016/j.ygeno.2011.07.007
2. Lövkvist C, Dodd IB, Sneppen K, Haerter JO. DNA methylation in human epigenomes depends on local topology of CpG sites. Nucleic Acids Res. 2016;44(11):5123–5132. doi:10.1093/nar/gkw124
3. Johansson A, Enroth S, Gyllensten U. Continuous Aging of the Human DNA Methylome Throughout the Human Lifespan. PLoS One 2013;8(6): e67378. PMID: 23826282
4. Bacalini MG, Gentilini D, Boattini A, et al. Identification of a DNA methylation signature in blood cells from persons with Down Syndrome. Aging (Albany NY). 2015;7(2):82–96. doi:10.18632/aging.100715

o   ESR7 Maarten van Schaik
“A random effect model for high-dimensional regression of count categories”
Abstract: Omics research often involves the analysis of data sets that take the form of multivariate count data, such as taxa or gene counts. Negative binomial regression is often used to model count data as a function of covariates, some of which are categorical. When the number of categories is high, the model needs to specify many parameters, one per category. However, models with many parameters tend to overfit the data, which can negatively impact the estimation of model parameters or prediction. Previous studies proposed to decrease the number of parameters by including only the most common categories or by pooling categories together. This strategy has some disadvantages. First, the choice may be arbitrary and complicate interpretation and, second, some valuable information might be lost.

To address this challenge, we propose a negative binomial regression method that uses random effects for the covariates, instead of fixed effects. Specifically, we assume that the category-specific effects follow a distribution, and we only have to estimate the mean and variance of this distribution. Although this model yields a reduced number of parameters to be estimated, it may result in biased estimates.
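The idea of replacing one free parameter per category with a shared distribution can be illustrated with a deliberately simplified Gaussian analogue. The sketch below is not the negative binomial model of the abstract: it applies empirical-Bayes shrinkage in a normal random-intercept model, estimating only a mean and a between-category variance by the method of moments. The function name `shrink_category_means` and the input layout are hypothetical.

```python
def shrink_category_means(values_by_category):
    """Shrink per-category means towards the grand mean.

    values_by_category: dict mapping category -> list of observations
    (at least two categories, more observations than categories).
    Gaussian stand-in for a random-effects model, for illustration only.
    """
    cats = list(values_by_category)
    n = {c: len(values_by_category[c]) for c in cats}
    mean = {c: sum(values_by_category[c]) / n[c] for c in cats}
    total_n = sum(n.values())
    grand = sum(sum(v) for v in values_by_category.values()) / total_n

    # Pooled within-category variance.
    ss_within = sum((y - mean[c]) ** 2 for c in cats for y in values_by_category[c])
    sigma2 = ss_within / (total_n - len(cats))

    # Between-category variance: variance of the category means minus the
    # average sampling variance of a mean, floored at zero.
    mean_of_means = sum(mean.values()) / len(cats)
    var_means = sum((m - mean_of_means) ** 2 for m in mean.values()) / (len(cats) - 1)
    tau2 = max(var_means - sum(sigma2 / n[c] for c in cats) / len(cats), 0.0)

    shrunk = {}
    for c in cats:
        # Small categories get a smaller weight and are pulled harder to the grand mean.
        w = tau2 / (tau2 + sigma2 / n[c]) if tau2 > 0 else 0.0
        shrunk[c] = grand + w * (mean[c] - grand)
    return shrunk
```

The shrinkage towards the grand mean is the source of the bias mentioned above: each category estimate trades some bias for lower variance.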

We study the performance of the model via a simulation study and present some preliminary results. Furthermore, a motivating case is given in the form of an application to a microbiome data set. Here the interest is in assessing the effect of helminth infection and albendazole treatment on the count distribution.

o   ESR8 Annah Muli
“The effect of misspecification of frailty distribution on survival probability estimates when analysing survival data in twins”
Abstract: Our goal is to estimate individual-specific probabilities of fracture in the next time period. We will have access to data on fracture incidence in twins. We propose to use a parametric Cox proportional hazards model with a Weibull baseline hazard and a frailty term. Here, the frailty represents unobservable random effects which are shared by the twins and influence fracture incidence. For computational reasons, the frailty is typically assumed to follow a gamma distribution. However, we have shown that if the frailty distribution is not correctly specified, the estimates of the regression coefficient and of the survival probabilities may be biased. It also appeared that, in many cases, the baseline hazard scale parameter corrects for the wrong frailty distribution. This suggests that using a flexible baseline hazard instead of a parametric one might solve this problem and improve estimation of the regression coefficients and survival probabilities.
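One reason the gamma assumption is computationally convenient is that the frailty can be integrated out in closed form. The Python sketch below illustrates this under stated assumptions: the frailty has mean 1 and variance theta, and the Weibull cumulative hazard is written as scale * t**shape. This parametrization and all function names are illustrative choices, not taken from the abstract.

```python
import math


def weibull_cum_hazard(t, scale, shape):
    """Weibull cumulative baseline hazard, H0(t) = scale * t**shape (assumed form)."""
    return scale * t ** shape


def conditional_survival(t, frailty, linpred, scale, shape):
    """Survival given the shared frailty value z: exp(-z * H0(t) * exp(x'beta))."""
    return math.exp(-frailty * weibull_cum_hazard(t, scale, shape) * math.exp(linpred))


def marginal_survival_gamma(t, theta, linpred, scale, shape):
    """Survival after integrating out a gamma frailty with mean 1, variance theta.

    Uses the Laplace transform of the gamma distribution:
    S(t) = (1 + theta * H(t)) ** (-1 / theta); theta = 0 gives no frailty.
    """
    h = weibull_cum_hazard(t, scale, shape) * math.exp(linpred)
    if theta == 0:
        return math.exp(-h)
    return (1.0 + theta * h) ** (-1.0 / theta)
```

If the true frailty is not gamma distributed, the marginal survival curve has a different shape, which is how misspecification can bias the survival probability estimates.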

We propose a method that uses splines to make the baseline hazard flexible. Full maximum likelihood is then used for estimation of the parameters. We will show results for simulations of the proposed method. Based on these results, we will propose a model for the observed incidences in the twins study. We will discuss which of the following covariates should be included: standard demographic variables (gender, year of birth, ethnicity, height, and weight), bone mineral density, body mass index, smoking status (Yes/No), number of falls in the past 6 months, and IgG glycans.

o   ESR9 Anna Carbó Meix
“Multifaceted Biomarker of Aging for Estimating Biological Age”
Abstract: Aging is a time-dependent multifactorial process that results in a global deterioration of physiological functions and an elevated risk of pathologies, including cardiovascular disorders (CVDs), neurodegenerative diseases, cancer and diabetes. It is well known that age is a major risk factor for functional impairments, chronic diseases and mortality; however, the aging rate is not universal for humans, as it depends on the individual exposure/resilience trade-off. Consequently, chronological age may not be a reliable indicator of the body’s physiological decline, but rather a proxy of the aging growth rate. In this context, the unprecedented growth rate of the world’s aging population is highlighting the need to better understand the aging process and the determinants of healthy aging. Many research studies have described biomarkers of aging, which constitute tiny pieces of the complex puzzle of the aging process, but, thus far, none has succeeded in holistically capturing the key mechanisms of aging. The purpose of this study was to develop an aging clock by using different sources of phenotypical data (clinical, anthropometrical, blood measures) and molecular data (N-glycans) in N=1146 individuals, including controls, prediabetics and diabetics, males and females, between 20 and 85 years of age. Accordingly, the aims of the study were: 1) to find the statistical model, composed of a multifaceted biomarker, that best describes chronological age, 2) to measure the associations of this aging clock with chronological age, disease status (healthy, prediabetic or diabetic), and continuous phenotypical and glycomic traits, and 3) to assess whether the groups of prediabetic and diabetic patients show accelerated aging compared with controls and, within each group, whether there are differences between the two sexes.
Results indicate that the Phenotypical Age (defined as the fitted values of age in the training set) is significantly associated with i) chronological age in both males and females, ii) disease status (prediabetic/diabetic) in both males and females, and iii) insulin, triglycerides, C-reactive protein, and eosinophils, independent of age, sex and disease status; in addition, iv) diabetic males are significantly older than control males, control females, prediabetic females, and diabetic females. Our main conclusions are that Phenotypical Age is a combinatorial biomarker capable of reflecting 2 of the 7 pillars of aging, inflammation and metabolism, and that it could be representative of biological age. Overall, our results stress the importance of studying aging separately in men and women; in addition, they reflect the significant differences between the two sexes that can be observed in the general population in average life expectancy, the relative prevalence of metabolic diseases and the onset of these diseases.

o   ESR10 Md Shafiqur Rahman
“Genome-wide association study identifies RNF123 associated with chronic widespread musculoskeletal pain”
Abstract: Chronic widespread musculoskeletal pain (CWP) is a major symptom of fibromyalgia, a complex trait with poorly understood pathogenesis. While clearly heritable (48-54% heritability), its genetic architecture remains to be determined. Candidate gene approaches to study genetic factors contributing to CWP have yielded inconsistent findings and are naturally biased. Genome-wide association studies (GWAS) of CWP are of limited size and scope. We aimed to gain insight into the genetic background of CWP via a GWAS. For the discovery, we performed a GWAS on 6,914 CWP cases (defined below) and 242,929 controls of European descent from UK Biobank (UKB). Independent SNPs passing a p-value threshold of 5.0E-08 were submitted for replication in 43,080 individuals of European ancestry (14,177 CWP cases and 28,903 controls) from six independent cohorts originating from the UK (TwinsUK and The English Longitudinal Study of Ageing), the Netherlands (The Rotterdam Study I, II and III), and Norway (The Nord-Trøndelag Health Survey). In addition, in-silico follow-up was performed to obtain additional insight. Phenotype definition primarily considered the widespreadness of the pain and secondarily the exclusion of diagnostic confounders such as rheumatoid arthritis and myalgia. The latter could not be applied to most of the replication cohorts because the required data were not available, which is one of the limitations of this study. We identified 3 genome-wide significant loci (tagged by rs1491985, rs10490825, rs165599) harbouring the genes RNF123, ATP2C1, and COMT/ARVCF. Association of CWP with the RNF123 locus was replicated in the sample-size-based meta-analysis of the six cohorts (p=0.0002), whereas the ATP2C1 locus showed suggestive association (p=0.0227). The COMT/ARVCF locus did not replicate. We found independent genetic correlations between CWP and depressive symptoms, body mass index (BMI), age at first birth, and years of schooling.
Mendelian randomisation analysis revealed a causal impact of BMI on CWP (OR=1.014, 95% CI=1.01-1.02; p=2.38E-09). Conversely, CWP was found to be causally related to BMI (OR=3.62, CI=2.35-5.57, p=2.19E-09). Gene-based analysis revealed several clusters of CWP-mapped genes exhibiting significant downregulation in muscle, brain, whole blood, pancreas and heart tissues. The study identified RNF123 as a new CWP gene, provided evidence for bi-directional causality between CWP and BMI, and established tissue specificity in the expression of multiple genes likely related to CWP. In-depth investigation of these genes may provide a breakthrough in understanding the pathogenesis of CWP.

MDSR received funding from the European Union’s Horizon 2020 research and innovation program IMforFUTURE, under H2020-MSCA-ITN grant agreement number 721815. The research was carried out using the UK Biobank Resource under project number 18219. We would like to thank all the participants of UK Biobank, The Nord-Trøndelag Health Survey, The English Longitudinal Study of Ageing, TwinsUK and The Rotterdam Study I, II and III.

o   ESR11 Arianna Landini
“GWAS of transferrin N-glycans: one step closer to understanding the genetics of protein glycosylation”
Abstract: Glycomics, the study of the collection of glycans in biological systems, is an emerging field among the omics. Despite glycans being involved in the aging process and in a wide variety of complex diseases, the genetic regulation of glycosylation is not yet fully understood. In this study, we uncovered for the first time which genes are responsible for regulating N-glycosylation of transferrin glycoproteins. Further, we investigated whether the same genetic variants are pleiotropically involved in the N-glycosylation of transferrin and immunoglobulin G (IgG) glycoproteins.

We performed a genome-wide association meta-analysis of 35 transferrin N-glycan traits (N=1890) and 24 IgG N-glycan traits (N=2020) in the European-descent CROATIA-Korcula and VIKING cohorts. We then performed trait colocalisation analysis to assess the presence of pleiotropic variants having a direct effect on both transferrin and IgG N-glycosylation.

We identified 10 loci significantly associated (P < 1.43 ×10⁻⁹) with transferrin N-glycosylation, mapped to genes encoding glycosyltransferases (MGAT5, FUT6, FUT8, ST3GAL4, B3GAT1), transcription factors already known to be (HNF1A) or that might be (FOXI1) involved in regulating N-glycosylation, genes coding for an integral membrane glycoprotein receptor (MSR1) and the transferrin protein (TF), or genes previously associated with IgG N-glycosylation (NXPE1/NXPE4). Three of these genomic regions (TF, FOXI1, MSR1) have never previously been associated with protein N-glycosylation. Six of them, discovered in the CROATIA-Korcula cohort, replicated in the VIKING cohort (MGAT5, TF, ST3GAL4, B3GAT1, FUT8, FUT6). N-glycosylation of both transferrin and IgG proteins was found to be associated with the FUT8 and FUT6 genes. Using trait colocalisation methods, we did not find evidence in these shared genomic regions of pleiotropic variants affecting transferrin and IgG N-glycosylation at the same time, but rather of multiple causal variants, each exhibiting an independent effect on one protein.
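The abstract does not state how the significance threshold was derived. As a worked check, the reported value is consistent with dividing the conventional genome-wide threshold of 5 × 10⁻⁸ by the 35 transferrin N-glycan traits (a Bonferroni-style correction); this reading is an assumption, and the names in the Python sketch below are illustrative.

```python
GENOME_WIDE_ALPHA = 5e-8      # conventional genome-wide significance level
N_TRANSFERRIN_TRAITS = 35     # number of transferrin N-glycan traits analysed


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, N_TRANSFERRIN_TRAITS)
print(f"{threshold:.2e}")  # prints 1.43e-09
```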

By uncovering for the first time genes responsible for transferrin N-glycosylation, our results contribute to expanding the current knowledge about the genetic regulation of N-glycosylation. Further, they suggest that, while the same enzyme is involved in N-glycosylation of transferrin and IgG proteins, multiple causal genetic variants could independently affect each protein.