Latest publication: mEthAE: an Explainable AutoEncoder for methylation data

July 2023

We are proud to see the outstanding work of Sonja Katz, TranSYS fellow from LifeGlimmer, finalized. The mEthAE tool is a revolutionary chromosome-wise autoencoder for interpretable dimensionality reduction of methylation data.

Sonja Katz, Vitor A.P. Martins dos Santos, Edoardo Saccenti, Gennady V. Roshchupkin


Despite the wealth of knowledge generated through epigenome-wide association studies, our understanding of the relationships of CpG sites is still limited, as analysis of DNA methylation data remains difficult due to its high dimensionality. To combat this problem, deep learning algorithms, such as autoencoders, are increasingly applied to capture the complex patterns and reduce dimensionality into latent space. We believe that the way an autoencoder groups together CpGs in its latent dimensions has biological meaning and might reveal novel insights regarding the relationship of CpGs. Therefore, in this work, we propose a chromosome-wise autoencoder for interpretable dimensionality reduction of methylation data (mEthAE). Our framework shows an impressive reduction in dimensions of up to 400-fold compared to the provided input, without compromising on reconstruction accuracy or predictive power in the latent space. Through our perturbation-based interpretability approach we revealed groups of CpGs which are highly connected across all latent dimensions (global CpGs) and were significantly more often reported in EWAS studies, indicating our interpretability method can successfully identify CpGs with biological relevance. In an attempt to gain a deeper understanding of the relationship between individual CpG sites, we focused on interpreting individual latent features and found that CpGs connected to a common feature do not share biological associations or correlation patterns, nor are they located in close proximity on the chromosome. We conclude that while there is evidence that the autoencoder does not group CpGs randomly, the logic behind the observed CpG relationships cannot be delineated easily. With regard to the analyses done in this work, we believe that the autoencoder groups CpGs according to long-range non-linear interaction patterns that lack characterisation in the current epigenetic research landscape.
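The perturbation idea described in the abstract can be illustrated with a small sketch. This is not the mEthAE implementation: the encoder here is a toy stand-in with hypothetical fixed weights, and the function names, the perturbation size `delta`, the `threshold`, and the beta values are all assumptions made for illustration. The sketch shifts one input CpG at a time, records how much each latent dimension moves, and flags as "global" those CpGs that move every latent dimension.

```python
import math


def encode(x, weights):
    """Toy encoder: one tanh unit per latent dimension (stand-in for a trained model)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def perturbation_effects(x, weights, delta=0.1):
    """Return effects[i][j]: absolute change in latent j when input i is shifted by delta."""
    base = encode(x, weights)
    effects = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] += delta
        shifted = encode(perturbed, weights)
        effects.append([abs(s - b) for s, b in zip(shifted, base)])
    return effects


def global_inputs(effects, threshold=0.01):
    """Indices of inputs whose perturbation moves every latent dimension by more than threshold."""
    return [i for i, row in enumerate(effects) if all(e > threshold for e in row)]


# Hypothetical weights: 2 latent dimensions (rows), 4 input CpGs (columns).
WEIGHTS = [
    [0.9, 0.0, 0.5, 0.0],
    [0.8, 0.7, 0.0, 0.0],
]
beta_values = [0.2, 0.5, 0.7, 0.1]  # made-up methylation beta values

effects = perturbation_effects(beta_values, WEIGHTS)
global_cpgs = global_inputs(effects)
print(global_cpgs)
```

In this toy setup only the first CpG has a non-zero weight in both latent dimensions, so it is the only one flagged as global. A real analysis would apply the same loop to a trained encoder and to many samples.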

November 2022

Sonja Katz, TranSYS fellow from LifeGlimmer, spearheaded this outstanding work on developing a cutting-edge clinical decision support system using machine learning to accurately predict mortality in Necrotizing Soft Tissue Infections, showcasing our expertise in advancing medical care.

Sonja Katz, Jaco Suijker, Christopher Hardt, Martin Bruun Madsen, Annebeth Meij-de Vries, Anouk Pijpe, Steinar Skrede, Ole Hyldegaard, Erik Solligård, Anna Norrby-Teglund, Edoardo Saccenti, Vitor A.P. Martins dos Santos

In short

Necrotizing Soft Tissue Infections (NSTI) are severe infections with high mortality rates. To address the need for early prediction of outcomes and treatment recommendations, a machine learning model based on a Random Forest algorithm was developed. Interviews with medical professionals helped identify relevant clinical needs, resulting in 24 questions. Using data from the prospective INFECT cohort, 16 predictive parameters related to sepsis were identified, enabling accurate 30-day mortality predictions (AUC = 0.91). The model outperformed the SOFA score and showed comparable performance to the SAPS II score. The developed model proved stable even with missing data or early available variables. This study establishes the basis for a comprehensive clinical decision support system encompassing various outcomes and clinical questions.
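For readers unfamiliar with the AUC metric quoted above, the following minimal sketch shows how an area under the ROC curve can be computed from predicted risk scores. It does not reproduce the study's Random Forest model or the INFECT data: the labels and scores are made up, and `roc_auc` is a hypothetical helper. The AUC is computed here as the fraction of (deceased, survivor) pairs in which the deceased patient received the higher predicted risk, with ties counted as half.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = died within 30 days, 0 = survived.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]  # hypothetical predicted mortality risks
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives some context for the reported value of 0.91.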

Latest conference paper on Automated Semantification of Bioassays

July 2022

LifeGlimmer is proud to endorse the excellent work of Marco Anteghini, PerICo ESR and LifeGlimmer bioinformatician, on his approach to automatically semantifying biological assays, which outperforms state-of-the-art approaches.

Anteghini, Marco, Jennifer D’Souza, Vitor AP dos Santos, and Sören Auer. "Easy semantification of bioassays." In International Conference of the Italian Association for Artificial Intelligence, pp. 198-212. Springer, Cham, 2022.


Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts the problem of automated semantification as labeling versus clustering where the two methods are on opposite ends of the method complexity spectrum. Characteristically modeling our problem, we find the clustering solution significantly outperforms a deep neural network state-of-the-art labeling approach. This novel contribution is based on two factors: 1) a learning objective closely modeled after the data outperforms an alternative approach with sophisticated semantic modeling; 2) automatically semantifying biological assays achieves a high performance F1 of nearly 83%, which to our knowledge is the first reported standardized evaluation of the task offering a strong benchmark model.

TranSYS Marie Curie fellow from LifeGlimmer presents at BioSB 2022

July 2022

TranSYS fellow Sonja Katz from LifeGlimmer GmbH was selected as an oral presenter at the BioSB 2022 conference, which took place from the 27th to the 28th of June in Lunteren (The Netherlands).

The Dutch Bioinformatics & Systems Biology conference (BioSB) discusses the latest developments in bioinformatics, systems and computational biology and interrelated disciplines, and their wide-ranging applications in life sciences & health, agriculture, food & nutrition.

The session “Multi-omics deep-nets” accommodated presentations covering the growing efforts in applying deep learning algorithms to better interpret omics-derived data. With her project “methAE: an interpretable autoencoder for methylation data” Sonja presented her work proposing a deep unsupervised autoencoder for interpretable dimensionality reduction of methylation data.

We were incredibly grateful to have been given the opportunity to present at such an insightful event and want to thank all of the organizers of BioSB 2022. See you next year at BioSB 2023!

Full abstracts can be found in the BioSB 2022 abstract book at

Latest publication: OrganelX Web Server for Sub-Peroxisomal and Sub-Mitochondrial protein localisation

June 2022

The novel work of our PerICo ESR, Marco Anteghini, features the first-of-its-kind web server for sequence localisation predictive tasks and is now available on bioRxiv!

OrganelX Web Server for Sub-Peroxisomal and Sub-Mitochondrial protein localisation

Marco Anteghini, Asmaa Haja, Vitor AP Martins dos Santos, Lambert Schomaker, Edoardo Saccenti

bioRxiv 2022.06.21.497045; doi:

Read more about it here:

Computational approaches for sub-organelle protein localisation and identification are often neglected while general methods, not suitable for specific use cases, are promoted instead. In particular, organelle-specific research lacks user-friendly and easily accessible computational tools that allow researchers to perform computational analysis before starting time-consuming and expensive wet-lab experiments. We present the OrganelX e-Science Web Server which hosts three sequence localisation predictive algorithms: In-Pero and In-Mito for classifying sub-peroxisomal and sub-mitochondrial protein localisations given their FASTA sequences, as well as the Is-PTS1 algorithm for detecting and validating potential peroxisomal proteins carrying a PTS1 signal. These tools can be used for fast and accurate screening while looking for new peroxisomal and mitochondrial proteins. To our knowledge, this is the only service that provides these functionalities, and it can speed up the daily research of the peroxisomal science community.

LifeGlimmer represented at RECOMB 2022 in La Jolla, California

May 2022

RECOMB 2022 was the 26th edition of a series of algorithmic computational biology conferences bridging the areas of computational, mathematical, statistical and biological sciences. This year's edition took place in La Jolla, California, from May 22, 2022 – May 25, 2022.

LifeGlimmer representatives Marco Anteghini (PerICo) and Sonja Katz (TranSYS) were selected for poster presentations highlighting their work on peroxisomal proteins and clinical decision support systems. 

More information on RECOMB 2022 is available at

New publication BMC Medicine: Gene association networks revealing patient-specific responses in necrotising soft tissue infections (NSTI)

May 2022

We are proud to announce the result of a collaborative effort of the PerMIT consortium (supported by the ERA PerMed) on applying a personalised medicine approach to necrotising soft tissue infections. 

Lorna Morris, senior researcher at LifeGlimmer, played a leading role in designing and executing this interdisciplinary study. 

Read the full abstract here: 

Jahagirdar, Sanjeevan, Lorna Morris, Nirupama Benis, Oddvar Oppegaard, Mattias Svenson, Ole Hyldegaard, Steinar Skrede, Anna Norrby-Teglund, Vitor AP Martins dos Santos, and Edoardo Saccenti. "Analysis of host-pathogen gene association networks reveals patient-specific response to streptococcal and polymicrobial necrotising soft tissue infections." BMC Medicine 20, no. 1 (2022): 1-18.


Necrotising soft tissue infections (NSTIs) are rapidly progressing bacterial infections usually caused by either several pathogens in unison (polymicrobial infections) or Streptococcus pyogenes (mono-microbial infection). These infections are rare and are associated with high mortality rates. However, the underlying pathogenic mechanisms in this heterogeneous group remain elusive. Methods: In this study, we built interactomes at both the population and individual levels consisting of host-pathogen interactions inferred from dual RNA-Seq gene transcriptomic profiles of the biopsies from NSTI patients. Results: NSTI type-specific responses in the host were uncovered. The S. pyogenes mono-microbial subnetwork was enriched with host genes annotated as involved in cytokine production and regulation of response to stress. The polymicrobial network consisted of several significant associations between different species (S. pyogenes, Porphyromonas asaccharolytica and Escherichia coli) and host genes. The host genes associated with S. pyogenes in this subnetwork were characterised by cellular response to cytokines. We further found several virulence factors including hyaluronan synthase, Sic1, Isp, SagF, SagG, ScfAB-operon, Fba and genes upstream and downstream of EndoS along with bacterial housekeeping genes interacting with the human stress and immune response in various subnetworks between host and pathogen. Conclusions: At the population level, we found aetiology-dependent responses showing the potential modes of entry and immune evasion strategies employed by S. pyogenes, congruent with general cellular processes such as differentiation and proliferation. After stratifying the patients based on the subject-specific networks to study the patient-specific response, we observed different patient groups with different collagens, cytoskeleton and actin monomers in association with virulence factors, immunogenic proteins and housekeeping genes which we utilised to postulate differing modes of entry and immune evasion for different bacteria in relationship to the patients’ phenotype.

Latest work on the peroxisomal protein inventory of Zebrafish published in Frontiers in Physiology

February 2022

We are proud to announce the contribution of our PerICo ESR Marco Anteghini to a project orchestrated by colleagues from the University of Exeter, revolving around combining bioinformatics analyses with molecular cell biology.

Kamoshita, Maki, Rechal Kumar, Marco Anteghini, Markus Kunze, Markus Islinger, Vítor Martins dos Santos, and Michael Schrader. "Insights into the peroxisomal protein inventory of zebrafish." Frontiers in Physiology (2022): 322.


Peroxisomes are ubiquitous, oxidative subcellular organelles with important functions in cellular lipid metabolism and redox homeostasis. Loss of peroxisomal functions causes severe disorders with developmental and neurological abnormalities. Zebrafish are emerging as an attractive vertebrate model to study peroxisomal disorders as well as cellular lipid metabolism. Here, we combined bioinformatics analyses with molecular cell biology and reveal the first comprehensive inventory of Danio rerio peroxisomal proteins, which we systematically compared with those of human peroxisomes. Through bioinformatics analysis of all PTS1-carrying proteins, we demonstrate that D. rerio lacks two well-known mammalian peroxisomal proteins (BAAT and ZADH2/PTGR3), but possesses a putative peroxisomal malate synthase (Mlsl) and verified differences in the presence of purine degrading enzymes. Furthermore, we revealed novel candidate peroxisomal proteins in D. rerio, whose function and localisation is discussed. Our findings confirm the suitability of zebrafish as a vertebrate model for peroxisome research and open possibilities for the study of novel peroxisomal candidate proteins in zebrafish and humans.

LifeGlimmer joins the ELIXIR BioHackathon 2021

November 2021

Last week researchers, developers, and passionate hackers from all over the world gathered in Barcelona for the 4th edition of the ELIXIR BioHackathon 2021. 

But what is the #BioHackEU21? The one-week event (8th-12th November 2021) hosted by ELIXIR Europe is the perfect opportunity to engage with people from all areas in Bioinformatics to collaborate on a joint software project. Not only did it provide the chance to network and exchange ideas, but also kick-started novel collaborations through hands-on programming activities. The hybrid event with more than 420 participants from all over the world (including not only Europe but also the US, Japan, and Australia) worked to advance a total of 37 different projects, ranging from standardised workflows, ontology tooling, metadata validation, to training and many more.

Marco Anteghini and Sonja Katz, PhD candidates trained by LifeGlimmer, joined the open-source project “MOWL - Machine Learning with Ontologies”, coordinated by Maxat Kulmanov, Postdoctoral Research Fellow in the Bio-Ontology Research Group at the King Abdullah University of Science and Technology located in Thuwal, Saudi Arabia. Our efforts can be found in the official GitHub repository [1] and will be made available as pre-print through BioHackrXiv [2].

Many thanks to @ELIXIREurope for coming up with such a great event, and to the participants for making it one. Special thanks to the MOWL project group – hope to see all of you at the next BioHackathon!



Sonja Katz - originally published in

New publication on a deep learning approach to predict the location of peroxisomal proteins

June 2021

Marco Anteghini, PerICo ESR at LifeGlimmer, presents his latest publication:

Anteghini M, Martins dos Santos V, Saccenti E. In-Pero: Exploiting Deep Learning Embeddings of Protein Sequences to Predict the Localisation of Peroxisomal Proteins. International Journal of Molecular Sciences. 2021; 22(12):6409.


Peroxisomes are ubiquitous membrane-bound organelles, and aberrant localisation of peroxisomal proteins contributes to the pathogenesis of several disorders. Many computational methods focus on assigning protein sequences to subcellular compartments, but there are no specific tools tailored for the sub-localisation (matrix vs. membrane) of peroxisome proteins. We present here In-Pero, a new method for predicting protein sub-peroxisomal cellular localisation. In-Pero combines standard machine learning approaches with recently proposed multi-dimensional deep-learning representations of the protein amino-acid sequence. It showed a classification accuracy above 0.9 in predicting peroxisomal matrix and membrane proteins. The method is trained and tested using a double cross-validation approach on a curated data set comprising 160 peroxisomal proteins with experimental evidence for sub-peroxisomal localisation. We further show that the proposed approach can be easily adapted (In-Mito) to the prediction of mitochondrial protein localisation obtaining performances for certain classes of proteins (matrix and inner-membrane) superior to existing tools.
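The "double cross-validation" mentioned in the abstract is commonly understood as nested cross-validation: an outer loop holds out test data for performance estimation, and an inner loop on the remaining data is used for model selection. The sketch below only generates the index splits for a data set of 160 samples. The fold counts (5 outer, 4 inner) and the helper names are assumptions, not taken from the paper, and shuffling and class stratification are omitted for brevity.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k contiguous folds of near-equal size."""
    folds = []
    n = len(indices)
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def nested_cv_splits(n_samples, outer_k=5, inner_k=4):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_validation) pairs drawn
    only from outer_train, so the outer test fold never influences model selection.
    """
    all_indices = list(range(n_samples))
    for test in k_fold_indices(all_indices, outer_k):
        test_set = set(test)
        outer_train = [i for i in all_indices if i not in test_set]
        inner_splits = []
        for validation in k_fold_indices(outer_train, inner_k):
            validation_set = set(validation)
            inner_train = [i for i in outer_train if i not in validation_set]
            inner_splits.append((inner_train, validation))
        yield outer_train, test, inner_splits


# 160 samples, matching the size of the curated data set mentioned above.
splits = list(nested_cv_splits(160))
for outer_train, test, inner_splits in splits:
    print(len(outer_train), len(test), len(inner_splits))
```

In practice each inner split would be used to tune hyperparameters, and the selected model would then be refit on the outer training indices and scored once on the outer test fold.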