Non-globular proteins in the era of Machine Learning
COST Action CA21160
ML4NGP TRAINING SCHOOL
4th ML4NGP Training School "Introduction to Machine learning for Non-globular proteins"
February 17-20, 2026
Seville, Spain
APPLICATIONS FOR THIS EVENT ARE NOW CLOSED.
DESCRIPTION
Machine Learning (ML) has become an essential tool in modern structural biology, offering unprecedented opportunities to analyse, model, and predict the behaviour of non-globular proteins (NGPs), including intrinsically disordered proteins and regions, repeat proteins, and other flexible systems. The integration of ML approaches with biophysical and computational data is reshaping our understanding of protein disorder, dynamics, and function.
The aim of this Training School is to provide participants with both theoretical foundations and practical experience in applying ML methods to NGP research. Through a balanced combination of lectures and hands-on sessions, participants will explore how different ML techniques—ranging from classical algorithms to deep learning architectures—can be employed to analyse NGP sequences, predict disorder-related features, and interpret model outputs. Emphasis will be placed on understanding the underlying principles, proper evaluation of results, and identification of potential pitfalls and biases in ML applications.
The Training School is designed for PhD students, postdoctoral researchers, and young investigators with a background in life sciences, bioinformatics, or related disciplines. Participants will gain the conceptual and technical skills needed to interpret ML results critically and to apply them effectively in their own research. By fostering active participation and discussion, this event aims to strengthen the growing community of researchers working at the intersection of machine learning and protein science, advancing our collective understanding of non-globular proteins.
TARGET AUDIENCE AND REQUIREMENTS
Participation is restricted to working group members of COST Action CA21160. To ensure a balanced and diverse representation, the organizing committee will select participants with attention to gender balance, early career stage (PhD students and junior postdocs), and geographic diversity. The number of available places is limited to facilitate in-depth discussion and hands-on learning.
This Training School is aimed at PhD students and early-career postdocs with little or no experience in Machine Learning who want to learn its basics to advance the study of non-globular proteins, with a special focus on intrinsically disordered proteins (IDPs). Participants are expected to have prior knowledge of, and basic training in, Bioinformatics, Computational Biology, or related sciences.
Application Deadline: 17 November 2025
Notification of acceptance: 24 November 2025
Please note that submitting an application does not guarantee selection for the Training School.
You will be informed of the selection results by 24 November 2025.
- A short motivation statement (max. 500 characters) and a CV will be requested in the application form.
- COST Action ML4NGP will select and reimburse travel expenses for a maximum of 22 participants.
- A certificate of attendance will be sent no later than one week after the end of the school.
CONFIRMED TRAINERS AND SPEAKERS
Giulio Tesei (Malmö University, Sweden)
Gabor Erdos (Eötvös Loránd University, Hungary)
Nicola Bordin (University College London, UK)
Giacomo Janson (Michigan State University, USA)
Silvio Tosatto (University of Padova, Italy)
Alexander Monzon (University of Padova, Italy)
PRELIMINARY PROGRAM
TUESDAY | FEBRUARY 17, 2026
14:00-14:15 Introduction to the TS
14:15-15:00 Lecture: Silvio Tosatto (AI and standards)
15:00-15:30 Coffee Break
15:30-18:00 Lecture: Gabor Erdos
WEDNESDAY | FEBRUARY 18, 2026
09:30-10:30 Lecture: Gabor Erdos
10:30-11:00 Coffee Break
11:00-13:00 Practice: Gabor Erdos
13:00-14:30 Lunch
14:30-15:30 Lecture: Giulio Tesei
15:30-16:30 Practice: Giulio Tesei
16:30-17:00 Coffee Break
17:00-18:00 Practice: Giulio Tesei
THURSDAY | FEBRUARY 19, 2026
09:30-10:30 Lecture: Nicola Bordin
10:30-11:00 Coffee Break
11:00-13:00 Practice: Nicola Bordin
13:00-14:30 Lunch
14:30-15:30 Lecture: Giacomo Janson
15:30-16:30 Practice: Giacomo Janson
16:30-17:00 Coffee Break
17:00-18:00 Practice: Giacomo Janson
FRIDAY | FEBRUARY 20, 2026
09:30-10:30 Lecture: Gabor Erdos (transformers and AlphaFold)
10:30-11:00 Coffee Break
11:00-12:00 Practice: Gabor Erdos
12:00-13:00 Practice
13:00-14:30 Lunch
14:30-15:30 Lecture
15:30-16:30 Practice
16:30-17:30 Final Notes
SESSION DESCRIPTIONS
Trainer: Gabor Erdos
Machine learning methods have become exceptionally powerful tools for exploring and understanding biological systems in ways that were not possible before. In this tutorial, we will take a deep and intuitive dive into how classical, “old-school” machine learning techniques operate—including linear and logistic regression, decision trees, and random forests—before gradually moving toward neural networks and modern deep learning approaches.
Rather than treating these models as black boxes, we will build several machine learning models step by step, essentially “by hand,” to develop a clear understanding of their underlying principles. Along the way, we will discuss where each method excels, the types of problems it is best suited for, and the limitations and trade-offs that come with it. The tutorial is designed to be fully accessible and does not require any prior knowledge of machine learning or programming; for participants without coding experience, all necessary code will be provided in advance so the focus can remain on concepts and intuition.
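As an illustration of what building a model "by hand" can look like, the following is a minimal sketch of logistic regression trained with plain gradient descent, using only the Python standard library. It is not taken from the course material: the single feature (a made-up "fraction of disorder-promoting residues"), the toy labels, and the function names are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.5, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent.

    The gradient of the mean cross-entropy loss with respect to w is
    mean((p - y) * x), and with respect to b it is mean(p - y).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict_probability(x, w, b):
    """Probability that a sample with feature value x belongs to class 1."""
    return sigmoid(w * x + b)


# Hypothetical toy data: one feature per protein, label 1 = "disordered".
features = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weight, bias = train_logistic_regression(features, labels)
low_example = predict_probability(0.15, weight, bias)
high_example = predict_probability(0.85, weight, bias)
```

With this toy data the fitted weight is positive, so a higher feature value yields a higher predicted probability of the "disordered" class. Writing the update rule out explicitly makes it clear that the model is only a weighted sum passed through a sigmoid, which is the kind of intuition the session aims to build.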
Trainer: Nicola Bordin
The emergence of accurate protein structure models, large-scale sequence databases, and protein language models is dramatically accelerating research in structural bioinformatics. This deluge of data has given rise to two main areas of research: classification and generation of novel, synthetic data.
We will cover, and run a hands-on practical on, how to classify globular domains in the AlphaFold Database, visualise outputs from various predictors, and assess multiple metrics for confident domain-boundary assignments. This protocol was used to identify and classify domains that were subsequently used to train a protein family language model, ProFam.
We will then use ProFam to generate synthetic sequences conditioned on a protein of your choice, producing sequences with similar characteristics that do not exist in nature. Finally, we will model these sequences and analyse their relationships to existing proteins using tools such as FoldTree and Foldseek.
Trainer: Giulio Tesei
Molecular dynamics simulations of coarse-grained models offer an efficient and accurate approach for generating conformational ensembles of non-globular proteins (NGPs). This tutorial introduces CALVADOS, a physics-based, residue-level model with amino acid–specific parameters optimized using experimental SAXS and NMR data. CALVADOS simulations enable the generation of conformational ensembles of NGPs at the proteome scale, and such datasets have been used to train machine-learning models that predict conformational properties from sequence.
We will run a hands-on practical using Jupyter Notebooks on Google Colab, in which participants will simulate an NGP with CALVADOS, and integrate the resulting ensemble with experimental SAXS data through Bayesian/Maximum-Entropy reweighting. Participants will then explore how proteome-scale simulation data can be used to train a machine-learning model that predicts the compaction of intrinsically disordered proteins based on selected sequence features.
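To give a flavour of the reweighting idea, the sketch below shows a strongly simplified maximum-entropy reweighting for a single observable. It is not the Bayesian/Maximum-Entropy procedure used in the practical, which works with a full SAXS curve, accounts for experimental errors, and includes a regularization parameter. Here the per-frame radius of gyration values, the target average, and all function names are hypothetical. The weights take the maximum-entropy form w_i ∝ exp(-λ f_i), and λ is found by bisection so that the weighted average matches the target.

```python
import math


def maxent_weights(values, lam):
    """Return normalized weights w_i proportional to exp(-lam * f_i)."""
    exponents = [-lam * v for v in values]
    shift = max(exponents)  # subtract the maximum to avoid overflow
    raw = [math.exp(e - shift) for e in exponents]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_mean(values, weights):
    """Ensemble average of an observable under the given weights."""
    return sum(v * w for v, w in zip(values, weights))


def reweight_to_target(values, target, lo=-100.0, hi=100.0, tol=1e-12):
    """Find weights whose weighted mean equals the target value.

    The weighted mean decreases monotonically as lam increases,
    so a simple bisection on lam is sufficient.
    """
    if not min(values) < target < max(values):
        raise ValueError("target must lie strictly within the sampled range")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        mean = weighted_mean(values, maxent_weights(values, mid))
        if mean > target:
            lo = mid  # mean too large: a larger lam favours small values
        else:
            hi = mid
        if hi - lo < tol:
            break
    return maxent_weights(values, 0.5 * (lo + hi))


# Hypothetical radius of gyration (nm) for five simulation frames.
rg_per_frame = [2.0, 2.4, 2.8, 3.2, 3.5]
target_rg = 2.5  # hypothetical experimental average

new_weights = reweight_to_target(rg_per_frame, target_rg)
reweighted_rg = weighted_mean(rg_per_frame, new_weights)
```

Because the unweighted average of the toy frames is larger than the target, the compact frames receive higher weights than the expanded ones. In practice one would also monitor how far the new weights deviate from the original ones, since a strongly reweighted ensemble relies on only a few frames.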
Trainer: Giacomo Janson
Deep generative models are emerging as a tool to characterize the structural ensembles of non-globular proteins (NGPs). They offer a powerful and computationally efficient complement to traditional biophysical and simulation-based approaches. In this training session, we will introduce the concepts behind deep generative models of biomolecular structures, with an emphasis on intrinsically disordered proteins (IDPs) and dynamical proteins. Participants will learn what it means for a model to be “generative,” how existing techniques (e.g., diffusion models) capture structural variability, and how they are used to model the conformational landscapes of NGPs.
The hands-on component will guide participants through practical applications of both first-generation and state-of-the-art models for ensemble generation. All exercises will be carried out in Jupyter notebooks, allowing participants to run the models, visualize outputs, and experiment with the data. In the first part, attendees will work with idpGAN, a first-generation method for coarse-grained IDPs. We will examine how to generate structural ensembles, analyze their properties and validate them against classical simulation-based approaches.
The second hands-on part will focus on current-generation models for atomistic ensembles, highlighting their improved resolution and environmental conditioning. Participants will use the aSAM model and compare its ensembles with those from other recent models (e.g., BioEmu and AlphaFold3), learning how to interpret outputs, evaluate ensemble quality, and identify common pitfalls in ML-driven structural modeling of protein ensembles.
By the end of the session, participants will have both a conceptual understanding of generative modeling for biomolecular structures and practical experience applying this approach to real protein systems. The training aims to provide participants with the skills needed to critically deploy generative models in their own studies of protein disorder and dynamics.
VENUE
The event will take place at Hotel Meliá Lebreros in Seville, Spain.
ORGANIZATION
LOCAL ORGANIZING COMMITTEE
Pablo Mier (Universidad Pablo de Olavide Sevilla)
SCIENTIFIC ORGANIZING COMMITTEE
Core Group ML4NGP
This event is part of the activities of the COST Action ML4NGP, CA21160, which is supported by COST (European Cooperation in Science and Technology).
