Research Topics

Our group develops novel computational methods that combine big data algorithms and machine learning to gain insights into unexplored microbial communities.

  • Development of algorithms to search, cluster and assemble sequence data
  • Pathogen detection in sequencing data
  • Metagenomic analysis
  • Protein function and structure prediction

Martin Steinegger

Assistant Professor of Bioinformatics

Seoul National University

Dr. Steinegger is an Assistant Professor in the biology department at Seoul National University. He studied bioinformatics and computer science at the Technical University Munich and the Ludwig Maximilian University of Munich. During this time, he worked as a research assistant in the group of Professor Burkhard Rost, developing methods to predict the effects of protein mutations.

He received his Ph.D. from the Technical University Munich for his work on computational methods to assemble, cluster and annotate metagenomic sequencing data, carried out in collaboration with Dr. Johannes Söding at the Max Planck Institute for Biophysical Chemistry. As a postdoctoral researcher in the group of Professor Steven L. Salzberg at the Center for Computational Biology (CCB) at Johns Hopkins University, he developed methods to identify pathogenic agents in infectious diseases, detect assembly contamination in public datasets, and annotate missing exons in the human proteome.

Dr. Steinegger is an expert in large-scale sequence data analysis and method development, and an advocate for open science and open-source software.

Education

  • Ph.D. in Computer Science, 2018

    Technical University Munich

  • M.Sc. in Computer Science, 2014

    Ludwig Maximilian University of Munich

  • B.Sc. in Bioinformatics, 2013

    Technical University Munich/Ludwig Maximilian University of Munich

Announcements

We are hiring postdoctoral researchers, Ph.D. students and interns.

Meet the Team

Researchers

Martin Steinegger

Assistant Professor of Bioinformatics

Cameron Gilchrist

Postdoctoral researcher

Dongwook Kim

Graduate Student

Eli Levy Karin

ELKMO consultant

Gyuri Kim

Intern

Hyunbin Kim

Graduate Student

Jaebeom Kim

Ph.D. student

Jihyeon Kim

Graduate Student

Jihyun Jung

Lab Manager

Jingi Yeo

Integrated Ph.D. student

Junsu Lee

Intern

Milot Mirdita

Postdoctoral researcher

Seong-Eun Kim

Integrated Ph.D. student

Sukhwan Park

Integrated Ph.D. student

Sung-eun Jang

Graduate Student

Woosub Kim

Graduate Student

Yewon Han

Intern

Alumni

Hyunjung Choi

Lab Manager

Jieun Lee

Lab Manager

Sewon Lee

Intern

Stephanie Kim

Postdoctoral researcher

Methods

MMseqs2 (Many-against-Many sequence searching) is a software suite for searching and clustering huge protein and nucleotide sequence sets. At its fastest settings, MMseqs2 can run up to 10,000 times faster than BLAST, and it can perform profile searches with the same sensitivity as PSI-BLAST at over 400 times its speed.
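Much of this speed comes from a cheap prefilter that discards most target sequences before any costly alignment is computed. The sketch below illustrates that idea under simplifying assumptions: it uses exact k-mer matches only (MMseqs2 also scores similar k-mers under a substitution matrix) and keeps a target once two k-mer hits fall on the same diagonal, loosely following MMseqs2's consecutive-diagonal-hit criterion. The function names are hypothetical, and the alignment stage that would follow is not shown.

```python
from collections import defaultdict


def build_index(targets, k):
    """Map every k-mer to the (target name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in targets.items():
        for j in range(len(seq) - k + 1):
            index[seq[j:j + k]].append((name, j))
    return index


def double_diagonal_prefilter(query, index, k):
    """Return targets with at least two exact k-mer hits on the same diagonal."""
    seen = set()    # (target, diagonal) pairs hit once so far
    passed = set()  # targets that reached a second hit on some diagonal
    for i in range(len(query) - k + 1):
        for name, j in index.get(query[i:i + k], ()):
            key = (name, i - j)  # diagonal = query position - target position
            if key in seen:
                passed.add(name)
            else:
                seen.add(key)
    return passed


targets = {"t1": "MKTAYIAKQRQISFVKSHFSRQ", "t2": "GGGGWWWWPPPPLLLL"}
index = build_index(targets, k=3)
print(double_diagonal_prefilter("AYIAKQRQ", index, k=3))  # {'t1'}
```

A query that shares only a single k-mer with a target (for example `"XXAYIXX"` against `t1`) produces no second hit on that diagonal, so the target is never passed on to alignment.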

ColabFold is an easy-to-use environment for fast and convenient protein structure prediction. It runs AlphaFold2 or RoseTTAFold on top of a fast multiple sequence alignment (MSA) generation stage based on MMseqs2, which speeds up MSA generation by a factor of 16 compared with the AlphaFold system.

Foldseek is a software suite for searching and clustering protein structures. It is 600,000 times faster than the fastest state-of-the-art aligners, which allows it to query millions of structures in seconds.

Linclust is a method that clusters sequences down to 50% pairwise sequence similarity, with a runtime that scales linearly with the input set size rather than quadratically as in conventional algorithms. It is more than 1,000 times faster than its competitors.
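The linear scaling can be illustrated with a simplified sketch that roughly follows Linclust's published idea: each sequence contributes only m k-mers (here, those with the lowest hash values), sequences sharing a k-mer form a group, and every group member is compared only with one center sequence (here, the longest). The number of comparisons is then bounded by N·m instead of growing with N². Assumptions: exact k-mers, `difflib`'s ratio as a stand-in for a real alignment score, and a plain longest-first greedy assignment instead of Linclust's own clustering step. All function names and defaults are hypothetical.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher


def kmer_hash(kmer):
    """Deterministic 64-bit hash (Python's built-in str hash is randomized per run)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def select_kmers(seq, k, m):
    """Keep only the m distinct k-mers with the lowest hash values."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sorted(kmers, key=kmer_hash)[:m]


def similarity(a, b):
    """Stand-in for an alignment score: difflib's ratio in [0, 1]."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def linear_cluster(seqs, k=5, m=10, min_sim=0.5):
    # 1. Group sequences by their selected k-mers: at most m entries per sequence.
    groups = defaultdict(list)
    for name, seq in seqs.items():
        for kmer in select_kmers(seq, k, m):
            groups[kmer].append(name)

    # 2. Compare each member only with its group's center (longest sequence),
    #    so the total number of comparisons is at most len(seqs) * m.
    neighbours = defaultdict(set)
    for members in groups.values():
        center = max(members, key=lambda n: (len(seqs[n]), n))
        for member in members:
            if member != center and similarity(seqs[member], seqs[center]) >= min_sim:
                neighbours[center].add(member)
                neighbours[member].add(center)

    # 3. Greedy assignment: the longest unassigned sequence becomes a
    #    representative and absorbs its still-unassigned neighbours.
    clusters, assigned = {}, set()
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        if name in assigned:
            continue
        members = [name] + sorted(neighbours[name] - assigned)
        assigned.update(members)
        clusters[name] = members
    return clusters


seqs = {
    "a": "MKTAYIAKQRQISFVKSHFSRQ",
    "b": "MKTAYIAKQRQISFVKSHFSRE",
    "c": "GGGGGWWWWWPPPPPLLLLL",
}
print(linear_cluster(seqs))  # {'a': ['a', 'b'], 'c': ['c']}
```

Because `a` and `b` share 17 of their 18 k-mers, their selected k-mer sets always overlap regardless of the hash, so they land in a common group, while `c` shares no k-mer with either.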

Plass is software for assembling short-read sequencing data at the protein level. Its main purpose is the assembly of complex metagenomic datasets; in soil metagenomes it assembles 10 times more protein residues than MEGAHIT.
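Protein-level assembly can be illustrated with a toy sketch: translate reads into protein fragments, then repeatedly merge the pair of fragments with the longest suffix-prefix overlap. This is a simplified greedy overlap-extension scheme in the spirit of Plass, not its actual algorithm. It assumes exact overlaps (a real assembler must tolerate sequence differences between overlapping fragments) and translates only one reading frame, whereas a real pipeline would consider all six. Function names and the minimum overlap of 4 residues are hypothetical.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a single reading frame; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest proper suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_overlap=4):
    """Greedily merge protein fragments by their longest exact overlap."""
    unique = list(dict.fromkeys(fragments))
    # Drop fragments fully contained in a longer one.
    contigs = [f for f in unique if not any(f != g and f in g for g in unique)]
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = suffix_prefix_overlap(a, b, min_overlap)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


print(translate("ATGGCC"))  # MA
print(greedy_assemble(["MKTAYIAKQ", "AYIAKQRQIS", "KQRQISFVKSH"]))  # ['MKTAYIAKQRQISFVKSH']
```

Working on translated fragments means that synonymous codon differences between related strains no longer break overlaps, which is one motivation for assembling at the protein level.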

Conterminator is an efficient method for detecting sequences that are incorrectly labeled across kingdoms, based on an exhaustive all-against-all sequence comparison.
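The general idea of cross-kingdom contamination detection can be sketched as follows, under loudly stated assumptions: every pair of sequences is compared (the exhaustive all-against-all step), sequences sharing a long exact segment are linked into groups, and within each group the majority kingdom label is assumed correct, so members with any other label are flagged. The exact-segment test is a stand-in for a real alignment, and the majority rule is a hypothetical heuristic, not necessarily Conterminator's own decision rule. Function names and the 20-character threshold are illustrative.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def shares_segment(a, b, min_len):
    """Stand-in for an alignment: True if a and b share an exact run of >= min_len characters."""
    match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return match.size >= min_len


def flag_cross_kingdom(entries, min_len=20):
    """entries: name -> (kingdom label, sequence).

    Returns names whose label disagrees with the majority label of their group.
    """
    parent = {name: name for name in entries}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Exhaustive all-against-all comparison: quadratic in the number of entries.
    for a, b in combinations(entries, 2):
        if shares_segment(entries[a][1], entries[b][1], min_len):
            parent[find(a)] = find(b)

    groups = {}
    for name in entries:
        groups.setdefault(find(name), []).append(name)

    flagged = []
    for members in groups.values():
        counts = Counter(entries[n][0] for n in members)
        if len(counts) < 2:
            continue  # all members carry the same kingdom label
        majority = counts.most_common(1)[0][0]
        flagged.extend(n for n in members if entries[n][0] != majority)
    return sorted(flagged)


seg = "ACGTTGCAAGGCTTACGGATCCAGT"
entries = {
    "b1": ("Bacteria", seg + "AAAA"),
    "b2": ("Bacteria", "TTTT" + seg),
    "b3": ("Bacteria", "GG" + seg + "GG"),
    "h1": ("Metazoa", "CCCCCC" + seg + "CCCCC"),
    "h2": ("Metazoa", "TATATATATATATATATATATATA"),
}
print(flag_cross_kingdom(entries))  # ['h1']
```

Ties between kingdom counts are resolved arbitrarily by `Counter.most_common` here; a real tool would need an explicit rule for such cases.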

Selected Publications

Barrio-Hernandez I., Yeo J., Jänes J., Mirdita M., Gilchrist C.L.M., Wein T., Varadi M., Velankar S., Beltrao P., Steinegger M. (2023) Clustering predicted structures at the scale of the known protein universe, Nature [preprint] [journal] [software]

van Kempen M., Kim S.S., Tumescheit C., Mirdita M., Lee J., Gilchrist C.L.M., Söding J., Steinegger M. (2023) Fast and accurate protein structure search with Foldseek, Nature Biotechnology [preprint] [journal] [software]

Mirdita M., Schütze K., Moriwaki Y., Heo L., Ovchinnikov S., Steinegger M. (2022) ColabFold: making protein folding accessible to all, Nature Methods [preprint] [journal] [software]

Steinegger M., Salzberg S.L. (2020) Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank, Genome Biology [preprint] [journal] [software]

Steinegger M., Meier M., Mirdita M., Vöhringer H., Haunsberger S.J., Söding J. (2019) HH-suite3 for fast remote homology detection and deep protein annotation, BMC Bioinformatics [preprint] [journal] [software]

Steinegger M., Mirdita M., Söding J. (2019) Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold, Nature Methods [preprint] [journal] [software]

Steinegger M., Söding J. (2018) Clustering huge protein sequence sets in linear time, Nature Communications [preprint] [journal] [software]

Steinegger M., Söding J. (2017) MMseqs2: Sensitive protein sequence searching for the analysis of massive data sets, Nature Biotechnology [preprint] [journal] [software]

Contact