Research Topics

Our group develops novel computational methods that combine big data algorithms and machine learning to gain insights into unexplored microbial communities.

  • Sequence search, clustering and assembly
  • Pathogen detection
  • Metagenomic analysis
  • Protein function and structure prediction

Martin Steinegger

Assistant Professor of Computational Biology

Seoul National University


Dr. Steinegger is an Assistant Professor at Seoul National University. He received his B.Sc. in Bioinformatics in 2013. During his B.Sc. studies, he worked as a research assistant for Professor Burkhard Rost at the Technical University Munich, focusing on the development of methods for predicting protein mutation effects.

He received his Ph.D. in Computer Science in 2018 from the Technical University Munich. He developed methods to search, cluster and assemble large metagenomic sequence data under the guidance of Dr. Johannes Söding at the Max Planck Institute for Biophysical Chemistry. As a postdoc in the group of Professor Steven L. Salzberg at the Johns Hopkins University School of Medicine, he developed methods for the identification of pathogenic agents in infectious diseases, the detection of assembly contamination in public datasets and the annotation of missing exons in the human proteome.


  • Ph.D. in Computer Science, 2018

    Technical University Munich

  • M.Sc. in Computer Science, 2014

    Ludwig-Maximilians-University Munich

  • B.Sc. in Bioinformatics, 2013

    Technical University Munich/Ludwig-Maximilians-University Munich

Lab announcements

We are hiring postdoctoral researchers, Ph.D. students and interns.


MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets. MMseqs2 can run 10,000 times faster than BLAST. It can perform profile searches with the same sensitivity as PSI-BLAST at over 400 times its speed.

Linclust is a clustering method whose runtime scales linearly with the input set size, rather than quadratically as in conventional algorithms. It can cluster sequences down to 50% pairwise sequence similarity and is more than 1,000 times faster than its competitors.

Plass is a software tool for assembling short-read sequencing data at the protein level. Its main purpose is the assembly of complex metagenomic datasets. It assembles 10 times more protein residues in soil metagenomes than Megahit.

Conterminator is an efficient method for detecting incorrectly labeled sequences across kingdoms by an exhaustive all-against-all sequence comparison.

Selected Publications

Steinegger, M., and Salzberg, S. L. (2020) Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank, bioRxiv [preprint] [journal] [software]
Steinegger, M., Meier, M., Mirdita, M., Vöhringer, H., Haunsberger, S. J., and Söding, J. (2019) HH-suite3 for fast remote homology detection and deep protein annotation, BMC Bioinformatics [preprint] [journal] [software]
Steinegger, M., Mirdita, M., and Söding, J. (2019) Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold, Nature Methods [preprint] [journal] [software]
Steinegger, M., and Söding, J. (2018) Clustering huge protein sequence sets in linear time, Nature Communications [preprint] [journal] [software]
Steinegger, M., and Söding, J. (2017) MMseqs2: Sensitive protein sequence searching for the analysis of massive data sets, Nature Biotechnology [preprint] [journal] [software]