Covary protocol portfolio and 3.0.1 beta are now available.
Performance

Performance validations

Covary has been benchmarked against established phylogenetic pipelines — ETE3, IQ-TREE, MAFFT, and FastTree — across four use cases: classification, identification, relationship, and prediction.

v1.3 benchmarked vs. ETE3 · IQ-TREE · FastTree. Research use only.

Covary performs taxonomic classification comparably to alignment-based algorithms.

Workflow: Covary v1.3 (powered by TIPs-VF; De los Santos, 2025) vs. ETE3 3.1.3 (Huerta-Cepas et al., 2016)

Method:

  1. The 16S ribosomal RNA of Staphylococcus taiwanensis strain NTUH-S172 (NR_181843.1) was retrieved from the NCBI Nucleotide database.
  2. Up to 5 base mutations were introduced into the sequence in each of 100 iterations, creating 100 additional mutational variants. A SARS-CoV-2 sequence (NC_045512.2) served as the outgroup control (n = 101).
  3. Sequences were fed into Covary v1.3 via Google Colab and ETE3 3.1.3 via GenomeNet.
  4. Default parameters were used. The phylogenetic tree was constructed in Covary using custom code; ETE3 used FastTree v2.1.8 (Price et al., 2010).
Result: Covary v1.3 generated phylogenetic clustering similar to that of ETE3, separating the outgroup from the variant sequences derived from the Staphylococcus taiwanensis 16S rRNA gene sequence.
Covary v1.3 vs ETE3 classification — Staphylococcus taiwanensis 16S rRNA mutational variants
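
The variant-generation step in the method above can be sketched roughly as follows. This is a minimal illustration, not the code used in the benchmark. It assumes that each variant is derived independently from the reference, that "mutations" means single-base substitutions, and that each variant receives between 1 and 5 of them. The function names `mutate` and `make_variants` are hypothetical.

```python
import random

BASES = "ACGT"

def mutate(seq, max_mutations=5, rng=random):
    """Return a copy of seq with 1..max_mutations random base substitutions.

    Assumption: substitutions only (no insertions or deletions).
    """
    chars = list(seq)
    n_mut = rng.randint(1, min(max_mutations, len(chars)))
    # Sample distinct positions so every mutation changes a different base
    for pos in rng.sample(range(len(chars)), n_mut):
        # Pick a replacement that differs from the current base
        chars[pos] = rng.choice([b for b in BASES if b != chars[pos]])
    return "".join(chars)

def make_variants(reference, n_variants=100, max_mutations=5, seed=0):
    """Generate n_variants independent mutational variants of reference."""
    rng = random.Random(seed)  # fixed seed keeps the variant set reproducible
    return [mutate(reference, max_mutations, rng) for _ in range(n_variants)]
```

Calling `make_variants(reference_sequence)` with the 16S rRNA sequence as a string would return 100 variants, each of the same length as the reference and differing from it at no more than 5 positions.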

Covary matches conventional pipelines in sequence identification.

Workflow: Covary v1.3 (powered by TIPs-VF; De los Santos, 2025) vs. ETE3 3.1.3 (Huerta-Cepas et al., 2016)

Method:

  1. The 16S rRNA sequences of Staphylococcus species were retrieved from NCBI BioProject: txid1279[ORGN] AND (33175[Bioproject] OR 33317[Bioproject]).
  2. A SARS-CoV-2 sequence (PV950678.1) served as the outgroup control. An unlabelled Staphylococcus spp. sequence (NR_036791.1) and an unlabelled SARS-CoV-2 sequence (PV950681.1) served as unknowns (n = 94).
  3. Sequences were fed into Covary v1.3 via Google Colab and ETE3 3.1.3 via GenomeNet.
  4. ETE3 pipeline: MAFFT v6.861b alignment (Katoh et al., 2005), then ML tree via IQ-TREE 1.5.5 + ModelFinder (Nguyen et al., 2015).
Result: Both Covary v1.3 and ETE3 identified and placed the unknown sequences in the correct clades or groups.
Covary v1.3 vs ETE3/IQ-TREE identification — unknown Staphylococcus and SARS-CoV-2 placed in correct clades

Covary outperforms conventional pipelines in providing sequence relationships.

Workflow: Covary v1.3 (powered by TIPs-VF; De los Santos, 2025) vs. ETE3 3.1.3 (Huerta-Cepas et al., 2016)

Method:

  1. TP53 mutational profiles of patients with head and neck cancers (GDC-TCGA HNC) were retrieved via the UCSC Xena Browser.
  2. TP53 sequences of patients with mutations were reconstructed by mapping mutation positions and types onto the wildtype (WT) TP53 sequence. Patients with no variant records were considered WT (n = 220).
  3. Sequences were fed into Covary v1.3 via Google Colab and ETE3 3.1.3 via GenomeNet.
  4. ETE3 pipeline: MAFFT v6.861b alignment, then ML tree via IQ-TREE 1.5.5 + ModelFinder.
Result: Covary v1.3 provided better context for mutational relationships, clustering wildtypes distinctly from the mutational variants, a separation that the ETE3 tree did not show.
Covary v1.3 vs ETE3 relationship — TP53 mutational clustering in TCGA head and neck cancer cohort
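
The reconstruction step (mapping mutation positions and types onto the wildtype sequence) could look something like the sketch below. It is an illustration under stated assumptions, not the pipeline used here: mutations are given as hypothetical `(position, ref, alt)` tuples with 1-based positions, an empty `ref` marks an insertion, an empty `alt` marks a deletion, and mutations for one patient do not overlap. The name `apply_mutations` is hypothetical.

```python
def apply_mutations(wildtype, mutations):
    """Apply (position, ref, alt) edits to a wildtype sequence.

    position is 1-based. ref is the wildtype allele at that position and
    alt is its replacement; "" as ref means insertion, "" as alt means deletion.
    Assumption: the edits for one patient do not overlap.
    """
    seq = wildtype
    # Work from the rightmost edit so earlier coordinates stay valid
    for pos, ref, alt in sorted(mutations, key=lambda m: m[0], reverse=True):
        start = pos - 1
        end = start + len(ref)
        if seq[start:end] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq = seq[:start] + alt + seq[end:]
    return seq

# Patients with no variant records keep the wildtype sequence unchanged.
wildtype = "ACGTACGT"  # toy stand-in for the WT TP53 sequence
patients = {"patient_A": [(2, "C", "T")], "patient_B": []}
sequences = {pid: apply_mutations(wildtype, muts) for pid, muts in patients.items()}
```

Checking the reference allele before each edit is a cheap guard against coordinate mismatches between the mutation table and the wildtype sequence.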

Covary's features provide predictive context not found in conventional phylogenetics.

Workflow: Covary v1.3 (powered by TIPs-VF; De los Santos, 2025)

Method:

  1. TP53 mutational profiles of patients with head and neck cancers (GDC-TCGA HNC) were retrieved via the UCSC Xena Browser.
  2. TP53 sequences of patients with mutations were reconstructed by mapping mutation positions and types onto the wildtype (WT) TP53 sequence. Patients with no variant records were considered WT (n = 220).
  3. Sequences were fed into the Google Colab implementation of Covary v1.3.
  4. Default settings were used, including for heatmap generation.
Result: Covary v1.3 inferred mutational distances and provided predictive genome integrity outcomes based on TP53 mutational markers.
Covary v1.3 prediction — TP53 mutational distance heatmap with genome integrity inference for TCGA HNC cohort

Comparison with the gold standard

Covary occupies a distinct computational niche from established tools such as MEGA, IQ-TREE, RAxML, and FastTree. The table below summarizes key differences to help researchers understand where Covary adds value and where traditional methods remain appropriate.

Criterion | Gold standard (MEGA / IQ-TREE / RAxML) | Covary
Alignment requirement | Required; MSA is a prerequisite | None; alignment-free by design
Substitution model | Explicit evolutionary model (GTR, HKY, WAG, etc.) | No model assumption; distance from embedding space
Scale | Hundreds to low thousands (MSA bottleneck) | Hundreds to thousands; megascale in v3.0.1 beta
Speed | Slow at scale; MSA and model optimization are rate-limiting | Fast; no MSA, and k-mer encoding is lightweight
Translation awareness | Optional (codon models in some tools) | Built-in; codon-bound, relative-proximity encoding
Output | Newick trees, bootstrap values, model parameters | PCA/t-SNE/UMAP plots, heatmaps, dendrograms
Statistical support | Bootstrap, UFBoot, SH-aLRT available | Not currently implemented; on the future roadmap
Best suited for | High-precision species tree inference, publication-ready phylogenies | Rapid exploratory analysis, large-scale screening
Standardization | Widely peer-reviewed across thousands of studies | Experimental; validation in preprint studies is ongoing

How to read this: Covary is not a direct replacement for MEGA, IQ-TREE, or RAxML — it solves a different problem. For high-confidence species trees with statistical support and publication-grade phylogeny reconstruction, alignment-based tools remain the gold standard. Covary's strength is speed and scale at the exploratory stage — rapidly screening large, heterogeneous datasets and generating embedding-based cluster hypotheses that can then be followed up with targeted alignment-based analyses on subsets of interest.
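
To make the alignment-free idea concrete, here is a generic sketch of a k-mer based distance between unaligned sequences. It is not Covary's TIPs-VF encoding, which is not described on this page. It only assumes plain k-mer counts and cosine distance, and the names `kmer_counts`, `cosine_distance`, and `distance_matrix` are illustrative.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seq, k=3):
    """Count overlapping k-mers in seq; no alignment is needed."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_distance(a, b):
    """1 - cosine similarity between two k-mer count profiles."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty profile shares nothing with any other
    return 1.0 - dot / (norm_a * norm_b)

def distance_matrix(seqs, k=3):
    """Pairwise distances, usable as input to clustering or a heatmap."""
    profiles = [kmer_counts(s, k) for s in seqs]
    return [[cosine_distance(p, q) for q in profiles] for p in profiles]
```

Because each sequence is reduced to a count profile independently, the cost grows with the number of pairs rather than with the difficulty of aligning them, which is the trade-off the table describes: fast screening, but no substitution model and no bootstrap-style support.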

Covary is a research-grade tool. All outputs are for exploratory and research purposes only and are not intended for clinical, diagnostic, or regulatory use. See the full disclaimer for scope limitations and known technical constraints.