Experience the Covary analysis pipeline step-by-step (from FASTA preparation to output results) using predefined sequences and parameters based on the official Google Colab notebook. No setup, no runtime, no ML required.
This simulation demonstrates the logical flow of a Covary run (version 2.0) on 16S rRNA sequences from Thermus species. Parts of the workflow may have been upgraded or improved in newer versions. The simulation walks through all 8 steps of the Colab notebook: parameter setting, sequence input, QC, encoding, dimensionality reduction, embedding visualization, deep learning, and results. The outputs are programmatically generated to match real Covary behavior; they are not the product of a live ML run. To run a real analysis, open Covary on Google Colab.
Covary's run behavior is controlled by a set of parameters in Step 1 of the Colab notebook. The simulation uses the following predefined configuration (⚠️ fields that require user attention in the real notebook are highlighted):
In the Colab notebook, you upload your multi-FASTA file in Step 2. The simulation uses 6 representative Thermus 16S rRNA sequences as its reference input.
6 sequences loaded · thermus_16s.fasta · ~1.2 kb total
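For orientation, a multi-FASTA file is a series of records, each a header line starting with `>` followed by one or more sequence lines. The following is a minimal sketch of reading such text into (header, sequence) pairs. `parse_fasta` is a hypothetical helper, not Covary's own loader, and the records shown are short made-up placeholders rather than real Thermus 16S sequences.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse multi-FASTA text lines into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap across several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input: placeholder sequences, not real 16S rRNA data.
example = """>seq_A
ACGTAC
GTTA
>seq_B
TTGACA
"""
records = parse_fasta(example.splitlines())
```

Wrapped sequence lines are joined into a single string per record, so downstream steps see one contiguous sequence for each header.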
Covary automatically performs QC on the input sequences: removing whitespace, filtering sequences with invalid (non-ATCGN) characters, and reporting sequence metrics. Since include_N = "no", entries with ambiguous bases would be excluded.
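The QC rules described above can be illustrated with a short sketch. This is not Covary's code: `qc_sequences` and its `include_n` flag are hypothetical names standing in for the notebook's `include_N` setting, and uppercasing the input is an assumption of the sketch.

```python
from typing import Dict, List, Tuple


def qc_sequences(
    records: List[Tuple[str, str]], include_n: bool = False
) -> Tuple[List[Tuple[str, str]], List[str], Dict[str, int]]:
    """Strip whitespace, drop sequences with disallowed characters, report metrics."""
    allowed = set("ATCGN") if include_n else set("ATCG")
    kept: List[Tuple[str, str]] = []
    dropped: List[str] = []
    for name, seq in records:
        cleaned = "".join(seq.split()).upper()  # uppercasing is an assumption
        if cleaned and set(cleaned) <= allowed:
            kept.append((name, cleaned))
        else:
            dropped.append(name)
    lengths = [len(s) for _, s in kept]
    metrics = {
        "n_input": len(records),
        "n_kept": len(kept),
        "n_dropped": len(dropped),
        "min_len": min(lengths) if lengths else 0,
        "max_len": max(lengths) if lengths else 0,
    }
    return kept, dropped, metrics


# Toy records (made up): one clean, one with N, one with an invalid character.
toy = [("ok", "ACG T\nAC"), ("has_n", "ACGNNT"), ("bad", "ACGXT")]
kept, dropped, metrics = qc_sequences(toy, include_n=False)
kept_with_n, _, _ = qc_sequences(toy, include_n=True)
```

With `include_n=False`, the entry containing N is dropped along with the one containing an invalid character; with `include_n=True`, only the invalid entry is dropped.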
Each sequence is encoded into a numeric vector using Covary's translation-aware, k-mer-based encoding logic. With k=6 (the default), there are 4⁶ = 4,096 possible k-mers. The encoder captures relative proximity, directional alignment, and translation awareness, not just frequency counts.
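As a baseline for comparison, plain k-mer counting can be sketched in a few lines. For a four-letter alphabet there are 4^k distinct k-mers, which fixes the length of the count vector. This sketch covers only frequency counts; it does not reproduce Covary's proximity, directional, or translation-aware features, and the function names are hypothetical.

```python
from itertools import product
from typing import List


def kmer_vocabulary(k: int) -> List[str]:
    """All 4**k k-mers over the DNA alphabet, in a fixed lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_count_vector(seq: str, k: int) -> List[int]:
    """Count overlapping k-mers in seq; the result has one slot per possible k-mer."""
    vocab = kmer_vocabulary(k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:  # windows with non-ACGT symbols are skipped
            counts[pos] += 1
    return counts


vocab_size = len(kmer_vocabulary(6))      # 4**6 = 4096
toy_vector = kmer_count_vector("ACGTAC", 2)  # toy sequence, k=2 for readability
```

In the toy call, the 2-mer `AC` occurs twice, so its slot in `toy_vector` holds 2 and the remaining observed 2-mers (`CG`, `GT`, `TA`) hold 1 each.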
After encoding, Covary computes a pairwise Euclidean distance matrix between all sequence vectors. This matrix drives the clustering and dendrogram construction downstream.
Distance matrix (6 × 6), with rows and columns in the order: T. brevis G05, T. brevis G02, T. sediminis, T. thermophilus, T. aquaticus, T. scotoductus. In the interactive simulation, the values are generated by clicking "Compute Matrix".
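The pairwise Euclidean distance between two vectors x and y is the square root of the summed squared differences of their components. A minimal standard-library sketch of building the full symmetric matrix follows; the vectors are toy values, not Covary encodings, and `pairwise_euclidean` is a hypothetical name.

```python
import math
from typing import List, Sequence


def pairwise_euclidean(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the symmetric matrix of Euclidean distances between all vectors."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])  # requires Python 3.8+
            dist[i][j] = d
            dist[j][i] = d  # distance is symmetric
    return dist


# Toy vectors chosen so the distances are easy to check by hand.
toy_vectors = [[0, 0], [3, 4], [6, 8]]
matrix = pairwise_euclidean(toy_vectors)
```

The diagonal is zero by construction, and only the upper triangle is computed explicitly, since the matrix is symmetric.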
Covary reduces high-dimensional embeddings to 2D using t-SNE, PCA, and UMAP for visualization. Below is the simulated t-SNE scatter plot, in which sequences cluster by species-level similarity. The two T. brevis entries appear in close proximity; T. thermophilus and T. aquaticus form a distinct cluster.
Covary trains a deep learning autoencoder on the embedding representations, refining the distance structure. The refined distances are used to construct hierarchical dendrograms using Ward, Average, Complete, and Single linkage methods.
Dendrogram — Ward linkage (t-SNE)
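To make the linkage criteria concrete, here is a naive agglomerative clustering sketch over a precomputed distance matrix. It covers the single, complete, and average criteria, which are defined directly from pairwise distances; Ward linkage is variance-based and is deliberately left out. This is an illustration only, not Covary's implementation, and `agglomerate` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Merge = Tuple[Tuple[int, ...], Tuple[int, ...], float]


def agglomerate(dist: Sequence[Sequence[float]], method: str = "average") -> List[Merge]:
    """Repeatedly merge the two closest clusters; return the merge history."""
    if method not in ("single", "complete", "average"):
        raise ValueError("method must be 'single', 'complete', or 'average'")
    clusters: List[List[int]] = [[i] for i in range(len(dist))]
    merges: List[Merge] = []

    def cluster_distance(a: List[int], b: List[int]) -> float:
        pair = [dist[i][j] for i in a for j in b]
        if method == "single":
            return min(pair)           # closest pair of members
        if method == "complete":
            return max(pair)           # farthest pair of members
        return sum(pair) / len(pair)   # mean over all member pairs

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((tuple(clusters[x]), tuple(clusters[y]), d))
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return merges


# Toy matrix: four points on a line at positions 0, 1, 5, 7.
toy = [[0, 1, 5, 7], [1, 0, 4, 6], [5, 4, 0, 2], [7, 6, 2, 0]]
history_single = agglomerate(toy, "single")
history_complete = agglomerate(toy, "complete")
history_average = agglomerate(toy, "average")
```

On the toy matrix all three criteria merge the same pairs in the same order, but the height of the final merge differs (4, 7, and 5.5 respectively), which is what changes the shape of the resulting dendrogram.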
Covary outputs are automatically packaged as a ZIP file. In the real notebook, results download automatically or can be retrieved from /content/covary_results. The simulated run produced the following:
Output file manifest — covary_results.zip
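If a results folder needs to be zipped by hand, Python's standard library can do it in one call. The sketch below is an assumption about how one might package a directory, not a description of Covary's own packaging step; `package_results` is a hypothetical helper, and the demonstration uses a temporary directory with a placeholder file rather than real Covary outputs.

```python
import shutil
import tempfile
from pathlib import Path


def package_results(results_dir: Path, archive_stem: Path) -> str:
    """Zip the contents of results_dir; return the path of the created .zip file."""
    return shutil.make_archive(str(archive_stem), "zip", root_dir=str(results_dir))


# Demonstration in a temporary directory with a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "covary_results"
    results.mkdir()
    (results / "example.txt").write_text("placeholder output\n")
    archive_path = package_results(results, Path(tmp) / "covary_results")
    archive_name = Path(archive_path).name
    archive_exists = Path(archive_path).exists()
```

In a Colab session, the same helper could be pointed at `/content/covary_results`, the folder named above.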
This was a simulation based on predefined data. To run a real Covary analysis on your own sequences:
Upload a multi-FASTA file of DNA sequences (A, T, C, G; for RNA, convert U to T)
k-mer translation-aware embeddings via Covary-encoder — no alignment needed
Deep learning autoencoder (assisted by TIPs-VF representation logic) refines embeddings into a latent space
PCA, t-SNE, UMAP scatter plots · distance heatmaps · dendrograms
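The U-to-T conversion mentioned in the input step above can be done before upload with a one-line helper. This is a sketch with a hypothetical function name; uppercasing the sequence is an assumption of the sketch rather than a documented Covary requirement.

```python
def rna_to_dna(seq: str) -> str:
    """Return the DNA-alphabet form of an RNA sequence (U -> T), uppercased."""
    return seq.upper().replace("U", "T")


converted = rna_to_dna("augcuu")  # toy RNA string, not real 16S data
```

Apply it to each sequence string only, not to the FASTA header lines, so that record names containing the letter U are left unchanged.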