Getting started with MuTopia
Installation
MuTopia requires Python 3.11 and pins scikit-learn to 1.4.2 because it relies on internal gradient-boosted tree (GBT) APIs for fast training. We recommend uv to manage the environment: it resolves and installs dependencies much faster than pip alone, and its lockfile-based workflow makes it easy to reproduce exact environments across machines.
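If you plan to install into an existing environment, a quick preflight check can catch version conflicts early. The sketch below is not part of MuTopia; it uses only the Python standard library, and the helper name `check_environment` is hypothetical. It assumes that "requires Python 3.11" means the 3.11 series exactly, and that a scikit-learn version other than 1.4.2 would conflict with the pin.

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 11)   # taken from the requirement stated above
PINNED_SKLEARN = "1.4.2"    # taken from the pin stated above


def check_environment(python_version=None, sklearn_version=None):
    """Return a list of problems; an empty list means the environment looks fine.

    Both arguments default to the running interpreter and the installed
    scikit-learn distribution. They are parameters only to make testing easy.
    """
    problems = []

    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    if tuple(python_version) != REQUIRED_PYTHON:
        found = ".".join(str(part) for part in python_version)
        problems.append(f"Python 3.11 expected, found {found}")

    if sklearn_version is None:
        try:
            sklearn_version = metadata.version("scikit-learn")
        except metadata.PackageNotFoundError:
            sklearn_version = None  # not installed yet, so nothing can conflict

    if sklearn_version is not None and sklearn_version != PINNED_SKLEARN:
        problems.append(
            f"scikit-learn {PINNED_SKLEARN} expected, found {sklearn_version}"
        )

    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks compatible"]:
        print(line)
```

A fresh environment created with one of the methods below avoids the conflict entirely, so this check is mainly useful when reusing an environment that already has scikit-learn installed.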
With Docker (zero setup)
The fastest way to try MuTopia is with the pre-built Docker image, which ships
with MuTopia and all the bioinformatics tools it depends on (bedtools,
bcftools, tabix, UCSC bigWigAverageOverBed):
docker pull allenlynch/mutopia:latest
# Mount your data directory and run any CLI command
docker run --rm -v "$PWD":/workspace allenlynch/mutopia:latest \
gtensor --help
For interactive use, drop into a shell inside the container:
docker run --rm -it -v "$PWD":/workspace allenlynch/mutopia:latest bash
With uv (recommended for native installs)
If you don’t have uv yet, install it with the official one-liner:
curl -LsSf https://astral.sh/uv/install.sh | sh
Then create a Python 3.11 virtual environment and install MuTopia from PyPI:
uv venv --python 3.11 .venv
source .venv/bin/activate
uv pip install mutopia
With conda / bioconda
MuTopia is published on bioconda, and the bioconda package
pulls in the bioinformatics tool dependencies (bedtools,
bcftools, tabix, samtools) automatically. Create a fresh
environment to avoid conflicts with the pinned scikit-learn version:
conda create -n mutopia -c conda-forge -c bioconda -y python=3.11 mutopia
conda activate mutopia
Verifying the installation
Check that the three command-line tools are available:
gtensor --help
topo-model --help
mutopia --help
If any of these fail, make sure the virtual environment is active and that its
bin/ directory is on your PATH.
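To check all three commands at once without reading through three help screens, a small script can look them up on PATH. This is a minimal sketch using the standard library's `shutil.which`; the helper name `find_missing` is hypothetical and not part of MuTopia.

```python
import shutil

# The three command-line tools named above.
MUTOPIA_CLIS = ("gtensor", "topo-model", "mutopia")


def find_missing(commands, path=None):
    """Return the commands that cannot be found as executables.

    `path` is an optional search path string (directories joined by
    os.pathsep); when omitted, the PATH environment variable is used.
    """
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


if __name__ == "__main__":
    missing = find_missing(MUTOPIA_CLIS)
    if missing:
        print("not on PATH:", ", ".join(missing))
    else:
        print("all MuTopia command-line tools found")
```

The same function can be pointed at the external tools mentioned earlier, for example `find_missing(("bedtools", "bcftools", "tabix"))`, when installing outside Docker or conda.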
Data
Genomic features: Collect feature tracks in BED, bedGraph, or bigWig format. MuTopia can ingest any combination of these; see the G-Tensor tutorial for details.
Reference genome annotations: MuTopia needs a FASTA file, a chromsizes file, and a blacklist. For hg38 these are included in the tutorial data bundle.
Mutation data: MuTopia accepts VCF and BCF files. One sample per file works best, though multi-sample VCFs are supported via -name.
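For genomes other than hg38 you may need to produce the chromsizes file yourself. One common route is to derive it from a FASTA index (`.fai`, as written by `samtools faidx`), whose first two tab-separated columns are the sequence name and its length. The sketch below assumes MuTopia accepts the usual two-column, tab-separated chromsizes layout; that is an assumption, not something confirmed here. The function name `fai_to_chromsizes` is hypothetical.

```python
from pathlib import Path


def fai_to_chromsizes(fai_path, out_path):
    """Write "<name>\t<length>" lines from the first two columns of a .fai file.

    Returns the (name, length) pairs that were written, in file order.
    """
    rows = []
    with open(fai_path) as fai:
        for line in fai:
            if not line.strip():
                continue  # skip blank lines
            name, length = line.rstrip("\n").split("\t")[:2]
            rows.append((name, int(length)))  # int() rejects a malformed index
    Path(out_path).write_text(
        "".join(f"{name}\t{size}\n" for name, size in rows)
    )
    return rows
```

For example, `fai_to_chromsizes("genome.fa.fai", "genome.chrom.sizes")`, where both file names are placeholders for your own paths.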
Basic workflow
Build G-Tensors from genomic features and mutation VCFs using gtensor compose.
Train topographic models on the G-Tensor using topo-model train.
Analyze trained models interactively with the mutopia.analysis Python API.
Annotate new samples by running mutopia sbs annotate-vcf on any VCF.