Getting started with MuTopia

Installation

MuTopia requires Python 3.11 and pins scikit-learn to 1.4.2, because it relies on internal gradient-boosted tree (GBT) APIs for fast training. We recommend uv to manage the environment: it resolves and installs dependencies significantly faster than pip alone, and its lockfile-based workflow makes it easy to reproduce exact environments across machines.

With Docker (zero setup)

The fastest way to try MuTopia is with the pre-built Docker image, which ships with MuTopia and all the bioinformatics tools it depends on (bedtools, bcftools, tabix, UCSC bigWigAverageOverBed):

docker pull allenlynch/mutopia:latest

# Mount your data directory and run any CLI command
docker run --rm -v "$PWD":/workspace allenlynch/mutopia:latest \
    gtensor --help

For interactive use, drop into a shell inside the container:

docker run --rm -it -v "$PWD":/workspace allenlynch/mutopia:latest bash

With uv (recommended for native installs)

If you don’t have uv yet, install it with the official one-liner:

curl -LsSf https://astral.sh/uv/install.sh | sh

Then create a Python 3.11 virtual environment and install MuTopia from PyPI:

uv venv --python 3.11 .venv
source .venv/bin/activate
uv pip install mutopia

With conda / bioconda

MuTopia is published on bioconda, which pulls in the bioinformatics tool dependencies (bedtools, bcftools, tabix, samtools) automatically. Create a fresh environment to avoid conflicts with the pinned scikit-learn version:

conda create -n mutopia -c conda-forge -c bioconda -y python=3.11 mutopia
conda activate mutopia

Verifying the installation

Check that the three command-line tools are available:

gtensor --help
topo-model --help
mutopia --help

If any of these fail, make sure the environment you installed into (the uv virtual environment or the conda environment) is active and that its bin/ directory is on your PATH.
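
The same check can be scripted, for example as a setup or CI step. The sketch below is a suggestion, not part of MuTopia: check_tools and sklearn_version are hypothetical helper names, and the version lookup assumes that python on your PATH is the interpreter from the MuTopia environment.

```shell
# Report which of the given commands are missing from PATH.
# Prints one line per command; the return status is the number missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Print the installed scikit-learn version so it can be compared
# against the 1.4.2 pin mentioned in the installation notes.
sklearn_version() {
    python -c 'from importlib.metadata import version; print(version("scikit-learn"))'
}

# Usage, inside the activated environment:
#   check_tools gtensor topo-model mutopia || echo "check the environment and PATH"
#   sklearn_version
```

A non-zero status from check_tools means at least one command was not found, so the helper can gate later steps of a script.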

Data

MuTopia works with three kinds of input data:

  1. Genomic features: Collect feature tracks in BED, bedGraph, or bigWig format. MuTopia can ingest any combination of these; see the G-Tensor tutorial for details.

  2. Reference genome annotations: MuTopia needs a reference FASTA file, a chromsizes (chromosome sizes) file, and a blacklist. For hg38, these are included in the tutorial data bundle.

  3. Mutation data: MuTopia accepts VCF and BCF files. Inputs with one sample per file work best, though multi-sample VCFs are supported via -name.
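
If your calls arrive as a single multi-sample VCF and you prefer the one-sample-per-file layout, bcftools can do the split. This is a minimal sketch that is independent of MuTopia itself: split_vcf_by_sample is a hypothetical helper name, the file names in the usage line are placeholders, and only standard bcftools subcommands (query -l, view -s, index -t) are used.

```shell
# Split a multi-sample VCF/BCF into one bgzipped, indexed VCF per sample.
# Arguments: input file, output directory.
split_vcf_by_sample() {
    local vcf="$1" outdir="$2" sample
    mkdir -p "$outdir" || return 1
    # `bcftools query -l` prints one sample name per line.
    bcftools query -l "$vcf" | while IFS= read -r sample; do
        # -s keeps only this sample's column; -Oz writes bgzip-compressed VCF.
        bcftools view -s "$sample" -Oz -o "$outdir/$sample.vcf.gz" "$vcf" || return 1
        # Build a tabix (.tbi) index next to each output file.
        bcftools index -t "$outdir/$sample.vcf.gz" || return 1
    done
}

# Usage (placeholder paths):
#   split_vcf_by_sample cohort.vcf.gz per_sample/
```

Subsetting with -s keeps every record, including sites where the selected sample carries no alternate allele, so you may want to filter those out before further processing.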

Basic workflow

  1. Build G-Tensors from genomic features and mutation VCFs using gtensor compose.

  2. Train topographic models on the G-Tensor using topo-model train.

  3. Analyze trained models interactively with the mutopia.analysis Python API.

  4. Annotate new samples by running mutopia sbs annotate-vcf on any VCF.