Getting started with MuTopia
============================
Installation
------------
MuTopia requires Python 3.11 and pins scikit-learn to 1.4.2 because it relies on
internal gradient-boosted tree (GBT) APIs for fast training.
We recommend uv to manage the environment: it resolves and installs
dependencies significantly faster than pip alone, and its lockfile-based
workflow makes it easy to reproduce exact environments across machines.
**With Docker (zero setup)**
The fastest way to try MuTopia is with the pre-built Docker image, which ships
with MuTopia and all the bioinformatics tools it depends on (``bedtools``,
``bcftools``, ``tabix``, UCSC ``bigWigAverageOverBed``):

.. code-block:: bash

   docker pull allenlynch/mutopia:latest

   # Mount your data directory and run any CLI command
   docker run --rm -v "$PWD":/workspace allenlynch/mutopia:latest \
       gtensor --help

For interactive use, drop into a shell inside the container:

.. code-block:: bash

   docker run --rm -it -v "$PWD":/workspace allenlynch/mutopia:latest bash

**With uv (recommended for native installs)**
If you don't have uv yet, install it with the official one-liner:

.. code-block:: bash

   curl -LsSf https://astral.sh/uv/install.sh | sh

Then create a Python 3.11 virtual environment and install MuTopia from PyPI:

.. code-block:: bash

   uv venv --python 3.11 .venv
   source .venv/bin/activate
   uv pip install mutopia

**With conda / bioconda**
MuTopia is published on bioconda, which pulls in the bioinformatics tool
dependencies (``bedtools``, ``bcftools``, ``tabix``, ``samtools``)
automatically. Create a fresh environment to avoid conflicts with the pinned
scikit-learn version:

.. code-block:: bash

   conda create -n mutopia -c conda-forge -c bioconda -y python=3.11 mutopia
   conda activate mutopia

Verifying the installation
--------------------------
Check that the three command-line tools are available:

.. code-block:: bash

   gtensor --help
   topo-model --help
   mutopia --help

If any of these fail, make sure the virtual environment is active and that its
``bin/`` directory is on your ``PATH``.
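
The checks above can also be scripted so that a single run reports which tool is missing and whether the Python 3.11 requirement and the scikit-learn 1.4.2 pin are met. This is a minimal sketch, not part of MuTopia: ``check_tools`` and ``check_version`` are hypothetical helper names, and it assumes ``python`` on your ``PATH`` is the interpreter of the environment you installed into.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not shipped with MuTopia.

# Report whether each named command is on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Compare an expected version string with the one actually found.
# $1 = label, $2 = expected version, $3 = actual version
check_version() {
    if [ "$2" = "$3" ]; then
        echo "ok: $1 $3"
    else
        echo "mismatch: $1 is '$3', expected '$2'"
        return 1
    fi
}

# The three command-line tools named in this section.
check_tools gtensor topo-model mutopia \
    || echo "Activate the environment and try again."

# Query the active interpreter; fall back to "not found" if a call fails.
py=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null \
    || echo "not found")
sk=$(python -c 'import sklearn; print(sklearn.__version__)' 2>/dev/null \
    || echo "not found")

check_version "Python" "3.11" "$py" || true
check_version "scikit-learn" "1.4.2" "$sk" || true
```

A ``mismatch`` line for scikit-learn usually means another package upgraded it after installation; recreating the environment is the simplest fix.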
Data
----
1. **Genomic features**: Collect feature tracks in BED, bedGraph, or bigWig
   format. MuTopia can ingest any combination of these; see the G-Tensor
   tutorial for details.
2. **Reference genome annotations**: MuTopia needs a FASTA file, a chromsizes
   file, and a blacklist. For hg38 these are included in the tutorial data
   bundle.
3. **Mutation data**: MuTopia accepts VCF and BCF files. One sample per file
   works best, though multi-sample VCFs are supported via ``-name``.
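
To tell whether a given VCF holds one sample or several before building anything, you can count the sample columns in its ``#CHROM`` header line. The sketch below is a standalone helper, not a MuTopia command: ``count_vcf_samples`` is a hypothetical name, and it assumes a text or gzip/bgzip-compressed VCF and a ``gzip`` that passes uncompressed input through unchanged under ``-cdf`` (GNU gzip does). It does not read BCF.

```shell
# Hypothetical helper for illustration; not part of MuTopia.
# Count the sample columns in a VCF by reading its #CHROM header line.
# The first nine columns are fixed (CHROM through FORMAT); every column
# after that is one sample. A sites-only VCF has no sample columns.
count_vcf_samples() {
    gzip -cdf "$1" | awk -F'\t' '
        /^#CHROM/ {
            n = NF - 9
            print (n > 0 ? n : 0)
            exit
        }'
}
```

For example, ``count_vcf_samples tumor.vcf.gz`` (a placeholder file name) prints the number of samples; a value above 1 indicates a multi-sample file. For BCF input, ``bcftools query -l`` lists the sample names instead.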
Basic workflow
--------------
1. **Build G-Tensors** from genomic features and mutation VCFs using ``gtensor compose``.
2. **Train topographic models** on the G-Tensor using ``topo-model train``.
3. **Analyze trained models** interactively with the ``mutopia.analysis`` Python API.
4. **Annotate new samples** by running ``mutopia sbs annotate-vcf`` on any VCF.