SBS Modality
This page documents the Single Base Substitution (SBS) modality configuration, which controls coordinate systems, signatures, and ingestion utilities for SBS data.
- class mutopia.modalities.sbs.sbs.SBSMode
Bases: ModeConfig
SBS (single base substitution) modality configuration.
This ModeConfig defines the coordinate system, palettes, and data ingestion utilities used for SBS analyses. It provides helpers to:
  - build coordinate labels used across datasets (coords)
  - fetch the modality-specific TopographyModel class (TopographyModel)
  - compute reference context frequencies from genome sequence and regions (get_context_frequencies)
  - ingest observations from VCF files into the expected sparse xarray layout (ingest_observations / ingest_uncollaposed)
- MODE_ID
Stable modality identifier ("sbs").
- Type:
str
- MUTOPIA_TO_COSMIC_IDX
Index array for aligning Mutopia context ordering to COSMIC ordering.
- Type:
np.ndarray
- PALETTE
RGB colors for the 96 SBS categories (in Mutopia order).
- Type:
list[tuple[float, float, float]]
- X_LABELS
Mutation class labels (e.g., "C>A") used for display.
- Type:
list[str]
- DATABASE
Path inside the package data for default SBS signature definitions.
- Type:
str
- DATABASE = 'sbs/musical_sbs.json'
- MODE_ID = 'sbs'
- MUTOPIA_TO_COSMIC_IDX = array([ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 50, 53, 56, 59, 62, 65, 68, 71, 74, 77, 80, 83, 86, 89, 92, 95, 49, 52, 55, 58, 61, 64, 67, 70, 73, 76, 79, 82, 85, 88, 91, 94])
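The index above is a permutation of the 96 category positions. The following is a minimal sketch of how such an index could be applied with NumPy. It rebuilds the same values from strided ranges, uses placeholder data, and assumes the forward direction is `cosmic = mutopia[idx]`, which this page does not state explicitly.

```python
import numpy as np

# Rebuild the 96-entry index listed above from strided ranges.
idx = np.concatenate([
    np.arange(0, 48, 3), np.arange(1, 48, 3), np.arange(2, 48, 3),
    np.arange(48, 96, 3), np.arange(50, 96, 3), np.arange(49, 96, 3),
])

# A permutation visits every position exactly once.
assert np.array_equal(np.sort(idx), np.arange(96))

# Hypothetical 96-value spectrum in Mutopia order (placeholder data).
mutopia_spectrum = np.arange(96, dtype=float)

# Assumed direction: fancy indexing reorders Mutopia -> COSMIC.
cosmic_spectrum = mutopia_spectrum[idx]

# argsort of a permutation yields its inverse, which maps back again.
inverse_idx = np.argsort(idx)
restored = cosmic_spectrum[inverse_idx]
```

If the library applies the index in the opposite direction, swapping `idx` and `inverse_idx` gives the corresponding reordering.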
- PALETTE = six colors, each repeated 16 times consecutively (96 entries in total), in this order: (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.0, 0.0, 0.0), (0.8196078431372549, 0.4, 0.2901960784313726), (0.78, 0.78, 0.78), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.89, 0.67, 0.72)
- property TopographyModel
Return the modality-specific TopographyModel class.
Notes
Imported lazily to avoid import cycles at module import time.
- X_LABELS = ['C>A', 'C>G', 'C>T', 'T>A', 'T>C', 'T>G']
- property available_components
List available reference components in DATABASE.
- Returns:
Names of components available for load_components.
- Return type:
list[str]
- property coords
Coordinate names and labels used by this modality.
- Returns:
A mapping from coordinate key to a pair of (dimension name, labels).
- Return type:
Mapping[str, tuple[str, list[str]]]
- property dims
Tuple of dimension keys in the canonical order.
- Returns:
Dimension keys corresponding to coords order.
- Return type:
tuple[str, …]
- classmethod get_context_frequencies(*, regions_file, fasta_file, **kw)
Compute trinucleotide context frequencies for each region.
- Parameters:
regions_file (str) – BED12 file containing segmented regions of interest.
fasta_file (str) – Reference FASTA file path.
- Returns:
Array with dims (configuration, context, locus) giving normalized counts for each trinucleotide context per region, with strand pairing applied via the two configurations.
- Return type:
xarray.DataArray
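To illustrate the idea of per-region trinucleotide frequencies with two strand configurations, here is a small pure-Python sketch. It is not the library's implementation: the `"+"`/`"-"` configuration labels, the folding of purine-centered contexts onto their reverse complement, and normalizing to a sum of 1 within one sequence are all assumptions made for the example.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string made of A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def context_frequencies(sequence: str) -> dict:
    """Normalized trinucleotide counts keyed by (configuration, context).

    Pyrimidine-centered contexts are counted under "+"; purine-centered
    contexts are reverse-complemented and counted under "-" (assumed labels).
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - 2):
        tri = sequence[i:i + 3]
        if any(base not in "ACGT" for base in tri):
            continue  # skip windows containing N or other ambiguity codes
        if tri[1] in "CT":
            counts[("+", tri)] += 1
        else:
            counts[("-", reverse_complement(tri))] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


# Hypothetical region sequence.
freqs = context_frequencies("ACGTTCA")
```

Running the same function over the sequence of each region would give one column per locus in a (configuration, context, locus) layout.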
- ingest_observations(vcf_file, chr_prefix='', pass_only=True, weight_col=None, mutation_rate_file=None, sample_weight=None, sample_name=None, skip_sort=False, cluster=True, *, locus_dim, locus_coords, regions_file, fasta_file, **kw)
Ingest a VCF into a sparse SBS observation tensor.
- Parameters:
vcf_file (str) – Path to VCF file with somatic variants.
chr_prefix (str, optional) – Chromosome prefix to add/remove for matching, by default "" (empty string).
pass_only (bool, optional) – Keep only PASS variants, by default True.
weight_col (str, optional) – Optional INFO/FORMAT field to use as a weight.
mutation_rate_file (str, optional) – Optional per-locus mutation rate file for weighting.
sample_weight (float, optional) – Global sample weight to apply.
sample_name (str, optional) – Override sample name.
skip_sort (bool, optional) – Assume VCF is already sorted, by default False.
cluster (bool, optional) – Flag to mark clustered/unclustered dimension, by default True.
locus_dim (int) – Total number of loci across regions.
locus_coords (Sequence[int]) – Indices mapping variants to locus positions.
regions_file (str) – BED12 regions used to aggregate variants.
fasta_file (str) – Reference genome FASTA file.
- Returns:
Sparse COO-backed array with dims (configuration, context, locus).
- Return type:
xarray.DataArray
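As a rough illustration of what ingestion has to do for each variant, the sketch below classifies a substitution by strand configuration and pyrimidine-centered context, then accumulates counts keyed by coordinate, which is the same idea as a COO sparse layout (only nonzero coordinates are stored). The function name, the `"+"`/`"-"` labels, and the variant tuples are hypothetical; the sketch keeps the mutation label as a separate key component, as in the four-dimension layout named under validate_observations.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_sbs(ref_context: str, alt: str):
    """Return (configuration, context, mutation) for one substitution.

    ref_context is the reference trinucleotide centered on the variant.
    Purine-centered contexts are folded onto the opposite strand.
    """
    ref_context, alt = ref_context.upper(), alt.upper()
    if ref_context[1] in "CT":
        return "+", ref_context, f"{ref_context[1]}>{alt}"
    folded = ref_context.translate(_COMPLEMENT)[::-1]
    folded_alt = alt.translate(_COMPLEMENT)
    return "-", folded, f"{folded[1]}>{folded_alt}"


# Hypothetical variants: (locus index, reference trinucleotide, alt base).
variants = [(0, "ACG", "T"), (0, "AGT", "A"), (2, "ACG", "T")]

# Coordinate -> count, analogous to the (coords, data) pairs of a COO array.
observations = Counter()
for locus, ref_context, alt in variants:
    configuration, context, mutation = classify_sbs(ref_context, alt)
    observations[(configuration, context, mutation, locus)] += 1
```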
- ingest_uncollaposed(vcf_file, *, locus_dim, locus_coords, regions_file, fasta_file, **ingest_kw)
Ingest VCF and return variant IDs alongside the observation tensor.
This variant of ingestion keeps the per-variant identifiers, useful for downstream lookups or joins.
- Parameters:
vcf_file (str) – Path to VCF file with somatic variants.
locus_dim (int) – Total number of loci across regions.
locus_coords (Sequence[int]) – Indices mapping variants to locus positions.
regions_file (str) – BED12 regions used to aggregate variants.
fasta_file (str) – Reference genome FASTA file.
- Returns:
Variant identifiers and the sparse COO-backed observation tensor with dims (configuration, context, locus).
- Return type:
tuple[list[str], xarray.DataArray]
- load_components(*init_components)
Load named reference components from the modality database.
- Parameters:
*init_components (str) – Component names to load (must exist in available_components).
- Returns:
Array of shape (component, context) with component spectra and modality metadata attached via attributes.
- Return type:
xarray.DataArray
- classmethod plot(signature, *select, palette=['#427aa1ff', '#e07a5fff', '#acacacff', '#83c5beff'], sig_names=None, normalize=False, title=None, width=5.25, height=1.25, ax=None, label_xaxis=True, **kwargs)
Plot one or more modality signatures as a linear bar plot.
- Parameters:
signature (xarray.DataArray) – Signature tensor with a trailing context dimension. May have an optional leading dimension to represent multiple signatures.
*select (str) – Optional names to select a subset of signatures from the leading dimension. Matching is exact or by name:prefix before the last colon.
palette (sequence, optional) – Color palette for plotting multiple signatures; defaults to the modality palette when a single signature is plotted.
sig_names (Sequence[str], optional) – Custom names for the selected signatures; must match selection size.
normalize (bool, default False) – If True, normalize each signature to sum to 1 before plotting.
title (str, optional) – Axes title.
width (float, default 5.25) – Figure width passed to the underlying plotting helper.
height (float, default 1.25) – Figure height passed to the underlying plotting helper.
ax (matplotlib.axes.Axes, optional) – Existing axes to draw on; if None, a new figure/axes is created.
label_xaxis (bool, default True) – Whether to show x-axis tick labels.
**kwargs – Additional keyword arguments forwarded to the plotting helper.
- Returns:
The axes containing the rendered plot.
- Return type:
matplotlib.axes.Axes
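The normalize option is described as scaling each signature to sum to 1. A minimal NumPy sketch of that operation, using a made-up two-signature array with only three contexts for brevity, could look like this; it illustrates the arithmetic only and is not taken from the plotting helper.

```python
import numpy as np

# Hypothetical (signature, context) array; real SBS signatures have 96 contexts.
signatures = np.array([
    [2.0, 1.0, 1.0],
    [0.0, 3.0, 1.0],
])

# Divide each row by its own total so every signature sums to 1.
normalized = signatures / signatures.sum(axis=-1, keepdims=True)
```

Summing over the trailing axis with `keepdims=True` keeps the same expression valid for a single one-dimensional signature as well.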
- property sizes
Sizes of each coordinate dimension for this modality.
- Returns:
Mapping from dimension key to number of labels.
- Return type:
dict[str, int]
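The coords, dims, and sizes properties describe the same information at different levels of detail. The sketch below shows how dims and sizes could be derived from a coords-style mapping; the keys and labels are placeholders chosen for the example, not the modality's actual values.

```python
# Hypothetical coords mapping: key -> (dimension name, labels).
coords = {
    "configuration": ("configuration", ["+", "-"]),
    "context": ("context", ["ACA", "ACC", "ACG"]),
}

# dims: the keys in the mapping's own (insertion) order.
dims = tuple(coords)

# sizes: number of labels per key.
sizes = {key: len(labels) for key, (_, labels) in coords.items()}
```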
- classmethod validate_observations(locus_dim, observations)
Validate the observation tensor shape and required metadata.
Ensures dims match (configuration, context, mutation, locus) and that a name attribute is present.
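A check of this kind can be sketched with duck typing, as below. This is an illustration under assumptions rather than the library's code: it assumes the name lives in `attrs["name"]`, that the locus size is compared against locus_dim, and it uses a `SimpleNamespace` stand-in instead of a real xarray.DataArray.

```python
from types import SimpleNamespace

EXPECTED_DIMS = ("configuration", "context", "mutation", "locus")


def check_observations(locus_dim, observations):
    """Raise ValueError if dims, locus size, or the name attribute are off."""
    if tuple(observations.dims) != EXPECTED_DIMS:
        raise ValueError(f"expected dims {EXPECTED_DIMS}, got {observations.dims}")
    if observations.sizes["locus"] != locus_dim:
        raise ValueError("locus dimension does not match locus_dim")
    if "name" not in observations.attrs:
        raise ValueError("observations must carry a 'name' attribute")


# Stand-in object exposing the same attributes the check reads.
sample = SimpleNamespace(
    dims=EXPECTED_DIMS,
    sizes={"configuration": 2, "context": 32, "mutation": 3, "locus": 10},
    attrs={"name": "sample_1"},
)
check_observations(10, sample)  # passes silently
```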
- classmethod validate_signatures(signature, required_dims=())
Validate signature tensor structure for plotting or analysis.
- Parameters:
signature (xarray.DataArray) – Signature tensor.
required_dims (Sequence[str]) – Required dimension names.