SBS Modality

This page documents the Single Base Substitution (SBS) modality configuration, which controls coordinate systems, signatures, and ingestion utilities for SBS data.

class mutopia.modalities.sbs.sbs.SBSMode[source]

Bases: ModeConfig

SBS (single base substitution) modality configuration.

This ModeConfig defines the coordinate system, palettes, and data ingestion utilities used for SBS analyses. It provides helpers to:

  • build coordinate labels used across datasets (coords)

  • fetch the modality-specific TopographyModel class (TopographyModel)

  • compute reference context frequencies from genome sequence and regions (get_context_frequencies)

  • ingest observations from VCF files into the expected sparse xarray layout (ingest_observations / ingest_uncollaposed)

MODE_ID

Stable modality identifier ("sbs").

Type:

str

MUTOPIA_TO_COSMIC_IDX

Mapping index to align Mutopia context ordering to COSMIC ordering.

Type:

np.ndarray

PALETTE

RGB colors for the 96 SBS categories (in Mutopia order).

Type:

list[tuple[float, float, float]]

X_LABELS

Mutation class labels (e.g., "C>A") used for display.

Type:

list[str]

DATABASE

Path inside the package data for default SBS signature definitions.

Type:

str

DATABASE = 'sbs/musical_sbs.json'
MODE_ID = 'sbs'
MUTOPIA_TO_COSMIC_IDX = array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45,
        1,  4,  7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46,
        2,  5,  8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47,
       48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93,
       50, 53, 56, 59, 62, 65, 68, 71, 74, 77, 80, 83, 86, 89, 92, 95,
       49, 52, 55, 58, 61, 64, 67, 70, 73, 76, 79, 82, 85, 88, 91, 94])
PALETTE = [(0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.25882352941176473, 0.47843137254901963, 0.6313725490196078), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.8196078431372549, 0.4, 
0.2901960784313726), (0.8196078431372549, 0.4, 0.2901960784313726), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.78, 0.78, 0.78), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.39215686274509803, 0.7019607843137254, 0.6666666666666666), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72), (0.89, 0.67, 0.72)]
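
MUTOPIA_TO_COSMIC_IDX is a permutation of the integers 0 to 95, so it can be applied to any 96-long vector and inverted. The sketch below illustrates this in plain Python. It assumes the index is used as a gather (element i of the COSMIC-ordered result is taken from position idx[i] of the Mutopia-ordered input); this page does not state the direction, so treat that as an assumption. The helper names reorder and invert are hypothetical.

```python
# Values copied from MUTOPIA_TO_COSMIC_IDX as listed on this page.
MUTOPIA_TO_COSMIC_IDX = [
    0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45,
    1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46,
    2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47,
    48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93,
    50, 53, 56, 59, 62, 65, 68, 71, 74, 77, 80, 83, 86, 89, 92, 95,
    49, 52, 55, 58, 61, 64, 67, 70, 73, 76, 79, 82, 85, 88, 91, 94,
]


def reorder(values, index):
    """Gather: element i of the result is values[index[i]]."""
    return [values[i] for i in index]


def invert(index):
    """Return the inverse permutation of `index`."""
    inverse = [0] * len(index)
    for position, source in enumerate(index):
        inverse[source] = position
    return inverse


# Hypothetical 96-long spectrum in Mutopia order (0..95, for illustration only).
mutopia_spectrum = list(range(96))

# ASSUMPTION: the index is applied as a gather; the library may use the
# opposite direction, in which case the roles of the two calls swap.
cosmic_spectrum = reorder(mutopia_spectrum, MUTOPIA_TO_COSMIC_IDX)
roundtrip = reorder(cosmic_spectrum, invert(MUTOPIA_TO_COSMIC_IDX))
```

With NumPy arrays the same two operations are values[idx] and np.argsort(idx).
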
property TopographyModel

Return the modality-specific TopographyModel class.

Notes

Imported lazily to avoid import cycles at module import time.

X_LABELS = ['C>A', 'C>G', 'C>T', 'T>A', 'T>C', 'T>G']
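
PALETTE holds 96 colors in six runs of 16 identical values, and X_LABELS holds six mutation classes. The sketch below maps a category index to its class label under the assumption that the k-th run of colors corresponds to the k-th label; the page implies this but does not state it. The function name class_label_for is hypothetical.

```python
X_LABELS = ['C>A', 'C>G', 'C>T', 'T>A', 'T>C', 'T>G']
N_CATEGORIES = 96
RUN_LENGTH = N_CATEGORIES // len(X_LABELS)  # 16 categories per mutation class


def class_label_for(category_index):
    """Return the mutation class for a category index in Mutopia order.

    ASSUMPTION: categories are grouped in consecutive runs of 16 per class,
    matching the runs of identical colors in PALETTE.
    """
    if not 0 <= category_index < N_CATEGORIES:
        raise IndexError(f"category index out of range: {category_index}")
    return X_LABELS[category_index // RUN_LENGTH]


# One label per category, e.g. for annotating the x-axis of a spectrum plot.
category_labels = [class_label_for(i) for i in range(N_CATEGORIES)]
```
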
property available_components

List available reference components in DATABASE.

Returns:

Names of components available for load_components.

Return type:

list[str]

property coords

Coordinate names and labels used by this modality.

Returns:

A mapping from coordinate key to a pair of (dimension name, labels).

Return type:

Mapping[str, tuple[str, list[str]]]

property dims

Tuple of dimension keys in the canonical order.

Returns:

Dimension keys corresponding to coords order.

Return type:

tuple[str, ...]

classmethod get_context_frequencies(*, regions_file, fasta_file, **kw)[source]

Compute trinucleotide context frequencies for each region.

Parameters:
  • regions_file (str) – BED12 file containing segmented regions of interest.

  • fasta_file (str) – Reference FASTA file path.

Returns:

Array with dims (configuration, context, locus) giving normalized counts for each trinucleotide context per region, with strand pairing applied via the two configurations.

Return type:

xarray.DataArray
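
To make "trinucleotide context frequencies" concrete, the following is a minimal pure-Python sketch of the idea for a single sequence. It is not the library's implementation: it skips BED12 and FASTA parsing, and it folds both strands into one table of pyrimidine-centered contexts, whereas the method above reports strands along the configuration dimension. The names context_frequencies and reverse_complement are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def context_frequencies(sequence):
    """Normalized counts of pyrimidine-centered trinucleotides in `sequence`."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - 2):
        tri = sequence[i:i + 3]
        if set(tri) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        if tri[1] in "AG":
            # Fold purine-centered contexts onto the opposite strand.
            tri = reverse_complement(tri)
        counts[tri] += 1
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


# Toy region: five valid windows, one skipped because of the trailing N.
freqs = context_frequencies("ACGTTCAN")
```
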

ingest_observations(vcf_file, chr_prefix='', pass_only=True, weight_col=None, mutation_rate_file=None, sample_weight=None, sample_name=None, skip_sort=False, cluster=True, *, locus_dim, locus_coords, regions_file, fasta_file, **kw)[source]

Ingest a VCF into a sparse SBS observation tensor.

Parameters:
  • vcf_file (str) – Path to VCF file with somatic variants.

  • chr_prefix (str, optional) – Chromosome prefix to add/remove for matching, by default "".

  • pass_only (bool, optional) – Keep only PASS variants, by default True.

  • weight_col (str, optional) – Optional INFO/FORMAT field to use as a weight.

  • mutation_rate_file (str, optional) – Optional per-locus mutation rate file for weighting.

  • sample_weight (float, optional) – Global sample weight to apply.

  • sample_name (str, optional) – Override sample name.

  • skip_sort (bool, optional) – Assume VCF is already sorted, by default False.

  • cluster (bool, optional) – Flag to mark clustered/unclustered dimension, by default True.

  • locus_dim (int) – Total number of loci across regions.

  • locus_coords (Sequence[int]) – Indices mapping variants to locus positions.

  • regions_file (str) – BED12 regions used to aggregate variants.

  • fasta_file (str) – Reference genome FASTA file.

Returns:

Sparse COO-backed array with dims (configuration, context, locus).

Return type:

xarray.DataArray
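
The sketch below illustrates the kind of record selection that pass_only and chr_prefix suggest, using plain VCF text lines. It is an illustration under assumptions, not the method's actual logic: the helper iter_sbs_records is hypothetical, the prefix is simply prepended to CHROM (the real method may add or strip it), and only single-base REF/ALT pairs are kept.

```python
def iter_sbs_records(vcf_lines, pass_only=True, chr_prefix=""):
    """Yield (chrom, pos, ref, alt) for single base substitutions.

    Expects tab-separated VCF data lines with at least the first seven
    standard columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt, _qual, filt = line.rstrip("\n").split("\t")[:7]
        if pass_only and filt != "PASS":
            continue
        ref, alt = ref.upper(), alt.upper()
        # Keep only A/C/G/T -> A/C/G/T changes of length one; this drops
        # indels, multi-allelic ALT fields such as "A,T", and missing ALT (".").
        if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT":
            continue
        # ASSUMPTION: the prefix is prepended to the chromosome name.
        yield chr_prefix + chrom, int(pos), ref, alt


example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tC\tT\t.\tPASS\t.",
    "1\t200\t.\tG\tA\t.\tLowQual\t.",
    "2\t300\t.\tAT\tA\t.\tPASS\t.",
]

passing = list(iter_sbs_records(example_vcf, chr_prefix="chr"))
unfiltered = list(iter_sbs_records(example_vcf, pass_only=False))
```
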

ingest_uncollaposed(vcf_file, *, locus_dim, locus_coords, regions_file, fasta_file, **ingest_kw)[source]

Ingest VCF and return variant IDs alongside the observation tensor.

This variant of ingestion keeps the per-variant identifiers, useful for downstream lookups or joins.

Parameters:
  • vcf_file (str) – Path to VCF file with somatic variants.

  • locus_dim (int) – Total number of loci across regions.

  • locus_coords (Sequence[int]) – Indices mapping variants to locus positions.

  • regions_file (str) – BED12 regions used to aggregate variants.

  • fasta_file (str) – Reference genome FASTA file.

Returns:

Variant identifiers and the sparse COO-backed observation tensor with dims (configuration, context, locus).

Return type:

tuple[list[str], xarray.DataArray]

load_components(*init_components)

Load named reference components from the modality database.

Parameters:

*init_components (str) – Component names to load (must exist in available_components).

Returns:

Array of shape (component, context) with component spectra and modality metadata attached via attributes.

Return type:

xarray.DataArray

classmethod plot(signature, *select, palette=['#427aa1ff', '#e07a5fff', '#acacacff', '#83c5beff'], sig_names=None, normalize=False, title=None, width=5.25, height=1.25, ax=None, label_xaxis=True, **kwargs)

Plot one or more modality signatures as a linear bar plot.

Parameters:
  • signature (xarray.DataArray) – Signature tensor with a trailing context dimension. May have an optional leading dimension to represent multiple signatures.

  • *select (str) – Optional names to select a subset of signatures from the leading dimension. A signature is selected when its name matches exactly, or when the part of its name before the last colon matches.

  • palette (sequence, optional) – Color palette for plotting multiple signatures; defaults to the modality palette when a single signature is plotted.

  • sig_names (Sequence[str], optional) – Custom names for the selected signatures; must match selection size.

  • normalize (bool, default False) – If True, normalize each signature to sum to 1 before plotting.

  • title (str, optional) – Axes title.

  • width (float, default 5.25) – Figure width passed to the underlying plotting helper.

  • height (float, default 1.25) – Figure height passed to the underlying plotting helper.

  • ax (matplotlib.axes.Axes, optional) – Existing axes to draw on; if None, a new figure/axes is created.

  • label_xaxis (bool, default True) – Whether to show x-axis tick labels.

  • **kwargs – Additional keyword arguments forwarded to the plotting helper.

Returns:

The axes containing the rendered plot.

Return type:

matplotlib.axes.Axes
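
The normalize option rescales each signature so that its values sum to 1. A minimal sketch of that operation on plain lists follows. The function name normalize_signatures is hypothetical, and leaving all-zero rows unchanged is an assumption rather than documented behavior.

```python
def normalize_signatures(signatures):
    """Rescale each row so its values sum to 1.

    `signatures` is a list of rows, one per signature, each a list of
    non-negative numbers over the context dimension.
    """
    normalized = []
    for row in signatures:
        total = sum(row)
        if total:
            normalized.append([value / total for value in row])
        else:
            # ASSUMPTION: an all-zero signature is returned unchanged
            # instead of raising or producing NaN.
            normalized.append(list(row))
    return normalized


example = normalize_signatures([[1, 1, 2], [0, 0, 0]])
```
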

property sizes

Sizes of each coordinate dimension for this modality.

Returns:

Mapping from dimension key to number of labels.

Return type:

dict[str, int]
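
The documented return types suggest how coords, dims, and sizes relate: dims lists the keys of coords in order, and sizes counts the labels under each key. The sketch below shows that relationship on a stand-in mapping. The keys and labels here are illustrative placeholders, not the modality's real coordinates.

```python
# Hypothetical coords mapping with the documented shape:
# coordinate key -> (dimension name, labels).
coords = {
    "configuration": ("configuration", ["config_0", "config_1"]),
    "context": ("context", ["ACA", "ACC", "ACG"]),
}

# dims: the coordinate keys in their canonical (insertion) order.
dims = tuple(coords)

# sizes: number of labels for each coordinate key.
sizes = {key: len(labels) for key, (_dim_name, labels) in coords.items()}
```
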

classmethod validate_observations(locus_dim, observations)

Validate the observation tensor shape and required metadata.

Ensures dims match (configuration, context, mutation, locus) and that a name attribute is present.

classmethod validate_signatures(signature, required_dims=())

Validate signature tensor structure for plotting or analysis.

Parameters:
  • signature (xarray.DataArray) – Signature tensor.

  • required_dims (Sequence[str]) – Required dimension names.