scproca.model

class scproca.model.scProca(adata: AnnData, key_adt: str, key_batch: str, key_valid_adt: str, d_latent: int = 20, distribution_rna: Literal['ZINB', 'NB'] = 'NB', distribution_adt: Literal['MixtureNB', 'NB'] = 'MixtureNB', activation: Literal['relu', 'mish'] = 'mish', norm: Literal['BatchNorm', 'LayerNorm'] = 'LayerNorm', mode: Literal['none', 'cross_attention', 'NN'] = 'cross_attention', dropout: float = 0.2, d_hidden: tuple = (256, 256), pre_to_device: bool = True)[source]

Bases: object

Single-cell model of proteomics with/from transcriptomics using cross-attention.

Parameters:

adata (AnnData) – AnnData object. adata.X contains the RNA measurements.
key_adt (str) – The key used to access ADT measurements stored in adata.obsm.
key_batch (str) – The key used to access batch annotations stored in adata.obs.
key_valid_adt (str) – The key in adata.obs for the flag indicating whether each cell's ADT measurements are valid or only placeholders.
d_latent (int, optional (default=20)) – The dimensionality of the latent space.
distribution_rna ({"ZINB", "NB"}, optional (default="NB")) –
The distribution to model RNA data. One of:
- 'NB' - Negative Binomial distribution
- 'ZINB' - Zero-Inflated Negative Binomial distribution
distribution_adt ({"MixtureNB", "NB"}, optional (default="MixtureNB")) –
The distribution to model ADT data. One of:
- 'NB' - Negative Binomial distribution
- 'MixtureNB' - Mixture of two Negative Binomial distributions
activation ({"relu", "mish"}, optional (default="mish")) –
The activation function used in the neural networks. One of:
- 'relu' - Rectified Linear Unit
- 'mish' - Mish activation function
norm ({"BatchNorm", "LayerNorm"}, optional (default="LayerNorm")) –
The type of normalization used in the networks. One of:
- 'BatchNorm' - Batch normalization
- 'LayerNorm' - Layer normalization
mode ({"none", "cross_attention", "NN"}, optional (default="cross_attention")) –
Defines the mode of interaction between RNA and ADT data. One of:
- 'none' - No interaction between RNA and ADT data (independent processing).
- 'cross_attention' - Cross-attention mechanism between RNA and ADT data.
- 'NN' - Nearest Neighbors averaging approach.
dropout (float, optional (default=0.2)) – The dropout rate applied during training.
d_hidden (tuple of int, optional (default=(256, 256))) – A tuple indicating the number of neurons in each hidden layer.
pre_to_device (bool, optional (default=True)) – Whether to move the data to the device (e.g., GPU) beforehand to reduce data transfer overhead. For large datasets, this should be set to False.
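When only some cells have measured ADT counts, the ADT matrix passed via key_adt still needs one row per cell, and key_valid_adt marks which rows are real. The following is a minimal sketch of preparing both arrays. The helper name build_adt_with_placeholders is hypothetical and not part of scProca, and zero-filling the placeholder rows is an assumption; the documentation above does not state which placeholder values are expected.

```python
import numpy as np


def build_adt_with_placeholders(adt_measured, measured_idx, n_cells):
    """Hypothetical helper (not part of scProca).

    Returns an (n_cells, n_proteins) ADT matrix plus a boolean validity flag.
    Rows without measurements are zero-filled, which is an assumption.
    """
    adt_measured = np.asarray(adt_measured)
    measured_idx = np.asarray(measured_idx)
    adt = np.zeros((n_cells, adt_measured.shape[1]), dtype=adt_measured.dtype)
    adt[measured_idx] = adt_measured      # copy real measurements into place
    valid_adt = np.zeros(n_cells, dtype=bool)
    valid_adt[measured_idx] = True        # only measured cells are marked valid
    return adt, valid_adt


# Toy data: 4 cells, 3 proteins, ADT measured only for cells 0 and 3.
adt, valid_adt = build_adt_with_placeholders(
    [[5, 0, 2], [1, 7, 3]], measured_idx=[0, 3], n_cells=4
)
# With a real AnnData object, these would then be stored as:
#   adata.obsm[key_adt] = adt
#   adata.obs["valid_adt"] = valid_adt
```

The flag column name "valid_adt" matches the one used in the Examples section; any other column name works as long as it is passed as key_valid_adt.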

Examples

>>> import anndata
>>> import numpy as np
>>> import scproca
>>> from scproca.model import scProca
>>> scproca.settings.seed = seed
>>> scproca.settings.batch_size = batch_size  # default: 512
>>> scproca.settings.device = index_cuda  # None if using 'cpu'
>>> adata = anndata.read_h5ad(path_to_anndata)
>>> batch = adata.obs[key_batch].values.ravel()
>>> valid_adt = np.ones(adata.n_obs, dtype=bool)
>>> valid_adt[adt_not_valid] = False  # cells whose ADT values are placeholders
>>> adata.obs["valid_adt"] = valid_adt
>>> model = scProca(adata=adata, key_adt=key_adt, key_batch=key_batch, key_valid_adt="valid_adt")
>>> model.train()
>>> adata.obsm["latent"], adata.obsm["embedding_rna"], adata.obsm["embedding_adt"] = model.get_latent_representation()
>>> adata.obsm["protein_generation"] = model.generation(anchor_batch=list_str_anchor_batch)

curve_loss(key_loss)[source]

Plots the loss curve for the validation dataset during the training process.

Parameters:

key_loss (str, optional (default="loss_elbo")) –
The loss to plot. One of:
- 'loss_elbo' - ELBO (Evidence Lower Bound) loss
- 'loss_discriminator' - Loss for the discriminators

generation(anchor_batch: str | List[str] | None, n_shuffle: int | None = 100)[source]

Generates ADT measurements for each cell.

Parameters:

anchor_batch (str or List[str], optional (default=None)) – The batch or list of batches on which the generated measurements are based. If None, each cell's original batch is used.
n_shuffle (int, optional (default=100)) – The number of repetitions used to estimate the mean generated measurements.

Returns:

protein_generation - generated ADT measurements.
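The role of n_shuffle can be pictured as a Monte Carlo average: a stochastic computation is repeated n_shuffle times and the results are averaged element-wise. The sketch below illustrates only that averaging idea with a toy sampler. It is not scProca's implementation, the names monte_carlo_mean and noisy_protein_counts are hypothetical, and what is actually randomized in each repetition is not stated in this documentation.

```python
import random


def monte_carlo_mean(sample_fn, n_shuffle=100, seed=0):
    """Average n_shuffle stochastic draws of sample_fn(rng), element-wise."""
    rng = random.Random(seed)
    total = None
    for _ in range(n_shuffle):
        draw = sample_fn(rng)
        if total is None:
            total = [0.0] * len(draw)
        total = [t + d for t, d in zip(total, draw)]
    return [t / n_shuffle for t in total]


def noisy_protein_counts(rng):
    # Toy stand-in for one stochastic generation pass (assumption):
    # fixed per-protein means plus unit-variance Gaussian noise.
    true_means = [10.0, 50.0, 200.0]
    return [m + rng.gauss(0.0, 1.0) for m in true_means]


estimate = monte_carlo_mean(noisy_protein_counts, n_shuffle=100)
```

With unit-variance noise, the standard error of each averaged value shrinks roughly as 1 / sqrt(n_shuffle), so a larger n_shuffle gives a steadier estimate at proportionally higher cost.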

get_latent_representation(n_shuffle: int | None = 100)[source]

Infers the integrated latent representation, RNA-specific embedding, and ADT-specific embedding for each cell.

Parameters:

n_shuffle (int, optional (default=100)) – The number of repetitions used to estimate the mean representation.

Returns:

- latent - integrated latent representation
- embedding_rna - RNA-specific embedding representation
- embedding_adt - ADT-specific embedding representation

train(batch_size: int | None = None, lambda_a: float = 30.0, adversarial_step=1, epochs=400, lr=0.004, ratio_val: float = 0.1, epochs_warmup: int | None = None, steps_warmup: int | None = None, bool_also_reconstructed_from_embedding: bool = True)[source]

Trains the model using variational inference.

Parameters:

batch_size (int, optional) – The minibatch size used during training. Can also be specified via scproca.settings.batch_size.
lambda_a (float, optional (default=30.0)) – The coefficient for the adversarial loss.
adversarial_step (int, optional (default=1)) – The number of steps for adversarial network optimization in each training epoch.
epochs (int, optional (default=400)) – The maximum number of training epochs.
lr (float, optional (default=4e-3)) – The learning rate.
ratio_val (float, optional (default=0.1)) – The proportion of the dataset used as the validation set.
epochs_warmup (int, optional (default=None)) – The number of epochs to use for warmup.
steps_warmup (int, optional (default=None)) – The number of steps to use for warmup.
bool_also_reconstructed_from_embedding (bool, optional (default=True)) – Whether to also train with a reconstruction loss computed from the embeddings, in addition to the reconstruction loss computed from the latent space.
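To make ratio_val concrete, the sketch below holds out a random fraction of cell indices as a validation set. It is an illustration under stated assumptions rather than scProca's own splitting code: the function split_train_val is hypothetical, and the use of a seeded uniform shuffle with rounding to the nearest integer is assumed.

```python
import random


def split_train_val(n_cells, ratio_val=0.1, seed=0):
    """Hypothetical helper: randomly hold out ratio_val of the cells."""
    rng = random.Random(seed)
    indices = list(range(n_cells))
    rng.shuffle(indices)                      # random order of cell indices
    n_val = int(round(n_cells * ratio_val))   # size of the validation set
    val_idx = sorted(indices[:n_val])
    train_idx = sorted(indices[n_val:])
    return train_idx, val_idx


# With the default ratio_val=0.1, 1000 cells give 900 training and 100 validation cells.
train_idx, val_idx = split_train_val(1000, ratio_val=0.1)
```

The validation subset is the one whose loss curve_loss plots, according to the description of that method above.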