scproca.model
- class scproca.model.scProca(adata: AnnData, key_adt: str, key_batch: str, key_valid_adt: str, d_latent: int = 20, distribution_rna: Literal['ZINB', 'NB'] = 'NB', distribution_adt: Literal['MixtureNB', 'NB'] = 'MixtureNB', activation: Literal['relu', 'mish'] = 'mish', norm: Literal['BatchNorm', 'LayerNorm'] = 'LayerNorm', mode: Literal['none', 'cross_attention', 'NN'] = 'cross_attention', dropout: float = 0.2, d_hidden: tuple = (256, 256), pre_to_device: bool = True)
Bases: object
Single-cell model of proteomics with/from transcriptomics using cross-attention.
- Parameters:
adata (AnnData) – AnnData object. adata.X contains the RNA measurements.
key_adt (str) – The key used to access ADT measurements stored in adata.obsm.
key_batch (str) – The key used to access batch annotations stored in adata.obs.
key_valid_adt (str) – The key in adata.obs for the per-cell flag indicating whether the ADT measurements are valid or only placeholders.
d_latent (int, optional (default=20)) – The dimensionality of the latent space.
distribution_rna ({"ZINB", "NB"}, optional (default="NB")) –
The distribution to model RNA data. One of:
- 'NB' - Negative Binomial distribution
- 'ZINB' - Zero-Inflated Negative Binomial distribution
distribution_adt ({"MixtureNB", "NB"}, optional (default="MixtureNB")) –
The distribution to model ADT data. One of:
- 'NB' - Negative Binomial distribution
- 'MixtureNB' - Mixture of two Negative Binomial distributions
activation ({"relu", "mish"}, optional (default="mish")) –
The activation function used in the neural networks. One of:
- 'relu' - Rectified Linear Unit
- 'mish' - Mish activation function
norm ({"BatchNorm", "LayerNorm"}, optional (default="LayerNorm")) –
The type of normalization used in the networks. One of:
- 'BatchNorm' - Batch normalization
- 'LayerNorm' - Layer normalization
mode ({"none", "cross_attention", "NN"}, optional (default="cross_attention")) –
Defines the mode of interaction between RNA and ADT data. One of:
- 'none' - No interaction between RNA and ADT data (independent processing).
- 'cross_attention' - Cross-attention mechanism between RNA and ADT data.
- 'NN' - Nearest Neighbors averaging approach.
dropout (float, optional (default=0.2)) – The dropout rate applied during training.
d_hidden (tuple of int, optional (default=(256, 256))) – A tuple indicating the number of neurons in each hidden layer.
pre_to_device (bool, optional (default=True)) – Whether to move the data to the device (e.g., GPU) beforehand to reduce data transfer overhead. For large datasets, this should be set to False.
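The count distributions named for distribution_rna and distribution_adt can be written down compactly. The following is a minimal sketch using only the standard library, not scProca's own code: it assumes the common mean / inverse-dispersion parameterization (`mu`, `theta`), a zero-inflation probability `pi`, and, for the mixture, a shared `theta` with a mixing weight `weight_low`. All function and parameter names here are hypothetical and chosen for illustration.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Log-probability of count x under a Negative Binomial with
    mean mu > 0 and inverse dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x: int, mu: float, theta: float, pi: float) -> float:
    """Zero-Inflated NB: with probability pi (0 <= pi < 1) the count is a
    structural zero, otherwise it is drawn from the NB above."""
    if x == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)


def mixture_nb_log_pmf(x: int, mu_low: float, mu_high: float,
                       theta: float, weight_low: float) -> float:
    """Mixture of two NB components (for example background and foreground
    signal). Sharing theta between components is an assumption of this sketch."""
    low = weight_low * math.exp(nb_log_pmf(x, mu_low, theta))
    high = (1.0 - weight_low) * math.exp(nb_log_pmf(x, mu_high, theta))
    return math.log(low + high)
```

Compared with plain NB, the ZINB variant puts extra mass on zero counts, and the two-component mixture can represent a bimodal count profile.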
Examples
>>> scproca.settings.seed = seed
>>> scproca.settings.batch_size = batch_size  # default: 512
>>> scproca.settings.device = index_cuda  # None if using 'cpu'
>>> adata = anndata.read_h5ad(path_to_anndata)
>>> batch = adata.obs[key_batch].values.ravel()
>>> valid_adt = np.array([True] * len(adata))
>>> valid_adt[adt_not_valid] = False
>>> adata.obs["valid_adt"] = valid_adt
>>> scproca = scProca(adata=adata, key_adt=key_adt, key_batch=key_batch, key_valid_adt="valid_adt")
>>> scproca.train()
>>> adata.obsm["latent"], adata.obsm["embedding_rna"], adata.obsm["embedding_adt"] = scproca.get_latent_representation()
>>> adata.obsm["protein_generation"] = scproca.generation(anchor_batch=list_str_anchor_batch)
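The example leaves open how `adt_not_valid` is obtained. One plausible case, assumed here for illustration only, is that whole batches were profiled without ADT, so the validity flag can be derived from the batch labels. The helper name `valid_adt_from_batches` and its arguments are hypothetical and not part of scproca.

```python
from typing import Iterable, List, Set


def valid_adt_from_batches(batch_labels: Iterable[str],
                           rna_only_batches: Set[str]) -> List[bool]:
    """Return one flag per cell: False when the cell comes from a batch
    that has no real ADT measurements (placeholders only), True otherwise."""
    return [label not in rna_only_batches for label in batch_labels]


# Three cells, the second one from an RNA-only batch.
flags = valid_adt_from_batches(["batch1", "batch2", "batch1"], {"batch2"})
```

The resulting list could then be stored in adata.obs under the key passed as key_valid_adt, in the same way the example stores its NumPy array.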
- curve_loss(key_loss)
Plots the loss curve for the validation dataset during the training process.
- Parameters:
key_loss (str, optional (default="loss_elbo")) –
The key used to specify which loss to plot. Choices are:
- 'loss_elbo' - ELBO (Evidence Lower Bound) loss
- 'loss_discriminator' - Loss for the discriminators
- generation(anchor_batch: str | List[str] | None, n_shuffle: int | None = 100)
Generates ADT measurements for each cell.
- Parameters:
- Returns:
protein_generation - generated ADT measurements.
- get_latent_representation(n_shuffle: int | None = 100)
Infers the integrated latent representation, RNA-specific embedding, and ADT-specific embedding for each cell.
- Parameters:
n_shuffle (int, optional (default=100)) – The number of repetitions used to estimate the mean representation.
- Returns:
- latent - integrated latent representation
- embedding_rna - RNA-specific embedding representation
- embedding_adt - ADT-specific embedding representation
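The role of n_shuffle can be pictured as a Monte Carlo average: a stochastic inference step is repeated n_shuffle times and the results are averaged per dimension. The sketch below illustrates only that averaging idea. The `draw` callable is a hypothetical stand-in for one stochastic pass, what exactly scProca shuffles or resamples is not stated in this reference, and the `n_shuffle=None` case allowed by the signature is not covered.

```python
import random
from typing import Callable, List


def mean_over_repetitions(draw: Callable[[], List[float]],
                          n_shuffle: int = 100) -> List[float]:
    """Average n_shuffle stochastic draws element-wise."""
    if n_shuffle < 1:
        raise ValueError("n_shuffle must be at least 1")
    total = list(draw())
    for _ in range(n_shuffle - 1):
        total = [t + s for t, s in zip(total, draw())]
    return [t / n_shuffle for t in total]


# Toy stand-in: a 2-dimensional representation observed with Gaussian noise.
rng = random.Random(0)
estimate = mean_over_repetitions(
    lambda: [1.0 + rng.gauss(0.0, 0.1), -2.0 + rng.gauss(0.0, 0.1)],
    n_shuffle=100,
)
```

More repetitions reduce the variance of the estimate at the cost of proportionally more compute.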
- train(batch_size: int | None = None, lambda_a: float = 30.0, adversarial_step=1, epochs=400, lr=0.004, ratio_val: float = 0.1, epochs_warmup: int | None = None, steps_warmup: int | None = None, bool_also_reconstructed_from_embedding: bool = True)
Trains the model using variational inference.
- Parameters:
batch_size (int, optional) – The minibatch size used during training. Can also be specified via scproca.settings.batch_size.
lambda_a (float, optional (default=30.0)) – The coefficient for the adversarial loss.
adversarial_step (int, optional (default=1)) – The number of steps for adversarial network optimization in each training epoch.
epochs (int, optional (default=400)) – The maximum number of training epochs.
lr (float, optional (default=4e-3)) – The learning rate.
ratio_val (float, optional (default=0.1)) – The proportion of the dataset used as the validation set.
epochs_warmup (int, optional (default=None)) – The number of epochs to use for warmup.
steps_warmup (int, optional (default=None)) – The number of steps to use for warmup.
bool_also_reconstructed_from_embedding (bool, optional (default=True)) – Whether to also include a reconstruction loss computed from the embeddings, in addition to the reconstruction loss computed from the latent space.
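Two of the parameters above, ratio_val and epochs_warmup, can be illustrated with a short standard-library sketch. It rests on assumptions that this reference does not confirm: that the validation set is a random subset of cells, and that warmup linearly ramps some loss weight from near 0 to 1 over the warmup epochs. Which term is warmed up, and its schedule, are not documented here. `split_train_val` and `warmup_weight` are hypothetical names.

```python
import random
from typing import List, Optional, Tuple


def split_train_val(n_cells: int, ratio_val: float = 0.1,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split cell indices into (train, validation) lists."""
    if not 0.0 <= ratio_val < 1.0:
        raise ValueError("ratio_val must be in [0, 1)")
    indices = list(range(n_cells))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_cells * ratio_val))
    return indices[n_val:], indices[:n_val]


def warmup_weight(epoch: int, epochs_warmup: Optional[int]) -> float:
    """Assumed linear ramp: weight grows to 1.0 over epochs_warmup epochs.
    With no warmup configured, the weight is 1.0 from the start."""
    if not epochs_warmup:
        return 1.0
    return min(1.0, (epoch + 1) / epochs_warmup)


train_idx, val_idx = split_train_val(1000, ratio_val=0.1)
```

With the default ratio_val=0.1, 1000 cells would give 900 training and 100 validation cells under this sketch.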