New in 0.8.0 (2020-MM-DD)


Online updates of SCVI, SCANVI, and TOTALVI with the scArches method

It is now possible to iteratively update these models with new samples, without altering the model for the “reference” population. Here we use the scArches method. For usage, please see the tutorial in the user guide.

To enable scArches in our models, we added a few new options. The first is encode_covariates, which is an SCVI option to encode the one-hotted batch covariate. We also allow users to exchange batch norm in the encoder and decoder with layer norm, which can be though of as batch norm but per cell. As the layer norm we use has no parameters, it’s a bit faster than models with batch norm. We don’t find many differences between using batch norm or layer norm in our models, though we have kept defaults the same in this case. To run scArches effectively, batch norm should be exhanged with layer norm.

Empirical initialization of protein background parameters with totalVI

The learned prior parameters for the protein background were randomly initialized. Now, they can be set with the empirical_protein_background_prior option in TOTALVI. This option fits a two-component Gaussian mixture model per cell, separating those proteins that are background for the cell and those that are foreground, and aggregates the learned mean and variance of the smaller component across cells. This computation is done per batch, if the batch_key was registered. We emphasize this is just for the initialization of a learned parameter in the model.

Use observed library size option

Many of our models like SCVI, SCANVI, and TOTALVI learn a latent library size variable. The option use_observed_lib_size may now be passed on model initialization. We have set this as True by default, as we see no regression in performance, and training is a bit faster.

Important changes

  • To facilitate these enhancements, saved TOTALVI models from previous versions will not load properly. This is due to an architecture change of the totalVI encoder, related to latent library size handling.

  • The default latent distribtuion for TOTALVI is now “normal”.

  • Autotune was removed from this release. We could not maintain the code given the new API changes and we will soon have alternative ways to tune hyperparameters.

  • Protein names during setup_anndata() are now stored in adata.uns[“_scvi”][“protein_names”], instead of adata.uns[“scvi_protein_names”].

Bug fixes

  • Fixed an issue where the unlabeled category affected the SCANVI architecture prior distribution. Unfortunately, by fixing this bug, loading previously trained (<v0.8.0) SCANVI models will fail.