Data loading

scvi-tools now relies entirely on the AnnData format. For convenience, we have included data loaders from the AnnData API. Scanpy also has utilities to load data output by 10x’s Cell Ranger software.

data.read_h5ad(filename[, backed, …])

Read .h5ad-formatted hdf5 file.

data.read_csv(filename[, delimiter, …])

Read .csv file.

data.read_loom(filename[, sparse, cleanup, …])

Read .loom-formatted hdf5 file.

data.read_text(filename[, delimiter, …])

Read .txt, .tab, .data (text) file.

Basic preprocessing

For general single-cell preprocessing, we defer to our friends at Scanpy, and specifically their preprocessing module (scanpy.pp).

All scvi-tools models require raw UMI count data. The count data can be safely stored in an AnnData layer as one of the first steps of a Scanpy single-cell workflow:

adata.layers["counts"] = adata.X.copy()

Here we maintain a few package-specific utilities, such as feature selection.

data.poisson_gene_selection(adata[, layer, …])

Rank and select genes based on the enrichment of zero counts in data compared to a Poisson count model.
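To illustrate the idea of zero enrichment relative to a Poisson model, here is a simplified NumPy sketch. It is not the scvi-tools implementation: the function name, the rate model (cell library size times the gene's share of all counts), and the simulated data are all assumptions made for this example.

```python
import numpy as np


def poisson_zero_enrichment(counts):
    """Observed minus Poisson-expected fraction of zeros, per gene.

    counts: (n_cells, n_genes) array of raw UMI counts.
    Hypothetical helper for illustration only.
    """
    counts = np.asarray(counts, dtype=np.float64)
    library_size = counts.sum(axis=1)                  # total counts per cell
    gene_fraction = counts.sum(axis=0) / counts.sum()  # each gene's share of all counts
    # Assumed Poisson rate for each (cell, gene) pair.
    rate = np.outer(library_size, gene_fraction)
    expected_zero = np.exp(-rate).mean(axis=0)         # P(count == 0), averaged over cells
    observed_zero = (counts == 0).mean(axis=0)
    return observed_zero - expected_zero


rng = np.random.default_rng(0)
n_cells = 2000
# Gene 0: plain Poisson expression with mean 2.
poisson_gene = rng.poisson(2.0, size=n_cells)
# Gene 1: same overall mean, but expressed in only about 10% of cells.
bursty_gene = rng.poisson(20.0, size=n_cells) * (rng.random(n_cells) < 0.1)
# Genes 2..21: Poisson background so library sizes are realistic.
background = rng.poisson(5.0, size=(n_cells, 20))
counts = np.column_stack([poisson_gene, bursty_gene, background])

enrichment = poisson_zero_enrichment(counts)
ranking = np.argsort(-enrichment)  # most zero-enriched genes first
print(ranking[:3])
```

In this simulation the bursty gene has far more zeros than a Poisson model with the same mean predicts, so it ranks first, while the plain Poisson gene shows an enrichment close to zero.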

data.organize_cite_seq_10x(adata[, copy])

Organize an AnnData object loaded from 10x for scvi models.

Data preparation

Setting up an AnnData object is a prerequisite for running any scvi-tools model.

data.setup_anndata(adata[, batch_key, …])

Sets up AnnData object for scvi models.

data.transfer_anndata_setup(adata_source, …)

Transfer anndata setup from a source object to a target object.

data.register_tensor_from_anndata(adata, …)

Add another tensor to scvi data registry.


Prints setup anndata.

Built-in data

data.pbmcs_10x_cite_seq([save_path, …])

Filtered PBMCs from 10x Genomics profiled with RNA and protein.

data.spleen_lymph_cite_seq([save_path, …])

Immune cells from the murine spleen and lymph nodes [GayosoSteier20].

data.purified_pbmc_dataset([save_path, …])

Purified PBMC dataset from: “Massively parallel digital transcriptional profiling of single cells”.

data.dataset_10x([dataset_name, filename, …])

Loads a file from 10x website.

data.brainlarge_dataset([save_path, …])

Loads brain-large dataset.

data.pbmc_dataset([save_path, …])

Loads pbmc dataset.

data.cortex([save_path, run_setup_anndata])

Loads cortex dataset.

data.seqfishplus([save_path, tissue_region, …])

seqFISH+ of cortex, subventricular zone and olfactory bulb of mouse brain.

data.seqfish([save_path, run_setup_anndata])

seqFISH dataset.

data.smfish([save_path, …])

Loads osmFISH data of mouse cortex cells from the Linnarsson lab.

data.breast_cancer_dataset([save_path, …])

Loads breast cancer dataset.

data.mouse_ob_dataset([save_path, …])

Loads mouse olfactory bulb (OB) dataset.

data.retina([save_path, run_setup_anndata])

Loads retina dataset.

data.prefrontalcortex_starmap([save_path, …])

Loads a STARmap dataset of mouse pre-frontal cortex (Wang et al., 2018).

data.frontalcortex_dropseq([save_path, …])

Loads cells from the mouse frontal cortex sequenced with Drop-seq technology (Saunders et al., 2018).