scvi-tools now relies entirely on the AnnData format. For convenience, we have included data loaders from the AnnData API. Scanpy also has utilities to load data output by 10x’s Cell Ranger software.
data.read_h5ad(filename[, backed, …])
Read .h5ad-formatted hdf5 file.
data.read_csv(filename[, delimiter, …])
Read .csv file.
data.read_loom(filename[, sparse, cleanup, …])
Read .loom-formatted hdf5 file.
data.read_text(filename[, delimiter, …])
Read .txt, .tab, .data (text) file.
For general single-cell preprocessing, we defer to our friends at Scanpy, and specifically their preprocessing module (scanpy.pp).
All scvi-tools models require raw UMI count data. The count data can be safely stored in an AnnData layer as one of the first steps of a Scanpy single-cell workflow:
adata.layers["counts"] = adata.X.copy()
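Why the .copy() matters: Scanpy preprocessing functions such as scanpy.pp.normalize_total and scanpy.pp.log1p replace the contents of adata.X by default, and a layer that merely shares memory with adata.X would change along with any in-place modification. The minimal sketch below illustrates this with plain NumPy, assuming a toy array standing in for adata.X and a dict standing in for adata.layers; the manual normalization only mimics a preprocessing step and is not Scanpy's implementation.

```python
import numpy as np

# Toy raw UMI counts (cells x genes); stands in for adata.X.
X = np.array([[1.0, 3.0, 0.0],
              [4.0, 0.0, 4.0]])
raw_values = X.copy()  # kept only to check the result below

layers = {}                   # stands in for adata.layers
layers["counts"] = X.copy()   # independent copy of the raw counts
layers["aliased"] = X         # no copy: shares memory with X

# In-place normalization to 10,000 counts per cell followed by log1p,
# mimicking what a preprocessing step might do to adata.X.
X /= X.sum(axis=1, keepdims=True)
X *= 1e4
np.log1p(X, out=X)

print(np.array_equal(layers["counts"], raw_values))   # True: raw counts preserved
print(np.array_equal(layers["aliased"], raw_values))  # False: overwritten with log values
```

Only the copied layer still holds the integer counts that scvi-tools models expect.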
Here we maintain a few package-specific utilities for feature selection and related tasks.
data.poisson_gene_selection(adata[, layer, …])
Rank and select genes based on the enrichment of zero counts in data compared to a Poisson count model.
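The idea behind this ranking can be illustrated with a simplified sketch. Under a Poisson model with rate λ, the probability of a zero count is exp(-λ); genes whose observed fraction of zeros clearly exceeds that expectation rank highest. This is an assumption-laden simplification, not the scvi-tools implementation: the helper names zero_enrichment and select_top_genes are hypothetical, each gene's mean count is used as its rate, and per-cell library size is ignored.

```python
import numpy as np

def zero_enrichment(counts):
    """Observed minus Poisson-expected fraction of zeros, per gene.

    counts: cells x genes array of raw UMI counts. Each gene is modeled as
    Poisson with rate equal to its mean count (library size ignored).
    """
    counts = np.asarray(counts, dtype=float)
    rate = counts.mean(axis=0)                   # per-gene Poisson rate
    expected_zero = np.exp(-rate)                # P(X = 0) under Poisson(rate)
    observed_zero = (counts == 0).mean(axis=0)   # empirical fraction of zeros
    return observed_zero - expected_zero

def select_top_genes(counts, n_top):
    """Indices of the n_top genes with the largest zero enrichment."""
    enrichment = zero_enrichment(counts)
    return np.argsort(-enrichment, kind="stable")[:n_top]

# Gene 0: constant count; gene 1: all counts in one cell; gene 2: sparse, low rate.
counts = np.array([[2, 0, 1],
                   [2, 0, 0],
                   [2, 0, 1],
                   [2, 8, 0]])
print(select_top_genes(counts, 1))  # [1]
```

Genes 0 and 1 have the same mean, but gene 1 concentrates all its counts in one cell, so it has far more zeros (0.75) than the Poisson expectation (about 0.14); that excess is the signal this kind of ranking uses.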
data.organize_cite_seq_10x(adata[, copy])
Organize an AnnData object loaded from 10x for use with scvi-tools models.
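Cell Ranger output for CITE-seq stores RNA and antibody-derived features in one matrix, labeled by feature type ("Gene Expression" vs. "Antibody Capture"); with Scanpy, scanpy.read_10x_h5(filename, gex_only=False) keeps both and records the labels in adata.var["feature_types"]. As a rough illustration of the kind of reorganization involved, the sketch below splits such a matrix into RNA and protein blocks with NumPy. The function split_by_feature_type and the toy feature names are hypothetical, and this is not the scvi-tools implementation.

```python
import numpy as np

def split_by_feature_type(X, feature_types, feature_names):
    """Split a combined cells x features matrix into RNA and protein blocks.

    feature_types holds the 10x label of each column, e.g. "Gene Expression"
    or "Antibody Capture".
    """
    feature_types = np.asarray(feature_types)
    feature_names = np.asarray(feature_names)
    is_rna = feature_types == "Gene Expression"
    is_protein = feature_types == "Antibody Capture"
    rna = X[:, is_rna]
    protein = X[:, is_protein]
    return rna, feature_names[is_rna], protein, feature_names[is_protein]

# Toy combined matrix: 2 cells, 2 genes followed by 2 antibody features.
X = np.array([[5, 0, 12, 3],
              [1, 2, 40, 0]])
types = ["Gene Expression", "Gene Expression", "Antibody Capture", "Antibody Capture"]
names = ["CD3E", "MS4A1", "CD3_TotalSeqB", "CD19_TotalSeqB"]

rna, rna_names, protein, protein_names = split_by_feature_type(X, types, names)
print(rna.shape, protein.shape)   # (2, 2) (2, 2)
print(protein_names.tolist())     # ['CD3_TotalSeqB', 'CD19_TotalSeqB']
```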
Setting up an AnnData object is a prerequisite for running any scvi-tools model.
data.setup_anndata(adata[, batch_key, …])
Sets up an AnnData object for scvi-tools models.
data.transfer_anndata_setup(adata_source, …)
Transfer anndata setup from a source object to a target object.
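One plausible reason to transfer a setup rather than rerun it on the target object: integer encodings of categorical fields, such as batch labels, should match the ones used when the source was set up. The sketch below shows the mismatch and a fix with a hypothetical helper, encode_with_categories; it illustrates the concept only and does not reproduce what data.transfer_anndata_setup stores or checks.

```python
import numpy as np

def encode_with_categories(labels, categories):
    """Map labels to integer codes using a fixed category order."""
    lookup = {c: i for i, c in enumerate(categories)}
    unseen = sorted(set(labels) - set(lookup))
    if unseen:
        raise ValueError(f"labels not present in source categories: {unseen}")
    return np.array([lookup[label] for label in labels], dtype=np.int64)

source_labels = ["batch_a", "batch_b", "batch_c", "batch_a"]
source_categories = sorted(set(source_labels))   # ['batch_a', 'batch_b', 'batch_c']

target_labels = ["batch_c", "batch_a", "batch_c"]

# Encoding the target on its own derives a different category order ...
independent = encode_with_categories(target_labels, sorted(set(target_labels)))
# ... while reusing the source categories keeps codes consistent.
transferred = encode_with_categories(target_labels, source_categories)

print(independent)  # [1 0 1]
print(transferred)  # [2 0 2]
```

Raising on unseen labels, rather than silently assigning new codes, keeps the target from introducing categories the source encoding never contained.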
data.register_tensor_from_anndata(adata, …)
Add another tensor to the scvi-tools data registry.
data.view_anndata_setup(source)
Prints the setup of an AnnData object.
data.pbmcs_10x_cite_seq([save_path, …])
Filtered PBMCs from 10x Genomics profiled with RNA and protein.
data.spleen_lymph_cite_seq([save_path, …])
Immune cells from the murine spleen and lymph nodes [GayosoSteier20].
data.purified_pbmc_dataset([save_path, …])
Purified PBMC dataset from: “Massively parallel digital transcriptional profiling of single cells”.
data.dataset_10x([dataset_name, filename, …])
Loads a file from the 10x Genomics website.
data.brainlarge_dataset([save_path, …])
Loads brain-large dataset.
data.pbmc_dataset([save_path, …])
Loads PBMC dataset.
data.cortex([save_path, run_setup_anndata])
Loads cortex dataset.
data.seqfishplus([save_path, tissue_region, …])
seqFISH+ of cortex, subventricular zone and olfactory bulb of mouse brain.
data.seqfish([save_path, run_setup_anndata])
Loads seqFISH dataset.
data.smfish([save_path, …])
Loads osmFISH data of mouse cortex cells from the Linnarsson lab.
data.breast_cancer_dataset([save_path, …])
Loads breast cancer dataset.
data.mouse_ob_dataset([save_path, …])
Loads mouse olfactory bulb (OB) dataset.
data.retina([save_path, run_setup_anndata])
Loads retina dataset.
data.prefrontalcortex_starmap([save_path, …])
Loads a STARmap dataset of mouse prefrontal cortex (Wang et al., 2018).
data.frontalcortex_dropseq([save_path, …])
Loads the cells from the mouse frontal cortex sequenced with Drop-seq (Saunders et al., 2018).