Advanced autotune tutorial

DISCLAIMER: Most experiments in this notebook require one or more GPUs to keep their runtime to a matter of hours. DISCLAIMER: To use our new autotune feature in parallel mode, you need to install `MongoDB <https://docs.mongodb.com/manual/installation/>`__ first.

In this notebook, we give an in-depth tutorial on scVI’s new autotune module.

Overall, the new module enables users to perform parallel hyperparameter search for any scVI model and on any number of GPUs/CPUs. Although the search may be performed sequentially using only one GPU/CPU, we will focus on the parallel case. Note that GPUs provide a much faster approach, as they are particularly suitable for back-propagation in neural networks.

Additionally, we provide the code used to generate the results presented in our Hyperoptimization blog post. For an in-depth analysis of the results obtained on three gold-standard scRNA-seq datasets (Cortex, PBMC and BrainLarge), please refer to the above blog post. In the blog post, we also suggest guidelines on how and when to use our auto-tuning feature.

[1]:
import sys

sys.path.append("../../")
sys.path.append("../")

%matplotlib inline
[2]:
import logging
import os
import pickle
import scanpy
import anndata

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from hyperopt import hp

import scvi
from scvi.data import cortex, pbmc_dataset, brainlarge_dataset, annotation_simulation
from scvi.inference import auto_tune_scvi_model
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-28d72a8e5914> in <module>
     11 from hyperopt import hp
     12
---> 13 import scvi
     14 from scvi.data import cortex, pbmc_dataset, brainlarge_dataset, annotation_simulation
     15 from scvi.inference import auto_tune_scvi_model

~/scvi_dev/scvi/__init__.py in <module>
      6
      7 from ._constants import _CONSTANTS
----> 8 from ._settings import settings
      9 from . import dataset, models
     10

~/scvi_dev/scvi/_settings.py in <module>
      2 from typing import Union
      3 from ._compat import Literal
----> 4 from rich.logging import RichHandler
      5 from rich.console import Console
      6

ModuleNotFoundError: No module named 'rich'
[3]:
logger = logging.getLogger("scvi.inference.autotune")
logger.setLevel(logging.WARNING)
[4]:
def allow_notebook_for_test():
    print("Testing the autotune advanced notebook")

test_mode = False


def if_not_test_else(x, y):
    if not test_mode:
        return x
    else:
        return y


save_path = "data/"
n_epochs = if_not_test_else(1000, 1)
n_epochs_brain_large = if_not_test_else(50, 1)
max_evals = if_not_test_else(100, 1)
reserve_timeout = if_not_test_else(180, 5)
fmin_timeout = if_not_test_else(300, 10)

Default usage

For the sake of simplicity, we provide an all-default approach to hyperparameter search for any scVI model. The few lines below present an example of how to perform hyperparameter search for scVI on the Cortex dataset.

Note that, by default, the model used is scVI’s VAE and the trainer is the UnsupervisedTrainer.

Also, the default search space is as follows (a hyperopt sketch of this space is shown after the list):

  • n_latent: [5, 15]

  • n_hidden: {64, 128, 256}

  • n_layers: [1, 5]

  • dropout_rate: {0.1, 0.3, 0.5, 0.7}

  • reconstruction_loss: {“zinb”, “nb”}

  • lr: {0.01, 0.005, 0.001, 0.0005, 0.0001}
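
For reference, the default space above can be written in hyperopt syntax roughly as follows. This is only an illustrative sketch, not the exact definition used internally; the “model_tunable_kwargs”/“train_func_tunable_kwargs” key scheme is the one detailed in the custom-space section below.

[ ]:
from hyperopt import hp

# Approximate hyperopt encoding of the default search space listed above
# (illustrative sketch only, not the exact internal definition).
default_space = {
    "model_tunable_kwargs": {
        "n_latent": 5 + hp.randint("n_latent", 11),  # 5 to 15
        "n_hidden": hp.choice("n_hidden", [64, 128, 256]),
        "n_layers": 1 + hp.randint("n_layers", 5),  # 1 to 5
        "dropout_rate": hp.choice("dropout_rate", [0.1, 0.3, 0.5, 0.7]),
        "reconstruction_loss": hp.choice("reconstruction_loss", ["zinb", "nb"]),
    },
    "train_func_tunable_kwargs": {
        "lr": hp.choice("lr", [0.01, 0.005, 0.001, 0.0005, 0.0001])
    },
}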

On a more practical note, verbosity varies in the following way:

  • logger.setLevel(logging.WARNING) will show a progress bar.

  • logger.setLevel(logging.INFO) will show global logs including the number of jobs done.

  • logger.setLevel(logging.DEBUG) will show detailed logs for each training (e.g., the parameters tested).

This function’s behaviour can be customized; please refer to the rest of this tutorial, as well as its documentation, for information about the different parameters available.

Running the hyperoptimization process.

[3]:
cortex_dataset = scvi.data.cortex(save_path=save_path)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-a15587a23fcc> in <module>
----> 1 cortex_dataset = scvi.data.cortex(save_path=save_path)

NameError: name 'scvi' is not defined
[6]:
best_vae, trials = auto_tune_scvi_model(
    gene_dataset=cortex_dataset,
    parallel=True,
    exp_key="cortex_dataset",
    train_func_specific_kwargs={"n_epochs": n_epochs},
    max_evals=max_evals,
    reserve_timeout=reserve_timeout,
    fmin_timeout=fmin_timeout,
)
latent = best_vae.get_latent_representation()
[2020-07-17 21:16:45,659] INFO - scvi.inference.autotune.all | Starting experiment: cortex_dataset
[2020-07-17 21:16:45,660] DEBUG - scvi.inference.autotune.all | Using default parameter search space.
[2020-07-17 21:16:45,662] DEBUG - scvi.inference.autotune.all | Adding default early stopping behaviour.
[2020-07-17 21:16:45,663] INFO - scvi.inference.autotune.all | Fixed parameters:
model:
{}
trainer:
{'early_stopping_kwargs': {'early_stopping_metric': 'elbo', 'save_best_state_metric': 'elbo', 'patience': 50, 'threshold': 0, 'reduce_lr_on_plateau': True, 'lr_patience': 25, 'lr_factor': 0.2}, 'metrics_to_monitor': ['elbo']}
train method:
{'n_epochs': 1}
[2020-07-17 21:16:45,663] INFO - scvi.inference.autotune.all | Starting parallel hyperoptimization
[2020-07-17 21:16:45,671] DEBUG - scvi.inference.autotune.all | Starting MongoDb process, logs redirected to ./mongo/mongo_logfile.txt.
[2020-07-17 21:16:50,697] DEBUG - scvi.inference.autotune.all | Starting minimization procedure
[2020-07-17 21:16:50,702] DEBUG - scvi.inference.autotune.all | Starting FminProcess.
[2020-07-17 21:16:50,702] DEBUG - scvi.inference.autotune.all | Starting worker launcher
[2020-07-17 21:16:50,705] DEBUG - scvi.inference.autotune.all | gpu_ids is None, defaulting to all 0 GPUs found by torch.
[2020-07-17 21:16:50,708] DEBUG - scvi.inference.autotune.all | No GPUs found and n_cpu_wokers is None, defaulting to n_cpu_workers = 7 (os.cpu_count() - 1)
  0%|          | 0/1 [00:00<?, ?it/s]
[2020-07-17 21:16:50,712] DEBUG - scvi.inference.autotune.all | Listener listening...
[2020-07-17 21:16:50,712] INFO - scvi.inference.autotune.all | Starting 1 worker.s for each of the 0 gpu.s set for use/found.
[2020-07-17 21:16:50,714] INFO - scvi.inference.autotune.all | Starting 7 cpu worker.s
[2020-07-17 21:16:50,723] DEBUG - scvi.inference.autotune.all | No timer, waiting for fmin...
[2020-07-17 21:17:05,506] DEBUG - scvi.inference.autotune.all | All workers have died, check stdout/stderr for error tracebacks.
[2020-07-17 21:17:05,507] DEBUG - scvi.inference.autotune.all | Worker watchdog finished, terminating workers and stopping listener.

[2020-07-17 21:17:10,719] DEBUG - scvi.inference.autotune.all | multiple_hosts set to false, Fmin has 10 seconds to finish
[2020-07-17 21:17:20,725] ERROR - scvi.inference.autotune.all | Queue still empty 10 seconds after all workers have died.
Terminating minimization process.
[2020-07-17 21:17:20,726] ERROR - scvi.inference.autotune.all | Caught ('Queue still empty 10 seconds after all workers have died. Check that you have used a new exp_key or allowed a higher max_evals',) in auto_tune_scvi_model, starting cleanup.
Traceback (most recent call last):
  File "/Users/galen/scVI/scvi/inference/autotune.py", line 725, in _auto_tune_parallel
    trials = queue.get(timeout=fmin_timeout)
  File "/Users/galen/anaconda3/lib/python3.7/multiprocessing/queues.py", line 105, in get
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/galen/scVI/scvi/inference/autotune.py", line 146, in decorated
    return func(*args, **kwargs)
  File "/Users/galen/scVI/scvi/inference/autotune.py", line 470, in auto_tune_scvi_model
    multiple_hosts=multiple_hosts,
  File "/Users/galen/scVI/scvi/inference/autotune.py", line 735, in _auto_tune_parallel
    "a higher max_evals".format(fmin_timeout=fmin_timeout)
scvi.inference.autotune.FminTimeoutError: Queue still empty 10 seconds after all workers have died. Check that you have used a new exp_key or allowed a higher max_evals
[2020-07-17 21:17:20,730] INFO - scvi.inference.autotune.all | Cleaning up
[2020-07-17 21:17:20,731] DEBUG - scvi.inference.autotune.all | Cleaning up: closing files.
[2020-07-17 21:17:20,732] DEBUG - scvi.inference.autotune.all | Cleaning up: closing queues.
[2020-07-17 21:17:20,732] DEBUG - scvi.inference.autotune.all | Cleaning up: setting cleanup_event and joining threads.
[2020-07-17 21:17:20,733] DEBUG - scvi.inference.autotune.all | Thread Progress Listener already done.
[2020-07-17 21:17:20,734] DEBUG - scvi.inference.autotune.all | Thread Worker Launcher already done.
[2020-07-17 21:17:20,735] DEBUG - scvi.inference.autotune.all | Closing Thread Fmin Launcher.
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/Users/galen/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/Users/galen/anaconda3/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/galen/anaconda3/lib/python3.7/logging/handlers.py", line 1475, in _monitor
    record = self.dequeue(True)
  File "/Users/galen/anaconda3/lib/python3.7/logging/handlers.py", line 1424, in dequeue
    return self.queue.get(block)
  File "/Users/galen/anaconda3/lib/python3.7/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
  File "/Users/galen/anaconda3/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/Users/galen/anaconda3/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/Users/galen/anaconda3/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

[2020-07-17 21:17:30,737] DEBUG - scvi.inference.autotune.all | fmin finished.
[2020-07-17 21:17:30,739] DEBUG - scvi.inference.autotune.all | Cleaning up: terminating processes.
[2020-07-17 21:17:30,741] DEBUG - scvi.inference.autotune.all | Process Worker CPU 6 already done.
[2020-07-17 21:17:30,743] DEBUG - scvi.inference.autotune.all | Process Worker CPU 5 already done.
[2020-07-17 21:17:30,743] DEBUG - scvi.inference.autotune.all | Process Worker CPU 4 already done.
[2020-07-17 21:17:30,745] DEBUG - scvi.inference.autotune.all | Process Worker CPU 3 already done.
[2020-07-17 21:17:30,746] DEBUG - scvi.inference.autotune.all | Process Worker CPU 2 already done.
[2020-07-17 21:17:30,747] DEBUG - scvi.inference.autotune.all | Process Worker CPU 1 already done.
[2020-07-17 21:17:30,747] DEBUG - scvi.inference.autotune.all | Process Worker CPU 0 already done.
[2020-07-17 21:17:30,748] DEBUG - scvi.inference.autotune.all | Terminating Process Fmin.
[2020-07-17 21:17:30,749] DEBUG - scvi.inference.autotune.all | Terminating mongod process.
[2020-07-17 21:17:30,750] DEBUG - scvi.inference.autotune.all | Cleaning up: removing added logging handler.
[2020-07-17 21:17:30,751] DEBUG - scvi.inference.autotune.all | Cleaning up: removing hyperopt FileHandler.
[2020-07-17 21:17:30,752] DEBUG - scvi.inference.autotune.all | Cleaning up: removing autotune FileHandler.
---------------------------------------------------------------------------
Empty                                     Traceback (most recent call last)
~/scVI/scvi/inference/autotune.py in _auto_tune_parallel(objective_hyperopt, exp_key, space, max_evals, save_path, n_cpu_workers, gpu_ids, n_workers_per_gpu, reserve_timeout, fmin_timeout, fmin_timer, mongo_port, mongo_host, db_name, multiple_hosts)
    724             )
--> 725         trials = queue.get(timeout=fmin_timeout)
    726         queue.close()

~/anaconda3/lib/python3.7/multiprocessing/queues.py in get(self, block, timeout)
    104                     if not self._poll(timeout):
--> 105                         raise Empty
    106                 elif not self._poll():

Empty:

During handling of the above exception, another exception occurred:

FminTimeoutError                          Traceback (most recent call last)
<ipython-input-6-eb9feb949bd4> in <module>
      6     max_evals=max_evals,
      7     reserve_timeout=reserve_timeout,
----> 8     fmin_timeout=fmin_timeout,
      9 )

~/scVI/scvi/inference/autotune.py in decorated(*args, **kwargs)
    144     def decorated(*args, **kwargs):
    145         try:
--> 146             return func(*args, **kwargs)
    147         except Exception as e:
    148             logger_all.exception(

~/scVI/scvi/inference/autotune.py in auto_tune_scvi_model(exp_key, gene_dataset, delayed_populating, custom_objective_hyperopt, objective_kwargs, model_class, trainer_class, metric_name, metric_kwargs, posterior_name, model_specific_kwargs, trainer_specific_kwargs, train_func_specific_kwargs, space, max_evals, train_best, pickle_result, save_path, use_batches, parallel, n_cpu_workers, gpu_ids, n_workers_per_gpu, reserve_timeout, fmin_timeout, fmin_timer, mongo_port, mongo_host, db_name, multiple_hosts)
    468             mongo_host=mongo_host,
    469             db_name=db_name,
--> 470             multiple_hosts=multiple_hosts,
    471         )
    472

~/scVI/scvi/inference/autotune.py in _auto_tune_parallel(objective_hyperopt, exp_key, space, max_evals, save_path, n_cpu_workers, gpu_ids, n_workers_per_gpu, reserve_timeout, fmin_timeout, fmin_timer, mongo_port, mongo_host, db_name, multiple_hosts)
    733             "Queue still empty {fmin_timeout} seconds after all workers "
    734             "have died. Check that you have used a new exp_key or allowed "
--> 735             "a higher max_evals".format(fmin_timeout=fmin_timeout)
    736         )
    737

FminTimeoutError: Queue still empty 10 seconds after all workers have died. Check that you have used a new exp_key or allowed a higher max_evals

Returned objects

The trials object contains detailed information about each run. trials.trials is an Iterable in which each element corresponds to a single run. Each element can be used as a dictionary whose “result” key yields a dictionary containing the outcome of the run, as defined in our default objective function (or the user’s custom version). For example, it contains the hyperparameters used (under the “space” key), the resulting metric (under the “loss” key) and the status of the run.

The best_trainer object can be used directly as an scVI Trainer object. It is the result of training on the whole dataset provided, using the optimal set of hyperparameters found.
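
For example, assuming the search above completed and returned a populated trials object, the runs can be summarized into a single DataFrame (a minimal sketch; the “loss”, “status” and “space” keys are those written by the default objective function):

[ ]:
# Summarize all runs into one DataFrame (sketch; assumes `trials` was
# returned by a successful call to auto_tune_scvi_model).
records = []
for trial in trials.trials:
    result = trial["result"]
    record = {"loss": result["loss"], "status": result["status"]}
    # "space" holds the sampled hyperparameters, grouped by kwargs category
    for tunable_kwargs in result["space"].values():
        record.update(tunable_kwargs)
    records.append(record)

df_trials = pd.DataFrame(records).sort_values(by="loss")
df_trials.head()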

Custom hyperparameter space

Although our default search space can be a good one in a number of cases, we also provide an easy way to use custom values for the hyperparameter search space. These are broken down into three categories:

  • Hyperparameters for the Trainer instance (if any).

  • Hyperparameters for the Trainer instance’s train method (e.g., lr).

  • Hyperparameters for the model instance (e.g., n_layers).

To build your own hyperparameter space, follow the scheme used in scVI’s codebase as well as the sample below. Note that the various spaces you define have to follow the hyperopt syntax, for which you can find a detailed description here.

For example, if you wanted to search over a continuous dropout rate in [0.1, 0.3] and a continuous, log-uniform learning rate in [0.0001, 0.001], you could use the following search space.

[12]:
space = {
    "model_tunable_kwargs": {"dropout_rate": hp.uniform("dropout_rate", 0.1, 0.3)},
    # loguniform bounds are given in natural-log space, hence np.log
    "train_func_tunable_kwargs": {"lr": hp.loguniform("lr", np.log(1e-4), np.log(1e-3))},
}

best_vae, trials = auto_tune_scvi_model(
    gene_dataset=cortex_dataset,
    space=space,
    parallel=True,
    exp_key="cortex_dataset_custom_space",
    train_func_specific_kwargs={"n_epochs": n_epochs},
    max_evals=max_evals,
    reserve_timeout=reserve_timeout,
    fmin_timeout=fmin_timeout,
)

[2020-07-17 17:38:46,936] INFO - scvi.inference.autotune.all | Starting experiment: cortex_dataset_custom_space
[2020-07-17 17:38:46,937] DEBUG - scvi.inference.autotune.all | Adding default early stopping behaviour.
[2020-07-17 17:38:46,939] INFO - scvi.inference.autotune.all | Fixed parameters:
model:
{}
trainer:
{'early_stopping_kwargs': {'early_stopping_metric': 'elbo', 'save_best_state_metric': 'elbo', 'patience': 50, 'threshold': 0, 'reduce_lr_on_plateau': True, 'lr_patience': 25, 'lr_factor': 0.2}, 'metrics_to_monitor': ['elbo']}
train method:
{'n_epochs': 1}
[2020-07-17 17:38:46,940] INFO - scvi.inference.autotune.all | Starting parallel hyperoptimization
[2020-07-17 17:38:46,942] DEBUG - scvi.inference.autotune.all | Starting MongoDb process, logs redirected to ./mongo/mongo_logfile.txt.
[2020-07-17 17:38:51,973] DEBUG - scvi.inference.autotune.all | Starting minimization procedure
[2020-07-17 17:38:51,975] DEBUG - scvi.inference.autotune.all | Starting FminProcess.
[2020-07-17 17:38:51,975] DEBUG - scvi.inference.autotune.all | Starting worker launcher
[2020-07-17 17:38:51,978] DEBUG - scvi.inference.autotune.all | gpu_ids is None, defaulting to all 0 GPUs found by torch.
[2020-07-17 17:38:52,046] DEBUG - scvi.inference.autotune.all | No GPUs found and n_cpu_wokers is None, defaulting to n_cpu_workers = 7 (os.cpu_count() - 1)
  0%|          | 0/1 [00:00<?, ?it/s]
[2020-07-17 17:38:52,067] DEBUG - scvi.inference.autotune.all | Listener listening...
[2020-07-17 17:38:52,068] INFO - scvi.inference.autotune.all | Starting 1 worker.s for each of the 0 gpu.s set for use/found.
[2020-07-17 17:38:52,072] INFO - scvi.inference.autotune.all | Starting 7 cpu worker.s
[2020-07-17 17:39:00,504] DEBUG - scvi.inference.autotune.all | No timer, waiting for fmin...
[2020-07-17 17:39:06,405] DEBUG - scvi.inference.autotune.all | All workers have died, check stdout/stderr for error tracebacks.
[2020-07-17 17:39:06,406] DEBUG - scvi.inference.autotune.all | Worker watchdog finished, terminating workers and stopping listener.

[2020-07-17 17:39:10,512] DEBUG - scvi.inference.autotune.all | fmin finished.
[2020-07-17 17:39:11,984] DEBUG - scvi.inference.autotune.all | Setting worker launcher stop event.
[2020-07-17 17:39:11,986] DEBUG - scvi.inference.autotune.all | multiple_hosts set to false, Fmin has 10 seconds to finish
[2020-07-17 17:39:21,994] ERROR - scvi.inference.autotune.all | Queue still empty 10 seconds after all workers have died.
Terminating minimization process.
[2020-07-17 17:39:21,995] ERROR - scvi.inference.autotune.all | Caught ('Queue still empty 10 seconds after all workers have died. Check that you have used a new exp_key or allowed a higher max_evals',) in auto_tune_scvi_model, starting cleanup.
Traceback (most recent call last):
  File "/Users/galen/scVI/scvi/inference/autotune.py", line 725, in _auto_tune_parallel
    trials = queue.get(timeout=fmin_timeout)
  File "/Users/galen/anaconda3/lib/python3.7/multiprocessing/queues.py", line 105, in get
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/galen/scVI/scvi/inference/autotune.py", line 146, in decorated
    return func(*args, **kwargs)
  File "/Users/galen/scVI/scvi/inference/autotune.py", line 470, in auto_tune_scvi_model
    multiple_hosts=multiple_hosts,
  File "/Users/galen/scVI/scvi/inference/autotune.py", line 735, in _auto_tune_parallel
    "a higher max_evals".format(fmin_timeout=fmin_timeout)
scvi.inference.autotune.FminTimeoutError: Queue still empty 10 seconds after all workers have died. Check that you have used a new exp_key or allowed a higher max_evals
[2020-07-17 17:39:21,997] INFO - scvi.inference.autotune.all | Cleaning up
[2020-07-17 17:39:21,998] DEBUG - scvi.inference.autotune.all | Cleaning up: closing files.
[2020-07-17 17:39:22,000] DEBUG - scvi.inference.autotune.all | Cleaning up: closing queues.
[2020-07-17 17:39:22,001] DEBUG - scvi.inference.autotune.all | Cleaning up: setting cleanup_event and joining threads.
[2020-07-17 17:39:22,002] DEBUG - scvi.inference.autotune.all | Thread Progress Listener already done.
[2020-07-17 17:39:22,003] DEBUG - scvi.inference.autotune.all | Thread Worker Launcher already done.
[2020-07-17 17:39:22,004] DEBUG - scvi.inference.autotune.all | Thread Fmin Launcher already done.
[2020-07-17 17:39:22,032] DEBUG - scvi.inference.autotune.all | Cleaning up: terminating processes.
[2020-07-17 17:39:22,034] DEBUG - scvi.inference.autotune.all | Process Fmin already done.
[2020-07-17 17:39:22,034] DEBUG - scvi.inference.autotune.all | Process Worker CPU 6 already done.
[2020-07-17 17:39:22,036] DEBUG - scvi.inference.autotune.all | Process Worker CPU 5 already done.
[2020-07-17 17:39:22,037] DEBUG - scvi.inference.autotune.all | Process Worker CPU 4 already done.
[2020-07-17 17:39:22,043] DEBUG - scvi.inference.autotune.all | Process Worker CPU 3 already done.
[2020-07-17 17:39:22,047] DEBUG - scvi.inference.autotune.all | Process Worker CPU 2 already done.
[2020-07-17 17:39:22,053] DEBUG - scvi.inference.autotune.all | Process Worker CPU 1 already done.
[2020-07-17 17:39:22,054] DEBUG - scvi.inference.autotune.all | Process Worker CPU 0 already done.
[2020-07-17 17:39:22,055] DEBUG - scvi.inference.autotune.all | Terminating mongod process.
[2020-07-17 17:39:22,084] DEBUG - scvi.inference.autotune.all | Cleaning up: removing added logging handler.
[2020-07-17 17:39:22,085] DEBUG - scvi.inference.autotune.all | Cleaning up: removing hyperopt FileHandler.
[2020-07-17 17:39:22,086] DEBUG - scvi.inference.autotune.all | Cleaning up: removing autotune FileHandler.
Exception in thread Thread-13:
Traceback (most recent call last):
  File "/Users/galen/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/Users/galen/anaconda3/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/galen/anaconda3/lib/python3.7/logging/handlers.py", line 1475, in _monitor
    record = self.dequeue(True)
  File "/Users/galen/anaconda3/lib/python3.7/logging/handlers.py", line 1424, in dequeue
    return self.queue.get(block)
  File "/Users/galen/anaconda3/lib/python3.7/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
  File "/Users/galen/anaconda3/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/Users/galen/anaconda3/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/Users/galen/anaconda3/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

---------------------------------------------------------------------------
Empty                                     Traceback (most recent call last)
~/scVI/scvi/inference/autotune.py in _auto_tune_parallel(objective_hyperopt, exp_key, space, max_evals, save_path, n_cpu_workers, gpu_ids, n_workers_per_gpu, reserve_timeout, fmin_timeout, fmin_timer, mongo_port, mongo_host, db_name, multiple_hosts)
    724             )
--> 725         trials = queue.get(timeout=fmin_timeout)
    726         queue.close()

~/anaconda3/lib/python3.7/multiprocessing/queues.py in get(self, block, timeout)
    104                     if not self._poll(timeout):
--> 105                         raise Empty
    106                 elif not self._poll():

Empty:

During handling of the above exception, another exception occurred:

FminTimeoutError                          Traceback (most recent call last)
<ipython-input-12-167205b30336> in <module>
     12     max_evals=max_evals,
     13     reserve_timeout=reserve_timeout,
---> 14     fmin_timeout=fmin_timeout,
     15 )

~/scVI/scvi/inference/autotune.py in decorated(*args, **kwargs)
    144     def decorated(*args, **kwargs):
    145         try:
--> 146             return func(*args, **kwargs)
    147         except Exception as e:
    148             logger_all.exception(

~/scVI/scvi/inference/autotune.py in auto_tune_scvi_model(exp_key, gene_dataset, delayed_populating, custom_objective_hyperopt, objective_kwargs, model_class, trainer_class, metric_name, metric_kwargs, posterior_name, model_specific_kwargs, trainer_specific_kwargs, train_func_specific_kwargs, space, max_evals, train_best, pickle_result, save_path, use_batches, parallel, n_cpu_workers, gpu_ids, n_workers_per_gpu, reserve_timeout, fmin_timeout, fmin_timer, mongo_port, mongo_host, db_name, multiple_hosts)
    468             mongo_host=mongo_host,
    469             db_name=db_name,
--> 470             multiple_hosts=multiple_hosts,
    471         )
    472

~/scVI/scvi/inference/autotune.py in _auto_tune_parallel(objective_hyperopt, exp_key, space, max_evals, save_path, n_cpu_workers, gpu_ids, n_workers_per_gpu, reserve_timeout, fmin_timeout, fmin_timer, mongo_port, mongo_host, db_name, multiple_hosts)
    733             "Queue still empty {fmin_timeout} seconds after all workers "
    734             "have died. Check that you have used a new exp_key or allowed "
--> 735             "a higher max_evals".format(fmin_timeout=fmin_timeout)
    736         )
    737

FminTimeoutError: Queue still empty 10 seconds after all workers have died. Check that you have used a new exp_key or allowed a higher max_evals

Custom objective metric

By default, our autotune process tracks the marginal negative log-likelihood of the best state of the model according to the held-out Evidence Lower Bound (ELBO). If you want to track a different early-stopping metric or optimize a different loss, you can use auto_tune_scvi_model’s parameters.

For example, suppose your dataset came from two batches (i.e., two merged datasets) and you wanted to optimize the hyperparameters for batch-mixing entropy. You could use the code below, which makes use of the metric_name argument of auto_tune_scvi_model. This works for any metric that is implemented in the ScviDataLoader class you use. You may also specify the name of the ScviDataLoader attribute you want to use (e.g., “train_set”).

[8]:
pbmc_dataset = pbmc_dataset(save_path=os.path.join(save_path, "10X/"))
[2020-07-17 21:12:09,530] INFO - scvi.dataset._utils | Downloading file at data/10X/gene_info_pbmc.csv
[2020-07-17 21:12:10,461] INFO - scvi.dataset._utils | Downloading file at data/10X/pbmc_metadata.pickle
[2020-07-17 21:12:11,733] INFO - scvi.dataset._utils | Downloading file at data/10X/pbmc8k/filtered_gene_bc_matrices.tar.gz
[2020-07-17 21:12:13,217] INFO - scvi.dataset.dataset10X | Extracting tar file
data/10X/pbmc8k/filtered_gene_bc_matrices/GRCh38
/Users/galen/scVI/tests/notebooks
[2020-07-17 21:12:30,831] INFO - scvi.dataset.dataset10X | Removing extracted data at data/10X/pbmc8k/filtered_gene_bc_matrices
[2020-07-17 21:12:31,862] INFO - scvi.dataset._utils | Downloading file at data/10X/pbmc4k/filtered_gene_bc_matrices.tar.gz
[2020-07-17 21:12:33,073] INFO - scvi.dataset.dataset10X | Extracting tar file
data/10X/pbmc4k/filtered_gene_bc_matrices/GRCh38
/Users/galen/scVI/tests/notebooks
[2020-07-17 21:12:41,448] INFO - scvi.dataset.dataset10X | Removing extracted data at data/10X/pbmc4k/filtered_gene_bc_matrices
[2020-07-17 21:12:42,427] INFO - scvi.dataset._anndata | Using data from adata.X
[2020-07-17 21:12:42,428] INFO - scvi.dataset._anndata | Using batches from adata.obs["batch"]
[2020-07-17 21:12:42,430] INFO - scvi.dataset._anndata | Using labels from adata.obs["labels"]
[2020-07-17 21:12:42,431] INFO - scvi.dataset._anndata | Computing library size prior per batch
[2020-07-17 21:12:42,456] INFO - scvi.dataset._anndata | Successfully registered anndata object containing 11990 cells, 3346 genes, and 2 batches
Registered keys:['X', 'batch_indices', 'local_l_mean', 'local_l_var', 'labels']
[ ]:
# best_trainer, trials = auto_tune_scvi_model(
#     gene_dataset=pbmc_dataset,
#     metric_name="entropy_batch_mixing",
#     data_loader_name="train_set",
#     parallel=True,
#     exp_key="pbmc_entropy_batch_mixing",
#     train_func_specific_kwargs={"n_epochs": n_epochs},
#     max_evals=max_evals,
#     reserve_timeout=reserve_timeout,
#     fmin_timeout=fmin_timeout,
# )

Custom objective function

Below, we describe, using one of our synthetic datasets, how to tune our annotation model SCANVI for, e.g., better accuracy on a 20% subset of the labelled data. Note that the model is trained in a semi-supervised framework, which is why we have a labelled and an unlabelled dataset. Please refer to the original paper for details on SCANVI!

In this case, as described in our annotation notebook, we may want to form the labelled/unlabelled sets using batch indices. Unfortunately, that requires a little work by hand. Even in that case, we can leverage the new autotune module to perform hyperparameter tuning. To do so, one has to write a custom objective function and feed it to auto_tune_scvi_model.

One can proceed as described below; a minimal sketch of such an objective is shown after the list. Note three important conditions:

  • Since it is going to be pickled, the objective should not be implemented in the “main” module, i.e., an executable script or a notebook.

  • The objective should have the search space as its first argument and a boolean is_best_training as its second.

  • If not using a custom search space, the space should be expected to take the form of a dictionary with the following keys:

    • "model_tunable_kwargs"

    • "trainer_tunable_kwargs"

    • "train_func_tunable_kwargs"

[9]:
from notebooks.utils.autotune_advanced_notebook import custom_objective_hyperopt
[ ]:
synthetic_dataset = annotation_simulation(1, save_path=os.path.join(save_path, "simulation/"))
objective_kwargs = dict(dataset=synthetic_dataset, n_epochs=n_epochs)
best_trainer, trials = auto_tune_scvi_model(
    custom_objective_hyperopt=custom_objective_hyperopt,
    objective_kwargs=objective_kwargs,
    parallel=True,
    exp_key="synthetic_dataset_scanvi",
    max_evals=max_evals,
    reserve_timeout=reserve_timeout,
    fmin_timeout=fmin_timeout,
)
[2020-07-17 21:15:51,567] INFO - scvi.dataset._utils | Downloading file at /Users/galen/scVI/tests/notebooks/data/simulation/simulation_1.loom
[2020-07-17 21:16:01,421] WARNING - scvi.dataset._anndata | adata.X does not contain unnormalized count data. Are you sure this is what you want?
[2020-07-17 21:16:01,421] INFO - scvi.dataset._anndata | Using data from adata.X
[2020-07-17 21:16:01,422] INFO - scvi.dataset._anndata | Using batches from adata.obs["batch"]
[2020-07-17 21:16:01,425] INFO - scvi.dataset._anndata | Using labels from adata.obs["labels"]
[2020-07-17 21:16:01,426] INFO - scvi.dataset._anndata | Computing library size prior per batch
[2020-07-17 21:16:01,817] INFO - scvi.dataset._anndata | Successfully registered anndata object containing 20000 cells, 2000 genes, and 2 batches
Registered keys:['X', 'batch_indices', 'local_l_mean', 'local_l_var', 'labels']
[2020-07-17 21:16:01,819] INFO - scvi.inference.autotune.all | Starting experiment: synthetic_dataset_scanvi
[2020-07-17 21:16:01,819] DEBUG - scvi.inference.autotune.all | Using default parameter search space.
[2020-07-17 21:16:01,822] INFO - scvi.inference.autotune.all | Using custom objective function.
[2020-07-17 21:16:01,823] INFO - scvi.inference.autotune.all | Starting parallel hyperoptimization
[2020-07-17 21:16:01,825] DEBUG - scvi.inference.autotune.all | Starting MongoDb process, logs redirected to ./mongo/mongo_logfile.txt.
[2020-07-17 21:16:06,867] DEBUG - scvi.inference.autotune.all | Starting minimization procedure
[2020-07-17 21:16:06,870] DEBUG - scvi.inference.autotune.all | Starting FminProcess.
[2020-07-17 21:16:06,870] DEBUG - scvi.inference.autotune.all | Starting worker launcher
[2020-07-17 21:16:06,873] DEBUG - scvi.inference.autotune.all | gpu_ids is None, defaulting to all 0 GPUs found by torch.
[2020-07-17 21:16:07,100] DEBUG - scvi.inference.autotune.all | No GPUs found and n_cpu_wokers is None, defaulting to n_cpu_workers = 7 (os.cpu_count() - 1)
  0%|          | 0/100 [00:00<?, ?it/s]
[2020-07-17 21:16:07,115] DEBUG - scvi.inference.autotune.all | Listener listening...
[2020-07-17 21:16:07,115] INFO - scvi.inference.autotune.all | Starting 1 worker.s for each of the 0 gpu.s set for use/found.
[2020-07-17 21:16:07,119] INFO - scvi.inference.autotune.all | Starting 7 cpu worker.s
[2020-07-17 21:16:16,677] DEBUG - scvi.inference.autotune.all | No timer, waiting for fmin...

Delayed populating, for very large datasets

DISCLAIMER: We don’t actually need this for the BrainLarge dataset with 720 genes; this is just an example.

After the objective function is built and fed to hyperopt, it is pickled and sent to the MongoWorkers. Thus, if you pass a loaded dataset as a partial argument to the objective function and this dataset exceeds 4 GB, you will get a PickleError (objects larger than 4 GB cannot be pickled).

To remedy this, if you have a very large dataset on which you want to perform hyperparameter optimization, subclass scVI’s DownloadableDataset or use one of its many existing subclasses, so that the dataset can be populated inside the objective function, which is called by each worker.

[ ]:
# brain_large_dataset_path = os.path.join(save_path, 'brainlarge_dataset_test.h5ad')

# best_trainer, trials = auto_tune_scvi_model(
#     gene_dataset=brain_large_dataset_path,
#     parallel=True,
#     exp_key="brain_large_dataset",
#     max_evals=max_evals,
#     trainer_specific_kwargs={
#         "early_stopping_kwargs": {
#             "early_stopping_metric": "elbo",
#             "save_best_state_metric": "elbo",
#             "patience": 20,
#             "threshold": 0,
#             "reduce_lr_on_plateau": True,
#             "lr_patience": 10,
#             "lr_factor": 0.2,
#         }
#     },
#     train_func_specific_kwargs={"n_epochs": n_epochs_brain_large},
#     reserve_timeout=reserve_timeout,
#     fmin_timeout=fmin_timeout,
# )

Working with totalVI

[ ]:
adata = scvi.data.pbmcs_10x_cite_seq(
    save_path=save_path, run_setup_anndata=False
)
adata = if_not_test_else(adata, adata[:75, :50].copy())
scvi.data.setup_anndata(
    adata, batch_key="batch", protein_expression_obsm_key="protein_expression"
)

space = {
    "model_tunable_kwargs": {
        "n_latent": 5 + hp.randint("n_latent", 11),  # [5, 15]
        "n_hidden": hp.choice("n_hidden", [64, 128, 256]),
        "n_layers_encoder": 1 + hp.randint("n_layers", 5),
        "dropout_rate_encoder": hp.choice("dropout_rate", [0.1, 0.3, 0.5, 0.7]),
        "gene_likelihood": hp.choice("gene_likelihood", ["zinb", "nb"]),
    },
    "train_func_tunable_kwargs": {
        "lr": hp.choice("lr", [0.01, 0.005, 0.001, 0.0005, 0.0001])
    },
}

best_vae, trials = auto_tune_scvi_model(
    gene_dataset=adata,
    space=space,
    parallel=True,
    model_class=scvi.model.TOTALVI,
    exp_key="totalvi_adata",
    train_func_specific_kwargs={"n_epochs": n_epochs},
    max_evals=max_evals,
    reserve_timeout=reserve_timeout,
    fmin_timeout=fmin_timeout,
    save_path=save_path,  # temp dir, see conftest.py
)
best_vae.get_latent_representation()
[ ]:
#     def get_param_df(self):
#         ddd = {}
#         for i, trial in enumerate(self.trials):
#             dd = {}
#             dd["marginal_ll"] = trial["result"]["loss"]
#             for item in trial["result"]["space"].values():
#                 for key, value in item.items():
#                     dd[key] = value
#             ddd[i] = dd
#         df_space = pd.DataFrame(ddd)
#         df_space = df_space.T
#         n_params_dataset = np.vectorize(
#             partial(
#                 n_params, self.trainer.adata.uns["_scvi"]["summary_stats"]["n_vars"]
#             )
#         )
#         df_space["n_params"] = n_params_dataset(
#             df_space["n_layers"], df_space["n_hidden"], df_space["n_latent"]
#         )
#         df_space = df_space[
#             [
#                 "marginal_ll",
#                 "n_layers",
#                 "n_hidden",
#                 "n_latent",
#                 "reconstruction_loss",
#                 "dropout_rate",
#                 "lr",
#                 "n_epochs",
#                 "n_params",
#             ]
#         ]
#         df_space = df_space.sort_values(by="marginal_ll")
#         df_space["run index"] = df_space.index
#         df_space.index = np.arange(1, df_space.shape[0] + 1)
#         return df_space


# def n_params(n_vars, n_layers, n_hidden, n_latent):
#     if n_layers == 0:
#         res = 2 * n_vars * n_latent
#     else:
#         res = 2 * n_vars * n_hidden
#         for i in range(n_layers - 1):
#             res += 2 * n_hidden * n_hidden
#         res += 2 * n_hidden * n_latent
#     return res