Celltype prediction

Scanpy Toolkit

Assignment of cell identities based on gene expression patterns using reference data.
Authors

Åsa Björklund

Paulo Czarnewski

Susanne Reinsbach

Roy Francis

Published

22-Oct-2024

Note

Code chunks run Python commands unless it starts with %%bash, in which case, those chunks run shell commands.

Celltype prediction can either be performed on indiviudal cells where each cell gets a predicted celltype label, or on the level of clusters. All methods are based on similarity to other datasets, single cell or sorted bulk RNAseq, or uses known marker genes for each cell type.
Ideally celltype predictions should be run on each sample separately and not using the integrated data. In this case we will select one sample from the Covid data, ctrl_13 and predict celltype by cell on that sample.
Some methods will predict a celltype to each cell based on what it is most similar to, even if that celltype is not included in the reference. Other methods include an uncertainty so that cells with low similarity scores will be unclassified.
There are multiple different methods to predict celltypes, here we will just cover a few of those.

Here we will use a reference PBMC dataset that we get from scanpy datasets and classify celltypes based on two methods:

First, lets load required libraries

import numpy as np
import pandas as pd
import scanpy as sc
import matplotlib.pyplot as plt
import warnings
import os
import urllib.request

warnings.simplefilter(action="ignore", category=Warning)

# verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.verbosity = 2
sc.settings.set_figure_params(dpi=80)

Let’s read in the saved Covid-19 data object from the clustering step.

# download pre-computed data if missing or long compute
fetch_data = True

# url for source and intermediate data
path_data = "https://export.uppmax.uu.se/naiss2023-23-3/workshops/workshop-scrnaseq"

path_results = "data/covid/results"
if not os.path.exists(path_results):
    os.makedirs(path_results, exist_ok=True)

# path_file = "data/covid/results/scanpy_covid_qc_dr_scanorama_cl.h5ad"
path_file = "data/covid/results/scanpy_covid_qc_dr_scanorama_cl.h5ad"
if fetch_data and not os.path.exists(path_file):
    urllib.request.urlretrieve(os.path.join(
        path_data, 'covid/results/scanpy_covid_qc_dr_scanorama_cl.h5ad'), path_file)

adata = sc.read_h5ad(path_file)
adata
AnnData object with n_obs × n_vars = 7222 × 19468
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'percent_mt2', 'n_counts', 'n_genes', 'percent_chrY', 'XIST-counts', 'S_score', 'G2M_score', 'phase', 'doublet_scores', 'predicted_doublets', 'doublet_info', 'leiden', 'leiden_0.4', 'leiden_0.6', 'leiden_1.0', 'leiden_1.4', 'kmeans5', 'kmeans10', 'kmeans15', 'hclust_5', 'hclust_10', 'hclust_15'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'dendrogram_leiden_0.6', 'doublet_info_colors', 'hclust_10_colors', 'hclust_15_colors', 'hclust_5_colors', 'hvg', 'kmeans10_colors', 'kmeans15_colors', 'kmeans5_colors', 'leiden', 'leiden_0.4', 'leiden_0.4_colors', 'leiden_0.6', 'leiden_0.6_colors', 'leiden_1.0', 'leiden_1.0_colors', 'leiden_1.4', 'leiden_1.4_colors', 'log1p', 'neighbors', 'pca', 'phase_colors', 'sample_colors', 'tsne', 'umap'
    obsm: 'Scanorama', 'X_pca', 'X_pca_harmony', 'X_tsne', 'X_tsne_bbknn', 'X_tsne_harmony', 'X_tsne_scanorama', 'X_tsne_uncorr', 'X_umap', 'X_umap_bbknn', 'X_umap_harmony', 'X_umap_scanorama', 'X_umap_uncorr'
    obsp: 'connectivities', 'distances'
adata.uns['log1p']['base']=None
print(adata.shape)
(7222, 19468)

Subset one patient.

adata = adata[adata.obs["sample"] == "ctrl_13",:]
print(adata.shape)
(1121, 19468)
adata.obs["leiden_0.6"].value_counts()
1     272
0     217
3     205
5     129
2     120
4      63
7      35
6      28
10     27
8      13
9      12
Name: leiden_0.6, dtype: int64

Some clusters have very few cells from this individual, so any cluster comparisons may be biased by this.

sc.pl.umap(
    adata, color=["leiden_0.6"], palette=sc.pl.palettes.default_20
)

1 Reference data

Load the reference data from scanpy.datasets. It is the annotated and processed pbmc3k dataset from 10x.

adata_ref = sc.datasets.pbmc3k_processed() 

adata_ref.obs['sample']='pbmc3k'

print(adata_ref.shape)
adata_ref.obs
(2638, 1838)
n_genes percent_mito n_counts louvain sample
index
AAACATACAACCAC-1 781 0.030178 2419.0 CD4 T cells pbmc3k
AAACATTGAGCTAC-1 1352 0.037936 4903.0 B cells pbmc3k
AAACATTGATCAGC-1 1131 0.008897 3147.0 CD4 T cells pbmc3k
AAACCGTGCTTCCG-1 960 0.017431 2639.0 CD14+ Monocytes pbmc3k
AAACCGTGTATGCG-1 522 0.012245 980.0 NK cells pbmc3k
... ... ... ... ... ...
TTTCGAACTCTCAT-1 1155 0.021104 3459.0 CD14+ Monocytes pbmc3k
TTTCTACTGAGGCA-1 1227 0.009294 3443.0 B cells pbmc3k
TTTCTACTTCCTCG-1 622 0.021971 1684.0 B cells pbmc3k
TTTGCATGAGAGGC-1 454 0.020548 1022.0 B cells pbmc3k
TTTGCATGCCTCAC-1 724 0.008065 1984.0 CD4 T cells pbmc3k

2638 rows × 5 columns

As you can see, the celltype annotation is in the metadata column louvain, so that is the column we will have to use for classification.

sc.pl.umap(adata_ref, color='louvain')

Make sure we have the same genes in both datset by taking the intersection

# before filtering genes, store the full matrix in raw.
adata.raw = adata
# also store the umap in a new slot as it will get overwritten
adata.obsm["X_umap_uncorr"] = adata.obsm["X_umap"]

print(adata_ref.shape[1])
print(adata.shape[1])
var_names = adata_ref.var_names.intersection(adata.var_names)
print(len(var_names))

adata_ref = adata_ref[:, var_names]
adata = adata[:, var_names]
1838
19468
1676

First we need to rerun pca and umap with the same gene set for both datasets.

sc.pp.pca(adata_ref)
sc.pp.neighbors(adata_ref)
sc.tl.umap(adata_ref)
sc.pl.umap(adata_ref, color='louvain')
computing PCA
    with n_comps=50
    finished (0:00:02)
computing neighbors
    using 'X_pca' with n_pcs = 50
    finished (0:00:06)
computing UMAP
    finished (0:00:04)

sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color='leiden_0.6')
computing PCA
    with n_comps=50
    finished (0:00:00)
computing neighbors
    using 'X_pca' with n_pcs = 50
    finished (0:00:00)
computing UMAP
    finished (0:00:01)

2 Integrate with scanorama

import scanorama

#subset the individual dataset to the same variable genes as in MNN-correct.
alldata = dict()
alldata['ctrl']=adata
alldata['ref']=adata_ref

#convert to list of AnnData objects
adatas = list(alldata.values())

# run scanorama.integrate
scanorama.integrate_scanpy(adatas, dimred = 50)
Found 1676 genes among all datasets
[[0.         0.49509367]
 [0.         0.        ]]
Processing datasets (0, 1)
# add in sample info
adata_ref.obs['sample']='pbmc3k'

# create a merged scanpy object and add in the scanorama 
adata_merged = alldata['ctrl'].concatenate(alldata['ref'], batch_key='sample', batch_categories=['ctrl','pbmc3k'])

embedding = np.concatenate([ad.obsm['X_scanorama'] for ad in adatas], axis=0)
adata_merged.obsm['Scanorama'] = embedding
#run  umap.
sc.pp.neighbors(adata_merged, n_pcs =50, use_rep = "Scanorama")
sc.tl.umap(adata_merged)
computing neighbors
    finished (0:00:00)
computing UMAP
    finished (0:00:05)
sc.pl.umap(adata_merged, color=["sample","louvain"])

2.1 Label transfer

Using the functions from the Spatial tutorial from Scanpy we will calculate normalized cosine distances between the two datasets and tranfer labels to the celltype with the highest scores.

from sklearn.metrics.pairwise import cosine_distances

distances = 1 - cosine_distances(
    adata_merged[adata_merged.obs['sample'] == "pbmc3k"].obsm["Scanorama"],
    adata_merged[adata_merged.obs['sample'] == "ctrl"].obsm["Scanorama"],
)

def label_transfer(dist, labels, index):
    lab = pd.get_dummies(labels)
    class_prob = lab.to_numpy().T @ dist
    norm = np.linalg.norm(class_prob, 2, axis=0)
    class_prob = class_prob / norm
    class_prob = (class_prob.T - class_prob.min(1)) / class_prob.ptp(1)
    # convert to df
    cp_df = pd.DataFrame(
        class_prob, columns=lab.columns
    )
    cp_df.index = index
    # classify as max score
    m = cp_df.idxmax(axis=1)
    
    return m

class_def = label_transfer(distances, adata_ref.obs.louvain, adata.obs.index)

# add to obs section of the original object
adata.obs['label_trans'] = class_def

sc.pl.umap(adata, color="label_trans")

# add to merged object.
adata_merged.obs["label_trans"] = pd.concat(
    [class_def, adata_ref.obs["louvain"]], axis=0
).tolist()

sc.pl.umap(adata_merged, color=["sample","louvain",'label_trans'])
#plot only ctrl cells.
sc.pl.umap(adata_merged[adata_merged.obs['sample']=='ctrl'], color='label_trans')

Now plot how many cells of each celltypes can be found in each cluster.

tmp = pd.crosstab(adata.obs['leiden_0.6'],adata.obs['label_trans'], normalize='index')
tmp.plot.bar(stacked=True).legend(bbox_to_anchor=(1.8, 1),loc='upper right')

3 Ingest

Another method for celltype prediction is Ingest, for more information, please look at https://scanpy-tutorials.readthedocs.io/en/latest/integrating-data-using-ingest.html

sc.tl.ingest(adata, adata_ref, obs='louvain')
sc.pl.umap(adata, color=['louvain','leiden_0.6'], wspace=0.5)
running ingest
    finished (0:00:18)

As you can see, ingest has created a new umap for us, so to get consistent plotting, lets revert back to the old one for further plotting:

adata.obsm["X_umap"] = adata.obsm["X_umap_uncorr"]

sc.pl.umap(adata, color=['louvain','leiden_0.6'], wspace=0.5)

Now plot how many cells of each celltypes can be found in each cluster.

tmp = pd.crosstab(adata.obs['leiden_0.6'],adata.obs['louvain'], normalize='index')
tmp.plot.bar(stacked=True).legend(bbox_to_anchor=(1.8, 1),loc='upper right')

4 Celltypist

Celltypist provides pretrained models for classification for many different human tissues and celltypes. Here, we are following the steps of this tutorial, with some adaptations for this dataset. So please check out the tutorial for more detail.

import celltypist
from celltypist import models

# there are many different models, we will only download 2 of them for now.
models.download_models(force_update = False, model = 'Immune_All_Low.pkl')
models.download_models(force_update = False, model = 'Immune_All_High.pkl')

Now select the model you want to use and show the info:

model = models.Model.load(model = 'Immune_All_High.pkl')

model
CellTypist model with 32 cell types and 6639 features
    date: 2022-07-16 08:53:00.959521
    details: immune populations combined from 20 tissues of 18 studies
    source: https://doi.org/10.1126/science.abl5197
    version: v2
    cell types: B cells, B-cell lineage, ..., pDC precursor
    features: A1BG, A2M, ..., ZYX

To infer celltype labels to our cells, we first need to convert back to the full matrix. OBS! For celltypist we want to have log1p normalised expression to 10,000 counts per cell. Which we already have in adata.raw.X, check by summing up the data, it should sum to 10K.

adata = adata.raw.to_adata() 
adata.X.expm1().sum(axis = 1)[:10]
matrix([[10000.],
        [10000.],
        [10000.],
        [10000.],
        [10000.],
        [10000.],
        [10000.],
        [10000.],
        [10000.],
        [10000.]])
predictions = celltypist.annotate(adata, model = 'Immune_All_High.pkl', majority_voting = True)

predictions.predicted_labels
running Leiden clustering
    finished (0:00:00)
predicted_labels over_clustering majority_voting
AGGTCATGTGCGAACA-13-5 T cells 25 T cells
CCTATCGGTCCCTCAT-13-5 ILC 19 ILC
TCCTCCCTCGTTCATT-13-5 ILC 7 ILC
CAACCAATCATCTATC-13-5 ILC 8 ILC
TACGGTATCGGATTAC-13-5 T cells 18 T cells
... ... ... ...
TCCACCATCATAGCAC-13-5 T cells 0 T cells
GAGTTACAGTGAGTGC-13-5 T cells 5 T cells
ATCATTCAGGCTCACC-13-5 Monocytes 14 Monocytes
AGCCACGCAACCCTAA-13-5 T cells 4 T cells
CTACCTGGTCAGGAGT-13-5 ILC 19 ILC

1121 rows × 3 columns

The first column predicted_labels is the predictions made for each individual cell, while majority_voting is done for local subclusters, the clustering identities are in column over_clustering.

Now we convert the predictions to an anndata object.

adata = predictions.to_adata()

sc.pl.umap(adata, color = ['leiden_0.6', 'predicted_labels', 'majority_voting'], legend_loc = 'on data')

Task

Rerun predictions with Celltypist, but use another model, for instance Immune_All_High.pkl, or any other model you find relevant, you can find a list of models here. How do the results differ for you?

4.1 Celltypist custom model

We can also train our own model on any reference data that we want to use. In this case we will use the pbmc data in adata_ref to train a model.

Celltypist requires the data to be in the format of log1p normalised expression to 10,000 counts per cell, we can check if that is the case for the object we have:

adata_ref.raw.X.expm1().sum(axis = 1)[:10]
matrix([[2419.],
        [4903.],
        [3147.],
        [2639.],
        [ 980.],
        [2163.],
        [2175.],
        [2260.],
        [1275.],
        [1103.]], dtype=float32)

These should all sum up to 10K, which is not the case, probably since some genes were removed after normalizing. Wo we will have to start from the raw counts of that dataset instead. Before we selected the data pbmc3k_processed, but now we will instead use pbmc3k.

adata_ref2 = sc.datasets.pbmc3k() 
adata_ref2
AnnData object with n_obs × n_vars = 2700 × 32738
    var: 'gene_ids'

This data is not annotated, so we will have to match the indices from the filtered and processed object. And add in the metadata with annotations.

adata_ref2 = adata_ref2[adata_ref.obs_names,:]
adata_ref2.obs = adata_ref.obs
adata_ref2
AnnData object with n_obs × n_vars = 2638 × 32738
    obs: 'n_genes', 'percent_mito', 'n_counts', 'louvain', 'sample'
    var: 'gene_ids'

Now we can normalize the matrix:

sc.pp.normalize_total(adata_ref2, target_sum = 1e4)
sc.pp.log1p(adata_ref2)

# check the sums again
adata_ref2.X.expm1().sum(axis = 1)[:10]
normalizing counts per cell
    finished (0:00:00)
matrix([[10000.   ],
        [ 9999.999],
        [10000.   ],
        [ 9999.998],
        [ 9999.998],
        [10000.   ],
        [ 9999.999],
        [10000.   ],
        [10000.001],
        [10000.   ]], dtype=float32)

And finally train the model.

new_model = celltypist.train(adata_ref2, labels = 'louvain', n_jobs = 10, feature_selection = True)

Now we can run predictions on our data

predictions2 = celltypist.annotate(adata, model = new_model, majority_voting = True)
running Leiden clustering
    finished (0:00:00)

Instead of converting the predictions to anndata we will just add another column in the adata.obs with these new predictions since the column names from the previous celltypist runs with clash.

adata.obs["predicted_labels_ref"] = predictions2.predicted_labels["predicted_labels"]
adata.obs["majority_voting_ref"] = predictions2.predicted_labels["majority_voting"]
sc.pl.umap(adata, color = ['predicted_labels', 'majority_voting','predicted_labels_ref', 'majority_voting_ref'], legend_loc = 'on data', ncols=2)

5 Compare results

The predictions from ingest is stored in the column ‘louvain’ while we named the label transfer with scanorama as ‘predicted’

sc.pl.umap(adata, color=['louvain','label_trans','majority_voting', 'majority_voting_ref'], wspace=0.5, ncols=3)

As you can see, the main celltypes are generally the same, but there are clearly differences, especially with regards to the cells predicted as either ILC/NK/CD8 T-cells.

The only way to make sure which method you trust is to look at what genes the different celltypes express and use your biological knowledge to make decisions.

6 Gene set analysis

Another way of predicting celltypes is to use the differentially expressed genes per cluster and compare to lists of known cell marker genes. This requires a list of genes that you trust and that is relevant for the tissue you are working on.

You can either run it with a marker list from the ontology or a list of your choice as in the example below.

path_file = 'data/human_cell_markers.txt'
if not os.path.exists(path_file):
    urllib.request.urlretrieve(os.path.join(
        path_data, 'human_cell_markers.txt'), path_file)
df = pd.read_table(path_file)
df

print(df.shape)
(2868, 15)
# Filter for number of genes per celltype
df['nG'] = df.geneSymbol.str.split(",").str.len()

df = df[df['nG'] > 5]
df = df[df['nG'] < 100]
d = df[df['cancerType'] == "Normal"]
print(df.shape)
(445, 16)
df.index = df.cellName
gene_dict = df.geneSymbol.str.split(",").to_dict()
# run differential expression per cluster
sc.tl.rank_genes_groups(adata, 'leiden_0.6', method='wilcoxon', key_added = "wilcoxon")
ranking genes
    finished (0:00:01)
# do gene set overlap to the groups in the gene list and top 300 DEGs.
import gseapy

gsea_res = dict()
pred = dict()

for cl in adata.obs['leiden_0.6'].cat.categories.tolist():
    print(cl)
    glist = sc.get.rank_genes_groups_df(adata, group=cl, key='wilcoxon')[
        'names'].squeeze().str.strip().tolist()
    enr_res = gseapy.enrichr(gene_list=glist[:300],
                             organism='Human',
                             gene_sets=gene_dict,
                             background=adata.shape[1],
                             cutoff=1)
    if enr_res.results.shape[0] == 0:
        pred[cl] = "Unass"
    else:
        enr_res.results.sort_values(
            by="P-value", axis=0, ascending=True, inplace=True)
        print(enr_res.results.head(2))
        gsea_res[cl] = enr_res
        pred[cl] = enr_res.results["Term"][0]
0
    Gene_set                      Term Overlap   P-value  Adjusted P-value  \
1   gs_ind_0     Cancer stem-like cell     1/6  0.088981          0.203113   
21  gs_ind_0  Spermatogonial stem cell     1/6  0.088981          0.203113   

    Odds Ratio  Combined Score  Genes  
1    17.450448       42.218477  ANPEP  
21   17.450448       42.218477   BCL6  
1
   Gene_set                    Term Overlap   P-value  Adjusted P-value  \
0  gs_ind_0         Effector T cell    1/13  0.182865          0.255621   
2  gs_ind_0  Lake et al.Science.Ex1    1/14  0.195465          0.255621   

   Odds Ratio  Combined Score    Genes  
0    7.675392       13.040543    IL2RB  
2    7.106474       11.600406  CCDC88C  
2
   Gene_set                      Term Overlap   P-value  Adjusted P-value  \
3  gs_ind_0                  Monocyte     1/7  0.103024          0.206048   
4  gs_ind_0  Parietal progenitor cell     1/7  0.103024          0.206048   

   Odds Ratio  Combined Score  Genes  
3   14.764993       33.557802   CD52  
4   14.764993       33.557802  ANXA1  
3
   Gene_set                    Term Overlap   P-value  Adjusted P-value  \
6  gs_ind_0  Effector memory T cell     1/7  0.103024          0.226084   
8  gs_ind_0            Naive T cell     1/7  0.103024          0.226084   

   Odds Ratio  Combined Score Genes  
6   14.764993       33.557802  IL7R  
8   14.764993       33.557802  CCR7  
4
   Gene_set      Term Overlap   P-value  Adjusted P-value  Odds Ratio  \
0  gs_ind_0    B cell     1/6  0.088981          0.116851   17.450448   
4  gs_ind_0  Monocyte     1/7  0.103024          0.116851   14.764993   

   Combined Score Genes  
0       42.218477  CD19  
4       33.557802  CD52  
5
   Gene_set                             Term Overlap   P-value  \
2  gs_ind_0                       Macrophage     1/6  0.088981   
3  gs_ind_0  Monocyte derived dendritic cell     1/8  0.116851   

   Adjusted P-value  Odds Ratio  Combined Score  Genes  
2          0.233702   17.450448       42.218477   AIF1  
3          0.233702   12.795659       27.470422  ITGAX  
6
   Gene_set        Term Overlap   P-value  Adjusted P-value  Odds Ratio  \
3  gs_ind_0  Macrophage     1/6  0.088981          0.154536   17.450448   
4  gs_ind_0  Myeloblast     1/6  0.088981          0.154536   17.450448   

   Combined Score  Genes  
3       42.218477   AIF1  
4       42.218477  CSF3R  
7
   Gene_set      Term Overlap   P-value  Adjusted P-value  Odds Ratio  \
0  gs_ind_0    B cell     1/6  0.088981          0.155801   17.450448   
3  gs_ind_0  Monocyte     1/7  0.103024          0.155801   14.764993   

   Combined Score Genes  
0       42.218477  CD19  
3       33.557802  CD52  
8
   Gene_set                    Term Overlap   P-value  Adjusted P-value  \
1  gs_ind_0         Effector T cell    1/13  0.182865          0.365730   
0  gs_ind_0  Circulating fetal cell    1/40  0.463035          0.463035   

   Odds Ratio  Combined Score  Genes  
1    7.675392       13.040543  IL2RB  
0    2.425498        1.867518   ACTB  
9
   Gene_set                    Term Overlap   P-value  Adjusted P-value  \
2  gs_ind_0  Lake et al.Science.In4     1/6  0.088981          0.212585   
1  gs_ind_0             Immune cell     1/7  0.103024          0.212585   

   Odds Ratio  Combined Score    Genes  
2   17.450448       42.218477  ADAMTS2  
1   14.764993       33.557802     CD14  
10
   Gene_set        Term Overlap   P-value  Adjusted P-value  Odds Ratio  \
1  gs_ind_0  Macrophage     1/6  0.088981          0.171706   17.450448   
2  gs_ind_0  Myeloblast     1/6  0.088981          0.171706   17.450448   

   Combined Score  Genes  
1       42.218477   AIF1  
2       42.218477  CSF3R  
# prediction per cluster
pred
{'0': 'CD16+ dendritic cell',
 '1': 'Effector T cell',
 '2': 'CD8+ T cell',
 '3': 'Activated T cell',
 '4': 'B cell',
 '5': 'Circulating fetal cell',
 '6': 'Circulating fetal cell',
 '7': 'B cell',
 '8': 'Circulating fetal cell',
 '9': 'Circulating fetal cell',
 '10': 'Circulating fetal cell'}
prediction = [pred[x] for x in adata.obs['leiden_0.6']]
adata.obs["GS_overlap_pred"] = prediction

sc.pl.umap(adata, color='GS_overlap_pred')

Discuss

As you can see, it agrees to some extent with the predictions from the methods above, but there are clear differences, which do you think looks better?

7 Save data

We can finally save the object for use in future steps.

adata.write_h5ad('data/covid/results/scanpy_covid_qc_dr_int_cl_ct-ctrl13.h5ad')

8 Session info

Click here
sc.logging.print_versions()
-----
anndata     0.10.8
scanpy      1.10.3
-----
CoreFoundation              NA
Foundation                  NA
PIL                         10.4.0
PyObjCTools                 NA
annoy                       NA
anyio                       NA
appnope                     0.1.4
array_api_compat            1.8
arrow                       1.3.0
asciitree                   NA
asttokens                   NA
attr                        24.2.0
attrs                       24.2.0
babel                       2.14.0
brotli                      1.1.0
celltypist                  1.6.3
certifi                     2024.08.30
cffi                        1.17.1
charset_normalizer          3.3.2
cloudpickle                 3.0.0
colorama                    0.4.6
comm                        0.2.2
cycler                      0.12.1
cython_runtime              NA
cytoolz                     0.12.3
dask                        2024.9.0
dateutil                    2.9.0
debugpy                     1.8.5
decorator                   5.1.1
defusedxml                  0.7.1
exceptiongroup              1.2.2
executing                   2.1.0
fastjsonschema              NA
fbpca                       NA
fqdn                        NA
google                      NA
gseapy                      1.1.3
h5py                        3.11.0
idna                        3.10
igraph                      0.11.6
intervaltree                NA
ipykernel                   6.29.5
isoduration                 NA
jedi                        0.19.1
jinja2                      3.1.4
joblib                      1.4.2
json5                       0.9.25
jsonpointer                 3.0.0
jsonschema                  4.23.0
jsonschema_specifications   NA
jupyter_events              0.10.0
jupyter_server              2.14.2
jupyterlab_server           2.27.3
kiwisolver                  1.4.7
legacy_api_wrap             NA
leidenalg                   0.10.2
llvmlite                    0.43.0
markupsafe                  2.1.5
matplotlib                  3.9.2
matplotlib_inline           0.1.7
mpl_toolkits                NA
msgpack                     1.1.0
natsort                     8.4.0
nbformat                    5.10.4
numba                       0.60.0
numcodecs                   0.13.0
numpy                       1.26.4
objc                        10.3.1
overrides                   NA
packaging                   24.1
pandas                      1.5.3
parso                       0.8.4
patsy                       0.5.6
pickleshare                 0.7.5
platformdirs                4.3.6
prometheus_client           NA
prompt_toolkit              3.0.47
psutil                      6.0.0
pure_eval                   0.2.3
pycparser                   2.22
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.9.5
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.18.0
pynndescent                 0.5.13
pyparsing                   3.1.4
pythonjsonlogger            NA
pytz                        2024.2
referencing                 NA
requests                    2.32.3
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
rpds                        NA
scanorama                   1.7.4
scipy                       1.14.1
send2trash                  NA
session_info                1.0.0
six                         1.16.0
sklearn                     1.5.2
sniffio                     1.3.1
socks                       1.7.1
sortedcontainers            2.4.0
sparse                      0.15.4
sphinxcontrib               NA
stack_data                  0.6.2
statsmodels                 0.14.3
texttable                   1.7.0
threadpoolctl               3.5.0
tlz                         0.12.3
toolz                       0.12.1
torch                       2.4.0.post101
torchgen                    NA
tornado                     6.4.1
tqdm                        4.66.5
traitlets                   5.14.3
typing_extensions           NA
umap                        0.5.6
uri_template                NA
urllib3                     2.2.3
wcwidth                     0.2.13
webcolors                   24.8.0
websocket                   1.8.0
yaml                        6.0.2
zarr                        2.18.3
zipp                        NA
zmq                         26.2.0
zoneinfo                    NA
zstandard                   0.23.0
-----
IPython             8.27.0
jupyter_client      8.6.3
jupyter_core        5.7.2
jupyterlab          4.2.5
notebook            7.2.2
-----
Python 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:17:34) [Clang 14.0.6 ]
macOS-14.7-x86_64-i386-64bit
-----
Session information updated at 2024-10-22 09:56