GRN
- class regdiffusion.GRN(adj_matrix: ndarray, gene_names: ndarray, tf_names: ndarray | None = None, top_gene_percentile: int | None = None)
An object to save and analyze a gene regulatory network.
A GRN object holds the adjacency matrix between transcription factors (TFs) and target genes. In many cases, when TFs are not specified, the adjacency matrix is square. We expect the adjacency matrix to hold predicted weights or probabilities (floats) for the edges.
To create a GRN object, you need at least two things: the adjacency matrix and the corresponding gene names. You can further specify the TF names if your adjacency matrix is not a square matrix.
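As a minimal sketch of the shape relationship described above, the hypothetical helper below (not part of regdiffusion) checks that the matrix matches the supplied names. It assumes rows index regulators (the TFs, or all genes when tf_names is omitted) and columns index target genes; the library's actual orientation may differ.

```python
import numpy as np


def check_grn_inputs(adj_matrix, gene_names, tf_names=None):
    """Hypothetical helper, not part of regdiffusion.

    Assumes rows index regulators and columns index target genes. Without
    tf_names every gene may act as a regulator, so the matrix must be square.
    """
    adj_matrix = np.asarray(adj_matrix, dtype=float)
    if adj_matrix.ndim != 2:
        raise ValueError("adj_matrix must be 2D")
    regulators = gene_names if tf_names is None else tf_names
    expected = (len(regulators), len(gene_names))
    if adj_matrix.shape != expected:
        raise ValueError(f"expected shape {expected}, got {adj_matrix.shape}")
    return adj_matrix


genes = np.array(["Sox2", "Pou5f1", "Nanog"])
check_grn_inputs(np.zeros((3, 3)), genes)                                # square, no TFs
check_grn_inputs(np.zeros((1, 3)), genes, tf_names=np.array(["Sox2"]))  # 1 TF x 3 targets
```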
You can save a GRN object to the HDF5 format using the .to_hdf5 method in the GRN class. You can load a saved GRN object using the read_hdf5 function in this package.
If your adjacency matrix is very large and space is a concern, consider providing a value for top_gene_percentile. This value is used to calculate a cutoff point for the values in the adjacency matrix. Every value whose absolute value is below this cutoff point is set to zero, so the data can later be saved as a sparse matrix to reduce the space requirement.
The GRN object comes with many useful methods to analyze and visualize the network. The top-level interfaces include .extract_local_neighborhood and .visualize_local_neighborhood.
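The cutoff step can be sketched as follows. This is an illustration, not the library's code: it assumes top_gene_percentile is passed directly to np.percentile over the absolute values, so a value of 75 keeps roughly the top 25% of entries by magnitude.

```python
import numpy as np


def sparsify_by_percentile(adj: np.ndarray, percentile: float) -> np.ndarray:
    """Zero out entries whose absolute value falls below a percentile cutoff.

    Assumption: `percentile` is used directly as the np.percentile argument
    on |adj|; regdiffusion may interpret top_gene_percentile differently.
    """
    abs_adj = np.abs(adj)
    cutoff = np.percentile(abs_adj, percentile)
    out = adj.copy()  # leave the caller's matrix untouched
    out[abs_adj < cutoff] = 0.0
    return out


adj = np.arange(-8, 8, dtype=float).reshape(4, 4)
sparse_adj = sparsify_by_percentile(adj, 75)
print(np.count_nonzero(sparse_adj), "of", adj.size, "entries kept")
```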
- Parameters:
adj_matrix (np.ndarray) – A 2D adjacency matrix to save.
gene_names (np.ndarray) – A 1D numpy array with all the target gene names.
tf_names (np.ndarray, optional) – A 1D numpy array with all the TF gene names.
top_gene_percentile (int, optional) – If set, a cutoff is derived from this percentile of the absolute values in the adjacency matrix. Only entries at or above the cutoff are kept; all other entries are set to zero.
- extract_local_neighborhood(genes: str | List[str], k: int = 20, hops: str = '2.5') → DataFrame
Generate a pandas dataframe for the 2.5 or 1.5 hop local neighborhood around the selected gene(s). The “2.5 hop local neighborhood” includes all the nodes and edges reachable by a 2-hop search from the selected genes, plus the edges connecting all the 2-hop nodes. The “1.5 hop local neighborhood” is defined the same way with a 1-hop search, so it is smaller.
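The definition above can be sketched on a plain edge list. This is not regdiffusion's implementation; it assumes edges are treated as undirected during the search, that each node follows only its top-k edges by absolute weight, and that the extra ".5" adds every edge between two nodes of the outermost layer.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (source, target, weight)


def local_neighborhood(edges: List[Edge], seeds: List[str],
                       hops: str = "2.5", k: int = 20) -> List[Edge]:
    """Sketch of a 1.5/2.5-hop neighborhood (assumptions in the lead-in)."""
    # Rank each node's incident edges by |weight|; k=-1 keeps all of them.
    incident: Dict[str, List[Edge]] = defaultdict(list)
    for edge in edges:
        incident[edge[0]].append(edge)
        incident[edge[1]].append(edge)
    top: Dict[str, List[Edge]] = {}
    for node, node_edges in incident.items():
        ranked = sorted(node_edges, key=lambda e: abs(e[2]), reverse=True)
        top[node] = ranked if k == -1 else ranked[:k]

    depth = int(float(hops))  # "2.5" -> 2 full hops, "1.5" -> 1
    visited: Set[str] = set(seeds)
    frontier: Set[str] = set(seeds)
    kept: Set[Edge] = set()
    for _ in range(depth):
        reached: Set[str] = set()
        for node in frontier:
            for edge in top.get(node, []):
                kept.add(edge)
                other = edge[1] if edge[0] == node else edge[0]
                if other not in visited:
                    reached.add(other)
        visited |= reached
        frontier = reached

    # The ".5": all edges joining two outermost-layer nodes (not top-k filtered).
    for edge in edges:
        if edge[0] in frontier and edge[1] in frontier:
            kept.add(edge)
    return sorted(kept, key=lambda e: abs(e[2]), reverse=True)


edges = [("A", "B", 0.9), ("B", "C", 0.8), ("C", "D", 0.7),
         ("B", "E", 0.5), ("C", "E", 0.4), ("D", "F", 0.3)]
print(local_neighborhood(edges, ["A"], hops="2.5"))
```

In this toy graph, C and E are the 2-hop nodes, so the C-E edge is added by the ".5" step, while C-D is left out because D is three hops from A.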
- Parameters:
genes (str, List(str)) – A single gene or a list of genes to inspect.
k (int) – Top-k edges to inspect on each node. If k=-1, export all edges.
hops (str) – Number of hops to explore: either a "2.5" or a "1.5" hop traversal around the selected genes. Default is "2.5".
- extract_node_neighbors(gene: str, k: int = 20) → DataFrame
Generate a pandas dataframe for the top direct neighbors of the selected gene. The dataframe is sorted by the absolute weight of the edges.
The dataframe will have 3 columns: source, target, weight.
- Parameters:
gene (str) – The gene to inspect.
k (int) – Top-k edges to inspect on each node. If k=-1, export all edges.
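A minimal sketch of "top direct neighbors sorted by absolute weight", assuming a square matrix where adj[i, j] is the weight of the edge from gene i to gene j, and counting both outgoing and incoming edges as neighbors. It returns (source, target, weight) tuples instead of a DataFrame to stay dependency-light.

```python
import numpy as np


def node_neighbors(adj, gene_names, gene, k=20):
    """Sketch: top-k direct neighbors of `gene` in both directions, by |weight|.

    Assumes a square matrix where adj[i, j] is the weight of edge i -> j.
    """
    names = list(gene_names)
    i = names.index(gene)
    out_edges = [(gene, names[j], float(adj[i, j]))
                 for j in range(len(names)) if j != i and adj[i, j] != 0]
    in_edges = [(names[j], gene, float(adj[j, i]))
                for j in range(len(names)) if j != i and adj[j, i] != 0]
    ranked = sorted(out_edges + in_edges, key=lambda e: abs(e[2]), reverse=True)
    return ranked if k == -1 else ranked[:k]


adj = np.array([[0.0, 0.5, -0.9],
                [0.2, 0.0, 0.0],
                [0.0, 0.1, 0.0]])
print(node_neighbors(adj, ["A", "B", "C"], "A", k=2))
# [('A', 'C', -0.9), ('A', 'B', 0.5)]
```

The tuples can be turned into the documented three-column layout with pd.DataFrame(rows, columns=["source", "target", "weight"]).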
- extract_node_neighbors_as_indices(gene: str, k: int = 20) → Dict
Generate a dictionary for the top direct neighbors of the selected gene. It is slightly faster than the dataframe version.
- Parameters:
gene (str) – The gene to inspect.
k (int) – Top-k edges to inspect on each node. If k=-1, export all edges.
- extract_node_sources(gene: str, k: int = 20) → DataFrame
Generate a pandas dataframe for the top direct edges pointing to the selected gene.
The dataframe will have 3 columns: source, target, weight.
- Parameters:
gene (str) – The gene to inspect.
k (int) – Top-k edges to inspect on each node. If k=-1, export all edges.
- extract_node_sources_as_indices(gene: str, k: int = 20) → Dict
Generate a dictionary for the top direct edges pointing to the selected gene. It is slightly faster than the dataframe version.
- Parameters:
gene (str) – The gene to inspect.
k (int) – Top-k edges to inspect on each node. If k=-1, export all edges.
- extract_node_targets(gene: str, k: int = 20) → DataFrame
Generate a pandas dataframe for the top direct edges pointing from the selected gene.
The dataframe will have 3 columns: source, target, weight.
- Parameters:
gene (str) – The gene to inspect.
k (int) – Top-k edges to inspect on each node. If k=-1, export all edges.
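Sources and targets differ only in which axis of the matrix is read. The sketch below assumes adj[i, j] is the weight of the edge from gene i to gene j, so a gene's sources live in its column and its targets in its row; the function names are illustrative and not the library's API.

```python
import numpy as np


def _ranked_nonzero(weights, k):
    # Indices sorted by decreasing |weight|, dropping zero-weight entries.
    order = [int(i) for i in np.argsort(-np.abs(weights), kind="stable")
             if weights[i] != 0]
    return order if k == -1 else order[:k]


def node_sources(adj, gene_names, gene, k=20):
    """Sketch: top-k edges pointing to `gene` (its column)."""
    names = list(gene_names)
    j = names.index(gene)
    return [(names[i], gene, float(adj[i, j])) for i in _ranked_nonzero(adj[:, j], k)]


def node_targets(adj, gene_names, gene, k=20):
    """Sketch: top-k edges pointing from `gene` (its row)."""
    names = list(gene_names)
    i = names.index(gene)
    return [(gene, names[j], float(adj[i, j])) for j in _ranked_nonzero(adj[i, :], k)]


adj = np.array([[0.0, 0.5, -0.9],
                [0.2, 0.0, 0.0],
                [0.0, 0.1, 0.0]])
names = ["A", "B", "C"]
print(node_sources(adj, names, "B"))
print(node_targets(adj, names, "A"))
```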
- extract_node_targets_as_indices(gene: str, k: int = 20) → Dict
Generate a dictionary for the top direct edges pointing from the selected gene. It is slightly faster than the dataframe version.
- Parameters:
gene (str) – The gene to inspect.
k (int) – Top-k edges to inspect on each node. If k=-1, export all edges.
- get_edgelist(k: int = 20, workers: int = 2) → DataFrame
Generate a dataframe holding the edge list.
The dataframe will have 3 columns: source, target, weight.
- Parameters:
k (int) – Top-k edges to inspect on each node. If k=-1, export all.
workers (int) – Number of concurrent workers. Default is 2.
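One way to picture k and workers together is the sketch below. It is an assumption, not the library's implementation: k is applied per source row (each node's outgoing edges), and workers sizes a thread pool that processes rows concurrently. Threads were chosen only for simplicity; the real method may parallelize differently.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def edgelist(adj, gene_names, k=20, workers=2):
    """Sketch: top-k outgoing edges per row, rows processed by a thread pool.

    Assumes a square matrix where adj[i, j] is the weight of edge i -> j.
    """
    names = list(gene_names)

    def row_edges(i):
        w = adj[i]
        order = [j for j in np.argsort(-np.abs(w), kind="stable") if w[j] != 0]
        if k != -1:
            order = order[:k]
        return [(names[i], names[j], float(w[j])) for j in order]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves row order, so the output is grouped by source gene.
        per_row = list(pool.map(row_edges, range(adj.shape[0])))
    return [edge for row in per_row for edge in row]


adj = np.array([[0.0, 0.5, -0.9],
                [0.2, 0.0, 0.0],
                [0.0, 0.1, 0.0]])
print(edgelist(adj, ["A", "B", "C"], k=1))
```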
- to_hdf5(file_path: str, as_sparse: bool = False)
Save the GRN to an HDF5 file. You have the option to save it as a sparse matrix, which is preferred when most of the values in the adjacency matrix are zeros.
- Parameters:
file_path (str) – File path to save.
as_sparse (bool) – Whether to save as a sparse matrix. Default is False.
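To see why the sparse option pays off on a mostly-zero matrix, the sketch below compares the in-memory size of a dense array with a coordinate (COO) triplet layout. It does not reproduce the file format that to_hdf5 actually writes; the helper names are hypothetical.

```python
import numpy as np


def to_coo(adj):
    """Hypothetical helper: (row, col, value) triplets for nonzero entries."""
    rows, cols = np.nonzero(adj)
    return rows.astype(np.int32), cols.astype(np.int32), adj[rows, cols]


def from_coo(rows, cols, vals, shape):
    """Rebuild the dense matrix from the triplets."""
    dense = np.zeros(shape, dtype=vals.dtype)
    dense[rows, cols] = vals
    return dense


def coo_nbytes(adj):
    return sum(part.nbytes for part in to_coo(adj))


rng = np.random.default_rng(0)
adj = rng.random((1000, 1000))
adj[adj < 0.99] = 0.0  # keep roughly 1% of the entries
print("dense:", adj.nbytes, "bytes; COO:", coo_nbytes(adj), "bytes")
```

With int32 indices and float64 values, each stored nonzero costs 16 bytes versus 8 bytes per entry in the dense layout, so this layout only saves space when fewer than half of the entries are nonzero.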
- visualize_local_neighborhood(genes: str | List[str], k: int = 20, hops: str = '2.5', edge_widths: List[int] = [2, 1, 0.5], plot_engine: str = 'pyvis', *args, **kwargs)
Generate a vis.js network visualization of the local neighborhood (2.5 hops by default) around the selected gene(s).
- Parameters:
genes (str, List(str)) – A single gene or a list of genes to inspect.
k (int) – Top-k edges to inspect on each node. If k=-1, export all edges.
hops (str) – Number of hops of the neighborhood to explore. Default is "2.5".
edge_widths (List) – The edge widths used for the different edge width levels.
plot_engine (str) – Choose which network plot engine to use. Default is "pyvis".
**kwargs – Keyword arguments to be passed to plot_pyvis.