graph2class module
- graph2class.calc_bc(G, return_dict)
Parallel subprocess function to calculate the betweenness centrality.
- Parameters:
G ([networkx graph]) – graph
- Returns:
betweenness centrality dictionary from multiple processes
- Return type:
[dictionary]
- graph2class.calc_graph_features(G)
Calculates several graph features. If the graph is not connected, the largest connected subgraph is used. Uses multiprocessing for parallelism.
- Parameters:
G ([networkx graph]) – graph
- Returns:
features dictionary
- Return type:
[dictionary]
- graph2class.calc_shortest_pthlen(G, return_dict)
Parallel subprocess function to calculate the average shortest path length.
- Parameters:
G ([networkx graph]) – graph
- Returns:
average shortest path length dictionary from multiple processes
- Return type:
[dictionary]
- graph2class.calc_similarity_score(G1_dict, G2_dict, feature_list)
Calculates the similarity score of two graphs.
- Parameters:
G1_dict ([dict] or [pandas dataframe]) – graph 1 features dictionary or dataframe. Values must be accessible by key.
G2_dict ([dict] or [pandas dataframe]) – graph 2 features dictionary or dataframe. Values must be accessible by key.
feature_list ([list]) – list of graph features to compare. Each entry must be a key in the graph features dictionary (above).
- Returns:
similarity score between 0 and 1, where 1 means the two graphs are identical.
- Return type:
[float]
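The entry above does not say how per-feature values are combined into one score. The following is a minimal sketch, assuming the score is the plain mean of a per-feature similarity over `feature_list`. The helper `feature_similarity`, the function name `similarity_score_sketch`, and the feature names `density` and `transitivity` are hypothetical and are not taken from the module.

```python
from typing import Mapping, Sequence


def feature_similarity(x1: float, x2: float) -> float:
    # Assumed per-feature measure: 1 minus the distance relative to the larger value.
    largest = max(abs(x1), abs(x2))
    return 1.0 if largest == 0 else 1.0 - abs(x1 - x2) / largest


def similarity_score_sketch(
    g1: Mapping[str, float],
    g2: Mapping[str, float],
    feature_list: Sequence[str],
) -> float:
    """Average the per-feature similarities (assumed aggregation, not confirmed)."""
    if not feature_list:
        raise ValueError("feature_list must not be empty")
    scores = [feature_similarity(g1[key], g2[key]) for key in feature_list]
    return sum(scores) / len(scores)


# Hypothetical feature dictionaries; the keys are illustrative only.
g1 = {"density": 0.2, "transitivity": 0.5}
g2 = {"density": 0.4, "transitivity": 0.5}
print(similarity_score_sketch(g1, g2, ["density", "transitivity"]))
```

Because the sketch only indexes by key, it would accept either a dictionary or a dataframe row, which matches the parameter description above.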
- graph2class.classify_graphs(class_file_list, sample_file_list, feature_list)
Computes the similarity score between each sample graph and each class graph.
- Parameters:
class_file_list ([list]) – list of control/reference graph files (classes)
sample_file_list ([list]) – list of non-control/non-reference graph files (samples)
feature_list ([list]) – list of features to use for the similarity score. Each entry must be a valid key in the graph features dictionary/dataframe (above).
- Returns:
each column is a class and each row holds the similarity scores of one sample graph
- Return type:
[pandas dataframe]
- graph2class.process_graphs(graph_fnames)
Takes a list of graph files, calculates their features, and returns them as a dataframe.
- Parameters:
graph_fnames ([list]) – list of graph filenames to process
- Returns:
dataframe containing graph features for each graph in filename list
- Return type:
[pandas dataframe]
- graph2class.process_similarity_df(class_similarity_df)
Generates y_true and y_pred based on the similarity score dataframe.
y_true is a list where each index is a class and each value is the class value. E.g., class 1 is y_true[1] = 1, class 2 is y_true[2] = 2, etc.
y_pred is a list where each index is a sample and each value is the maximum similarity score for that sample.
Note: This assumes the correct classification is along the diagonal of the similarity matrix/dataframe.
- Parameters:
class_similarity_df ([pandas dataframe]) – each column is a class graph and each row is a sample graph. A_ij is the similarity score between graphs i and j. The exception is one column ‘name’ which contains the names of the sampled graphs for each row.
- Returns:
y_true, y_pred
- Return type:
([tuple of lists])
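The diagonal assumption described above can be illustrated with a small sketch. It uses a plain list of row dictionaries as a stand-in for the dataframe, and it reads "maximum similarity score" as the position of the highest-scoring class so that y_pred can be compared with y_true; both choices are assumptions. The function name `process_similarity_rows` and the column names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple, Union

Row = Dict[str, Union[str, float]]


def process_similarity_rows(
    rows: Sequence[Row], class_columns: Sequence[str]
) -> Tuple[List[int], List[int]]:
    """Build y_true and y_pred from sample rows (a stand-in for the dataframe)."""
    # Diagonal assumption: sample i is expected to belong to class i.
    y_true = list(range(len(rows)))
    y_pred = []
    for row in rows:
        # Ignore the 'name' column and look only at the class score columns.
        scores = [row[col] for col in class_columns]
        # Assumed reading: predict the class whose score is highest.
        y_pred.append(scores.index(max(scores)))
    return y_true, y_pred


# Hypothetical similarity table with two classes and two samples.
rows = [
    {"name": "sample_0", "class_a": 0.9, "class_b": 0.1},
    {"name": "sample_1", "class_a": 0.7, "class_b": 0.3},
]
y_true, y_pred = process_similarity_rows(rows, ["class_a", "class_b"])
print(y_true, y_pred)
```

In this toy table the second sample scores highest against the first class, so it counts as a misclassification under the diagonal assumption.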
- graph2class.similarity_measure(x1, x2)
Calculates the similarity between two feature values: similarity = 1 - the relative distance between the features (x1 and x2).
- Parameters:
x1 ([float]) – feature from graph 1 (must lie between 0 and 1)
x2 ([float]) – feature from graph 2 (must lie between 0 and 1)
- Returns:
relative similarity between the two features
- Return type:
[float]
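The entry above does not define "relative distance". One possible reading, shown here as a sketch only, divides the absolute difference by the larger of the two values; the module may normalise differently. The function name `similarity_measure_sketch` is hypothetical.

```python
def similarity_measure_sketch(x1: float, x2: float) -> float:
    """Return 1 minus the relative distance between two features in [0, 1].

    Assumption: relative distance = |x1 - x2| / max(|x1|, |x2|).
    """
    largest = max(abs(x1), abs(x2))
    if largest == 0:
        # Both values are zero, so treat them as identical.
        return 1.0
    return 1.0 - abs(x1 - x2) / largest


print(similarity_measure_sketch(0.2, 0.4))
print(similarity_measure_sketch(0.5, 0.5))
```

Under this reading, equal values give 1 and a zero paired with any non-zero value gives 0, so the result stays between 0 and 1 for non-negative inputs.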