graph2class module

graph2class.calc_bc(G, return_dict)

Parallel subprocess function to calculate the betweenness centrality.

Parameters:
  • G ([networkx graph]) – graph

  • return_dict ([dict]) – shared dictionary into which this process writes its result

Returns:

betweenness centrality dictionary collected from multiple processes

Return type:

[dictionary]
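A minimal sketch of a worker with this signature. It assumes `return_dict` is either a plain `dict` or a `multiprocessing.Manager().dict()` proxy, and it stores the result under the hypothetical key `"betweenness_centrality"`. It is not the module's actual implementation. Calling the worker directly, without a subprocess, is a convenient way to check its output.

```python
import networkx as nx


def calc_bc(G, return_dict):
    """Worker sketch: store betweenness centrality in a shared dict.

    `return_dict` may be a plain dict or a multiprocessing.Manager().dict()
    proxy; the key name "betweenness_centrality" is hypothetical.
    """
    # nx.betweenness_centrality returns {node: score}, normalized by default
    return_dict["betweenness_centrality"] = nx.betweenness_centrality(G)


# Direct (non-parallel) call, useful for testing the worker in isolation
results = {}
calc_bc(nx.path_graph(4), results)  # path graph 0-1-2-3
# Interior nodes 1 and 2 score 2/3; endpoints 0 and 3 score 0
print(results["betweenness_centrality"])
```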

graph2class.calc_graph_features(G)

Calculates several graph network features. If the graph is not connected, its largest connected component is used. Uses multiprocessing for parallelism.

Parameters:

G ([networkx graph]) – graph

Returns:

features dictionary

Return type:

[dictionary]
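The following sketch shows one way the largest-component step and the multiprocessing fan-out could fit together. The worker names, the feature keys, the use of mean betweenness as a summary, and the extra serial features are all assumptions. It also assumes an undirected graph, because `nx.is_connected` is not implemented for directed graphs.

```python
import multiprocessing as mp

import networkx as nx


def _bc_worker(G, return_dict):
    # Hypothetical summary: mean betweenness centrality over all nodes
    bc = nx.betweenness_centrality(G)
    return_dict["mean_betweenness"] = sum(bc.values()) / len(bc)


def _aspl_worker(G, return_dict):
    # Requires a connected graph, hence the component step below
    return_dict["avg_shortest_path_length"] = nx.average_shortest_path_length(G)


def calc_graph_features(G):
    """Sketch: compute features in parallel on the largest connected component."""
    if not nx.is_connected(G):
        # Keep only the largest connected component, as an independent copy
        largest = max(nx.connected_components(G), key=len)
        G = G.subgraph(largest).copy()

    with mp.Manager() as manager:
        shared = manager.dict()  # proxy dict that the worker processes write into
        procs = [
            mp.Process(target=worker, args=(G, shared))
            for worker in (_bc_worker, _aspl_worker)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        features = dict(shared)  # copy out before the manager shuts down

    # Cheap features computed serially (hypothetical selection)
    features["n_nodes"] = G.number_of_nodes()
    features["density"] = nx.density(G)
    return features
```

Because the workers run in separate processes, they must be top-level functions, and the call should sit under an `if __name__ == "__main__":` guard whenever the spawn or forkserver start method is in use.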

graph2class.calc_shortest_pthlen(G, return_dict)

Parallel subprocess function to calculate the average shortest path length.

Parameters:
  • G ([networkx graph]) – graph

  • return_dict ([dict]) – shared dictionary into which this process writes its result

Returns:

average shortest path length dictionary from multiple processes

Return type:

[dictionary]
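`nx.average_shortest_path_length` raises `nx.NetworkXError` on a disconnected graph, which is presumably why `calc_graph_features` falls back to the largest connected component. The sketch below shows a worker with this signature (the key `"avg_shortest_path_length"` is hypothetical) together with the component extraction.

```python
import networkx as nx


def calc_shortest_pthlen(G, return_dict):
    """Worker sketch: store the average shortest path length in a shared dict.

    The key name "avg_shortest_path_length" is hypothetical.
    """
    return_dict["avg_shortest_path_length"] = nx.average_shortest_path_length(G)


# A graph with two components: path 0-1-2-3 and the single edge 10-11
G = nx.Graph([(0, 1), (1, 2), (2, 3), (10, 11)])
try:
    nx.average_shortest_path_length(G)
except nx.NetworkXError as err:
    print("disconnected graph rejected:", err)

# Restrict to the largest connected component before calling the worker
largest = G.subgraph(max(nx.connected_components(G), key=len)).copy()
results = {}
calc_shortest_pthlen(largest, results)
print(results["avg_shortest_path_length"])
```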

graph2class.calc_similarity_score(G1_dict, G2_dict, feature_list)

Calculates the similarity score of two graphs.

Parameters:
  • G1_dict ([dict] or [Pandas dataframe]) – graph 1 features dictionary or dataframe; values must be accessible by key

  • G2_dict ([dict] or [Pandas dataframe]) – graph 2 features dictionary or dataframe; values must be accessible by key

  • feature_list ([list]) – list of graph features to compare; each must be a key in the graph features dictionaries above

Returns:

similarity score between 0 and 1, where 1 indicates identical graphs.

Return type:

[float]
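A minimal sketch, assuming the score is the mean of the per-feature similarities and that each per-feature similarity is 1 minus a relative distance defined as |x1 - x2| / max(|x1|, |x2|). Both the averaging rule and that distance definition are assumptions, and the helper `_relative_similarity` is hypothetical, standing in for `similarity_measure`.

```python
def _relative_similarity(x1, x2):
    # Hypothetical per-feature similarity: 1 - |x1 - x2| / max(|x1|, |x2|)
    denom = max(abs(x1), abs(x2))
    return 1.0 if denom == 0 else 1.0 - abs(x1 - x2) / denom


def calc_similarity_score(G1_dict, G2_dict, feature_list):
    """Sketch: mean per-feature similarity (the averaging rule is an assumption)."""
    # Only key access is used, so plain dicts and pandas Series rows both work
    scores = [_relative_similarity(G1_dict[f], G2_dict[f]) for f in feature_list]
    return sum(scores) / len(scores)


g1 = {"density": 0.5, "transitivity": 0.2}
g2 = {"density": 0.4, "transitivity": 0.2}
score = calc_similarity_score(g1, g2, ["density", "transitivity"])
print(score)  # density contributes about 0.8, transitivity exactly 1.0
```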

graph2class.classify_graphs(class_file_list, sample_file_list, feature_list)

Computes the similarity score between every sample graph and every class graph, for use in classifying the samples.

Parameters:
  • class_file_list ([list]) – list of control/reference graph files (classes)

  • sample_file_list ([list]) – list of non-control/non-reference graph files (samples)

  • feature_list ([list]) – list of which features to use for similarity score. must be a valid key to the graph features dictionary/dataframe (above)

Returns:

a dataframe in which each column is a class and each row holds the similarity scores of one sample graph

Return type:

[pandas dataframe]
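The layout of the returned dataframe can be sketched as follows. The helper `similarity_table`, its `score_fn` argument, and the use of precomputed feature dictionaries instead of graph files are all hypothetical. The `name` column mirrors the one that `process_similarity_df` expects.

```python
import pandas as pd


def similarity_table(class_feats, sample_feats, score_fn):
    """Hypothetical helper: one row per sample, one column per class.

    class_feats and sample_feats map graph names to feature dicts;
    score_fn(sample_features, class_features) returns a float.
    """
    rows = []
    for sample_name, s in sample_feats.items():
        row = {"name": sample_name}  # sample label column
        for class_name, c in class_feats.items():
            row[class_name] = score_fn(s, c)
        rows.append(row)
    return pd.DataFrame(rows)


classes = {"A": {"density": 0.5}, "B": {"density": 0.1}}
samples = {"a1": {"density": 0.45}, "b1": {"density": 0.1}}


def toy_score(s, c):
    # Toy score for illustration only: 1 - absolute difference in density
    return 1 - abs(s["density"] - c["density"])


df = similarity_table(classes, samples, toy_score)
print(df)
```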

graph2class.process_graphs(graph_fnames)

Takes a list of graph files, calculates their features, and returns them as a dataframe.

Parameters:

graph_fnames ([list]) – list of graph filenames to process

Returns:

dataframe containing graph features for each graph in filename list

Return type:

[pandas dataframe]
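A sketch of the file-to-dataframe loop, assuming each file is a whitespace-delimited edge list that `nx.read_edgelist` can parse. The documented function calculates features for each graph (presumably via `calc_graph_features`); the three inline features and the `name` column here are a hypothetical stand-in that keeps the example self-contained.

```python
import networkx as nx
import pandas as pd


def process_graphs(graph_fnames):
    """Sketch: one row of features per graph file.

    Assumes edge-list files readable by nx.read_edgelist; the features
    computed here are a hypothetical subset.
    """
    rows = []
    for fname in graph_fnames:
        G = nx.read_edgelist(fname)  # node labels are read as strings
        rows.append({
            "name": fname,
            "n_nodes": G.number_of_nodes(),
            "n_edges": G.number_of_edges(),
            "density": nx.density(G),
        })
    return pd.DataFrame(rows)
```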

graph2class.process_similarity_df(class_similarity_df)

Generates y_true and y_pred based on the similarity score dataframe.

y_true is a list in which each index is a class and each value is that class's label, e.g., y_true[1] = 1, y_true[2] = 2, and so on.

y_pred is a list in which each index is a sample and each value is the class with the maximum similarity score for that sample.

Note: This assumes the correct classification is along the diagonal of the similarity matrix/dataframe.

ref: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html#sklearn.metrics.classification_report

Parameters:

class_similarity_df ([pandas dataframe]) – each column is a class graph and each row is a sample graph. A_ij is the similarity score between sample graph i and class graph j. The one exception is the ‘name’ column, which holds the name of the sample graph in each row.

Returns:

y_true, y_pred

Return type:

([tuple of lists])
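A sketch of the diagonal assumption, assuming each prediction is encoded as the column position of the highest score in its row, after the `name` column is excluded. The resulting lists can be passed to `sklearn.metrics.classification_report(y_true, y_pred)`.

```python
import pandas as pd


def process_similarity_df(class_similarity_df):
    """Sketch: derive (y_true, y_pred) assuming sample i belongs to class i."""
    scores = class_similarity_df.drop(columns="name")
    # Diagonal assumption: the correct class for row i is column i
    y_true = list(range(len(scores)))
    # Predicted class = position of the highest-scoring column in each row
    y_pred = [int(i) for i in scores.to_numpy().argmax(axis=1)]
    return y_true, y_pred


df = pd.DataFrame({
    "name": ["s0", "s1", "s2"],
    "c0": [0.9, 0.2, 0.1],
    "c1": [0.3, 0.8, 0.7],
    "c2": [0.1, 0.4, 0.6],
})
y_true, y_pred = process_similarity_df(df)
print(y_true, y_pred)  # s2 scores highest against c1, so it is misclassified
```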

graph2class.similarity_measure(x1, x2)

Calculates the similarity between two feature values: similarity = 1 - the relative distance between the features x1 and x2.

Parameters:
  • x1 ([float]) – feature from graph 1 (must lie in the range [0, 1])

  • x2 ([float]) – feature from graph 2 (must lie in the range [0, 1])

Returns:

relative similarity between the two features

Return type:

[float]
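A minimal sketch, assuming the relative distance is |x1 - x2| / max(|x1|, |x2|). This is a hypothetical definition, not the module's documented formula. With inputs in [0, 1] the distance is at most 1, so the result also stays in [0, 1]. The zero-denominator case is treated as identical features.

```python
def similarity_measure(x1, x2):
    """Sketch: 1 minus a hypothetical relative distance between two features.

    Relative distance is assumed to be |x1 - x2| / max(|x1|, |x2|).
    """
    denom = max(abs(x1), abs(x2))
    if denom == 0:
        return 1.0  # both features are zero, so treat them as identical
    return 1.0 - abs(x1 - x2) / denom


print(similarity_measure(0.5, 0.5))   # 1.0 (identical features)
print(similarity_measure(0.5, 0.25))  # 0.5 (one is half the other)
print(similarity_measure(0.0, 0.8))   # 0.0 (maximally different)
```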