pdb2graph module

pdb2graph.PDB_df_to_G(PDB_df, d_cut=8.0)

Converts a dataframe containing alpha carbon / atom coordinates (in Angstroms) into a graph, G(V,E).

Each vertex in V is an alpha carbon / atom. Two alpha carbons / atoms are connected by an edge in E if the distance between them (in Angstroms) is less than the cutoff d_cut.

Parameters:
  • PDB_df ([Pandas dataframe object]) – a dataframe containing alpha carbon / atom coordinate columns labeled: ‘x’, ‘y’, and ‘z’

  • d_cut ([float]) – Distance threshold (in Angstroms) below which two alpha carbons / atoms are connected by an edge. Defaults to 8.0.

Returns:

protein structure network graph, G(V,E)

Return type:

G ([NetworkX graph object])
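The cutoff rule can be illustrated without third-party dependencies. The sketch below is not the module's code: `contact_edges` is a hypothetical helper that assumes the coordinates arrive as (x, y, z) tuples (for example from `PDB_df[["x", "y", "z"]].itertuples(index=False)`) and returns the edge set E as pairs of row indices. It needs Python 3.8+ for `math.dist`.

```python
import itertools
import math
from typing import Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def contact_edges(coords: Sequence[Coord], d_cut: float = 8.0) -> Set[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose Euclidean distance is below d_cut.

    Hypothetical helper illustrating the distance-cutoff rule; it is not
    part of pdb2graph.
    """
    edges = set()
    # combinations() over enumerate() visits every unordered pair once, with i < j.
    for (i, a), (j, b) in itertools.combinations(enumerate(coords), 2):
        # Strict inequality, matching "less than a cutoff" in the description.
        if math.dist(a, b) < d_cut:
            edges.add((i, j))
    return edges


coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(sorted(contact_edges(coords)))  # [(0, 1), (0, 2), (1, 2)]
```

Feeding these pairs to `networkx.Graph.add_edges_from`, after `add_nodes_from(range(len(coords)))` so that isolated atoms keep a vertex, would give a graph of the kind described above. The pairwise loop is O(n²), which is acceptable for alpha carbons; for all-atom inputs a spatial index such as a k-d tree scales better.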

pdb2graph.PDB_to_df(pdb_code, fname, pdbx, offset, CA_only=1)

Loads a PDB (or PDBx) file and stores the atom coordinates, residue names, and residue numbers in a dataframe.

Note: if the PDB file has more than one model, the first model is chosen.

Parameters:
  • pdb_code ([str]) – PDB ID / label for the protein of interest

  • fname ([str]) – filename for the protein of interest. Can be PDB or PDBx format

  • pdbx ([int]) – Set to 1 if the file uses the newer PDBx format.

  • offset ([int]) – Residue index offset, in case the first residue ID in the PDB file is not the first physical residue (e.g. the PDB file starts at the 5th residue).

  • CA_only ([int]) – Set to 1 (default) to use only alpha carbons; otherwise all atoms are used.

Returns:

dataframe containing each atom’s x, y, z coordinates and serial number

Return type:

[Pandas dataframe object]
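For readers unfamiliar with the PDB layout, the following is a minimal standard-library sketch of reading ATOM records from the first model only; `PDB_to_df` is not necessarily implemented this way. `parse_atom_records` is a hypothetical name. It relies on the fixed column positions of the PDB ATOM record, ignores HETATM records and alternate locations, applies no residue offset, and does not handle the PDBx/mmCIF format.

```python
from typing import Dict, List


def parse_atom_records(pdb_text: str, ca_only: bool = True) -> List[Dict[str, object]]:
    """Hypothetical parser for ATOM records of the first model in a PDB file."""
    rows = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record == "ENDMDL":
            # Stop at the end of the first model, mirroring the note above.
            break
        if record != "ATOM":
            continue  # skip HEADER, MODEL, HETATM, TER, ...
        atom_name = line[12:16].strip()  # columns 13-16
        if ca_only and atom_name != "CA":
            continue
        rows.append({
            "serial": int(line[6:11]),        # columns 7-11
            "atom_name": atom_name,
            "res_name": line[17:20].strip(),  # columns 18-20
            "res_num": int(line[22:26]),      # columns 23-26
            "x": float(line[30:38]),          # columns 31-38
            "y": float(line[38:46]),          # columns 39-46
            "z": float(line[46:54]),          # columns 47-54
        })
    return rows
```

The returned list of dicts can be passed to `pandas.DataFrame` to obtain a table with `x`, `y`, and `z` columns like the one `PDB_df_to_G` expects.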

pdb2graph.get_hydrophobicity(name, warn=False)

Gets the hydrophobicity of an amino acid from its name.

ref: https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/midas/hydrophob.html

Note: returns NaN if the input name is invalid.

Parameters:
  • name ([str]) – Amino acid name

  • warn ([bool]) – if True, print a warning when hydrophobicity can’t be determined

Returns:

hydrophobicity

Return type:

[float]
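A lookup of this kind can be sketched as a dictionary with a NaN fallback. This is an assumption-laden illustration, not the module's table: it uses the Kyte-Doolittle scale (one of the scales on the referenced Chimera page), three-letter residue names, and case-insensitive matching, none of which the documentation above confirms. `hydrophobicity_lookup` and `KYTE_DOOLITTLE` are hypothetical names.

```python
import math

# Kyte-Doolittle hydropathy values keyed by three-letter residue name.
# Assumption: pdb2graph may use a different scale or naming convention.
KYTE_DOOLITTLE = {
    "ILE": 4.5, "VAL": 4.2, "LEU": 3.8, "PHE": 2.8, "CYS": 2.5,
    "MET": 1.9, "ALA": 1.8, "GLY": -0.4, "THR": -0.7, "SER": -0.8,
    "TRP": -0.9, "TYR": -1.3, "PRO": -1.6, "HIS": -3.2, "GLU": -3.5,
    "GLN": -3.5, "ASP": -3.5, "ASN": -3.5, "LYS": -3.9, "ARG": -4.5,
}


def hydrophobicity_lookup(name: str, warn: bool = False) -> float:
    """Hypothetical lookup returning NaN for unknown residue names."""
    value = KYTE_DOOLITTLE.get(name.strip().upper(), math.nan)
    if warn and math.isnan(value):
        print(f"warning: no hydrophobicity value for {name!r}")
    return value


print(hydrophobicity_lookup("ILE"))  # 4.5
```

Returning NaN rather than raising keeps per-residue loops running; callers can filter unknown residues afterwards with `math.isnan`.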

pdb2graph.main(args)

Takes a .pdb(x) file, converts it into a graph, and saves the atom coordinates as a .csv file and the graph as a .gexf file.

Parameters:

args ([argument parser object]) –

  • args.pdb_code: PDB id / protein name

  • args.fname: PDB/PDBx filename

  • args.d_cut: alpha carbon / atom pairwise contact distance cutoff (in Angstroms)

  • args.o: PDB residue index offset integer. Default is 0.

  • args.pdbx: set to 1 to use the PDBx file parser

  • args.CA_only: set to 1 to use only alpha carbons (0 for all atoms)
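The `args` object can be produced with `argparse`. Only the attribute names below come from the list above; the flag spellings, help strings, and the defaults for `d_cut`, `pdbx`, and `CA_only` are assumptions (the `d_cut` and `CA_only` defaults are borrowed from `PDB_df_to_G` and `PDB_to_df`). `build_parser` is a hypothetical name, not the script's actual command-line interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser whose dest names match the documented args.* fields."""
    parser = argparse.ArgumentParser(
        description="Convert a PDB/PDBx file into a contact graph.")
    parser.add_argument("--pdb_code", required=True, help="PDB id / protein name")
    parser.add_argument("--fname", required=True, help="PDB/PDBx filename")
    # Assumed default, taken from PDB_df_to_G's d_cut.
    parser.add_argument("--d_cut", type=float, default=8.0,
                        help="contact distance cutoff in Angstroms")
    # Default 0 is documented above; "-o" stores into args.o.
    parser.add_argument("-o", type=int, default=0, help="PDB residue index offset")
    # Assumed defaults: plain PDB parser, alpha carbons only.
    parser.add_argument("--pdbx", type=int, default=0, help="1 to use the PDBx parser")
    parser.add_argument("--CA_only", type=int, default=1,
                        help="1 for alpha carbons only, 0 for all atoms")
    return parser


args = build_parser().parse_args(["--pdb_code", "1abc", "--fname", "1abc.pdb"])
print(args.d_cut, args.o, args.pdbx, args.CA_only)  # 8.0 0 0 1
```

Since `main` is documented as reading attributes from `args`, a hand-built `argparse.Namespace` with the same attribute names should presumably work as well.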

pdb2graph.plot_FA_and_CA_coordinates(FA_xyz, CA_xyz, figsize=5)

Creates a 3D scatter plot of the alpha carbon (CA) and all-atom (FA) coordinates.

Parameters:
  • FA_xyz ([numpy array]) – all-atom coordinates, one atom per row (e.g. A[0] = [x0, y0, z0])

  • CA_xyz ([numpy array]) – alpha carbon coordinates, one atom per row (e.g. A[0] = [x0, y0, z0])

  • figsize (int, optional) – size of figure (figsize x figsize). Defaults to 5.

Returns:

3D scatter plot figure

Return type:

[matplotlib figure object]

pdb2graph.plot_coordinates(xyz_data, figsize=5)

Creates a 3D scatter plot of the xyz data.

Parameters:
  • xyz_data ([numpy array]) – coordinates, one point per row (e.g. A[0] = [x0, y0, z0])

  • figsize (int, optional) – size of figure (figsize x figsize). Defaults to 5.

Returns:

3D scatter plot figure

Return type:

[matplotlib figure object]

pdb2graph.save_data(df, G, df_name, G_name)

Convenience function that saves a dataframe as a .csv file and a graph as a .gexf file.

Parameters:
  • df ([Pandas dataframe object]) – dataframe to save

  • G ([NetworkX graph object]) – graph to save

  • df_name ([str]) – output filename for dataframe .csv

  • G_name ([str]) – output filename for graph .gexf

pdb2graph.save_data_at_this_folder(data_path, df, G, df_name, G_name)

Convenience function that saves a dataframe as a .csv file and a graph as a .gexf file in the directory data_path.

Parameters:
  • data_path ([str] or [Path]) – output directory path

  • df ([Pandas dataframe object]) – dataframe to save

  • G ([NetworkX graph object]) – graph to save

  • df_name ([str]) – output filename for dataframe .csv

  • G_name ([str]) – output filename for graph .gexf