pdb2graph module

pdb2graph.PDB_df_to_G(PDB_df, d_cut=8.0)

Converts a dataframe containing alpha carbon / atom coordinates (in Angstroms) into a graph, G(V,E).

Each vertex, V, is an alpha carbon / atom. Two alpha carbons / atoms with a distance (in Angstroms) less than a cutoff, d_cut, are connected by an edge, E.

Parameters:

PDB_df ([Pandas dataframe object]) – a dataframe containing alpha carbon / atom coordinate columns labeled: ‘x’, ‘y’, and ‘z’
d_cut ([float]) – Distance threshold (in Angstroms) below which two alpha carbons / atoms are connected by an edge. Defaults to 8.0.

Returns:

protein structure network graph, G(V,E)

Return type:

G ([NetworkX graph object])
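
The distance rule described above can be written out directly. The following is a minimal sketch using only the Python standard library, not the module's own implementation: `contact_edges` is a hypothetical helper, plain tuples stand in for the dataframe's ‘x’, ‘y’, and ‘z’ columns, and the coordinates are made-up values chosen so the result is easy to check by hand.

```python
from itertools import combinations
from math import dist


def contact_edges(coords, d_cut=8.0):
    """Return index pairs (i, j), i < j, whose distance is below d_cut.

    Hypothetical helper for illustration; not part of pdb2graph.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if dist(a, b) < d_cut
    ]


# Three illustrative points on the x axis (in Angstroms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.0, 0.0, 0.0)]

# Pairs 0-1 (3.8 A) and 1-2 (6.2 A) fall under the 8.0 A cutoff;
# pair 0-2 (10.0 A) does not.
edges = contact_edges(coords)
```

Each index plays the role of a vertex and each returned pair the role of an edge. Whether the module treats a distance exactly equal to d_cut as a contact is not stated; the sketch follows the "less than" wording.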

pdb2graph.PDB_to_df(pdb_code, fname, pdbx, offset, CA_only=1)

Loads a PDB (or PDBx) file and stores the atom coordinates, residue names, and residue numbers in a dataframe.

Note: if the PDB file has more than one model, the first model is chosen.

Parameters:

pdb_code ([str]) – PDB ID / label for the protein of interest
fname ([str]) – filename for the protein of interest. Can be PDB or PDBx format
pdbx ([int]) – Set to 1 to use the newer PDBx file format.
offset ([int]) – index offset in case the first residue ID in the PDB file is not the first physical residue (e.g. the PDB file starts at the 5th residue).
CA_only ([int]) – Set to 1 (default) to use only alpha carbons; otherwise all atoms are used.

Returns:

dataframe containing every atom’s x, y, z coordinates and serial number

Return type:

[Pandas dataframe object]
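
To make the kind of data involved concrete, here is a minimal standard-library sketch of reading fixed-width ATOM records from the legacy PDB format. It is not the module's parser: `format_atom` and `parse_atoms` are hypothetical helpers, the sample coordinates are made up, and adding `offset` to the residue number is only an assumption about how the offset is applied.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a fixed-width PDB ATOM record (columns 1-54 only)."""
    return (f"{'ATOM':<6}{serial:>5} {name:<4} {res_name:>3} {chain}"
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(lines, offset=0, ca_only=True):
    """Extract atom rows from ATOM records (hypothetical helper)."""
    rows = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()        # columns 13-16
        if ca_only and atom_name != "CA":
            continue
        rows.append({
            "serial": int(line[6:11]),         # columns 7-11
            "atom": atom_name,
            "res_name": line[17:20].strip(),   # columns 18-20
            # Assumption: the offset is added to the residue number.
            "res_num": int(line[22:26]) + offset,  # columns 23-26
            "x": float(line[30:38]),           # columns 31-38
            "y": float(line[38:46]),           # columns 39-46
            "z": float(line[46:54]),           # columns 47-54
        })
    return rows


# Made-up records: one backbone nitrogen and two alpha carbons.
lines = [
    format_atom(1, " N", "MET", "A", 5, 27.340, 24.430, 2.614),
    format_atom(2, " CA", "MET", "A", 5, 26.266, 25.413, 2.842),
    format_atom(3, " CA", "GLN", "A", 6, 26.850, 29.021, 3.898),
]
rows = parse_atoms(lines)  # alpha carbons only
```

A list of dictionaries like `rows` maps naturally onto a dataframe with one row per atom.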

pdb2graph.get_hydrophobicity(name, warn=False)

Gets hydrophobicity based on amino acid name.

ref: https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/midas/hydrophob.html

note: returns NaN if the input name is invalid.

Parameters:

name ([str]) – Amino acid name
warn ([bool]) – if True, print a warning when hydrophobicity can’t be determined

Returns:

hydrophobicity

Return type:

[float]
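
The NaN-on-invalid-name behaviour can be sketched as a dictionary lookup. This is an illustration, not the module's code: `lookup_hydrophobicity` is a hypothetical stand-in, the table holds Kyte-Doolittle values for a handful of residues only, and both the choice of scale and the use of upper-case three-letter codes are assumptions, since the documentation above does not state them.

```python
import math

# Kyte-Doolittle values for a few residues (illustrative subset).
# Assumption: the scale used by the module is not stated in this page.
KD_SUBSET = {
    "ILE": 4.5,
    "VAL": 4.2,
    "LEU": 3.8,
    "ALA": 1.8,
    "GLY": -0.4,
    "LYS": -3.9,
    "ARG": -4.5,
}


def lookup_hydrophobicity(name, warn=False):
    """Return the table value for name, or NaN if it is not in the table."""
    value = KD_SUBSET.get(name.upper(), math.nan)
    if warn and math.isnan(value):
        print(f"warning: no hydrophobicity value for {name!r}")
    return value


known = lookup_hydrophobicity("ILE")
unknown = lookup_hydrophobicity("XYZ")
```

Because NaN never compares equal to itself, callers should test the result with `math.isnan` rather than `==`.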

pdb2graph.main(args)

Takes a .pdb(x) file, converts it into a graph, and saves the atom coordinates as .csv and the graph as .gexf.

Parameters:

args ([argument parser object]) –

args.pdb_code: PDB ID / protein name
args.fname: PDB/PDBx filename
args.d_cut: alpha carbon / atom pairwise contact distance cutoff (in Angstroms)
args.o: PDB residue index offset (integer). Default is 0.
args.pdbx: set to 1 to use the PDBx file parser
args.CA_only: set to 1 to use only alpha carbons (0 for all atoms)
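
For reference, an object carrying the attributes listed above can be built by hand with `argparse.Namespace`. This is a sketch under assumptions: the PDB ID and filename are placeholders, the `d_cut` and `CA_only` values simply mirror the defaults documented for the other functions, and it is assumed (not confirmed here) that `main` accepts any object exposing these attributes.

```python
import argparse

# Attribute names follow the list above; all values are placeholders.
args = argparse.Namespace(
    pdb_code="1abc",   # hypothetical PDB ID
    fname="1abc.pdb",  # hypothetical filename
    d_cut=8.0,         # contact cutoff in Angstroms
    o=0,               # residue index offset
    pdbx=0,            # 0 = legacy PDB parser, 1 = PDBx parser
    CA_only=1,         # 1 = alpha carbons only, 0 = all atoms
)

# With the module importable, the call would then look like:
# pdb2graph.main(args)
```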

pdb2graph.plot_FA_and_CA_coordinates(FA_xyz, CA_xyz, figsize=5)

Creates a 3D scatter plot containing the alpha carbon (CA) and all-atom (FA) coordinates.

Parameters:

FA_xyz ([numpy array]) – all-atom coordinate data, one row per atom: A[0] = [x0,y0,z0]
CA_xyz ([numpy array]) – alpha-carbon-only coordinate data, one row per atom: A[0] = [x0,y0,z0]
figsize (int, optional) – size of figure (figsize x figsize). Defaults to 5.

Returns:

3D scatter plot figure

Return type:

[matplotlib figure object]

pdb2graph.plot_coordinates(xyz_data, figsize=5)

Creates a 3D scatter plot containing the xyz data.

Parameters:

xyz_data ([numpy array]) – A[0] = [x0,y0,z0]
figsize (int, optional) – size of figure (figsize x figsize). Defaults to 5.

Returns:

3D scatter plot figure

Return type:

[matplotlib figure object]

pdb2graph.save_data(df, G, df_name, G_name)

Convenience function that stores the dataframe as a .csv file and the graph as a .gexf file.

Parameters:

df ([Pandas dataframe object]) – dataframe to save
G ([NetworkX graph object]) – graph to save
df_name ([str]) – output filename for dataframe .csv
G_name ([str]) – output filename for graph .gexf

pdb2graph.save_data_at_this_folder(data_path, df, G, df_name, G_name)

Convenience function that stores the dataframe as a .csv file and the graph as a .gexf file in the given output directory.

Parameters:

data_path ([str] or [Path]) – output directory path
df ([Pandas dataframe object]) – dataframe to save
G ([NetworkX graph object]) – graph to save
df_name ([str]) – output filename for dataframe .csv
G_name ([str]) – output filename for graph .gexf
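
Since data_path may be either a string or a Path, one way to combine it with an output filename is to normalise it through `pathlib.Path`. The sketch below covers only that path handling plus a standard-library CSV write; it is not the module's code, `output_path` is a hypothetical helper, the rows are made-up coordinates, and the graph export is left out.

```python
import csv
import tempfile
from pathlib import Path


def output_path(data_path, filename):
    """Join an output directory (str or Path) with a filename."""
    return Path(data_path) / filename


# Made-up coordinate rows standing in for the dataframe contents.
rows = [
    {"x": 0.0, "y": 0.0, "z": 0.0},
    {"x": 3.8, "y": 0.0, "z": 0.0},
]

with tempfile.TemporaryDirectory() as tmp:
    out = output_path(tmp, "coords.csv")
    with out.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["x", "y", "z"])
        writer.writeheader()
        writer.writerows(rows)
    # Read the file back before the temporary directory is removed.
    written = out.read_text().splitlines()
```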
