"CATHDatabase" (Service Connection)
Connecting & Authenticating
Requests
BioMolecule Structures
"BioMolecule" — get BioMolecule structure of a protein domain
| "CATHDomainID" | None | CATH Domain ID |
Protein Domain Summary
"DomainDetails" — get information about a protein domain
| "CATHDomainID" | None | CATH Domain ID |
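A CATH domain ID such as "3bd3A00" concatenates a four-character PDB ID, a chain ID and a two-digit domain number. As a minimal sketch, assuming the seven-character form used in the examples on this page, the parts can be split with a string pattern. The helper name parseCATHDomainID is hypothetical and is not part of the service connection:

```wolfram
(* Hypothetical helper: split a seven-character CATH domain ID into its parts.
   Assumes 4 characters of PDB ID, 1 character of chain ID, 2 digits of domain number. *)
parseCATHDomainID[id_String] := First[
  StringCases[
    id,
    StartOfString ~~ (pdb : Repeated[WordCharacter, {4}]) ~~ (chain : WordCharacter) ~~
      (num : Repeated[DigitCharacter, {2}]) ~~ EndOfString :>
     <|"PDBID" -> pdb, "ChainID" -> chain, "DomainNumber" -> num|>
  ],
  (* Returned when the string does not have the assumed shape *)
  Missing["NotParsed"]
]

parseCATHDomainID["3bd3A00"]
(* <|"PDBID" -> "3bd3", "ChainID" -> "A", "DomainNumber" -> "00"|> *)
```

The "PDBID" part can then be passed to other structure services, while the full ID is what the "BioMolecule" and "DomainDetails" requests expect.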
Protein Superfamilies
"Superfamilies" — get a list of all the protein superfamilies in the database
Protein Superfamily Details
"SuperfamilyDetails" — get information about a protein superfamily
| "CATHSuperfamilyID" | None | CATH superfamily ID |
Protein Superfamily Domains
"SuperfamilyDomains" — get a list of all the protein domains in a superfamily
| "CATHSuperfamilyID" | None | CATH superfamily ID |
Protein PDB IDs
"PDBStructureIDs" — get a list of all the PDB structure IDs of proteins that contain at least one domain in the superfamily
| "CATHSuperfamilyID" | None | CATH superfamily ID |
Graph of a Protein Superfamily
"SuperfamilyGraph" — get a Graph of the organization and hierarchy of domains in a protein superfamily
| "CATHSuperfamilyID" | None | CATH superfamily ID |
Hierarchy of a Protein Superfamily
"SuperfamilyHierarchy" — get information about domains at different levels of hierarchy of a protein superfamily
| "CATHSuperfamilyID" | None | CATH superfamily ID |
| "Level" | 9 | level of hierarchy |
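The "Level" parameter accepts values from 4 (the CATH classification) to 9 (the full CATHSOLID classification), with 9 as the default. A rough sketch of how the levels line up with the classification names, assuming that level n corresponds to the first n letters of "CATHSOLID"; levelName is a hypothetical helper, not part of the service connection:

```wolfram
(* Hypothetical helper. Assumption: level n maps to the first n letters of "CATHSOLID". *)
levelName[n_Integer /; 4 <= n <= 9] := StringTake["CATHSOLID", n]

AssociationMap[levelName, Range[4, 9]]
(* <|4 -> "CATH", 5 -> "CATHS", 6 -> "CATHSO", 7 -> "CATHSOL", 8 -> "CATHSOLI", 9 -> "CATHSOLID"|> *)

(* Request the hierarchy at a chosen level for the superfamily used in the examples *)
ServiceExecute["CATHDatabase", "SuperfamilyHierarchy",
 <|"CATHSuperfamilyID" -> ExternalIdentifier["CATHSuperfamilyID", "1.10.8.10"], "Level" -> 6|>]
```

Lower levels group domains more coarsely, so they are a reasonable starting point when the default level returns more detail than needed.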
Functional Families
"FunctionalFamilies" — get all the functional families in a protein superfamily
| "CATHSuperfamilyID" | None | CATH superfamily ID |
Functional Family Details
"FunctionalFamilyDetails" — get detailed information about a functional family
| "CATHSuperfamilyID" | None | CATH superfamily ID |
| "FunctionalFamilyID" | None | functional family ID |
Functional Family Alignment
"FunctionalFamilyAlignment" — get the multiple sequence alignment of domains in a given functional family
| "CATHSuperfamilyID" | None | CATH superfamily ID |
| "FunctionalFamilyID" | None | functional family ID |
Information about Enzymes
"EnzymeInformation" — get information about the enzymes associated with a superfamily
| "CATHSuperfamilyID" | None | CATH superfamily ID |
Custom Domain Search
"DomainSearch" — get a list of domains associated with a specific amino acid sequence
| "BioSequence" | None | a BioSequence of type "Peptide" |
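The "BioSequence" parameter can also be built directly from a one-letter amino acid string rather than fetched from another service. A minimal sketch, using the B chain of human insulin purely as an illustrative input; a peptide this short may return few or no matches, so no output is shown:

```wolfram
(* Illustrative input: the 30-residue B chain of human insulin.
   Any peptide sequence of interest could be used instead. *)
peptide = BioSequence["Peptide", "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"];

(* Same calling convention as the other requests on this page *)
ServiceExecute["CATHDatabase", "DomainSearch", <|"BioSequence" -> peptide|>]
```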
Notes & Issues
Examples
Basic Examples (3)
Create a new connection to the CATH database:
cathdb = ServiceConnect["CATHDatabase"]
Get the BioMolecule corresponding to a protein domain:
ServiceExecute[cathdb, "BioMolecule", <|"CATHDomainID" -> ExternalIdentifier["CATHDomainID", "3bd3A00"]|>]
Visualize the BioMolecule:
BioMoleculePlot3D[%]
Get details about a specific domain, with the domain ID given as a string. The first four characters ("3bd3") of the domain ID correspond to the PDB ID of the protein, followed by the chain ID ("A") and the domain number within the chain ("00"):
ServiceExecute["CATHDatabase", "DomainDetails", <|"CATHDomainID" -> "3bd3A00"|>]
The "BioSequence" is based on the residues observed in the ATOM records of the PDB file for this structure. The "COMBSSequence" is based on the residues in the SEQRES records of the PDB file, which can include extra residues that could not be resolved in the 3D coordinates. Here is one such example:
ServiceExecute["CATHDatabase", "DomainDetails", <|"CATHDomainID" -> ExternalIdentifier["CATHDomainID", "1a22A00"]|>]
Get details about the immunoglobulin superfamily:
ServiceExecute["CATHDatabase", "SuperfamilyDetails", <|"CATHSuperfamilyID" -> ExternalIdentifier["CATHSuperfamilyID", "2.60.40.10"]|>]
Scope (8)
Get a list of all protein superfamilies in the CATH database:
ServiceExecute["CATHDatabase", "Superfamilies", <||>]
Get a list of all protein domains belonging to a specific superfamily. CATHSOLID is a further classification within a protein superfamily based on sequence similarity; each domain has a unique CATHSOLID ID:
ServiceExecute["CATHDatabase", "SuperfamilyDomains", <|"CATHSuperfamilyID" -> "1.10.8.10"|>]
Get a list of all the PDB structure IDs that contain at least one protein domain in a superfamily:
ServiceExecute["CATHDatabase", "PDBStructureIDs", <|"CATHSuperfamilyID" -> ExternalIdentifier["CATHSuperfamilyID", "1.10.8.10"]|>] // Shallow
Get information about domains at different levels of hierarchy for a protein superfamily:
ServiceExecute["CATHDatabase", "SuperfamilyHierarchy", <|"CATHSuperfamilyID" -> ExternalIdentifier["CATHSuperfamilyID", "1.10.8.10"]|>]
Specify the level for which you want domain information. The minimum level is 4 (the CATH classification) and the maximum level is 9 (the full CATHSOLID classification):
ServiceExecute["CATHDatabase", "SuperfamilyHierarchy", <|"CATHSuperfamilyID" -> ExternalIdentifier["CATHSuperfamilyID", "1.10.8.10"], "Level" -> 6|>]
Get all the functional families (groups of domains with similar function) in a protein superfamily:
ServiceExecute["CATHDatabase", "FunctionalFamilies", <|"CATHSuperfamilyID" -> ExternalIdentifier["CATHSuperfamilyID", "1.10.8.10"]|>]
Get detailed information about a functional family:
ServiceExecute["CATHDatabase", "FunctionalFamilyDetails", <|"CATHSuperfamilyID" -> ExternalIdentifier["CATHSuperfamilyID", "1.10.8.10"], "FunctionalFamilyID" -> 1|>]
Get the multiple sequence alignment for domains related to a functional family:
ServiceExecute["CATHDatabase", "FunctionalFamilyAlignment", {"CATHSuperfamilyID" -> ExternalIdentifier["CATHSuperfamilyID", "1.10.8.10"], "FunctionalFamilyID" -> 2}]
Get information about the enzymes associated with a superfamily:
ServiceExecute["CATHDatabase", "EnzymeInformation", {"CATHSuperfamilyID" -> ExternalIdentifier["CATHSuperfamilyID", "1.10.8.10"]}]
Generalizations & Extensions (2)
Obtain a Graph for a superfamily where domains are grouped by their CATHSOLID IDs:
graph = ServiceExecute["CATHDatabase", "SuperfamilyGraph", <|"CATHSuperfamilyID" -> ExternalIdentifier["CATHSuperfamilyID", "1.10.8.10"]|>]
Here is a neater way to see the organization and hierarchy based on the CATHSOLID classification:
vertexStyle = Thread[VertexList[graph] -> ColorData["Rainbow"] /@ Rescale[GraphDistance[graph, #, "1"] & /@ VertexList[graph]]];
colors = (ColorData["Rainbow"] /@ Rescale@Range@9)[[-6 ;; -1]];
labels = {"Level 1: Same superfamily (CATH)",
  "Level 2: 35% sequence similarity (CATHS)",
  "Level 3: 60% sequence similarity (CATHSO)",
  "Level 4: 95% sequence similarity (CATHSOL)",
  "Level 5: 100% sequence similarity (CATHSOLI)",
  "Level 6: 100% sequence similarity, unique domains (CATHSOLID)"};
legend = SwatchLegend[colors, labels, LegendMarkers -> "Bubble"];
Legended[Graph[graph, VertexLabels -> Placed["Name", Tooltip], VertexStyle -> vertexStyle, GraphLayout -> "RadialEmbedding"], Placed[legend, Bottom]]
Get information about the domains associated with a specific amino acid sequence. Here, search for a sequence from a protein in the ESM Metagenomic Atlas that is similar to known protein structures in nature:
ServiceExecute["CATHDatabase", "DomainSearch", {"BioSequence" -> ServiceExecute["ESMAtlas", "Sequence", {"MGnifyID" -> "MGYP003566250623"}]}]
See how the first domain returned by the CATH database search aligns with the structure in the ESM Metagenomic Atlas. Obtain the structure of the domain and the predicted structure from the ESM Metagenomic Atlas:
{biomolDomain, biomolESM} = {ServiceExecute["CATHDatabase", "BioMolecule", {"CATHDomainID" -> %[[1, "ExampleCATHDomainID"]]}], ServiceExecute["ESMAtlas", "PredictedStructure", {"MGnifyID" -> ExternalIdentifier["MGnifyProteinID", "MGYP003566250623"]}]}
BioMoleculePlot3D /@ {biomolESM, biomolDomain}
bmalign = BioMoleculeAlign[biomolESM, biomolDomain]
BioMoleculePlot3D[{biomolESM, bmalign}, PlotLegends -> {"ESMAtlas", "CATH protein domain"}]
Neat Examples (1)
Visualize a domain in a protein using the output of the "DomainDetails" request:
domainPlot[id_] := Module[
  {domainDetails, resIDs, biomol, chains, chainsAndResidues},
  (* Fetch the domain record and the residue IDs that belong to the domain *)
  domainDetails = ServiceExecute["CATHDatabase", "DomainDetails", {"CATHDomainID" -> id}];
  resIDs = Normal[domainDetails["Residues", All, "ResidueID"]];
  (* Load the full structure that contains the domain *)
  biomol = BioMolecule[domainDetails["PDBStructureID"]];
  (* Chains whose author chain ID matches the first entry of the domain's first PDB segment *)
  chains = Keys@Select[biomol["AuthorChainIDs"], # === domainDetails["PDBSegments", 1, 1] &];
  (* {chain, residue} pairs restricted to the residues of the domain *)
  chainsAndResidues = Cases[Flatten[KeyValueMap[Thread @* List, biomol["ResidueIDs"][[chains]]], 1], {_, Alternatives @@ resIDs}];
  (* Color the domain red and everything else gray *)
  BioMoleculePlot3D[biomol, ColorRules -> Append[Thread[chainsAndResidues -> StandardRed], _ -> StandardGray]
  ]
]
Here is the N-terminal domain of glutamine synthetase highlighted in red:
domainPlot["4acfA01"]
And here is another domain in the heavy chain of human immunoglobulin G1:
domainPlot[ExternalIdentifier["CATHDomainID", "1hzhH01"]]
See Also
ServiceExecute ▪ ServiceConnect ▪ BioMolecule ▪ BioSequence
Service Connections: RCSBProteinDataBank ▪ AlphaFoldDatabase ▪ ESMAtlas ▪ UniProt ▪ EncyclopediaOfDomains