"ESMAtlas" (Service Connection)
Connecting & Authenticating
Requests
BioMolecule Structures
"FoldSequence" — generate 3D coordinates for a peptide sequence
| "BioSequence" | None | sequence to be folded, either a BioSequence or string |
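The Possible Issues below note that sequences containing nonstandard or degenerate letters are not folded, and the Basic Examples fold sequences of 400 residues or fewer. A minimal local pre-check for both conditions might look like the following. `foldableSequenceQ` is a hypothetical helper, not part of the "ESMAtlas" connection, and it assumes the service accepts exactly the 20 standard uppercase one-letter amino acid codes with a 400-residue limit.

```wolfram
(* Hypothetical helper, not part of the "ESMAtlas" service connection.
   Assumes "FoldSequence" accepts only the 20 standard one-letter amino acid
   codes (uppercase) and at most 400 residues. *)
standardAminoAcids = Characters["ACDEFGHIKLMNPQRSTVWY"];

(* True only for a nonempty string of standard letters within the length limit *)
foldableSequenceQ[seq_String] :=
  StringLength[seq] <= 400 &&
   StringMatchQ[seq, (Alternatives @@ standardAminoAcids) ..]

(* For a BioSequence, test its underlying string; assumes the
   "SequenceString" property *)
foldableSequenceQ[bs_BioSequence] := foldableSequenceQ[bs["SequenceString"]]
```

For example, `foldableSequenceQ["ACDFBKLW"]` returns False because of the degenerate letter B, while `foldableSequenceQ["ACDFKLW"]` returns True.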
"PredictedStructure" — get the predicted structure of a BioMolecule from the ESM Metagenomic Atlas database
| "MGnifyID" | None | MGnifyID of the predicted structure |
Properties of Predicted Structures
"StructureConfidencePrediction" — get a dataset with values indicating the confidence behind the 3D embedding of the sequence.
| "MGnifyID" | None | MGnifyID of the predicted structure |
"Sequence" — returns the BioSequence of a biomolecule from the ESM Metagenomic Atlas using the "MGnifyID"
| "MGnifyID" | None | MGnifyID of the predicted structure |
"SequenceEmbedding" — returns the 2560-dimensional embedding vector after averaging over the final layer activations of the ESM2 model over the sequence length.
| "MGnifyID" | None | MGnifyID of the predicted structure |
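The averaging described for "SequenceEmbedding" can be illustrated locally: taking the mean of a per-residue activation matrix over the sequence axis leaves a single vector whose length is the model's embedding size (2560 for the vectors returned here), independent of the sequence length. The sketch below uses a small random matrix as a stand-in; it does not call the service or reproduce actual ESM2 activations.

```wolfram
(* Illustration only: a random 5 x 8 matrix stands in for ESM2 final-layer
   activations (5 residues, 8 dimensions instead of 2560) *)
activations = RandomReal[{-1, 1}, {5, 8}];

(* Mean of a matrix averages over its rows, i.e. over the sequence positions *)
embedding = Mean[activations];

(* One value per embedding dimension, regardless of how many residues there are *)
Dimensions[embedding]
```

Here `Dimensions[embedding]` gives `{8}`; with real ESM2 activations the result would have 2560 entries.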
Examples
Basic Examples (1)
Create a new connection by launching an authentication dialog:
esm = ServiceConnect["ESMAtlas"]
Fold a BioSequence of 400 residues or fewer and obtain the predicted BioMolecule structure:
esm["FoldSequence", <|"BioSequence" -> BioSequence["Peptide", "MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTG\
EGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLPARTVETRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKLRKLNPP\
DESGPGCMNCKCVIS", {}]|>]
Visualize the structure of the BioMolecule:
BioMoleculePlot3D[%]
Scope (5)
Get the predicted structure of a biomolecule from the ESM Metagenomic Atlas by providing the "MGnifyID":
ServiceExecute["ESMAtlas", "PredictedStructure", <|"MGnifyID" -> ExternalIdentifier["MGnifyProteinID", "MGYP002537940442"]|>]
Visualize the structure obtained from the database:
BioMoleculePlot3D[%]
Get the predicted structure of a biomolecule from the ESM Metagenomic Atlas by providing the "MGnifyID" as a string:
ServiceExecute["ESMAtlas", "PredictedStructure", <|"MGnifyID" -> "MGYP001331810506"|>]
Visualize the structure obtained from the database:
BioMoleculePlot3D[%]
Get the prediction-confidence values for a structure in the ESM Metagenomic Atlas:
confidence = ServiceExecute["ESMAtlas", "StructureConfidencePrediction", <|"MGnifyID" -> ExternalIdentifier["MGnifyProteinID", "MGYP002537940442"]|>]
The confidence data has the following keys:
| "PredictedAlignedError" | a measure, in angstroms, of how confident the model is in the relative position of two residues within the predicted structure |
| "PredictedLocalDistanceDifferenceTest" | a per-residue measure of local confidence (pLDDT), on a scale from 0 to 100 |
| "PredictedTemplateModelingScore" | a predicted template modeling (TM) score, ranging from 0 to 1, that estimates the global accuracy of the predicted structure and is largely independent of local inaccuracies |
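The "PredictedLocalDistanceDifferenceTest" values are per-residue scores on a 0 to 100 scale. One way to reduce them to a single number is the fraction of residues above a cutoff. The sketch below uses a hypothetical helper, `fractionConfident`, with an assumed default cutoff of 70, a threshold commonly used to mark confident pLDDT values; the cutoff is a convention, not a parameter defined by this service.

```wolfram
(* Hypothetical helper: fraction of residues whose pLDDT exceeds a cutoff.
   Assumes pLDDT values on a 0-100 scale and a nonempty input; the default
   cutoff of 70 is a common convention, not an ESMAtlas parameter. *)
fractionConfident[plddt_, cutoff_ : 70] :=
 Module[{values = Normal[plddt]},
  (* Normal converts a NumericArray or Dataset part to a plain list *)
  N[Count[values, x_ /; x > cutoff]/Length[values]]]
```

Applied to the data above, this would be `fractionConfident[confidence["PredictedLocalDistanceDifferenceTest"]]`.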
Visualize the confidence of prediction:
ListLinePlot[confidence["PredictedLocalDistanceDifferenceTest"], AxesLabel -> {"Residues", "Confidence"}]
Visualize the predicted aligned error between pairs of residues. The error is measured in angstroms; larger values indicate lower confidence in the relative position of the two residues:
MatrixPlot[confidence["PredictedAlignedError"], ColorFunction -> "DeepSeaColors", FrameTicks -> True, FrameLabel -> {"Residues", "Residues"}, PlotLegends -> Automatic]
Get the BioSequence of a biomolecule from the ESM Metagenomic Atlas using its "MGnifyID":
ServiceExecute["ESMAtlas", "Sequence", <|"MGnifyID" -> ExternalIdentifier["MGnifyProteinID", "MGYP002537940442"]|>]
Get the embedding vector of a protein from its "MGnifyID"; the vector is the average of the ESM2 model's final-layer activations over the sequence length:
ServiceExecute["ESMAtlas", "SequenceEmbedding", <|"MGnifyID" -> ExternalIdentifier["MGnifyProteinID", "MGYP002537940442"]|>] // Short
Possible Issues (2)
Sometimes a BioSequence may contain nonstandard amino acids. For example, selenocysteine residues are represented by a single letter U:
BioSequence["Peptide", "DDWRAARSMHEFSAKDIDGHMVNLDKYRGFVSIVTNVASQUGKTEVNYTQLVDLHARYAEAGLRILAFPSNQFGKQEPGSNEEIKEFAAGYNVKFDMFSKIAVNGDDAHPLWKWMKIQPKGKGILGNAIKWNFTKFLIDKNGAVVKRYGPMEEPLVIEKDLPHYF"]
Such biomolecular sequences with nonstandard amino acids will not be folded:
ServiceExecute["ESMAtlas", "FoldSequence", {"BioSequence" -> %}]
Some biomolecular sequences contain degenerate letters, each of which stands for one of several possible residues. For example, the letter B stands for aspartate or asparagine:
BioSequence["Peptide", "ACDFBKLW"]
Such biomolecular sequences with degenerate letters will not be folded either:
ServiceExecute["ESMAtlas", "FoldSequence", {"BioSequence" -> %}]
Neat Examples (1)
Visualize the confidence of prediction along with the structure:
confidencePlot3D[mgnifyID_, colorFunction_ : "TemperatureMap"] := Module[{confidence, plddt, bm, colors, res, chain, colRules},
  (* per-residue pLDDT values, rescaled from 0-100 to 0-1 for the color function *)
  confidence = ServiceExecute["ESMAtlas", "StructureConfidencePrediction", <|"MGnifyID" -> mgnifyID|>];
  plddt = 0.01 * Normal[confidence["PredictedLocalDistanceDifferenceTest"]];
  (* predicted structure and one color per residue *)
  bm = ServiceExecute["ESMAtlas", "PredictedStructure", <|"MGnifyID" -> mgnifyID|>];
  colors = ColorData[colorFunction] /@ plddt;
  (* rules {chain, residue index} -> color for every residue of the first chain *)
  res = bm["Residues"];
  chain = First[Keys[res]];
  colRules = MapThread[{chain, #1} -> #2 &, {Range[Length[First[res]]], colors}];
  BioMoleculePlot3D[bm, ColorRules -> colRules, PlotLegends -> BarLegend[{colorFunction, {0, 100}}, LegendLabel -> "pLDDT"]]
 ]
confidencePlot3D[ExternalIdentifier["MGnifyProteinID", "MGYP002537940442"]]
See Also
ServiceExecute ▪ ServiceConnect ▪ BioMolecule ▪ BioSequence
Service Connections: RCSBProteinDataBank ▪ AlphaFoldDatabase ▪ UniProt