Wolfram Language & System Documentation Center

BioSequenceTranslate

See Also
- BioSequence
- BioSequenceBackTranslateList
- BioSequenceTranscribe
- BioSequenceInstances
- Entity Types
- GeneticTranslationTable
Related Guides
- Biomolecular Sequences

BioSequenceTranslate

BioSequenceTranslate[bioseq]

translates a DNA or RNA sequence bioseq to a peptide sequence.

BioSequenceTranslate[bioseq,gtt]

uses the genetic translation table gtt.

BioSequenceTranslate[bioseq,gtt,startspec]

treats start codons in bioseq according to the specification startspec.

Details

The genetic translation table gtt can be specified as follows:

	Automatic	the standard code (default)
	"name"	standard name of a "GeneticTranslationTable" entity
	Entity["GeneticTranslationTable",…]	"GeneticTranslationTable" entity
	n	NCBI genetic code number
	"AAAA…"	length-64 NCBI codon translation string
	<|"cod₁"->"tran₁","cod₂"->"tran₂",…|>	explicit codon translation table
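
As a minimal sketch of how these forms relate, the calls below request the standard code in four different ways. It assumes that "Standard" is accepted as a table name (it is the entity name used elsewhere on this page) and that NCBI genetic code number 1 denotes the standard code. Under those assumptions, all four results should agree.

Wolfram Language code:

seq = BioSequence["DNA", "CTGATATAC"];
(* four ways of requesting the standard code; the name "Standard" and the number 1 are the assumptions noted above *)
specs = {Automatic, "Standard", Entity["GeneticTranslationTable", "Standard"], 1};
results = BioSequenceTranslate[seq, #] & /@ specs;
(* True if every specification produced the same peptide *)
SameQ @@ results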

The start codon specification startspec specifies what translation should be used for the first codon in bioseq. If no startspec is given, the first codon will be treated as a start codon only if bioseq can be identified as corresponding to a complete protein.

The following forms for startspec can be used to specify how to treat the first codon in bioseq:

	Automatic	treat as start codon only for complete proteins
	False	never treat as start codon
	True	always treat as start codon

The following additional forms for startspec can be used to define start codon behavior, overriding the specification implied by the genetic translation table gtt:

	n	use NCBI genetic code number start codon specification
	"AAAA…"	use length-64 NCBI start codon translation string
	<|"cod₁"->"tran₁","cod₂"->"tran₂",…|>	explicitly specified start codon translations
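
The three basic settings can be compared side by side. This is a sketch, not a definitive reference: it relies on CTG being listed among the alternative initiation codons of the NCBI standard code, so the first letter of the result may differ between False and True. The name byStart is introduced here only for illustration.

Wolfram Language code:

seq = BioSequence["DNA", "CTGATATAC"];
(* map each basic startspec setting to the peptide it produces *)
byStart = AssociationMap[BioSequenceTranslate[seq, Automatic, #] &, {Automatic, False, True}]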

Examples

Basic Examples (1)

Translate a DNA sequence into the corresponding peptide sequence:

Wolfram Language code: BioSequenceTranslate[BioSequence["DNA", "CTGATATAC"]]

Scope (8)

Translate an RNA sequence:

Wolfram Language code: BioSequenceTranslate[BioSequence["RNA", "CUGAUAUAC"]]

Translate using a "GeneticTranslationTable" entity:

Wolfram Language code: BioSequenceTranslate[BioSequence["DNA", "CTGATATAC"], Entity["GeneticTranslationTable", "VertebrateMitochondrial"]]

Translate using the table corresponding to an NCBI genetic code number:

Wolfram Language code: BioSequenceTranslate[BioSequence["DNA", "CTGATATAC"], 2]

Use an Association to specify the translation table:

Wolfram Language code:

BioSequenceTranslate[BioSequence["DNA", "CTGATATAC"], <|"TTT" -> "F", "TTC" -> "F", "TTA" -> "L", "TTG" -> "L", "TCT" -> "S", "TCC" -> "S", "TCA" -> "S", "TCG" -> "S", "TAT" -> "Y", "TAC" -> "S", "TAA" -> "*", "TAG" -> "*", "TGT" -> "C", "TGC" -> "C", "TGA" -> "W", "TGG" -> "W", "CTT" -> "L", "CTC" -> "L", "CTA" -> "L", "CTG" -> "L", "CCT" -> "P", "CCC" -> "P", "CCA" -> "P", "CCG" -> "P", "CAT" -> "H", "CAC" -> "H", "CAA" -> "Q", "CAG" -> "Q", "CGT" -> "R", "CGC" -> "R", "CGA" -> "R", "CGG" -> "R", "ATT" -> "I", "ATC" -> "I", "ATA" -> "I", "ATG" -> "M", "ACT" -> "T", "ACC" -> "T", "ACA" -> "T", "ACG" -> "T", "AAT" -> "N", "AAC" -> "N", "AAA" -> "K", "AAG" -> "K", "AGT" -> "S", "AGC" -> "S", "AGA" -> "*", "AGG" -> "*", "GTT" -> "V", "GTC" -> "V", "GTA" -> "V", "GTG" -> "V", "GCT" -> "A", "GCC" -> "A", "GCA" -> "A", "GCG" -> "A", "GAT" -> "D", "GAC" -> "D", "GAA" -> "E", "GAG" -> "E", "GGT" -> "G", "GGC" -> "G", "GGA" -> "G", "GGG" -> "G"|>]

Use an arbitrary NCBI translation table:

Wolfram Language code:

BioSequenceTranslate[BioSequence["DNA", "CTGATATAC"], "GGGGEEDDAAAAVVVVRRSSKKNNTTTTMIIIRRRRQQHHPPPPLLLLW*CC**YYSSSSLLFF"]
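
To read such a string by hand, it can help to expand it into an explicit association. The sketch below assumes the conventional NCBI ordering, in which each codon position cycles through T, C, A, G with the third position varying fastest. ncbiToAssociation is a hypothetical helper name, not a built-in function.

Wolfram Language code:

(* hypothetical helper: expand a length-64 NCBI string into codon -> letter rules, assuming TCAG ordering *)
ncbiToAssociation[str_String /; StringLength[str] == 64] :=
	AssociationThread[StringJoin /@ Tuples[{"T", "C", "A", "G"}, 3], Characters[str]];

table = ncbiToAssociation["GGGGEEDDAAAAVVVVRRSSKKNNTTTTMIIIRRRRQQHHPPPPLLLLW*CC**YYSSSSLLFF"];
(* look up the three codons of CTGATATAC *)
Lookup[table, StringPartition["CTGATATAC", 3]]

Under this ordering, the lookup gives S, R and A for CTG, ATA and TAC. The expanded association is in the explicit codon translation table form listed under Details.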

Use the start translation of the automatically selected translation table:

Wolfram Language code: BioSequenceTranslate[BioSequence["CTGATATAC"], Automatic, True]

Specify start codon translations with an association of the changed start codons:

Wolfram Language code: BioSequenceTranslate[BioSequence["DNA", "CTGATATAC"], Automatic, <|"CTG" -> "R"|>]

Use NCBI notation to specify the start codon translation:

Wolfram Language code:

BioSequenceTranslate[BioSequence["DNA", "CTGATATAG"], Automatic, "GGGGEEDDAAAAVVVVRRSSKKNNTTTTMIIIRRRRQQHHPPPPLLLLW*CC**YYSSSSLLFF"]

Applications (2)

The translations to selenocysteine (U) and pyrrolysine (O) only happen in particular chemical contexts that are distinct from the translation system for a particular organism. One way to incorporate them is to change the codon translation association:

Wolfram Language code:

standardTranslations = Entity["GeneticTranslationTable", "Standard"]["CodonTranslations"];
BioSequenceTranslate[BioSequence["DNA", "GGCTGATAGTAA"], 
	AssociateTo[standardTranslations, 
	specialRules = {"TGA" -> "U", "TAG" -> "O"}]]
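
Note that AssociateTo modifies standardTranslations in place. If the original table should stay unchanged, a non-mutating variant is possible. The sketch below assumes, as the example above implies, that "CodonTranslations" returns an association keyed by DNA codons; the name extendedTranslations is introduced here only for illustration.

Wolfram Language code:

standardTranslations = Entity["GeneticTranslationTable", "Standard"]["CodonTranslations"];
(* in Join, later keys override earlier ones, so TGA and TAG are remapped in the copy only *)
extendedTranslations = Join[standardTranslations, <|"TGA" -> "U", "TAG" -> "O"|>];
BioSequenceTranslate[BioSequence["DNA", "GGCTGATAGTAA"], extendedTranslations]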

After splicing together coding sequences, translation yields the same sequence as the protein:

Wolfram Language code:

hbbBioSeq = BioSequence[Interpreter["Gene"]["human hbb gene"]];
spliced = StringJoin@@StringTake[hbbBioSeq, {{51, 142}, {273, 495}, {1346, 1474}}];
translated = BioSequenceModify[BioSequenceTranslate[spliced, Automatic, True], "DropFromStopLetter"]

Wolfram Language code: targetProtein = BioSequence[Interpreter["Protein"]["hbb"]]

Wolfram Language code: SequenceAlignment[targetProtein, translated]

Possible Issues (2)

Inputs with degenerate letters may have multiple possible translations with no single generalization:

Wolfram Language code: BioSequenceTranslate[BioSequence["RNA", "SUGAUAUAC"]]

Using BioSequenceInstances first will avoid ambiguous results:

Wolfram Language code: BioSequenceTranslate /@ BioSequenceInstances[BioSequence["RNA", "SUGAUAUAC"]]
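
To see how many distinct peptides the degenerate input can produce, the instance translations can be reduced to their unique sequence strings. This sketch assumes the IUPAC reading of S as "C or G" and uses the "SequenceString" property that appears elsewhere on this page.

Wolfram Language code:

degenerate = BioSequence["RNA", "SUGAUAUAC"];
(* translate every concrete instance of the degenerate sequence *)
peptides = BioSequenceTranslate /@ BioSequenceInstances[degenerate];
(* keep only the distinct peptide strings *)
Union[#["SequenceString"] & /@ peptides]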

If the length of a DNA or RNA sequence is not a multiple of three, then the remaining letters after constructing codons are discarded from the end of the sequence:

Wolfram Language code: BioSequenceTranslate[BioSequence["DNA", "CTGATATACGG"]]
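
The discarded remainder can be inspected with ordinary string functions before translating. The following sketch works on the raw sequence string only. StringPartition likewise drops an incomplete final block, so it shows which codons are available for translation.

Wolfram Language code:

raw = "CTGATATACGG";
codons = StringPartition[raw, 3]   (* complete codons only: {"CTG", "ATA", "TAC"} *)
leftover = StringTake[raw, -Mod[StringLength[raw], 3]]   (* trailing letters that form no codon: "GG" *)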

Neat Examples (2)

See the peptides resulting from translation:

Wolfram Language code:

acgt = {"A", "C", "G", "T"};
Graphics3D[Table[aa = BioSequenceTranslate[dna = BioSequence["DNA", StringJoin[acgt[[{i1, i2, i3}]]]]];
	mol = If[aa =!= BioSequence["Peptide", "."], Molecule[aa], Null];
	molpl = If[aa =!= BioSequence["Peptide", "."], MoleculePlot3D[mol], "STOP"];
	Inset[Tooltip[Rasterize[molpl, ImageSize -> 40, Background -> None], Column[{CommonName[Entity["BioSequenceType", "Peptide"]["AlphabetRules"][aa["SequenceString"]]]//Quiet, aa, mol, dna}]], {i1, i2, i3}], {i1, 4}, {i2, 4}, {i3, 4}], Boxed -> False, Axes -> True, AxesOrigin -> {0, 0, 0}, Ticks -> Table[Transpose[{Range[4], Style[#, Bold]& /@ acgt}], {3}]]

Reveal a phrase encoded in DNA:

Wolfram Language code: BioSequenceTranslate[BioSequence["DNA", "GGAGAAAACGAAACAATATGCAGC"]]
