Wolfram Language & System Documentation Center

SFF (.sff)

See Also
- Import
- CloudImport
- Formats
- MOL
- PDB
- XYZ
- FASTA
- FASTQ

Import supports most common variants of the SFF file format, including those with and without an index.

Background & Context

- MIME type: chemical/seq-na-sff
- SFF molecular biology format.
- Standard flowgram format for storing and exchanging DNA sequences with base qualities.
- Commonly used by the 454 Life Sciences DNA pyrosequencing platform.
- Binary format.
- Stores nucleic acid sequences and base qualities as character strings and lists, respectively.
- Meta-information about the sequencing run is stored in the file.

Import

Import["file.sff"] imports DNA sequencing data from an SFF file.
Import["file.sff"] returns an array representing the sequencing data stored in the file.
Import["file.sff",elem] imports the specified element from an SFF file.
Import["file.sff",{{elem₁,elem₂,…}}] imports multiple elements.
The import format can be specified with Import["file","SFF"] or Import["file",{"SFF",elem,…}].
See the following reference pages for full general information:

	Import	import from a file
	CloudImport	import from a cloud object
	ImportString	import from a string
	ImportByteArray	import from a byte array
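
As a minimal sketch of the calling forms above, the following uses the sample file that appears in the examples on this page. The variable names file, header, seqs and quals are illustrative only, and the final check assumes that each read carries one quality value per base.

```wolfram
(* Sample file used in the examples on this page *)
file = "ExampleData/Echinococcus.sff";

(* Name the format explicitly and list what can be imported *)
Import[file, {"SFF", "Elements"}]

(* Import a single element *)
header = Import[file, {"SFF", "Header"}];

(* Import several elements at once; one entry is returned per requested element *)
{seqs, quals} = Import[file, {"SFF", {"Sequence", "Qualities"}}];

(* Assumption: each read should have as many quality values as bases *)
(StringLength /@ seqs) == (Length /@ quals)
```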

Import Elements

General Import elements:

	"Elements"	list of elements and options available in this file
	"Summary"	summary of the file
	"Rules"	list of rules for all available elements

File metadata:

	"Header"	file header given as a list of rules
	"XMLManifest"	XML manifest as an XML object

Data representation elements for each sequencing read:

	"Sequence"	DNA sequences as a list of strings
	"Qualities"	base qualities as a list of lists
	"FlowgramValues"	flowgram values as a list of lists
	"FlowIndexPerBase"	flow index values as a list of lists
	"ClipQualities"	coordinates for quality-trimming the sequences as an array
	"ClipAdapter"	coordinates for adapter-trimming the sequences as an array
	"ReadName"	names of the reads as a list of strings

Additional data elements:

	"Data"	all data representation elements combined in a list
	"LabeledData"	list of rules for each sequence stored in the file

Import uses the "Data" element by default for the SFF format.
The Wolfram Language uses the standard IUB/IUPAC abbreviations for nucleic acids:

	A	adenosine
	C	cytidine
	G	guanine
	T	thymidine
	U	uracil
	R	purine (G or A)
	Y	pyrimidine (T or C)
	K	ketone (G or T)
	M	amino group (A or C)
	S	strong interaction (G or C)
	W	weak interaction (A or T)
	B	C or G or T
	D	A or G or T
	H	A or C or T
	V	A or C or G
	N	any nucleic acid (A or C or G or T)
	-	gap of indeterminate length

The Wolfram Language uses integers for the base qualities.
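
SFF quality scores are generally described as Phred-scaled. Assuming that convention, an integer quality Q corresponds to an estimated base-calling error probability of 10^(-Q/10). The following sketch uses made-up scores; quals and errorProbability are illustrative names.

```wolfram
(* Made-up quality scores, not taken from a real file *)
quals = {40, 30, 20, 10};

(* Phred convention (assumed): error probability = 10^(-Q/10) *)
errorProbability = 10.^(-quals/10)
(* {0.0001, 0.001, 0.01, 0.1} *)
```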

Examples

Basic Examples (5)

Read the file header from a sample SFF file:

Wolfram Language code: Short@Import["ExampleData/Echinococcus.sff", {"SFF", "Header"}]

Read the DNA sequences:

Wolfram Language code: Short@Import["ExampleData/Echinococcus.sff", {"SFF", "Sequence"}]

Read the DNA sequences with qualities, flowgram values, etc.:

Wolfram Language code: First@Import["ExampleData/Echinococcus.sff", {"SFF", "LabeledData"}]

Import names of the reads in the file:

Wolfram Language code:

names = Import["ExampleData/Echinococcus.sff", {"SFF", "ReadName"}];
Short[names]

Retrieve a sequence entry by name:

Wolfram Language code: Short@Import["ExampleData/Echinococcus.sff", {"SFF", Last@names, "LabeledData"}]

Retrieve the XML manifest of the sequencing run in the file and extract the analysis name:

Wolfram Language code:

manifest = Import["ExampleData/Echinococcus.sff", {"SFF", "XMLManifest"}];
Flatten@Cases[manifest, XMLElement["analysis_name", _, a_] :> a, Infinity]

Scope (3)

Trim the sequences according to the quality-trimming coordinates:

Wolfram Language code:

MapThread[StringTake[#1, #2]&, Import["ExampleData/Echinococcus.sff", {"SFF", {"Sequence", "ClipQualities"}}]]//Short[#, 5]&
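
In the SFF format, a clip coordinate of 0 is commonly understood to mean "not set", and the usable part of a read is bounded by both the quality and the adapter clip points. The following sketch combines the two under that assumption. The helper trimRead is hypothetical, the input values are made up, and whether Import passes zero coordinates through unchanged is not confirmed here.

```wolfram
(* Hypothetical helper. Assumes each clip pair is {left, right}, 1-based,
   with 0 meaning that the coordinate is not set. *)
trimRead[seq_String, {ql_Integer, qr_Integer}, {al_Integer, ar_Integer}] :=
 Module[{n = StringLength[seq], left, right},
  left = Max[1, ql, al];                             (* later of the two left clips *)
  right = Min[If[qr == 0, n, qr], If[ar == 0, n, ar]]; (* earlier of the two right clips *)
  If[left > right, "", StringTake[seq, {left, right}]]
 ]

(* Made-up read: quality clip at {5, 12}, adapter clip not set *)
trimRead["TCAGACGTACGTTTGA", {5, 12}, {0, 0}]
(* "ACGTACGT" *)
```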

Convert the SFF file to a FASTQ file, adding 64 to the quality scores for the character encoding:

Wolfram Language code: {names, seqs, quals} = Import["ExampleData/Echinococcus.sff", {"SFF", {"ReadName", "Sequence", "Qualities"}}];

Wolfram Language code: quals = StringJoin /@ FromCharacterCode[quals + 64];

Wolfram Language code: ExportString[First /@ {names, seqs, quals}, "FASTQ"]
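
The offset determines which printable characters represent the scores. As a small sketch on made-up scores, the following compares the offset 64 used above with the offset 33 commonly associated with Sanger-style FASTQ files. Which offset a downstream tool expects is an assumption to verify for that tool.

```wolfram
(* Made-up quality scores *)
scores = {0, 10, 40};

FromCharacterCode[scores + 64]  (* "@Jh" *)
FromCharacterCode[scores + 33]  (* "!+I" *)
```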

Plot the flowgram intensity values:

Wolfram Language code:

numOfFlows = Import["ExampleData/Echinococcus.sff", {"SFF", "Header", "NumberOfFlows"}];
{flowValues, flowIndexPerBase} = Import["ExampleData/Echinococcus.sff", {"SFF", {"FlowgramValues", "FlowIndexPerBase"}, 1}];

Wolfram Language code:

(* Calculate positions of bases *)
{a, c, g, t} = Intersection[Range[1, numOfFlows, 4] + #, Accumulate[flowIndexPerBase]]& /@ Range[0, 3];

Wolfram Language code:

Labeled[ListPlot[Transpose /@ ({#, flowValues[[#]]}& /@ {a, c, g, t}), Filling -> Axis, ImageSize -> 400], Text@Style[#]& /@ {"Flow Cycle", "Intensity"}, {Bottom, Left}, RotateLabel -> True]
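
The use of Accumulate above reflects that flow index values are stored as increments from one base to the next, so an increment of 0 marks a further base called from the same flow (a homopolymer). The following sketch shows the idea on made-up data. The flow order "TACG" and all values are assumptions for illustration; a real file records its own flow order.

```wolfram
flowOrder = Characters["TACG"];      (* assumed nucleotide order within one flow cycle *)
flowIndexPerBase = {1, 2, 0, 1, 3};  (* made-up increments, one per called base *)

(* Absolute flow number in which each base was called *)
flowPositions = Accumulate[flowIndexPerBase]
(* {1, 3, 3, 4, 7} *)

(* Nucleotide flowed at each of those positions *)
bases = StringJoin[flowOrder[[Mod[flowPositions - 1, 4] + 1]]]
(* "TCCGC" *)
```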
