Wolfram Language & System Documentation Center

SpatialTransformationLayer

SpatialTransformationLayer[{h,w}] represents a net layer that applies an affine transformation to an input of size c×h₀×w₀ and returns an output of size c×h×w.

Details and Options

SpatialTransformationLayer exposes the following ports for use in NetGraph etc.:
	"Input"	a 3-dimensional array
	"Parameters"	a vector of length 6
	"Output"	a 3-dimensional array
SpatialTransformationLayer[…][<|"Input"->in,"Parameters"->param|>] explicitly computes the output from applying the layer.
SpatialTransformationLayer[…][<|"Input"->{in₁,in₂,…},"Parameters"->{param₁,param₂,…}|>] explicitly computes the output for each pair in_i, param_i.
When given a NumericArray as input, the output will be a NumericArray.
SpatialTransformationLayer is typically used inside NetGraph to focus a later convolutional network on the region of the image that is most relevant to a specific task.
When it cannot be inferred from other layers in a larger net, the option "Input"->{d₁,d₂,d₃} can be used to fix the input dimensions of SpatialTransformationLayer.
The six components of the vector provided to the port "Parameters", {z_h,s_h,t_h,s_v,z_v,t_v}, represent the parameters in the affine transformation matrix, where z_i represents zoom, s_i skewness and t_i translation, and the subscripts h and v indicate horizontal and vertical. The identity transformation is obtained when "Parameters" is {1,0,0,0,1,0}.
Options[SpatialTransformationLayer] gives the list of default options to construct the layer. Options[SpatialTransformationLayer[…]] gives the list of default options to evaluate the layer on some data.
Information[SpatialTransformationLayer[…]] gives a report about the layer.
Information[SpatialTransformationLayer[…],prop] gives the value of the property prop of SpatialTransformationLayer[…]. Possible properties are the same as for NetGraph.
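
As a minimal sketch of how the six values passed to "Parameters" can be read, the vector can be partitioned into a 2×3 matrix whose last column holds the translations. The row-by-row layout is an assumption inferred from the ordering {z_h,s_h,t_h,s_v,z_v,t_v}, and the symbols zh, sh, th, sv, zv, tv are placeholder names used only for illustration:

Wolfram Language code:

(* placeholder symbols standing for zoom, skewness and translation *)
params = {zh, sh, th, sv, zv, tv};
(* view the vector as a 2×3 matrix: one row per direction, translations in the last column *)
Partition[params, 3] // MatrixForm

Under the same assumption, the identity parameters correspond to a 2×2 identity matrix followed by a zero translation column:

Wolfram Language code: Partition[{1, 0, 0, 0, 1, 0}, 3]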

Examples

Basic Examples (2)

Create a SpatialTransformationLayer with output size 30×30:

Wolfram Language code: SpatialTransformationLayer[{30, 30}]

Create a SpatialTransformationLayer that expects an input of size 1×3×3 and returns an output of size 1×2×2:

Wolfram Language code: transform = SpatialTransformationLayer[{2, 2}, "Input" -> {1, 3, 3}]

Apply the layer to an input:

Wolfram Language code: transform[<|"Parameters" -> {0.5, 0, 0, 0, 0.5, 0}, "Input" -> {{{1, 0, 0}, {0, 0, 0}, {0, 0, 1}}}|>]

Scope (1)

Create a SpatialTransformationLayer whose input is an image and whose output is an image:

Wolfram Language code: spatial = SpatialTransformationLayer[{64, 64}, "Input" -> NetEncoder["Image"], "Output" -> NetDecoder["Image"]]

Apply the SpatialTransformationLayer to an image with a factor-2 zoom transformation:

Wolfram Language code: img = [image];

Wolfram Language code: spatial[<|"Input" -> img, "Parameters" -> {0.5, 0, 0, 0, 0.5, 0}|>]

Apply the SpatialTransformationLayer using a sequence of zooms:

Wolfram Language code: spatial[<|"Input" -> Table[img, 10], "Parameters" -> Table[{i, 0, 0, 0, i, 0}, {i, 0.2, 2, 1 / 5}]|>]
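
Other affine maps can be supplied in the same way. The following is a sketch only: rotationParams is a hypothetical helper (not a built-in function) that fills the zoom and skewness entries with the entries of a rotation matrix and leaves the translations at zero. The direction of rotation in the output depends on the coordinate conventions of the layer and is not asserted here:

Wolfram Language code:

(* hypothetical helper: parameters for a rotation by theta with no translation *)
rotationParams[theta_] := {Cos[theta], -Sin[theta], 0, Sin[theta], Cos[theta], 0};
(* apply a rotation by 30 degrees using the layer and image defined above *)
spatial[<|"Input" -> img, "Parameters" -> N[rotationParams[Pi/6]]|>]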

Applications (1)

Train a digit recognizer on the MNIST database of handwritten digits using a convolutional neural network with a SpatialTransformationLayer. First obtain the training and test data:

Wolfram Language code:

resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];

Define a function to apply extra padding and random translations to the training and test data:

Wolfram Language code:

f[image_] := ImagePerspectiveTransformation[ImagePad[ColorNegate[image], 5], TranslationTransform[RandomReal[{-0.2, 0.2}, 2]]];

Wolfram Language code: f /@ {[image], [image], [image]}

Create new training and test data using the function (this should take about a minute):

Wolfram Language code: {trainingData2, testData2} = MapAt[f, {trainingData, testData}, {All, All, 1}];

Wolfram Language code: RandomSample[trainingData2, 6]

Create a network that uses the image to predict the best affine transformation to apply to the image to extract the digit:

Wolfram Language code: localizer = NetChain[{PoolingLayer[4, 4], FlattenLayer[], LinearLayer[6]}]
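
The localizer above starts from randomly initialized weights. As an optional variation that is not used in the remaining steps, a localization net can instead be set up so that it initially predicts the identity parameters {1,0,0,0,1,0}. This sketch assumes the 1×38×38 input size produced by the padding function above; the names features, n and localizerIdentity are hypothetical:

Wolfram Language code:

(* hypothetical helper net: measure how many features reach the final linear layer *)
features = NetChain[{PoolingLayer[4, 4], FlattenLayer[]}, "Input" -> {1, 38, 38}];
n = Length[features[ConstantArray[0., {1, 38, 38}]]];
(* zero weights and identity biases make the initial prediction the identity transformation *)
localizerIdentity = NetChain[{PoolingLayer[4, 4], FlattenLayer[],
	LinearLayer[6, "Weights" -> ConstantArray[0., {6, n}], "Biases" -> {1., 0., 0., 0., 1., 0.}]}]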

Create a convolutional classification net to use the subimage extracted by the localization net:

Wolfram Language code:

classifier = NetChain[{
	ConvolutionLayer[16, 4], BatchNormalizationLayer[], Ramp, PoolingLayer[4],
	ConvolutionLayer[32, 4], BatchNormalizationLayer[], Ramp, PoolingLayer[4],
	LinearLayer[10], SoftmaxLayer[]},
	"Output" -> NetDecoder[{"Class", Range[0, 9]}],
	"Input" -> {1, 16, 16}]

Attach the classification network and the localization network to a spatial transformation layer:

Wolfram Language code:

net = NetGraph[<|
	"localizer" -> localizer,
	"transformer" -> SpatialTransformationLayer[{16, 16}],
	"classifier" -> classifier|>,
	{"localizer" -> NetPort["transformer", "Parameters"], "transformer" -> "classifier"},
	"Input" -> NetEncoder[{"Image", {38, 38}, "Grayscale"}]]

Train the network:

Wolfram Language code: trained = NetTrain[net, trainingData2, ValidationSet -> testData2, MaxTrainingRounds -> 6]

If the classification network is removed, the effect of the spatial transformer can be visualized:

Wolfram Language code:

spatial = NetTake[trained, {NetPort["Input"], "transformer"}];
spatial = NetReplacePart[spatial, "Output" -> NetDecoder[{"Image", "Grayscale"}]]

Apply the spatial transformer to some images from the validation set:

Wolfram Language code: sample = Keys@RandomSample[testData2, 5]

Wolfram Language code: spatial[sample]
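
The affine parameters predicted for each image can be inspected in a similar way. This is a sketch that assumes NetTake can cut the trained graph at the "localizer" layer just as it does at "transformer" above; paramNet is a hypothetical name:

Wolfram Language code:

(* keep only the part of the trained net from the input up to the localization net *)
paramNet = NetTake[trained, {NetPort["Input"], "localizer"}];
(* one vector of six parameters per sample image *)
paramNet[sample]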

Obtain the accuracy of the network on the validation set:

Wolfram Language code: NetMeasurements[trained, testData2, "Accuracy"]

Properties & Relations (1)

Apply an AffineTransform to the coordinates of an image using ImageTransformation:

Wolfram Language code: img = [image];

Wolfram Language code:

m = {{0.3, 0.2}, {-0.3, 0.3}};
v = {-0.02, 0};
t = AffineTransform[{m, v}];
ImageTransformation[img, t, DataRange -> {{-1, 1}, {-1, 1}}]

Construct an equivalent set of parameters for SpatialTransformationLayer:

Wolfram Language code: st = SpatialTransformationLayer[{128, 128}, "Input" -> NetEncoder["Image"], "Output" -> NetDecoder["Image"]]

Wolfram Language code: makeTrans[{{m11_, m12_}, {m21_, m22_}}, {v1_, v2_}] := {m11, -m12, -v1, -m21, m22, v2};

Wolfram Language code: st[<|"Input" -> img, "Parameters" -> makeTrans[m, v]|>]
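
As a quick check of the conversion, makeTrans maps the identity transform to the identity parameters of the layer, and only the entries m12, m21 and v1 change sign (presumably reflecting a difference in axis orientation between the two functions; this reason is an assumption):

Wolfram Language code:

(* the identity matrix with zero translation gives {1, 0, 0, 0, 1, 0} *)
makeTrans[IdentityMatrix[2], {0, 0}]
(* the matrix m and vector v defined above give {0.3, -0.2, 0.02, 0.3, 0.3, 0} *)
makeTrans[m, v]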
