SpatialTransformationLayer
SpatialTransformationLayer[{h,w}]
represents a net layer that applies an affine transformation to an input of size c×h0×w0 and returns an output of size c×h×w.
Details and Options
- SpatialTransformationLayer exposes the following ports for use in NetGraph etc.:
  "Input"        a 3-dimensional array
  "Parameters"   a vector of length 6
  "Output"       a 3-dimensional array
- SpatialTransformationLayer[…][<|"Input"->in,"Parameters"->param|>] explicitly computes the output from applying the layer.
- SpatialTransformationLayer[…][<|"Input"->{in_1,in_2,…},"Parameters"->{param_1,param_2,…}|>] explicitly computes the output for each of the in_i and param_i.
- When given a NumericArray as input, the output will be a NumericArray.
- SpatialTransformationLayer is typically used inside NetGraph to focus the attention of a later convolutional network on the best part of the image to perform a specific task.
- When it cannot be inferred from other layers in a larger net, the option "Input"->{d1,d2,d3} can be used to fix the input dimensions of SpatialTransformationLayer.
- The six components of the vector provided to the port "Parameters", {z_h,s_h,t_h,s_v,z_v,t_v}, represent the parameters in the affine transformation matrix, where z_i represents zoom, s_i skewness and t_i translation, and the subscripts h and v indicate horizontal and vertical. The identity transformation is obtained when "Parameters" is {1,0,0,0,1,0}.
- Options[SpatialTransformationLayer] gives the list of default options to construct the layer. Options[SpatialTransformationLayer[…]] gives the list of default options to evaluate the layer on some data.
- Information[SpatialTransformationLayer[…]] gives a report about the layer.
- Information[SpatialTransformationLayer[…],prop] gives the value of the property prop of SpatialTransformationLayer[…]. Possible properties are the same as for NetGraph.
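The role of the six parameters can be pictured as a 2×3 matrix. The following is only a sketch under an assumed convention, not a statement of the layer's internals: it assumes that the matrix maps normalized output coordinates (x_out, y_out) to the input coordinates (x_in, y_in) at which the input is sampled. The normalization range and axis orientation are not specified here.

```latex
\[
\begin{pmatrix} x_{\mathrm{in}} \\ y_{\mathrm{in}} \end{pmatrix}
=
\begin{pmatrix} z_h & s_h & t_h \\ s_v & z_v & t_v \end{pmatrix}
\begin{pmatrix} x_{\mathrm{out}} \\ y_{\mathrm{out}} \\ 1 \end{pmatrix}
\]
```

With "Parameters" set to {1,0,0,0,1,0}, the matrix has ones on its diagonal and zeros elsewhere, so coordinates are left unchanged. Under the assumed direction of the mapping, z_h = z_v = 0.5 samples only half of the input extent, which is consistent with the value 0.5 producing a factor-2 zoom.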
Examples
Basic Examples (2)
Create a SpatialTransformationLayer with output size 30×30:
SpatialTransformationLayer[{30, 30}]
Create a SpatialTransformationLayer that expects an input of size 1×3×3 and returns an output of size 1×2×2:
transform = SpatialTransformationLayer[{2, 2}, "Input" -> {1, 3, 3}]
transform[<|"Parameters" -> {0.5, 0, 0, 0, 0.5, 0}, "Input" -> {{{1, 0, 0}, {0, 0, 0}, {0, 0, 1}}}|>]
Scope (1)
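As a quick check of the parameter convention, the identity parameters {1,0,0,0,1,0} can be applied to a small array. This is a minimal sketch that reuses the 1×3×3 layer specification from the basic example. The name out is introduced here for illustration only, and no particular output values are claimed, since the 3×3 input is resampled onto a 2×2 grid.

```wolfram
(* Layer taking a 1×3×3 array and returning a 1×2×2 array, as in the basic example *)
transform = SpatialTransformationLayer[{2, 2}, "Input" -> {1, 3, 3}];

(* Identity parameters: unit zoom, no skew, no translation *)
out = transform[<|
   "Input" -> {{{1, 0, 0}, {0, 0, 0}, {0, 0, 1}}},
   "Parameters" -> {1, 0, 0, 0, 1, 0}|>];

(* The result should have the dimensions requested when the layer was created *)
Dimensions[out]
```

Comparing this result with the one obtained for {0.5, 0, 0, 0, 0.5, 0} gives a feel for how the zoom entries change the sampled region.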
Create a SpatialTransformationLayer whose input is an image and whose output is an image:
spatial = SpatialTransformationLayer[{64, 64}, "Input" -> NetEncoder["Image"], "Output" -> NetDecoder["Image"]]
Apply the SpatialTransformationLayer to an image with a factor-2 zoom transformation:
img = [image];
spatial[<|"Input" -> img, "Parameters" -> {0.5, 0, 0, 0, 0.5, 0}|>]
Apply the SpatialTransformationLayer using a sequence of zooms:
spatial[<|"Input" -> Table[img, 10], "Parameters" -> Table[{i, 0, 0, 0, i, 0}, {i, 0.2, 2, 1 / 5}]|>]
Applications (1)
Train a digit recognizer on the MNIST database of handwritten digits using a convolutional neural network with a SpatialTransformationLayer. First obtain the training and test data:
resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];
Define a function to apply extra padding and random translations to the training and test data:
f[image_] := ImagePerspectiveTransformation[ImagePad[ColorNegate[image], 5], TranslationTransform[RandomReal[{-0.2, 0.2}, 2]]];
f /@ {[image], [image], [image]}
Create new training and test data using the function (this should take about a minute):
{trainingData2, testData2} = MapAt[f, {trainingData, testData}, {All, All, 1}];
RandomSample[trainingData2, 6]
Create a network that uses the image to predict the best affine transformation to apply to the image to extract the digit:
localizer = NetChain[{PoolingLayer[4, 4], FlattenLayer[], LinearLayer[6]}]
Create a convolutional classification net to use the subimage extracted by the localization net:
classifier = NetChain[{
ConvolutionLayer[16, 4], BatchNormalizationLayer[], Ramp, PoolingLayer[4], ConvolutionLayer[32, 4], BatchNormalizationLayer[], Ramp, PoolingLayer[4], LinearLayer[10], SoftmaxLayer[]},
"Output" -> NetDecoder[{"Class", Range[0, 9]}],
"Input" -> {1, 16, 16}]
Attach the classification network and the localization network to a spatial transformation layer:
net = NetGraph[<|
"localizer" -> localizer, "transformer" -> SpatialTransformationLayer[{16, 16}], "classifier" -> classifier|>,
{"localizer" -> NetPort["transformer", "Parameters"], "transformer" -> "classifier"},
"Input" -> NetEncoder[{"Image", {38, 38}, "Grayscale"}]]
trained = NetTrain[net, trainingData2, ValidationSet -> testData2, MaxTrainingRounds -> 6]
If the classification network is removed, the effect of the spatial transformer can be visualized:
spatial = NetTake[trained, {NetPort["Input"], "transformer"}];
spatial = NetReplacePart[spatial, "Output" -> NetDecoder[{"Image", "Grayscale"}]]
Apply the spatial transformer to some images from the validation set:
sample = Keys@RandomSample[testData2, 5]
spatial[sample]
Obtain the accuracy of the network on the validation set:
NetMeasurements[trained, testData2, "Accuracy"]
Properties & Relations (1)
Apply an AffineTransform to the coordinates of an image using ImageTransformation:
img = [image];
m = {{0.3, 0.2}, {-0.3, 0.3}};
v = {-0.02, 0};
t = AffineTransform[{m, v}];
ImageTransformation[img, t, DataRange -> {{-1, 1}, {-1, 1}}]
Construct an equivalent set of parameters for SpatialTransformationLayer:
st = SpatialTransformationLayer[{128, 128}, "Input" -> NetEncoder["Image"], "Output" -> NetDecoder["Image"]]
makeTrans[{{m11_, m12_}, {m21_, m22_}}, {v1_, v2_}] := {m11, -m12, -v1, -m21, m22, v2};
st[<|"Input" -> img, "Parameters" -> makeTrans[m, v]|>]
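Parameter vectors can also be generated by small helper functions in the same spirit as makeTrans. The sketch below defines zoomParams, a hypothetical helper that is not part of the Wolfram Language. It assumes that a zoom by factor z corresponds to the value 1/z in both zoom entries, generalizing the observation that 0.5 gives a factor-2 zoom. The names layer and data are likewise introduced only for illustration, and an array input is used so that the sketch does not depend on an image.

```wolfram
(* Hypothetical helper: parameters for a zoom by factor z with no skew or translation. *)
(* Assumes the zoom entries are the reciprocal of the zoom factor. *)
zoomParams[z_?Positive] := N[{1/z, 0, 0, 0, 1/z, 0}]

(* For z = 2, both zoom entries become 0.5 *)
zoomParams[2]

(* Small array-based layer: 1×8×8 input resampled to a 1×4×4 output *)
layer = SpatialTransformationLayer[{4, 4}, "Input" -> {1, 8, 8}];
data = {Partition[N[Range[64]], 8]};

(* Apply the layer with the generated parameters *)
layer[<|"Input" -> data, "Parameters" -> zoomParams[2]|>]
```

Keeping the parameter construction in a named function makes it easier to sweep over zoom factors, for example with Table, without repeating the six-element vector by hand.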
Text
Wolfram Research (2017), SpatialTransformationLayer, Wolfram Language function, https://reference.wolfram.com/language/ref/SpatialTransformationLayer.html.