Wolfram Language & System Documentation Center

BatchNormalizationLayer

BatchNormalizationLayer[] represents a trainable net layer that normalizes its input data by learning the data mean and variance.

Details and Options

BatchNormalizationLayer is typically used inside NetChain, NetGraph, etc. to regularize and speed up network training.
The following optional parameters can be included:

"Epsilon"	0.001`	stability parameter
Interleaving	False	the position of the channel dimension
"Momentum"	0.9	momentum used during training

With the setting InterleavingFalse, the channel dimension is taken to be the first dimension of the input and output arrays.
With the setting InterleavingTrue, the channel dimension is taken to be the last dimension of the input and output arrays.
The following learnable arrays can be included:

"Biases"	Automatic	learnable bias array
"MovingMean"	Automatic	moving estimate of the mean
"MovingVariance"	Automatic	moving estimate of the variance
"Scaling"	Automatic	learnable scaling array

With Automatic settings, the biases, scaling, moving mean and moving variance arrays are initialized automatically when NetInitialize or NetTrain is used.
The following training parameter can be included:

LearningRateMultipliers	Automatic	learning rate multipliers for the arrays

BatchNormalizationLayer freezes the values of "MovingVariance" and "MovingMean" during training with NetTrain if LearningRateMultipliers is 0 or "Momentum" is 1.
If biases, scaling, moving variance and moving mean have been set, BatchNormalizationLayer[…][input] explicitly computes the output from applying the layer.
BatchNormalizationLayer[…][{input₁,input₂,…}] explicitly computes outputs for each of the inputᵢ.
When given a NumericArray as input, the output will be a NumericArray.
BatchNormalizationLayer exposes the following ports for use in NetGraph etc.:

"Input"	a vector, matrix or higher-rank array
"Output"	a vector, matrix or higher-rank array

When it cannot be inferred from other layers in a larger net, the option "Input"->{n₁,n₂,…} can be used to fix the input dimensions of BatchNormalizationLayer.
NetExtract can be used to extract biases, scaling, moving variance and moving mean arrays from a BatchNormalizationLayer object.
Options[BatchNormalizationLayer] gives the list of default options to construct the layer. Options[BatchNormalizationLayer[…]] gives the list of default options to evaluate the layer on some data.
Information[BatchNormalizationLayer[…]] gives a report about the layer.
Information[BatchNormalizationLayer[…],prop] gives the value of the property prop of BatchNormalizationLayer[…]. Possible properties are the same as for NetGraph.

Examples

Basic Examples (2)

Create a BatchNormalizationLayer:

Wolfram Language code: BatchNormalizationLayer[]

Create an initialized BatchNormalizationLayer that takes a vector and returns a vector:

Wolfram Language code: batchnorm = NetInitialize@BatchNormalizationLayer["Input" -> 3]

Apply the layer to an input vector:

Wolfram Language code: batchnorm[{1, 2, 3}]

Scope (4)

Ports (2)

Create an initialized BatchNormalizationLayer that takes a rank-3 array and returns a rank-3 array:

Wolfram Language code: batchnorm = NetInitialize@BatchNormalizationLayer["Input" -> {2, 3, 3}]

Apply the layer to a random rank-3 array:

Wolfram Language code: batchnorm[RandomReal[1, {2, 3, 3}]]//Normal//MatrixForm
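
In the example above the channel dimension is the first one, of size 2. As a minimal sketch of the Interleaving option described under Details and Options, the following layer instead treats the last dimension as the channel dimension. The variable name interleaved and the input size {3, 3, 2} are chosen only for illustration; the "Scaling" array should then have one entry per channel:

Wolfram Language code:

(* illustrative input size; with Interleaving -> True the last dimension (2) is the channel dimension *)
interleaved = NetInitialize@BatchNormalizationLayer["Input" -> {3, 3, 2}, Interleaving -> True];
(* inspect the shape of the learnable scaling array *)
Dimensions[NetExtract[interleaved, "Scaling"]]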

Create an initialized BatchNormalizationLayer that takes a vector and returns a vector:

Wolfram Language code: batchnorm = NetInitialize@BatchNormalizationLayer["Input" -> 3]

Apply the layer to a batch of input vectors:

Wolfram Language code: batchnorm[{{1, 2, 3}, {4, 0.2, 3}}]

Set NetEvaluationMode to "Train" to evaluate the layer with its training behavior:

Wolfram Language code: batchnorm[{{1, 2, 3}, {4, 0.2, 3}}, NetEvaluationMode -> "Train"]

Parameters (2)

"Biases" (1)

Create a BatchNormalizationLayer with an initial value for the "Biases" parameter:

Wolfram Language code: batchnorm = BatchNormalizationLayer["Biases" -> {-1, 3.4}]

Extract the "Biases" parameter:

Wolfram Language code: NetExtract[batchnorm, "Biases"]

The default value for "Biases" chosen by NetInitialize is a zero vector:

Wolfram Language code:

batchnorm = NetInitialize@BatchNormalizationLayer["Input" -> 2];
NetExtract[batchnorm, "Biases"]

"Scaling" (1)

Create an initialized BatchNormalizationLayer with the "Scaling" parameter set to zero and the "Biases" parameter set to a custom value:

Wolfram Language code: batchnorm = NetInitialize@BatchNormalizationLayer["Scaling" -> {0, 0, 0}, "Biases" -> {1.3, -22.1, 1.2}]

Applying the layer to any input returns the value for the "Biases" parameter:

Wolfram Language code: batchnorm[{1, 2, 3}]

Wolfram Language code: batchnorm[{-3.4, 2.3, 100}]

The default value for "Scaling" chosen by NetInitialize is a vector of 1s:

Wolfram Language code:

batchnorm = NetInitialize@BatchNormalizationLayer["Input" -> 2];
NetExtract[batchnorm, "Scaling"]

Options (2)

"Epsilon" (1)

Create a BatchNormalizationLayer with the "Epsilon" parameter explicitly specified:

Wolfram Language code: batchnorm = BatchNormalizationLayer["Epsilon" -> 0.1]

Extract the "Epsilon" parameter:

Wolfram Language code: NetExtract[batchnorm, "Epsilon"]

"Momentum" (1)

Create a BatchNormalizationLayer with the "Momentum" parameter explicitly specified:

Wolfram Language code: batchnorm = BatchNormalizationLayer["Momentum" -> 0.1]

Extract the "Momentum" parameter:

Wolfram Language code: NetExtract[batchnorm, "Momentum"]
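
As noted under Details and Options, setting "Momentum" to 1 freezes the moving estimates during training. The following is a rough sketch of how this could be checked; the two-example dataset and the names data, frozen and trained are assumptions made only for illustration. If the moving mean is frozen, the final comparison should give True:

Wolfram Language code:

(* illustrative toy dataset, not taken from a real task *)
data = {{0, 1} -> {1, 0}, {1, 0} -> {0, 1}};
(* a layer whose moving estimates are expected to stay fixed *)
frozen = NetInitialize@BatchNormalizationLayer["Input" -> 2, "Momentum" -> 1];
trained = NetTrain[frozen, data, MaxTrainingRounds -> 5];
(* compare the moving mean before and after training *)
Normal[NetExtract[frozen, "MovingMean"]] == Normal[NetExtract[trained, "MovingMean"]]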

Applications (1)

BatchNormalizationLayer is commonly inserted between a ConvolutionLayer and its activation function in order to stabilize and speed up training:

Wolfram Language code: NetChain[{ConvolutionLayer[3, {3, 3}], BatchNormalizationLayer["Input" -> {3, 28, 28}], ElementwiseLayer[Ramp]}]

Properties & Relations (1)

During ordinary evaluation, BatchNormalizationLayer computes the following function:

Wolfram Language code:

batchNormFunction = Function[Block[{sd = Sqrt[#MovingVariance + #Epsilon]}, 
	(#2 * #Scaling / sd ) + (#Biases - (#Scaling * #MovingMean) / sd)]];

Evaluate a BatchNormalizationLayer on an example vector containing a single channel:

Wolfram Language code:

params = <|"Scaling" -> {3}, "Biases" -> {2}, "MovingMean" -> {1}, "MovingVariance" -> {2}, "Epsilon" -> 0.001|>;
layer = BatchNormalizationLayer@@Normal[params]

Wolfram Language code: layer[{5}]//Normal

Manually compute the same result:

Wolfram Language code: batchNormFunction[params, {5}]
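
For reference, batchNormFunction can also be written as a single equation. The symbols are notation introduced only for this note: x is the input, γ stands for "Scaling", β for "Biases", μ for "MovingMean", σ² for "MovingVariance" and ε for "Epsilon":

\[
y = \gamma \, \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta
\]

With the parameters used above and x = 5, this works out to:

\[
y = 3 \cdot \frac{5 - 1}{\sqrt{2 + 0.001}} + 2 \approx 10.483
\]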

Possible Issues (3)

Specifying negative values for the "MovingVariance" parameter causes numerical errors during evaluation:

Wolfram Language code: batchnorm = NetInitialize[BatchNormalizationLayer["Input" -> {1, 2, 2}, "MovingVariance" -> {-2}]]

Wolfram Language code: batchnorm[RandomReal[1, {1, 2, 2}]]
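
One way to see where the error comes from, assuming the layer evaluates the function shown under Properties & Relations: the denominator Sqrt[MovingVariance + Epsilon] is no longer a real number once the variance falls below -Epsilon. With the variance -2 used above and the default "Epsilon" of 0.001, the square root is imaginary:

Wolfram Language code: Sqrt[-2 + 0.001]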

BatchNormalizationLayer cannot be initialized until all its input and output dimensions are known:

Wolfram Language code: NetInitialize@BatchNormalizationLayer[]

Wolfram Language code: NetInitialize@BatchNormalizationLayer["Input" -> 3]
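
Inside a larger net the dimensions can usually be inferred from neighboring layers, so the "Input" option can be left out. A minimal sketch, where the chain and the name net are chosen only for illustration; the "Biases" array of the batch normalization layer should take its size from the output of the preceding LinearLayer:

Wolfram Language code:

(* the LinearLayer output size (3) fixes the input size of the BatchNormalizationLayer *)
net = NetInitialize@NetChain[{LinearLayer[3], BatchNormalizationLayer[]}, "Input" -> 2];
(* extract the biases of the second layer and inspect their shape *)
Dimensions[NetExtract[net, {2, "Biases"}]]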

The "MovingMean" and "MovingVariance" arrays of BatchNormalizationLayer cannot be shared:

Wolfram Language code: BatchNormalizationLayer["MovingMean" -> NetArray["MovingMean"]]

Create a BatchNormalizationLayer with shared arrays:

Wolfram Language code: sharedBatchNorm = NetInsertSharedArrays[BatchNormalizationLayer[]]

Train it on some data:

Wolfram Language code: net = NetTrain[NetChain[{2, sharedBatchNorm, 2, sharedBatchNorm, 2}], {{0, 1} -> {1, 0}, {1, 0} -> {0, 1}}]

Extract the trained batch normalization layers:

Wolfram Language code: {batchnorm1, batchnorm2} = NetExtract[net, {{2}, {4}}]

The "Scaling" and "Biases" arrays were shared, but not "MovingMean" or "MovingVariance":

Wolfram Language code: Normal /@ Information[batchnorm1, "Arrays"]

Wolfram Language code: Normal /@ Information[batchnorm2, "Arrays"]
