BatchNormalizationLayer
represents a trainable net layer that normalizes its input data by learning the data mean and variance.
Details and Options
- BatchNormalizationLayer is typically used inside NetChain, NetGraph, etc. to regularize and speed up network training.
- The following optional parameters can be included:
  "Epsilon"      0.001   stability parameter
  Interleaving   False   the position of the channel dimension
  "Momentum"     0.9     momentum used during training
- With the setting Interleaving->False, the channel dimension is taken to be the first dimension of the input and output arrays.
- With the setting Interleaving->True, the channel dimension is taken to be the last dimension of the input and output arrays.
- The following learnable arrays can be included:
  "Biases"           Automatic   learnable bias array
  "MovingMean"       Automatic   moving estimate of the mean
  "MovingVariance"   Automatic   moving estimate of the variance
  "Scaling"          Automatic   learnable scaling array
- With Automatic settings, the biases, scaling, moving mean and moving variance arrays are initialized automatically when NetInitialize or NetTrain is used.
- The following training parameter can be included:
  LearningRateMultipliers   Automatic   learning rate multipliers for the arrays
- BatchNormalizationLayer freezes the values of "MovingVariance" and "MovingMean" during training with NetTrain if LearningRateMultipliers is 0 or "Momentum" is 1.
- If biases, scaling, moving variance and moving mean have been set, BatchNormalizationLayer[…][input] explicitly computes the output from applying the layer.
- BatchNormalizationLayer[…][{input1,input2,…}] explicitly computes an output for each input in the list.
- When given a NumericArray as input, the output will be a NumericArray.
- BatchNormalizationLayer exposes the following ports for use in NetGraph etc.:
  "Input"    a vector, matrix or higher-rank array
  "Output"   a vector, matrix or higher-rank array
- When it cannot be inferred from other layers in a larger net, the option "Input"->{n1,n2,…} can be used to fix the input dimensions of BatchNormalizationLayer.
- NetExtract can be used to extract biases, scaling, moving variance and moving mean arrays from a BatchNormalizationLayer object.
- Options[BatchNormalizationLayer] gives the list of default options to construct the layer. Options[BatchNormalizationLayer[…]] gives the list of default options to evaluate the layer on some data.
- Information[BatchNormalizationLayer[…]] gives a report about the layer.
- Information[BatchNormalizationLayer[…],prop] gives the value of the property prop of BatchNormalizationLayer[…]. Possible properties are the same as for NetGraph.
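The effect of the Interleaving setting described above can be checked by looking at the size of the learned arrays. The following is a minimal sketch, not taken from the reference page: the input dimensions {4, 4, 3} and {3, 4, 4} and the variable names interleaved and standard are illustrative assumptions. Since the arrays hold one entry per channel, both layers are expected to report arrays of length 3.

```wolfram
(* Channel-last input: a 4x4 spatial grid with 3 channels in the last dimension *)
interleaved = NetInitialize@BatchNormalizationLayer["Input" -> {4, 4, 3}, Interleaving -> True];
Dimensions[NetExtract[interleaved, "Scaling"]]

(* Channel-first input (the default Interleaving -> False) with the same 3 channels *)
standard = NetInitialize@BatchNormalizationLayer["Input" -> {3, 4, 4}];
Dimensions[NetExtract[standard, "Scaling"]]
```

Only the position of the channel dimension differs between the two layers; the number of learned parameters is the same.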
Examples
Basic Examples (2)
Create a BatchNormalizationLayer:
BatchNormalizationLayer[]
Create an initialized BatchNormalizationLayer that takes a vector and returns a vector:
batchnorm = NetInitialize@BatchNormalizationLayer["Input" -> 3]
Apply the layer to an input vector:
batchnorm[{1, 2, 3}]
Scope (4)
Ports (2)
Create an initialized BatchNormalizationLayer that takes a rank-3 array and returns a rank-3 array:
batchnorm = NetInitialize@BatchNormalizationLayer["Input" -> {2, 3, 3}]
batchnorm[RandomReal[1, {2, 3, 3}]]//Normal//MatrixForm
Create an initialized BatchNormalizationLayer that takes a vector and returns a vector:
batchnorm = NetInitialize@BatchNormalizationLayer["Input" -> 3]
Apply the layer to a batch of input vectors:
batchnorm[{{1, 2, 3}, {4, 0.2, 3}}]
Use NetEvaluationMode to use the training behavior of BatchNormalizationLayer:
batchnorm[{{1, 2, 3}, {4, 0.2, 3}}, NetEvaluationMode -> "Train"]
Parameters (2)
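The "MovingMean" and "MovingVariance" arrays listed under Details can be given initial values and extracted in the same way as "Biases" and "Scaling". A minimal sketch, with arbitrary values chosen only for illustration:

```wolfram
(* Set both moving estimates explicitly; the values are illustrative only *)
batchnorm = BatchNormalizationLayer["MovingMean" -> {0.5, -1.}, "MovingVariance" -> {4., 1.}];

(* NetExtract returns the stored arrays; Normal converts them to ordinary lists *)
Normal[NetExtract[batchnorm, "MovingMean"]]
Normal[NetExtract[batchnorm, "MovingVariance"]]
```

Keeping the variance entries positive avoids the numerical errors described under Possible Issues.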
"Biases" (1)
Create a BatchNormalizationLayer with an initial value for the "Biases" parameter:
batchnorm = BatchNormalizationLayer["Biases" -> {-1, 3.4}]
Extract the "Biases" parameter:
NetExtract[batchnorm, "Biases"]
The default value for "Biases" chosen by NetInitialize is a zero vector:
batchnorm = NetInitialize@BatchNormalizationLayer["Input" -> 2];
NetExtract[batchnorm, "Biases"]
"Scaling" (1)
Create an initialized BatchNormalizationLayer with the "Scaling" parameter set to zero and the "Biases" parameter set to a custom value:
batchnorm = NetInitialize@BatchNormalizationLayer["Scaling" -> {0, 0, 0}, "Biases" -> {1.3, -22.1, 1.2}]
Applying the layer to any input returns the value for the "Biases" parameter:
batchnorm[{1, 2, 3}]
batchnorm[{-3.4, 2.3, 100}]
The default value for "Scaling" chosen by NetInitialize is a vector of 1s:
batchnorm = NetInitialize@BatchNormalizationLayer["Input" -> 2];
NetExtract[batchnorm, "Scaling"]
Options (2)
"Epsilon" (1)
Create a BatchNormalizationLayer with the "Epsilon" parameter explicitly specified:
batchnorm = BatchNormalizationLayer["Epsilon" -> 0.1]
Extract the "Epsilon" parameter:
NetExtract[batchnorm, "Epsilon"]
"Momentum" (1)
Create a BatchNormalizationLayer with the "Momentum" parameter explicitly specified:
batchnorm = BatchNormalizationLayer["Momentum" -> 0.1]
Extract the "Momentum" parameter:
NetExtract[batchnorm, "Momentum"]
Applications (1)
BatchNormalizationLayer is commonly inserted between a ConvolutionLayer and its activation function in order to stabilize and speed up training:
NetChain[{ConvolutionLayer[3, {3, 3}], BatchNormalizationLayer["Input" -> {3, 28, 28}], ElementwiseLayer[Ramp]}]
Properties & Relations (1)
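With NetEvaluationMode -> "Train", batch normalization layers conventionally normalize each channel with the mean and variance of the current batch instead of the stored moving estimates. The following sketch reproduces that computation for a batch of vectors. The helper name trainModeBatchNorm is hypothetical, and the use of the population variance together with default "Scaling" (1) and "Biases" (0) is an assumption rather than something this page states.

```wolfram
(* Hypothetical helper: normalize each channel with the statistics of the batch itself. *)
(* Assumes default "Scaling" (1) and "Biases" (0), and the population variance. *)
trainModeBatchNorm[batch_, eps_ : 0.001] := Module[{mean, var},
  mean = Mean[batch];                    (* per-channel mean over the batch *)
  var = Mean[(# - mean)^2 & /@ batch];   (* per-channel population variance *)
  (# - mean)/Sqrt[var + eps] & /@ batch
];

batch = {{1, 2, 3}, {4, 0.2, 3}};
manual = trainModeBatchNorm[batch]

(* Compare with the layer; the difference should be small if the assumptions hold *)
batchnorm = NetInitialize@BatchNormalizationLayer["Input" -> 3];
Max[Abs[manual - batchnorm[batch, NetEvaluationMode -> "Train"]]]
```

A channel whose values are identical across the batch, such as the third one here, has zero variance, which is where the "Epsilon" stability parameter keeps the division well defined.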
During ordinary evaluation, BatchNormalizationLayer computes the following function:
batchNormFunction = Function[Block[{sd = Sqrt[#MovingVariance + #Epsilon]},
   (#2 * #Scaling / sd) + (#Biases - (#Scaling * #MovingMean) / sd)]];
Evaluate a BatchNormalizationLayer on an example vector containing a single channel:
params = <|"Scaling" -> {3}, "Biases" -> {2}, "MovingMean" -> {1}, "MovingVariance" -> {2}, "Epsilon" -> 0.001|>;
layer = BatchNormalizationLayer@@Normal[params]
layer[{5}]//Normal
Manually compute the same result:
batchNormFunction[params, {5}]
Possible Issues (3)
Specifying negative values for the "MovingVariance" parameter causes numerical errors during evaluation:
batchnorm = NetInitialize[BatchNormalizationLayer["Input" -> {1, 2, 2}, "MovingVariance" -> {-2}]]
batchnorm[RandomReal[1, {1, 2, 2}]]
BatchNormalizationLayer cannot be initialized until all its input and output dimensions are known:
NetInitialize@BatchNormalizationLayer[]
NetInitialize@BatchNormalizationLayer["Input" -> 3]
The "MovingMean" and "MovingVariance" arrays of BatchNormalizationLayer cannot be shared:
BatchNormalizationLayer["MovingMean" -> NetArray["MovingMean"]]
Create a BatchNormalizationLayer with shared arrays:
sharedBatchNorm = NetInsertSharedArrays[BatchNormalizationLayer[]]
net = NetTrain[NetChain[{2, sharedBatchNorm, 2, sharedBatchNorm, 2}], {{0, 1} -> {1, 0}, {1, 0} -> {0, 1}}]
Extract the trained batch normalization layers:
{batchnorm1, batchnorm2} = NetExtract[net, {{2}, {4}}]
The "Scaling" and "Biases" arrays were shared, but not "MovingMean" or "MovingVariance":
Normal /@ Information[batchnorm1, "Arrays"]
Normal /@ Information[batchnorm2, "Arrays"]
Wolfram Research (2016), BatchNormalizationLayer, Wolfram Language function, https://reference.wolfram.com/language/ref/BatchNormalizationLayer.html (updated 2020).