Wolfram Language & System Documentation Center

LearningRateMultipliers

is an option for net layers and for NetTrain, NetChain and NetGraph that specifies learning rate multipliers to apply during training.

Details

With the default value of LearningRateMultipliers->Automatic, all layers learn at the same rate.
LearningRateMultipliers->{rule₁,rule₂,…} specifies a set of rules that will be used to determine learning rate multipliers for every trainable array in the net.
In LearningRateMultipliers->{rule₁,rule₂,…}, each ruleᵢ can take one of the following forms:

	"part"->r	use multiplier r for a named layer, subnetwork or array in a layer
	n->r	use multiplier r for the nth layer
	m;;n->r	use multiplier r for layers m through n
	{part₁,part₂,…}->r	use multiplier r for a nested layer or array
	_->r	use multiplier r for all layers

LearningRateMultipliersr specifies using the same multiplier r for all trainable arrays.
If r is zero or None, it specifies that the layer or array should not undergo training and will be left unchanged by NetTrain.
If r is a positive or negative number, it specifies a multiplier to apply to the global learning rate chosen by the training method to determine the learning rate for the given layer or array.
For each trainable array, the rate used is given by the first matching rule, or 1 if no rule matches.
Rules that specify a subnet (e.g. a nested NetChain or NetGraph) apply to all layers and arrays within that subnet.
LearningRateMultipliers->{part->None} can be used to "freeze" a specific part.
LearningRateMultipliers->{part->1,_->None} can be used to "freeze" all layers except for a specific part.
The hierarchical specification {part₁,part₂,…} used by LearningRateMultipliers to refer to parts of a net is equivalent to that used by NetExtract and NetReplacePart.
Information[net,"ArraysLearningRateMultipliers"] yields the default learning rate multipliers for all arrays of a net.
The multipliers actually used during training can be obtained from a NetTrainResultsObject via the property "ArraysLearningRateMultipliers".
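
As an illustrative sketch of how rule order matters (the net below and the multiplier values are assumptions chosen for this example), the following rules should freeze the biases of the first layer, halve the rate of its remaining arrays and leave every other array at 1, because the first matching rule is used for each array:

Wolfram Language code: ruleNet = NetChain[{LinearLayer[3], Ramp, LinearLayer[{}]}, "Input" -> "Real", "Output" -> "Real", LearningRateMultipliers -> {{1, "Biases"} -> None, 1 -> 0.5, _ -> 1}]

Wolfram Language code: Information[ruleNet, "ArraysLearningRateMultipliers"]

Listing the rule 1 -> 0.5 first would instead match the biases of the first layer as well, so the more specific rule is placed ahead of it.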

Examples


Basic Examples (2)

Create and initialize a net with three layers, but train only the last layer:

Wolfram Language code: net = NetInitialize@NetChain[{LinearLayer[3], Ramp, LinearLayer[{}]}, "Input" -> "Real", "Output" -> "Real"]

Wolfram Language code: trained = NetTrain[net, {1 -> 1.9, 2 -> 4.1, 3 -> 6.0, 4 -> 8.1}, LearningRateMultipliers -> {3 -> 1, _ -> None}]

The biases of the first layer remain unmodified in the trained net:

Wolfram Language code: NetExtract[net, {1, "Biases"}] == NetExtract[trained, {1, "Biases"}]

The biases of the third layer have been trained:

Wolfram Language code: NetExtract[net, {3, "Biases"}] == NetExtract[trained, {3, "Biases"}]

Create a frozen layer with given array values:

Wolfram Language code: frozen = LinearLayer[3, "Weights" -> {{1}, {2}, {3}}, "Biases" -> {-1, -2, -3}, LearningRateMultipliers -> None]

Nest this layer inside a bigger net:

Wolfram Language code: net = NetChain[{frozen, Ramp, LinearLayer[{}]}, "Input" -> "Real", "Output" -> "Real"]

Train the net:

Wolfram Language code: trained = NetTrain[net, {1 -> 1.9, 2 -> 4.1, 3 -> 6.0, 4 -> 8.1}]

The arrays of the frozen layer were unchanged during training:

Wolfram Language code: Normal@NetExtract[trained, {{1, "Weights"}, {1, "Biases"}}]
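
A rule that names a subnetwork applies to every array inside it. As a minimal sketch, assuming a chain with the hypothetical layer names "encoder" and "head", the whole nested chain can be frozen with a single rule and the resulting multipliers inspected:

Wolfram Language code: nested = NetInitialize@NetChain[<|"encoder" -> NetChain[{LinearLayer[3], Ramp}], "head" -> LinearLayer[{}]|>, "Input" -> "Real", "Output" -> "Real"]

Wolfram Language code: NetTrain[nested, {1 -> 1.9, 2 -> 4.1, 3 -> 6.0, 4 -> 8.1}, "ArraysLearningRateMultipliers", LearningRateMultipliers -> {"encoder" -> None}]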

Scope (1)

Replace LearningRateMultipliers in a Network (1)

Take a net:

Wolfram Language code: net = NetInitialize@NetChain[{LinearLayer[3], Ramp, LinearLayer[{}]}, "Input" -> "Real", "Output" -> "Real"]

Set the LearningRateMultipliers of the first layer of this net to zero:

Wolfram Language code: fnet = NetReplacePart[net, {1, LearningRateMultipliers} -> 0]

Programmatically check the value of the LearningRateMultipliers option in each net:

Wolfram Language code: NetExtract[net, {1, LearningRateMultipliers}]

Wolfram Language code: NetExtract[fnet, {1, LearningRateMultipliers}]
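
As a further check, sketched here on the assumption that the training data from the basic examples is reused, training fnet should leave the arrays of its first layer unchanged, so the comparison is expected to hold:

Wolfram Language code: ftrained = NetTrain[fnet, {1 -> 1.9, 2 -> 4.1, 3 -> 6.0, 4 -> 8.1}]

Wolfram Language code: NetExtract[fnet, {1, "Weights"}] == NetExtract[ftrained, {1, "Weights"}]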

Applications (1)

Train an existing network to solve a new task. Obtain a pre-trained convolutional model that was trained on handwritten digits:

Wolfram Language code: lenet = NetModel["LeNet Trained on MNIST Data"]

Remove the final two layers, and attach two new layers, in order to classify images into 3 classes:

Wolfram Language code:

net = NetJoin[
	NetDrop[lenet, -2], 
	NetChain[{LinearLayer[], SoftmaxLayer[]}]];
net = NetReplacePart[net, "Output" -> NetDecoder[{"Class", {"x", "y", "z"}}]]

Generate training data by rasterizing the characters "x", "y", and "z" with a variety of fonts, sizes, and cases:

Wolfram Language code:

letterImage[str_, size_, style_, font_] := Rasterize[Style[str, style, FontSize -> size, FontFamily -> font], "Image", ImageSize -> {28, 28}];

Wolfram Language code:

trainingData = Table[
	letterImage[case[class], size, style, font] -> class, 
	{class, {"x", "y", "z"}}, 
	{font, {"Courier", "Helvetica", "Times New Roman"}}, 
	{style, {Plain, Italic, Bold}}, 
	{size, {7, 8, 9, 10}}, 
	{case, {ToLowerCase, ToUpperCase}}
	]//Flatten;

Wolfram Language code: Length[trainingData]

Wolfram Language code: RandomSample[trainingData, 10]

Train the modified network on the new task:

Wolfram Language code:

trained = NetTrain[net, trainingData, TimeGoal -> 10, LearningRateMultipliers -> {-2 -> 1, _ -> None}, ValidationSet -> Scaled[0.1]]

Classify an unseen letter:

Wolfram Language code: x = letterImage["x", 11, Italic, "Arial Narrow"]

Wolfram Language code: trained[x, "Probabilities"]

Measure the performance on the original training data, which includes both the training and validation sets:

Wolfram Language code: NetMeasurements[trained, trainingData, "Accuracy"]

Properties & Relations (1)

Train LeNet on the MNIST dataset with specific learning rate multipliers, returning a NetTrainResultsObject:

Wolfram Language code: results = NetTrain[NetModel["LeNet"], ResourceData["MNIST"], All, LearningRateMultipliers -> {4 ;; -> 2}]

Obtain the actual learning rate multipliers used on individual weight arrays:

Wolfram Language code: results["ArraysLearningRateMultipliers"]

Possible Issues (1)

When a shared array occurs in several places in the network, a single learning rate multiplier is applied to all occurrences of the shared array.

Create a network with shared arrays:

Wolfram Language code:

sharedlayer = NetInsertSharedArrays[LinearLayer[{}, "Input" -> "Real"]];
net = NetChain[{sharedlayer, Tanh, sharedlayer}]

Specifying a learning rate multiplier for a shared array in the network assigns the same multiplier to every place where the array occurs:

Wolfram Language code:

NetTrain[net, {1 -> 0, 0 -> 1}, "ArraysLearningRateMultipliers", TimeGoal -> 0.01, LearningRateMultipliers -> {{1, "Weights"} -> None}]

If there is a conflict, the first matching value will be used:

Wolfram Language code:

NetTrain[net, {1 -> 0, 0 -> 1}, "ArraysLearningRateMultipliers", TimeGoal -> 0.01, LearningRateMultipliers -> {{1, "Weights"} -> 0, {3, "Weights"} -> 2}]

The same happens when LearningRateMultipliers is specified when constructing the network:

Wolfram Language code:

sharedlayer = NetInsertSharedArrays[LinearLayer[{}, "Input" -> "Real"]];
net2 = NetChain[{sharedlayer, Tanh, sharedlayer}, LearningRateMultipliers -> {{1, "Weights"} -> 0, {3, "Weights"} -> 2}]

Wolfram Language code: Information[net2, "ArraysLearningRateMultipliers"]
