A torch.nn.Linear module with lazy initialization.
The first sentence should lead with the use case; it's not immediately obvious what lazy initialization means unless you are already familiar with it. Something like "…with lazy initialization, useful for automatically inferring the in_features parameter of an nn.Linear module by running a sample input through the network".
After construction, networks with lazy modules should first be converted to the desired dtype and placed on the desired device. The lazy modules should then be initialized with one or more “dry runs”. These “dry runs” send inputs of the correct size, dtype, and device through the network and to each one of its lazy modules. After this the network can be used as usual.
Why one or more dry runs instead of just one? It's not clear. Presumably a single dry run suffices only if it reaches every lazy module, and a network with data-dependent control flow might need several inputs to exercise them all; the docs should spell that out.
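For reference, the workflow the quoted paragraph describes looks roughly like this (a minimal sketch using nn.LazyLinear; the layer sizes and dtype are made up for illustration):

```python
import torch
import torch.nn as nn

# A network whose first layer infers in_features lazily.
net = nn.Sequential(nn.LazyLinear(32), nn.ReLU(), nn.Linear(32, 10))

# 1. Convert dtype / move to device first; lazy modules only defer
#    shape inference, so the usual conversion calls still apply.
net = net.to(dtype=torch.float64)

# 2. "Dry run": one forward pass with a correctly shaped input
#    materializes the uninitialized parameters.
net(torch.zeros(4, 128, dtype=torch.float64))

# 3. The lazy layer now has a concrete (32, 128) weight and the
#    network can be used as usual.
print(net[0].weight.shape)  # torch.Size([32, 128])
```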
A final caveat when using lazy modules is that the order of initialization of a network’s parameters may change, since the lazy modules are always initialized after other modules. This can cause the parameters of a network using lazy modules to be initialized differently than the parameters of a network without lazy modules. For example, if the LazyMLP class defined above had a torch.nn.LazyLinear module first and then a regular torch.nn.Linear second, the second module would be initialized on construction and the first module would be initialized during the first dry run.
it's not clear why I would care about the order of initialization. I think this is getting at reproducibility: random parameter initialization consumes the RNG stream in initialization order, so the same seed can yield different weights once lazy modules are involved. The docs should say that explicitly and link to the existing determinism/reproducibility page.
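A concrete illustration of that RNG-ordering effect might help here; a sketch along these lines (the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

def eager_net():
    torch.manual_seed(0)
    return nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 2))

def lazy_net():
    torch.manual_seed(0)
    net = nn.Sequential(nn.LazyLinear(4), nn.Linear(4, 2))
    # The dry run initializes the lazy layer *after* the regular
    # Linear(4, 2), so the RNG stream is consumed in a different order.
    net(torch.zeros(1, 8))
    return net

a, b = eager_net(), lazy_net()
# Same seed, same architecture, different weights:
print(torch.equal(a[0].weight, b[0].weight))  # False
```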
Lazy modules can also load regular torch.nn.Parameter instances, which replace their torch.nn.UninitializedParameter placeholders:
maybe say something like "load regular nn.Parameters (i.e. you can serialize/deserialize initialized lazy modules and they will remain initialized)".
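The serialize/deserialize point could be demonstrated directly; a minimal sketch (sizes made up):

```python
import io
import torch
import torch.nn as nn

# Initialize a lazy module with a dry run...
src = nn.LazyLinear(4)
src(torch.zeros(1, 8))  # weight materializes as a regular (4, 8) Parameter

# ...save its state dict, then load it into a fresh, uninitialized copy.
buf = io.BytesIO()
torch.save(src.state_dict(), buf)
buf.seek(0)

dst = nn.LazyLinear(4)
dst.load_state_dict(torch.load(buf))

# dst is now initialized without ever seeing a dry run itself.
print(dst.weight.shape)  # torch.Size([4, 8])
```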
Note, however, that lazy modules cannot validate that the shape of parameters they load is correct.
It's not clear what this means. I read it as: an uninitialized lazy module has no expected shape yet, so it will accept a loaded parameter of any shape, and a mismatched checkpoint won't be caught until the first forward pass. If that's right, the docs should say it directly.
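If that reading is right, the caveat can be shown in a few lines (the 999 below is a deliberately wrong in_features):

```python
import torch
import torch.nn as nn

dst = nn.LazyLinear(4)
# Load a weight whose in_features (999) doesn't match the intended
# network. The uninitialized lazy module has nothing to compare shapes
# against, so this succeeds without complaint:
dst.load_state_dict({"weight": torch.randn(4, 999), "bias": torch.randn(4)})
print(dst.weight.shape)  # torch.Size([4, 999])
# The mismatch only surfaces later: dst(torch.zeros(1, 8)) would
# raise a shape error on the first real forward pass.
```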
After construction, networks with lazy modules should first be converted to the desired dtype and placed on the desired device
Probably useful to explain what's going on here, i.e. something like "…placed on the desired device. This is because lazy modules only perform shape inference; the usual dtype and device placement behavior applies."
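That is, the conversion calls act on the UninitializedParameter itself, before any shape is known; a quick sketch (sizes made up):

```python
import torch
import torch.nn as nn

lazy = nn.LazyLinear(4)
# The weight is still an UninitializedParameter here, but it already
# carries dtype/device state, so the usual conversions apply:
lazy = lazy.double()

# The dry run then materializes the weight with the converted dtype.
lazy(torch.zeros(2, 8, dtype=torch.float64))
print(lazy.weight.dtype)  # torch.float64
```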