import itertools
from typing import Any, Optional, Protocol

import torch
from ..parameter import is_lazy


__all__ = ["LazyModuleMixin"]


class _LazyProtocol(Protocol):
    """This class is used to avoid errors with mypy checks for the attributes in a mixin.

    https://mypy.readthedocs.io/en/latest/more_types.html#mixin-classes
    """

    def _register_load_state_dict_pre_hook(self, hook): ...

    def register_forward_pre_hook(self, hook, *, prepend=False, with_kwargs=False): ...

    def _lazy_load_hook(
        self,
        state_dict,
        prefix,
        local_metadata,
        strict,
        missing_keys,
        unexpected_keys,
        error_msgs,
    ): ...

    def _get_name(self): ...

    def _infer_parameters(self, module, input): ...

    @property
    def _parameters(self): ...

    @property
    def _buffers(self): ...

    @property
    def _non_persistent_buffers_set(self): ...

    @property
    def _load_hook(self): ...

    @property
    def _initialize_hook(self): ...


class LazyModuleMixin:
    r"""A mixin for modules that lazily initialize parameters, also known as "lazy modules".

    .. warning:
        Lazy modules are an experimental new feature under active development,
        and their API is likely to change.

    Modules that lazily initialize parameters, or "lazy modules",
    derive the shapes of their parameters from the first input(s)
    to their forward method. Until that first forward they contain
    :class:`torch.nn.UninitializedParameter` s that should not be accessed
    or used, and afterward they contain regular :class:`torch.nn.Parameter` s.
    Lazy modules are convenient since they don't require computing some
    module arguments, like the :attr:`in_features` argument of
    a typical :class:`torch.nn.Linear`.

    After construction, networks with lazy modules should first
    be converted to the desired dtype and placed on the expected device.
    This is because lazy modules only perform shape inference so the usual
    dtype and device placement behavior applies.
    The lazy modules should then perform "dry runs" to initialize all the
    components in the module. These "dry runs" send inputs of the correct
    size, dtype, and device through the network and to each one of its
    lazy modules. After this the network can be used as usual.

    >>> # xdoctest: +SKIP
    >>> class LazyMLP(torch.nn.Module):
    ...     def __init__(self) -> None:
    ...         super().__init__()
    ...         self.fc1 = torch.nn.LazyLinear(10)
    ...         self.relu1 = torch.nn.ReLU()
    ...         self.fc2 = torch.nn.LazyLinear(1)
    ...         self.relu2 = torch.nn.ReLU()
    ...
    ...     def forward(self, input):
    ...         x = self.relu1(self.fc1(input))
    ...         y = self.relu2(self.fc2(x))
    ...         return y
    >>> # constructs a network with lazy modules
    >>> lazy_mlp = LazyMLP()
    >>> # transforms the network's device and dtype
    >>> # NOTE: these transforms can and should be applied after construction and before any 'dry runs'
    >>> lazy_mlp = lazy_mlp.cuda()
    >>> lazy_mlp
    LazyMLP(
      (fc1): LazyLinear(in_features=0, out_features=10, bias=True)
      (relu1): ReLU()
      (fc2): LazyLinear(in_features=0, out_features=1, bias=True)
      (relu2): ReLU()
    )
    >>> # performs a dry run to initialize the network's lazy modules
    >>> lazy_mlp(torch.ones(10, 10).cuda())
    >>> # after initialization, LazyLinear modules become regular Linear modules
    >>> lazy_mlp
    LazyMLP(
      (fc1): Linear(in_features=10, out_features=10, bias=True)
      (relu1): ReLU()
      (fc2): Linear(in_features=10, out_features=1, bias=True)
      (relu2): ReLU()
    )
    >>> # attaches an optimizer, since parameters can now be used as usual
    >>> optim = torch.optim.SGD(lazy_mlp.parameters(), lr=0.01)

    A final caveat when using lazy modules is that the order of initialization of a network's
    parameters may change, since the lazy modules are always initialized after other modules.
    For example, if the LazyMLP class defined above had a :class:`torch.nn.LazyLinear` module
    first and then a regular :class:`torch.nn.Linear` second, the second module would be
    initialized on construction and the first module would be initialized during the first
    dry run. This can cause the parameters of a network using lazy modules to be initialized
    differently than the parameters of a network without lazy modules as the order of parameter
    initializations, which often depends on a stateful random number generator, is different.
    Check :doc:`/notes/randomness` for more details.

    Lazy modules can be serialized with a state dict like other modules. For example:

    >>> lazy_mlp = LazyMLP()
    >>> # The state dict shows the uninitialized parameters
    >>> lazy_mlp.state_dict()
    OrderedDict({'fc1.weight': <UninitializedParameter>,
                 'fc1.bias': <UninitializedParameter>,
                 'fc2.weight': <UninitializedParameter>,
                 'fc2.bias': <UninitializedParameter>})

    Lazy modules can load regular :class:`torch.nn.Parameter` s (i.e. you can
    serialize/deserialize initialized LazyModules and they will remain initialized)

    >>> full_mlp = LazyMLP()
    >>> # Dry run to initialize another module
    >>> full_mlp.forward(torch.ones(10, 1))
    >>> # Load an initialized state into a lazy module
    >>> lazy_mlp.load_state_dict(full_mlp.state_dict())
    >>> # The state dict now holds valid values
    >>> lazy_mlp.state_dict()
    OrderedDict([('fc1.weight',
                  tensor([[-0.3837],
                          [ 0.0907],
                          [ 0.6708],
                          [-0.5223],
                          [-0.9028],
                          [ 0.2851],
                          [-0.4537],
                          [ 0.6813],
                          [ 0.5766],
                          [-0.8678]])),
                 ('fc1.bias',
                  tensor([-1.8832e+25,  4.5636e-41, -1.8832e+25,  4.5636e-41, -6.1598e-30,
                           4.5637e-41, -1.8788e+22,  4.5636e-41, -2.0042e-31,  4.5637e-41])),
                 ('fc2.weight',
                  tensor([[ 0.1320,  0.2938,  0.0679,  0.2793,  0.1088, -0.1795, -0.2301,
                            0.2807,  0.2479,  0.1091]])),
                 ('fc2.bias', tensor([0.0019]))])

    Note, however, that the loaded parameters will not be replaced when doing a "dry run"
    if they are initialized when the state is loaded. This prevents using initialized
    modules in different contexts.
    """

    # Modules inheriting from this mixin change their __class__ to the class
    # named here once they are fully initialized (e.g. a lazy module becoming
    # its regular counterpart).
    cls_to_become: Optional[type[Any]] = None

    def __init__(self: _LazyProtocol, *args, **kwargs):
        # Mypy doesn't like this super call in a mixin
        super().__init__(*args, **kwargs)  # type: ignore[misc]
        self._load_hook = self._register_load_state_dict_pre_hook(self._lazy_load_hook)
        self._initialize_hook = self.register_forward_pre_hook(
            self._infer_parameters, with_kwargs=True
        )

    def _save_to_state_dict(self: _LazyProtocol, destination, prefix, keep_vars):
        # Uninitialized (lazy) parameters and buffers are stored as-is, since
        # they cannot be detached; initialized ones follow the regular
        # detach-unless-keep_vars behavior.
        for name, param in self._parameters.items():
            if param is not None:
                if not (is_lazy(param) or keep_vars):
                    param = param.detach()
                destination[prefix + name] = param
        for name, buf in self._buffers.items():
            if buf is not None and name not in self._non_persistent_buffers_set:
                if not (is_lazy(buf) or keep_vars):
                    buf = buf.detach()
                destination[prefix + name] = buf

    def _lazy_load_hook(
        self: _LazyProtocol,
        state_dict,
        prefix,
        local_metadata,
        strict,
        missing_keys,
        unexpected_keys,
        error_msgs,
    ):
        """load_state_dict pre-hook function for lazy buffers and parameters.

        The purpose of this hook is to adjust the current state and/or
        ``state_dict`` being loaded so that a module instance serialized in
        both un/initialized state can be deserialized onto both un/initialized
        module instance.
        See comment in ``torch.nn.Module._register_load_state_dict_pre_hook``
        for the details of the hook specification.
        """
        for name, param in itertools.chain(
            self._parameters.items(), self._buffers.items()
        ):
            key = prefix + name
            if key in state_dict and param is not None:
                input_param = state_dict[key]
                if is_lazy(param):
                    # The current parameter is not initialized but the one
                    # being loaded is: materialize the uninitialized parameter
                    # with the shape of the loaded one.
                    if not is_lazy(input_param):
                        with torch.no_grad():
                            param.materialize(input_param.shape)

    def initialize_parameters(self: _LazyProtocol, *args, **kwargs):
        """Initialize parameters according to the input batch properties.

        This adds an interface to isolate parameter initialization from the
        forward pass when doing parameter shape inference.
        """
        raise NotImplementedError(
            f"initialize_parameters is not implemented for {self.__class__.__name__}"
        )
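    # --- Editor's illustration (not part of the upstream module) -------------
    # A minimal sketch of how a subclass might implement the interface above.
    # `MyLazyLinear` and its init scheme (kaiming_uniform_/zeros_) are
    # illustrative assumptions, not the implementation of torch.nn.LazyLinear.
    #
    #     class MyLazyLinear(LazyModuleMixin, torch.nn.Module):
    #         # Once the parameters are materialized, instances are converted
    #         # to a plain torch.nn.Linear via `cls_to_become`.
    #         cls_to_become = torch.nn.Linear
    #
    #         def __init__(self, out_features: int) -> None:
    #             super().__init__()
    #             self.in_features = 0
    #             self.out_features = out_features
    #             self.weight = torch.nn.UninitializedParameter()
    #             self.bias = torch.nn.UninitializedParameter()
    #
    #         def initialize_parameters(self, input) -> None:
    #             # Called by the forward pre-hook registered in __init__ with
    #             # the first batch seen by forward(); shapes come from it.
    #             if is_lazy(self.weight):
    #                 with torch.no_grad():
    #                     self.in_features = input.shape[-1]
    #                     self.weight.materialize((self.out_features, self.in_features))
    #                     self.bias.materialize((self.out_features,))
    #                     torch.nn.init.kaiming_uniform_(self.weight)
    #                     torch.nn.init.zeros_(self.bias)
    #
    #         def forward(self, input):
    #             return torch.nn.functional.linear(input, self.weight, self.bias)
    # --------------------------------------------------------------------------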