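from enum import auto, Enum
from typing import Optional

import torch
import torch.nn.functional as F
from torch import Tensor
from torch.nn.modules import Module
from torch.nn.utils import parametrize


__all__ = ["orthogonal", "spectral_norm", "weight_norm"]


def _is_orthogonal(Q, eps=None):
    n, k = Q.size(-2), Q.size(-1)
    Id = torch.eye(k, dtype=Q.dtype, device=Q.device)
    if eps is None:
        # A reasonable tolerance that scales with the size and precision of Q
        eps = 10.0 * n * torch.finfo(Q.dtype).eps
    return torch.allclose(Q.mH @ Q, Id, atol=eps)


def _make_orthogonal(A):
    """Assume that A is a tall matrix.

    Compute the Q factor such that A = QR (A may be complex) and diag(R) is real and non-negative.
    """
    X, tau = torch.geqrf(A)
    Q = torch.linalg.householder_product(X, tau)
    # The diagonal of X is the diagonal of R (which is always real), so normalize by its signs
    Q *= X.diagonal(dim1=-2, dim2=-1).sgn().unsqueeze(-2)
    return Q


class _OrthMaps(Enum):
    matrix_exp = auto()
    cayley = auto()
    householder = auto()


class _Orthogonal(Module):
    base: Tensor

    def __init__(
        self, weight, orthogonal_map: _OrthMaps, *, use_trivialization=True
    ) -> None:
        super().__init__()

        # For complex tensors, the `tau` needed by linalg.householder_product cannot be
        # recovered from the reflectors alone, so the Householder map is real-only.
        if weight.is_complex() and orthogonal_map == _OrthMaps.householder:
            raise ValueError(
                "The householder parametrization does not support complex tensors."
            )

        self.shape = weight.shape
        self.orthogonal_map = orthogonal_map
        if use_trivialization:
            self.register_buffer("base", None)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        n, k = X.size(-2), X.size(-1)
        transposed = n < k
        if transposed:
            X = X.mT
            n, k = k, n
        # Here n >= k, so X is a tall (or square) matrix

        if (
            self.orthogonal_map == _OrthMaps.matrix_exp
            or self.orthogonal_map == _OrthMaps.cayley
        ):
            X = X.tril()
            if n != k:
                # Embed X into a square n x n matrix
                X = torch.cat(
                    [X, X.new_zeros(n, n - k).expand(*X.shape[:-2], -1, -1)], dim=-1
                )
            # A is skew-symmetric (skew-Hermitian in the complex case)
            A = X - X.mH
            if self.orthogonal_map == _OrthMaps.matrix_exp:
                Q = torch.matrix_exp(A)
            elif self.orthogonal_map == _OrthMaps.cayley:
                # Cayley map: solve (I - A/2) Q = (I + A/2)
                Id = torch.eye(n, dtype=A.dtype, device=A.device)
                Q = torch.linalg.solve(
                    torch.add(Id, A, alpha=-0.5), torch.add(Id, A, alpha=0.5)
                )
            # Q is now orthogonal (or unitary) of size (..., n, n)
            if n != k:
                Q = Q[..., :k]
            # Q now has the shape of X (possibly transposed)
        else:
            # X is real here: the Householder map does not support complex tensors
            A = X.tril(diagonal=-1)
            tau = 2.0 / (1.0 + (A * A).sum(dim=-2))
            Q = torch.linalg.householder_product(A, tau)
            # The diagonal of X holds +1/-1; casting to int keeps it out of autograd
            Q = Q * X.diagonal(dim1=-2, dim2=-1).int().unsqueeze(-2)

        if hasattr(self, "base"):
            Q = self.base @ Q
        if transposed:
            Q = Q.mT
        return Q

    @torch.autograd.no_grad()
    def right_inverse(self, Q: torch.Tensor) -> torch.Tensor:
        if Q.shape != self.shape:
            raise ValueError(
                f"Expected a matrix or batch of matrices of shape {self.shape}. "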
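                f"Got a tensor of shape {Q.shape}."
            )

        Q_init = Q
        n, k = Q.size(-2), Q.size(-1)
        transpose = n < k
        if transpose:
            Q = Q.mT
            n, k = k, n

        # Q is copied on every path below
        if not hasattr(self, "base"):
            # Without the trivialization, only the Householder map can be inverted
            if (
                self.orthogonal_map == _OrthMaps.cayley
                or self.orthogonal_map == _OrthMaps.matrix_exp
            ):
                raise NotImplementedError(
                    "It is not possible to assign to the matrix exponential "
                    "or the Cayley parametrizations when use_trivialization=False."
                )

            # Householder: orthogonalize Q via the QR decomposition.
            # Q is real here because the Householder map does not support complex tensors.
            A, tau = torch.geqrf(Q)
            # Normalize so that diag(R) > 0; the diagonal of A holds the diagonal of R
            A.diagonal(dim1=-2, dim2=-1).sign_()
            # LAPACK returns tau == 0 exactly when it skips a reflection
            A.diagonal(dim1=-2, dim2=-1)[tau == 0.0] *= -1
            return A.mT if transpose else A
        else:
            if n == k:
                # Keep Q if it is already orthogonal, otherwise orthogonalize it
                if not _is_orthogonal(Q):
                    Q = _make_orthogonal(Q)
                else:
                    Q = Q.clone()
            else:
                # Complete Q to a full n x n orthogonal matrix
                N = torch.randn(
                    *(Q.size()[:-2] + (n, n - k)), dtype=Q.dtype, device=Q.device
                )
                Q = torch.cat([Q, N], dim=-1)
                Q = _make_orthogonal(Q)
            self.base = Q

            # Return -Id: every map sends it to the identity, so forward()
            # returns the first k columns of `base`, i.e. the orthogonalized Q
            neg_Id = torch.zeros_like(Q_init)
            neg_Id.diagonal(dim1=-2, dim2=-1).fill_(-1.0)
            return neg_Id


def orthogonal(
    module: Module,
    name: str = "weight",
    orthogonal_map: Optional[str] = None,
    *,
    use_trivialization: bool = True,
) -> Module:
    r"""Apply an orthogonal or unitary parametrization to a matrix or a batch of matrices.

    Letting :math:`\mathbb{K}` be :math:`\mathbb{R}` or :math:`\mathbb{C}`, the parametrized
    matrix :math:`Q \in \mathbb{K}^{m \times n}` is **orthogonal** as

    .. math::

        \begin{align*}
            Q^{\text{H}}Q &= \mathrm{I}_n \mathrlap{\qquad \text{if }m \geq n}\\
            QQ^{\text{H}} &= \mathrm{I}_m \mathrlap{\qquad \text{if }m < n}
        \end{align*}

    where :math:`Q^{\text{H}}` is the conjugate transpose when :math:`Q` is complex
    and the transpose when :math:`Q` is real-valued, and
    :math:`\mathrm{I}_n` is the `n`-dimensional identity matrix.
    In plain words, :math:`Q` will have orthonormal columns whenever :math:`m \geq n`
    and orthonormal rows otherwise.

    If the tensor has more than two dimensions, we consider it as a batch of matrices of shape `(..., m, n)`.

    The matrix :math:`Q` may be parametrized via three different values of ``orthogonal_map``,
    each defined in terms of the original tensor:

    - ``"matrix_exp"``/``"cayley"``:
      the :func:`~torch.matrix_exp` :math:`Q = \exp(A)` and the `Cayley map`_
      :math:`Q = (\mathrm{I}_n + A/2)(\mathrm{I}_n - A/2)^{-1}` are applied to a skew-symmetric
      :math:`A` to give an orthogonal matrix.
    - ``"householder"``: computes a product of Householder reflectors
      (:func:`~torch.linalg.householder_product`).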
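
    The three maps can be checked numerically. This is a sketch that assumes real
    float64 inputs and an arbitrary 4 x 4 matrix chosen for illustration; it builds each
    map directly from public torch functions rather than through the private
    `_Orthogonal` class. The Householder `tau = 2 / ||v||^2` formula is the standard
    choice that makes each reflector orthogonal.

```python
import torch

torch.manual_seed(0)
n = 4
X = torch.randn(n, n, dtype=torch.float64)
Id = torch.eye(n, dtype=torch.float64)

# Skew-symmetric A (A.mT == -A) built from the lower triangle of X
A = X.tril() - X.tril().mT

# "matrix_exp": Q = exp(A)
Q_exp = torch.matrix_exp(A)

# "cayley": Q = (I + A/2)(I - A/2)^{-1}; I - A/2 is invertible for skew-symmetric A
Q_cay = (Id + A / 2) @ torch.linalg.inv(Id - A / 2)

# "householder" (real tensors only): the strictly lower triangle stores the
# reflectors v_i (with an implicit 1 on the diagonal) and tau_i = 2 / ||v_i||^2
V = X.tril(diagonal=-1)
tau = 2.0 / (1.0 + (V * V).sum(dim=-2))
Q_hh = torch.linalg.householder_product(V, tau)

for label, Q in [("matrix_exp", Q_exp), ("cayley", Q_cay), ("householder", Q_hh)]:
    # Distance between Q^T Q and the identity; it should be at rounding level
    print(label, torch.dist(Q.mT @ Q, Id).item())
```

    The matrix exponential and Cayley maps need a skew-symmetric input, which is why
    only the lower triangle of `X` is used to build `A`.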

    ``"matrix_exp"``/``"cayley"`` often make the parametrized weight converge faster than
    ``"householder"``, but they are slower to compute for very thin or very wide matrices.

    If ``use_trivialization=True`` (default), the parametrization implements the "Dynamic Trivialization Framework",
    where an extra matrix :math:`B \in \mathbb{K}^{n \times n}` is stored under
    ``module.parametrizations.weight[0].base``. This helps the
    convergence of the parametrized layer at the expense of some extra memory use.
    See `Trivializations for Gradient-Based Optimization on Manifolds`_ .

    Initial value of :math:`Q`:
    If the original tensor is not parametrized and ``use_trivialization=True`` (default), the initial value
    of :math:`Q` is that of the original tensor if it is orthogonal (or unitary in the complex case)
    and it is orthogonalized via the QR decomposition otherwise (see :func:`torch.linalg.qr`).
    The same happens when it is not parametrized and ``orthogonal_map="householder"``, even when
    ``use_trivialization=False``. Otherwise, the initial value is the result of the composition
    of all the registered parametrizations applied to the original tensor.

    .. note::
        This function is implemented using the parametrization functionality
        in :func:`~torch.nn.utils.parametrize.register_parametrization`.


    .. _`Cayley map`: https://en.wikipedia.org/wiki/Cayley_transform#Matrix_map
    .. _`Trivializations for Gradient-Based Optimization on Manifolds`: https://arxiv.org/abs/1909.09501

    Args:
        module (nn.Module): module on which to register the parametrization.
        name (str, optional): name of the tensor to make orthogonal. Default: ``"weight"``.
        orthogonal_map (str, optional): One of the following: ``"matrix_exp"``, ``"cayley"``, ``"householder"``.
            Default: ``"matrix_exp"`` if the matrix is square or complex, ``"householder"`` otherwise.
        use_trivialization (bool, optional): whether to use the dynamic trivialization framework.
            Default: ``True``.
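
    The initialization rule relies on a QR decomposition whose :math:`R` factor has a
    non-negative diagonal, which makes the :math:`Q` factor of an already orthogonal
    matrix equal to that matrix. A minimal sketch of that idea with
    :func:`torch.linalg.qr` follows; ``orthogonalize`` is a hypothetical helper for
    illustration, not part of this module, and the 5 x 3 input is arbitrary.

```python
import torch


def orthogonalize(W: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: Q factor of a tall matrix W = QR, with the signs of
    # Q's columns flipped so that diag(R) is non-negative
    Q, R = torch.linalg.qr(W)
    signs = torch.sgn(R.diagonal(dim1=-2, dim2=-1))
    # Treat exact zeros on diag(R) as +1 so that no column is zeroed out
    signs = torch.where(signs == 0, torch.ones_like(signs), signs)
    return Q * signs.unsqueeze(-2)


torch.manual_seed(0)
W = torch.randn(5, 3, dtype=torch.float64)
Q = orthogonalize(W)  # orthonormal columns: Q^T Q = I_3
Q_again = orthogonalize(Q)  # already orthogonal, so it comes back unchanged
print(torch.dist(Q.mT @ Q, torch.eye(3, dtype=torch.float64)).item())
print(torch.dist(Q_again, Q).item())
```

    Without the sign fix, QR would be free to return :math:`-Q` for some columns, so an
    orthogonal initial weight could change sign when the parametrization is registered.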

    Returns:
        The original module with an orthogonal parametrization registered to the specified
        weight

    Example::

        >>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_LAPACK)
        >>> orth_linear = orthogonal(nn.Linear(20, 40))
        >>> orth_linear
        ParametrizedLinear(
          in_features=20, out_features=40, bias=True
          (parametrizations): ModuleDict(
            (weight): ParametrizationList(
              (0): _Orthogonal()
            )
          )
        )
        >>> # xdoctest: +IGNORE_WANT
        >>> Q = orth_linear.weight
        >>> torch.dist(Q.T @ Q, torch.eye(20))
        tensor(4.9332e-07)
    """
    weight = getattr(module, name, None)
    if not isinstance(weight, Tensor):
        raise ValueError(
            f"Module '{module}' has no parameter or buffer with name '{name}'"
        )

    # 1-D tensors are not supported
    if weight.ndim < 2:
        raise ValueError(
            "Expected a matrix or batch of matrices. "
            f"Got a tensor of {weight.ndim} dimensions."
        )

    if orthogonal_map is None:
        orthogonal_map = (
            "matrix_exp"
            if weight.size(-2) == weight.size(-1) or weight.is_complex()
            else "householder"
        )

    orth_enum = getattr(_OrthMaps, orthogonal_map, None)
    if orth_enum is None:
        raise ValueError(
            'orthogonal_map has to be one of "matrix_exp", "cayley", "householder". '
            f"Got: {orthogonal_map}"
        )
    orth = _Orthogonal(weight, orth_enum, use_trivialization=use_trivialization)
    parametrize.register_parametrization(module, name, orth, unsafe=True)
    return module


class _WeightNorm(Module):
    def __init__(
        self,
        dim: Optional[int] = 0,
    ) -> None:
        super().__init__()
        if dim is None:
            # -1 makes the norm be computed over the entire tensor
            dim = -1
        self.dim = dim

    def forward(self, weight_g, weight_v):
        return torch._weight_norm(weight_v, weight_g, self.dim)

    def right_inverse(self, weight):
        weight_g = torch.norm_except_dim(weight, 2, self.dim)
        weight_v = weight

        return weight_g, weight_v


def weight_norm(module: Module, name: str = "weight", dim: Optional[int] = 0):
    r"""Apply weight normalization to a parameter in the given module.

    .. math::
         \mathbf{w} = g \dfrac{\mathbf{v}}{\|\mathbf{v}\|}

    Weight normalization is a reparameterization that decouples the magnitude
    of a weight tensor from its direction. This replaces the parameter specified
    by :attr:`name` with two parameters: one specifying the magnitude
    and one specifying the direction.

    By default, with ``dim=0``, the norm is computed independently per output
    channel/plane. To compute a norm over the entire weight tensor, use
    ``dim=None``.
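
    See https://arxiv.org/abs/1602.07868

    Args:
        module (Module): containing module
        name (str, optional): name of weight parameter. Default: ``"weight"``.
        dim (int, optional): dimension over which to compute the norm.
            Default: ``0``.

    Returns:
        The original module with a weight norm parametrization registered to the
        specified weight

    Example::

        >>> m = weight_norm(nn.Linear(20, 40), name='weight')
        >>> m
        ParametrizedLinear(
          in_features=20, out_features=40, bias=True
          (parametrizations): ModuleDict(
            (weight): ParametrizationList(
              (0): _WeightNorm()
            )
          )
        )
        >>> m.parametrizations.weight.original0.size()
        torch.Size([40, 1])
        >>> m.parametrizations.weight.original1.size()
        torch.Size([40, 20])

    """
    _weight_norm = _WeightNorm(dim)
    parametrize.register_parametrization(module, name, _weight_norm, unsafe=True)

    def _weight_norm_compat_hook(
        state_dict,
        prefix,
        local_metadata,
        strict,
        missing_keys,
        unexpected_keys,
        error_msgs,
    ):
        # Map checkpoints saved with the old torch.nn.utils.weight_norm
        # (keys "<name>_g" / "<name>_v") onto the parametrization keys
        g_key = f"{prefix}{name}_g"
        v_key = f"{prefix}{name}_v"
        if g_key in state_dict and v_key in state_dict:
            original0 = state_dict.pop(g_key)
            original1 = state_dict.pop(v_key)
            state_dict[f"{prefix}parametrizations.{name}.original0"] = original0
            state_dict[f"{prefix}parametrizations.{name}.original1"] = original1

    module._register_load_state_dict_pre_hook(_weight_norm_compat_hook)
    return module


class _SpectralNorm(Module):
    def __init__(
        self,
        weight: torch.Tensor,
        n_power_iterations: int = 1,
        dim: int = 0,
        eps: float = 1e-12,
    ) -> None:
        super().__init__()
        ndim = weight.ndim
        if dim >= ndim or dim < -ndim:
            raise IndexError(
                "Dimension out of range (expected to be in range of "
                f"[-{ndim}, {ndim - 1}] but got {dim})"
            )

        if n_power_iterations <= 0:
            raise ValueError(
                "Expected n_power_iterations to be positive, but "
                f"got n_power_iterations={n_power_iterations}"
            )
        self.dim = dim if dim >= 0 else dim + ndim
        self.eps = eps
        if ndim > 1:
            # For ndim == 1 nothing needs to be approximated (see forward)
            self.n_power_iterations = n_power_iterations
            weight_mat = self._reshape_weight_to_matrix(weight)
            h, w = weight_mat.size()

            u = weight_mat.new_empty(h).normal_(0, 1)
            v = weight_mat.new_empty(w).normal_(0, 1)
            self.register_buffer("_u", F.normalize(u, dim=0, eps=self.eps))
            self.register_buffer("_v", F.normalize(v, dim=0, eps=self.eps))

            # Start u and v from reasonable values by running 15 power iterations
            self._power_method(weight_mat, 15)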
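
    def _reshape_weight_to_matrix(self, weight: torch.Tensor) -> torch.Tensor:
        # Precondition
        assert weight.ndim > 1

        if self.dim != 0:
            # Move the output dimension to the front
            weight = weight.permute(
                self.dim, *(d for d in range(weight.dim()) if d != self.dim)
            )

        return weight.flatten(1)

    @torch.autograd.no_grad()
    def _power_method(self, weight_mat: torch.Tensor, n_power_iterations: int) -> None:
        # u and v are updated in place (out=...) so that the update reaches the
        # buffers shared with DataParallel replicas; forward() clones them before use.
        assert weight_mat.ndim > 1

        for _ in range(n_power_iterations):
            # The spectral norm of W equals u^H W v, where u and v are the leading
            # left and right singular vectors; this iteration approximates them.
            self._u = F.normalize(
                torch.mv(weight_mat, self._v),
                dim=0,
                eps=self.eps,
                out=self._u,
            )
            self._v = F.normalize(
                torch.mv(weight_mat.H, self._u),
                dim=0,
                eps=self.eps,
                out=self._v,
            )

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        if weight.ndim == 1:
            # Faster and exact path: no approximation is needed for vectors
            return F.normalize(weight, dim=0, eps=self.eps)
        else:
            weight_mat = self._reshape_weight_to_matrix(weight)
            if self.training:
                self._power_method(weight_mat, self.n_power_iterations)
            # Clone so that autograd can backpropagate through several forward passes
            u = self._u.clone(memory_format=torch.contiguous_format)
            v = self._v.clone(memory_format=torch.contiguous_format)
            sigma = torch.vdot(u, torch.mv(weight_mat, v))
            return weight / sigma

    def right_inverse(self, value: torch.Tensor) -> torch.Tensor:
        # The value is stored as is; it is not checked against the constraint
        return value


def spectral_norm(
    module: Module,
    name: str = "weight",
    n_power_iterations: int = 1,
    eps: float = 1e-12,
    dim: Optional[int] = None,
) -> Module:
    r"""Apply spectral normalization to a parameter in the given module.

    .. math::
        \mathbf{W}_{SN} = \dfrac{\mathbf{W}}{\sigma(\mathbf{W})},
        \sigma(\mathbf{W}) = \max_{\mathbf{h}: \mathbf{h} \ne 0} \dfrac{\|\mathbf{W} \mathbf{h}\|_2}{\|\mathbf{h}\|_2}

    When applied on a vector, it simplifies to

    .. math::
        \mathbf{x}_{SN} = \dfrac{\mathbf{x}}{\|\mathbf{x}\|_2}

    Spectral normalization stabilizes the training of discriminators (critics)
    in Generative Adversarial Networks (GANs) by reducing the Lipschitz constant
    of the model. :math:`\sigma` is approximated by performing
    :attr:`n_power_iterations` iterations (one by default) of the `power method`_
    every time the weight is accessed in training mode. If the weight tensor has
    more than two dimensions, it is reshaped to 2D before the power iteration.

    See `Spectral Normalization for Generative Adversarial Networks`_ .

    .. _`power method`: https://en.wikipedia.org/wiki/Power_iteration
    .. _`Spectral Normalization for Generative Adversarial Networks`: https://arxiv.org/abs/1802.05957

    .. note::
        This function is implemented using the parametrization functionality
        in :func:`~torch.nn.utils.parametrize.register_parametrization`.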
        It is a reimplementation of :func:`torch.nn.utils.spectral_norm`.

    .. note::
        When this constraint is registered, the singular vectors associated with the largest
        singular value are estimated rather than sampled at random. These are then updated
        by performing :attr:`n_power_iterations` iterations of the `power method`_ whenever
        the tensor is accessed with the module in training mode.

    .. note::
        If the `_SpectralNorm` module, i.e., `module.parametrizations.weight[idx]`,
        is in training mode on removal, it will perform another power iteration.
        If you'd like to avoid this iteration, set the module to eval mode
        before its removal.

    Args:
        module (nn.Module): containing module
        name (str, optional): name of weight parameter. Default: ``"weight"``.
        n_power_iterations (int, optional): number of power iterations to
            calculate spectral norm. Default: ``1``.
        eps (float, optional): epsilon for numerical stability in
            calculating norms. Default: ``1e-12``.
        dim (int, optional): dimension corresponding to number of outputs.
            Default: ``0``, except for modules that are instances of
            ConvTranspose{1,2,3}d, when it is ``1``.

    Returns:
        The original module with a new parametrization registered to the specified
        weight

    Example::

        >>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_LAPACK)
        >>> # xdoctest: +IGNORE_WANT("non-deterministic")
        >>> snm = spectral_norm(nn.Linear(20, 40))
        >>> snm
        ParametrizedLinear(
          in_features=20, out_features=40, bias=True
          (parametrizations): ModuleDict(
            (weight): ParametrizationList(
              (0): _SpectralNorm()
            )
          )
        )
        >>> torch.linalg.matrix_norm(snm.weight, 2)
        tensor(1.0081, grad_fn=<AmaxBackward0>)
    """
    weight = getattr(module, name, None)
    if not isinstance(weight, Tensor):
        raise ValueError(
            f"Module '{module}' has no parameter or buffer with name '{name}'"
        )

    if dim is None:
        if isinstance(
            module,
            (
                torch.nn.ConvTranspose1d,
                torch.nn.ConvTranspose2d,
                torch.nn.ConvTranspose3d,
            ),
        ):
            dim = 1
        else:
            dim = 0
    parametrize.register_parametrization(
        module, name, _SpectralNorm(weight, n_power_iterations, dim, eps)
    )
    return module