"""Implementation for Stochastic Gradient Descent optimizer."""

from typing import cast, Optional, Union

import torch
from torch import Tensor

from .optimizer import (
    _default_to_fused_or_foreach,
    _device_dtype_check_for_fused,
    _differentiable_doc,
    _foreach_doc,
    _fused_doc,
    _maximize_doc,
    _params_doc,
    _to_scalar,
    _use_grad_for_differentiable,
    DeviceDict,
    Optimizer,
    ParamsT,
)


__all__ = ["SGD", "sgd"]


class SGD(Optimizer):
    def __init__(
        self,
        params: ParamsT,
        lr: Union[float, Tensor] = 1e-3,
        momentum: float = 0,
        dampening: float = 0,
        weight_decay: float = 0,
        nesterov: bool = False,
        *,
        maximize: bool = False,
        foreach: Optional[bool] = None,
        differentiable: bool = False,
        fused: Optional[bool] = None,
    ):
        if isinstance(lr, Tensor) and lr.numel() != 1:
            raise ValueError("Tensor lr must be 1-element")
        if lr < 0.0:
            raise ValueError(f"Invalid learning rate: {lr}")
        if momentum < 0.0:
            raise ValueError(f"Invalid momentum value: {momentum}")
        if weight_decay < 0.0:
            raise ValueError(f"Invalid weight_decay value: {weight_decay}")

        defaults = dict(
            lr=lr,
            momentum=momentum,
            dampening=dampening,
            weight_decay=weight_decay,
            nesterov=nesterov,
            maximize=maximize,
            foreach=foreach,
            differentiable=differentiable,
            fused=fused,
        )
        if nesterov and (momentum <= 0 or dampening != 0):
            raise ValueError("Nesterov momentum requires a momentum and zero dampening")
        super().__init__(params, defaults)

        if fused:
            self._step_supports_amp_scaling = True
            self._need_device_dtype_check_for_fused = True
            if differentiable:
                raise RuntimeError("`fused` does not support `differentiable`")
            if foreach:
                raise RuntimeError("`fused` and `foreach` cannot be `True` together.")

    def __setstate__(self, state):
        super().__setstate__(state)
        for group in self.param_groups:
            group.setdefault("nesterov", False)
            group.setdefault("maximize", False)
            group.setdefault("foreach", None)
            group.setdefault("differentiable", False)
            group.setdefault("fused", False)

    def _init_group(self, group, params, grads, momentum_buffer_list):
        has_sparse_grad = False

        for p in group["params"]:
            if p.grad is not None:
                if group["fused"] and getattr(
                    self, "_need_device_dtype_check_for_fused", False
                ):
                    _device_dtype_check_for_fused(p)
                    self._need_device_dtype_check_for_fused = False
                params.append(p)
                grads.append(p.grad)
                if p.grad.is_sparse:
                    has_sparse_grad = True

                if group["momentum"] != 0:
                    state = self.state[p]
                    momentum_buffer_list.append(state.get("momentum_buffer"))

        return has_sparse_grad

    @_use_grad_for_differentiable
    def step(self, closure=None):
        """Perform a single optimization step.

        Args:
            closure (Callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            params: list[Tensor] = []
            grads: list[Tensor] = []
            momentum_buffer_list: list[Optional[Tensor]] = []

            has_sparse_grad = self._init_group(
                group, params, grads, momentum_buffer_list
            )

            sgd(
                params,
                grads,
                momentum_buffer_list,
                weight_decay=group["weight_decay"],
                momentum=group["momentum"],
                lr=group["lr"],
                dampening=group["dampening"],
                nesterov=group["nesterov"],
                maximize=group["maximize"],
                has_sparse_grad=has_sparse_grad,
                foreach=group["foreach"],
                fused=group["fused"],
                grad_scale=getattr(self, "grad_scale", None),
                found_inf=getattr(self, "found_inf", None),
            )

            if group["momentum"] != 0:
                # Persist the (possibly newly created) momentum buffers back
                # into the per-parameter optimizer state.
                for p, momentum_buffer in zip(params, momentum_buffer_list):
                    state = self.state[p]
                    state["momentum_buffer"] = momentum_buffer

        return loss
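
# ---------------------------------------------------------------------------
# Illustrative sketch, not part of the upstream module: a plain, out-of-place
# reference of the per-parameter update rule documented below, following the
# recursion b_t = mu * b_{t-1} + (1 - tau) * g_t. The helper name
# `_reference_sgd_step` is ours (not a PyTorch API); it is handy for checking
# by hand what `SGD.step` computes for a single parameter.
def _reference_sgd_step(
    param: Tensor,
    grad: Tensor,
    buf: Optional[Tensor],
    *,
    lr: float,
    momentum: float = 0.0,
    dampening: float = 0.0,
    weight_decay: float = 0.0,
    nesterov: bool = False,
    maximize: bool = False,
) -> tuple[Tensor, Optional[Tensor]]:
    g = -grad if maximize else grad
    if weight_decay != 0:
        g = g + weight_decay * param  # L2 penalty folded into the gradient
    if momentum != 0:
        if buf is None:
            # First step seeds the buffer with the gradient itself;
            # dampening only applies from the second step onwards.
            buf = g.clone()
        else:
            buf = momentum * buf + (1 - dampening) * g
        if nesterov:
            g = g + momentum * buf
        else:
            g = buf
    return param - lr * g, buf
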
SGD.__doc__ = (
    r"""Implements stochastic gradient descent (optionally with momentum).

    .. math::
       \begin{aligned}
            &\rule{110mm}{0.4pt}                                                          \\
            &\textbf{input}      : \gamma \text{ (lr)}, \: \theta_0 \text{ (params)}, \: f(\theta)
                \text{ (objective)}, \: \lambda \text{ (weight decay)},                   \\
            &\hspace{13mm} \:\mu \text{ (momentum)}, \:\tau \text{ (dampening)},
            \:\textit{ nesterov,}\:\textit{ maximize}                              \\[-1.ex]
            &\rule{110mm}{0.4pt}                                                          \\
            &\textbf{for} \: t=1 \: \textbf{to} \: \ldots \: \textbf{do}                  \\
            &\hspace{5mm}\textbf{if} \: \textit{maximize}:                                \\
            &\hspace{10mm}g_t \leftarrow -\nabla_{\theta} f_t (\theta_{t-1})              \\
            &\hspace{5mm}\textbf{else}                                                    \\
            &\hspace{10mm}g_t \leftarrow \nabla_{\theta} f_t (\theta_{t-1})               \\
            &\hspace{5mm}\textbf{if} \: \lambda \neq 0                                    \\
            &\hspace{10mm} g_t \leftarrow g_t + \lambda \theta_{t-1}                      \\
            &\hspace{5mm}\textbf{if} \: \mu \neq 0                                        \\
            &\hspace{10mm}\textbf{if} \: t > 1                                            \\
            &\hspace{15mm} \textbf{b}_t \leftarrow \mu \textbf{b}_{t-1} + (1-\tau) g_t    \\
            &\hspace{10mm}\textbf{else}                                                   \\
            &\hspace{15mm} \textbf{b}_t \leftarrow g_t                                    \\
            &\hspace{10mm}\textbf{if} \: \textit{nesterov}                                \\
            &\hspace{15mm} g_t \leftarrow g_{t} + \mu \textbf{b}_t                        \\
            &\hspace{10mm}\textbf{else}                                            \\[-1.ex]
            &\hspace{15mm} g_t \leftarrow \textbf{b}_t                                    \\
            &\hspace{5mm}\theta_t \leftarrow \theta_{t-1} - \gamma g_t             \\[-1.ex]
            &\rule{110mm}{0.4pt}                                                   \\[-1.ex]
            &\bf{return} \: \theta_t                                               \\[-1.ex]
            &\rule{110mm}{0.4pt}                                                   \\[-1.ex]
       \end{aligned}

    Nesterov momentum is based on the formula from
    `On the importance of initialization and momentum in deep learning`__.
    """
    + rf"""
    Args:
        {_params_doc}
        lr (float, Tensor, optional): learning rate (default: 1e-3)
        momentum (float, optional): momentum factor (default: 0)
        dampening (float, optional): dampening for momentum (default: 0)
        weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
        nesterov (bool, optional): enables Nesterov momentum. Only applicable
            when momentum is non-zero. (default: False)
        {_maximize_doc}
        {_foreach_doc}
        {_differentiable_doc}
        {_fused_doc}
    """
    + r"""

    Example:
        >>> # xdoctest: +SKIP
        >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
        >>> optimizer.zero_grad()
        >>> loss_fn(model(input), target).backward()
        >>> optimizer.step()

    __ http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf

    .. note::
        The implementation of SGD with Momentum/Nesterov subtly differs from
        Sutskever et al. and implementations in some other frameworks.

        Considering the specific case of Momentum, the update can be written as

        .. math::
            \begin{aligned}
                v_{t+1} & = \mu * v_{t} + g_{t+1}, \\
                p_{t+1} & = p_{t} - \text{lr} * v_{t+1},
            \end{aligned}

        where :math:`p`, :math:`g`, :math:`v` and :math:`\mu` denote the
        parameters, gradient, velocity, and momentum respectively.

        This is in contrast to Sutskever et al. and
        other frameworks which employ an update of the form

        .. math::
            \begin{aligned}
                v_{t+1} & = \mu * v_{t} + \text{lr} * g_{t+1}, \\
                p_{t+1} & = p_{t} - v_{t+1}.
            \end{aligned}

        The Nesterov version is analogously modified.

        Moreover, the initial value of the momentum buffer is set to the
        gradient value at the first step. This is in contrast to some other
        frameworks that initialize it to all zeros. One notable side effect of
        this decision is that the first momentum value will not be scaled by
        dampening. Dampening will be applied starting at the second step.
    """
)
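
# ---------------------------------------------------------------------------
# Illustrative sketch, ours rather than upstream: driving the functional `sgd`
# defined below directly on flat tensor lists, the way wrapper code such as
# torch.distributed.optim reuses this module. The function name
# `_example_functional_sgd` is hypothetical and exists only for demonstration.
def _example_functional_sgd() -> None:
    w = torch.zeros(3)
    grads = [torch.ones(3)]
    bufs: list[Optional[Tensor]] = [None]  # first call seeds the momentum buffer
    sgd(
        [w],
        grads,
        bufs,
        weight_decay=0.0,
        momentum=0.9,
        lr=0.1,
        dampening=0.0,
        nesterov=False,
        maximize=False,
    )
    # On the first step the buffer equals the gradient, so w == -lr * grad.
    assert torch.allclose(w, torch.full((3,), -0.1))
    assert bufs[0] is not None and torch.allclose(bufs[0], torch.ones(3))
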
def sgd(
    params: list[Tensor],
    d_p_list: list[Tensor],
    momentum_buffer_list: list[Optional[Tensor]],
    # These live before the `*` (with defaults) rather than as keyword-only
    # arguments to keep the signature TorchScript-compatible; the functional
    # API is compiled by torch/distributed/optim.
    has_sparse_grad: bool = False,
    foreach: Optional[bool] = None,
    fused: Optional[bool] = None,
    grad_scale: Optional[Tensor] = None,
    found_inf: Optional[Tensor] = None,
    *,
    weight_decay: float,
    momentum: float,
    lr: float,
    dampening: float,
    nesterov: bool,
    maximize: bool,
):
    r"""Functional API that performs SGD algorithm computation.

    See :class:`~torch.optim.SGD` for details.
    """
    # Respect an explicit user choice of foreach/fused; only pick a default
    # implementation when neither has been specified.
    if foreach is None and fused is None:
        # TorchScript cannot handle Optionals in fancy conditionals, hence the
        # explicit is_scripting() branch.
        if not torch.jit.is_scripting():
            fused, foreach = _default_to_fused_or_foreach(
                params, differentiable=False, use_fused=False
            )
        else:
            foreach = False
            fused = False
    if foreach is None:
        foreach = False
    if fused is None:
        fused = False

    if foreach and torch.jit.is_scripting():
        raise RuntimeError("torch.jit.script not supported with foreach optimizers")
    if fused and torch.jit.is_scripting():
        raise RuntimeError("torch.jit.script not supported with fused optimizers")

    if foreach and not torch.jit.is_scripting():
        func = _multi_tensor_sgd
    elif fused and not torch.jit.is_scripting():
        func = _fused_sgd
    else:
        func = _single_tensor_sgd

    func(
        params,
        d_p_list,
        momentum_buffer_list,
        weight_decay=weight_decay,
        momentum=momentum,
        lr=lr,
        dampening=dampening,
        nesterov=nesterov,
        has_sparse_grad=has_sparse_grad,
        maximize=maximize,
        grad_scale=grad_scale,
        found_inf=found_inf,
    )


def _single_tensor_sgd(
    params: list[Tensor],
    grads: list[Tensor],
    momentum_buffer_list: list[Optional[Tensor]],
    grad_scale: Optional[Tensor],
    found_inf: Optional[Tensor],
    *,
    weight_decay: float,
    momentum: float,
    lr: float,
    dampening: float,
    nesterov: bool,
    maximize: bool,
    has_sparse_grad: bool,
):
    # grad scaling is only handled by the fused implementation
    assert grad_scale is None and found_inf is None

    if not torch.jit.is_scripting():
        lr = _to_scalar(lr)

    for i, param in enumerate(params):
        grad = grads[i] if not maximize else -grads[i]

        if weight_decay != 0:
            # Nested if is necessary to bypass jitscript rules
            if isinstance(lr, Tensor):
                if lr.requires_grad:
                    # clone() keeps the graph valid when differentiating
                    # through the hyperparameters
                    grad = grad.add(param.clone(), alpha=weight_decay)
                else:
                    grad = grad.add(param, alpha=weight_decay)
            else:
                grad = grad.add(param, alpha=weight_decay)

        if momentum != 0:
            buf = momentum_buffer_list[i]

            if buf is None:
                # seed the buffer with the gradient on the first step
                buf = torch.clone(grad).detach()
                momentum_buffer_list[i] = buf
            else:
                buf.mul_(momentum).add_(grad, alpha=1 - dampening)

            if nesterov:
                grad = grad.add(buf, alpha=momentum)
            else:
                grad = buf

        if isinstance(lr, Tensor):
            if lr.requires_grad:
                param.addcmul_(grad, lr, value=-1)
            else:
                param.add_(grad, alpha=-lr)
        else:
            param.add_(grad, alpha=-lr)


def _multi_tensor_sgd(
    params: list[Tensor],
    grads: list[Tensor],
    momentum_buffer_list: list[Optional[Tensor]],
    grad_scale: Optional[Tensor],
    found_inf: Optional[Tensor],
    *,
    weight_decay: float,
    momentum: float,
    lr: float,
    dampening: float,
    nesterov: bool,
    maximize: bool,
    has_sparse_grad: bool,
):
    assert grad_scale is None and found_inf is None

    if len(params) == 0:
        return

    lr = _to_scalar(lr)

    grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
        [params, grads, momentum_buffer_list], with_indices=True
    )

    for (
        device_params_,
        device_grads_,
        device_momentum_buffer_list,
    ), indices in grouped_tensors.values():
        device_params: list[Tensor] = cast(list[Tensor], device_params_)
        device_grads: list[Tensor] = cast(list[Tensor], device_grads_)

        device_has_sparse_grad = has_sparse_grad and any(
            grad.is_sparse for grad in device_grads
        )

        if maximize:
            device_grads = torch._foreach_neg(device_grads)

        if weight_decay != 0:
            # Re-use the intermediate memory (device_grads) already allocated
            # for maximize
            if maximize:
                torch._foreach_add_(device_grads, device_params, alpha=weight_decay)
            else:
                device_grads = torch._foreach_add(
                    device_grads, device_params, alpha=weight_decay
                )

        if momentum != 0:
            bufs: list[Tensor] = []

            all_states_with_momentum_buffer = True
            for i in range(len(device_momentum_buffer_list)):
                if device_momentum_buffer_list[i] is None:
                    all_states_with_momentum_buffer = False
                    break
                else:
                    bufs.append(cast(Tensor, device_momentum_buffer_list[i]))

            if all_states_with_momentum_buffer:
                torch._foreach_mul_(bufs, momentum)
                torch._foreach_add_(bufs, device_grads, alpha=1 - dampening)
            else:
                bufs = []
                for i in range(len(device_momentum_buffer_list)):
                    if device_momentum_buffer_list[i] is None:
                        buf = device_momentum_buffer_list[i] = momentum_buffer_list[
                            indices[i]
                        ] = torch.clone(device_grads[i]).detach()
                    else:
                        buf = cast(Tensor, device_momentum_buffer_list[i])
                        buf.mul_(momentum).add_(device_grads[i], alpha=1 - dampening)

                    bufs.append(buf)

            if nesterov:
                torch._foreach_add_(device_grads, bufs, alpha=momentum)
            else:
                device_grads = bufs

        if not device_has_sparse_grad:
            # handle the internal item() call if lr is a tensor under compile
            if isinstance(lr, torch.Tensor) and torch.compiler.is_compiling():
                grads_x_lr = torch._foreach_mul(device_grads, -lr)
                torch._foreach_add_(device_params, grads_x_lr)
            else:
                torch._foreach_add_(device_params, device_grads, alpha=-lr)
        else:
            # foreach APIs don't support sparse
            for i in range(len(device_params)):
                device_params[i].add_(device_grads[i], alpha=-lr)


def _fused_sgd(
    params: list[Tensor],
    grads: list[Tensor],
    momentum_buffer_list: list[Optional[Tensor]],
    grad_scale: Optional[Tensor],
    found_inf: Optional[Tensor],
    *,
    weight_decay: float,
    momentum: float,
    lr: float,
    dampening: float,
    nesterov: bool,
    maximize: bool,
    has_sparse_grad: bool,
) -> None:
    if not params:
        return
    if has_sparse_grad:
        raise RuntimeError("`_fused_sgd` does not support sparse gradients")
    grad_scale_dict: DeviceDict = (
        {grad_scale.device: grad_scale} if grad_scale is not None else {}
    )
    found_inf_dict: DeviceDict = (
        {found_inf.device: found_inf} if found_inf is not None else {}
    )

    no_momentum_buffer = momentum == 0
    is_first_step = (
        all(t is None for t in momentum_buffer_list) and not no_momentum_buffer
    )
    if is_first_step:
        for i, g in enumerate(grads):
            momentum_buffer_list[i] = torch.empty_like(g)
    grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
        [params, grads, momentum_buffer_list], with_indices=False
    )
    for (device, _), (
        (device_params_, device_grads_, device_momentum_buffer_list),
        _,
    ) in grouped_tensors.items():
        device_params: list[Tensor] = cast(list[Tensor], device_params_)
        device_grads: list[Tensor] = cast(list[Tensor], device_grads_)
        device_grad_scale, device_found_inf = None, None
        if grad_scale is not None:
            device_grad_scale = grad_scale_dict.setdefault(
                device, grad_scale.to(device)
            )
        if found_inf is not None:
            device_found_inf = found_inf_dict.setdefault(device, found_inf.to(device))
        torch._fused_sgd_(
            device_params,
            device_grads,
            []
            if no_momentum_buffer
            else cast(list[Tensor], device_momentum_buffer_list),
            weight_decay=weight_decay,
            momentum=momentum,
            lr=lr,
            dampening=dampening,
            nesterov=nesterov,
            maximize=maximize,
            is_first_step=is_first_step,
            grad_scale=device_grad_scale,
            found_inf=device_found_inf,
        )
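

# ---------------------------------------------------------------------------
# Illustrative sketch, ours rather than upstream: a quick consistency check
# that the multi-tensor (`foreach=True`) path produces the same parameters as
# the single-tensor path. The helper name
# `_check_foreach_matches_single_tensor` is hypothetical.
def _check_foreach_matches_single_tensor(steps: int = 3) -> None:
    torch.manual_seed(0)
    ref = [torch.randn(4, requires_grad=True) for _ in range(2)]
    alt = [p.detach().clone().requires_grad_(True) for p in ref]
    opt_ref = SGD(ref, lr=0.1, momentum=0.9, weight_decay=1e-2, foreach=False)
    opt_alt = SGD(alt, lr=0.1, momentum=0.9, weight_decay=1e-2, foreach=True)
    for _ in range(steps):
        for opt, ps in ((opt_ref, ref), (opt_alt, alt)):
            opt.zero_grad()
            loss = sum((p * p).sum() for p in ps)
            loss.backward()
            opt.step()
    for p, q in zip(ref, alt):
        assert torch.allclose(p, q)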