from typing import cast, Optional, Union

import torch
from torch import Tensor

from .optimizer import (
    _default_to_fused_or_foreach,
    _device_dtype_check_for_fused,
    _differentiable_doc,
    _foreach_doc,
    _get_scalar_dtype,
    _get_value,
    _maximize_doc,
    _params_doc,
    _to_scalar,
    _use_grad_for_differentiable,
    _view_as_real,
    Optimizer,
    ParamsT,
)

__all__ = ["Adagrad", "adagrad"]


class Adagrad(Optimizer):
    def __init__(
        self,
        params: ParamsT,
        lr: Union[float, Tensor] = 1e-2,
        lr_decay: float = 0,
        weight_decay: float = 0,
        initial_accumulator_value: float = 0,
        eps: float = 1e-10,
        foreach: Optional[bool] = None,
        *,
        maximize: bool = False,
        differentiable: bool = False,
        fused: Optional[bool] = None,
    ):
        if isinstance(lr, Tensor) and lr.numel() != 1:
            raise ValueError("Tensor lr must be 1-element")
        if not 0.0 <= lr:
            raise ValueError(f"Invalid learning rate: {lr}")
        if not 0.0 <= lr_decay:
            raise ValueError(f"Invalid lr_decay value: {lr_decay}")
        if not 0.0 <= weight_decay:
            raise ValueError(f"Invalid weight_decay value: {weight_decay}")
        if not 0.0 <= initial_accumulator_value:
            raise ValueError(
                f"Invalid initial_accumulator_value value: {initial_accumulator_value}"
            )
        if not 0.0 <= eps:
            raise ValueError(f"Invalid epsilon value: {eps}")

        defaults = dict(
            lr=lr,
            lr_decay=lr_decay,
            eps=eps,
            weight_decay=weight_decay,
            initial_accumulator_value=initial_accumulator_value,
            foreach=foreach,
            maximize=maximize,
            differentiable=differentiable,
            fused=fused,
        )
        super().__init__(params, defaults)

        if fused:
            if differentiable:
                raise RuntimeError("`fused` does not support `differentiable`")
            if foreach:
                raise RuntimeError("`fused` and `foreach` cannot be `True` together.")
            self._need_device_dtype_check_for_fused = True

        for group in self.param_groups:
            for p in group["params"]:
                state = self.state[p]
                state["step"] = (
                    torch.zeros(
                        (),
                        dtype=_get_scalar_dtype(is_fused=group["fused"]),
                        device=p.device,
                    )
                    if group["fused"]
                    else torch.tensor(0.0, dtype=_get_scalar_dtype())
                )
                init_value = (
                    complex(initial_accumulator_value, initial_accumulator_value)
                    if torch.is_complex(p)
                    else initial_accumulator_value
                )
                state["sum"] = torch.full_like(
                    p, init_value, memory_format=torch.preserve_format
                )

    def __setstate__(self, state):
        super().__setstate__(state)
        for group in self.param_groups:
            group.setdefault("foreach", None)
            group.setdefault("maximize", False)
            group.setdefault("differentiable", False)
            group.setdefault("fused", None)

        state_values = list(self.state.values())
        step_is_tensor = (len(state_values) != 0) and torch.is_tensor(
            state_values[0]["step"]
        )
        if not step_is_tensor:
            for s in state_values:
                s["step"] = torch.tensor(float(s["step"]), dtype=_get_scalar_dtype())

    def share_memory(self):
        """Calls tensor.share_memory_() on the state sum tensors."""
        for group in self.param_groups:
            for p in group["params"]:
                state = self.state[p]
                state["sum"].share_memory_()

    def _init_group(self, group, params_with_grad, grads, state_sums, state_steps):
        has_sparse_grad, has_complex = False, False
        for p in group["params"]:
            if p.grad is not None:
                if group["fused"] and getattr(
                    self, "_need_device_dtype_check_for_fused", False
                ):
                    _device_dtype_check_for_fused(p, cuda_unsupported=True)
                    self._need_device_dtype_check_for_fused = False
                has_sparse_grad |= p.grad.is_sparse
                has_complex |= torch.is_complex(p)
                params_with_grad.append(p)
                grads.append(p.grad)
                state = self.state[p]
                state_sums.append(state["sum"])
                state_steps.append(state["step"])
        return has_sparse_grad, has_complex

    @_use_grad_for_differentiable
    def step(self, closure=None):
        """Perform a single optimization step.

        Args:
            closure (Callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None

        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            params_with_grad: list[Tensor] = []
            grads: list[Tensor] = []
            state_sums: list[Tensor] = []
            state_steps: list[Tensor] = []

            has_sparse_grad, has_complex = self._init_group(
                group, params_with_grad, grads, state_sums, state_steps
            )

            adagrad(
                params_with_grad,
                grads,
                state_sums,
                state_steps,
                lr=group["lr"],
                weight_decay=group["weight_decay"],
                lr_decay=group["lr_decay"],
                eps=group["eps"],
                has_sparse_grad=has_sparse_grad,
                foreach=group["foreach"],
                maximize=group["maximize"],
                differentiable=group["differentiable"],
                has_complex=has_complex,
                fused=group["fused"],
                grad_scale=getattr(self, "grad_scale", None),
                found_inf=getattr(self, "found_inf", None),
            )

        return loss
Adagrad.__doc__ = (
    r"""Implements Adagrad algorithm.

    .. math::
       \begin{aligned}
            &\rule{110mm}{0.4pt}                                                                 \\
            &\textbf{input}      : \gamma \text{ (lr)}, \: \theta_0 \text{ (params)}, \: f(\theta)
                \text{ (objective)}, \: \lambda \text{ (weight decay)},                          \\
            &\hspace{12mm}    \tau \text{ (initial accumulator value)}, \: \eta\text{ (lr decay)}\\
            &\textbf{initialize} :  state\_sum_0 \leftarrow \tau                                 \\[-1.ex]
            &\rule{110mm}{0.4pt}                                                                 \\
            &\textbf{for} \: t=1 \: \textbf{to} \: \ldots \: \textbf{do}                         \\
            &\hspace{5mm}g_t           \leftarrow   \nabla_{\theta} f_t (\theta_{t-1})           \\
            &\hspace{5mm} \tilde{\gamma}    \leftarrow \gamma / (1 +(t-1) \eta)                  \\
            &\hspace{5mm} \textbf{if} \: \lambda \neq 0                                          \\
            &\hspace{10mm} g_t \leftarrow g_t + \lambda \theta_{t-1}                             \\
            &\hspace{5mm}state\_sum_t  \leftarrow  state\_sum_{t-1} + g^2_t                      \\
            &\hspace{5mm}\theta_t \leftarrow
                \theta_{t-1} - \tilde{\gamma} \frac{g_t}{\sqrt{state\_sum_t}+\epsilon}           \\
            &\rule{110mm}{0.4pt}                                                          \\[-1.ex]
            &\bf{return} \:  \theta_t                                                     \\[-1.ex]
            &\rule{110mm}{0.4pt}                                                          \\[-1.ex]
       \end{aligned}

    For further details regarding the algorithm we refer to `Adaptive Subgradient Methods
    for Online Learning and Stochastic Optimization`_.
    """
    + rf"""
    Args:
        {_params_doc}
        lr (float, Tensor, optional): learning rate (default: 1e-2)
        lr_decay (float, optional): learning rate decay (default: 0)
        weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
        initial_accumulator_value (float, optional): initial value of the sum of
            squares of gradients (default: 0)
        eps (float, optional): term added to the denominator to improve
            numerical stability (default: 1e-10)
        {_foreach_doc}
        {_maximize_doc}
        {_differentiable_doc}
        fused (bool, optional): whether the fused implementation (CPU only) is used.
            Currently, `torch.float64`, `torch.float32`, `torch.float16`, and
            `torch.bfloat16` are supported. (default: None). Please note that the fused
            implementation does not support sparse or complex gradients.

    .. _Adaptive Subgradient Methods for Online Learning and Stochastic Optimization:
        http://jmlr.org/papers/v12/duchi11a.html

    """
)


def adagrad(
    params: list[Tensor],
    grads: list[Tensor],
    state_sums: list[Tensor],
    state_steps: list[Tensor],
    fused: Optional[bool] = None,
    grad_scale: Optional[Tensor] = None,
    found_inf: Optional[Tensor] = None,
    # Keyword-only args with defaults are not supported by torchscript-compiled
    # functions, so these stay positional-or-keyword with defaults.
    has_sparse_grad: bool = False,
    foreach: Optional[bool] = None,
    differentiable: bool = False,
    has_complex: bool = False,
    *,
    lr: float,
    weight_decay: float,
    lr_decay: float,
    eps: float,
    maximize: bool,
):
    r"""Functional API that performs Adagrad algorithm computation.

    See :class:`~torch.optim.Adagrad` for details.
    """
    if not all(isinstance(t, torch.Tensor) for t in state_steps):
        raise RuntimeError(
            "API has changed, `state_steps` argument must contain a list of singleton tensors"
        )

    # Only pick a default implementation when the user specified neither foreach nor fused.
    if fused is None and foreach is None:
        _, foreach = _default_to_fused_or_foreach(
            params, differentiable, use_fused=False
        )
    if fused is None:
        fused = False
    if foreach is None:
        foreach = False

    if foreach and torch.jit.is_scripting():
        raise RuntimeError("torch.jit.script not supported with foreach optimizers")
    if fused and torch.jit.is_scripting():
        raise RuntimeError("torch.jit.script not supported with fused optimizers")

    if fused and not torch.jit.is_scripting():
        func = _fused_adagrad
    elif foreach and not torch.jit.is_scripting():
        func = _multi_tensor_adagrad
    else:
        func = _single_tensor_adagrad

    func(
        params,
        grads,
        state_sums,
        state_steps,
        lr=lr,
        weight_decay=weight_decay,
        lr_decay=lr_decay,
        eps=eps,
        has_sparse_grad=has_sparse_grad,
        maximize=maximize,
        differentiable=differentiable,
        has_complex=has_complex,
        grad_scale=grad_scale,
        found_inf=found_inf,
    )


def _make_sparse(grad, grad_indices, values):
    size = grad.size()
    return torch.sparse_coo_tensor(grad_indices, values, size)


def _single_tensor_adagrad(
    params: list[Tensor],
    grads: list[Tensor],
    state_sums: list[Tensor],
    state_steps: list[Tensor],
    grad_scale: Optional[Tensor],
    found_inf: Optional[Tensor],
    *,
    lr: float,
    weight_decay: float,
    lr_decay: float,
    eps: float,
    has_sparse_grad: bool,
    maximize: bool,
    differentiable: bool,
    has_complex: bool,
):
    assert grad_scale is None and found_inf is None

    if not torch.jit.is_scripting():
        lr = _to_scalar(lr)

    for param, grad, state_sum, step_t in zip(params, grads, state_sums, state_steps):
        # update step
        step_t += 1
        step = _get_value(step_t)
        grad = grad if not maximize else -grad

        if weight_decay != 0:
            if grad.is_sparse:
                raise RuntimeError(
                    "weight_decay option is not compatible with sparse gradients"
                )
            grad = grad.add(param, alpha=weight_decay)

        clr = lr / (1 + (step - 1) * lr_decay)

        if grad.is_sparse:
            # The update is non-linear, so indices must be unique.
            grad = grad.coalesce()
            grad_indices = grad._indices()
            grad_values = grad._values()

            state_sum.add_(_make_sparse(grad, grad_indices, grad_values.pow(2)))
            std = state_sum.sparse_mask(grad)
            std_values = std._values().sqrt_().add_(eps)
            param.add_(
                _make_sparse(grad, grad_indices, grad_values / std_values), alpha=-clr
            )
        else:
            is_complex = torch.is_complex(param)
            if is_complex:
                grad = torch.view_as_real(grad)
                state_sum = torch.view_as_real(state_sum)
                param = torch.view_as_real(param)
            state_sum.addcmul_(grad, grad, value=1)
            if differentiable:
                std = state_sum.sqrt() + eps
            else:
                std = state_sum.sqrt().add_(eps)
            param.addcdiv_(grad, std, value=-clr)
            if is_complex:
                param = torch.view_as_complex(param)
                state_sum = torch.view_as_complex(state_sum)
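# Illustrative sketch (not part of the upstream module): the dense-tensor branch of
# `_single_tensor_adagrad` above, written out for one parameter and stripped of the
# sparse, complex, and differentiable handling, so the update rule is easy to follow.
# The helper name and its arguments are hypothetical; nothing in this file calls it.
def _reference_dense_adagrad_update(
    param: Tensor,
    grad: Tensor,
    state_sum: Tensor,
    step: int,
    *,
    lr: float,
    lr_decay: float,
    weight_decay: float,
    eps: float,
) -> None:
    # Optional L2 penalty folded into the gradient.
    if weight_decay != 0:
        grad = grad.add(param, alpha=weight_decay)
    # Decayed learning rate: lr / (1 + (t - 1) * lr_decay).
    clr = lr / (1 + (step - 1) * lr_decay)
    # Accumulate squared gradients, then scale the step by 1 / (sqrt(sum) + eps).
    state_sum.addcmul_(grad, grad, value=1)
    std = state_sum.sqrt().add_(eps)
    param.addcdiv_(grad, std, value=-clr)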
def _multi_tensor_adagrad(
    params: list[Tensor],
    grads: list[Tensor],
    state_sums: list[Tensor],
    state_steps: list[Tensor],
    grad_scale: Optional[Tensor],
    found_inf: Optional[Tensor],
    *,
    lr: float,
    weight_decay: float,
    lr_decay: float,
    eps: float,
    has_sparse_grad: bool,
    maximize: bool,
    differentiable: bool,
    has_complex: bool,
):
    assert not differentiable, "_foreach ops don't support autograd"
    assert grad_scale is None and found_inf is None

    # Foreach functions will throw errors if given empty lists.
    if len(params) == 0:
        return

    lr = _to_scalar(lr)

    grouped_tensorlists = Optimizer._group_tensors_by_device_and_dtype(
        [params, grads, state_sums, state_steps]
    )
    for (
        device_params_,
        device_grads_,
        device_state_sums_,
        device_state_steps_,
    ), _ in grouped_tensorlists.values():
        device_params = cast(list[Tensor], device_params_)
        device_grads = cast(list[Tensor], device_grads_)
        device_state_sums = cast(list[Tensor], device_state_sums_)
        device_state_steps = cast(list[Tensor], device_state_steps_)

        device_has_sparse_grad = has_sparse_grad and any(
            grad.is_sparse for grad in device_grads
        )

        if device_has_sparse_grad:
            # Sparse gradients fall back to the single-tensor implementation.
            _single_tensor_adagrad(
                device_params,
                device_grads,
                device_state_sums,
                device_state_steps,
                lr=lr,
                weight_decay=weight_decay,
                lr_decay=lr_decay,
                eps=eps,
                has_sparse_grad=True,
                maximize=maximize,
                differentiable=differentiable,
                has_complex=has_complex,
                grad_scale=grad_scale,
                found_inf=found_inf,
            )
            continue

        # Handle complex parameters as views over their real representation.
        if has_complex:
            _view_as_real(device_params, device_grads, device_state_sums)

        if maximize:
            device_grads = torch._foreach_neg(device_grads)

        # Update steps. If the steps live on CPU, foreach falls back to a slow per-tensor
        # loop; wrapping 1 in a tensor once avoids re-wrapping it on every iteration.
        if not torch.compiler.is_compiling() and device_state_steps[0].is_cpu:
            torch._foreach_add_(
                device_state_steps, torch.tensor(1.0, device="cpu"), alpha=1.0
            )
        else:
            torch._foreach_add_(device_state_steps, 1)

        if weight_decay != 0:
            # Re-use the intermediate memory (device_grads) already allocated for maximize.
            if maximize:
                torch._foreach_add_(device_grads, device_params, alpha=weight_decay)
            else:
                device_grads = torch._foreach_add(
                    device_grads, device_params, alpha=weight_decay
                )

        minus_clr = [
            -lr / (1 + (_get_value(step) - 1) * lr_decay) for step in device_state_steps
        ]

        torch._foreach_addcmul_(device_state_sums, device_grads, device_grads, value=1)

        std = torch._foreach_sqrt(device_state_sums)
        torch._foreach_add_(std, eps)

        if weight_decay != 0 or maximize:
            # Again, re-use the intermediate memory (device_grads) already allocated.
            torch._foreach_mul_(device_grads, minus_clr)
            numerator = device_grads
        else:
            numerator = torch._foreach_mul(device_grads, minus_clr)

        torch._foreach_addcdiv_(device_params, numerator, std)


def _fused_adagrad(
    params: list[Tensor],
    grads: list[Tensor],
    state_sums: list[Tensor],
    state_steps: list[Tensor],
    grad_scale: Optional[Tensor],
    found_inf: Optional[Tensor],
    *,
    lr: float,
    weight_decay: float,
    lr_decay: float,
    eps: float,
    has_sparse_grad: bool,
    maximize: bool,
    differentiable: bool,
    has_complex: bool,
) -> None:
    if not params:
        return
    if has_sparse_grad or has_complex:
        raise RuntimeError("`fused` does not support sparse grad or complex param")

    if differentiable:
        raise RuntimeError(
            "adagrad with fused=True does not support differentiable=True"
        )

    lr = _to_scalar(lr)

    grad_scale_dict = {grad_scale.device: grad_scale} if grad_scale is not None else {}
    found_inf_dict = {found_inf.device: found_inf} if found_inf is not None else {}

    grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
        [params, grads, state_sums, state_steps]
    )
    for (device, _), (
        (device_params_, device_grads_, device_state_sums_, device_state_steps_),
        _,
    ) in grouped_tensors.items():
        device_params = cast(list[Tensor], device_params_)
        device_grads = cast(list[Tensor], device_grads_)
        device_state_sums = cast(list[Tensor], device_state_sums_)
        device_state_steps = cast(list[Tensor], device_state_steps_)

        device_grad_scale, device_found_inf = None, None
        if grad_scale is not None:
            device_grad_scale = grad_scale_dict.setdefault(
                device, grad_scale.to(device, non_blocking=True)
            )
        if found_inf is not None:
            device_found_inf = found_inf_dict.setdefault(
                device, found_inf.to(device, non_blocking=True)
            )

        torch._foreach_add_(device_state_steps, 1)
        torch._fused_adagrad_(
            device_params,
            device_grads,
            device_state_sums,
            device_state_steps,
            lr=lr,
            lr_decay=lr_decay,
            weight_decay=weight_decay,
            eps=eps,
            maximize=maximize,
            grad_scale=device_grad_scale,
            found_inf=device_found_inf,
        )
        if device_found_inf is not None:
            torch._foreach_sub_(
                device_state_steps, [device_found_inf] * len(device_state_steps)
            )
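# Minimal usage sketch: how the Adagrad class defined above is typically driven. The
# parameter shape and hyperparameters are arbitrary illustration values, and the block
# is guarded by __main__ so importing this module is unaffected.
if __name__ == "__main__":
    w = torch.nn.Parameter(torch.randn(4))
    opt = Adagrad([w], lr=0.1, initial_accumulator_value=0.1)
    for _ in range(10):
        opt.zero_grad()
        loss = (w * w).sum()  # simple quadratic objective; gradient is 2 * w
        loss.backward()
        opt.step()
    print(w)  # entries shrink toward zero as the accumulated squared gradients grow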