import functools
import math
from collections import OrderedDict

import torch
from torch import Tensor, nn

from .integrations.hub_kernels import use_kernel_forward_from_hub
from .utils import logging
from .utils.import_utils import is_torchdynamo_compiling


logger = logging.get_logger(__name__)


@use_kernel_forward_from_hub("GeluTanh")
class GELUTanh(nn.Module):
    """
    A fast C implementation of the tanh approximation of the GeLU activation function. See
    https://huggingface.co/papers/1606.08415.

    This implementation is equivalent to NewGELU and FastGELU but much faster. However, it is not an exact numerical
    match due to rounding errors.
    """

    def __init__(self, use_gelu_tanh_python: bool = False):
        super().__init__()
        if use_gelu_tanh_python:
            self.act = self._gelu_tanh_python
        else:
            self.act = functools.partial(nn.functional.gelu, approximate="tanh")

    def _gelu_tanh_python(self, input: Tensor) -> Tensor:
        return input * 0.5 * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (input + 0.044715 * torch.pow(input, 3.0))))

    def forward(self, input: Tensor) -> Tensor:
        return self.act(input)


@use_kernel_forward_from_hub("NewGELU")
class NewGELUActivation(nn.Module):
    """
    Implementation of the GELU activation function currently in Google BERT repo (identical to OpenAI GPT). Also see
    the Gaussian Error Linear Units paper: https://huggingface.co/papers/1606.08415
    """

    def forward(self, input: Tensor) -> Tensor:
        return 0.5 * input * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (input + 0.044715 * torch.pow(input, 3.0))))


@use_kernel_forward_from_hub("GeLU")
class GELUActivation(nn.Module):
    """
    Original Implementation of the GELU activation function in Google BERT repo when initially created. For
    information: OpenAI GPT's GELU is slightly different (and gives slightly different results): 0.5 * x * (1 +
    torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3)))) This is now written in C in nn.functional
    Also see the Gaussian Error Linear Units paper: https://huggingface.co/papers/1606.08415
    """

    def __init__(self, use_gelu_python: bool = False):
        super().__init__()
        if use_gelu_python:
            self.act = self._gelu_python
        else:
            self.act = nn.functional.gelu

    def _gelu_python(self, input: Tensor) -> Tensor:
        return input * 0.5 * (1.0 + torch.erf(input / math.sqrt(2.0)))

    def forward(self, input: Tensor) -> Tensor:
        return self.act(input)


@use_kernel_forward_from_hub("SiLU")
class SiLUActivation(nn.Module):
    """
    See Gaussian Error Linear Units (Hendrycks et al., https://arxiv.org/abs/1606.08415) where the SiLU (Sigmoid Linear
    Unit) was originally introduced and coined, and see Sigmoid-Weighted Linear Units for Neural Network Function
    Approximation in Reinforcement Learning (Elfwing et al., https://arxiv.org/abs/1702.03118) and Swish: a Self-Gated
    Activation Function (Ramachandran et al., https://arxiv.org/abs/1710.05941v1) where the SiLU was experimented with
    later.
    """

    def forward(self, input: Tensor) -> Tensor:
        return nn.functional.silu(input)


@use_kernel_forward_from_hub("FastGELU")
class FastGELUActivation(nn.Module):
    """
    Applies GELU approximation that is slower than QuickGELU but more accurate. See: https://github.com/hendrycks/GELUs
    """

    def forward(self, input: Tensor) -> Tensor:
        return 0.5 * input * (1.0 + torch.tanh(input * 0.7978845608 * (1.0 + 0.044715 * input * input)))


@use_kernel_forward_from_hub("QuickGELU")
class QuickGELUActivation(nn.Module):
    """
    Applies GELU approximation that is fast but somewhat inaccurate. See: https://github.com/hendrycks/GELUs
    """

    def forward(self, input: Tensor) -> Tensor:
        return input * torch.sigmoid(1.702 * input)


class ClippedGELUActivation(nn.Module):
    """
    Clip the range of possible GeLU outputs between [min, max]. This is especially useful for quantization purpose, as
    it allows mapping negatives values in the GeLU spectrum. For more information on this trick, please refer to
    https://huggingface.co/papers/2004.09602.

    Gaussian Error Linear Unit. Original Implementation of the gelu activation function in Google Bert repo when
    initially created.

    For information: OpenAI GPT's gelu is slightly different (and gives slightly different results): 0.5 * x * (1 +
    torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3)))). See https://huggingface.co/papers/1606.08415
    """

    def __init__(self, min: float, max: float):
        if min > max:
            raise ValueError(f"min should be < max (got min: {min}, max: {max})")

        super().__init__()
        self.min = min
        self.max = max

    def forward(self, x: Tensor) -> Tensor:
        # `gelu` is the module-level activation instance created at the bottom of this file;
        # it is resolved at call time, after the module has finished importing.
        return torch.clip(gelu(x), self.min, self.max)


class AccurateGELUActivation(nn.Module):
    """
    Applies GELU approximation that is faster than default and more accurate than QuickGELU. See:
    https://github.com/hendrycks/GELUs

    Implemented along with MEGA (Moving Average Equipped Gated Attention)
    """

    def __init__(self):
        super().__init__()
        self.precomputed_constant = math.sqrt(2 / math.pi)

    def forward(self, input: Tensor) -> Tensor:
        return 0.5 * input * (1 + torch.tanh(self.precomputed_constant * (input + 0.044715 * torch.pow(input, 3))))


class MishActivation(nn.Module):
    """
    See Mish: A Self-Regularized Non-Monotonic Activation Function (Misra., https://huggingface.co/papers/1908.08681).
    Also visit the official repository for the paper: https://github.com/digantamisra98/Mish
    """

    def __init__(self):
        super().__init__()
        self.act = nn.functional.mish

    def _mish_python(self, input: Tensor) -> Tensor:
        return input * torch.tanh(nn.functional.softplus(input))

    def forward(self, input: Tensor) -> Tensor:
        return self.act(input)


class LinearActivation(nn.Module):
    """
    Applies the linear activation function, i.e. forwarding input directly to output.
    """

    def forward(self, input: Tensor) -> Tensor:
        return input


class LaplaceActivation(nn.Module):
    """
    Applies elementwise activation based on Laplace function, introduced in MEGA as an attention activation. See
    https://huggingface.co/papers/2209.10655

    Inspired by squared relu, but with bounded range and gradient for better stability
    """

    def forward(self, input, mu=0.707107, sigma=0.282095):
        input = (input - mu).div(sigma * math.sqrt(2.0))
        return 0.5 * (1.0 + torch.erf(input))


class ReLUSquaredActivation(nn.Module):
    """
    Applies the relu^2 activation introduced in https://huggingface.co/papers/2109.08668v2
    """

    def forward(self, input):
        relu_applied = nn.functional.relu(input)
        squared = torch.square(relu_applied)
        return squared


class ClassInstantier(OrderedDict):
    def __getitem__(self, key):
        content = super().__getitem__(key)
        cls, kwargs = content if isinstance(content, tuple) else (content, {})
        return cls(**kwargs)


class XIELUActivation(nn.Module):
    """
    Applies the xIELU activation function introduced in https://arxiv.org/abs/2411.13010

    If the user has installed the nickjbrowning/XIELU wheel, we import xIELU CUDA
    Otherwise, we emit a single warning and use xIELU Python
    """

    def __init__(
        self,
        alpha_p_init=0.8,
        alpha_n_init=0.8,
        beta=0.5,
        eps=-1e-6,
        dtype=torch.bfloat16,
        with_vector_loads=False,
    ):
        super().__init__()
        # The learnable alphas are stored in softplus-inverse space (log(expm1(.))) so that
        # softplus recovers the positive initial values in _xielu_python.
        self.alpha_p = nn.Parameter(torch.log(torch.expm1(torch.tensor(alpha_p_init, dtype=dtype))).unsqueeze(0))
        self.alpha_n = nn.Parameter(
            torch.log(torch.expm1(torch.tensor(alpha_n_init - beta, dtype=dtype))).unsqueeze(0)
        )
        self.register_buffer("beta", torch.tensor(beta, dtype=dtype))
        self.register_buffer("eps", torch.tensor(eps, dtype=dtype))
        self.with_vector_loads = with_vector_loads
        # Scalar copies for the CUDA op, so .item() is never called inside compiled regions.
        self._beta_scalar = float(self.beta.detach().cpu().float().item())
        self._eps_scalar = float(self.eps.detach().cpu().float().item())

        self._xielu_cuda_obj = None
        try:
            import xielu.ops  # noqa: F401

            self._xielu_cuda_obj = torch.classes.xielu.XIELU()
            msg = "Using experimental xIELU CUDA."
            try:
                from torch._dynamo import allow_in_graph

                self._xielu_cuda_fn = allow_in_graph(self._xielu_cuda)
                msg += " Enabled torch._dynamo for xIELU CUDA."
            except Exception as err:
                msg += f" Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance."
                self._xielu_cuda_fn = self._xielu_cuda
            logger.warning_once(msg)
        except Exception as err:
            logger.warning_once(
                "CUDA-fused xIELU not available (%s) – falling back to a Python version.\n"
                "For CUDA xIELU (experimental), `pip install git+https://github.com/nickjbrowning/XIELU`",
                str(err),
            )

    def _xielu_python(self, x: Tensor) -> Tensor:
        alpha_p = nn.functional.softplus(self.alpha_p)
        alpha_n = self.beta + nn.functional.softplus(self.alpha_n)
        return torch.where(
            x > 0,
            alpha_p * x * x + self.beta * x,
            (torch.expm1(torch.min(x, self.eps)) - x) * alpha_n + self.beta * x,
        )

    def _xielu_cuda(self, x: Tensor) -> Tensor:
        """Firewall function to prevent torch.compile from seeing .item() calls"""
        original_shape = x.shape
        # The CUDA kernel expects a 3D tensor; reshape (and warn once) if the input differs.
        while x.dim() < 3:
            x = x.unsqueeze(0)
        if x.dim() > 3:
            x = x.view(-1, 1, x.size(-1))
        if original_shape != x.shape:
            logger.warning_once(
                "Warning: xIELU input tensor expects 3 dimensions but got (shape: %s). Reshaping to (shape: %s).",
                original_shape,
                x.shape,
            )
        result = self._xielu_cuda_obj.forward(
            x,
            self.alpha_p.to(x.device),
            self.alpha_n.to(x.device),
            self._beta_scalar,
            self._eps_scalar,
            self.with_vector_loads,
        )
        return result.view(original_shape)

    def forward(self, input: Tensor) -> Tensor:
        if self._xielu_cuda_obj is not None and input.is_cuda:
            if not is_torchdynamo_compiling():
                return self._xielu_cuda_fn(input)
            logger.warning_once("torch._dynamo is compiling, using Python version of xIELU.")
        return self._xielu_python(input)
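

# The registry below maps activation names either to a module class or to a
# (class, kwargs) tuple. ClassInstantier.__getitem__ instantiates the entry on
# every lookup, so ACT2FN["gelu"] returns a fresh nn.Module rather than a shared singleton.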
ACT2CLS = {
    "gelu": GELUActivation,
    "gelu_10": (ClippedGELUActivation, {"min": -10, "max": 10}),
    "gelu_fast": FastGELUActivation,
    "gelu_new": NewGELUActivation,
    "gelu_python": (GELUActivation, {"use_gelu_python": True}),
    "gelu_pytorch_tanh": GELUTanh,
    "gelu_python_tanh": (GELUTanh, {"use_gelu_tanh_python": True}),
    "gelu_accurate": AccurateGELUActivation,
    "laplace": LaplaceActivation,
    "leaky_relu": nn.LeakyReLU,
    "linear": LinearActivation,
    "mish": MishActivation,
    "quick_gelu": QuickGELUActivation,
    "relu": nn.ReLU,
    "relu2": ReLUSquaredActivation,
    "relu6": nn.ReLU6,
    "sigmoid": nn.Sigmoid,
    "silu": SiLUActivation,
    "swish": SiLUActivation,
    "tanh": nn.Tanh,
    "prelu": nn.PReLU,
    "xielu": XIELUActivation,
}
ACT2FN = ClassInstantier(ACT2CLS)


def get_activation(activation_string):
    if activation_string in ACT2FN:
        return ACT2FN[activation_string]
    else:
        raise KeyError(f"function {activation_string} not found in ACT2FN mapping {list(ACT2FN.keys())}")


# Module-level instances kept for backwards compatibility, e.g. `from transformers.activations import gelu`.
gelu_python = get_activation("gelu_python")
gelu_new = get_activation("gelu_new")
gelu = get_activation("gelu")
gelu_fast = get_activation("gelu_fast")
quick_gelu = get_activation("quick_gelu")
silu = get_activation("silu")
mish = get_activation("mish")
linear_act = get_activation("linear")
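

# ---------------------------------------------------------------------------
# Minimal usage sketch (illustrative only, not part of the library API). It uses
# only the helpers defined above and runs only when the module is executed as a
# script, e.g. `python -m transformers.activations` in an environment where the
# package is importable.
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    x = torch.randn(2, 4)

    # get_activation returns a freshly instantiated nn.Module for a registered name.
    act = get_activation("gelu_new")
    print(act(x).shape)  # torch.Size([2, 4])

    # ACT2FN supports dict-style lookup; tuple entries such as "gelu_10" are
    # instantiated with their kwargs by ClassInstantier.
    clipped = ACT2FN["gelu_10"]  # ClippedGELUActivation(min=-10, max=10)
    print(bool((clipped(x) >= -10).all()))  # True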