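from typing import Optional

import torch

from ..utils import is_torch_npu_available, is_torch_xpu_available, logging
from ..utils.import_utils import is_torch_greater_or_equal


logger = logging.get_logger(__name__)


# Evaluate version and backend checks once at import time.
_is_torch_greater_or_equal_than_2_5 = is_torch_greater_or_equal("2.5", accept_dev=True)
_is_torch_greater_or_equal_than_2_8 = is_torch_greater_or_equal("2.8", accept_dev=True)
_is_torch_xpu_available = is_torch_xpu_available()
_is_torch_npu_available = is_torch_npu_available()


def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """
    This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
    num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
    """
    batch, num_key_value_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    # Insert a repeat axis after the KV-head axis, broadcast it, then fold it into the head axis.
    hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
    return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)


def use_gqa_in_sdpa(attention_mask: Optional[torch.Tensor], key: torch.Tensor) -> bool:
    # Native GQA in SDPA is used only when:
    #   - XPU: torch >= 2.8 and `key` is not a torch.fx.Proxy
    #   - NPU: never
    #   - otherwise: torch >= 2.5, no attention mask, and `key` is not a torch.fx.Proxy
    if _is_torch_xpu_available:
        return _is_torch_greater_or_equal_than_2_8 and not isinstance(key, torch.fx.Proxy)
    if _is_torch_npu_available:
        return False
    return (
        _is_torch_greater_or_equal_than_2_5 and attention_mask is None and not isinstance(key, torch.fx.Proxy)
    )


def sdpa_attention_forward(
    module: torch.nn.Module,
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor],
    dropout: float = 0.0,
    scaling: Optional[float] = None,
    is_causal: Optional[bool] = None,
    **kwargs,
) -> tuple[torch.Tensor, None]:
    if kwargs.get("output_attentions", False) or kwargs.get("head_mask") is not None:
        logger.warning_once(
            "`sdpa` attention does not support `output_attentions=True` or `head_mask`."
            " Please set your attention to `eager` if you want any of these features."
        )

    sdpa_kwargs = {}
    if hasattr(module, "num_key_value_groups"):
        if not use_gqa_in_sdpa(attention_mask, key):
            # Fall back to materializing the repeated key/value heads.
            key = repeat_kv(key, module.num_key_value_groups)
            value = repeat_kv(value, module.num_key_value_groups)
        else:
            sdpa_kwargs = {"enable_gqa": True}

    # Trim a 4D mask to the key sequence length.
    if attention_mask is not None and attention_mask.ndim == 4:
        attention_mask = attention_mask[:, :, :, : key.shape[-2]]

    # Infer causality only when the caller did not pass it explicitly. The shape check comes first.
    if is_causal is None:
        is_causal = query.shape[2] > 1 and attention_mask is None and getattr(module, "is_causal", True)

    # During jit tracing shapes are tensors, so `is_causal` may be a tensor; SDPA only accepts a bool.
    if torch.jit.is_tracing() and isinstance(is_causal, torch.Tensor):
        is_causal = is_causal.item()

    # On NPU, convert a non-boolean mask to a boolean one before calling SDPA.
    if _is_torch_npu_available:
        if attention_mask is not None and attention_mask.dtype != torch.bool:
            attention_mask = torch.logical_not(attention_mask.bool()).to(query.device)

    attn_output = torch.nn.functional.scaled_dot_product_attention(
        query,
        key,
        value,
        attn_mask=attention_mask,
        dropout_p=dropout,
        scale=scaling,
        is_causal=is_causal,
        **sdpa_kwargs,
    )
    # (batch, heads, seqlen, head_dim) -> (batch, seqlen, heads, head_dim)
    attn_output = attn_output.transpose(1, 2).contiguous()

    return attn_output, None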