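from typing import Optional

import torch
from torch import nn


def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """
    This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
    num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
    """
    batch, num_key_value_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    # Insert a repeat axis after the KV-head axis, broadcast it, then fold it into the head axis.
    hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
    return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)


def eager_paged_attention_forward(
    module: nn.Module,
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor],
    scaling: float,
    **kwargs,
):
    # If a paged cache was passed in, write the new key/value states to it and read back the cached ones.
    cache = kwargs.pop("cache", None)
    if cache is not None:
        key, value = cache.update(key, value, module.layer_idx, **kwargs)
        key = key.transpose(0, 1).unsqueeze(0)
        value = value.transpose(0, 1).unsqueeze(0)

    # Repeat the key/value heads so they match the number of query heads.
    if hasattr(module, "num_key_value_groups"):
        key = repeat_kv(key, module.num_key_value_groups)
        value = repeat_kv(value, module.num_key_value_groups)

    # The mask may be a dict keyed by layer type; pick the entry that applies to this layer.
    if isinstance(attention_mask, dict):
        sliding_window = getattr(module, "sliding_window", 1)
        layer_type = "full_attention" if sliding_window == 1 or sliding_window is None else "sliding_attention"
        causal_mask = attention_mask[layer_type]
    else:
        causal_mask = attention_mask

    attn_weights = torch.matmul(query, key.transpose(2, 3)) * scaling
    if causal_mask is not None:
        attn_weights = attn_weights + causal_mask

    if hasattr(module, "sinks"):
        # Append the per-head sink logits as an extra key column.
        sinks = module.sinks.reshape(1, -1, 1, 1).expand(query.shape[0], -1, query.shape[-2], -1)
        attn_weights = torch.cat([attn_weights, sinks], dim=-1)
        # Subtract the row-wise maximum before the softmax for numerical stability.
        attn_weights = attn_weights - attn_weights.max(dim=-1, keepdim=True).values
        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
        # Drop the sink column so the weights line up with the value tensor again.
        attn_weights = attn_weights[..., :-1]
    else:
        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)

    attn_output = torch.matmul(attn_weights, value)
    attn_output = attn_output.transpose(1, 2).contiguous()
    return attn_output, attn_weights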