from typing import Optional

import torch


def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """
    This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
    num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
    """
    batch, num_key_value_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
    return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)


def sdpa_attention_paged_forward(
    module: torch.nn.Module,
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor],
    dropout: float = 0.0,
    scaling: Optional[float] = None,
    **kwargs,
) -> tuple[torch.Tensor, None]:
    # Retrieve the paged KV cache (if any) and update it with the new key/value states.
    cache = kwargs.pop("cache", None)
    if cache is not None:
        key, value = cache.update(key, value, module.layer_idx, **kwargs)

    # Bring key/value into the (batch, num_heads, seq_len, head_dim) layout expected by SDPA.
    key = key.transpose(1, 2).unsqueeze(0)
    value = value.transpose(1, 2).unsqueeze(0)

    # For grouped-query attention, repeat the KV heads to match the number of query heads.
    if hasattr(module, "num_key_value_groups"):
        key = repeat_kv(key, module.num_key_value_groups)
        value = repeat_kv(value, module.num_key_value_groups)

    causal_mask = attention_mask
    query = query.contiguous()
    key = key.contiguous()
    value = value.contiguous()

    attn_output = torch.nn.functional.scaled_dot_product_attention(
        query,
        key,
        value,
        attn_mask=causal_mask,
        dropout_p=dropout,
        scale=scaling,
        is_causal=False,
    )
    # Back to (batch, seq_len, num_heads, head_dim) for the output projection.
    attn_output = attn_output.transpose(1, 2).contiguous()

    return attn_output, None
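A minimal sketch of what this module does, runnable in isolation: it re-implements `repeat_kv` locally (same body as above), checks its documented equivalence to `torch.repeat_interleave`, and then runs the core SDPA call on hypothetical grouped-query-attention shapes (8 query heads sharing 2 KV heads). The shapes and group size here are illustrative assumptions, not values used by the library.

```python
import torch
import torch.nn.functional as F


def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep).
    batch, num_key_value_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
    return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)


# Hypothetical GQA layout: 8 query heads share 2 KV heads (group size 4).
query = torch.randn(1, 8, 5, 16)
key = torch.randn(1, 2, 5, 16)
value = torch.randn(1, 2, 5, 16)

# Sanity-check the documented equivalence before expanding the KV heads.
assert torch.equal(repeat_kv(key, 4), torch.repeat_interleave(key, dim=1, repeats=4))

key = repeat_kv(key, 4)
value = repeat_kv(value, 4)

# Core SDPA call, mirroring the forward above (no mask, no dropout, non-causal).
attn = F.scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False)
out = attn.transpose(1, 2).contiguous()  # (batch, seq_len, num_heads, head_dim)
assert out.shape == (1, 5, 8, 16)
```

Note that `expand` followed by `reshape` materializes the repeated heads only once, which is why the library uses it instead of `repeat_interleave` directly.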