"""Defines bias subclasses that work with scaled_dot_product_attention"""
from enum import auto, IntEnum
from typing import Optional
from warnings import warn

import torch
import torch.nn.functional as F
from torch.backends.cuda import (
    can_use_efficient_attention,
    can_use_flash_attention,
    is_flash_attention_available,
    SDPAParams,
)
from torch.nn.attention import _raise_kernel_warnings
from torch.nn.attention._utils import (
    _calculate_scale,
    _input_requires_grad,
    _postprocess_flash_output,
    _validate_sdpa_input,
)


__all__ = ["causal_upper_left", "causal_lower_right", "CausalVariant", "CausalBias"]


torch._dynamo.allow_in_graph(is_flash_attention_available)
torch._dynamo.allow_in_graph(can_use_flash_attention)
torch._dynamo.allow_in_graph(can_use_efficient_attention)
torch._dynamo.allow_in_graph(SDPAParams)


class CausalVariant(IntEnum):
    r"""
    Enum for causal variants used in attention mechanisms.

    Defines two types of causal biases:

    ``UPPER_LEFT``: Represents upper-left triangular bias for standard causal attention.
    The equivalent pytorch code for constructing this bias is:

    .. code-block:: python

        torch.tril(torch.ones(size, dtype=torch.bool))

    For instance, with ``shape=(3,4)``, the materialized bias tensor will be:

    .. code-block:: text

        [[1, 0, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 1, 0]]


    ``LOWER_RIGHT``: Represents lower-right triangular bias; the included values are
    aligned to the lower-right corner of the matrix.

    The equivalent pytorch code for constructing this bias is:

    .. code-block:: python

        diagonal_offset = size[1] - size[0]
        torch.tril(
            torch.ones(size, dtype=torch.bool),
            diagonal=diagonal_offset,
        )

    For instance, with ``shape=(3,4)``, the materialized bias tensor will be:

    .. code-block:: text

        [[1, 1, 0, 0],
         [1, 1, 1, 0],
         [1, 1, 1, 1]]

    Note that these variants are equivalent to each other when the sequence lengths of the
    query and key/value tensors are equal, since the triangular matrix is square.

    .. warning:: This enum is a prototype and subject to change.
    """

    UPPER_LEFT = auto()
    LOWER_RIGHT = auto()
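
# A minimal illustration (doctest-style comment, not executed at import time):
# for square shapes the two variants materialize to the same mask, e.g.
#
#     >>> from torch.nn.attention.bias import causal_upper_left, causal_lower_right
#     >>> ul = causal_upper_left(3, 3)._materialize()
#     >>> lr = causal_lower_right(3, 3)._materialize()
#     >>> bool(ul.equal(lr))
#     True
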
class CausalBias(torch.Tensor):
    """
    A bias representing causal attention patterns. For an overview of the bias structure,
    see the :class:`CausalVariant` enum.

    This class is used for defining causal (triangular) attention biases. For constructing
    the bias, there exist two factory functions: :func:`causal_upper_left` and
    :func:`causal_lower_right`.

    Example:

    .. code-block:: python

        import torch
        import torch.nn.functional as F
        from torch.nn.attention.bias import causal_lower_right

        bsz, num_heads, seqlen_q, seqlen_kv, head_dim = 32, 8, 4, 12, 8

        # Create a lower-right causal bias
        attn_bias = causal_lower_right(seqlen_q, seqlen_kv)

        q = torch.randn(
            bsz, num_heads, seqlen_q, head_dim, device="cuda", dtype=torch.float16
        )
        k = torch.randn(
            bsz, num_heads, seqlen_kv, head_dim, device="cuda", dtype=torch.float16
        )
        v = torch.randn(
            bsz, num_heads, seqlen_kv, head_dim, device="cuda", dtype=torch.float16
        )

        out = F.scaled_dot_product_attention(q, k, v, attn_bias)

    .. warning:: This class is a prototype and subject to change.
    """

    def __init__(self, variant: CausalVariant, seq_len_q: int, seq_len_kv: int):
        """
        Initializes the CausalBias instance with a specified variant and sequence lengths.

        Args:
            variant (CausalVariant): The type of causal bias to use (either UPPER_LEFT or LOWER_RIGHT).
            seq_len_q (int): The sequence length of the query tensor.
            seq_len_kv (int): The sequence length of the key/value tensor.

        Raises a warning if the LOWER_RIGHT variant is used with seq_len_q > seq_len_kv, as it may produce NaNs.
        """
        assert isinstance(variant, CausalVariant)
        self.variant = variant
        self.seq_len_q = seq_len_q
        self.seq_len_kv = seq_len_kv
        if seq_len_q > seq_len_kv and variant == CausalVariant.LOWER_RIGHT:
            warn(
                "Lower right causal bias will produce NaNs in the output when seq_len_q > seq_len_kv!"
            )

    def _upper_left(self, device: torch.device) -> torch.Tensor:
        """Upper left causal bias"""
        return torch.tril(
            torch.ones(self.seq_len_q, self.seq_len_kv, device=device, dtype=torch.bool)
        )

    def _lower_right(self, device: torch.device) -> torch.Tensor:
        """Lower right causal bias"""
        diagonal_offset = self.seq_len_kv - self.seq_len_q
        return torch.tril(
            torch.ones(
                self.seq_len_q, self.seq_len_kv, device=device, dtype=torch.bool
            ),
            diagonal=diagonal_offset,
        )

    def _materialize(self, device: Optional[torch.device] = None) -> torch.Tensor:
        """
        Materializes the causal bias into a tensor form.

        Depending on the variant, this method generates either an upper-left or lower-right
        triangular matrix to represent the causal bias.

        Args:
            device (Optional[torch.device]): The device on which to create the tensor. Defaults to CPU.

        Returns:
            torch.Tensor: The materialized bias tensor.
        """
        if device is None:
            device = torch.device("cpu")
        if self.variant == CausalVariant.UPPER_LEFT:
            return self._upper_left(device)
        elif self.variant == CausalVariant.LOWER_RIGHT:
            return self._lower_right(device)

    @staticmethod
    def _dispatch(
        query: torch.Tensor,
        key: torch.Tensor,
        value: torch.Tensor,
        attn_mask: "CausalBias",
        dropout_p: float = 0.0,
        is_causal: bool = False,
        scale: Optional[float] = None,
        enable_gqa: bool = False,
    ) -> torch.Tensor:
        r"""
        Handles the logic for computing attention with the specified causal bias.

        Args:
            query (Tensor): Query tensor; shape :math:`(N, ..., L, E)`.
            key (Tensor): Key tensor; shape :math:`(N, ..., S, E)`.
            value (Tensor): Value tensor; shape :math:`(N, ..., S, Ev)`.
            attn_mask (CausalBias): The type of causal attention to apply.
                A boolean mask where a value of True indicates that the element *should* take part in attention.
                A float mask of the same type as query, key, value that is added to the attention score.
            dropout_p (float): Dropout probability; if greater than 0.0, dropout is applied.
            is_causal (bool): If true, assumes upper-left causal attention masking; errors if both
                attn_mask and is_causal are set.
            scale (optional float): Scaling factor applied prior to softmax. If None, the default value is set
                to :math:`\frac{1}{\sqrt{E}}`.
            enable_gqa (optional bool): If set to True, Grouped Query Attention (GQA) is enabled;
                by default it is set to False.

        Returns:
            output (Tensor): Attention output; shape :math:`(N, ..., L, Ev)`.

        Raises:
            ValueError: If the causal bias variant is not a CausalVariant type.
        """
        if is_causal:
            raise ValueError("CausalBias should not be used with causal=True")

        if (
            attn_mask.seq_len_q == attn_mask.seq_len_kv
            or attn_mask.variant == CausalVariant.UPPER_LEFT
        ):
            # For square masks (or the UPPER_LEFT variant) this is exactly
            # standard causal attention, so defer to the fused is_causal path.
            return F.scaled_dot_product_attention(
                query,
                key,
                value,
                attn_mask=None,
                dropout_p=dropout_p,
                is_causal=True,
                scale=scale,
                enable_gqa=enable_gqa,
            )
        elif attn_mask.variant == CausalVariant.LOWER_RIGHT:
            _validate_sdpa_input(query, key, value, None, dropout_p, is_causal, scale)
            sdpa_params = SDPAParams(
                query, key, value, None, dropout_p, is_causal, enable_gqa
            )
            if can_use_flash_attention(sdpa_params):
                # Flash attention requires the head dim to be a multiple of 8;
                # pad if necessary and slice the padding back off afterwards.
                needs_padding = query.size(-1) % 8 != 0
                og_head_size = query.size(-1)
                og_scale = _calculate_scale(og_head_size, scale)
                if needs_padding:
                    query = torch.nn.functional.pad(query, (0, 8 - query.size(-1) % 8))
                    key = torch.nn.functional.pad(key, (0, 8 - key.size(-1) % 8))
                    value = torch.nn.functional.pad(value, (0, 8 - value.size(-1) % 8))
                out = torch.ops.aten._scaled_dot_product_flash_attention(
                    query,
                    key,
                    value,
                    dropout_p,
                    is_causal=True,  # for this op, causal=True means lower right
                    return_debug_mask=False,
                    scale=og_scale,
                )[0]
                return _postprocess_flash_output(out, og_head_size)
            if can_use_efficient_attention(sdpa_params):
                compute_log_sumexp = False
                if _input_requires_grad(query, key, value):
                    compute_log_sumexp = True
                return torch.ops.aten._efficient_attention_forward(
                    query.transpose(1, 2),
                    key.transpose(1, 2),
                    value.transpose(1, 2),
                    bias=None,
                    cu_seqlens_q=None,
                    cu_seqlens_k=None,
                    max_seqlen_q=None,
                    max_seqlen_k=None,
                    dropout_p=dropout_p,
                    custom_mask_type=int(attn_mask.variant),
                    compute_log_sumexp=compute_log_sumexp,
                    scale=scale,
                    seqlen_k=None,
                )[0].transpose(1, 2)
            else:
                _raise_kernel_warnings(sdpa_params)
                # Neither fused kernel applies; fall back to materializing the
                # bias and running the generic masked attention path.
                return F.scaled_dot_product_attention(
                    query,
                    key,
                    value,
                    attn_mask=attn_mask._materialize(query.device),
                    dropout_p=dropout_p,
                    is_causal=False,
                    scale=scale,
                    enable_gqa=enable_gqa,
                )
        else:
            raise ValueError(
                f"CausalBias.variant must be a CausalVariant type, but found: {attn_mask.variant}"
            )

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        """Defines the behavior of torch.nn.functional.scaled_dot_product_attention when the attn_bias is a CausalBias"""
        if kwargs is None:
            kwargs = {}
        if func is torch.nn.functional.scaled_dot_product_attention:
            return cls._dispatch(*args, **kwargs)
        return super().__torch_function__(func, types, args, kwargs)

    def __repr__(self):
        return self._materialize().__repr__()


def causal_upper_left(*size) -> CausalBias:
    """
    Creates an upper-left triangular causal bias.
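# A small usage sketch of the dispatch above (doctest-style comment; assumes
# CPU tensors, where the fused flash/efficient branches do not apply and the
# lower-right bias is materialized instead):
#
#     >>> q = torch.randn(2, 4, 3, 8)
#     >>> k = torch.randn(2, 4, 5, 8)
#     >>> v = torch.randn(2, 4, 5, 8)
#     >>> out = F.scaled_dot_product_attention(q, k, v, causal_lower_right(3, 5))
#     >>> out.shape
#     torch.Size([2, 4, 3, 8])
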
    This function generates an upper-left triangular matrix to represent causal attention bias with a
    diagonal offset set so that the included values are aligned to the upper-left corner of the matrix.
    This is equivalent to the `is_causal=True` argument in `scaled_dot_product_attention`.

    The equivalent pytorch code for constructing this bias is:

    .. code-block:: python

        torch.tril(torch.ones(size, dtype=torch.bool))

    For instance, with `shape=(3,4)`, the materialized bias tensor will be:

    .. code-block:: text

        [[1, 0, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 1, 0]]

    Args:
        size: The size of the bias matrix.

    Returns:
        CausalBias: The UPPER_LEFT triangular causal bias variant.
    """
    assert len(size) == 2, "causal_upper_left only supports 2D tensors"
    seq_len_q, seq_len_kv = size
    return CausalBias(CausalVariant.UPPER_LEFT, seq_len_q, seq_len_kv)


def causal_lower_right(*size) -> CausalBias:
    """
    Creates a lower-right triangular causal bias.

    This function generates a lower-right triangular matrix to represent causal attention bias with a
    diagonal offset set so that the included values are aligned to the lower-right corner of the matrix.

    The equivalent pytorch code for constructing this bias is:

    .. code-block:: python

        diagonal_offset = size[1] - size[0]
        torch.tril(
            torch.ones(size, dtype=torch.bool),
            diagonal=diagonal_offset,
        )

    For instance, with `shape=(3,4)`, the materialized bias tensor will be:

    .. code-block:: text

        [[1, 1, 0, 0],
         [1, 1, 1, 0],
         [1, 1, 1, 1]]

    Args:
        size: The size of the bias matrix.

    Returns:
        CausalBias: The LOWER_RIGHT triangular causal bias variant.
    """
    assert len(size) == 2, "causal_lower_right only supports 2D tensors"
    seq_len_q, seq_len_kv = size
    return CausalBias(CausalVariant.LOWER_RIGHT, seq_len_q, seq_len_kv)
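

if __name__ == "__main__":
    # A minimal smoke-test sketch (illustrative, assumes CPU execution): on a
    # square problem, an upper-left CausalBias dispatches to the same fused
    # is_causal=True path, so the two outputs should match exactly.
    q = torch.randn(2, 4, 6, 8)
    k = torch.randn(2, 4, 6, 8)
    v = torch.randn(2, 4, 6, 8)
    out_bias = F.scaled_dot_product_attention(q, k, v, causal_upper_left(6, 6))
    out_causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    torch.testing.assert_close(out_bias, out_causal)
    print("upper-left CausalBias matches is_causal=True")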