L iOddlZddlmZddlmZddlZddlmZmZddl m Z eGdde Z eGd d e Z eGd d e Z eGd de ZeGdde ZeGdde ZeGdde ZeGdde ZeGdde ZeGdde ZeGdde ZeGdde ZeGdd e ZeGd!d"e ZeGd#d$e ZeGd%d&e ZeGd'd(e ZeGd)d*e ZeGd+d,e ZeGd-d.e ZeGd/d0e ZeGd1d2e Z eGd3d4e Z!eGd5d6e Z"eGd7d8e Z#eGd9d:e Z$eGd;de Z&eGd?d@e Z'eGdAdBe Z(eGdCdDe Z)eGdEdFe Z*eGdGdHe Z+eGdIdJe Z,eGdKdLe Z-eGdMdNe Z.eGdOdPe Z/eGdQdRe Z0eGdSdTe Z1eGdUdVe Z2eGdWdXe Z3eGdYdZe Z4eGd[d\e Z5y)]N) dataclass)Optional)CacheEncoderDecoderCache) ModelOutputceZdZUdZdZeejed<dZ ee ejdfed<dZ ee ejdfed<y)BaseModelOutputa Base class for model's outputs, with potential hidden states and attentions. Args: last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`): Sequence of hidden-states at the output of the last layer of the model. hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads. Nlast_hidden_state. hidden_states attentions) __name__ __module__ __qualname____doc__r rtorch FloatTensor__annotations__r tupler c/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/transformers/modeling_outputs.pyr r s]&6:x 1 129=AM8E%"3"3S"89:A:>Ju00#567>rr cleZdZUdZdZeejed<dZ ee ejdfed<y)BaseModelOutputWithNoAttentiona Base class for model's outputs, with potential hidden states. Args: last_hidden_state (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`): Sequence of hidden-states at the output of the last layer of the model. hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, num_channels, height, width)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. Nr .r ) rrrrr rrrrr rrrrrr3s> 6:x 1 129=AM8E%"3"3S"89:ArrceZdZUdZdZeejed<dZ eejed<dZ ee ejdfed<dZ ee ejdfed<y)BaseModelOutputWithPoolinga Base class for model's outputs that also contains a pooling of the last hidden states. Args: last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`): Sequence of hidden-states at the output of the last layer of the model. pooler_output (`torch.FloatTensor` of shape `(batch_size, hidden_size)`): Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining. hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads. Nr pooler_output.r r ) rrrrr rrrrrr rr rrrrrFsr06:x 1 12915M8E--.5=AM8E%"3"3S"89:A:>Ju00#567>rrceZdZUdZdZeejed<dZ eejed<dZ ee ejdfed<y)(BaseModelOutputWithPoolingAndNoAttentiona Base class for model's outputs that also contains a pooling of the last hidden states. Args: last_hidden_state (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`): Sequence of hidden-states at the output of the last layer of the model. pooler_output (`torch.FloatTensor` of shape `(batch_size, hidden_size)`): Last layer hidden-state after a pooling operation on the spatial dimensions. hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, num_channels, height, width)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. Nr r.r ) rrrrr rrrrrr rrrrrrfsS 6:x 1 12915M8E--.5=AM8E%"3"3S"89:ArrceZdZUdZdZeejed<dZ ee ed<dZ ee ejdfed<dZ ee ejdfed<y)BaseModelOutputWithPasta Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding). Args: last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`): Sequence of hidden-states at the output of the last layer of the model. If `past_key_values` is used only the last hidden-state of the sequences of shape `(batch_size, 1, hidden_size)` is output. past_key_values (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`): It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if `config.is_encoder_decoder=True` in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads. Nr past_key_values.r r )rrrrr rrrrr"rr rr rrrr!r!|sk86:x 1 129'+OXe_+=AM8E%"3"3S"89:A:>Ju00#567>rr!ceZdZUdZdZeejed<dZ ee ejdfed<dZ ee ejdfed<dZ ee ejdfed<y)"BaseModelOutputWithCrossAttentionsa Base class for model's outputs, with potential hidden states and attentions. Args: last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`): Sequence of hidden-states at the output of the last layer of the model. hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads. cross_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` and `config.add_cross_attention=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads. Nr .r r cross_attentions) rrrrr rrrrr rr r%rrrr$r$s}26:x 1 129=AM8E%"3"3S"89:A:>Ju00#567>@DhuU%6%6%;<=Drr$c eZdZUdZdZeejed<dZ eejed<dZ ee ejdfed<dZ ee ed<dZee ejdfed<dZee ejdfed <y) ,BaseModelOutputWithPoolingAndCrossAttentionsa! Base class for model's outputs that also contains a pooling of the last hidden states. Args: last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`): Sequence of hidden-states at the output of the last layer of the model. pooler_output (`torch.FloatTensor` of shape `(batch_size, hidden_size)`): Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining. hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads. cross_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` and `config.add_cross_attention=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads. past_key_values (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`): It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if `config.is_encoder_decoder=True` in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. Nr r.r r"r r%)rrrrr rrrrrr rr"rr r%rrrr'r's"H6:x 1 12915M8E--.5=AM8E%"3"3S"89:A'+OXe_+:>Ju00#567>@DhuU%6%6%;<=Drr'ceZdZUdZdZeejed<dZ ee ed<dZ ee ejdfed<dZ ee ejdfed<dZee ejdfed<y) )BaseModelOutputWithPastAndCrossAttentionsa Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding). Args: last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`): Sequence of hidden-states at the output of the last layer of the model. If `past_key_values` is used only the last hidden-state of the sequences of shape `(batch_size, 1, hidden_size)` is output. past_key_values (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`): It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if `config.is_encoder_decoder=True` in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads. cross_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` and `config.add_cross_attention=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads. Nr r".r r r%)rrrrr rrrrr"rr rr r%rrrr)r)s D6:x 1 129'+OXe_+=AM8E%"3"3S"89:A:>Ju00#567>@DhuU%6%6%;<=Drr)cXeZdZUdZdZeejed<dZ eejed<dZ ee ed<dZ ee ejdfed<dZee ejdfed<dZeejed <dZeejed <dZee ejed <y) MoECausalLMOutputWithPasta) Base class for causal language model (or autoregressive) outputs as well as Mixture of Expert's router hidden states terms, to train a MoE model. Args: loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided): Language modeling loss (for next-token prediction). logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`): Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). past_key_values (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`): It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads. z_loss (`torch.FloatTensor`, *optional*, returned when `labels` is provided): z_loss for the sparse modules. aux_loss (`torch.FloatTensor`, *optional*, returned when `labels` is provided): aux_loss for the sparse modules. router_logits (`tuple(torch.FloatTensor)`, *optional*, returned when `output_router_logits=True` is passed or when `config.add_router_probs=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, sequence_length, num_experts)`. Router logits of the encoder model, useful to compute the auxiliary loss and the z_loss for the sparse modules. Nlosslogitsr".r r z_lossaux_loss router_logits)rrrrr,rrrrr-r"rr rr r.r/r0rrrr+r+s"H)-D(5$$ %,*.FHU&& '.'+OXe_+=AM8E%"3"3S"89:A:>Ju00#567>*.FHU&& '.,0Hhu(()08