import copy
import inspect
import warnings
from dataclasses import dataclass
from typing import Any, Optional, Union

import numpy as np
import tensorflow as tf
from tensorflow.compiler.tf2xla.python.xla import dynamic_update_slice

from ..modeling_tf_outputs import TFCausalLMOutputWithPast, TFSeq2SeqLMOutput
from ..models.auto import (
    TF_MODEL_FOR_CAUSAL_LM_MAPPING,
    TF_MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING,
    TF_MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING,
    TF_MODEL_FOR_VISION_2_SEQ_MAPPING,
)
from ..tf_utils import shape_list, stable_softmax
from ..utils import ModelOutput, logging
from .configuration_utils import GenerationConfig
from .tf_logits_process import (
    TFForcedBOSTokenLogitsProcessor,
    TFForcedEOSTokenLogitsProcessor,
    TFForceTokensLogitsProcessor,
    TFLogitsProcessorList,
    TFMinLengthLogitsProcessor,
    TFNoBadWordsLogitsProcessor,
    TFNoRepeatNGramLogitsProcessor,
    TFRepetitionPenaltyLogitsProcessor,
    TFSuppressTokensAtBeginLogitsProcessor,
    TFSuppressTokensLogitsProcessor,
    TFTemperatureLogitsWarper,
    TFTopKLogitsWarper,
    TFTopPLogitsWarper,
)


logger = logging.get_logger(__name__)


@dataclass
class TFGreedySearchDecoderOnlyOutput(ModelOutput):
    """
    Base class for outputs of decoder-only generation models using greedy search.

    Args:
        sequences (`tf.Tensor` of shape `(batch_size, sequence_length)`):
            The generated sequences. The second dimension (sequence_length) is either equal to `max_length` or
            shorter if all batches finished early due to the `eos_token_id`.
        scores (`tuple(tf.Tensor)`, *optional*, returned when `output_scores=True` is passed or when `config.output_scores=True`):
            Processed prediction scores of the language modeling head (scores for each vocabulary token before
            SoftMax) at each generation step, one `tf.Tensor` of shape `(batch_size, config.vocab_size)` per
            generated token.
        attentions (`tuple(tuple(tf.Tensor))`, *optional*, returned when `output_attentions=True` is passed or `config.output_attentions=True`):
            Tuple (one element per generated token) of tuples (one element per decoder layer) of `tf.Tensor` of
            shape `(batch_size, num_heads, generated_length, sequence_length)`.
        hidden_states (`tuple(tuple(tf.Tensor))`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
            Tuple (one element per generated token) of tuples (one element per decoder layer) of `tf.Tensor` of
            shape `(batch_size, generated_length, hidden_size)`.
    """

    sequences: Optional[tf.Tensor] = None
    scores: Optional[tuple[tf.Tensor]] = None
    attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    hidden_states: Optional[tuple[tuple[tf.Tensor]]] = None


@dataclass
class TFGreedySearchEncoderDecoderOutput(ModelOutput):
    """
    Base class for outputs of encoder-decoder generation models using greedy search. Hidden states and attention
    weights of the encoder can be accessed via the `encoder_attentions` and `encoder_hidden_states` attributes, and
    those of the decoder via the `decoder_attentions` and `decoder_hidden_states` attributes.

    Args:
        sequences (`tf.Tensor` of shape `(batch_size, sequence_length)`):
            The generated sequences, possibly shorter than `max_length` if all batches finished early due to the
            `eos_token_id`.
        scores (`tuple(tf.Tensor)`, *optional*, returned when `output_scores=True`):
            Processed prediction scores, one `tf.Tensor` of shape `(batch_size, config.vocab_size)` per generated
            token.
        encoder_attentions (`tuple(tf.Tensor)`, *optional*, returned when `output_attentions=True`):
            One `tf.Tensor` of shape `(batch_size, num_heads, sequence_length, sequence_length)` per encoder layer.
        encoder_hidden_states (`tuple(tf.Tensor)`, *optional*, returned when `output_hidden_states=True`):
            One `tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)` per layer (plus one for the
            embedding output).
        decoder_attentions (`tuple(tuple(tf.Tensor))`, *optional*, returned when `output_attentions=True`):
            Tuple (one element per generated token) of tuples (one element per decoder layer) of `tf.Tensor` of
            shape `(batch_size, num_heads, generated_length, sequence_length)`.
        cross_attentions (`tuple(tuple(tf.Tensor))`, *optional*, returned when `output_attentions=True`):
            Same structure and shape as `decoder_attentions`.
        decoder_hidden_states (`tuple(tuple(tf.Tensor))`, *optional*, returned when `output_hidden_states=True`):
            Tuple (one element per generated token) of tuples (one element per decoder layer) of `tf.Tensor` of
            shape `(batch_size, generated_length, hidden_size)`.
    """

    sequences: Optional[tf.Tensor] = None
    scores: Optional[tuple[tf.Tensor]] = None
    encoder_attentions: Optional[tuple[tf.Tensor]] = None
    encoder_hidden_states: Optional[tuple[tf.Tensor]] = None
    decoder_attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    cross_attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    decoder_hidden_states: Optional[tuple[tuple[tf.Tensor]]] = None


@dataclass
class TFSampleDecoderOnlyOutput(ModelOutput):
    """
    Base class for outputs of decoder-only generation models using sampling. Same fields as
    [`TFGreedySearchDecoderOnlyOutput`], except that the leading dimension of every tensor is
    `batch_size*num_return_sequences`.
    """

    sequences: Optional[tf.Tensor] = None
    scores: Optional[tuple[tf.Tensor]] = None
    attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    hidden_states: Optional[tuple[tuple[tf.Tensor]]] = None


@dataclass
class TFSampleEncoderDecoderOutput(ModelOutput):
    """
    Base class for outputs of encoder-decoder generation models using sampling. Same fields as
    [`TFGreedySearchEncoderDecoderOutput`], except that the leading dimension of every tensor is
    `batch_size*num_return_sequences`.
    """

    sequences: Optional[tf.Tensor] = None
    scores: Optional[tuple[tf.Tensor]] = None
    encoder_attentions: Optional[tuple[tf.Tensor]] = None
    encoder_hidden_states: Optional[tuple[tf.Tensor]] = None
    decoder_attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    cross_attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    decoder_hidden_states: Optional[tuple[tuple[tf.Tensor]]] = None
@dataclass
class TFBeamSearchDecoderOnlyOutput(ModelOutput):
    """
    Base class for outputs of decoder-only generation models using beam search.

    Args:
        sequences (`tf.Tensor` of shape `(batch_size*num_return_sequences, sequence_length)`):
            The generated sequences, possibly shorter than `max_length` if all batches finished early due to the
            `eos_token_id`.
        sequences_scores (`tf.Tensor` of shape `(batch_size*num_return_sequences)`, *optional*, returned when `output_scores=True`):
            Final beam scores of the generated `sequences`.
        scores (`tuple(tf.Tensor)`, *optional*, returned when `output_scores=True`):
            Processed beam scores at each generation step: the log softmax of each vocabulary token plus the sum of
            the log softmax of the previously generated tokens in the beam, one `tf.Tensor` of shape
            `(batch_size*num_beams*num_return_sequences, config.vocab_size)` per generated token.
        beam_indices (`tf.Tensor` of shape `(batch_size*num_return_sequences, sequence_length)`, *optional*, returned when `output_scores=True`):
            Beam indices of the generated token id at each generation step.
        attentions (`tuple(tuple(tf.Tensor))`, *optional*, returned when `output_attentions=True`):
            Tuple (one element per generated token) of tuples (one element per decoder layer) of `tf.Tensor` of
            shape `(batch_size*num_beams, num_heads, generated_length, sequence_length)`.
        hidden_states (`tuple(tuple(tf.Tensor))`, *optional*, returned when `output_hidden_states=True`):
            Tuple (one element per generated token) of tuples (one element per decoder layer) of `tf.Tensor` of
            shape `(batch_size*num_beams*num_return_sequences, generated_length, hidden_size)`.
    """

    sequences: Optional[tf.Tensor] = None
    sequences_scores: Optional[tf.Tensor] = None
    scores: Optional[tuple[tf.Tensor]] = None
    beam_indices: Optional[tf.Tensor] = None
    attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    hidden_states: Optional[tuple[tuple[tf.Tensor]]] = None


@dataclass
class TFBeamSearchEncoderDecoderOutput(ModelOutput):
    """
    Base class for outputs of encoder-decoder generation models using beam search. Adds the encoder attentions and
    hidden states (`encoder_attentions`, `encoder_hidden_states`) as well as `cross_attentions` to the fields of
    [`TFBeamSearchDecoderOnlyOutput`]; the remaining fields have the same meaning as there.
    """

    sequences: Optional[tf.Tensor] = None
    sequences_scores: Optional[tf.Tensor] = None
    scores: Optional[tuple[tf.Tensor]] = None
    beam_indices: Optional[tf.Tensor] = None
    encoder_attentions: Optional[tuple[tf.Tensor]] = None
    encoder_hidden_states: Optional[tuple[tf.Tensor]] = None
    decoder_attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    cross_attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    decoder_hidden_states: Optional[tuple[tuple[tf.Tensor]]] = None
@dataclass
class TFBeamSampleDecoderOnlyOutput(ModelOutput):
    """
    Base class for outputs of decoder-only generation models using beam sampling. Same fields and shapes as
    [`TFBeamSearchDecoderOnlyOutput`].
    """

    sequences: Optional[tf.Tensor] = None
    sequences_scores: Optional[tf.Tensor] = None
    scores: Optional[tuple[tf.Tensor]] = None
    beam_indices: Optional[tf.Tensor] = None
    attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    hidden_states: Optional[tuple[tuple[tf.Tensor]]] = None


@dataclass
class TFBeamSampleEncoderDecoderOutput(ModelOutput):
    """
    Base class for outputs of encoder-decoder generation models using beam sampling. Same fields and shapes as
    [`TFBeamSearchEncoderDecoderOutput`], with `sequences` of shape `(batch_size*num_beams, sequence_length)`.
    """

    sequences: Optional[tf.Tensor] = None
    sequences_scores: Optional[tf.Tensor] = None
    scores: Optional[tuple[tf.Tensor]] = None
    beam_indices: Optional[tf.Tensor] = None
    encoder_attentions: Optional[tuple[tf.Tensor]] = None
    encoder_hidden_states: Optional[tuple[tf.Tensor]] = None
    decoder_attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    cross_attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    decoder_hidden_states: Optional[tuple[tuple[tf.Tensor]]] = None


@dataclass
class TFContrastiveSearchDecoderOnlyOutput(ModelOutput):
    """
    Base class for outputs of decoder-only generation models using contrastive search. Same fields and shapes as
    [`TFGreedySearchDecoderOnlyOutput`].
    """

    sequences: Optional[tf.Tensor] = None
    scores: Optional[tuple[tf.Tensor]] = None
    attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    hidden_states: Optional[tuple[tuple[tf.Tensor]]] = None
@dataclass
class TFContrastiveSearchEncoderDecoderOutput(ModelOutput):
    """
    Base class for outputs of encoder-decoder generation models using contrastive search. Same fields and shapes as
    [`TFGreedySearchEncoderDecoderOutput`].
    """

    sequences: Optional[tf.Tensor] = None
    scores: Optional[tuple[tf.Tensor]] = None
    encoder_attentions: Optional[tuple[tf.Tensor]] = None
    encoder_hidden_states: Optional[tuple[tf.Tensor]] = None
    decoder_attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    cross_attentions: Optional[tuple[tuple[tf.Tensor]]] = None
    decoder_hidden_states: Optional[tuple[tuple[tf.Tensor]]] = None


TFGreedySearchOutput = Union[TFGreedySearchEncoderDecoderOutput, TFGreedySearchDecoderOnlyOutput]
TFSampleOutput = Union[TFSampleEncoderDecoderOutput, TFSampleDecoderOnlyOutput]
TFBeamSearchOutput = Union[TFBeamSearchEncoderDecoderOutput, TFBeamSearchDecoderOnlyOutput]
TFBeamSampleOutput = Union[TFBeamSampleEncoderDecoderOutput, TFBeamSampleDecoderOnlyOutput]
TFContrastiveSearchOutput = Union[TFContrastiveSearchEncoderDecoderOutput, TFContrastiveSearchDecoderOnlyOutput]
TFGenerateOutput = Union[
    TFGreedySearchOutput, TFSampleOutput, TFBeamSearchOutput, TFBeamSampleOutput, TFContrastiveSearchOutput
]
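
# Illustrative only: a minimal sketch (not part of the library) of how the union types above are typically
# consumed. It assumes a TF causal LM checkpoint such as "openai-community/gpt2" and that `generate` is called
# with `return_dict_in_generate=True`; otherwise a plain `tf.Tensor` of token ids is returned instead of one of
# the dataclasses defined above.
#
#     from transformers import AutoTokenizer, TFAutoModelForCausalLM
#
#     tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
#     model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
#     inputs = tokenizer(["Today is"], return_tensors="tf")
#
#     output = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True)
#     if isinstance(output, TFGreedySearchDecoderOnlyOutput):
#         print(output.sequences.shape)  # (batch_size, sequence_length)
#         print(len(output.scores))      # one score tensor per generated token
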
Nctjdt|j-tj j j|_|jS)NzG`seed_generator` is deprecated and will be removed in a future version.)warningswarn UserWarning_seed_generatorr-random Generatorfrom_non_deterministic_state)selfs r3seed_generatorz TFGenerationMixin.seed_generatorsE _alm    '#%99#6#6#S#S#UD ###r2Tctd)NzbA model class needs to define a `prepare_inputs_for_generation` method in order to use `generate`.)NotImplementedError)r]argskwargss r3prepare_inputs_for_generationz/TFGenerationMixin.prepare_inputs_for_generations! p  r2r$r%rEnormalize_logitsreturnc ~|Ytjtjtj|djdddt |g}tj tjtj|t |dfd}tj|d|jj|jdf}|r!tjj|d}|dk}tjjtjjdtj |tj"z d}|dd| df}|dd| df}tj$|d|}|jd|z }|dd|df}tj&tj|jd|j} tj||| gd} tj(|| } tj$|d| } | S)a Computes the transition scores of sequences given the generation scores (and beam indices, if beam search was used). This is a convenient method to quickly obtain the scores of the selected tokens at generation time. Parameters: sequences (`tf.Tensor`): The generated sequences. The second dimension (sequence_length) is either equal to `max_length` or shorter if all batches finished early due to the `eos_token_id`. scores (`tuple(tf.Tensor)`): Transition scores for each vocabulary token at each generation step. Beam transition scores consisting of log probabilities of tokens conditioned on log softmax of previously generated tokens Tuple of `tf.Tensor` with up to `max_new_tokens` elements (one element for each generated token), with each tensor of shape `(batch_size*num_beams, config.vocab_size)`. beam_indices (`tf.Tensor`, *optional*): Beam indices of generated token id at each generation step. `tf.Tensor` of shape `(batch_size*num_return_sequences, sequence_length)`. Only required if a `num_beams>1` at generate-time. normalize_logits (`bool`, *optional*, defaults to `False`): Whether to normalize the logits (which, for legacy reasons, may be unnormalized). Return: `tf.Tensor`: A `tf.Tensor` of shape `(batch_size*num_return_sequences, sequence_length)` containing the transition scores (logits) Examples: ```python >>> from transformers import GPT2Tokenizer, TFAutoModelForCausalLM >>> import numpy as np >>> tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2") >>> model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2") >>> tokenizer.pad_token_id = tokenizer.eos_token_id >>> inputs = tokenizer(["Today is"], return_tensors="tf") >>> # Example 1: Print the scores for each token generated with Greedy Search >>> outputs = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True) >>> transition_scores = model.compute_transition_scores( ... outputs.sequences, outputs.scores, normalize_logits=True ... ) >>> # input_length is the length of the input prompt for decoder-only models, like the GPT family, and 1 for >>> # encoder-decoder models, like BART or T5. >>> input_length = 1 if model.config.is_encoder_decoder else inputs.input_ids.shape[1] >>> generated_tokens = outputs.sequences[:, input_length:] >>> for tok, score in zip(generated_tokens[0], transition_scores[0]): ... # | token | token string | logits | probability ... print(f"| {tok:5d} | {tokenizer.decode(tok):8s} | {score.numpy():.3f} | {np.exp(score.numpy()):.2%}") | 262 | the | -1.414 | 24.33% | 1110 | day | -2.609 | 7.36% | 618 | when | -2.010 | 13.40% | 356 | we | -1.859 | 15.58% | 460 | can | -2.508 | 8.14% >>> # Example 2: Reconstruct the sequence scores from Beam Search >>> outputs = model.generate( ... **inputs, ... max_new_tokens=5, ... num_beams=4, ... num_return_sequences=4, ... 
return_dict_in_generate=True, ... output_scores=True, ... ) >>> transition_scores = model.compute_transition_scores( ... outputs.sequences, outputs.scores, outputs.beam_indices, normalize_logits=False ... ) >>> # If you sum the generated tokens' scores and apply the length penalty, you'll get the sequence scores. >>> # Tip: recomputing the scores is only guaranteed to match with `normalize_logits=False`. Depending on the >>> # use case, you might want to recompute it with `normalize_logits=True`. >>> output_length = np.sum(transition_scores.numpy() < 0, axis=1) >>> length_penalty = model.generation_config.length_penalty >>> reconstructed_scores = np.sum(transition_scores, axis=1) / (output_length**length_penalty) >>> print(np.allclose(outputs.sequences_scores, reconstructed_scores)) True ```Nrraxis)rrdtype)r-tile expand_dimsrangeshapelen transposereshapestackconfig vocab_sizenn log_softmaxmath reduce_max reduce_sumcastint32where broadcast_to gather_nd) r]r$r%rErdbeam_indices_maskmax_beam_lengthcut_idx token_indices gen_step_idxindicestransition_scoress r3compute_transition_scoresz+TFGenerationMixin.compute_transition_scoressf  772>>"((6!9??1;M2NUV#WZ[]`ag]hYijLbjj&)9CK;LMvVFR)?)?bAQ$RS UU&&vA&6F)1,'',, GG  BGG,=RXX$N NVX  Y $A'7'8$89 -a/1A1B.BCxx 11lC //"%7!!WX+. rxx R0@'A=CVCVW ((L-FRPLL9HH%6;LM  r2cX|jsttttg}t }|D]F}|j t|jd}|,|j|jHd|jjd}|r|d|z }t|y)z Confirms that the model class is compatible with generation. If not, raises an exception that points to the right class to use. N)defaultzThe current model class (zQ) is not compatible with `.generate()`, as it doesn't have a language model head.z2 Please use one of the following classes instead: ) can_generater rr r setgettypertaddr) __class__ TypeError)r]generate_compatible_mappingsgenerate_compatible_classes model_mappingsupported_modelsexception_messages r3_validate_model_classz'TFGenerationMixin._validate_model_classUs   ".195 , ( +.% '!= O #0#4#4T$++5FPT#4#U #//334D4M4MN O ,DNN,C,C+DE99 +!'YZuYv%ww!-. .%#r2 model_kwargsc|jjrdD]}|j|dg}tt j |j j}d|vsd|vr5|tt j |jjz}|jD]\}}| ||vs|j| |rtd|dy)zXValidates model kwargs for generation. Generate argument typos will also be caught here.)decoder_input_idsNrbrz8The following `model_kwargs` are not used by the model: zG (note: typos in the generate arguments will also show up in this list)) rtis_encoder_decoderpoprinspect signaturerc parameterscallitemsappend ValueError)r]rkeyunused_model_args model_argsvalues r3_validate_model_kwargsz(TFGenerationMixin._validate_model_kwargsns ;; ) ), ,  d+ ,**4+M+MNYYZ  z !^z%A #g// :EEF FJ&,,. .JC S %:!((- . JK\J]^FF  r2inputsgeneration_configlogits_processorc |j||jjrv|jjt |jk(rJt j |j}||jk7rtjd||_|j}tj|}|jd0i|}|j|j|t|tj r|j"j$rnmt|t&j(r/t'j*|j"t&j,rn$tj.|tj0}|j3d*tj.|dtj0|d<d|vrt|dtj r|dj"j$rnyt|dt&j(r2t'j*|dj"t&j,rn*tj.|dtj0|d<||n t5}|j6Z|j8N|j3dt:j=d|j8}t|t>r|d}||_tj@ } | r|jBs tEd|jG||jH|\} } }tK| d} |jL|d<|jN|d <|jP|d <dtStUjV|jXjZj]v} d |v}|j3dd.|r,| r*|j_| |j6|j8|d<|jj`sT|j6Htjbje| ddd f|j6k(rt:j=d |jj`rd |vr|jg| || }|jj`r.|ji| | ||jj|jH\}}n| dk(r| n|jmd}tK|d }|j3dduxr|jndu}|rD|jp8|jndk(r)tjd|jndtrn^|jpR|s<|jn0t:j=d|jpd|jnd|jp|z|_7t|tj s|jt?|jt|jnkDr&tEd|jtd|jnd||jnk\rC|jj`rdnd}t:j=d|d|d|jnd|jvduxr@|jvdkDxr/|jxduxr|jzduxr|jzdkD}| xr|j|dk(xr|jxdu}| xr|j|dkDxr|jxdu}|j|dk(xr|jxd u}|j|dkDxr|jxd u}|j|||!}|rt|jdkDrtEd"|jd#|j|f|jn|j6|j8||j|jd$|S|r|jdkDrtEd"|jd%|j|f|jv|jz||jn|j6|j8|j|jd&|S|r|j|'}|jd0||j|jj`d(|\}}|j|f|||jn|j6|j8||j|jd)|S|r|j||jkr&tEd*|j|d+|jd,|jd0||j||jj`d d-|\}}|j|f|jn|j6|j8|j|j||j|j|jd. 
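
    # Illustrative only: `_validate_model_kwargs` is what surfaces typos in `generate` arguments. Any keyword that
    # neither matches a `GenerationConfig` attribute nor a model input raises a `ValueError` before decoding
    # starts. Assuming the GPT-2 checkpoint and `inputs` from the sketch near the top of this module:
    #
    #     model.generate(**inputs, max_new_tokens=5, temperatur=0.7)  # note the typo in "temperature"
    #     # ValueError: The following `model_kwargs` are not used by the model: ['temperatur'] (note: typos in the
    #     # generate arguments will also show up in this list)
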
    def generate(
        self,
        inputs: Optional[tf.Tensor] = None,
        generation_config: Optional[GenerationConfig] = None,
        logits_processor: Optional[TFLogitsProcessorList] = None,
        seed=None,
        **kwargs,
    ) -> Union[TFGenerateOutput, tf.Tensor]:
        """
        Generates sequences of token ids for models with a language modeling head.

        Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to
        the model's default generation configuration. You can override any `generation_config` by passing the
        corresponding parameters to generate, e.g. `.generate(inputs, num_beams=4, do_sample=True)`.

        For an overview of generation strategies and code examples, check out the [following
        guide](../generation_strategies).

        Parameters:
            inputs (`tf.Tensor` of varying shape depending on the modality, *optional*):
                The sequence used as a prompt for the generation or as model inputs to the encoder. If `None`, the
                method initializes it with `bos_token_id` and a batch size of 1. For decoder-only models `inputs`
                should be in the format of `input_ids`. For encoder-decoder models *inputs* can represent any of
                `input_ids`, `input_values`, `input_features`, or `pixel_values`.
            generation_config (`~generation.GenerationConfig`, *optional*):
                The generation configuration to be used as base parametrization for the generation call. `**kwargs`
                passed to generate matching the attributes of `generation_config` will override them. If
                `generation_config` is not provided, the default will be used, which has the following loading
                priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model
                configuration. Please note that unspecified parameters will inherit
                [`~generation.GenerationConfig`]'s default values, whose documentation should be checked to
                parameterize generation.
            logits_processor (`LogitsProcessorList`, *optional*):
                Custom logits processors that complement the default logits processors built from arguments and
                generation config. If a logit processor is passed that is already created with the arguments or a
                generation config, an error is thrown. This feature is intended for advanced users.
            seed (`list[int]`, *optional*):
                Random seed to control sampling, containing two integers, used when `do_sample` is `True`. See the
                `seed` argument from stateless functions in `tf.random`.
            kwargs (`dict[str, Any]`, *optional*):
                Ad hoc parametrization of `generation_config` and/or additional model-specific kwargs that will be
                forwarded to the `forward` function of the model. If the model is an encoder-decoder model, encoder
                specific kwargs should not be prefixed and decoder specific kwargs should be prefixed with
                *decoder_*.

        Return:
            [`~utils.ModelOutput`] or `tf.Tensor`: A [`~utils.ModelOutput`] (if `return_dict_in_generate=True` or
            when `config.return_dict_in_generate=True`) or a `tf.Tensor`.

            If the model is *not* an encoder-decoder model (`model.config.is_encoder_decoder=False`), the possible
            [`~utils.ModelOutput`] types are:

                - [`~generation.TFGreedySearchDecoderOnlyOutput`],
                - [`~generation.TFSampleDecoderOnlyOutput`],
                - [`~generation.TFBeamSearchDecoderOnlyOutput`],
                - [`~generation.TFBeamSampleDecoderOnlyOutput`]

            If the model is an encoder-decoder model (`model.config.is_encoder_decoder=True`), the possible
            [`~utils.ModelOutput`] types are:

                - [`~generation.TFGreedySearchEncoderDecoderOutput`],
                - [`~generation.TFSampleEncoderDecoderOutput`],
                - [`~generation.TFBeamSearchEncoderDecoderOutput`],
                - [`~generation.TFBeamSampleEncoderDecoderOutput`]
        """
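
    # Illustrative only: ad hoc overrides of the stored `generation_config`, as described in the `generate`
    # docstring above. Matching keyword arguments take precedence over the configuration for this one call
    # (the checkpoint name below is an assumption, not something this module depends on):
    #
    #     from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM
    #
    #     tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
    #     model = TFAutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-small")
    #     input_ids = tokenizer("translate English to German: How old are you?", return_tensors="tf").input_ids
    #
    #     greedy_ids = model.generate(input_ids, max_new_tokens=20)                             # greedy decoding
    #     beam_ids = model.generate(input_ids, max_new_tokens=20, num_beams=4, do_sample=True)  # beam sampling
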
    def _prepare_attention_mask_for_generation(
        self,
        inputs: tf.Tensor,
        pad_token_id: Optional[int],
        eos_token_id: Optional[int],
    ) -> tf.Tensor:
        is_input_ids = len(inputs.shape) == 2 and inputs.dtype in (tf.int32, tf.int64)
        is_pad_token_in_inputs = (pad_token_id is not None) and tf.math.reduce_any(inputs == pad_token_id)
        is_pad_token_not_equal_to_eos_token_id = (eos_token_id is None) or (pad_token_id != eos_token_id)

        # Only mask on the pad token when the inputs are token ids and the pad token is unambiguous
        if is_input_ids and is_pad_token_in_inputs and is_pad_token_not_equal_to_eos_token_id:
            return tf.cast(tf.math.not_equal(inputs, pad_token_id), dtype=tf.int32)
        else:
            return tf.ones(inputs.shape[:2], dtype=tf.int32)

    def _prepare_encoder_decoder_kwargs_for_generation(
        self, inputs_tensor: tf.Tensor, model_kwargs, model_input_name: Optional[str] = None
    ) -> dict[str, Any]:
        """Runs the encoder on `inputs_tensor` and stores the resulting `encoder_outputs` in `model_kwargs`."""

    def _prepare_decoder_input_ids_for_generation(
        self,
        batch_size: int,
        model_input_name: str,
        model_kwargs: dict[str, tf.Tensor],
        decoder_start_token_id: Optional[int] = None,
        bos_token_id: Optional[int] = None,
    ) -> tuple[tf.Tensor, dict[str, tf.Tensor]]:
        """Prepares `decoder_input_ids` for generation with encoder-decoder models."""

    def _get_decoder_start_token_id(
        self, decoder_start_token_id: Optional[int] = None, bos_token_id: Optional[int] = None
    ) -> int:
        # retrieve decoder_start_token_id for encoder-decoder models, falling back to bos_token_id
        decoder_start_token_id = (
            decoder_start_token_id
            if decoder_start_token_id is not None
            else self.generation_config.decoder_start_token_id
        )
        bos_token_id = bos_token_id if bos_token_id is not None else self.generation_config.bos_token_id

        if decoder_start_token_id is not None:
            return decoder_start_token_id
        elif bos_token_id is not None:
            return bos_token_id
        raise ValueError(
            "`decoder_start_token_id` or `bos_token_id` has to be defined for encoder-decoder generation."
        )

    @staticmethod
    def _expand_inputs_for_generation(
        expand_size: int = 1,
        is_encoder_decoder: bool = False,
        input_ids: Optional[tf.Tensor] = None,
        expand_in_new_axis: bool = False,
        **model_kwargs,
    ) -> tuple[tf.Tensor, dict[str, Any]]:
        """
        Expands tensors from [batch_size, ...] to [batch_size * expand_size, ...] or [batch_size, expand_size, ...],
        depending on `expand_in_new_axis`.

        Beam-based approaches expect this function to be used with `expand_in_new_axis=True`. If
        `is_encoder_decoder` is True, `encoder_outputs` must be defined in `model_kwargs`.
        """

    def _prepare_model_inputs(
        self,
        inputs: Optional[tf.Tensor] = None,
        bos_token_id: Optional[int] = None,
        model_kwargs: Optional[dict[str, tf.Tensor]] = None,
    ) -> tuple[tf.Tensor, Optional[str], dict[str, tf.Tensor]]:
        """
        This function extracts the model-specific `inputs` for generation.
        """

    def _maybe_initialize_input_ids_for_generation(
        self,
        inputs: Optional[tf.Tensor] = None,
        bos_token_id: Optional[int] = None,
        model_kwargs: Optional[dict[str, tf.Tensor]] = None,
    ) -> tf.Tensor:
        """Initializes input ids for generation, if necessary."""

    def _update_model_kwargs_for_xla_generation(
        self,
        model_outputs: ModelOutput,
        model_kwargs: dict[str, Any],
        cur_len: int,
        max_length: int,
        batch_size: int,
        is_encoder_decoder: bool = False,
        batch_axis: int = 0,
    ):
        """
        Updates `model_kwargs` (attention mask and `past_key_values`) with fixed shapes so that the generation loop
        can be compiled with XLA. Encoder-decoder models use `decoder_attention_mask`; raises if no
        `past_key_values` variable is found in the model outputs.
        """

    def _get_logits_warper(self, generation_config: GenerationConfig) -> TFLogitsProcessorList:
        """
        This class returns a [`TFLogitsProcessorList`] list object that contains all relevant [`TFLogitsWarper`]
        instances used for multinomial sampling.
        """
        warpers = TFLogitsProcessorList()

        # In beam methods, we need to keep at least one non-eos token to explore continuations that might have a
        # better score than the ones with eos
        if generation_config.num_beams > 1:
            if isinstance(generation_config.eos_token_id, list):
                min_tokens_to_keep = len(generation_config.eos_token_id) + 1
            else:
                min_tokens_to_keep = 2
        else:
            min_tokens_to_keep = 1

        if generation_config.temperature is not None and generation_config.temperature != 1.0:
            warpers.append(TFTemperatureLogitsWarper(generation_config.temperature))
        if generation_config.top_k is not None and generation_config.top_k != 0:
            warpers.append(TFTopKLogitsWarper(top_k=generation_config.top_k, min_tokens_to_keep=min_tokens_to_keep))
        if generation_config.top_p is not None and generation_config.top_p < 1.0:
            warpers.append(TFTopPLogitsWarper(top_p=generation_config.top_p, min_tokens_to_keep=min_tokens_to_keep))
        return warpers

    def _get_logits_processor(
        self,
        generation_config: GenerationConfig,
        input_ids_seq_length: int,
        logits_processor: Optional[TFLogitsProcessorList],
    ) -> TFLogitsProcessorList:
        """
        This class returns a [`TFLogitsProcessorList`] list object that contains all relevant [`TFLogitsProcessor`]
        instances used to modify the scores of the language model head.
        """
        processors = TFLogitsProcessorList()

        if generation_config.repetition_penalty is not None and generation_config.repetition_penalty != 1.0:
            processors.append(TFRepetitionPenaltyLogitsProcessor(penalty=generation_config.repetition_penalty))
        if generation_config.no_repeat_ngram_size is not None and generation_config.no_repeat_ngram_size > 0:
            processors.append(TFNoRepeatNGramLogitsProcessor(generation_config.no_repeat_ngram_size))
        if generation_config.bad_words_ids is not None:
            processors.append(
                TFNoBadWordsLogitsProcessor(generation_config.bad_words_ids, generation_config.eos_token_id)
            )
        if (
            generation_config.min_length is not None
            and generation_config.eos_token_id is not None
            and generation_config.min_length > 0
        ):
            processors.append(
                TFMinLengthLogitsProcessor(generation_config.min_length, generation_config.eos_token_id)
            )
        if generation_config.forced_bos_token_id is not None:
            processors.append(TFForcedBOSTokenLogitsProcessor(generation_config.forced_bos_token_id))
        if generation_config.forced_eos_token_id is not None:
            processors.append(
                TFForcedEOSTokenLogitsProcessor(generation_config.max_length, generation_config.forced_eos_token_id)
            )
        if generation_config.suppress_tokens is not None:
            processors.append(TFSuppressTokensLogitsProcessor(generation_config.suppress_tokens))
        if generation_config.begin_suppress_tokens is not None:
            begin_index = input_ids_seq_length
            begin_index = (
                begin_index
                if (input_ids_seq_length > 1 or generation_config.forced_bos_token_id is None)
                else begin_index + 1
            )
            if getattr(generation_config, "forced_decoder_ids", None) is not None:
                begin_index += generation_config.forced_decoder_ids[-1][0]
            processors.append(
                TFSuppressTokensAtBeginLogitsProcessor(generation_config.begin_suppress_tokens, begin_index)
            )
        if getattr(generation_config, "forced_decoder_ids", None) is not None:
            processors.append(TFForceTokensLogitsProcessor(generation_config.forced_decoder_ids))

        processors = self._merge_criteria_processor_list(processors, logits_processor)
        return processors

    def _merge_criteria_processor_list(
        self,
        default_list: TFLogitsProcessorList,
        custom_list: TFLogitsProcessorList,
    ) -> TFLogitsProcessorList:
        if len(custom_list) == 0:
            return default_list
        for default in default_list:
            for custom in custom_list:
                if type(custom) is type(default):
                    object_type = "logits processor"
                    raise ValueError(
                        f"A custom {object_type} of type {type(custom)} with values {custom} has been passed to"
                        f" `generate`, but it has already been created with the values {default}. {default} has been"
                        " created by passing the corresponding arguments to generate or by the model's config"
                        f" default values. If you just want to change the default values of {object_type} consider"
                        f" passing them as arguments to `generate` instead of using a custom {object_type}."
                    )
        default_list.extend(custom_list)
        return default_list
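
    # Illustrative only: composing extra processors that `_get_logits_processor` merges with the defaults built
    # from `generation_config` (a duplicate of a default processor raises, see `_merge_criteria_processor_list`).
    # Assumes a decoder-only checkpoint whose eos/pad token ids are already configured:
    #
    #     custom_processors = TFLogitsProcessorList(
    #         [TFMinLengthLogitsProcessor(15, eos_token_id=model.generation_config.eos_token_id)]
    #     )
    #     output_ids = model.generate(**inputs, max_new_tokens=20, logits_processor=custom_processors)
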
Return: [`~generation.TFGreedySearchDecoderOnlyOutput`], [`~generation.TFGreedySearchEncoderDecoderOutput`] or `tf.Tensor`: A `tf.Tensor` containing the generated tokens (default behaviour) or a [`~generation.TFGreedySearchDecoderOnlyOutput`] if `model.config.is_encoder_decoder=False` and `return_dict_in_generate=True` or a [`~generation.TFGreedySearchEncoderDecoderOutput`] if `model.config.is_encoder_decoder=True`. Examples: ```python >>> from transformers import ( ... AutoTokenizer, ... TFAutoModelForCausalLM, ... TFLogitsProcessorList, ... TFMinLengthLogitsProcessor, ... ) >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2") >>> model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2") >>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token >>> model.generation_config.pad_token_id = model.generation_config.eos_token_id >>> input_prompt = "Today is a beautiful day, and" >>> input_ids = tokenizer(input_prompt, return_tensors="tf").input_ids >>> # instantiate logits processors >>> logits_processor = TFLogitsProcessorList( ... [ ... TFMinLengthLogitsProcessor(15, eos_token_id=model.generation_config.eos_token_id), ... ] ... ) >>> outputs = model.greedy_search(input_ids, logits_processor=logits_processor) >>> tokenizer.batch_decode(outputs, skip_special_tokens=True) ["Today is a beautiful day, and I'm so happy to be here. I'm so happy to"] ```NrEncoderDecoderc3&K|]}|v ywrr1r model_prefix model_names r3rz2TFGenerationMixin.greedy_search..`#h#hTFGPT2TFCTRLrruse_memsrjrirgc.tj|Szstate termination condition fn.r-r generatedfinished_sequencesr6rs r3greedy_search_cond_fnz>TFGenerationMixin.greedy_search..greedy_search_cond_fntsMM"455 5r2c |jdr |ddd|f}n tj|dd|dz fd}j|fdi|}di|dd}|jdddf}|||}srrj |r2j jrj |jndrbj jsLj |jj jrj |jr2j jrj |jn3r1j jrj |jtj|dtj}  td dtj |tjz } | | zd| z zz} tj"j%tj&tj(| t+ ftjdd } || z}tj,tj. tj(| ggd } tj0|| | }|dz }r-j3||| j j }nLj5||j j}|jdd|j7dd||||fS)state update fn.r/NrrirTrrr)rh output_typeGIf `eos_token_id` is defined, make sure that `pad_token_id` is defined.rrgrrrHr5rr6rrrr7rr1)rr-rmrclogitsrrtrr9r&r:r;r'argmaxr|rr{rxrequalr~rprsrntensor_scatter_nd_updaterWr4r)rrr6rr model_inputsr5next_token_logitsnext_tokens_scores next_tokensunfinished_seqnext_token_is_eosupdate_indicesrcache_batch_axisr:r9r;rrrneeds_full_inputrrrrrr%r]rrs r3greedy_search_body_fnz>TFGenerationMixin.greedy_search..greedy_search_body_fnys$ 12:>N%a'k2 NN9Q! ^+DbI =4==imS\m`lmL  "3%9 M !. 4 4QU ; "2)=NPW!X 6 MM"45$)G)G&--m.N.NO&t{{/M/M&--m.F.FG{{55(// 0N0NO'DKK,J,J)001T1TU)dkk.L.L)001L1LM))$6RRXXVK''$%noo!"RWW-?%J!J)N:\QQ_M_=`` $&GG$6$6HH c,6G5TUWYWeWefrtvWw %7%! &8:K%K" XXrxx ';R__WWaVb=c&dkmnN339nfqrI qLG#KK"/!-#))'+{{'E'E/ L   $GG!L)6)B H^H^HlHl !2!> DDZDZDlDl %9$D $J`J`JuJu  '2 $''?? !$$[$2H2H2R2RS **,,+;c$i*GS&SQUY ##hSg#h h1no%W->->t?a?a-b-m-m-r-r-t)uu0M$;@QRX\"9>O2VZ'>CW^b)3 GGGZg1E$FbhhW[g[lklmIIy*;<2F XXzm277C 6 L HL HL H`@U )7L@ < %w ('1#%== ! ! *G\ B1 $  1gq!!XgX+.I "{{--[l\2C%D%H%H%Vqu"L`L!2377Hfj&+1*<v$BTB`U+=%>fj">N>Z5)9#:`d H]Hi.C(Dos%9'!'9*?'9%5*?7'!1"7  r2rrc    n tn tnjjnjjnjjt t rg  njj njj  njj  njj | jdjjtj dtvrtj n tt#fddDrdnddt%t'j(j*j,j/v r rgnd rrgnd rrgnd r rgndt1|\} tj2| z ftj4 xsdz}tj6||gd }tj8ftj: }d }   fd }|||| | \}}} } | z }tj<||||| | f|\}}} } s |ddd| f} rj>j@r~r| djCdnd} r| djCdnd} tEnd tEnd tEnd tEndtG|||StI|S|S)ar Generates sequences for models with a language modeling head using multinomial sampling. 
Parameters: input_ids (`tf.Tensor` of shape `(batch_size, sequence_length)`): The sequence used as a prompt for the generation. logits_processor (`TFLogitsProcessorList`, *optional*): An instance of [`TFLogitsProcessorList`]. List of instances of class derived from [`TFLogitsProcessor`] used to modify the prediction scores of the language modeling head applied at each generation step. logits_warper (`TFLogitsProcessorList`, *optional*): An instance of [`TFLogitsProcessorList`]. List of instances of class derived from [`TFLogitsWarper`] used to warp the prediction score distribution of the language modeling head applied before multinomial sampling at each generation step. max_length (`int`, *optional*, defaults to 20): The maximum length of the sequence to be generated. pad_token_id (`int`, *optional*): The id of the *padding* token. eos_token_id (`Union[int, list[int]]`, *optional*): The id of the *end-of-sequence* token. Optionally, use a list to set multiple *end-of-sequence* tokens. seed (`list[int]`, *optional*): Random seed to control sampling, containing two integers, used when `do_sample` is `True`. See the `seed` argument from stateless functions in `tf.random`. output_attentions (`bool`, *optional*, defaults to `False`): Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more details. output_hidden_states (`bool`, *optional*, defaults to `False`): Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more details. output_scores (`bool`, *optional*, defaults to `False`): Whether or not to return the prediction scores. See `scores` under returned tensors for more details. return_dict_in_generate (`bool`, *optional*, defaults to `False`): Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple. model_kwargs: Additional model specific kwargs will be forwarded to the `call` function of the model. If model is an encoder-decoder model the kwargs should include `encoder_outputs`. Return: [`~generation.TFSampleDecoderOnlyOutput`], [`~generation.TFSampleEncoderDecoderOutput`] or `tf.Tensor`: A `tf.Tensor` containing the generated tokens (default behaviour) or a [`~generation.TFSampleDecoderOnlyOutput`] if `model.config.is_encoder_decoder=False` and `return_dict_in_generate=True` or a [`~generation.TFSampleEncoderDecoderOutput`] if `model.config.is_encoder_decoder=True`. Examples: ```python >>> import tensorflow as tf >>> from transformers import ( ... AutoTokenizer, ... TFAutoModelForCausalLM, ... TFLogitsProcessorList, ... TFMinLengthLogitsProcessor, ... TFTopKLogitsWarper, ... TFTemperatureLogitsWarper, ... ) >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2") >>> model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2") >>> # set pad_token_id to eos_token_id because GPT2 does not have a EOS token >>> model.generation_config.pad_token_id = model.generation_config.eos_token_id >>> input_prompt = "Today is a beautiful day, and" >>> input_ids = tokenizer(input_prompt, return_tensors="tf").input_ids >>> # instantiate logits processors >>> logits_processor = TFLogitsProcessorList( ... [ ... TFMinLengthLogitsProcessor(15, eos_token_id=model.generation_config.eos_token_id), ... ] ... ) >>> # instantiate logits processors >>> logits_warper = TFLogitsProcessorList( ... [ ... TFTopKLogitsWarper(50), ... TFTemperatureLogitsWarper(0.7), ... ] ... 
Examples:

```python
>>> import tensorflow as tf
>>> from transformers import (
...     AutoTokenizer,
...     TFAutoModelForCausalLM,
...     TFLogitsProcessorList,
...     TFMinLengthLogitsProcessor,
...     TFTopKLogitsWarper,
...     TFTemperatureLogitsWarper,
... )

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")

>>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
>>> model.generation_config.pad_token_id = model.generation_config.eos_token_id

>>> input_prompt = "Today is a beautiful day, and"
>>> input_ids = tokenizer(input_prompt, return_tensors="tf").input_ids

>>> # instantiate logits processors
>>> logits_processor = TFLogitsProcessorList(
...     [
...         TFMinLengthLogitsProcessor(15, eos_token_id=model.generation_config.eos_token_id),
...     ]
... )
>>> # instantiate logits warpers
>>> logits_warper = TFLogitsProcessorList(
...     [
...         TFTopKLogitsWarper(50),
...         TFTemperatureLogitsWarper(0.7),
...     ]
... )

>>> tf.random.set_seed(0)
>>> outputs = model.sample(input_ids, logits_processor=logits_processor, logits_warper=logits_warper)

>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Today is a beautiful day, and I love my country. But when I look at Donald Trump,']
```

[Compiled bodies of `sample` and of the `_gather_beams` helper -- not recoverable from this extraction. The
surviving fragments show the same `tf.while_loop` structure as `greedy_search` (`sample_cond_fn` /
`sample_body_fn`), and `_gather_beams`, whose docstring reads "Gathers the beam slices indexed by beam_indices
into new beam array." and which appears to reduce to a batched `tf.gather`.]
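A minimal sketch of what that gathering means in practice, assuming it boils down to `tf.gather` with
`batch_dims=1`; the shapes and index values are invented for illustration.

```python
# Batched beam gather sketch; shapes and beam indices are made up for illustration.
import tensorflow as tf

# (batch_size=2, num_beams=3, seq_len=4): three candidate sequences per batch item.
sequences = tf.reshape(tf.range(2 * 3 * 4), (2, 3, 4))

# For each batch item, which beams survive this step (e.g. beams 2 and 0 for item 0).
beam_indices = tf.constant([[2, 0], [1, 1]])

# batch_dims=1 performs the gather independently per batch item -> shape (2, 2, 4).
reordered = tf.gather(sequences, beam_indices, axis=1, batch_dims=1)
print(reordered.numpy())
```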
Generates sequences for models with a language modeling head using beam search. If `do_sample` is `False`,
uses a greedy approach, otherwise does multinomial sampling without replacement.

Parameters:
    input_ids (`tf.Tensor` of shape `(batch_size, num_beams, sequence_length)`):
        The sequence used as a prompt for the generation.
    do_sample (`bool`, *optional*, defaults to `False`):
        Whether or not to use sampling; use greedy decoding otherwise.
    max_length (`int`, *optional*, defaults to 20):
        The maximum length of the sequence to be generated.
    pad_token_id (`int`, *optional*):
        The id of the *padding* token.
    eos_token_id (`Union[int, list[int]]`, *optional*):
        The id of the *end-of-sequence* token. Optionally, use a list to set multiple *end-of-sequence* tokens.
    length_penalty (`float`, *optional*, defaults to 1.0):
        Exponential penalty to the length that is used with beam-based generation. It is applied as an exponent
        to the sequence length, which in turn is used to divide the score of the sequence. Since the score is
        the log likelihood of the sequence (i.e. negative), `length_penalty` > 0.0 promotes longer sequences,
        while `length_penalty` < 0.0 encourages shorter sequences (a worked sketch follows the example below).
    early_stopping (`bool` or `str`, *optional*, defaults to `False`):
        Controls the stopping condition for beam-based methods, like beam-search. It accepts the following
        values: `True`, where the generation stops as soon as there are `num_beams` complete candidates;
        `False`, where a heuristic is applied and the generation stops when it is very unlikely to find better
        candidates; `"never"`, where the beam search procedure only stops when there cannot be better
        candidates (canonical beam search algorithm).
    logits_processor (`[TFLogitsProcessorList]`, *optional*):
        An instance of [`TFLogitsProcessorList`]. List of instances of class derived from [`TFLogitsProcessor`]
        used to modify the prediction scores of the language modeling head applied at each generation step.
    logits_warper (`TFLogitsProcessorList`, *optional*):
        An instance of [`TFLogitsProcessorList`]. List of instances of class derived from [`TFLogitsWarper`]
        used to warp the prediction score distribution of the language modeling head applied before multinomial
        sampling at each generation step.
    num_return_sequences (`int`, *optional*, defaults to 1):
        The number of independently computed returned sequences for each element in the batch.
    output_attentions (`bool`, *optional*, defaults to `False`):
        Whether or not to return the attentions tensors of all attention layers. See `attentions` under
        returned tensors for more details.
    output_hidden_states (`bool`, *optional*, defaults to `False`):
        Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
        for more details.
    return_dict_in_generate (`bool`, *optional*, defaults to `False`):
        Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
    model_kwargs:
        Additional model specific kwargs will be forwarded to the `call` function of the model. If the model is
        an encoder-decoder model the kwargs should include `encoder_outputs`.

Return:
    [`~generation.TFBeamSearchDecoderOnlyOutput`], [`~generation.TFBeamSearchEncoderDecoderOutput`] or
    `tf.Tensor`: A `tf.Tensor` containing the generated tokens (default behaviour) or a
    [`~generation.TFBeamSearchDecoderOnlyOutput`] if `model.config.is_encoder_decoder=False` and
    `return_dict_in_generate=True` or a [`~generation.TFBeamSearchEncoderDecoderOutput`] if
    `model.config.is_encoder_decoder=True`.

Examples:

```python
>>> from transformers import (
...     AutoTokenizer,
...     TFAutoModelForSeq2SeqLM,
...     TFLogitsProcessorList,
...     TFMinLengthLogitsProcessor,
... )
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")

>>> encoder_input_str = "translate English to German: How old are you?"
>>> encoder_input_ids = tokenizer(encoder_input_str, return_tensors="tf").input_ids

>>> # let's run beam search using 3 beams
>>> num_beams = 3
>>> # define decoder start token ids
>>> input_ids = tf.ones((1, num_beams, 1), dtype=tf.int32)
>>> input_ids = input_ids * model.generation_config.decoder_start_token_id

>>> # add encoder_outputs to model keyword arguments
>>> encoder_outputs = model.get_encoder()(encoder_input_ids, return_dict=True)
>>> encoder_outputs.last_hidden_state = tf.repeat(
...     tf.expand_dims(encoder_outputs.last_hidden_state, axis=0), num_beams, axis=1
... )
>>> model_kwargs = {"encoder_outputs": encoder_outputs}

>>> # instantiate logits processors
>>> logits_processor = TFLogitsProcessorList(
...     [TFMinLengthLogitsProcessor(5, eos_token_id=model.generation_config.eos_token_id)]
... )

>>> outputs = model.beam_search(input_ids, logits_processor=logits_processor, **model_kwargs)
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Wie alt bist du?']
```
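A worked sketch of the `length_penalty` normalisation described in the parameter list above, following the
formulation it implies: score = log-likelihood / (sequence_length ** length_penalty). The log-likelihoods and
lengths are invented for illustration.

```python
# Length-penalty normalisation sketch; the scores and lengths below are illustrative only.
import tensorflow as tf

log_likelihoods = tf.constant([-4.0, -6.0])  # a shorter beam and a longer beam (both negative)
lengths = tf.constant([5.0, 10.0])

for length_penalty in (0.0, 1.0, 2.0):
    scores = log_likelihoods / (lengths**length_penalty)
    # A larger length_penalty divides by a larger number, pulling the (negative) score towards
    # zero, so the longer sequence wins more often.
    print(length_penalty, scores.numpy())
```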
[Compiled body of `beam_search` -- not recoverable from this extraction, and the dump is truncated partway
through it. The surviving docstrings of its inner helpers outline the structure: `flatten_beam_dim` ("Flattens
the first two dimensions of a non-scalar array."), `unflatten_beam_dim` ("Unflattens the first, flat batch*beam
dimension of a non-scalar array."), `beam_search_cond_fn` ("Beam Search termination condition function -- halts
the generation loop if any of these conditions becomes False") and `beam_search_body_fn` ("Beam Search
iterative update function -- each iteration adds a new token and updates the best sequences seen so far").]
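A minimal sketch of the flatten/unflatten round trip those helper docstrings describe, assuming they reduce to
reshapes between `(batch_size, num_beams, ...)` and `(batch_size * num_beams, ...)`; the shapes are invented
for illustration.

```python
# Flatten/unflatten beam-dimension sketch; shapes are made up for illustration.
import tensorflow as tf

batch_size, num_beams, seq_len = 2, 3, 4
beamed = tf.reshape(tf.range(batch_size * num_beams * seq_len), (batch_size, num_beams, seq_len))

# "Flattens the first two dimensions of a non-scalar array." -> model-friendly flat batch.
flat = tf.reshape(beamed, (batch_size * num_beams, seq_len))

# "Unflattens the first, flat batch*beam dimension of a non-scalar array." -> back to beams.
unflat = tf.reshape(flat, (batch_size, num_beams, seq_len))

print(flat.shape, unflat.shape)  # (6, 4) (2, 3, 4)
```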