L i+dZddlZddlZddlZddlZddlmZmZddlm Z m Z ddl m Z m Z mZmZmZddlmZddlmZdd lmZmZmZmZmZmZmZmZmZe rdd lm Z ejBe"Z#d Z$d Z%d Z&dZ'e%e'zZ(e(e&zZ)erddl*m+Z+m,Z,GddeZ-GddeZ.e GddeZ/e Gdde/Z0e Gdde/Z1e GddZ2y)z-Generation configuration class and utilities.N)ABCabstractmethod) dataclass is_dataclass) TYPE_CHECKINGAnyCallableOptionalUnion) __version__)PretrainedConfig) GENERATION_CONFIG_NAME ExplicitEnumPushToHubMixin cached_file download_urlextract_commit_hash is_remote_urlis_torch_availablelogging)PreTrainedModel)_from_model_config _commit_hash_original_object_hashtransformers_version)staticoffloaded_static)dynamic dynamic_full offloaded quantized)sliding_windowhybridhybrid_chunkedoffloaded_hybridoffloaded_hybrid_chunked)#SynthIDTextWatermarkLogitsProcessorWatermarkLogitsProcessorc4eZdZdZdZdZdZdZdZdZ dZ d Z d Z y ) GenerationModezg Possible generation modes, downstream of the [`~generation.GenerationMixin.generate`] method. contrastive_search greedy_searchsampleassisted_generationdola_generation beam_search beam_sampleconstrained_beam_searchgroup_beam_searchN) __name__ __module__ __qualname____doc__CONTRASTIVE_SEARCH GREEDY_SEARCHSAMPLEASSISTED_GENERATIONDOLA_GENERATION BEAM_SEARCH BEAM_SAMPLECONSTRAINED_BEAM_SEARCHGROUP_BEAM_SEARCHq/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/transformers/generation/configuration_utils.pyr,r,@s: .#M F/'OKK7+rDr,cbeZdZdZdZdZdZdZdZd(de d d e fd Z d)d Z d*d e eej fde e eej fdefdZe d+de eej fde e eej fde e eej fdedede e eefded dfdZede eej ffdZedeeefd dfdZdeeefd dfdZd eeeffdZd eeeffdZd,d ed!ed efd"Zd-d#e eej fd efd$Zed%ed dfd&Zd'Z y).GenerationConfigaJ Class that holds a configuration for a generation task. A `generate` call supports the following generation methods for text-decoder, text-to-text, speech-to-text, and vision-to-text models: - *greedy decoding* if `num_beams=1` and `do_sample=False` - *multinomial sampling* if `num_beams=1` and `do_sample=True` - *beam-search decoding* if `num_beams>1` and `do_sample=False` - *beam-search multinomial sampling* if `num_beams>1` and `do_sample=True` - *assisted decoding* if `assistant_model` or `prompt_lookup_num_tokens` is passed to `.generate()` To learn more about decoding strategies refer to the [text generation strategies guide](../generation_strategies). A large number of these flags control the logits or the stopping criteria of the generation. Make sure you check the [generate-related classes](https://huggingface.co/docs/transformers/internal/generation_utils) for a full description of the possible manipulations, as well as examples of their usage. Arg: > Parameters that control the length of the output max_length (`int`, *optional*, defaults to 20): The maximum length the generated tokens can have. Corresponds to the length of the input prompt + `max_new_tokens`. Its effect is overridden by `max_new_tokens`, if also set. max_new_tokens (`int`, *optional*): The maximum numbers of tokens to generate, ignoring the number of tokens in the prompt. min_length (`int`, *optional*, defaults to 0): The minimum length of the sequence to be generated. Corresponds to the length of the input prompt + `min_new_tokens`. Its effect is overridden by `min_new_tokens`, if also set. min_new_tokens (`int`, *optional*): The minimum numbers of tokens to generate, ignoring the number of tokens in the prompt. early_stopping (`bool` or `str`, *optional*, defaults to `False`): Controls the stopping condition for beam-based methods, like beam-search. 
It accepts the following values: `True`, where the generation stops as soon as there are `num_beams` complete candidates; `False`, where an heuristic is applied and the generation stops when is it very unlikely to find better candidates; `"never"`, where the beam search procedure only stops when there cannot be better candidates (canonical beam search algorithm). max_time (`float`, *optional*): The maximum amount of time you allow the computation to run for in seconds. generation will still finish the current pass after allocated time has been passed. stop_strings (`str or list[str]`, *optional*): A string or a list of strings that should terminate generation if the model outputs them. > Parameters that control the generation strategy used do_sample (`bool`, *optional*, defaults to `False`): Whether or not to use sampling ; use greedy decoding otherwise. num_beams (`int`, *optional*, defaults to 1): Number of beams for beam search. 1 means no beam search. > Parameters that control the cache use_cache (`bool`, *optional*, defaults to `True`): Whether or not the model should use the past last key/values attentions (if applicable to the model) to speed up decoding. cache_implementation (`str`, *optional*, default to `None`): Name of the cache class that will be instantiated in `generate`, for faster decoding. Possible values are: - `"dynamic"`: [`DynamicCache`] - `"static"`: [`StaticCache`] - `"offloaded"`: [`DynamicCache(offloaded=True)`] - `"offloaded_static"`: [`StaticCache(offloaded=True)`] - `"quantized"`: [`QuantizedCache`] If none is specified, we will use the default cache for the model (which is often [`DynamicCache`]). See our [cache documentation](https://huggingface.co/docs/transformers/en/kv_cache) for further information. cache_config (`dict`, *optional*, default to `None`): Arguments used in the key-value cache class can be passed in `cache_config`. return_legacy_cache (`bool`, *optional*, default to `True`): Whether to return the legacy or new format of the cache when `DynamicCache` is used by default. > Parameters for manipulation of the model output logits temperature (`float`, *optional*, defaults to 1.0): The value used to module the next token probabilities. This value is set in a model's `generation_config.json` file. If it isn't set, the default value is 1.0 top_k (`int`, *optional*, defaults to 50): The number of highest probability vocabulary tokens to keep for top-k-filtering. This value is set in a model's `generation_config.json` file. If it isn't set, the default value is 50. top_p (`float`, *optional*, defaults to 1.0): If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or higher are kept for generation. This value is set in a model's `generation_config.json` file. If it isn't set, the default value is 1.0 min_p (`float`, *optional*): Minimum token probability, which will be scaled by the probability of the most likely token. It must be a value between 0 and 1. Typical values are in the 0.01-0.2 range, comparably selective as setting `top_p` in the 0.99-0.8 range (use the opposite of normal `top_p` values). typical_p (`float`, *optional*, defaults to 1.0): Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. 
If set to float < 1, the smallest set of the most locally typical tokens with probabilities that add up to `typical_p` or higher are kept for generation. See [this paper](https://huggingface.co/papers/2202.00666) for more details. epsilon_cutoff (`float`, *optional*, defaults to 0.0): If set to float strictly between 0 and 1, only tokens with a conditional probability greater than `epsilon_cutoff` will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See [Truncation Sampling as Language Model Desmoothing](https://huggingface.co/papers/2210.15191) for more details. eta_cutoff (`float`, *optional*, defaults to 0.0): Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to float strictly between 0 and 1, a token is only considered if it is greater than either `eta_cutoff` or `sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits)))`. The latter term is intuitively the expected next token probability, scaled by `sqrt(eta_cutoff)`. In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See [Truncation Sampling as Language Model Desmoothing](https://huggingface.co/papers/2210.15191) for more details. repetition_penalty (`float`, *optional*, defaults to 1.0): The parameter for repetition penalty. 1.0 means no penalty. See [this paper](https://huggingface.co/papers/1909.05858) for more details. encoder_repetition_penalty (`float`, *optional*, defaults to 1.0): The parameter for encoder_repetition_penalty. An exponential penalty on sequences that are not in the original input. 1.0 means no penalty. length_penalty (`float`, *optional*, defaults to 1.0): Exponential penalty to the length that is used with beam-based generation. It is applied as an exponent to the sequence length, which in turn is used to divide the score of the sequence. Since the score is the log likelihood of the sequence (i.e. negative), `length_penalty` > 0.0 promotes longer sequences, while `length_penalty` < 0.0 encourages shorter sequences. no_repeat_ngram_size (`int`, *optional*, defaults to 0): If set to int > 0, all ngrams of that size can only occur once. bad_words_ids (`list[list[int]]`, *optional*): List of list of token ids that are not allowed to be generated. Check [`~generation.NoBadWordsLogitsProcessor`] for further documentation and examples. renormalize_logits (`bool`, *optional*, defaults to `False`): Whether to renormalize the logits after applying all the logits processors (including the custom ones). It's highly recommended to set this flag to `True` as the search algorithms suppose the score logits are normalized but some logit processors break the normalization. forced_bos_token_id (`int`, *optional*, defaults to `model.config.forced_bos_token_id`): The id of the token to force as the first generated token after the `decoder_start_token_id`. Useful for multilingual models like [mBART](../model_doc/mbart) where the first generated token needs to be the target language token. forced_eos_token_id (`int` or list[int]`, *optional*, defaults to `model.config.forced_eos_token_id`): The id of the token to force as the last generated token when `max_length` is reached. Optionally, use a list to set multiple *end-of-sequence* tokens. remove_invalid_values (`bool`, *optional*, defaults to `model.config.remove_invalid_values`): Whether to remove possible *nan* and *inf* outputs of the model to prevent the generation method to crash. Note that using `remove_invalid_values` can slow down generation. 
exponential_decay_length_penalty (`tuple(int, float)`, *optional*): This Tuple adds an exponentially increasing length penalty, after a certain amount of tokens have been generated. The tuple shall consist of: `(start_index, decay_factor)` where `start_index` indicates where penalty starts and `decay_factor` represents the factor of exponential decay suppress_tokens (`list[int]`, *optional*): A list of tokens that will be suppressed at generation. The `SuppressTokens` logit processor will set their log probs to `-inf` so that they are not sampled. begin_suppress_tokens (`list[int]`, *optional*): A list of tokens that will be suppressed at the beginning of the generation. The `SuppressBeginTokens` logit processor will set their log probs to `-inf` so that they are not sampled. sequence_bias (`dict[tuple[int], float]`, *optional*)): Dictionary that maps a sequence of tokens to its bias term. Positive biases increase the odds of the sequence being selected, while negative biases do the opposite. Check [`~generation.SequenceBiasLogitsProcessor`] for further documentation and examples. token_healing (`bool`, *optional*, defaults to `False`): Heal tail tokens of prompts by replacing them with their appropriate extensions. This enhances the quality of completions for prompts affected by greedy tokenization bias. guidance_scale (`float`, *optional*): The guidance scale for classifier free guidance (CFG). CFG is enabled by setting `guidance_scale > 1`. Higher guidance scale encourages the model to generate samples that are more closely linked to the input prompt, usually at the expense of poorer quality. watermarking_config (`BaseWatermarkingConfig` or `dict`, *optional*): Arguments used to watermark the model outputs by adding a small bias to randomly selected set of "green" tokens. See the docs of [`SynthIDTextWatermarkingConfig`] and [`WatermarkingConfig`] for more details. If passed as `Dict`, it will be converted to a `WatermarkingConfig` internally. > Parameters that define the output variables of generate num_return_sequences (`int`, *optional*, defaults to 1): The number of independently computed returned sequences for each element in the batch. output_attentions (`bool`, *optional*, defaults to `False`): Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more details. output_hidden_states (`bool`, *optional*, defaults to `False`): Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more details. output_scores (`bool`, *optional*, defaults to `False`): Whether or not to return the prediction scores. See `scores` under returned tensors for more details. output_logits (`bool`, *optional*): Whether or not to return the unprocessed prediction logit scores. See `logits` under returned tensors for more details. return_dict_in_generate (`bool`, *optional*, defaults to `False`): Whether or not to return a [`~utils.ModelOutput`], as opposed to returning exclusively the generated sequence. This flag must be set to `True` to return the generation cache (when `use_cache` is `True`) or optional outputs (see flags starting with `output_`) > Special tokens that can be used at generation time pad_token_id (`int`, *optional*): The id of the *padding* token. bos_token_id (`int`, *optional*): The id of the *beginning-of-sequence* token. eos_token_id (`Union[int, list[int]]`, *optional*): The id of the *end-of-sequence* token. Optionally, use a list to set multiple *end-of-sequence* tokens. 
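The flag groups above combine into the decoding strategies listed at the top of this docstring. Below is a minimal sketch of how these flags are typically set and passed to `generate` (the checkpoint name is only illustrative):

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

>>> # Multinomial sampling: `do_sample=True` with the default `num_beams=1`
>>> sampling_config = GenerationConfig(do_sample=True, temperature=0.7, top_k=50, top_p=0.9, max_new_tokens=20)

>>> # Beam search: `num_beams > 1` with `do_sample=False`
>>> beam_config = GenerationConfig(num_beams=4, early_stopping=True, max_new_tokens=20)

>>> inputs = tokenizer("The capital of France is", return_tensors="pt")
>>> output = model.generate(**inputs, generation_config=sampling_config)
>>> print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```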
> Generation parameters exclusive to encoder-decoder models encoder_no_repeat_ngram_size (`int`, *optional*, defaults to 0): If set to int > 0, all ngrams of that size that occur in the `encoder_input_ids` cannot occur in the `decoder_input_ids`. decoder_start_token_id (`int` or `list[int]`, *optional*): If an encoder-decoder model starts decoding with a different token than *bos*, the id of that token or a list of length `batch_size`. Indicating a list enables different start ids for each element in the batch (e.g. multilingual models with different target languages in one batch) > Generation parameters exclusive to assistant generation is_assistant (`bool`, *optional*, defaults to `False`): Whether the model is an assistant (draft) model. num_assistant_tokens (`int`, *optional*, defaults to 20): Defines the number of _speculative tokens_ that shall be generated by the assistant model before being checked by the target model at each iteration. Higher values for `num_assistant_tokens` make the generation more _speculative_ : If the assistant model is performant larger speed-ups can be reached, if the assistant model requires lots of corrections, lower speed-ups are reached. num_assistant_tokens_schedule (`str`, *optional*, defaults to `"constant"`): Defines the schedule at which max assistant tokens shall be changed during inference. - `"heuristic"`: When all speculative tokens are correct, increase `num_assistant_tokens` by 2 else reduce by 1. `num_assistant_tokens` value is persistent over multiple generation calls with the same assistant model. - `"heuristic_transient"`: Same as `"heuristic"` but `num_assistant_tokens` is reset to its initial value after each generation call. - `"constant"`: `num_assistant_tokens` stays unchanged during generation assistant_confidence_threshold (`float`, *optional*, defaults to 0.4): The confidence threshold for the assistant model. If the assistant model's confidence in its prediction for the current token is lower than this threshold, the assistant model stops the current token generation iteration, even if the number of _speculative tokens_ (defined by `num_assistant_tokens`) is not yet reached. The assistant's confidence threshold is adjusted throughout the speculative iterations to reduce the number of unnecessary draft and target forward passes, biased towards avoiding false negatives. `assistant_confidence_threshold` value is persistent over multiple generation calls with the same assistant model. It is an unsupervised version of the dynamic speculation lookahead from Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models . prompt_lookup_num_tokens (`int`, *optional*): The number of tokens to be output as candidate tokens. max_matching_ngram_size (`int`, *optional*): The maximum ngram size to be considered for matching in the prompt. Default to 2 if not provided. assistant_early_exit(`int`, *optional*): If set to a positive integer, early exit of the model will be used as an assistant. Can only be used with models that support early exit (i.e. models where logits from intermediate layers can be interpreted by the LM head). assistant_lookbehind(`int`, *optional*, defaults to 10): If set to a positive integer, the re-encodeing process will additionally consider the last `assistant_lookbehind` assistant tokens to correctly align tokens. Can only be used with different tokenizers in speculative decoding. See this [blog](https://huggingface.co/blog/universal_assisted_generation) for more details. 
target_lookbehind(`int`, *optional*, defaults to 10): If set to a positive integer, the re-encodeing process will additionally consider the last `target_lookbehind` target tokens to correctly align tokens. Can only be used with different tokenizers in speculative decoding. See this [blog](https://huggingface.co/blog/universal_assisted_generation) for more details. > Parameters related to performances and compilation compile_config (CompileConfig, *optional*): If using a compilable cache, this controls how `generate` will `compile` the forward pass for faster inference. disable_compile (`bool`, *optional*): Whether to disable the automatic compilation of the forward pass. Automatic compilation happens when specific criteria are met, including using a compilable cache. Please open an issue if you find the need to use this flag. )output_attentionsoutput_hidden_states output_scores output_logitsc |jdd|_|jdd|_|jdd|_|jdd|_|jdd|_|jd d|_|jd d|_|jd d|_|jd d |_ |jdd|_ |jdd|_ |jdd|_ |jdd|_ |jdd|_|jdd|_|jdd|_|jdd|_|jdd|_|jdd|_|jdd|_|jdd|_|jdd|_|jdd|_|jd d|_|jd!d|_|jd"d|_|jd#d|_|jd$d|_|jd%d|_|jd&d|_|jd'd|_|jd(d|_ |jd)d|_!|jd*d|_"|jd+d|_#|jd,d|_$|jd-d}|d|_%n2tM|tNr||_%ntPjS||_%|jd.d |_*|jd/d|_+|jd0d|_,|jd1d|_-|jd2d|_.|jd3d|_/|jd4d|_0|jd5d|_1|jd6d|_2|jd7d|_3|jd8d|_4d|_5|jd9d|_6|jd:d;|_7|jdd|_9|jd?d|_:|jd@d|_;|jdAdB|_<|jdCdB|_=|jdDd|_>|jdEd|_?|jdFd|_@|jdGd|_A|jdHd|_B|jdId|_C|jdJd |_D|jdKd|_E|jdLd|_F|jdMd|_G|jdNd|_H|jdOt|_J|js&|jD]\}} t||||jy#t$r%}tjdP|dQ|dR||d}~wwxYw)SN max_lengthmax_new_tokens min_lengthrmin_new_tokensearly_stoppingFmax_time stop_strings do_sample num_beamsr( use_cacheTcache_implementation cache_configreturn_legacy_cacheprefill_chunk_size temperature?top_k2top_pmin_p typical_pepsilon_cutoff eta_cutoffrepetition_penaltyencoder_repetition_penaltylength_penaltyno_repeat_ngram_size bad_words_idsrenormalize_logitsforced_bos_token_idforced_eos_token_idremove_invalid_values exponential_decay_length_penaltysuppress_tokensbegin_suppress_tokens sequence_bias token_healingguidance_scalewatermarking_confignum_return_sequencesrHrIrJrKreturn_dict_in_generate pad_token_id bos_token_id eos_token_idencoder_no_repeat_ngram_sizedecoder_start_token_idnum_assistant_tokensnum_assistant_tokens_scheduleconstantassistant_confidence_thresholdg?prompt_lookup_num_tokensmax_matching_ngram_sizeassistant_early_exitassistant_lookbehind target_lookbehindcompile_configdisable_compile low_memory penalty_alpha dola_layersdiversity_penaltynum_beam_groups constraintsforce_words_idsrrrz Can't set z with value z for )QpoprMrOrPrQrRrSrTrUrVrWrXrYrZr[r\r^r`rarbrcrerfrgrhrirjrkrlrmrnrorprqrrrsrtru isinstanceBaseWatermarkingConfigWatermarkingConfig from_dictrvrHrIrJrKrwrxryrzr{r| is_assistantr}r~rrrrrrrrrrrrrrrrrr ritemssetattrAttributeErrorloggererrorvalidate)selfkwargsrukeyvalueerrs rE__init__zGenerationConfig.__init__Ms **\26$jj)94@ **\15$jj)94@$jj)95A :t4 "JJ~t< K7K3 K6$*JJ/Et$L!"JJ~t<#)::.CT#J "(**-A4"H"::mS9ZZ, ZZ- ZZ. K5$jj)93? **\37"(**-A3"G*0**5QSV*W'$jj)93?$*JJ/Eq$I!#ZZ>"(**-A5"I#)::.CT#J #)::.CT#J %+ZZ0G%O"06 ;]_c0d-%zz*;TB%+ZZ0G%N"#ZZ>#ZZ?$jj)94@$jj)>E  &'+D $ +-C D':D $'9'C'CDW'XD $%+JJ/Eq$I!!',?!G$*JJ/Eu$M!#ZZ?#ZZ>'-zz2KU'S$#JJ~t<"JJ~t<"JJ~t<-3JJ7UWX,Y)&,jj1I4&P#"$*JJ/Er$J!-3ZZ8WYc-d*.4jj9Y[^._+(. 
3Mt(T%'-zz2KT'R$$*JJ/Et$L!$*JJ/Er$J!!',?!D%jj)94@%zz*;UC!**\48#ZZ>!::mT:!',?!E%zz*;Q?!::mT:%zz*;TB#)**-A5"I"JJ~t<$*JJ/E{$S!&&%lln  UD#u-   &LL:cU,ugU4&!QRIs Z"" [+ [  [c8t|jdS)NTignore_metadata)hashto_json_stringrs rE__hash__zGenerationConfig.__hash__sD'''=>>rDczt|tsy|jdd}|jdd}||k(S)NFT)use_diffr)rrGr)rotherself_without_metadataother_without_metadatas rE__eq__zGenerationConfig.__eq__sJ%!12 $ 3 3UTX 3 Y!&!5!5uVZ!5![$(>>>rDcX|jjd|jdS)N Tr __class__r6rrs rE__repr__zGenerationConfig.__repr__s...))*!D,?,?PT,?,U+VWWrDNassistant_modelrreturnc|j |jtj}n|jdk(rw|j durX|j ;|j dkDr,|j |jdkDrtj}nqtj}n`tj}nO|jdkDrtj}n/|j durtj}ntj}||j |j .|dvrtj"}nt$j'd|d|j(/|dvrtj*}|St$j'd|d|S) a Returns the generation mode triggered by the [`GenerationConfig`] instance. Arg: assistant_model (`PreTrainedModel`, *optional*): The assistant model to be used for assisted generation. If set, the generation mode will be assisted generation. Returns: `GenerationMode`: The generation mode triggered by the instance. r(FrT)r.r/zYou've set `assistant_model`, which triggers assisted generate. Currently, assisted generate is only supported with Greedy Search and Sample. However, the base decoding mode (based on current flags) is z* -- some of the set flags will be ignored.zYou've set `dola_layers`, which triggers DoLa generate. Currently, DoLa generate is only supported with Greedy Search and Sample. However, the base decoding mode (based on current flags) is )rrr,rArVrUr^rr:r;r<rrBr@r?rrr=rwarningrr>)rrgeneration_modes rEget_generation_modez$GenerationConfig.get_generation_modesq    '4+?+?+K,DDO ^^q ~~&JJ* Q**6**Q.&4&G&GO&4&B&BO"0"7"7##a'"0"B"B4'"0"<"<"0"<"<  ',,8((4"=="0"D"D))8(99ce    '"=="0"@"@ ))8(99ce rDc i}|jdvrtd|jd|j(|jdkrtd|jd|j"|jdkrd|jd|d <|j1|jt vrtd |jd t |j FTneverz6`early_stopping` must be a boolean or 'never', but is .Nrz0`max_new_tokens` must be greater than 0, but is z*`pad_token_id` should be positive but got z. This will cause errors when batch generating, if there is padding. Please set `pad_token_id` explicitly as `model.generation_config.pad_token_id=PAD_TOKEN_ID` to avoid errors in generationrxz Invalid `cache_implementation` (z). Choose one of: z0You provided `compile_config` as an instance of z0, but it must be an instance of `CompileConfig`.Fz`do_sample` is set to `False`. However, `{flag_name}` is set to `{flag_value}` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `{flag_name}`.r]r\) flag_name flag_valuer`rarbr_r^rdrcrer(z`num_beams` is set to 1. However, `{flag_name}` is set to `{flag_value}` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `{flag_name}`.rRrhz_Greedy methods without beam search do not support `num_return_sequences` different than 1 (got z).z`num_return_sequences` (z-) has to be smaller or equal to `num_beams` (zrYou have set `use_cache` to `False`, but {cache_arg} is set to {cache_arg_value}. {cache_arg} will have no effect.)rXrYrZ) cache_argcache_arg_valueTz5`return_dict_in_generate` is NOT set to `True`, but `z5` is. When `return_dict_in_generate` is not `True`, `z ` is ignored.) logits_processorstopping_criteriaprefix_allowed_tokens_fn synced_gpusrstreamernegative_prompt_idsnegative_prompt_attention_maskuse_model_defaultsz Argument `zn` is not a valid argument of `GenerationConfig`. 
It should be passed to `generate()` (or a pipeline) directly.z- `z`:  z If you're using a pretrained model, note that some of these attributes may be set through the model's `generation_config.json` file.zGenerationConfig is invalid: zAThe following generation flags are not valid and may be ignored: z4 Set `TRANSFORMERS_VERBOSITY=info` for more details.))rR ValueErrorrOrxrXALL_CACHE_IMPLEMENTATIONSrr CompileConfigtyperurrUr\formatr`rarbr^rcrerVrhrvrWgetattrrwextra_output_flagshasattrlenrappendjoinlistkeysr get_verbosityWARNINGr warning_once info_once)rstrict minor_issuesgreedy_wrong_parameter_msgsingle_beam_wrong_parameter_msgno_cache_warningarg_nameextra_output_flaggenerate_argumentsarg info_messageattribute_nameissue_descriptionattributes_with_issueswarning_messages rErzGenerationConfig.validates    &< <UVZViViUjjklm m    *t/B/Ba/GOPTPcPcOddefg g    (T->->-BQ>QS`3aB4H[H[C\B]^//   # # /  $ $ - - / >>U "q '+0@0@C0G.H.O.O+8H8H/P/ ]+zz%$***;(B(I(IT[hlhrhr(I(s W%zz%(B(I(IT[hlhrhr(I(s W%~~)dnn.C,F,M,M)dnn-N- [)zz%$***:(B(I(IT[hlhrhr(I(s W%"".43F3F#3M1K1R1R.4;N;N2S2 -.*t#/E-G-N-N*t.O. \* >>Q g ,""%/1P1W1W.4;N;N2X2 -."".43F3F#3M1P1W1W.4;N;N2X2 -.  $ $ )~~">>U*$ $ 9 9:">**T^^; .t/H/H.IJ'r+ >>U " " \ 4*6-=-D-D"*GD(>. ) ?>2BBe!fg g NT2 #ZZ(8$?NjjN,@,@,Mb,QRG'd'':6:G#99.I WW\\.:JK ,t< -.@-ABC   ' ' -jj) (  G [SX(YYZ Z [sF F?#F::F?pretrained_model_name cache_dirforce_downloadlocal_files_onlyrrevisionc "||nt}|jdd} |jdd} |jdd} |jdd} |jdd} |jdd }|jd d}| )tjd t| t d | }d |d}| | |d<t jj||}t|}t jj|}t jjt jj| |r|}d}n?t|r|}t|}n&|} t||||| | ||||| | }t||} |j%|}||d <|rt,j/d|nt,j/dd||j1ddur*|j2|fi|\}}t5||_||fS|j2|fi|}t5||_|S#t $rt"$rt!d|d|d|dwxYw#t&j(t*f$rt!d|dwxYw)a) Instantiate a [`GenerationConfig`] from a generation configuration file. Args: pretrained_model_name (`str` or `os.PathLike`): This can be either: - a string, the *model id* of a pretrained model configuration hosted inside a model repo on huggingface.co. - a path to a *directory* containing a configuration file saved using the [`~GenerationConfig.save_pretrained`] method, e.g., `./my_model_directory/`. config_file_name (`str` or `os.PathLike`, *optional*, defaults to `"generation_config.json"`): Name of the generation configuration JSON file to be loaded from `pretrained_model_name`. cache_dir (`str` or `os.PathLike`, *optional*): Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. force_download (`bool`, *optional*, defaults to `False`): Whether or not to force to (re-)download the configuration files and override the cached versions if they exist. resume_download: Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers. proxies (`dict[str, str]`, *optional*): A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request. token (`str` or `bool`, *optional*): The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use the token generated when running `hf auth login` (stored in `~/.huggingface`). revision (`str`, *optional*, defaults to `"main"`): The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git. To test a pull request you made on the Hub, you can pass `revision="refs/pr/"`. 
return_unused_kwargs (`bool`, *optional*, defaults to `False`): If `False`, then this function returns just the final configuration object. If `True`, then this functions returns a `Tuple(config, unused_kwargs)` where *unused_kwargs* is a dictionary consisting of the key/value pairs whose keys are not configuration attributes: i.e., the part of `kwargs` which has not been used to update `config` and is otherwise ignored. subfolder (`str`, *optional*, defaults to `""`): In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here. kwargs (`dict[str, Any]`, *optional*): The values in kwargs of any keys which are configuration attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are *not* configuration attributes is controlled by the `return_unused_kwargs` keyword parameter. Returns: [`GenerationConfig`]: The configuration object instantiated from this pretrained model. Examples: ```python >>> from transformers import GenerationConfig >>> # Download configuration from huggingface.co and cache. >>> generation_config = GenerationConfig.from_pretrained("openai-community/gpt2") >>> # E.g. config was saved using *save_pretrained('./test/saved_model/')* >>> generation_config.save_pretrained("./test/saved_model/") >>> generation_config = GenerationConfig.from_pretrained("./test/saved_model/") >>> # You can also specify configuration names to your generation configuration file >>> generation_config.save_pretrained("./test/saved_model/", config_file_name="my_configuration.json") >>> generation_config = GenerationConfig.from_pretrained("./test/saved_model/", "my_configuration.json") >>> # If you'd like to try a minor variation to an existing configuration, you can also pass generation >>> # arguments to `.from_pretrained()`. Be mindful that typos and unused arguments will be ignored >>> generation_config, unused_kwargs = GenerationConfig.from_pretrained( ... "openai-community/gpt2", top_k=1, foo=False, do_sample=True, return_unused_kwargs=True ... ) >>> generation_config.top_k 1 >>> unused_kwargs {'foo': False} ```Nresume_downloadproxiesr subfolder_from_pipeline _from_autoFrrrconfig) file_typefrom_auto_classusing_pipelineT) r r rrrr user_agentrrrz!Can't load the configuration of 'z'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'z2' is the correct path to a directory containing a z filez"It looks like the config file at 'z' is not a valid JSON file.zloading configuration file z from cache at return_unused_kwargs)rrrrrrrrrrexistsrrrrrOSError Exception_dict_from_json_filejsonJSONDecodeErrorUnicodeDecodeErrorrrrrrr)clsr rr r rrrrrrrr from_pipeliner commit_hashr config_pathis_localresolved_config_fileconfiguration_file config_dictr unused_kwargss rEfrom_pretrainedz GenerationConfig.from_pretraineds~0@/K+Qg **%6=**Y-$4d;JJ{B/  #3T:  **\59jj6  % MME   l#E#+P  $+8J' (ggll#8:JK +& 77>>+. 77>>"'',,y+> ?#. H ; '!, #/ #< !1  '2)&'#1#$3%5)%'!, ($22F T  r223GHK*5K '  KK56J5KL M KK56H5IYmXno p ::, - 5$1CMM+$H$H !FM+/?S>TTopq q rs(#H: I%:(I"%)J json_filect|dd5}|j}dddtjS#1swYxYw)Nrutf-8encoding)openreadr!loads)r$r.readertexts rEr z%GenerationConfig._dict_from_json_files? )S7 3 !v;;=D !zz$ ! !s =Ar+c |jdd}|jdd|jddd|vr d|vr|d|d<|dii||}|jdi|}tjd||r||fS|S) a Instantiates a [`GenerationConfig`] from a Python dictionary of parameters. 
Args: config_dict (`dict[str, Any]`): Dictionary that will be used to instantiate the configuration object. kwargs (`dict[str, Any]`): Additional parameters from which to initialize the configuration object. Returns: [`GenerationConfig`]: The configuration object instantiated from those parameters. rFrNrrzGenerate config rC)rupdaterr)r$r+rrrr,s rErzGenerationConfig.from_dicts &zz*@%H  <& #T* V #+(E%0%@F> "10+001% //  &vh/0 =( (MrDdc|jd6t|dts#t|djdd|d<|j D]$}t|t s|j |&y)a( Checks whether the passed dictionary and its nested dicts have a *dtype* key and if it's not None, converts torch.dtype to a string of just the type. For example, `torch.float32` get converted into *"float32"* string, which can then be stored in the json format. dtypeNrr()rrrrvaluesdictdict_dtype_to_str)rr;rs rEr@z"GenerationConfig.dict_dtype_to_strsm 55> %j7S.IQwZ..s3A6AgJXXZ .E%&&&u- .rDc|j}tj}i}|jD]\}}||vs|dk(s |||k7s|||<|j||S)a' Removes all attributes from config which correspond to the default config attributes for better readability and serializes to a Python dictionary. Returns: `dict[str, Any]`: Dictionary of all the attributes that make up this configuration instance, r)to_dictrGrr@)rr+default_config_dictserializable_config_dictrrs rE to_diff_dictzGenerationConfig.to_diff_dictslln /088:#% &++- 6JC--8N1NRW[nor[sRs05(- 6 78''rDctj|j}d|vr|d=d|vr|d=d|vr|d=t|d<|j ||S)z Serializes this instance to a Python dictionary. Returns: `dict[str, Any]`: Dictionary of all the attributes that make up this configuration instance. rrrr)copydeepcopy__dict__r r@routputs rErBzGenerationConfig.to_dictsjt}}- V #~& "f ,./ v %'(*5%& v& rDrrc|dur|j}n|j}|rtD]}|j|dfdfd|}|}t j |dddzS)aG Serializes this instance to a JSON string. Args: use_diff (`bool`, *optional*, defaults to `True`): If set to `True`, only the difference between the config instance and the default `GenerationConfig()` is serialized to JSON string. ignore_metadata (`bool`, *optional*, defaults to `False`): Whether to ignore the metadata fields present in the instance Returns: `str`: String containing all the attributes that make up this configuration instance in JSON format. TNct|tr3|jDcic]\}}t||c}}St|tr|Dcgc] }| c}S|Scc}}wcc}wN)rr?rrr)objrritemconvert_keys_to_strings rErQz?GenerationConfig.to_json_string..convert_keys_to_string1si#t$RUR[R[R]^JCC"8"??^^C&ADE.t4EE _Es A-A3ct|tr*|jDcic]\}}||c}}St|r|j S|Scc}}wrN)rr?rrrB)rOrrconvert_dataclass_to_dicts rErSzBGenerationConfig.to_json_string..convert_dataclass_to_dict9sU#t$PSPYPYP[\*#u6u==\\c"{{}$ ]sAr indent sort_keysr)rErBMETADATA_FIELDSrr!dumps)rrrr+metadata_fieldrSrQs @@rErzGenerationConfig.to_json_strings~ t ++-K,,.K "1 65 6  -[9 / < zz+a4@4GGrDjson_file_pathct|dd5}|j|j|dddy#1swYyxYw)a Save this instance to a JSON file. Args: json_file_path (`str` or `os.PathLike`): Path to the JSON file in which this configuration instance's parameters will be saved. use_diff (`bool`, *optional*, defaults to `True`): If set to `True`, only the difference between the config instance and the default `GenerationConfig()` is serialized to JSON file. wr1r2rN)r4writer)rrZrwriters rErzGenerationConfig.to_json_fileFsE.# 8 AF LL,,h,? @ A A As ":A model_configc> |j}|jdd|jDcic] \}}| || }}}|j|dd |j d}||ur`t }|j} jD]3}t |t ||k(} ||vs!| s$t |||5 jdur%t fd jDrd _t  _ Scc}}w)a Instantiates a [`GenerationConfig`] from a [`PretrainedConfig`]. This function is useful to convert legacy [`PretrainedConfig`] objects, which may contain generation parameters, into a stand-alone [`GenerationConfig`]. 
Args: model_config (`PretrainedConfig`): The model config that will be used to instantiate the generation config. Returns: [`GenerationConfig`]: The configuration object instantiated from those parameters. rNFT)rr)decoderc38K|]}t|dyw)FN)r).0rgeneration_configs rE z5GenerationConfig.from_model_config..vs$%)+ .d .(d38n(.c3h.*Ht*HT*HVY*HX A5bkk1A+B Ad A)!-=)!BT)!)!VrDrGceZdZdZedZdeeejffdZ de ee ffdZ dZdZd Zd Zed Zed Zy )rzGeneric watermarking configc |di|}g}|jD]0\}}t||st||||j|2|D]}|j |d|S)a Constructs a BaseWatermarkingConfig instance from a dictionary of parameters. Args: config_dict (dict[str, Any]): Dictionary containing configuration parameters. **kwargs: Additional keyword arguments to override dictionary values. Returns: BaseWatermarkingConfig: Instance of BaseWatermarkingConfig constructed from the dictionary. NrC)rrrrr)r$r+rrrorrs rErz BaseWatermarkingConfig.from_dictsw#{#  ,,. &JCvs#U+  % & "C JJsD ! " rDrZct|dd5}|j}tj|dddz}|j |dddy#1swYyxYw) z Save this instance to a JSON file. Args: json_file_path (Union[str, os.PathLike]): Path to the JSON file in which this configuration instance's parameters will be saved. r\r1r2r TrTrN)r4rBr!rXr])rrZr^r+ json_strings rErz#BaseWatermarkingConfig.to_json_filesU.# 8 &F,,.K**[dKdRK LL %  & & &s =AArcDtj|j}|S)z Serializes this instance to a Python dictionary. Returns: dict[str, Any]: Dictionary of all the attributes that make up this configuration instance. )rGrHrIrJs rErBzBaseWatermarkingConfig.to_dictst}}- rDc#Ktj|jjD] \}}||f ywrNrGrHrIr)rrkrs rE__iter__zBaseWatermarkingConfig.__iter__s9==7==? KD%+  s=?cT|jjd|jS)Nrrrs rErzBaseWatermarkingConfig.__repr__s(..))*!D,?,?,A+BCCrDcJtj|jddzS)z Serializes this instance to a JSON formatted string. Returns: str: JSON formatted string representing the configuration instance. r )rUr)r!rXrIrs rErz%BaseWatermarkingConfig.to_json_stringszz$--2T99rDc h|jD]\}}t||st|||!y)z Update the configuration attributes with new values. Args: **kwargs: Keyword arguments representing configuration attributes and their new values. N)rrr)rrrrs rEr:zBaseWatermarkingConfig.updates3!,,. *JCtS!c5) *rDcyrNrCrs rErzBaseWatermarkingConfig.validatesrDcyrNrC)r vocab_sizes rEconstruct_processorz*BaseWatermarkingConfig.construct_processors/2rDN)r6r7r8r9rsrr rrrqrr?rrBr{rrr:rrrrCrDrErrsx%* &5bkk1A+B &c3hD: *22rDrc NeZdZdZ ddededededef dZdZd ed d fd Z y )ra Class that holds arguments for watermark generation and should be passed into `GenerationConfig` during `generate`. See [this paper](https://huggingface.co/papers/2306.04634) for more details on the arguments. Accepts the following keys: - greenlist_ratio (`float`): Used for watermarking. The ratio of "green" tokens used to the vocabulary size. Defaults to 0.25. - bias (`float`): Used with watermarking. The bias added to the selected "green" tokens' logits. Defaults to 2.0. - hashing_key (`int`): Hashing key used for watermarking. Defaults to 15485863 (the millionth prime). - seeding_scheme (`str`): Algorithm to use for watermarking. Accepts values: - "lefthash" (default): "green" tokens selection depend on the last token (Algorithm 2 from the paper) - "selfhash": "green" tokens selection depends on the current token itself (Algorithm 3 from the paper) The downside of this scheme is that it considers all possible next tokens and can be slower than "lefthash". - context_width(`int`): The context length of previous tokens to use in seeding. Higher context length makes watermarking more robust. 
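Examples:

A minimal sketch of typical usage (the checkpoint name is only illustrative):

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, WatermarkingConfig

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

>>> # Bias the logits of a pseudo-randomly selected "green" token set during generation
>>> watermarking_config = WatermarkingConfig(bias=2.5, seeding_scheme="selfhash")

>>> inputs = tokenizer(["Alice and Bob are"], return_tensors="pt")
>>> output = model.generate(
...     **inputs, watermarking_config=watermarking_config, do_sample=True, max_new_tokens=20
... )
>>> watermarked_text = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
```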
greenlist_ratiobias hashing_keyseeding_scheme context_widthcJ||_||_||_||_||_yrN)rrrrr)rrrrrrs rErzWatermarkingConfig.__init__s+ / &,*rDcZd}|jdvr't|jdd|jd|jcxkrdks)nt|jdd |j|jd k\s't|jd d |jy) N}Some of the keys in `watermarking_config` are defined incorrectly. `{key}` should be {correct_value}` but found {found_value})selfhashlefthashrz[`selfhash`, `lefthash`]r correct_value found_valuerdr]rzin range between 0.0 and 1.0r(rza positive integer)rrrrrrwatermark_missing_arg_msgs rErzWatermarkingConfig.validates & "   &> >)00("< $ 3 31 d**1c1)00)"@ $ 3 31 !!Q&)00'"6 $ 2 21 'rDrrr*c t|||j|j|j|j|j S)N)rdevicerrrrr)r*rrrrrrrrs rErz&WatermarkingConfig.construct_processor/s@'! 00((..,,  rDN)g?g@iKrr() r6r7r8r9floatintrrrrrCrDrErrsf,"&#( + + + +  +  +<  c  >X  rDrc\eZdZdZ ddedeedededededefd Zd Zd ed d fdZ y)SynthIDTextWatermarkingConfiga Class that holds arguments for watermark generation and should be passed into `GenerationConfig` during `generate`. See [this paper](https://www.nature.com/articles/s41586-024-08025-4) for more details on the arguments. Args: ngram_len (`int`): Ngram length. keys (`list[int]`): A sequence of watermarking keys, one for each depth. context_history_size (`int`, *optional*, defaults to 1024): Size of the tensor to keep track of seen contexts. sampling_table_seed (`int`, *optional*, defaults to 0): Random seed to generate the sampling table. sampling_table_size (`int`, *optional*, defaults to 65536): Size of the sampling table. skip_first_ngram_calls (`bool`, *optional*, defaults to `False`): Whether to skip first ngram calls. debug_mode (`bool`, optional, *optional*, defaults to `False`): Logits are modified to uniform one got before watermarking modification is applied. This is to test the implementation. Examples: ```python >>> from transformers import AutoModelForCausalLM, AutoTokenizer, SynthIDTextWatermarkingConfig >>> tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-2b', padding_side="left") >>> model = AutoModelForCausalLM.from_pretrained('google/gemma-2-2b') >>> # SynthID Text configuration >>> watermarking_config = SynthIDTextWatermarkingConfig( ... keys=[654, 400, 836, 123, 340, 443, 597, 160, 57], ... ngram_len=5, ... ) >>> # Generation with watermarking >>> tokenized_prompts = tokenizer(["Once upon a time, "], return_tensors="pt", padding=True) >>> output_sequences = model.generate( ... **tokenized_prompts, watermarking_config=watermarking_config, do_sample=True, max_new_tokens=10 ... ) >>> watermarked_text = tokenizer.batch_decode(output_sequences, skip_special_tokens=True) ``` ngram_lenrcontext_history_sizesampling_table_seedsampling_table_sizeskip_first_ngram_calls debug_modecf||_||_||_||_||_||_||_yrN)rrrrrrr)rrrrrrrrs rErz&SynthIDTextWatermarkingConfig.__init__hs9# #6 #6 $8!&<#$rDctd}|jdkDr't|jdd|jy)Nrirz< 2**24r)rrrrs rErz&SynthIDTextWatermarkingConfig.validatezsP & "  # #e +)00-"+ $ 8 81  ,rDrrr*c t|j|j|j|j|j ||j |jS)N)rrrrrrrr)r)rrrrrrrrs rErz1SynthIDTextWatermarkingConfig.construct_processorsK2nn $ 8 8 $ 8 8!%!:!:#'#>#>  rDN)iriFF) r6r7r8r9rrrrrrrrCrDrErr;s)^%)#$#(', %%3i%" % ! % ! %!%%%$   c  >X  rDrceZdZUdZdZeed<dZeeed<dZ e e e fed<dZ e ed <dZeeed <dZd ee effd Zy) ra% Class that holds arguments relative to `torch.compile` behavior, when using automatic compilation in `generate`. See [`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html) for more details on the arguments. Args: fullgraph (`bool`, *optional*, defaults to `False`): If False (default), attempts to discover compileable regions that will be optimized. 
If True, then require that the entire function be capturable into a single graph. If this is not possible (that is, if there are graph breaks), then an error will be raised. dynamic (`bool` or `None`, *optional*): Whether to try to use dynamic shape graphs. backend (`str` or `Callable`, *optional*, defaults to `"inductor"`): Backend to be used. mode (`str`, *optional*, defaults to `"reduce-overhead"`): Controls balance between performance and overhead. options (`dict`, *optional*): A dictionary of options to pass to the backend. Examples: ```python >>> from transformers import AutoModelForCausalLM, AutoTokenizer, CompileConfig >>> tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-2b') >>> model = AutoModelForCausalLM.from_pretrained('google/gemma-2-2b').cuda() >>> # Automatic compile configuration, used with static cache >>> compile_config = CompileConfig(dynamic=True) >>> # Generation with static cache and compile config >>> input = tokenizer.encode("Hello there, how", return_tensors="pt").cuda() >>> output = model.generate( ... input, do_sample=False, max_new_tokens=300, cache_implementation="static", compile_config=compile_config ... ) >>> output_text = tokenizer.batch_decode(output, skip_special_tokens=True)[0] ``` F fullgraphNrinductorbackendzreduce-overheadmodeoptionsrctj|jjDcic]\}}|dk7s ||c}}Scc}}w)z0Serializes this instance to a Python dictionary._compile_all_devicesrz)rrrs rErBzCompileConfig.to_dicts=}}4==;N;N;PrZS%TW[qTqc5jrssrs A A )r6r7r8r9rrr__annotations__rr rr rr rrr?rrrBrCrDrErrsi#JIt"GXd^"$.GU3= !.!D#!"GXd^"tc3htrDr)3r9rGr!rrabcrr dataclassesrrtypingrrr r r rr configuration_utilsrutilsrrrrrrrrrmodeling_utilsr get_loggerr6rrWSTATIC_CACHE_IMPLEMENTATIONSDYNAMIC_CACHE_IMPLEMENTATIONS'DEPRECATED_STATIC_CACHE_IMPLEMENTATIONS ALL_STATIC_CACHE_IMPLEMENTATIONSrlogits_processr)r*r,rGrrrrrCrDrErs(4 #/@@2   0   H %i= U+'$@Bi#i