import enum
import warnings
from typing import Any, Union

from ..generation import GenerationConfig
from ..tokenization_utils import TruncationStrategy
from ..utils import add_end_docstrings, is_tf_available, is_torch_available, logging
from .base import Pipeline, build_pipeline_init_args


if is_tf_available():
    import tensorflow as tf

    from ..models.auto.modeling_tf_auto import TF_MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES

if is_torch_available():
    from ..models.auto.modeling_auto import MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES

logger = logging.get_logger(__name__)


class ReturnType(enum.Enum):
    TENSORS = 0
    TEXT = 1


@add_end_docstrings(build_pipeline_init_args(has_tokenizer=True))
class Text2TextGenerationPipeline(Pipeline):
    """
    Pipeline for text to text generation using seq2seq models.

    Unless the model you're using explicitly sets these generation parameters in its configuration files
    (`generation_config.json`), the following default values will be used:
    - max_new_tokens: 256
    - num_beams: 4

    Example:

    ```python
    >>> from transformers import pipeline

    >>> generator = pipeline(model="mrm8488/t5-base-finetuned-question-generation-ap")
    >>> generator(
    ...     "answer: Manuel context: Manuel has created RuPERTa-base with the support of HF-Transformers and Google"
    ... )
    [{'generated_text': 'question: Who created the RuPERTa-base?'}]
    ```

    Learn more about the basics of using a pipeline in the [pipeline tutorial](../pipeline_tutorial). You can pass text
    generation parameters to this pipeline to control stopping criteria, decoding strategy, and more. Learn more about
    text generation parameters in [Text generation strategies](../generation_strategies) and [Text
    generation](text_generation).

    This Text2TextGenerationPipeline pipeline can currently be loaded from [`pipeline`] using the following task
    identifier: `"text2text-generation"`.

    The models that this pipeline can use are models that have been fine-tuned on a translation task. See the
    up-to-date list of available models on
    [huggingface.co/models](https://huggingface.co/models?filter=text2text-generation). For a list of available
    parameters, see the [following
    documentation](https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.generation.GenerationMixin.generate)

    Usage:

    ```python
    text2text_generator = pipeline("text2text-generation")
    text2text_generator("question: What is 42 ? context: 42 is the answer to life, the universe and everything")
    ```
    """

    _pipeline_calls_generate = True
    _load_processor = False
    _load_image_processor = False
    _load_feature_extractor = False
    _load_tokenizer = True

    _default_generation_config = GenerationConfig(
        max_new_tokens=256,
        num_beams=4,
    )

    # Used as the prefix of the keys in the returned dicts, e.g. "generated_text".
    return_name = "generated"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.check_model_type(
            TF_MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES
            if self.framework == "tf"
            else MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES
        )

    def _sanitize_parameters(
        self,
        return_tensors=None,
        return_text=None,
        return_type=None,
        clean_up_tokenization_spaces=None,
        truncation=None,
        stop_sequence=None,
        **generate_kwargs,
    ):
        preprocess_params = {}
        if truncation is not None:
            preprocess_params["truncation"] = truncation

        forward_params = generate_kwargs

        postprocess_params = {}
        if return_tensors is not None and return_type is None:
            return_type = ReturnType.TENSORS if return_tensors else ReturnType.TEXT
        if return_type is not None:
            postprocess_params["return_type"] = return_type

        if clean_up_tokenization_spaces is not None:
            postprocess_params["clean_up_tokenization_spaces"] = clean_up_tokenization_spaces

        if stop_sequence is not None:
            stop_sequence_ids = self.tokenizer.encode(stop_sequence, add_special_tokens=False)
            if len(stop_sequence_ids) > 1:
                warnings.warn(
                    "Stopping on a multiple token sequence is not yet supported on transformers. The first token of"
                    " the stop sequence will be used as the stop sequence string in the interim."
                )
            generate_kwargs["eos_token_id"] = stop_sequence_ids[0]

        if self.assistant_model is not None:
            forward_params["assistant_model"] = self.assistant_model
        if self.assistant_tokenizer is not None:
            forward_params["tokenizer"] = self.tokenizer
            forward_params["assistant_tokenizer"] = self.assistant_tokenizer

        return preprocess_params, forward_params, postprocess_params

    def check_inputs(self, input_length: int, min_length: int, max_length: int):
        """
        Checks whether there might be something wrong with given input with regard to the model.
        """
        return True

    def _parse_and_tokenize(self, *args, truncation):
        prefix = self.prefix if self.prefix is not None else ""
        if isinstance(args[0], list):
            if self.tokenizer.pad_token_id is None:
                raise ValueError("Please make sure that the tokenizer has a pad_token_id when using a batch input")
            args = ([prefix + arg for arg in args[0]],)
            padding = True
        elif isinstance(args[0], str):
            args = (prefix + args[0],)
            padding = False
        else:
            raise ValueError(
                f"`args[0]`: {args[0]} has the wrong format. It should be of type `str` or type `list`."
            )
        inputs = self.tokenizer(*args, padding=padding, truncation=truncation, return_tensors=self.framework)
        # This is produced by tokenizers but is an invalid kwarg for `generate`.
        if "token_type_ids" in inputs:
            del inputs["token_type_ids"]
        return inputs

    def __call__(
        self, *args: Union[str, list[str]], **kwargs: Any
    ) -> Union[list[dict[str, Any]], list[list[dict[str, Any]]]]:
        r"""
        Generate the output text(s) using text(s) given as inputs.

        Args:
            args (`str` or `list[str]`):
                Input text for the encoder.
            return_tensors (`bool`, *optional*, defaults to `False`):
                Whether or not to include the tensors of predictions (as token indices) in the outputs.
            return_text (`bool`, *optional*, defaults to `True`):
                Whether or not to include the decoded texts in the outputs.
            clean_up_tokenization_spaces (`bool`, *optional*, defaults to `False`):
                Whether or not to clean up the potential extra spaces in the text output.
            truncation (`TruncationStrategy`, *optional*, defaults to `TruncationStrategy.DO_NOT_TRUNCATE`):
                The truncation strategy for the tokenization within the pipeline. `TruncationStrategy.DO_NOT_TRUNCATE`
                (default) will never truncate, but it is sometimes desirable to truncate the input to fit the model's
                max_length instead of throwing an error down the line.
            generate_kwargs:
                Additional keyword arguments to pass along to the generate method of the model (see the generate method
                corresponding to your framework [here](./text_generation)).

        Return:
            A list or a list of list of `dict`: Each result comes as a dictionary with the following keys:

            - **generated_text** (`str`, present when `return_text=True`) -- The generated text.
            - **generated_token_ids** (`torch.Tensor` or `tf.Tensor`, present when `return_tensors=True`) -- The token
              ids of the generated text.
        """
        result = super().__call__(*args, **kwargs)
        if (
            isinstance(args[0], list)
            and all(isinstance(el, str) for el in args[0])
            and all(len(res) == 1 for res in result)
        ):
            return [res[0] for res in result]
        return result

    def preprocess(self, inputs, truncation=TruncationStrategy.DO_NOT_TRUNCATE, **kwargs):
        inputs = self._parse_and_tokenize(inputs, truncation=truncation, **kwargs)
        return inputs

    def _forward(self, model_inputs, **generate_kwargs):
        if self.framework == "pt":
            in_b, input_length = model_inputs["input_ids"].shape
        elif self.framework == "tf":
            in_b, input_length = tf.shape(model_inputs["input_ids"]).numpy()

        self.check_inputs(
            input_length,
            generate_kwargs.get("min_length", self.generation_config.min_length),
            generate_kwargs.get("max_length", self.generation_config.max_length),
        )

        # A `generation_config` passed at call time takes precedence over the pipeline default.
        if "generation_config" not in generate_kwargs:
            generate_kwargs["generation_config"] = self.generation_config

        output_ids = self.model.generate(**model_inputs, **generate_kwargs)
        out_b = output_ids.shape[0]
        # Regroup the flat batch of generations by input: (in_b * n, seq) -> (in_b, n, seq).
        if self.framework == "pt":
            output_ids = output_ids.reshape(in_b, out_b // in_b, *output_ids.shape[1:])
        elif self.framework == "tf":
            output_ids = tf.reshape(output_ids, (in_b, out_b // in_b, *output_ids.shape[1:]))
        return {"output_ids": output_ids}

    def postprocess(self, model_outputs, return_type=ReturnType.TEXT, clean_up_tokenization_spaces=False):
        records = []
        for output_ids in model_outputs["output_ids"][0]:
            if return_type == ReturnType.TENSORS:
                record = {f"{self.return_name}_token_ids": output_ids}
            elif return_type == ReturnType.TEXT:
                record = {
                    f"{self.return_name}_text": self.tokenizer.decode(
                        output_ids,
                        skip_special_tokens=True,
                        clean_up_tokenization_spaces=clean_up_tokenization_spaces,
                    )
                }
            records.append(record)
        return records
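
# A minimal usage sketch of the pipeline above (added for illustration, not part of
# the original module; the checkpoint name is an assumption -- any seq2seq model works):
#
#     from transformers import pipeline
#
#     generator = pipeline("text2text-generation", model="google-t5/t5-small")
#     generator("translate English to German: Hello", max_new_tokens=32)
#     # -> [{'generated_text': '...'}]
#
# Batched calls pass a list of strings; `_parse_and_tokenize` then pads the batch,
# which is why it requires the tokenizer to define a `pad_token_id` for list inputs.
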
@add_end_docstrings(build_pipeline_init_args(has_tokenizer=True))
class SummarizationPipeline(Text2TextGenerationPipeline):
    """
    Summarize news articles and other documents.

    This summarizing pipeline can currently be loaded from [`pipeline`] using the following task identifier:
    `"summarization"`.

    The models that this pipeline can use are models that have been fine-tuned on a summarization task, which is
    currently, '*bart-large-cnn*', '*google-t5/t5-small*', '*google-t5/t5-base*', '*google-t5/t5-large*',
    '*google-t5/t5-3b*', '*google-t5/t5-11b*'. See the up-to-date list of available models on
    [huggingface.co/models](https://huggingface.co/models?filter=summarization). For a list of available parameters,
    see the [following
    documentation](https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.generation.GenerationMixin.generate)

    Unless the model you're using explicitly sets these generation parameters in its configuration files
    (`generation_config.json`), the following default values will be used:
    - max_new_tokens: 256
    - num_beams: 4

    Usage:

    ```python
    # use bart in pytorch
    summarizer = pipeline("summarization")
    summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)

    # use t5 in tf
    summarizer = pipeline("summarization", model="google-t5/t5-base", tokenizer="google-t5/t5-base", framework="tf")
    summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)
    ```
    """

    # Used as the prefix of the keys in the returned dicts, e.g. "summary_text".
    return_name = "summary"

    def __call__(self, *args, **kwargs):
        r"""
        Summarize the text(s) given as inputs.

        Args:
            documents (*str* or `list[str]`):
                One or several articles (or one list of articles) to summarize.
            return_text (`bool`, *optional*, defaults to `True`):
                Whether or not to include the decoded texts in the outputs.
            return_tensors (`bool`, *optional*, defaults to `False`):
                Whether or not to include the tensors of predictions (as token indices) in the outputs.
            clean_up_tokenization_spaces (`bool`, *optional*, defaults to `False`):
                Whether or not to clean up the potential extra spaces in the text output.
            generate_kwargs:
                Additional keyword arguments to pass along to the generate method of the model (see the generate method
                corresponding to your framework [here](./text_generation)).

        Return:
            A list or a list of list of `dict`: Each result comes as a dictionary with the following keys:

            - **summary_text** (`str`, present when `return_text=True`) -- The summary of the corresponding input.
            - **summary_token_ids** (`torch.Tensor` or `tf.Tensor`, present when `return_tensors=True`) -- The token
              ids of the summary.
        """
        return super().__call__(*args, **kwargs)

    def check_inputs(self, input_length: int, min_length: int, max_length: int) -> bool:
        """
        Checks whether there might be something wrong with given input with regard to the model.
        """
        if max_length < min_length:
            logger.warning(f"Your min_length={min_length} must be smaller than your max_length={max_length}.")

        if input_length < max_length:
            logger.warning(
                f"Your max_length is set to {max_length}, but your input_length is only {input_length}. Since this is "
                "a summarization task, where outputs shorter than the input are typically wanted, you might consider "
                f"decreasing max_length manually, e.g. summarizer('...', max_length={input_length // 2})"
            )
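
# Illustrative sketch of the length check above (hypothetical values, added for
# illustration, not part of the original module):
#
#     summarizer = pipeline("summarization")
#     summarizer("A short sentence.", min_length=5, max_length=200)
#
# Because the tokenized input is far shorter than max_length=200, `check_inputs`
# logs a warning suggesting e.g. summarizer('...', max_length=<input_length // 2>).
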
@add_end_docstrings(build_pipeline_init_args(has_tokenizer=True))
class TranslationPipeline(Text2TextGenerationPipeline):
    """
    Translates from one language to another.

    This translation pipeline can currently be loaded from [`pipeline`] using the following task identifier:
    `"translation_xx_to_yy"`.

    The models that this pipeline can use are models that have been fine-tuned on a translation task. See the
    up-to-date list of available models on [huggingface.co/models](https://huggingface.co/models?filter=translation).
    For a list of available parameters, see the [following
    documentation](https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.generation.GenerationMixin.generate)

    Unless the model you're using explicitly sets these generation parameters in its configuration files
    (`generation_config.json`), the following default values will be used:
    - max_new_tokens: 256
    - num_beams: 4

    Usage:

    ```python
    en_fr_translator = pipeline("translation_en_to_fr")
    en_fr_translator("How old are you?")
    ```
    """

    # Used as the prefix of the keys in the returned dicts, e.g. "translation_text".
    return_name = "translation"

    def check_inputs(self, input_length: int, min_length: int, max_length: int):
        if input_length > 0.9 * max_length:
            logger.warning(
                f"Your input_length: {input_length} is bigger than 0.9 * max_length: {max_length}. You might consider "
                "increasing your max_length manually, e.g. translator('...', max_length=400)"
            )
        return True

    def preprocess(self, *args, truncation=TruncationStrategy.DO_NOT_TRUNCATE, src_lang=None, tgt_lang=None):
        if getattr(self.tokenizer, "_build_translation_inputs", None):
            return self.tokenizer._build_translation_inputs(
                *args, return_tensors=self.framework, truncation=truncation, src_lang=src_lang, tgt_lang=tgt_lang
            )
        else:
            return super()._parse_and_tokenize(*args, truncation=truncation)

    def _sanitize_parameters(self, src_lang=None, tgt_lang=None, **kwargs):
        preprocess_params, forward_params, postprocess_params = super()._sanitize_parameters(**kwargs)
        if src_lang is not None:
            preprocess_params["src_lang"] = src_lang
        if tgt_lang is not None:
            preprocess_params["tgt_lang"] = tgt_lang
        if src_lang is None and tgt_lang is None:
            # Backward compatibility: passing `src_lang`/`tgt_lang` directly is preferred.
            task = kwargs.get("task", self.task)
            items = task.split("_")
            if task and len(items) == 4:
                # The task has the form "translation_XX_to_YY".
                preprocess_params["src_lang"] = items[1]
                preprocess_params["tgt_lang"] = items[3]
        return preprocess_params, forward_params, postprocess_params

    def __call__(self, *args, **kwargs):
        r"""
        Translate the text(s) given as inputs.

        Args:
            args (`str` or `list[str]`):
                Texts to be translated.
            return_tensors (`bool`, *optional*, defaults to `False`):
                Whether or not to include the tensors of predictions (as token indices) in the outputs.
            return_text (`bool`, *optional*, defaults to `True`):
                Whether or not to include the decoded texts in the outputs.
            clean_up_tokenization_spaces (`bool`, *optional*, defaults to `False`):
                Whether or not to clean up the potential extra spaces in the text output.
            src_lang (`str`, *optional*):
                The language of the input. Might be required for multilingual models. Will not have any effect for
                single pair translation models.
            tgt_lang (`str`, *optional*):
                The language of the desired output. Might be required for multilingual models. Will not have any
                effect for single pair translation models.
            generate_kwargs:
                Additional keyword arguments to pass along to the generate method of the model (see the generate method
                corresponding to your framework [here](./text_generation)).

        Return:
            A list or a list of list of `dict`: Each result comes as a dictionary with the following keys:

            - **translation_text** (`str`, present when `return_text=True`) -- The translation.
            - **translation_token_ids** (`torch.Tensor` or `tf.Tensor`, present when `return_tensors=True`) -- The
              token ids of the translation.
        """
        return super().__call__(*args, **kwargs)
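
# Illustrative sketch of the task-name parsing in `_sanitize_parameters` above
# (added for illustration, not part of the original module):
#
#     "translation_en_to_fr".split("_")  # -> ["translation", "en", "to", "fr"]
#     # len(items) == 4, so src_lang = items[1] = "en" and tgt_lang = items[3] = "fr".
#     # Explicit src_lang/tgt_lang arguments to the pipeline call take precedence.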