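from typing import Any, Union, overload

import numpy as np

from ..utils import add_end_docstrings, is_tf_available, is_torch_available, logging
from .base import GenericTensor, Pipeline, PipelineException, build_pipeline_init_args


if is_tf_available():
    import tensorflow as tf

    from ..tf_utils import stable_softmax


if is_torch_available():
    import torch


logger = logging.get_logger(__name__)


@add_end_docstrings(
    build_pipeline_init_args(has_tokenizer=True),
    r"""
        top_k (`int`, *optional*, defaults to 5):
            The number of predictions to return.
        targets (`str` or `list[str]`, *optional*):
            When passed, the model will limit the scores to the passed targets instead of looking up in the whole
            vocab. If the provided targets are not in the model vocab, they will be tokenized and the first resulting
            token will be used (with a warning, and that might be slower).
        tokenizer_kwargs (`dict`, *optional*):
            Additional dictionary of keyword arguments passed along to the tokenizer.""",
)
class FillMaskPipeline(Pipeline):
    _load_processor = False
    _load_image_processor = False
    _load_feature_extractor = False
    _load_tokenizer = True

    def get_masked_index(self, input_ids: GenericTensor) -> np.ndarray:
        # Positions of every mask token in `input_ids`, one row per occurrence.
        if self.framework == "tf":
            masked_index = tf.where(input_ids == self.tokenizer.mask_token_id).numpy()
        elif self.framework == "pt":
            masked_index = torch.nonzero(input_ids == self.tokenizer.mask_token_id, as_tuple=False)
        else:
            raise ValueError("Unsupported framework")
        return masked_index

    def _ensure_exactly_one_mask_token(self, input_ids: GenericTensor) -> None:
        # Despite its name, this only rejects inputs that contain no mask token;
        # inputs with several mask tokens are accepted and handled in `postprocess`.
        masked_index = self.get_masked_index(input_ids)
        numel = np.prod(masked_index.shape)
        if numel < 1:
            raise PipelineException(
                "fill-mask",
                self.model.base_model_prefix,
                f"No mask_token ({self.tokenizer.mask_token}) found on the input",
            )

    def ensure_exactly_one_mask_token(self, model_inputs: GenericTensor):
        if isinstance(model_inputs, list):
            for model_input in model_inputs:
                self._ensure_exactly_one_mask_token(model_input["input_ids"][0])
        else:
            for input_ids in model_inputs["input_ids"]:
                self._ensure_exactly_one_mask_token(input_ids)

    def preprocess(
        self, inputs, return_tensors=None, tokenizer_kwargs=None, **preprocess_parameters
    ) -> dict[str, GenericTensor]:
        if return_tensors is None:
            return_tensors = self.framework
        if tokenizer_kwargs is None:
            tokenizer_kwargs = {}

        model_inputs = self.tokenizer(inputs, return_tensors=return_tensors, **tokenizer_kwargs)
        self.ensure_exactly_one_mask_token(model_inputs)
        return model_inputs

    def _forward(self, model_inputs):
        model_outputs = self.model(**model_inputs)
        # Keep the input ids so `postprocess` can locate the mask and rebuild the sequence.
        model_outputs["input_ids"] = model_inputs["input_ids"]
        return model_outputs

    def postprocess(self, model_outputs, top_k=5, target_ids=None):
        # Never request more predictions than there are targets.
        if target_ids is not None and target_ids.shape[0] < top_k:
            top_k = target_ids.shape[0]
        input_ids = model_outputs["input_ids"][0]
        outputs = model_outputs["logits"]

        if self.framework == "tf":
            masked_index = tf.where(input_ids == self.tokenizer.mask_token_id).numpy()[:, 0]

            outputs = outputs.numpy()

            logits = outputs[0, masked_index, :]
            probs = stable_softmax(logits, axis=-1)
            if target_ids is not None:
                probs = tf.gather_nd(tf.squeeze(probs, 0), target_ids.reshape(-1, 1))
                probs = tf.expand_dims(probs, 0)

            topk = tf.math.top_k(probs, k=top_k)
            values, predictions = topk.values.numpy(), topk.indices.numpy()
        else:
            masked_index = torch.nonzero(input_ids == self.tokenizer.mask_token_id, as_tuple=False).squeeze(-1)

            logits = outputs[0, masked_index, :]
            probs = logits.softmax(dim=-1)
            if target_ids is not None:
                probs = probs[..., target_ids]

            values, predictions = probs.topk(top_k)

        result = []
        single_mask = values.shape[0] == 1
        for i, (_values, _predictions) in enumerate(zip(values.tolist(), predictions.tolist())):
            row = []
            for v, p in zip(_values, _predictions):
                # Copy: the token array is modified in place below.
                tokens = input_ids.numpy().copy()
                if target_ids is not None:
                    # `p` indexes into `target_ids`; map it back to a vocabulary id.
                    p = target_ids[p].tolist()

                tokens[masked_index[i]] = p
                # Drop padding before decoding.
                tokens = tokens[np.where(tokens != self.tokenizer.pad_token_id)]
                # Special tokens are skipped only for single-mask inputs; with several masks,
                # the remaining mask tokens must stay visible in the decoded sequence.
                sequence = self.tokenizer.decode(tokens, skip_special_tokens=single_mask)
                proposition = {"score": v, "token": p, "token_str": self.tokenizer.decode([p]), "sequence": sequence}
                row.append(proposition)
            result.append(row)
        if single_mask:
            return result[0]
        return result

    def get_target_ids(self, targets):
        if isinstance(targets, str):
            targets = [targets]
        try:
            vocab = self.tokenizer.get_vocab()
        except Exception:
            vocab = {}
        target_ids = []
        for target in targets:
            id_ = vocab.get(target)
            if id_ is None:
                # Not a single vocabulary entry: fall back to the first sub-token.
                input_ids = self.tokenizer(
                    target,
                    add_special_tokens=False,
                    return_attention_mask=False,
                    return_token_type_ids=False,
                    max_length=1,
                    truncation=True,
                )["input_ids"]
                if len(input_ids) == 0:
                    logger.warning(
                        f"The specified target token `{target}` does not exist in the model vocabulary. "
                        "We cannot replace it with anything meaningful, ignoring it"
                    )
                    continue
                id_ = input_ids[0]
                # This fallback is slow; the warning lets users pass in-vocabulary targets instead.
                logger.warning(
                    f"The specified target token `{target}` does not exist in the model vocabulary. "
                    f"Replacing with `{self.tokenizer.convert_ids_to_tokens(id_)}`."
                )
            target_ids.append(id_)
        # Deduplicate the resolved ids.
        target_ids = list(set(target_ids))
        if len(target_ids) == 0:
            raise ValueError("At least one target must be provided when passed.")
        target_ids = np.array(target_ids)
        return target_ids

    def _sanitize_parameters(self, top_k=None, targets=None, tokenizer_kwargs=None):
        preprocess_params = {}
        if tokenizer_kwargs is not None:
            preprocess_params["tokenizer_kwargs"] = tokenizer_kwargs

        postprocess_params = {}
        if targets is not None:
            target_ids = self.get_target_ids(targets)
            postprocess_params["target_ids"] = target_ids
        if top_k is not None:
            postprocess_params["top_k"] = top_k

        if self.tokenizer.mask_token_id is None:
            raise PipelineException(
                "fill-mask", self.model.base_model_prefix, "The tokenizer does not define a `mask_token`."
            )
        # Keyword arguments for (preprocess, _forward, postprocess).
        return preprocess_params, {}, postprocess_params

    @overload
    def __call__(self, inputs: str, **kwargs: Any) -> list[dict[str, Any]]: ...

    @overload
    def __call__(self, inputs: list[str], **kwargs: Any) -> list[list[dict[str, Any]]]: ...

    def __call__(
        self, inputs: Union[str, list[str]], **kwargs: Any
    ) -> Union[list[dict[str, Any]], list[list[dict[str, Any]]]]:
        """
        Fill the masked token in the text(s) given as inputs.

        Args:
            inputs (`str` or `list[str]`):
                One or several texts (or one list of prompts) with masked tokens.
            targets (`str` or `list[str]`, *optional*):
                When passed, the model will limit the scores to the passed targets instead of looking up in the whole
                vocab. If the provided targets are not in the model vocab, they will be tokenized and the first
                resulting token will be used (with a warning, and that might be slower).
            top_k (`int`, *optional*):
                When passed, overrides the number of predictions to return.

        Return:
            A list or a list of list of `dict`: Each result comes as list of dictionaries with the following keys:

            - **sequence** (`str`) -- The corresponding input with the mask token prediction.
            - **score** (`float`) -- The corresponding probability.
            - **token** (`int`) -- The predicted token id (to replace the masked one).
            - **token_str** (`str`) -- The predicted token (to replace the masked one).
        """
        outputs = super().__call__(inputs, **kwargs)
        if isinstance(inputs, list) and len(inputs) == 1:
            return outputs[0]
        return outputs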
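

# Illustrative sketch, not part of the transformers API: a NumPy-only version of the
# scoring step in `FillMaskPipeline.postprocess` for one masked position.
# Assumptions: `_sketch_fill_mask_scores` is a hypothetical helper name, `logits` is a
# (seq_len, vocab_size) array for a single input, and `target_ids` (optional) holds
# vocabulary ids like the array returned by `get_target_ids`.
import numpy as np


def _sketch_fill_mask_scores(logits, mask_index, top_k=5, target_ids=None):
    """Return up to `top_k` (score, token_id) pairs for the token at `mask_index`."""
    row = np.asarray(logits, dtype=np.float64)[mask_index]
    # Numerically stable softmax over the full vocabulary; as in the pipeline, the
    # probabilities are not renormalized after restricting to targets.
    exp = np.exp(row - row.max())
    probs = exp / exp.sum()
    if target_ids is not None:
        probs = probs[target_ids]
        # Mirrors the pipeline: never ask for more results than there are targets.
        top_k = min(top_k, len(target_ids))
    # Indices of the `top_k` highest probabilities, best first.
    order = np.argsort(-probs, kind="stable")[:top_k]
    results = []
    for idx in order:
        # With targets, `idx` indexes into `target_ids`, so map it back to a
        # vocabulary id (the `p = target_ids[p]` step in `postprocess`).
        token_id = int(target_ids[idx]) if target_ids is not None else int(idx)
        results.append((float(probs[idx]), token_id))
    return results


# Usage (hypothetical values): a 2-token input whose second position is the mask.
# _sketch_fill_mask_scores([[0.0, 0.0, 0.0, 0.0], [1.0, 3.0, 2.0, 0.0]], 1, top_k=2)
# yields the ids 1 and 2, in that order.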