import inspect
import types
import warnings
from collections.abc import Iterable
from typing import TYPE_CHECKING, Optional, Union

import numpy as np

from ..data import SquadExample, SquadFeatures, squad_convert_examples_to_features
from ..modelcard import ModelCard
from ..tokenization_utils import PreTrainedTokenizer
from ..utils import (
    PaddingStrategy,
    add_end_docstrings,
    is_tf_available,
    is_tokenizers_available,
    is_torch_available,
    logging,
)
from .base import ArgumentHandler, ChunkPipeline, build_pipeline_init_args


logger = logging.get_logger(__name__)

if TYPE_CHECKING:
    from ..modeling_tf_utils import TFPreTrainedModel
    from ..modeling_utils import PreTrainedModel

if is_tokenizers_available():
    import tokenizers

if is_tf_available():
    import tensorflow as tf

    from ..models.auto.modeling_tf_auto import TF_MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES

    Dataset = None

if is_torch_available():
    import torch
    from torch.utils.data import Dataset

    from ..models.auto.modeling_auto import MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES


def decode_spans(
    start: np.ndarray, end: np.ndarray, topk: int, max_answer_len: int, undesired_tokens: np.ndarray
) -> tuple:
    """
    Take the output of any `ModelForQuestionAnswering` and will generate probabilities for each span to be the actual
    answer.

    In addition, it filters out some unwanted/impossible cases like answer len being greater than max_answer_len or
    answer end position being before the starting position. The method supports output the k-best answer through the
    topk argument.

    Args:
        start (`np.ndarray`): Individual start probabilities for each token.
        end (`np.ndarray`): Individual end probabilities for each token.
        topk (`int`): Indicates how many possible answer span(s) to extract from the model output.
        max_answer_len (`int`): Maximum size of the answer to extract from the model's output.
        undesired_tokens (`np.ndarray`): Mask determining tokens that can be part of the answer.
    """
    # Ensure we have batch axis
    if start.ndim == 1:
        start = start[None]

    if end.ndim == 1:
        end = end[None]

    # Compute the score of each tuple(start, end) to be the real answer
    outer = np.matmul(np.expand_dims(start, -1), np.expand_dims(end, 1))

    # Remove candidates with end < start and end - start > max_answer_len
    candidates = np.tril(np.triu(outer), max_answer_len - 1)

    # Take the best spans from the flattened score matrix
    scores_flat = candidates.flatten()
    if topk == 1:
        idx_sort = [np.argmax(scores_flat)]
    elif len(scores_flat) < topk:
        idx_sort = np.argsort(-scores_flat)
    else:
        idx = np.argpartition(-scores_flat, topk)[0:topk]
        idx_sort = idx[np.argsort(-scores_flat[idx])]

    starts, ends = np.unravel_index(idx_sort, candidates.shape)[1:]
    desired_spans = np.isin(starts, undesired_tokens.nonzero()) & np.isin(ends, undesired_tokens.nonzero())
    starts = starts[desired_spans]
    ends = ends[desired_spans]
    scores = candidates[0, starts, ends]

    return starts, ends, scores


def select_starts_ends(
    start,
    end,
    p_mask,
    attention_mask,
    min_null_score=1000000,
    top_k=1,
    handle_impossible_answer=False,
    max_answer_len=15,
):
    """
    Takes the raw output of any `ModelForQuestionAnswering` and first normalizes its outputs and then uses
    `decode_spans()` to generate probabilities for each span to be the actual answer.

    Args:
        start (`np.ndarray`): Individual start logits for each token.
        end (`np.ndarray`): Individual end logits for each token.
        p_mask (`np.ndarray`): A mask with 1 for values that cannot be in the answer.
        attention_mask (`np.ndarray`): The attention mask generated by the tokenizer.
        min_null_score (`float`): The minimum null (empty) answer score seen so far.
        top_k (`int`): Indicates how many possible answer span(s) to extract from the model output.
        handle_impossible_answer (`bool`): Whether to allow null (empty) answers.
        max_answer_len (`int`): Maximum size of the answer to extract from the model's output.
    """
    # Ensure padded tokens & question tokens cannot belong to the set of candidate answers
    undesired_tokens = np.abs(np.array(p_mask) - 1)

    if attention_mask is not None:
        undesired_tokens = undesired_tokens & attention_mask

    # Generate mask
    undesired_tokens_mask = undesired_tokens == 0.0

    # Make sure non-context indexes in the tensor cannot contribute to the softmax
    start = np.where(undesired_tokens_mask, -10000.0, start)
    end = np.where(undesired_tokens_mask, -10000.0, end)

    # Normalize logits and spans to retrieve the answer
    start = np.exp(start - start.max(axis=-1, keepdims=True))
    start = start / start.sum()

    end = np.exp(end - end.max(axis=-1, keepdims=True))
    end = end / end.sum()

    if handle_impossible_answer:
        min_null_score = min(min_null_score, (start[0, 0] * end[0, 0]).item())

        # Mask CLS
        start[0, 0] = end[0, 0] = 0.0

    starts, ends, scores = decode_spans(start, end, top_k, max_answer_len, undesired_tokens)
    return starts, ends, scores, min_null_score


class QuestionAnsweringArgumentHandler(ArgumentHandler):
    """
    QuestionAnsweringPipeline requires the user to provide multiple arguments (i.e. question & context) to be mapped
    to internal [`SquadExample`].

    QuestionAnsweringArgumentHandler manages all the possible ways to create a [`SquadExample`] from the command-line
    supplied arguments.
    """

    _load_processor = False
    _load_image_processor = False
    _load_feature_extractor = False
    _load_tokenizer = True

    def normalize(self, item):
        if isinstance(item, SquadExample):
            return item
        elif isinstance(item, dict):
            for k in ["question", "context"]:
                if k not in item:
                    raise KeyError("You need to provide a dictionary with keys {question:..., context:...}")
                elif item[k] is None:
                    raise ValueError(f"`{k}` cannot be None")
                elif isinstance(item[k], str) and len(item[k]) == 0:
                    raise ValueError(f"`{k}` cannot be empty")

            return QuestionAnsweringPipeline.create_sample(**item)
        raise ValueError(f"{item} argument needs to be of type (SquadExample, dict)")

    def __call__(self, *args, **kwargs):
        # Detect where the actual inputs are
        if args is not None and len(args) > 0:
            if len(args) == 1:
                inputs = args[0]
            elif len(args) == 2 and {type(el) for el in args} == {str}:
                inputs = [{"question": args[0], "context": args[1]}]
            else:
                inputs = list(args)
        # Generic compatibility with sklearn and Keras
        # Batched data
        elif "X" in kwargs:
            warnings.warn(
                "Passing the `X` argument to the pipeline is deprecated and will be removed in v5. Inputs should be"
                " passed using the `question` and `context` keyword arguments instead.",
                FutureWarning,
            )
            inputs = kwargs["X"]
        elif "data" in kwargs:
            warnings.warn(
                "Passing the `data` argument to the pipeline is deprecated and will be removed in v5. Inputs should"
                " be passed using the `question` and `context` keyword arguments instead.",
                FutureWarning,
            )
            inputs = kwargs["data"]
        elif "question" in kwargs and "context" in kwargs:
            if isinstance(kwargs["question"], list) and isinstance(kwargs["context"], str):
                inputs = [{"question": Q, "context": kwargs["context"]} for Q in kwargs["question"]]
            elif isinstance(kwargs["question"], list) and isinstance(kwargs["context"], list):
                if len(kwargs["question"]) != len(kwargs["context"]):
                    raise ValueError("Questions and contexts don't have the same lengths")

                inputs = [{"question": Q, "context": C} for Q, C in zip(kwargs["question"], kwargs["context"])]
            elif isinstance(kwargs["question"], str) and isinstance(kwargs["context"], str):
                inputs = [{"question": kwargs["question"], "context": kwargs["context"]}]
            else:
                raise ValueError("Arguments can't be understood")
        else:
            raise ValueError(f"Unknown arguments {kwargs}")

        # When user is sending a generator we need to trust it's a valid example
        generator_types = (types.GeneratorType, Dataset) if Dataset is not None else (types.GeneratorType,)
        if isinstance(inputs, generator_types):
            return inputs

        # Normalize inputs
        if isinstance(inputs, dict):
            inputs = [inputs]
        elif isinstance(inputs, Iterable):
            # Copy to avoid overriding arguments
            inputs = list(inputs)
        else:
            raise ValueError(f"Invalid arguments {kwargs}")

        for i, item in enumerate(inputs):
            inputs[i] = self.normalize(item)

        return inputs


@add_end_docstrings(build_pipeline_init_args(has_tokenizer=True))
class QuestionAnsweringPipeline(ChunkPipeline):
    """
    Question Answering pipeline using any `ModelForQuestionAnswering`. See the [question answering
    examples](../task_summary#question-answering) for more information.

    Example:

    ```python
    >>> from transformers import pipeline

    >>> oracle = pipeline(model="deepset/roberta-base-squad2")
    >>> oracle(question="Where do I live?", context="My name is Wolfgang and I live in Berlin")
    {'score': 0.9191, 'start': 34, 'end': 40, 'answer': 'Berlin'}
    ```

    Learn more about the basics of using a pipeline in the [pipeline tutorial](../pipeline_tutorial)

    This question answering pipeline can currently be loaded from [`pipeline`] using the following task identifier:
    `"question-answering"`.

    The models that this pipeline can use are models that have been fine-tuned on a question answering task. See the
    up-to-date list of available models on
    [huggingface.co/models](https://huggingface.co/models?filter=question-answering).
    """

    default_input_names = "question,context"
    handle_impossible_answer = False

    def __init__(
        self,
        model: Union["PreTrainedModel", "TFPreTrainedModel"],
        tokenizer: PreTrainedTokenizer,
        modelcard: Optional[ModelCard] = None,
        framework: Optional[str] = None,
        task: str = "",
        **kwargs,
    ):
        super().__init__(
            model=model,
            tokenizer=tokenizer,
            modelcard=modelcard,
            framework=framework,
            task=task,
            **kwargs,
        )

        self._args_parser = QuestionAnsweringArgumentHandler()
        self.check_model_type(
            TF_MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES
            if self.framework == "tf"
            else MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES
        )

    @staticmethod
    def create_sample(
        question: Union[str, list[str]], context: Union[str, list[str]]
    ) -> Union[SquadExample, list[SquadExample]]:
        """
        QuestionAnsweringPipeline leverages the [`SquadExample`] internally. This helper method encapsulates all the
        logic for converting question(s) and context(s) to [`SquadExample`].

        We currently support extractive question answering.

        Arguments:
            question (`str` or `list[str]`): The question(s) asked.
            context (`str` or `list[str]`): The context(s) in which we will look for the answer.

        Returns:
            One or a list of [`SquadExample`]: The corresponding [`SquadExample`] grouping question and context.
        """
        if isinstance(question, list):
            return [SquadExample(None, q, c, None, None, None) for q, c in zip(question, context)]
        else:
            return SquadExample(None, question, context, None, None, None)

    def _sanitize_parameters(
        self,
        padding=None,
        topk=None,
        top_k=None,
        doc_stride=None,
        max_answer_len=None,
        max_seq_len=None,
        max_question_len=None,
        handle_impossible_answer=None,
        align_to_words=None,
        **kwargs,
    ):
        # Set default values
        preprocess_params = {}
        if padding is not None:
            preprocess_params["padding"] = padding
        if doc_stride is not None:
            preprocess_params["doc_stride"] = doc_stride
        if max_question_len is not None:
            preprocess_params["max_question_len"] = max_question_len
        if max_seq_len is not None:
            preprocess_params["max_seq_len"] = max_seq_len

        postprocess_params = {}
        if topk is not None and top_k is None:
            warnings.warn("topk parameter is deprecated, use top_k instead", UserWarning)
            top_k = topk
        if top_k is not None:
            if top_k < 1:
                raise ValueError(f"top_k parameter should be >= 1 (got {top_k})")
            postprocess_params["top_k"] = top_k
        if max_answer_len is not None:
            if max_answer_len < 1:
                raise ValueError(f"max_answer_len parameter should be >= 1 (got {max_answer_len})")
            postprocess_params["max_answer_len"] = max_answer_len
        if handle_impossible_answer is not None:
            postprocess_params["handle_impossible_answer"] = handle_impossible_answer
        if align_to_words is not None:
            postprocess_params["align_to_words"] = align_to_words
        return preprocess_params, {}, postprocess_params

    def __call__(self, *args, **kwargs):
        """
        Answer the question(s) given as inputs by using the context(s).

        Args:
            question (`str` or `list[str]`):
                One or several question(s) (must be used in conjunction with the `context` argument).
            context (`str` or `list[str]`):
                One or several context(s) associated with the question(s) (must be used in conjunction with the
                `question` argument).
            top_k (`int`, *optional*, defaults to 1):
                The number of answers to return (will be chosen by order of likelihood). Note that we return less than
                top_k answers if there are not enough options available within the context.
            doc_stride (`int`, *optional*, defaults to 128):
                If the context is too long to fit with the question for the model, it will be split in several chunks
                with some overlap. This argument controls the size of that overlap.
            max_answer_len (`int`, *optional*, defaults to 15):
                The maximum length of predicted answers (e.g., only answers with a shorter length are considered).
            max_seq_len (`int`, *optional*, defaults to 384):
                The maximum length of the total sentence (context + question) in tokens of each chunk passed to the
                model. The context will be split in several chunks (using `doc_stride` as overlap) if needed.
            max_question_len (`int`, *optional*, defaults to 64):
                The maximum length of the question after tokenization. It will be truncated if needed.
            handle_impossible_answer (`bool`, *optional*, defaults to `False`):
                Whether or not we accept impossible as an answer.
            align_to_words (`bool`, *optional*, defaults to `True`):
                Attempts to align the answer to real words. Improves quality on space separated languages. Might hurt
                on non-space-separated languages (like Japanese or Chinese)

        Return:
            A `dict` or a list of `dict`: Each result comes as a dictionary with the following keys:

            - **score** (`float`) -- The probability associated to the answer.
            - **start** (`int`) -- The character start index of the answer (in the tokenized version of the input).
            - **end** (`int`) -- The character end index of the answer (in the tokenized version of the input).
            - **answer** (`str`) -- The answer to the question.
        """
        if args:
            warnings.warn(
                "Passing a list of SQuAD examples to the pipeline is deprecated and will be removed in v5. Inputs"
                " should be passed using the `question` and `context` keyword arguments instead.",
                FutureWarning,
            )

        examples = self._args_parser(*args, **kwargs)
        if isinstance(examples, (list, tuple)) and len(examples) == 1:
            return super().__call__(examples[0], **kwargs)
        return super().__call__(examples, **kwargs)

    def preprocess(self, example, padding="do_not_pad", doc_stride=None, max_question_len=64, max_seq_len=None):
        # XXX: This is special, args_parser will not handle anything generator or dataset like.
        # For those we expect the user to send a simple valid example either directly as a SquadExample or simple
        # dict, so we still need a little sanitation here.
        if isinstance(example, dict):
            example = SquadExample(None, example["question"], example["context"], None, None, None)

        if max_seq_len is None:
            max_seq_len = min(self.tokenizer.model_max_length, 384)
        if doc_stride is None:
            doc_stride = min(max_seq_len // 2, 128)

        if doc_stride > max_seq_len:
            raise ValueError(f"`doc_stride` ({doc_stride}) is larger than `max_seq_len` ({max_seq_len})")

        if not self.tokenizer.is_fast:
            features = squad_convert_examples_to_features(
                examples=[example],
                tokenizer=self.tokenizer,
                max_seq_length=max_seq_len,
                doc_stride=doc_stride,
                max_query_length=max_question_len,
                padding_strategy=PaddingStrategy.MAX_LENGTH,
                is_training=False,
                tqdm_enabled=False,
            )
        else:
            # Define the side we want to truncate / pad and the text/pair sorting
            question_first = self.tokenizer.padding_side == "right"

            encoded_inputs = self.tokenizer(
                text=example.question_text if question_first else example.context_text,
                text_pair=example.context_text if question_first else example.question_text,
                padding=padding,
                truncation="only_second" if question_first else "only_first",
                max_length=max_seq_len,
                stride=doc_stride,
                return_token_type_ids=True,
                return_overflowing_tokens=True,
                return_offsets_mapping=True,
                return_special_tokens_mask=True,
            )
            # When the input is too long, it's converted in a batch of inputs with overflowing tokens and a stride of
            # overlap between the inputs. "num_spans" is the number of output samples generated from the overflowing
            # tokens.
            num_spans = len(encoded_inputs["input_ids"])

            # p_mask: mask with 1 for token that cannot be in the answer (0 for token which can be in an answer).
            # We put 0 on the tokens from the context and 1 everywhere else (question and special tokens).
            p_mask = [
                [tok != 1 if question_first else tok != 0 for tok in encoded_inputs.sequence_ids(span_id)]
                for span_id in range(num_spans)
            ]

            features = []
            for span_idx in range(num_spans):
                input_ids_span_idx = encoded_inputs["input_ids"][span_idx]
                attention_mask_span_idx = (
                    encoded_inputs["attention_mask"][span_idx] if "attention_mask" in encoded_inputs else None
                )
                token_type_ids_span_idx = (
                    encoded_inputs["token_type_ids"][span_idx] if "token_type_ids" in encoded_inputs else None
                )
                # Keep the cls_token unmasked (some models use it to indicate unanswerable questions)
                if self.tokenizer.cls_token_id is not None:
                    cls_indices = np.nonzero(np.array(input_ids_span_idx) == self.tokenizer.cls_token_id)[0]
                    for cls_index in cls_indices:
                        p_mask[span_idx][cls_index] = 0
                submask = p_mask[span_idx]
                features.append(
                    SquadFeatures(
                        input_ids=input_ids_span_idx,
                        attention_mask=attention_mask_span_idx,
                        token_type_ids=token_type_ids_span_idx,
                        p_mask=submask,
                        encoding=encoded_inputs[span_idx],
                        # We don't use the rest of the values - and actually for Fast tokenizer we could totally avoid
                        # using SquadFeatures and SquadExample
                        cls_index=None,
                        token_to_orig_map={},
                        example_index=0,
                        unique_id=0,
                        paragraph_len=0,
                        token_is_max_context=0,
                        tokens=[],
                        start_position=0,
                        end_position=0,
                        is_impossible=False,
                        qas_id=None,
                    )
                )

        for i, feature in enumerate(features):
            fw_args = {}
            others = {}
            model_input_names = self.tokenizer.model_input_names + ["p_mask", "token_type_ids"]

            for k, v in feature.__dict__.items():
                if k in model_input_names:
                    if self.framework == "tf":
                        tensor = tf.constant(v)
                        if tensor.dtype == tf.int64:
                            tensor = tf.cast(tensor, tf.int32)
                        fw_args[k] = tf.expand_dims(tensor, 0)
                    elif self.framework == "pt":
                        tensor = torch.tensor(v)
                        if tensor.dtype == torch.int32:
                            tensor = tensor.long()
                        fw_args[k] = tensor.unsqueeze(0)
                else:
                    others[k] = v

            is_last = i == len(features) - 1
            yield {"example": example, "is_last": is_last, **fw_args, **others}

    def _forward(self, inputs):
        example = inputs["example"]
        model_inputs = {k: inputs[k] for k in self.tokenizer.model_input_names}
        model_forward = self.model.forward if self.framework == "pt" else self.model.call
        if "use_cache" in inspect.signature(model_forward).parameters:
            model_inputs["use_cache"] = False
        output = self.model(**model_inputs)
        if isinstance(output, dict):
            return {"start": output["start_logits"], "end": output["end_logits"], "example": example, **inputs}
        else:
            start, end = output[:2]
            return {"start": start, "end": end, "example": example, **inputs}

    def postprocess(
        self,
        model_outputs,
        top_k=1,
        handle_impossible_answer=False,
        max_answer_len=15,
        align_to_words=True,
    ):
        min_null_score = 1000000  # large and positive
        answers = []
        for output in model_outputs:
            if self.framework == "pt" and output["start"].dtype == torch.bfloat16:
                start_ = output["start"].to(torch.float32)
            else:
                start_ = output["start"]
            if self.framework == "pt" and output["end"].dtype == torch.bfloat16:
                end_ = output["end"].to(torch.float32)
            else:
                end_ = output["end"]
            example = output["example"]
            p_mask = output["p_mask"]
            attention_mask = (
                output["attention_mask"].numpy() if output.get("attention_mask", None) is not None else None
            )

            starts, ends, scores, min_null_score = select_starts_ends(
                start_, end_, p_mask, attention_mask, min_null_score, top_k, handle_impossible_answer, max_answer_len
            )

            if not self.tokenizer.is_fast:
                char_to_word = np.array(example.char_to_word_offset)

                # Convert the answer (tokens) back to the original text
                # Score: score from the model
                # Start: index of the first character of the answer in the context string
                # End: index of the character following the last character of the answer in the context string
                # Answer: plain text of the answer
                for s, e, score in zip(starts, ends, scores):
                    token_to_orig_map = output["token_to_orig_map"]
                    answers.append(
                        {
                            "score": score.item(),
                            "start": np.where(char_to_word == token_to_orig_map[s])[0][0].item(),
                            "end": np.where(char_to_word == token_to_orig_map[e])[0][-1].item(),
                            "answer": " ".join(example.doc_tokens[token_to_orig_map[s] : token_to_orig_map[e] + 1]),
                        }
                    )
            else:
                # Convert the answer (tokens) back to the original text
                question_first = bool(self.tokenizer.padding_side == "right")
                enc = output["encoding"]

                # Encoding was *not* padded, input_ids *might* be. It doesn't make a difference unless we're padding
                # on the left hand side, since now we have different offsets everywhere.
                if self.tokenizer.padding_side == "left":
                    offset = (output["input_ids"] == self.tokenizer.pad_token_id).numpy().sum()
                else:
                    offset = 0

                # Sometimes the max probability token is in the middle of a word so:
                # - we start by finding the right word containing the token with `token_to_word`
                # - then we convert this word in a character span with `word_to_chars`
                sequence_index = 1 if question_first else 0
                for s, e, score in zip(starts, ends, scores):
                    s = s - offset
                    e = e - offset

                    start_index, end_index = self.get_indices(enc, s, e, sequence_index, align_to_words)

                    target_answer = example.context_text[start_index:end_index]
                    answer = self.get_answer(answers, target_answer)
                    if answer:
                        answer["score"] = max(answer["score"], score.item())
                    else:
                        answers.append(
                            {
                                "score": score.item(),
                                "start": start_index,
                                "end": end_index,
                                "answer": target_answer,
                            }
                        )

        if handle_impossible_answer:
            answers.append({"score": min_null_score, "start": 0, "end": 0, "answer": ""})
        answers = sorted(answers, key=lambda x: x["score"], reverse=True)[:top_k]
        if len(answers) == 1:
            return answers[0]
        return answers

    def get_answer(self, answers: list[dict], target: str) -> Optional[dict]:
        for answer in answers:
            if answer["answer"].lower() == target.lower():
                return answer
        return None

    def get_indices(
        self, enc: "tokenizers.Encoding", s: int, e: int, sequence_index: int, align_to_words: bool
    ) -> tuple[int, int]:
        if align_to_words:
            try:
                start_word = enc.token_to_word(s)
                end_word = enc.token_to_word(e)
                start_index = enc.word_to_chars(start_word, sequence_index=sequence_index)[0]
                end_index = enc.word_to_chars(end_word, sequence_index=sequence_index)[1]
            except Exception:
                # Some tokenizers don't really handle words. Keep to offsets then.
                start_index = enc.offsets[s][0]
                end_index = enc.offsets[e][1]
        else:
            start_index = enc.offsets[s][0]
            end_index = enc.offsets[e][1]
        return start_index, end_index

    def span_to_answer(self, text: str, start: int, end: int) -> dict[str, Union[str, int]]:
        """
        When decoding from token probabilities, this method maps token indexes to actual word in the initial context.

        Args:
            text (`str`): The actual context to extract the answer from.
            start (`int`): The answer starting token index.
            end (`int`): The answer end token index.

        Returns:
            Dictionary like `{'answer': str, 'start': int, 'end': int}`
        """
        words = []
        token_idx = char_start_idx = char_end_idx = chars_idx = 0

        for i, word in enumerate(text.split(" ")):
            token = self.tokenizer.tokenize(word)

            # Append words if they are in the span
            if start <= token_idx <= end:
                if token_idx == start:
                    char_start_idx = chars_idx
                if token_idx == end:
                    char_end_idx = chars_idx + len(word)

                words += [word]

            # Stop if we went over the end of the answer
            if token_idx > end:
                break

            # Append the subtokenization length to the running index
            token_idx += len(token)
            chars_idx += len(word) + 1

        # Join text with spaces
        return {
            "answer": " ".join(words),
            "start": max(0, char_start_idx),
            "end": min(len(text), char_end_idx),
        }