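"""LEPOR score implementation."""

import math
import sys
from typing import Callable, List, Optional, Tuple

import nltk


def length_penalty(reference: List[str], hypothesis: List[str]) -> float:
    """
    This function calculates the length penalty (LP) for the LEPOR metric, which is
    defined to penalize hypotheses that are either longer or shorter than the
    reference translation. Refer to Eq. (2) in https://aclanthology.org/C12-2044

    With r the reference length and c the hypothesis length, LP is 1 when c == r,
    exp(1 - r/c) when c < r, and exp(1 - c/r) when c > r, so it never exceeds 1.

    >>> length_penalty(['a', 'b', 'c'], ['a', 'b', 'c'])
    1.0
    >>> round(length_penalty(['a', 'b'], ['a', 'b', 'c', 'd']), 4)
    0.3679

    :param reference: Tokens of the reference sentence
    :type reference: List[str]
    :param hypothesis: Tokens of the hypothesis sentence
    :type hypothesis: List[str]

    :return: Penalty for the length difference between the reference and hypothesis.
    :rtype: float
    """
    ref_len = len(reference)
    hyp_len = len(hypothesis)

    if ref_len == hyp_len:
        return 1.0
    elif hyp_len < ref_len:
        # Hypothesis shorter than the reference: exp(1 - r/c) < 1.
        return math.exp(1 - ref_len / hyp_len)
    else:
        # Hypothesis longer than the reference: exp(1 - c/r) < 1.
        return math.exp(1 - hyp_len / ref_len)


def alignment(ref_tokens: List[str], hyp_tokens: List[str]) -> List[int]:
    """
    This function computes the context-dependent n-gram word alignment, which takes
    into account the surrounding context (neighbouring words) of each potential
    match to select better matching pairs between the output and the reference.

    This alignment is used to compute the n-gram positional difference penalty
    component of the LEPOR score. The function finds the matching tokens between
    the reference and hypothesis. When a hypothesis token occurs more than once in
    the reference, it checks the left and right unigram window of each candidate
    and prefers candidates whose neighbours also match.

    :param ref_tokens: A list of tokens in reference sentence.
    :type ref_tokens: List[str]
    :param hyp_tokens: A list of tokens in hypothesis sentence.
    :type hyp_tokens: List[str]

    :return: 1-indexed reference positions of the aligned hypothesis tokens, in
        hypothesis order; hypothesis tokens with no match are dropped.
    :rtype: List[int]
    """
    alignments = []

    # Store the reference and hypothesis token lengths.
    hyp_len = len(hyp_tokens)
    ref_len = len(ref_tokens)

    for hyp_index, hyp_token in enumerate(hyp_tokens):
        # No match: mark the token as unaligned.
        if ref_tokens.count(hyp_token) == 0:
            alignments.append(-1)
        # Exactly one match: align to it directly.
        elif ref_tokens.count(hyp_token) == 1:
            alignments.append(ref_tokens.index(hyp_token))
        # Several candidate positions: use the unigram context window.
        else:
            # Positions in the reference where the hypothesis token occurs.
            ref_indexes = [
                i for i, ref_token in enumerate(ref_tokens) if ref_token == hyp_token
            ]

            # is_matched[k] is True if the left or right neighbour of candidate k
            # also matches the corresponding neighbour in the hypothesis.
            is_matched = [False] * len(ref_indexes)
            for ind, ref_index in enumerate(ref_indexes):
                left_match = (
                    ref_index > 0
                    and hyp_index > 0
                    and ref_tokens[ref_index - 1] == hyp_tokens[hyp_index - 1]
                )
                right_match = (
                    ref_index + 1 < ref_len
                    and hyp_index + 1 < hyp_len
                    and ref_tokens[ref_index + 1] == hyp_tokens[hyp_index + 1]
                )
                is_matched[ind] = left_match or right_match

            # Prefer candidates whose context matches; if none does, fall back
            # to every candidate position.
            candidates = [
                ref_index
                for match, ref_index in zip(is_matched, ref_indexes)
                if match
            ] or ref_indexes
            # Pick the candidate nearest to the hypothesis position (the first
            # one wins on ties).
            alignments.append(min(candidates, key=lambda r: abs(hyp_index - r)))

    # Make the alignments 1-indexed and drop unaligned tokens.
    alignments = [a + 1 for a in alignments if a != -1]
    return alignments


def ngram_positional_penalty(
    ref_tokens: List[str], hyp_tokens: List[str]
) -> Tuple[float, int]:
    """
    This function calculates the n-gram position difference penalty (NPosPenal)
    described in the LEPOR paper. NPosPenal = exp(-NPD), where NPD is the sum of the
    absolute differences between the length-normalized positions of the aligned
    words in the hypothesis and the reference, divided by the hypothesis length.

    :param ref_tokens: A list of words in reference sentence.
    :type ref_tokens: List[str]
    :param hyp_tokens: A list of words in hypothesis sentence.
    :type hyp_tokens: List[str]

    :return: A tuple containing two elements:
             - NPosPenal: N-gram positional penalty.
             - match_count: Number of aligned hypothesis tokens.
    :rtype: tuple
    """
    alignments = alignment(ref_tokens, hyp_tokens)
    match_count = len(alignments)

    # Normalized position difference of each aligned word.
    pd = []
    for i, a in enumerate(alignments):
        pd.append(abs((i + 1) / len(hyp_tokens) - a / len(ref_tokens)))

    npd = sum(pd) / len(hyp_tokens)
    return math.exp(-npd), match_count


def harmonic(
    match_count: int,
    reference_length: int,
    hypothesis_length: int,
    alpha: float,
    beta: float,
) -> float:
    """
    Calculate the precision and recall of the matched words and combine them into a
    weighted harmonic mean, (alpha + beta) / (alpha / recall + beta / precision).

    :param match_count: Number of words in hypothesis aligned with reference.
    :type match_count: int
    :param reference_length: Length of the reference sentence
    :type reference_length: int
    :param hypothesis_length: Length of the hypothesis sentence
    :type hypothesis_length: int
    :param alpha: A parameter to set weight for recall.
    :type alpha: float
    :param beta: A parameter to set weight for precision.
    :type beta: float

    :return: Weighted harmonic mean of recall and precision.
    :rtype: float
    """
    # A tiny epsilon avoids division by zero when nothing matches.
    epsilon = sys.float_info.epsilon

    precision = match_count / hypothesis_length
    recall = match_count / reference_length

    harmonic_score = (alpha + beta) / (
        (alpha / (recall + epsilon)) + (beta / (precision + epsilon))
    )
    return harmonic_score


def sentence_lepor(
    references: List[str],
    hypothesis: str,
    alpha: float = 1.0,
    beta: float = 1.0,
    tokenizer: Optional[Callable[[str], List[str]]] = None,
) -> List[float]:
    """
    Calculate the LEPOR score of a hypothesis sentence against each of its
    references, following Han, A. L.-F. (2017). LEPOR: An Augmented Machine
    Translation Evaluation Metric. https://arxiv.org/abs/1703.08748v2

    For each reference the score is LP * NPosPenal * (alpha + beta) /
    (alpha / R + beta / P), where R is recall and P is precision.

    >>> hypothesis = 'a bird is on a stone.'
    >>> reference1 = 'a bird behind the stone.'
    >>> reference2 = 'a bird is on the rock.'
    >>> scores = sentence_lepor([reference1, reference2], hypothesis)
    >>> [round(score, 4) for score in scores]
    [0.5741, 0.774]

    :param references: Reference sentences
    :type references: list(str)
    :param hypothesis: Hypothesis sentence
    :type hypothesis: str
    :param alpha: A parameter to set weight for recall.
    :type alpha: float
    :param beta: A parameter to set weight for precision.
    :type beta: float
    :param tokenizer: A callable tokenizer that accepts a string and returns a list
        of tokens. Defaults to ``nltk.word_tokenize``.
    :type tokenizer: Callable[[str], List[str]]

    :return: The list of LEPOR scores of the hypothesis against each reference.
    :rtype: list(float)
    """
    # Use the given tokenizer, or fall back to NLTK's word tokenizer.
    tokenize = tokenizer or nltk.word_tokenize
    hyp_tokens = tokenize(hypothesis)
    # Build a new list so the caller's ``references`` is not modified in place.
    ref_token_lists = [tokenize(reference) for reference in references]

    lepor_scores = []
    for ref_tokens in ref_token_lists:
        if len(ref_tokens) == 0 or len(hyp_tokens) == 0:
            raise ValueError("One of the sentences is empty.")

        # Length penalty.
        lp = length_penalty(ref_tokens, hyp_tokens)
        # N-gram positional penalty and number of aligned tokens.
        npd, match_count = ngram_positional_penalty(ref_tokens, hyp_tokens)
        # Weighted harmonic mean of recall and precision.
        harmonic_score = harmonic(
            match_count, len(ref_tokens), len(hyp_tokens), alpha, beta
        )

        lepor_scores.append(lp * npd * harmonic_score)

    return lepor_scores


def corpus_lepor(
    references: List[List[str]],
    hypothesis: List[str],
    alpha: float = 1.0,
    beta: float = 1.0,
    tokenizer: Optional[Callable[[str], List[str]]] = None,
) -> List[List[float]]:
    """
    Calculate the LEPOR scores for a list of hypothesis sentences, following
    Han, A. L.-F. (2017). LEPOR: An Augmented Machine Translation Evaluation Metric.
    https://arxiv.org/abs/1703.08748v2

    >>> hypothesis = ['a bird is on a stone.', 'scary crow was not bad.']
    >>> references = [['a bird behind the stone.', 'a bird is on the rock'],
    ...               ['scary cow was good.', 'scary crow was elegant.']]
    >>> scores = corpus_lepor(references, hypothesis)
    >>> [[round(score, 4) for score in sent_scores] for sent_scores in scores]
    [[0.5741, 0.582], [0.3908, 0.5448]]

    :param references: Reference sentences, one list per hypothesis
    :type references: list(list(str))
    :param hypothesis: Hypothesis sentences
    :type hypothesis: list(str)
    :param alpha: A parameter to set weight for recall.
    :type alpha: float
    :param beta: A parameter to set weight for precision.
    :type beta: float
    :param tokenizer: A callable tokenizer that accepts a string and returns a list
        of tokens. Defaults to ``nltk.word_tokenize``.
    :type tokenizer: Callable[[str], List[str]]

    :return: One entry per hypothesis, each being the list of LEPOR scores against
        that hypothesis's references.
    :rtype: list(list(float))
    """
    if len(references) == 0 or len(hypothesis) == 0:
        raise ValueError("There is an empty list.")

    if len(references) != len(hypothesis):
        raise ValueError(
            "The number of hypotheses and reference lists should be the same."
        )

    lepor_scores = []
    for reference_sen, hypothesis_sen in zip(references, hypothesis):
        # Score each hypothesis separately against its own references.
        lepor_scores.append(
            sentence_lepor(reference_sen, hypothesis_sen, alpha, beta, tokenizer)
        )

    return lepor_scores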