JL iP<ddlZddlmZddlmZddlmZddlm Z dZGddZ d Z dd Z ejd Zdd ZdZ ddZdZejdej&ZejdZdZgdd fdZdZedk(reyy)N)accuracy)map_tag) str2tuple)Treecg}g}|D]=}|j|j}|t|z }|t|z }?t||S)a| Score the accuracy of the chunker against the gold standard. Strip the chunk information from the gold standard and rechunk it using the chunker, then compute the accuracy score. :type chunker: ChunkParserI :param chunker: The chunker being evaluated. :type gold: tree :param gold: The chunk structures to score the chunker on. :rtype: float )parseflattentree2conlltags _accuracy)chunkergold gold_tags test_tags gold_tree test_trees U/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/nltk/chunk/util.pyrrsaII/ MM)"3"3"56 ^I.. ^I.. / Y **cfeZdZdZdZdZdZdZdZdZ ddZ d Z d Z d Z d Zd ZdZdZy) ChunkScorea; A utility class for scoring chunk parsers. ``ChunkScore`` can evaluate a chunk parser's output, based on a number of statistics (precision, recall, f-measure, misssed chunks, incorrect chunks). It can also combine the scores from the parsing of multiple texts; this makes it significantly easier to evaluate a chunk parser that operates one sentence at a time. Texts are evaluated with the ``score`` method. The results of evaluation can be accessed via a number of accessor methods, such as ``precision`` and ``f_measure``. A typical use of the ``ChunkScore`` class is:: >>> chunkscore = ChunkScore() # doctest: +SKIP >>> for correct in correct_sentences: # doctest: +SKIP ... guess = chunkparser.parse(correct.leaves()) # doctest: +SKIP ... chunkscore.score(correct, guess) # doctest: +SKIP >>> print('F Measure:', chunkscore.f_measure()) # doctest: +SKIP F Measure: 0.823 :ivar kwargs: Keyword arguments: - max_tp_examples: The maximum number actual examples of true positives to record. This affects the ``correct`` member function: ``correct`` will not return more than this number of true positive examples. This does *not* affect any of the numerical metrics (precision, recall, or f-measure) - max_fp_examples: The maximum number actual examples of false positives to record. This affects the ``incorrect`` member function and the ``guessed`` member function: ``incorrect`` will not return more than this number of examples, and ``guessed`` will not return more than this number of true positive examples. This does *not* affect any of the numerical metrics (precision, recall, or f-measure) - max_fn_examples: The maximum number actual examples of false negatives to record. This affects the ``missed`` member function and the ``correct`` member function: ``missed`` will not return more than this number of examples, and ``correct`` will not return more than this number of true negative examples. This does *not* affect any of the numerical metrics (precision, recall, or f-measure) - chunk_label: A regular expression indicating which chunks should be compared. Defaults to ``'.*'`` (i.e., all chunks). :type _tp: list(Token) :ivar _tp: List of true positives :type _fp: list(Token) :ivar _fp: List of false positives :type _fn: list(Token) :ivar _fn: List of false negatives :type _tp_num: int :ivar _tp_num: Number of true positives :type _fp_num: int :ivar _fp_num: Number of false positives :type _fn_num: int :ivar _fn_num: Number of false negatives. c t|_t|_t|_t|_t|_|j dd|_|j dd|_|j dd|_ |j dd|_ d|_ d|_ d|_ d|_d|_d|_d |_y) Nmax_tp_examplesdmax_fp_examplesmax_fn_examples chunk_labelz.*rgF)set_correct_guessed_tp_fp_fnget_max_tp_max_fp_max_fn _chunk_label_tp_num_fp_num_fn_num_count _tags_correct _tags_total_measuresNeedUpdate)selfkwargss r__init__zChunkScore.__init__rs  555zz"3S9 zz"3S9 zz"3S9 "JJ}d;     #( rc||jr|j|jz|_|j|jz |_|j|jz |_t |j|_t |j |_t |j|_ d|_yy)NF) r-rrrr!r lenr'r(r)r.s r_updateMeasureszChunkScore._updateMeasuress  # #}}t}}4DH}}t}}4DH}}t}}4DHtxx=DLtxx=DLtxx=DL',D $ $rc |xjt||j|jzc_|xjt||j|jzc_|xjdz c_d|_ t |}t |}|xjt|z c_|xjtdt||Dz c_ y#t$rdx}}Y]wxYw)aU Given a correctly chunked sentence, score another chunked version of the same sentence. :type correct: chunk structure :param correct: The known-correct ("gold standard") chunked sentence. :type guessed: chunk structure :param guessed: The chunked sentence to be scored. Tc32K|]\}}||k(s dyw)r6Nr7).0tgs r z#ChunkScore.score..s" 1aqAvA" s N) r _chunksetsr*r&rr-r ValueErrorr,r2r+sumzip)r.correctguessed correct_tags guessed_tagss rscorezChunkScore.scores GT[[$:K:KLL  GT[[$:K:KLL  q #'  -)'2L)'2L C -- c"  l;"     -+- ,L<  -sC** C:9C:cT|jdk(ry|j|jz S)z Return the overall tag-based accuracy for all text that have been scored by this ``ChunkScore``, using the IOB (conll2000) tag encoding. :rtype: float rr6)r,r+r3s rrzChunkScore.accuracys,   q !!D$4$444rc~|j|j|jz}|dk(ry|j|z S)z Return the overall precision for all texts that have been scored by this ``ChunkScore``. :rtype: float r)r4r'r(r.divs r precisionzChunkScore.precision; llT\\) !8<<#% %rc~|j|j|jz}|dk(ry|j|z S)z Return the overall recall for all texts that have been scored by this ``ChunkScore``. :rtype: float rr4r'r)rHs rrecallzChunkScore.recallrKrc|j|j}|j}|dk(s|dk(ryd||z d|z |z zz S)a Return the overall F measure for all texts that have been scored by this ``ChunkScore``. :param alpha: the relative weighting of precision and recall. Larger alpha biases the score towards the precision value, while smaller alpha biases the score towards the recall value. ``alpha`` should have a value in the range [0,1]. :type alpha: float :rtype: float rr6)r4rJrN)r.alphaprs r f_measurezChunkScore.f_measuresS  NN  KKM 6Q!VEAIUa/00rc||jt|j}|Dcgc]}|d c}Scc}w)z Return the chunks which were included in the correct chunk structures, but not in the guessed chunk structures, listed in input order. :rtype: list of chunks r6)r4listr!r.chunkscs rmissedzChunkScore.misseds5 dhh$%!%%% 9c||jt|j}|Dcgc]}|d c}Scc}w)z Return the chunks which were included in the guessed chunk structures, but not in the correct chunk structures, listed in input order. :rtype: list of chunks r6)r4rUr rVs r incorrectzChunkScore.incorrects5 dhh$%!%%%rZc\t|j}|Dcgc]}|d c}Scc}w)z Return the chunks which were included in the correct chunk structures, listed in input order. :rtype: list of chunks r6)rUrrVs rrAzChunkScore.correct*dmm$$%!%%% )c\t|j}|Dcgc]}|d c}Scc}w)z Return the chunks which were included in the guessed chunk structures, listed in input order. :rtype: list of chunks r6)rUrrVs rrBzChunkScore.guessedr^r_cT|j|j|jzS)NrMr3s r__len__zChunkScore.__len__s! ||dll**rc6dtt|zdzS)z` Return a concise representation of this ``ChunkScoring``. :rtype: str z)reprr2r3s r__repr__zChunkScore.__repr__s #T#d)_4zAArcdd|jdzddzd|jdzddzd|jdzddzd|jdzdd zS) a- Return a verbose representation of this ``ChunkScoring``. This representation includes the precision, recall, and f-measure scores. For other information about the score, use the accessor methods (e.g., ``missed()`` and ``incorrect()``). :rtype: str zChunkParse score: z IOB Accuracy: rz5.1fz% z Precision: z Recall: z F-Measure: %)rrJrNrSr3s r__str__zChunkScore.__str__s ""4==?S#8">cB C"4>>#3c#9$"?sC D#4;;=3#6t">#3c#9$"?qA  B rN)g?)__name__ __module__ __qualname____doc__r0r4rErrJrNrSrYr\rArBrbrerhr7rrrr3sO<|)&- : 5 & &1& & &&&+B rrc d}g}|D]{}t|trdtj||j r#|j ||f|j f|t|jz }w|dz }}t|S)Nrr6) isinstancerrematchlabelappendfreezer2leavesr)r:countrposrWchilds rr=r=2s~ C F eT "xx U[[]3 s|U\\^<= 3u||~& &C 1HC  v;rSctjd}t|gg}|j|D] }|j } | ddk(r]t |dk7rt d|jdt|g} |dj| |j| y| ddk(rstartrrpoprr) sr root_labelsep source_tagset target_tagsetWORD_OR_BRACKETstackrptextchunkwordtags r tagstr2treer?s](jj!45O *b ! "E ))!,.{{} 7c>5zQ #8q8I!JKKb)E "I  U # LL  !W^5zQ #8q8I!JKK IIK{b   &%dC0 c ]!-DCb   $-'.* 5zQ.s1vaj9:: 8Orz(\S+)\s+(\S+)\s+([IOB])-?(\S+)?c0t|gg}t|jdD]\}}|jstj |}|t d|d|j\}}} } || |vrd} | dk(xr| |djk7} | dvs| rt|dk(r|j| d k(s| r1t| g} |dj| |j| |dj||f|d S) a* Return a chunk structure for a single sentence encoded in the given CONLL 2000 style string. This function converts a CoNLL IOB string into a tree. It uses the specified chunk types (defaults to NP, PP and VP), and creates a tree rooted at a node labeled S (by default). :param s: The CoNLL string to be converted. :type s: str :param chunk_types: The chunk types to be converted. :type chunk_types: tuple :param root_label: The node label to use for the root. :type root_label: str :rtype: Tree  zError on line r{OIr|BOr~Br) r enumeratesplitstrip_LINE_RErpr>groupsrqr2rrr) r chunk_typesrrlinenolinerprrstate chunk_type mismatch_Irs r conllstr2treerus%$*b ! "E!!''$-0& zz| t$ =~fQZ89 9).&sE:  "z'DEc\EjE"IOO4E&E D=J5zQ  C<:R(E "I  U # LL  b $%9&< 8Orcg}|D]V} |j}d}|D]<}t|tr td|j |d|d||zfd}>X|S#t $r|j |d|ddfYwxYw)z Return a list of 3-tuples containing ``(word, tag, IOB-tag)``. Convert a tree to the CoNLL IOB tag format. :param t: The tree to be converted. :type t: Tree :rtype: list(tuple) B-z7Tree is too deeply nested to be printed in CoNLL formatrr6I-r)rqrnrr>rrAttributeError)r:tagsrwcategoryprefixcontentss rr r s D 3 3{{}HF! h-$Q Xa[(1+v7HIJ   3 K 3 KKq58S1 2 3sAA  #BBcRt|g}|D]\}}}|!|r td|j||f+|jdr"|jt|dd||fg^|jdrt |dk(s,t |dtr|dj |ddk7r/|r td|jt|dd||fg|dj||f|dk(r|j||f td ||S) z1 Convert the CoNLL IOB format to a tree. NzBad conll tag sequencerr~rrr|rzBad conll tag )rr>rr startswithr2rnrq)sentencerrstricttreerpostagchunktags rconlltags2treers0  B D"*<fh   !9:: T6N+   & KKXab\T6N+;< =   &D Q!$r(D18>>#x|3$%=>>KKXab\T6N3C DERv/ _ KKv '~h\:; ;3<4 Krc|t|Dcgc]}dj|}}dj|Scc}w)z Return a multiline string where each line contains a word, tag and IOB tag. Convert a tree to the CoNLL IOB string format :param t: The tree to be converted. :type t: Tree :rtype: str  r)r join)r:tokenliness r tree2conllstrrs8+9*; <SXXe_ \s*(\s*(?P.+?)\s*\s*)?(\s*(?P.+?)\s*\s*)?(\s*(?P.+?)\s*\s*)?\s*(\s*(?P.+?)\s*\s*)?(?P.*?)\s*\s*\s*z#]*?type="(?P\w+)"czt|gg}|gStjd|D]}|j} |j drdt j |}| td|t|jdg}|dj||j|n6|j dr|jn|dj|t|d k7r td |d S#ttf$r$}td|jdd |d}~wwxYw) Nz<[^>]+>|[^\s<]+zrr2)rrrpiece_mpiecemres r_ieer_read_textrs1 *b ! "E y ;;115  &!''.9&%(QWWV_b1b   ' U#!!%( b   '!* 5zQ*++ 8O J' 6w}}q6IK  sB+DD:D55D:) LOCATION ORGANIZATIONPERSONDURATIONDATECARDINALPERCENTMONEYMEASUREc tj|}|rgt|jd||jd|jd|jdt|jd|dSt||S)ap Return a chunk structure containing the chunked tagged text that is encoded in the given IEER style string. Convert a string of chunked tagged text in the IEER named entity format into a chunk structure. Chunks are of several types, LOCATION, ORGANIZATION, PERSON, DURATION, DATE, CARDINAL, PERCENT, MONEY, and MEASURE. :rtype: Tree rdocnodoctype date_timeheadline)rrrrr) _IEER_DOC_RErprr)rrrrs r ieerstr2treer's{8 1A#AGGFOZ@WWW%wwy)-( (;ZH  q*--rc.d}ddl}|jj|d}|jt d}t |d}|jt dt |jj |t y) Nzd[ Pierre/NNP Vinken/NNP ] ,/, [ 61/CD years/NNS ] old/JJ ,/, will/MD join/VB [ the/DT board/NN ] ./.rNP)rav These DT B-NP research NN I-NP protocols NNS I-NP offer VBP B-VP to TO B-PP the DT B-NP patient NN I-NP not RB O only RB O the DT B-NP very RB I-NP best JJS I-NP therapy NN I-NP which WDT B-NP we PRP B-NP have VBP B-VP established VBN I-VP today NN B-NP but CC B-NP also RB I-NP the DT B-NP hope NN I-NP of IN B-PP something NN B-NP still RB B-ADJP better JJR I-ADJP . . O )rPP)rz CoNLL output:)nltkrrpprintrrr)rrr: conll_trees rdemorRsxnA qd3AHHJ G A<ql;J / $** " ": ./ Gr__main__)rrx/NN)rrVPrx)rrxF)ro nltk.metricsrr nltk.tag.mappingr nltk.tag.utilr nltk.treerrr=rrrrr rrDOTALLrrrrrrir7rrrs .$#+