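"""
Provides scoring functions for a number of association measures through a
generic, abstract implementation in ``NgramAssocMeasures``, and n-specific
``BigramAssocMeasures``, ``TrigramAssocMeasures`` and
``QuadgramAssocMeasures``.
"""

import math as _math
from abc import ABCMeta, abstractmethod
from functools import reduce

_log2 = lambda x: _math.log2(x)
_ln = _math.log

_product = lambda s: reduce(lambda x, y: x * y, s)

# Small constant added to denominators and log arguments to avoid
# division by zero and log(0).
_SMALL = 1e-20

try:
    from scipy.stats import fisher_exact
except ImportError:

    def fisher_exact(*_args, **_kwargs):
        raise NotImplementedError


# Indices into the marginals arguments:

# Marginals index for the ngram count
NGRAM = 0

# Marginals index for a tuple of each unigram count
UNIGRAMS = -2

# Marginals index for the number of words in the data
TOTAL = -1


class NgramAssocMeasures(metaclass=ABCMeta):
    """
    An abstract class defining a collection of generic association measures.
    Each public method returns a score, taking the following arguments::

        score_fn(count_of_ngram,
                 (count_of_n-1gram_1, ..., count_of_n-1gram_j),
                 (count_of_n-2gram_1, ..., count_of_n-2gram_k),
                 ...,
                 (count_of_1gram_1, ..., count_of_1gram_n),
                 count_of_total_words)

    See ``BigramAssocMeasures`` and ``TrigramAssocMeasures``

    Inheriting classes should define a property _n, and a method _contingency
    which calculates contingency values from marginals in order for all
    association measures defined here to be usable.
    """

    _n = 0

    @staticmethod
    @abstractmethod
    def _contingency(*marginals):
        """Calculates values of a contingency table from marginal values."""
        raise NotImplementedError(
            "The contingency table is not available in the general ngram case"
        )

    @staticmethod
    @abstractmethod
    def _marginals(*contingency):
        """Calculates values of contingency table marginals from its values."""
        raise NotImplementedError(
            "The contingency table is not available in the general ngram case"
        )

    @classmethod
    def _expected_values(cls, cont):
        """Calculates expected values for a contingency table."""
        n_all = sum(cont)
        bits = [1 << i for i in range(cls._n)]

        # For each contingency table cell
        for i in range(len(cont)):
            # Yield the expected value
            yield (
                _product(
                    sum(cont[x] for x in range(2**cls._n) if (x & j) == (i & j))
                    for j in bits
                )
                / (n_all ** (cls._n - 1))
            )

    @staticmethod
    def raw_freq(*marginals):
        """Scores ngrams by their frequency"""
        return marginals[NGRAM] / marginals[TOTAL]

    @classmethod
    def student_t(cls, *marginals):
        """Scores ngrams using Student's t test with independence hypothesis
        for unigrams, as in Manning and Schutze 5.3.1.
        """
        return (
            marginals[NGRAM]
            - _product(marginals[UNIGRAMS]) / (marginals[TOTAL] ** (cls._n - 1))
        ) / (marginals[NGRAM] + _SMALL) ** 0.5

    @classmethod
    def chi_sq(cls, *marginals):
        """Scores ngrams using Pearson's chi-square as in Manning and Schutze
        5.3.3.
        """
        cont = cls._contingency(*marginals)
        exps = cls._expected_values(cont)
        return sum((obs - exp) ** 2 / (exp + _SMALL) for obs, exp in zip(cont, exps))

    @staticmethod
    def mi_like(*marginals, **kwargs):
        """Scores ngrams using a variant of mutual information. The keyword
        argument power sets an exponent (default 3) for the numerator. No
        logarithm of the result is calculated.
        """
        return marginals[NGRAM] ** kwargs.get("power", 3) / _product(
            marginals[UNIGRAMS]
        )

    @classmethod
    def pmi(cls, *marginals):
        """Scores ngrams by pointwise mutual information, as in Manning and
        Schutze 5.4.
        """
        return _log2(marginals[NGRAM] * marginals[TOTAL] ** (cls._n - 1)) - _log2(
            _product(marginals[UNIGRAMS])
        )

    @classmethod
    def likelihood_ratio(cls, *marginals):
        """Scores ngrams using likelihood ratios as in Manning and Schutze 5.3.4."""
        cont = cls._contingency(*marginals)
        return 2 * sum(
            obs * _ln(obs / (exp + _SMALL) + _SMALL)
            for obs, exp in zip(cont, cls._expected_values(cont))
        )

    @classmethod
    def poisson_stirling(cls, *marginals):
        """Scores ngrams using the Poisson-Stirling measure."""
        exp = _product(marginals[UNIGRAMS]) / (marginals[TOTAL] ** (cls._n - 1))
        return marginals[NGRAM] * (_log2(marginals[NGRAM] / exp) - 1)

    @classmethod
    def jaccard(cls, *marginals):
        """Scores ngrams using the Jaccard index."""
        cont = cls._contingency(*marginals)
        return cont[0] / sum(cont[:-1])


class BigramAssocMeasures(NgramAssocMeasures):
    """
    A collection of bigram association measures. Each association measure
    is provided as a function with three arguments::

        bigram_score_fn(n_ii, (n_ix, n_xi), n_xx)

    The arguments constitute the marginals of a contingency table, counting
    the occurrences of particular events in a corpus. The letter i in the
    suffix refers to the appearance of the word in question, while x indicates
    the appearance of any word. Thus, for example:

    - n_ii counts ``(w1, w2)``, i.e. the bigram being scored
    - n_ix counts ``(w1, *)``
    - n_xi counts ``(*, w2)``
    - n_xx counts ``(*, *)``, i.e. any bigram

    This may be shown with respect to a contingency table::

                w1    ~w1
             ------ ------
         w2 | n_ii | n_oi | = n_xi
             ------ ------
        ~w2 | n_io | n_oo |
             ------ ------
             = n_ix        TOTAL = n_xx
    """

    _n = 2

    @staticmethod
    def _contingency(n_ii, n_ix_xi_tuple, n_xx):
        """Calculates values of a bigram contingency table from marginal values."""
        (n_ix, n_xi) = n_ix_xi_tuple
        n_oi = n_xi - n_ii
        n_io = n_ix - n_ii
        return (n_ii, n_oi, n_io, n_xx - n_ii - n_oi - n_io)

    @staticmethod
    def _marginals(n_ii, n_oi, n_io, n_oo):
        """Calculates values of contingency table marginals from its values."""
        return (n_ii, (n_oi + n_ii, n_io + n_ii), n_oo + n_oi + n_io + n_ii)

    @staticmethod
    def _expected_values(cont):
        """Calculates expected values for a contingency table."""
        n_xx = sum(cont)
        # For each contingency table cell
        for i in range(4):
            yield (cont[i] + cont[i ^ 1]) * (cont[i] + cont[i ^ 2]) / n_xx

    @classmethod
    def phi_sq(cls, *marginals):
        """Scores bigrams using phi-square, the square of the Pearson correlation
        coefficient.
        """
        n_ii, n_io, n_oi, n_oo = cls._contingency(*marginals)

        return (n_ii * n_oo - n_io * n_oi) ** 2 / (
            (n_ii + n_io) * (n_ii + n_oi) * (n_io + n_oo) * (n_oi + n_oo)
        )

    @classmethod
    def chi_sq(cls, n_ii, n_ix_xi_tuple, n_xx):
        """Scores bigrams using chi-square, i.e. phi-sq multiplied by the number
        of bigrams, as in Manning and Schutze 5.3.3.
        """
        (n_ix, n_xi) = n_ix_xi_tuple
        return n_xx * cls.phi_sq(n_ii, (n_ix, n_xi), n_xx)

    @classmethod
    def fisher(cls, *marginals):
        """Scores bigrams using Fisher's Exact Test (Pedersen 1996).  Less
        sensitive to small counts than PMI or Chi Sq, but also more expensive
        to compute. Requires scipy.
        """
        n_ii, n_io, n_oi, n_oo = cls._contingency(*marginals)

        (odds, pvalue) = fisher_exact([[n_ii, n_io], [n_oi, n_oo]], alternative="less")
        return pvalue

    @staticmethod
    def dice(n_ii, n_ix_xi_tuple, n_xx):
        """Scores bigrams using Dice's coefficient."""
        (n_ix, n_xi) = n_ix_xi_tuple
        return 2 * n_ii / (n_ix + n_xi)


class TrigramAssocMeasures(NgramAssocMeasures):
    """
    A collection of trigram association measures. Each association measure
    is provided as a function with four arguments::

        trigram_score_fn(n_iii,
                         (n_iix, n_ixi, n_xii),
                         (n_ixx, n_xix, n_xxi),
                         n_xxx)

    The arguments constitute the marginals of a contingency table, counting
    the occurrences of particular events in a corpus. The letter i in the
    suffix refers to the appearance of the word in question, while x indicates
    the appearance of any word. Thus, for example:

    - n_iii counts ``(w1, w2, w3)``, i.e. the trigram being scored
    - n_ixx counts ``(w1, *, *)``
    - n_xxx counts ``(*, *, *)``, i.e. any trigram
    """

    _n = 3

    @staticmethod
    def _contingency(n_iii, n_iix_tuple, n_ixx_tuple, n_xxx):
        """Calculates values of a trigram contingency table (or cube) from
        marginal values.

        >>> TrigramAssocMeasures._contingency(1, (1, 1, 1), (1, 73, 1), 2000)
        (1, 0, 0, 0, 0, 72, 0, 1927)
        """
        (n_iix, n_ixi, n_xii) = n_iix_tuple
        (n_ixx, n_xix, n_xxi) = n_ixx_tuple
        n_oii = n_xii - n_iii
        n_ioi = n_ixi - n_iii
        n_iio = n_iix - n_iii
        n_ooi = n_xxi - n_iii - n_oii - n_ioi
        n_oio = n_xix - n_iii - n_oii - n_iio
        n_ioo = n_ixx - n_iii - n_ioi - n_iio
        n_ooo = n_xxx - n_iii - n_oii - n_ioi - n_iio - n_ooi - n_oio - n_ioo

        return (n_iii, n_oii, n_ioi, n_ooi, n_iio, n_oio, n_ioo, n_ooo)

    @staticmethod
    def _marginals(*contingency):
        """Calculates values of contingency table marginals from its values.

        >>> TrigramAssocMeasures._marginals(1, 0, 0, 0, 0, 72, 0, 1927)
        (1, (1, 1, 1), (1, 73, 1), 2000)
        """
        n_iii, n_oii, n_ioi, n_ooi, n_iio, n_oio, n_ioo, n_ooo = contingency
        return (
            n_iii,
            (n_iii + n_iio, n_iii + n_ioi, n_iii + n_oii),
            (
                n_iii + n_ioi + n_iio + n_ioo,
                n_iii + n_oii + n_iio + n_oio,
                n_iii + n_oii + n_ioi + n_ooi,
            ),
            sum(contingency),
        )


class QuadgramAssocMeasures(NgramAssocMeasures):
    """
    A collection of quadgram association measures. Each association measure
    is provided as a function with five arguments::

        quadgram_score_fn(n_iiii,
                          (n_iiix, n_iixi, n_ixii, n_xiii),
                          (n_iixx, n_ixix, n_ixxi, n_xixi, n_xxii, n_xiix),
                          (n_ixxx, n_xixx, n_xxix, n_xxxi),
                          n_all)

    The arguments constitute the marginals of a contingency table, counting
    the occurrences of particular events in a corpus. The letter i in the
    suffix refers to the appearance of the word in question, while x indicates
    the appearance of any word. Thus, for example:

    - n_iiii counts ``(w1, w2, w3, w4)``, i.e. the quadgram being scored
    - n_ixxi counts ``(w1, *, *, w4)``
    - n_xxxx counts ``(*, *, *, *)``, i.e. any quadgram
    """

    _n = 4

    @staticmethod
    def _contingency(n_iiii, n_iiix_tuple, n_iixx_tuple, n_ixxx_tuple, n_xxxx):
        """Calculates values of a quadgram contingency table from
        marginal values.
        """
        (n_iiix, n_iixi, n_ixii, n_xiii) = n_iiix_tuple
        (n_iixx, n_ixix, n_ixxi, n_xixi, n_xxii, n_xiix) = n_iixx_tuple
        (n_ixxx, n_xixx, n_xxix, n_xxxi) = n_ixxx_tuple
        n_oiii = n_xiii - n_iiii
        n_ioii = n_ixii - n_iiii
        n_iioi = n_iixi - n_iiii
        n_ooii = n_xxii - n_iiii - n_oiii - n_ioii
        n_oioi = n_xixi - n_iiii - n_oiii - n_iioi
        n_iooi = n_ixxi - n_iiii - n_ioii - n_iioi
        n_oooi = n_xxxi - n_iiii - n_oiii - n_ioii - n_iioi - n_ooii - n_iooi - n_oioi
        n_iiio = n_iiix - n_iiii
        n_oiio = n_xiix - n_iiii - n_oiii - n_iiio
        n_ioio = n_ixix - n_iiii - n_ioii - n_iiio
        n_ooio = n_xxix - n_iiii - n_oiii - n_ioii - n_iiio - n_ooii - n_ioio - n_oiio
        n_iioo = n_iixx - n_iiii - n_iioi - n_iiio
        n_oioo = n_xixx - n_iiii - n_oiii - n_iioi - n_iiio - n_oioi - n_oiio - n_iioo
        n_iooo = n_ixxx - n_iiii - n_ioii - n_iioi - n_iiio - n_iooi - n_iioo - n_ioio
        n_oooo = (
            n_xxxx
            - n_iiii
            - n_oiii
            - n_ioii
            - n_iioi
            - n_ooii
            - n_oioi
            - n_iooi
            - n_oooi
            - n_iiio
            - n_oiio
            - n_ioio
            - n_ooio
            - n_iioo
            - n_oioo
            - n_iooo
        )

        return (
            n_iiii,
            n_oiii,
            n_ioii,
            n_ooii,
            n_iioi,
            n_oioi,
            n_iooi,
            n_oooi,
            n_iiio,
            n_oiio,
            n_ioio,
            n_ooio,
            n_iioo,
            n_oioo,
            n_iooo,
            n_oooo,
        )

    @staticmethod
    def _marginals(*contingency):
        """Calculates values of contingency table marginals from its values.

        >>> QuadgramAssocMeasures._marginals(1, 0, 2, 46, 552, 825, 2577, 34967, 1, 0, 2, 48, 7250, 9031, 28585, 356653)
        (1, (2, 553, 3, 1), (7804, 6, 3132, 1378, 49, 2), (38970, 17660, 100, 38970), 440540)
        """
        (
            n_iiii,
            n_oiii,
            n_ioii,
            n_ooii,
            n_iioi,
            n_oioi,
            n_iooi,
            n_oooi,
            n_iiio,
            n_oiio,
            n_ioio,
            n_ooio,
            n_iioo,
            n_oioo,
            n_iooo,
            n_oooo,
        ) = contingency

        n_iiix = n_iiii + n_iiio
        n_iixi = n_iiii + n_iioi
        n_ixii = n_iiii + n_ioii
        n_xiii = n_iiii + n_oiii

        n_iixx = n_iiii + n_iioi + n_iiio + n_iioo
        n_ixix = n_iiii + n_ioii + n_iiio + n_ioio
        n_ixxi = n_iiii + n_ioii + n_iioi + n_iooi
        n_xixi = n_iiii + n_oiii + n_iioi + n_oioi
        n_xxii = n_iiii + n_oiii + n_ioii + n_ooii
        n_xiix = n_iiii + n_oiii + n_iiio + n_oiio

        n_ixxx = n_iiii + n_ioii + n_iioi + n_iiio + n_iooi + n_iioo + n_ioio + n_iooo
        n_xixx = n_iiii + n_oiii + n_iioi + n_iiio + n_oioi + n_oiio + n_iioo + n_oioo
        n_xxix = n_iiii + n_oiii + n_ioii + n_iiio + n_ooii + n_ioio + n_oiio + n_ooio
        n_xxxi = n_iiii + n_oiii + n_ioii + n_iioi + n_ooii + n_iooi + n_oioi + n_oooi

        n_all = sum(contingency)

        return (
            n_iiii,
            (n_iiix, n_iixi, n_ixii, n_xiii),
            (n_iixx, n_ixix, n_ixxi, n_xixi, n_xxii, n_xiix),
            (n_ixxx, n_xixx, n_xxix, n_xxxi),
            n_all,
        )


class ContingencyMeasures:
    """Wraps NgramAssocMeasures classes such that the arguments of association
    measures are contingency table values rather than marginals.
    """

    def __init__(self, measures):
        """Constructs a ContingencyMeasures given a NgramAssocMeasures class"""
        self.__class__.__name__ = "Contingency" + measures.__class__.__name__
        for k in dir(measures):
            # Skip dunder attributes; copy everything else across.
            if k.startswith("__"):
                continue
            v = getattr(measures, k)
            # Only public measures are wrapped to accept contingency values.
            if not k.startswith("_"):
                v = self._make_contingency_fn(measures, v)
            setattr(self, k, v)

    @staticmethod
    def _make_contingency_fn(measures, old_fn):
        """From an association measure function, produces a new function which
        accepts contingency table values as its arguments.
        """

        def res(*contingency):
            return old_fn(*measures._marginals(*contingency))

        res.__doc__ = old_fn.__doc__
        res.__name__ = old_fn.__name__
        return res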