JL idZ ddlZddlZddlZddlmZddlmZddl m Z m Z m Z ddl mZmZmZddlmZmZmZddlmZdd lmZdd lmZd ZGd d eZeZGddZGddeZGddeZ Gdde Z!Gdde Z"GddeZ# d(dZ$dZ%dZ& d(dZ'dZ(dZ) d)d Z*Gd!d"eZ+d#Z,d*d$Z-d%Z.d&Z/e0d'k(re/yy#e$rYwxYw)+a A classifier model based on maximum entropy modeling framework. This framework considers all of the probability distributions that are empirically consistent with the training data; and chooses the distribution with the highest entropy. A probability distribution is "empirically consistent" with a set of training data if its estimated frequency with which a class and a feature vector value co-occur is equal to the actual frequency in the data. Terminology: 'feature' ====================== The term *feature* is usually used to refer to some property of an unlabeled token. For example, when performing word sense disambiguation, we might define a ``'prevword'`` feature whose value is the word preceding the target word. However, in the context of maxent modeling, the term *feature* is typically used to refer to a property of a "labeled" token. In order to prevent confusion, we will introduce two distinct terms to disambiguate these two different concepts: - An "input-feature" is a property of an unlabeled token. - A "joint-feature" is a property of a labeled token. In the rest of the ``nltk.classify`` module, the term "features" is used to refer to what we will call "input-features" in this module. In literature that describes and discusses maximum entropy models, input-features are typically called "contexts", and joint-features are simply referred to as "features". Converting Input-Features to Joint-Features ------------------------------------------- In maximum entropy models, joint-features are required to have numeric values. Typically, each input-feature ``input_feat`` is mapped to a set of joint-features of the form: | joint_feat(token, label) = { 1 if input_feat(token) == feat_val | { and label == some_label | { | { 0 otherwise For all values of ``feat_val`` and ``some_label``. This mapping is performed by classes that implement the ``MaxentFeatureEncodingI`` interface. N) defaultdict) ClassifierI) call_megamparse_megam_weightswrite_megam_file) call_tadmparse_tadm_weightswrite_tadm_file) CutoffCheckeraccuracylog_likelihood)gzip_open_unicode)DictionaryProbDist) OrderedDictz epytext encxeZdZdZddZdZdZdZdZdZ ddZ dd Z dd Z d Z gd Ze ddZy )MaxentClassifiera A maximum entropy classifier (also known as a "conditional exponential classifier"). This classifier is parameterized by a set of "weights", which are used to combine the joint-features that are generated from a featureset by an "encoding". In particular, the encoding maps each ``(featureset, label)`` pair to a vector. The probability of each label is then computed using the following equation:: dotprod(weights, encode(fs,label)) prob(fs|label) = --------------------------------------------------- sum(dotprod(weights, encode(fs,l)) for l in labels) Where ``dotprod`` is the dot product:: dotprod(a,b) = sum(x*y for (x,y) in zip(a,b)) cj||_||_||_|jt |k(sJy)a{ Construct a new maxent classifier model. Typically, new classifier models are created using the ``train()`` method. :type encoding: MaxentFeatureEncodingI :param encoding: An encoding that is used to convert the featuresets that are given to the ``classify`` method into joint-feature vectors, which are used by the maxent classifier model. :type weights: list of float :param weights: The feature weight vector for this classifier. :type logarithmic: bool :param logarithmic: If false, then use non-logarithmic weights. N) _encoding_weights _logarithmiclengthlen)selfencodingweights logarithmics Z/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/nltk/classify/maxent.py__init__zMaxentClassifier.__init__as3"" ' CL000c6|jjSN)rlabelsrs rr"zMaxentClassifier.labelsxs~~$$&&rcb||_|jjt|k(sJy)z Set the feature weight vector for this classifier. :param new_weights: The new feature weight vector. :type new_weights: list of float N)rrrr)r new_weightss r set_weightszMaxentClassifier.set_weights{s+ $ ~~$$&#k*::::rc|jS)zg :return: The feature weight vector for this classifier. :rtype: list of float )rr#s rrzMaxentClassifier.weightss }}rc@|j|jSr!) prob_classifymax)r featuresets rclassifyzMaxentClassifier.classifys!!*-1133rc^i}|jjD]w}|jj||}|jr'd}|D]\}}||j||zz }|||<Rd}|D]\}}||j||zz}|||<yt ||jdS)Ng?T)log normalize)rr"encoderrr) rr+ prob_dictlabelfeature_vectortotalf_idf_valprods rr)zMaxentClassifier.prob_classifys ^^**, (E!^^22:uEN  #19KD%T]]40588E9#( % #19KD%DMM$/588D9#' %  ( ")1B1BdSSrc t d}dt|dz zdz}j| t j jd}|d|}t dj |d jd |Dzt d d |dz d t|zzzzttt|D]\}}jj||}|jfdd|D]\} } jrj | | z} nj | | z} jj#| } | j%dd} | d| zz } t| dkDr| dddz} t || |d zdz| fz|xx| z cc<t d d |dz d t|zzzzt dj |d jfd|Dzt dj |d j fd|Dzy)z Print a table showing the effect of each of the features in the given feature set, and how they combine to determine the probabilities of each label for that featureset. 2z %-zs%s%8.3fTkeyreverseNz Featurec32K|]}dd|zddzyw)z%8s%sN).0ls r z+MaxentClassifier.explain..s?1eq"1~.?sz -c:tj|dS)Nrabsr)fid__rs rz*MaxentClassifier.explain..s#dmmE!H&=">r and label is rz (%s)/,z... z TOTAL:c3.K|] }d|zywz%8.3fNrC)rDrEsumss rrFz+MaxentClassifier.explain..s3V!Gd1g4E3Vsz PROBS:c3FK|]}dj|zywrT)prob)rDrEpdists rrFz+MaxentClassifier.explain..s>!g 1 ->s!)strr)sortedsamplesrWprintljustjoinrrint enumeraterr1sortrrdescribesplit)rr+columns descr_widthTEMPLATEr"ir3r4r6r7scoredescrrXrUs` @@rexplainzMaxentClassifier.explains6  C a00:="":. UZZF!   k *gg??? @  dSK!Oa#f+o=>>?3!&) %HAu!^^22:uEN   >   . % e$$ MM$/%7E MM$/58E//5 $45a85(u:?!#2J.Eh%Qe!<<=U u$  %  %" dSK!Oa#f+o=>>?   [ )BGG3Vv3V,V V     [ )gg>v>> ? rc tdrjd|Sttt t j fdd_jd|S)zW Generates the ranked list of informative features from most to least. _most_informative_featuresNc4tj|Sr!rJ)fidrs rrMz.sDMM#$6 7rTr<)hasattrrlrZlistrangerr)rns` rmost_informative_featuresz*MaxentClassifier.most_informative_featuressa 45 6222A6 6.4U3t}}-./7/D + 222A6 6rcZ|jd}|dk(r#|Dcgc]}|j|dkDs|}}n'|dk(r"|Dcgc]}|j|dks|}}|d|D]9}t|j|dd|jj |;ycc}wcc}w)z :param show: all, neg, or pos (for negative-only or positive-only) :type show: str :param n: The no. of top features :type n: int Nposrnegz8.3frQ)rsrr\rrb)rrrshowfidsrns rshow_most_informative_featuresz/MaxentClassifier.show_most_informative_featuress--d3 5=#'BC4==+=+ACBDB U]#'BC4==+=+ACBDB8 OC T]]3'-Qt~~/F/Fs/K.LM N OCBsB#B#B(B(cdt|jj|jjfzS)Nz:)rrr"rr#s r__repr__zMaxentClassifier.__repr__s:K %%' ( NN ! ! #O   r)GISIISMEGAMTADMNc T|d}|D]}|dvstd|z|j}|dk(rt||||fi|S|dk(rt||||fi|S|dk(rt |||||fi|S|dk(r,|} || d<|| d<|| d <|| d <t j |fi| Std |z) a Train a new maxent classifier based on the given corpus of training samples. This classifier will have its weights chosen to maximize entropy while remaining empirically consistent with the training corpus. :rtype: MaxentClassifier :return: The new maxent classifier :type train_toks: list :param train_toks: Training data, represented as a list of pairs, the first member of which is a featureset, and the second of which is a classification label. :type algorithm: str :param algorithm: A case-insensitive string, specifying which algorithm should be used to train the classifier. The following algorithms are currently available. - Iterative Scaling Methods: Generalized Iterative Scaling (``'GIS'``), Improved Iterative Scaling (``'IIS'``) - External Libraries (requiring megam): LM-BFGS algorithm, with training performed by Megam (``'megam'``) The default algorithm is ``'IIS'``. :type trace: int :param trace: The level of diagnostic tracing output to produce. Higher values produce more verbose output. :type encoding: MaxentFeatureEncodingI :param encoding: A feature encoding, used to convert featuresets into feature vectors. If none is specified, then a ``BinaryMaxentFeatureEncoding`` will be built based on the features that are attested in the training corpus. :type labels: list(str) :param labels: The set of possible labels. If none is given, then the set of all labels attested in the training data will be used instead. :param gaussian_prior_sigma: The sigma value for a gaussian prior on model weights. Currently, this is supported by ``megam``. For other algorithms, its value is ignored. :param cutoffs: Arguments specifying various conditions under which the training should be halted. (Some of the cutoff conditions are not supported by some algorithms.) - ``max_iter=v``: Terminate after ``v`` iterations. - ``min_ll=v``: Terminate after the negative average log-likelihood drops under ``v``. - ``min_lldelta=v``: Terminate if a single iteration improves log likelihood by less than ``v``. iis) max_itermin_ll min_lldeltamax_acc min_accdelta count_cutoffnormexplicit bernoullizUnexpected keyword arg %rgismegamtadmtracerr"gaussian_prior_sigmazUnknown algorithm %s) TypeErrorlower train_maxent_classifier_with_iis train_maxent_classifier_with_gis"train_maxent_classifier_with_megamTadmMaxentClassifiertrain ValueError) cls train_toks algorithmrrr"rcutoffsr=kwargss rrzMaxentClassifier.trains&|  I CC   ;c ABB COO%  3E8V7> % 3E8V7> ' !5E8V5IMT & F#F7O!)F: %F8 -AF) *'--jCFC C3i?@ @r)T)) )rall)NNNr)__name__ __module__ __qualname____doc__rr"r&rr,r)rjrsryr{ ALGORITHMS classmethodrrCrrrrNsj$1.';4T(* X 7O  1JaAaArrc.eZdZdZdZdZdZdZdZy)MaxentFeatureEncodingIa A mapping that converts a set of input-feature values to a vector of joint-feature values, given a label. This conversion is necessary to translate featuresets into a format that can be used by maximum entropy models. The set of joint-features used by a given encoding is fixed, and each index in the generated joint-feature vectors corresponds to a single joint-feature. The length of the generated joint-feature vectors is therefore constant (for a given encoding). Because the joint-feature vectors generated by ``MaxentFeatureEncodingI`` are typically very sparse, they are represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature. Feature encodings are generally created using the ``train()`` method, which generates an appropriate encoding based on the input-feature values and labels that are present in a given corpus. ct)aC Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature. :type featureset: dict :rtype: list(tuple(int, int)) NotImplementedErrorrr+r3s rr1zMaxentFeatureEncodingI.encode{ "##rct)z :return: The size of the fixed-length joint-feature vectors that are generated by this encoding. :rtype: int rr#s rrzMaxentFeatureEncodingI.length "##rct)z :return: A list of the "known labels" -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``. :rtype: list rr#s rr"zMaxentFeatureEncodingI.labelss "##rct)z :return: A string describing the value of the joint-feature whose index in the generated feature vectors is ``fid``. :rtype: str rrrns rrbzMaxentFeatureEncodingI.describerrct)ao Construct and return new feature encoding, based on a given training corpus ``train_toks``. :type train_toks: list(tuple(dict, str)) :param train_toks: Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label. r)rrs rrzMaxentFeatureEncodingI.trainrrN) rrrrr1rr"rbrrCrrrrds , $$$$ $rrc.eZdZdZdZdZdZdZdZy)#FunctionBackedMaxentFeatureEncodingz A feature encoding that calls a user-supplied function to map a given featureset/label pair to a sparse joint-feature vector. c.||_||_||_y)ag Construct a new feature encoding based on the given function. :type func: (callable) :param func: A function that takes two arguments, a featureset and a label, and returns the sparse joint feature vector that encodes them:: func(featureset, label) -> feature_vector This sparse joint feature vector (``feature_vector``) is a list of ``(index,value)`` tuples. :type length: int :param length: The size of the fixed-length joint-feature vectors that are generated by this encoding. :type labels: list :param labels: A list of the "known labels" for this encoding -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``. N)_length_func_labels)rfuncrr"s rrz,FunctionBackedMaxentFeatureEncoding.__init__s0   rc&|j||Sr!)rrs rr1z*FunctionBackedMaxentFeatureEncoding.encodeszz*e,,rc|jSr!rr#s rrz*FunctionBackedMaxentFeatureEncoding.length ||rc|jSr!rr#s rr"z*FunctionBackedMaxentFeatureEncoding.labelsrrcy)Nzno description availablerCrs rrbz,FunctionBackedMaxentFeatureEncoding.describes)rN) rrrrrr1rr"rbrCrrrrs  8-*rrcBeZdZdZd dZdZdZdZdZe d dZ y) BinaryMaxentFeatureEncodinga A feature encoding that generates vectors containing a binary joint-features of the form: | joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label) | { | { 0 otherwise Where ``fname`` is the name of an input-feature, ``fval`` is a value for that input-feature, and ``label`` is a label. Typically, these features are constructed based on a training corpus, using the ``train()`` method. This method will create one feature for each combination of ``fname``, ``fval``, and ``label`` that occurs at least once in the training corpus. The ``unseen_features`` parameter can be used to add "unseen-value features", which are used whenever an input feature has a value that was not encountered in the training corpus. These features have the form: | joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname]) | { and l == label | { | { 0 otherwise Where ``is_unseen(fname, fval)`` is true if the encoding does not contain any joint features that are true when ``fs[fname]==fval``. The ``alwayson_features`` parameter can be used to add "always-on features", which have the form:: | joint_feat(fs, l) = { 1 if (l == label) | { | { 0 otherwise These always-on features allow the maxent model to directly model the prior probabilities of each label. ct|jttt|k7r t dt ||_ ||_ t||_ d|_ d|_ |rYt|Dcic]\}}|||jzc}}|_ |xjt|jz c_|rg|Dchc]\}}}| } }}}t| Dcic]\}}|||jzc}}|_ |xjt| z c_yycc}}wcc}}}wcc}}w)a :param labels: A list of the "known labels" for this encoding. :param mapping: A dictionary mapping from ``(fname,fval,label)`` tuples to corresponding joint-feature indexes. These indexes must be the set of integers from 0...len(mapping). If ``mapping[fname,fval,label]=id``, then ``self.encode(..., fname:fval, ..., label)[id]`` is 1; otherwise, it is 0. :param unseen_features: If true, then include unseen value features in the generated joint-feature vectors. :param alwayson_features: If true, then include always-on features in the generated joint-feature vectors. HMapping values must be exactly the set of integers from 0...len(mapping)N setvaluesrqrrrpr_mappingr _alwayson_unseenr` rr"mappingunseen_featuresalwayson_featuresrgr3fnamefvalfnamess rrz$BinaryMaxentFeatureEncoding.__init__-" w~~ Cc'l(;$< <8  F| ( 97| <, , :CF:K,6Qq4<<''DN LLC/ /L 8?@@ 4ee@F@FOPVFWX EE1t||#33XDL LLCK 'L   AX E E?Ecg}|jD]\}}|||f|jvr$|j|j|||fdf;|jsH|jD]}|||f|jvsk||jvs{|j|j|df|j r.||j vr |j|j |df|SNrR)itemsrappendrrrrr+r3rrrlabel2s rr1z"BinaryMaxentFeatureEncoding.encode6s&++- BKE4tU#t}}4udE/A!BA FG"llBFtV, =B  , e)>et~~5 OOT^^E2A6 7rct|ts td |j|t |j kr|j|\}}}|d|d|S|jrK||jjvr/|jjD]\}}||k(s d|zcSy|jrK||jjvr/|jjD]\}}||k(s d|zcSytd#t$rSdgt |j z|_|j jD]\}}||j|<YIwxYwNzdescribe() expected an intz==rNz label is %rz %s is unseenzBad feature id isinstancer_r _inv_mappingAttributeErrorrrrrrrrrr6inforgrrr3f_id2s rrbz$BinaryMaxentFeatureEncoding.describeQj$$89 9 ,    #dmm$ $#'#4#4T#: UD%WBthnUI> > ^^(=(=(? ? $ 4 4 6 1 u5=(500 1\\ddll&9&9&;; $ 2 2 4 2 u5=)E11 2-. .# ,!#s4=='9 9D ==..0 ,a'+!!!$ , , DAE32E3c|jSr!rr#s rr"z"BinaryMaxentFeatureEncoding.labelsj ||rc|jSr!rr#s rrz"BinaryMaxentFeatureEncoding.lengthnrrNc Hi}t}tt}|D]u\}} |r| |vrtd| z|j | |j D]8\} } || | fxxdz cc<|| | f|k\s | | | f|vs(t ||| | | f<:w||}|||fi|S)a Construct and return new feature encoding, based on a given training corpus ``train_toks``. See the class description ``BinaryMaxentFeatureEncoding`` for a description of the joint-features that will be included in this encoding. :type train_toks: list(tuple(dict, str)) :param train_toks: Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label. :type count_cutoff: int :param count_cutoff: A cutoff value that is used to discard rare joint-features. If a joint-feature's value is 1 fewer than ``count_cutoff`` times in the training corpus, then that joint-feature is not included in the generated encoding. :type labels: list :param labels: A list of labels that should be used by the classifier. If not specified, then the set of labels attested in ``train_toks`` will be used. :param options: Extra parameters for the constructor, such as ``unseen_features`` and ``alwayson_features``. Unexpected label %srR)rrr_raddrr rrrr"optionsr seen_labelscounttokr3rrs rrz!BinaryMaxentFeatureEncoding.trainrs8e C $ CJC%v- !6!>?? OOE " #yy{ C teTk"a'"%5tU+7:69'ltU 23 C C > F67.g..rFFrN rrrrrr1rbr"rrrrCrrrrs6&P/(b6/20/0/rrc<eZdZdZ ddZedZdZdZdZ y) GISEncodinga A binary feature encoding which adds one new joint-feature to the joint-features defined by ``BinaryMaxentFeatureEncoding``: a correction feature, whose value is chosen to ensure that the sparse vector always sums to a constant non-negative number. This new feature is used to ensure two preconditions for the GIS training algorithm: - At least one feature vector index must be nonzero for every token. - The feature vector must sum to a constant non-negative number for every token. Nc tj||||||$t|Dchc]\}}}| c}}}dz}||_ycc}}}w)a  :param C: The correction constant. The value of the correction feature is based on this value. In particular, its value is ``C - sum([v for (f,v) in encoding])``. :seealso: ``BinaryMaxentFeatureEncoding.__init__`` NrR)rrr_C) rr"rrrCrrr3s rrzGISEncoding.__init__sV $,, &'?4E  9w??3tUU?@1DA@s A c|jS)zOThe non-negative constant that all encoded feature vectors will sum to.)rr#s rrz GISEncoding.Cs wwrctj|||}tj|}td|D}||jk\r t d|j ||j|z f|S)Nc3&K|] \}}| ywr!rC)rDfvs rrFz%GISEncoding.encode..s-&1aA-z&Correction feature is not high enough!)rr1rsumrrr)rr+r3r base_lengthr5s rr1zGISEncoding.encodesp.55dJN188> -H-- DGG EF Fdggo67rc2tj|dzSr)rrr#s rrzGISEncoding.lengths*11$7!;;rc||tj|k(rd|jzStj||S)NzCorrection feature (%s))rrrrb)rr6s rrbzGISEncoding.describes8 .55d; ;,tww6 6.77dC Cr)FFN) rrrrrpropertyrr1rrbrCrrrrs7 RV  <Drrc>eZdZddZdZdZdZdZed dZ y) TadmEventMaxentFeatureEncodingct||_t|_tj |||j||yr!)rr_label_mappingrr)rr"rrrs rrz'TadmEventMaxentFeatureEncoding.__init__s6#G, )m#,, &$--:K rcg}|jD]\}}||f|jvr$t|j|j||f<||jvrBt |t s#t|j|j|<n||j|<|j |j||f|j|f|Sr!)rrrrrr_r)rr+r3rfeaturevalues rr1z%TadmEventMaxentFeatureEncoding.encodes(..0 NGUt}}425dmm2D w./D///!%-14T5H5H1ID''.16D''. OO/0$2E2Ee2LM  rc|jSr!rr#s rr"z%TadmEventMaxentFeatureEncoding.labelsrrc`|jD]\}}|j||f|k(s||fcSyr!)r)rrnr r3s rrbz'TadmEventMaxentFeatureEncoding.describes:"mm (NGU}}gu-.#5'' (rc,t|jSr!)rrr#s rrz%TadmEventMaxentFeatureEncoding.lengths4==!!rNc t}|sg}t|}|D]\}}||vs |j||D]*\}}|D] }|D]}||f|vs t||||f<",|||fi|Sr!)rrprr) rrrr"rrr+r3r s rrz$TadmEventMaxentFeatureEncoding.trains-F*% !+ % JF" e$ %", A J A)AG'w647L% 01A A A 67.g..rrr) rrrrr1r"rbrrrrCrrrrs/  ( "//rrcBeZdZdZd dZdZdZdZdZe d dZ y) TypedMaxentFeatureEncodingaZ A feature encoding that generates vectors containing integer, float and binary joint-features of the form: Binary (for string and boolean features): | joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label) | { | { 0 otherwise Value (for integer and float features): | joint_feat(fs, l) = { fval if (fs[fname] == type(fval)) | { and (l == label) | { | { not encoded otherwise Where ``fname`` is the name of an input-feature, ``fval`` is a value for that input-feature, and ``label`` is a label. Typically, these features are constructed based on a training corpus, using the ``train()`` method. For string and boolean features [type(fval) not in (int, float)] this method will create one feature for each combination of ``fname``, ``fval``, and ``label`` that occurs at least once in the training corpus. For integer and float features [type(fval) in (int, float)] this method will create one feature for each combination of ``fname`` and ``label`` that occurs at least once in the training corpus. For binary features the ``unseen_features`` parameter can be used to add "unseen-value features", which are used whenever an input feature has a value that was not encountered in the training corpus. These features have the form: | joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname]) | { and l == label | { | { 0 otherwise Where ``is_unseen(fname, fval)`` is true if the encoding does not contain any joint features that are true when ``fs[fname]==fval``. The ``alwayson_features`` parameter can be used to add "always-on features", which have the form: | joint_feat(fs, l) = { 1 if (l == label) | { | { 0 otherwise These always-on features allow the maxent model to directly model the prior probabilities of each label. ct|jttt|k7r t dt ||_ ||_ t||_ d|_ d|_ |rYt|Dcic]\}}|||jzc}}|_ |xjt|jz c_|rg|Dchc]\}}}| } }}}t| Dcic]\}}|||jzc}}|_ |xjt| z c_yycc}}wcc}}}wcc}}w)a :param labels: A list of the "known labels" for this encoding. :param mapping: A dictionary mapping from ``(fname,fval,label)`` tuples to corresponding joint-feature indexes. These indexes must be the set of integers from 0...len(mapping). If ``mapping[fname,fval,label]=id``, then ``self.encode({..., fname:fval, ...``, label)[id]} is 1; otherwise, it is 0. :param unseen_features: If true, then include unseen value features in the generated joint-feature vectors. :param alwayson_features: If true, then include always-on features in the generated joint-feature vectors. rNrrs rrz#TypedMaxentFeatureEncoding.__init__Trrcg}|jD]\}}t|ttfrH|t ||f|j vs7|j |j |t ||f|fd|||f|j vr$|j |j |||fdf|js|jD]}|||f|j vs||jvs|j |j|df|jr.||jvr |j |j|df|Sr) rrr_floattyperrrrrrs rr1z!TypedMaxentFeatureEncoding.encodes6&++- FKE4$e -4:u->OOT]]5$t*e3K%Ld$ST4'4==8OOT]]5$3E%F$JK\\"&,,F!40DMMA!F !DLL0$OOT\\%-@!,DE' F, >>et~~5 OOT^^E2A6 7rct|ts td |j|t |j kr|j|\}}}|d|d|S|jrK||jjvr/|jjD]\}}||k(s d|zcSy|jrK||jjvr/|jjD]\}}||k(s d|zcSytd#t$rSdgt |j z|_|j jD]\}}||j|<YIwxYwrrrs rrbz#TypedMaxentFeatureEncoding.describerrc|jSr!rr#s rr"z!TypedMaxentFeatureEncoding.labelsrrc|jSr!rr#s rrz!TypedMaxentFeatureEncoding.lengthrrNc i}t}tt}|D]\}} |r| |vrtd| z|j | |j D]Z\} } t | ttfvr t | } || | fxxdz cc<|| | f|k\sB| | | f|vsJt||| | | f<\||}|||fi|S)a) Construct and return new feature encoding, based on a given training corpus ``train_toks``. See the class description ``TypedMaxentFeatureEncoding`` for a description of the joint-features that will be included in this encoding. Note: recognized feature values types are (int, float), over types are interpreted as regular binary features. :type train_toks: list(tuple(dict, str)) :param train_toks: Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label. :type count_cutoff: int :param count_cutoff: A cutoff value that is used to discard rare joint-features. If a joint-feature's value is 1 fewer than ``count_cutoff`` times in the training corpus, then that joint-feature is not included in the generated encoding. :type labels: list :param labels: A list of labels that should be used by the classifier. If not specified, then the set of labels attested in ``train_toks`` will be used. :param options: Extra parameters for the constructor, such as ``unseen_features`` and ``alwayson_features``. rrR) rrr_rrrrrrrs rrz TypedMaxentFeatureEncoding.trains>e C $ CJC%v- !6!>?? OOE " #yy{ C t:#u-:DeTk"a'"%5tU+7:69'ltU 23 C  C" > F67.g..rrrrrCrrrrs76p/(b@/25/5/rrc |jddt|}|tj||}t |ds t dd|j z }t||}ttj|dk(d}tjt|d} |D]} tj| | <t|| } tj|} ~|dkDrt!d |dz|d kDr t!t!d t!d  |d kDrQ|j"xs t%| |} |j&xs t)| |}|j*}t!d || |fzt-| ||}|D]} || xxdz cc<tj|}~| j/} | | |z |zz } | j1| |j3| |rn |d kDr+t%| |} t)| |}t!d| dd|d| S#t4$rt!dYHxYw)a Train a new ``ConditionalExponentialClassifier``, using the given training samples, using the Generalized Iterative Scaling algorithm. This ``ConditionalExponentialClassifier`` will encode the model that maximizes entropy from all the models that are empirically consistent with ``train_toks``. :see: ``train_maxent_classifier()`` for parameter descriptions. rdr"rzJThe GIS algorithm requires an encoding that defines C (e.g., GISEncoding).r.rd ==> Training (%d iterations)r;- Iteration Log Likelihood Accuracy- --------------------------------------- %9d %14.5f %9.3frR* Training stopped: keyboard interrupt Final 14.5f 9.3f) setdefaultr rrrorrcalculate_empirical_fcountrnumpynonzerozerosrNINF ConditionalExponentialClassifierlog2r\llr accr itercalculate_estimated_fcountrr&checkKeyboardInterrupt)rrrr"r cutoffcheckerCinvempirical_fcount unattestedrrn classifierlog_empirical_fcountr/r0iternumestimated_fcountlog_estimated_fcounts rrrs{ z3'!'*M$$Z$? 8S ! -    D2*hGU]]#3q#89!<=Jkk#./5G"zz "1(GDJ!::&67 qy .1DDE qy  => => qy"%%O J)O#''K8J +K',,3wC6HHI :J   " + %*% +#(::.>#?  !((*G ,/CCtK KG  " "7 +"":z:54 qy J 3z:. "2e*DT ;<  < :; s$CH%%H=;H=ctj|jd}|D],\}}|j||D]\}}||xx|z cc<.|SNr)r)r+rr1)rrfcountrr3indexvals rr(r(fs_ [[*C 0F ! U"//#u5 !JE3 5MS M !! Mrc$tj|jd}|D]f\}}|j|}|j D]=}|j |}|j ||D]\}} ||xx|| zz cc<?h|Sr?)r)r+rr)r[rWr1) r9rrr@rr3rXrWrnrs rr2r2ps [[*C 0F + U((-]]_ +E::e$D%__S%8 + Ts td{*  + ++ Mrc z|jddt|}|tj||}t ||t |z }t ||}tjt||jd}tj|t |df} ttj|dk(d} tjt |d} | D]} tj| | <t!|| } |dkDrt#d|dz|d kDr t#t#d t#d  |d kDrQ|j$xs t'| |}|j(xs t+| |}|j,}t#d |||fzt/|| | |||| |}| j1} | |z } | j3| |j5| |rn |d kDr+t'| |}t+| |}t#d|dd|d| S#t6$rt#d YHxYw)a Train a new ``ConditionalExponentialClassifier``, using the given training samples, using the Improved Iterative Scaling algorithm. This ``ConditionalExponentialClassifier`` will encode the model that maximizes entropy from all the models that are empirically consistent with ``train_toks``. :see: ``train_maxent_classifier()`` for parameter descriptions. rrr)r=rrRrrr;rr r!r"r#r$r%r&)r'r rrr(rcalculate_nfmapr)arrayrZ __getitem__reshaperr*r+r,r-r\r/r r0r r1calculate_deltasrr&r3r4)rrrr"rr5empirical_ffreqnfmapnfarray nftransposer8rrnr9r/r0r;deltass rrrsT z3'!'*M.44Z4O1XFZXO J 1Ekk&E,=,=>DG--#g,):;KU]]?a#78; => qy"%%O J)O#''K8J +K',,3wC6HHI& F!((*G v G  " "7 +"":z:54 qy J 3z:. "2e*DT ;<  < :; s B#H""H:8H:c t}|D]K\}}|jD]3}|jtd|j ||D5Mt |Dcic]\}}|| c}}Scc}}w)a Construct a map that can be used to compress ``nf`` (which is typically sparse). *nf(feature_vector)* is the sum of the feature values for *feature_vector*. This represents the number of features that are active for a given labeled text. This method finds all values of *nf(t)* that are attested for at least one token in the given list of training tokens; and constructs a dictionary mapping these attested values to a continuous range *0...N*. For example, if the only values of *nf()* that were attested were 3, 5, and 7, then ``_nfmap`` might return the dictionary ``{3:0, 5:1, 7:2}``. :return: A map that can be used to compress ``nf`` to a dense vector. :rtype: dict(int -> int) c3&K|] \}}| ywr!rCrDidrBs rrFz"calculate_nfmap..sK)2s#Kr)rr"rrr1r`)rrnfsetr_r3rgnfs rrErEs}* EEMQ__& ME IIcKxsE/JKK L MM"+5!1 2gq"BE 22 2s) A:c nd}d} tj|jd} tjt ||jfd} |D]}\} } |j | }|j D]T} |j| | }td|D}|D])\}}| |||fxx|j| |zz cc<+V| t |z} t| D]}tj|| }d|z}||z}tj|| zd}tj|| zd}|D]}||xxdz cc<| ||z | z z} tjt||z tjt| z }||ks| cS| S) a Calculate the update values for the classifier weights for this iteration of IIS. These update weights are the value of ``delta`` that solves the equation:: ffreq_empirical[i] = SUM[fs,l] (classifier.prob_classify(fs).prob(l) * feature_vector(fs,l)[i] * exp(delta[i] * nf(feature_vector(fs,l)))) Where: - *(fs,l)* is a (featureset, label) tuple from ``train_toks`` - *feature_vector(fs,l)* = ``encoding.encode(fs,l)`` - *nf(vector)* = ``sum([val for (id,val) in vector])`` This method uses Newton's method to solve this equation for *delta[i]*. In particular, it starts with a guess of ``delta[i]`` = 1; and iteratively updates ``delta`` with: | delta[i] -= (ffreq_empirical[i] - sum1[i])/(-sum2[i]) until convergence, where *sum1* and *sum2* are defined as: | sum1[i](delta) = SUM[fs,l] f[i](fs,l,delta) | sum2[i](delta) = SUM[fs,l] (f[i](fs,l,delta).nf(feature_vector(fs,l))) | f[i](fs,l,delta) = (classifier.prob_classify(fs).prob(l) . | feature_vector(fs,l)[i] . | exp(delta[i] . nf(feature_vector(fs,l)))) Note that *sum1* and *sum2* depend on ``delta``; so they need to be re-computed each iteration. The variables ``nfmap``, ``nfarray``, and ``nftranspose`` are used to generate a dense encoding for *nf(ltext)*. This allows ``_deltas`` to calculate *sum1* and *sum2* using matrices, which yields a significant performance improvement. :param train_toks: The set of training tokens. :type train_toks: list(tuple(dict, str)) :param classifier: The current classifier. :type classifier: ClassifierI :param ffreq_empirical: An array containing the empirical frequency for each feature. The *i*\ th element of this array is the empirical frequency for feature *i*. :type ffreq_empirical: sequence of float :param unattested: An array that is 1 for features that are not attested in the training data; and 0 for features that are attested. In other words, ``unattested[i]==0`` iff ``ffreq_empirical[i]==0``. :type unattested: sequence of int :param nfmap: A map that can be used to compress ``nf`` to a dense vector. :type nfmap: dict(int -> int) :param nfarray: An array that can be used to uncompress ``nf`` from a dense vector. :type nfarray: array(float) :param nftranspose: The transpose of ``nfarray`` :type nftranspose: array(float) g-q=i,rc3&K|] \}}| ywr!rCrQs rrFz#calculate_deltas..Ts9Yb#S9rr;r)axisrR) r)onesrr+rr)r"r1rrWrqouterrK)rr9r8ffreq_empiricalrKrLrMrNEWTON_CONVERGE MAX_NEWTONrNArr3distr4rUrRrBrangenumnf_delta exp_nf_deltanf_exp_nf_deltasum1sum2rnn_errors rrIrIsROJ ZZ)3 /F  SZ!23S9A ; U'',__& ;E%__S%8N9.99B) ;C%)R- DIIe$4s$::  ;  ; ;ZA*%;;w/({ % 4yy)2yy1,15 C INI  ?T)dU22))C$ 678599S[;QQ _ $M#& Mrc d}d}d|vr|d}d|vr|d}|,|jdd}tj|||d}n | td t j d \} } t | d 5} t||| || dddtj| g} | gdz } |r| dgz } |s| dgz } |r d|dzz }nd}| dd|zdgz } |dkr| dgz } d|vr | dd|dzgz } d|vr| ddt|dzgz } t|dr| dgz } | d| gz } t| } tj| t!||j#|}|t%j&t$j(z}t+||S#1swY xYw#ttf$r} td | z| d} ~ wwxYw#t$r} td | d!| Yd} ~ d} ~ wwxYw)"a Train a new ``ConditionalExponentialClassifier``, using the given training samples, using the external ``megam`` library. This ``ConditionalExponentialClassifier`` will encode the model that maximizes entropy from all the models that are empirically consistent with ``train_toks``. :see: ``train_maxent_classifier()`` for parameter descriptions. :see: ``nltk.classify.megam`` TrrNrr)r"rz$Specify encoding or labels, not bothznltk-prefixw)rrz,Error while creating megam training file: %s)z-nobiasz-repeat10z -explicitz-fvalsr.r;z-lambdaz%.2fz-tunerz-quietrz-maxirAll_deltaz-dppcostz -multilabel multiclasszWarning: unable to delete z: )getrrrtempfilemkstempopenroscloseOSErrorrKrorremover\rrr)r.er)rrrr"rrrrrfdtrainfile_name trainfilerwr inv_variancestdoutrs rrrsqHIV*%f;' zz.!4 .44  Vt5   ?@@T%--W=N .# & ) Hi(i    G ++GK=  H:1144   6L0'::G qyH:VGTF:$6677V FD3vj'9#::;;x M?" n--G  FB .! "&(//*;XFG uzz%''""G Hg ..c   Z TG!KLRSSTD B *>*:"QC@AABsH%F!8F F!2GFF!!G0F??G G+G&&G+ceZdZedZy)rc |jdd}|jdd}|jdd}|jdd}|jdd}|jd d}|jd } |jd } |stj||| }tjd d\} } tjd\} }t | d}t ||||jg}|jdg|jd|g|r|jdd|dzzg| r|jdd| zg| r|jddt| zg|jd| g|jd|g|dkr|jdgn|jdgt|t|5}t|}dddtj| tj|tj tj"z}|||S#1swYbxYw)Nrtao_lmvmrrrr"rrrrrrznltk-tadm-events-z.gz)risuffixznltk-tadm-weights-rhrjz-monitorz-methodz-l2z%.6fr;z-max_itz%dz-fatolz -events_inz -params_outz2>&1z-summary)rorrrprqrr rtextendrKrrrr rsrvr)r.rw)rrrrrrr"sigmarrrl trainfile_fdry weightfile_fdweightfile_namerzr weightfilers rrzTadmMaxentClassifier.trains$JJ{J7  7A&::j$/Hd+ 115zz.!4 ::j)::m,5;;L<H(0'7'7&u( $ n*2)9)9AU)V& %nc:  Hi8 |$ 9-.  NNE6E1H#45 6  NNIth7 8  NNHfs8}&<= > n56 78 19 NNF8 $ NNJ< (' / " 5j(4G 5 .! /" 5::egg&&8W%% 5 5s & IIN)rrrrrrCrrrrs5&5&rrc *ddl}ddlm}|}t|d5}|jt t |j|j|}dddt|d5}|j|}dddt|d5}|j|}dddt|d5}|j|}dddfS#1swYxYw#1swYkxYw#1swYOxYw#1swY3xYw)Nr) MaxentDecoder /weights.txt /mapping.tab /labels.txt /alwayson.tab) r) nltk.tabdatarrrrFrpmapfloat64txt2list tupkey2dict tab2ivdict) tab_dirr)rmdecrwgtmpglabaons rload_maxent_paramsrs* ?D  & 'F1ekk$s5==$--2BCDEF  & '"1q!"  % &!mmA  ' (!Aooa ! S# FF""!!s/?C%5C1C=D %C.1C:=D Dc ddlm}ddlm}ddlm}|}||s||t d|t|dd5} | j|jtt|jdddt|dd5} | j|j|dddt|d d5} | j|j|dddt|d d5} | j|j|dddy#1swYxYw#1swYxYw#1swY[xYw#1swYyxYw) Nr)mkdir)isdir) MaxentEncoderzSaving Maxent parameters in rrjrrr)rsros.pathrrrr\rrwritelist2txtrreprtolist tupdict2tab ivdict2tab) rrrrrrrrmencrs rsave_maxent_paramsr2sD* ?D > g ( 23  & ,= 4==T3::D9"E"E"E9EEEE&cddlm}ddlm}|d}t |\}}}}t t ||||}||S)Nr)find)ClassifierBasedPOSTaggerz.taggers/maxent_treebank_pos_tagger_tab/english)r)r9) nltk.datarnltk.tag.sequentialrrrr)rrrrrrrmcs rmaxent_pos_taggerrIsK<CDG+G4Cc3 #CDc B $r 22rc<ddlm}|tj}y)Nr) names_demo)nltk.classify.utilrrr)rr9s rdemorXs-,223Jr__main__)rNN)rNNr)z/tmp)1rr) ImportErrorrsrp collectionsrnltk.classify.apirnltk.classify.megamrrrnltk.classify.tadmrr r rr r r rrnltk.probabilityr nltk.utilr __docformat__rr-rrrrrrrr(r2rrErIrrrrrrrrCrrrs:,Z  #)QQMMFF'/! JA{JA\$4 F$F$R,**@,*^G/"8G/T:D-:Dz5/%@5/pa/!7a/T04_D &04Yx38{PKLT/x7&+7&~,+. 34  zFM1  sCC%$C%