"""
A set of functions used to interface with the external megam_ maxent
optimization package. Before megam can be used, you should tell NLTK where it
can find the megam binary, using the ``config_megam()`` function. Typical
usage:

    >>> from nltk.classify import megam
    >>> megam.config_megam() # pass path to megam if not found in PATH # doctest: +SKIP
    [Found megam: ...]

Use with MaxentClassifier. Example below, see MaxentClassifier documentation
for details.

    nltk.classify.MaxentClassifier.train(corpus, 'megam')

.. _megam: https://www.umiacs.umd.edu/~hal/megam/index.html
"""
import subprocess

from nltk.internals import find_binary

try:
    import numpy
except ImportError:
    numpy = None

# Path to the megam binary; set by config_megam().
_megam_bin = None


def config_megam(bin=None):
    """
    Configure NLTK's interface to the ``megam`` maxent optimization
    package.

    :param bin: The full path to the ``megam`` binary.  If not specified,
        then nltk will search the system for a ``megam`` binary; and if
        one is not found, it will raise a ``LookupError`` exception.
    :type bin: str
    """
    global _megam_bin
    _megam_bin = find_binary(
        "megam",
        bin,
        env_vars=["MEGAM"],
        binary_names=["megam.opt", "megam", "megam_686", "megam_i686.opt"],
        url="https://www.umiacs.umd.edu/~hal/megam/index.html",
    )


def write_megam_file(train_toks, encoding, stream, bernoulli=True, explicit=True):
    """
    Generate an input file for ``megam`` based on the given corpus of
    classified tokens.

    :type train_toks: list(tuple(dict, str))
    :param train_toks: Training data, represented as a list of
        pairs, the first member of which is a feature dictionary,
        and the second of which is a classification label.

    :type encoding: MaxentFeatureEncodingI
    :param encoding: A feature encoding, used to convert featuresets
        into feature vectors. May optionally implement a cost() method
        in order to assign different costs to different class predictions.

    :type stream: stream
    :param stream: The stream to which the megam input file should be
        written.

    :param bernoulli: If true, then use the 'bernoulli' format.  I.e.,
        all joint features have binary values, and are listed iff they
        are true.  Otherwise, list feature values explicitly.  If
        ``bernoulli=False``, then you must call ``megam`` with the
        ``-fvals`` option.

    :param explicit: If true, then use the 'explicit' format.  I.e.,
        list the features that would fire for any of the possible
        labels, for each token.  If ``explicit=True``, then you must
        call ``megam`` with the ``-explicit`` option.
    """
    # Look up the set of labels and assign each one an index.
    labels = encoding.labels()
    labelnum = {label: i for (i, label) in enumerate(labels)}

    # Write the file, which contains one line per instance.
    for featureset, label in train_toks:
        # Start the line with the label index, or with the per-label costs
        # when the encoding provides a cost() method.
        if hasattr(encoding, "cost"):
            stream.write(
                ":".join(str(encoding.cost(featureset, label, l)) for l in labels)
            )
        else:
            stream.write("%d" % labelnum[label])

        if not explicit:
            # Implicit format: list only the features that fire for the
            # instance's actual label.
            _write_megam_features(encoding.encode(featureset, label), stream, bernoulli)
        else:
            # Explicit format: list the features that would fire for each
            # of the possible labels.
            for l in labels:
                stream.write(" #")
                _write_megam_features(encoding.encode(featureset, l), stream, bernoulli)

        # End of the instance.
        stream.write("\n")


def parse_megam_weights(s, features_count, explicit=True):
    """
    Given the stdout output generated by ``megam`` when training a
    model, return a ``numpy`` array containing the corresponding weight
    vector.  This function does not currently handle bias features.
    """
    if numpy is None:
        raise ValueError("This function requires that numpy be installed")
    assert explicit, "non-explicit not supported yet"
    lines = s.strip().split("\n")
    weights = numpy.zeros(features_count, "d")
    for line in lines:
        if line.strip():
            fid, weight = line.split()
            weights[int(fid)] = float(weight)
    return weights


def _write_megam_features(vector, stream, bernoulli):
    if not vector:
        raise ValueError("MEGAM classifier requires the use of an always-on feature.")
    for fid, fval in vector:
        if bernoulli:
            # Bernoulli format: list the feature id only when it is on.
            if fval == 1:
                stream.write(" %s" % fid)
            elif fval != 0:
                raise ValueError(
                    "If bernoulli=True, then all features must be binary."
                )
        else:
            stream.write(f" {fid} {fval}")


def call_megam(args):
    """
    Call the ``megam`` binary with the given arguments.
    """
    if isinstance(args, str):
        raise TypeError("args should be a list of strings")
    if _megam_bin is None:
        config_megam()

    # Call megam via a subprocess.
    cmd = [_megam_bin] + args
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    (stdout, stderr) = p.communicate()

    # Check the return code.
    if p.returncode != 0:
        print()
        print(stderr)
        raise OSError("megam command failed!")

    if isinstance(stdout, str):
        return stdout
    else:
        return stdout.decode("utf-8")
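# The 'bernoulli' and 'explicit' formats described in the docstrings above
# can be illustrated with a small standalone sketch. It does not use NLTK or
# numpy: ``format_megam_line`` and ``parse_weights`` are hypothetical helper
# names introduced only for illustration, and the sparse vectors are assumed
# to be lists of ``(feature_id, value)`` pairs, as a feature encoding's
# ``encode()`` method would return them.

```python
def format_megam_line(label_index, vectors, bernoulli=True, explicit=True):
    """Build one line of a megam input file (illustrative helper).

    ``vectors`` holds one sparse vector per candidate label when
    ``explicit`` is true, or a single vector for the actual label otherwise.
    """
    parts = [str(label_index)]
    for vector in vectors:
        if explicit:
            # Each candidate label's feature list is introduced by '#'.
            parts.append("#")
        for fid, fval in vector:
            if bernoulli:
                # Binary format: list the feature id only when it is on.
                if fval == 1:
                    parts.append(str(fid))
                elif fval != 0:
                    raise ValueError("bernoulli format needs binary values")
            else:
                # Valued format: list each id followed by its value.
                parts.append(f"{fid} {fval}")
    return " ".join(parts) + "\n"


def parse_weights(text, features_count):
    """Parse 'feature_id weight' lines into a dense list of floats."""
    weights = [0.0] * features_count
    for line in text.strip().split("\n"):
        if line.strip():
            fid, weight = line.split()
            weights[int(fid)] = float(weight)
    return weights


# Explicit bernoulli line for an instance whose true label has index 1.
explicit_line = format_megam_line(1, [[(0, 1), (3, 1)], [(1, 1), (4, 1)]])

# Implicit line with explicit feature values.
valued_line = format_megam_line(
    0, [[(0, 1), (2, 0.5)]], bernoulli=False, explicit=False
)

# Weight output in the 'feature_id weight' layout, one pair per line.
weights = parse_weights("0 0.5\n2 -1.25\n", 4)
```

# Here ``explicit_line`` is "1 # 0 3 # 1 4\n" and ``valued_line`` is
# "0 0 1 2 0.5\n". Feature ids missing from the weight output keep a
# weight of zero.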