"""Language Model Interface."""

import random
import warnings
from abc import ABCMeta, abstractmethod
from bisect import bisect
from itertools import accumulate

from nltk.lm.counter import NgramCounter
from nltk.lm.util import log_base2
from nltk.lm.vocabulary import Vocabulary


class Smoothing(metaclass=ABCMeta):
    """Ngram Smoothing Interface

    Implements Chen & Goodman 1995's idea that all smoothing algorithms have
    certain features in common. This should ideally allow smoothing algorithms
    to work both with Backoff and Interpolation.
    """

    def __init__(self, vocabulary, counter):
        """
        :param vocabulary: The Ngram vocabulary object.
        :type vocabulary: nltk.lm.vocab.Vocabulary
        :param counter: The counts of the vocabulary items.
        :type counter: nltk.lm.counter.NgramCounter
        """
        self.vocab = vocabulary
        self.counts = counter

    @abstractmethod
    def unigram_score(self, word):
        raise NotImplementedError()

    @abstractmethod
    def alpha_gamma(self, word, context):
        raise NotImplementedError()


def _mean(items):
    """Return average (aka mean) for sequence of items."""
    return sum(items) / len(items)


def _random_generator(seed_or_generator):
    if isinstance(seed_or_generator, random.Random):
        return seed_or_generator
    return random.Random(seed_or_generator)


def _weighted_choice(population, weights, random_generator=None):
    """Like random.choice, but with weights.

    Heavily inspired by python 3.6 `random.choices`.
    """
    if not population:
        raise ValueError("Can't choose from empty population")
    if len(population) != len(weights):
        raise ValueError("The number of weights does not match the population")
    cum_weights = list(accumulate(weights))
    total = cum_weights[-1]
    threshold = random_generator.random()
    return population[bisect(cum_weights, total * threshold)]


class LanguageModel(metaclass=ABCMeta):
    """ABC for Language Models.

    Cannot be directly instantiated itself.
    """

    def __init__(self, order, vocabulary=None, counter=None):
        """Creates new LanguageModel.

        :param vocabulary: If provided, this vocabulary will be used instead
            of creating a new one when training.
        :type vocabulary: `nltk.lm.Vocabulary` or None
        :param counter: If provided, use this object to count ngrams.
        :type counter: `nltk.lm.NgramCounter` or None
        :param ngrams_fn: If given, defines how sentences in training text are
            turned to ngram sequences.
        :type ngrams_fn: function or None
        :param pad_fn: If given, defines how sentences in training text are padded.
        :type pad_fn: function or None
        """
        self.order = order
        if vocabulary and not isinstance(vocabulary, Vocabulary):
            warnings.warn(
                f"The `vocabulary` argument passed to {self.__class__.__name__} "
                "must be an instance of `nltk.lm.Vocabulary`.",
                stacklevel=3,
            )
        self.vocab = Vocabulary() if vocabulary is None else vocabulary
        self.counts = NgramCounter() if counter is None else counter

    def fit(self, text, vocabulary_text=None):
        """Trains the model on a text.

        :param text: Training text as a sequence of sentences.
        """
        if not self.vocab:
            if vocabulary_text is None:
                raise ValueError(
                    "Cannot fit without a vocabulary or text to create it from."
                )
            self.vocab.update(vocabulary_text)
        self.counts.update(self.vocab.lookup(sent) for sent in text)

    def score(self, word, context=None):
        """Masks out of vocab (OOV) words and computes their model score.

        For model-specific logic of calculating scores, see the `unmasked_score`
        method.
        """
        return self.unmasked_score(
            self.vocab.lookup(word), self.vocab.lookup(context) if context else None
        )

    @abstractmethod
    def unmasked_score(self, word, context=None):
        """Score a word given some optional context.

        Concrete models are expected to provide an implementation.
        Note that this method does not mask its arguments with the OOV label.
        Use the `score` method for that.

        :param str word: Word for which we want the score
        :param tuple(str) context: Context the word is in.
            If `None`, compute unigram score.
        :param context: tuple(str) or None
        :rtype: float
        """
        raise NotImplementedError()

    def logscore(self, word, context=None):
        """Evaluate the log score of this word in this context.

        The arguments are the same as for `score` and `unmasked_score`.
        """
        return log_base2(self.score(word, context))

    def context_counts(self, context):
        """Helper method for retrieving counts for a given context.

        Assumes context has been checked and oov words in it masked.
        :type context: tuple(str) or None
        """
        return (
            self.counts[len(context) + 1][context] if context else self.counts.unigrams
        )

    def entropy(self, text_ngrams):
        """Calculate cross-entropy of model for given evaluation text.

        This implementation is based on the Shannon-McMillan-Breiman theorem,
        as used and referenced by Dan Jurafsky and Jordan Boyd-Graber.

        :param Iterable(tuple(str)) text_ngrams: A sequence of ngram tuples.
        :rtype: float
        """
        return -1 * _mean(
            [self.logscore(ngram[-1], ngram[:-1]) for ngram in text_ngrams]
        )

    def perplexity(self, text_ngrams):
        """Calculates the perplexity of the given text.

        This is simply 2 ** cross-entropy for the text, so the arguments are the same.
        """
        return pow(2.0, self.entropy(text_ngrams))

    def generate(self, num_words=1, text_seed=None, random_seed=None):
        """Generate words from the model.

        :param int num_words: How many words to generate. By default 1.
        :param text_seed: Generation can be conditioned on preceding context.
        :param random_seed: A random seed or an instance of `random.Random`. If provided,
            makes the random sampling part of generation reproducible.
        :return: One (str) word or a list of words generated from model.

        Examples:

        >>> from nltk.lm import MLE
        >>> lm = MLE(2)
        >>> lm.fit([[("a", "b"), ("b", "c")]], vocabulary_text=['a', 'b', 'c'])
        >>> lm.fit([[("a",), ("b",), ("c",)]])
        >>> lm.generate(random_seed=3)
        'a'
        >>> lm.generate(text_seed=['a'])
        'b'

        """
        text_seed = [] if text_seed is None else list(text_seed)
        random_generator = _random_generator(random_seed)
        # Base case: sample a single word from the (possibly backed-off) context.
        if num_words == 1:
            context = (
                text_seed[-self.order + 1 :]
                if len(text_seed) >= self.order
                else text_seed
            )
            samples = self.context_counts(self.vocab.lookup(context))
            # Back off by shortening the context until we find samples.
            while context and not samples:
                context = context[1:] if len(context) > 1 else []
                samples = self.context_counts(self.vocab.lookup(context))
            # Sorting makes sampling reproducible and turns the Mapping into
            # the Sequence that `_weighted_choice` expects.
            samples = sorted(samples)
            return _weighted_choice(
                samples,
                tuple(self.score(w, context) for w in samples),
                random_generator,
            )
        # Build up text one word at a time, conditioning on what was generated so far.
        generated = []
        for _ in range(num_words):
            generated.append(
                self.generate(
                    num_words=1,
                    text_seed=text_seed + generated,
                    random_seed=random_generator,
                )
            )
        return generated