"""
A classifier based on the Naive Bayes algorithm.  In order to find the
probability for a label, this algorithm first uses the Bayes rule to
express P(label|features) in terms of P(label) and P(features|label):

|                       P(label) * P(features|label)
|  P(label|features) = ------------------------------
|                              P(features)

The algorithm then makes the 'naive' assumption that all features are
independent, given the label:

|                       P(label) * P(f1|label) * ... * P(fn|label)
|  P(label|features) = --------------------------------------------
|                                         P(features)

Rather than computing P(features) explicitly, the algorithm just
calculates the numerator for each label, and normalizes them so they
sum to one:

|                       P(label) * P(f1|label) * ... * P(fn|label)
|  P(label|features) = --------------------------------------------
|        SUM[l]( P(l) * P(f1|l) * ... * P(fn|l) )
"""

from collections import defaultdict

from nltk.classify.api import ClassifierI
from nltk.probability import DictionaryProbDist, ELEProbDist, FreqDist, sum_logs


class NaiveBayesClassifier(ClassifierI):
    """
    A Naive Bayes classifier.  Naive Bayes classifiers are
    parameterized by two probability distributions:

      - P(label) gives the probability that an input will receive each
        label, given no information about the input's features.

      - P(fname=fval|label) gives the probability that a given feature
        (fname) will receive a given value (fval), given the label.

    If the classifier encounters an input with a feature that has
    never been seen with any label, then rather than assigning a
    probability of 0 to all labels, it will ignore that feature.

    The feature value 'None' is reserved for unseen feature values;
    you generally should not use 'None' as a feature value for one of
    your own features.
    """

    def __init__(self, label_probdist, feature_probdist):
        """
        :param label_probdist: P(label), the probability distribution
            over labels.  It is expressed as a ``ProbDistI`` whose
            samples are labels.  I.e., P(label) =
            ``label_probdist.prob(label)``.

        :param feature_probdist: P(fname=fval|label), the probability
            distribution for feature values, given labels.  It is
            expressed as a dictionary whose keys are ``(label, fname)``
            pairs and whose values are ``ProbDistI`` objects over feature
            values.  I.e., P(fname=fval|label) =
            ``feature_probdist[label,fname].prob(fval)``.  If a given
            ``(label,fname)`` is not a key in ``feature_probdist``, then
            it is assumed that the corresponding P(fname=fval|label)
            is 0 for all values of ``fval``.
        """
        self._label_probdist = label_probdist
        self._feature_probdist = feature_probdist
        self._labels = list(label_probdist.samples())

    def labels(self):
        return self._labels

    def classify(self, featureset):
        return self.prob_classify(featureset).max()

    def prob_classify(self, featureset):
        # Discard any feature names that we've never seen before.
        # Otherwise, we'll just assign a probability of 0 to
        # everything.
        featureset = featureset.copy()
        for fname in list(featureset.keys()):
            for label in self._labels:
                if (label, fname) in self._feature_probdist:
                    break
            else:
                del featureset[fname]

        # Find the log probability of each label, given the features.
        # Start with the log probability of the label itself.
        logprob = {}
        for label in self._labels:
            logprob[label] = self._label_probdist.logprob(label)

        # Then add in the log probability of features given labels.
        for label in self._labels:
            for (fname, fval) in featureset.items():
                if (label, fname) in self._feature_probdist:
                    feature_probs = self._feature_probdist[label, fname]
                    logprob[label] += feature_probs.logprob(fval)
                else:
                    # nb: This case will never come up if the
                    # classifier was created by
                    # NaiveBayesClassifier.train().
                    logprob[label] += sum_logs([])  # = -INF.

        return DictionaryProbDist(logprob, normalize=True, log=True)

    def show_most_informative_features(self, n=10):
        # Determine the most relevant features, and display them.
        cpdist = self._feature_probdist
        print("Most Informative Features")

        for (fname, fval) in self.most_informative_features(n):

            def labelprob(l):
                return cpdist[l, fname].prob(fval)

            labels = sorted(
                (l for l in self._labels if fval in cpdist[l, fname].samples()),
                key=lambda element: (-labelprob(element), element),
                reverse=True,
            )
            if len(labels) == 1:
                continue
            l0 = labels[0]
            l1 = labels[-1]
            if cpdist[l0, fname].prob(fval) == 0:
                ratio = "INF"
            else:
                ratio = "%8.1f" % (
                    cpdist[l1, fname].prob(fval) / cpdist[l0, fname].prob(fval)
                )
            print(
                f"{fname:>24} = {fval:14} "
                f"{('%s' % l1)[:6]:>6} : {('%s' % l0)[:6]:6} = {ratio} : 1.0"
            )

    def most_informative_features(self, n=100):
        """
        Return a list of the 'most informative' features used by this
        classifier.  For the purpose of this function, the
        informativeness of a feature ``(fname,fval)`` is equal to the
        highest value of P(fname=fval|label), for any label, divided by
        the lowest value of P(fname=fval|label), for any label:

        |  max[ P(fname=fval|label1) / P(fname=fval|label2) ]
        """
        if hasattr(self, "_most_informative_features"):
            return self._most_informative_features[:n]
        else:
            # The set of (fname, fval) pairs used by this classifier.
            features = set()
            # The max & min probability associated w/ each (fname, fval)
            # pair.  Maps (fname,fval) -> float.
            maxprob = defaultdict(lambda: 0.0)
            minprob = defaultdict(lambda: 1.0)

            for (label, fname), probdist in self._feature_probdist.items():
                for fval in probdist.samples():
                    feature = (fname, fval)
                    features.add(feature)
                    p = probdist.prob(fval)
                    maxprob[feature] = max(p, maxprob[feature])
                    minprob[feature] = min(p, minprob[feature])
                    if minprob[feature] == 0:
                        features.discard(feature)

            # Convert features to a list, & sort it by how informative
            # features are.
            self._most_informative_features = sorted(
                features,
                key=lambda feature_: (
                    minprob[feature_] / maxprob[feature_],
                    feature_[0],
                    feature_[1] in [None, False, True],
                    str(feature_[1]).lower(),
                ),
            )
        return self._most_informative_features[:n]

    @classmethod
    def train(cls, labeled_featuresets, estimator=ELEProbDist):
        """
        :param labeled_featuresets: A list of classified featuresets,
            i.e., a list of tuples ``(featureset, label)``.
        """
        label_freqdist = FreqDist()
        feature_freqdist = defaultdict(FreqDist)
        feature_values = defaultdict(set)
        fnames = set()

        # Count up how many times each feature value occurred, given
        # the label and feature name.
        for featureset, label in labeled_featuresets:
            label_freqdist[label] += 1
            for fname, fval in featureset.items():
                # Increment freq(fval|label, fname)
                feature_freqdist[label, fname][fval] += 1
                # Record that fname can take the value fval.
                feature_values[fname].add(fval)
                # Keep a list of all feature names.
                fnames.add(fname)

        # If a feature didn't have a value given for an instance, then
        # we assume that it gets the implicit value 'None.'  This loop
        # counts up the number of 'missing' feature values for each
        # (label,fname) pair, and increments the count of the fval
        # 'None' by that amount.
        for label in label_freqdist:
            num_samples = label_freqdist[label]
            for fname in fnames:
                count = feature_freqdist[label, fname].N()
                # Only add a None key when necessary, i.e. if there are
                # any samples with feature 'fname' missing.
                if num_samples - count > 0:
                    feature_freqdist[label, fname][None] += num_samples - count
                    feature_values[fname].add(None)

        # Create the P(label) distribution.
        label_probdist = estimator(label_freqdist)

        # Create the P(fval|label, fname) distribution.
        feature_probdist = {}
        for ((label, fname), freqdist) in feature_freqdist.items():
            probdist = estimator(freqdist, bins=len(feature_values[fname]))
            feature_probdist[label, fname] = probdist

        return cls(label_probdist, feature_probdist)


def demo():
    from nltk.classify.util import names_demo

    classifier = names_demo(NaiveBayesClassifier.train)
    classifier.show_most_informative_features()


if __name__ == "__main__":
    demo()
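The computation the module docstring describes can be sketched without NLTK at all. The following is a minimal, dependency-free illustration, not NLTK's implementation: `train_counts` and `classify` are hypothetical names, and it uses add-one (Laplace) smoothing in place of NLTK's `ELEProbDist` estimator, so its probabilities differ slightly from `NaiveBayesClassifier.train`.

```python
from collections import defaultdict
from math import log


def train_counts(labeled_featuresets):
    """Tally label and (label, fname, fval) frequencies from training data."""
    label_counts = defaultdict(int)
    feature_counts = defaultdict(int)
    feature_values = defaultdict(set)
    for featureset, label in labeled_featuresets:
        label_counts[label] += 1
        for fname, fval in featureset.items():
            feature_counts[label, fname, fval] += 1
            feature_values[fname].add(fval)
    return label_counts, feature_counts, feature_values


def classify(label_counts, feature_counts, feature_values, featureset):
    """Return argmax over labels of log P(label) + sum_f log P(f=v|label)."""
    total = sum(label_counts.values())
    best_label, best_logprob = None, float("-inf")
    for label, n in label_counts.items():
        logprob = log(n / total)  # log P(label)
        for fname, fval in featureset.items():
            bins = len(feature_values[fname]) or 1
            count = feature_counts[label, fname, fval]
            # Add-one smoothing stands in for NLTK's ELE estimator here.
            logprob += log((count + 1) / (n + bins))
        if logprob > best_logprob:
            best_label, best_logprob = label, logprob
    return best_label


data = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "k"}, "male"),
]
model = train_counts(data)
print(classify(*model, {"last_letter": "a"}))  # prints "female"
```

Because the denominator P(features) is the same for every label, the argmax over unnormalized log numerators picks the same label that normalizing (as `prob_classify` does) would.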