"""Wrappers for various units of text, including the main
:class:`TextBlob`, :class:`Word`, and :class:`WordList` classes.
Example usage: ::

    >>> from textblob import TextBlob
    >>> b = TextBlob("Simple is better than complex.")
    >>> b.tags
    [('Simple', 'NN'), ('is', 'VBZ'), ('better', 'JJR'), ('than', 'IN'), ('complex', 'NN')]
    >>> b.noun_phrases
    WordList(['simple'])
    >>> b.words
    WordList(['Simple', 'is', 'better', 'than', 'complex'])
    >>> b.sentiment
    Sentiment(polarity=0.06666666666666667, subjectivity=0.41904761904761906)
    >>> b.words[0].synsets[0]
    Synset('simple.n.01')

.. versionchanged:: 0.8.0
    These classes are now imported from ``textblob`` rather than ``text.blob``.
"""
import json
import sys
from collections import defaultdict

import nltk

from textblob.base import (
    BaseNPExtractor,
    BaseParser,
    BaseSentimentAnalyzer,
    BaseTagger,
    BaseTokenizer,
)
from textblob.decorators import cached_property, requires_nltk_corpus
from textblob.en import suggest
from textblob.inflect import pluralize as _pluralize
from textblob.inflect import singularize as _singularize
from textblob.mixins import BlobComparableMixin, StringlikeMixin
from textblob.np_extractors import FastNPExtractor
from textblob.parsers import PatternParser
from textblob.sentiments import PatternAnalyzer
from textblob.taggers import NLTKTagger
from textblob.tokenizers import WordTokenizer, sent_tokenize, word_tokenize
from textblob.utils import PUNCTUATION_REGEX, lowerstrip

# WordNet interface (lazy-loaded NLTK corpus reader)
_wordnet = nltk.corpus.wordnet

# Types accepted wherever a "string" is expected
basestring = (str, bytes)


def _penn_to_wordnet(tag):
    """Converts a Penn corpus tag into a WordNet tag."""
    if tag in ("NN", "NNS", "NNP", "NNPS"):
        return _wordnet.NOUN
    if tag in ("JJ", "JJR", "JJS"):
        return _wordnet.ADJ
    if tag in ("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"):
        return _wordnet.VERB
    if tag in ("RB", "RBR", "RBS"):
        return _wordnet.ADV
    return None


class Word(str):
    """A simple word representation. Includes methods for inflection
    and WordNet integration.
    """

    def __new__(cls, string, pos_tag=None):
        """Return a new instance of the class. It is necessary to override
        this method in order to handle the extra pos_tag argument in the
        constructor.
        """
        return super().__new__(cls, string)

    def __init__(self, string, pos_tag=None):
        self.string = string
        self.pos_tag = pos_tag

    def __repr__(self):
        return repr(self.string)

    def __str__(self):
        return self.string

    def singularize(self):
        """Return the singular version of the word as a string."""
        return Word(_singularize(self.string))

    def pluralize(self):
        """Return the plural version of the word as a string."""
        return Word(_pluralize(self.string))

    def spellcheck(self):
        """Return a list of (word, confidence) tuples of spelling corrections.

        Based on: Peter Norvig, "How to Write a Spelling Corrector"
        (http://norvig.com/spell-correct.html) as implemented in the pattern
        library.

        .. versionadded:: 0.6.0
        """
        return suggest(self.string)

    def correct(self):
        """Correct the spelling of the word. Returns the word with the highest
        confidence using the spelling corrector.

        .. versionadded:: 0.6.0
        """
        return Word(self.spellcheck()[0][0])

    @cached_property
    @requires_nltk_corpus
    def lemma(self):
        """Return the lemma of this word using WordNet's morphy function."""
        return self.lemmatize(pos=self.pos_tag)

    @requires_nltk_corpus
    def lemmatize(self, pos=None):
        """Return the lemma for a word using WordNet's morphy function.

        :param pos: Part of speech to filter upon. If ``None``, defaults to
            ``_wordnet.NOUN``.

        .. versionadded:: 0.8.1
        """
        if pos is None:
            tag = _wordnet.NOUN
        elif pos in _wordnet._FILEMAP.keys():
            # Already a WordNet POS constant
            tag = pos
        else:
            # Assume a Penn Treebank tag and convert it
            tag = _penn_to_wordnet(pos)
        lemmatizer = nltk.stem.WordNetLemmatizer()
        return lemmatizer.lemmatize(self.string, tag)

    PorterStemmer = nltk.stem.porter.PorterStemmer()
    LancasterStemmer = nltk.stem.lancaster.LancasterStemmer()
    SnowballStemmer = nltk.stem.snowball.SnowballStemmer("english")

    def stem(self, stemmer=PorterStemmer):
        """Stem a word using various NLTK stemmers. (Default: Porter Stemmer)

        .. versionadded:: 0.12.0
        """
        return stemmer.stem(self.string)

    @cached_property
    def synsets(self):
        """The list of Synset objects for this Word.

        :rtype: list of Synsets

        .. versionadded:: 0.7.0
        """
        return self.get_synsets(pos=None)

    @cached_property
    def definitions(self):
        """The list of definitions for this word. Each definition corresponds
        to a synset.

        .. versionadded:: 0.7.0
        """
        return self.define(pos=None)

    def get_synsets(self, pos=None):
        """Return a list of Synset objects for this word.

        :param pos: A part-of-speech tag to filter upon. If ``None``, all
            synsets for all parts of speech will be loaded.

        :rtype: list of Synsets

        .. versionadded:: 0.7.0
        """
        return _wordnet.synsets(self.string, pos)

    def define(self, pos=None):
        """Return a list of definitions for this word. Each definition
        corresponds to a synset for this word.

        :param pos: A part-of-speech tag to filter upon. If ``None``,
            definitions for all parts of speech will be loaded.
        :rtype: List of strings

        .. versionadded:: 0.7.0
        """
        return [syn.definition() for syn in self.get_synsets(pos=pos)]


class WordList(list):
    """A list-like collection of words."""

    def __init__(self, collection):
        """Initialize a WordList. Takes a collection of strings as
        its only argument.
        """
        super().__init__([Word(w) for w in collection])

    def __str__(self):
        """Returns a string representation for printing."""
        return super().__repr__()

    def __repr__(self):
        """Returns a string representation for debugging."""
        class_name = self.__class__.__name__
        return f"{class_name}({super().__repr__()})"

    def __getitem__(self, key):
        """Returns a string at the given index."""
        item = super().__getitem__(key)
        if isinstance(key, slice):
            # Slicing returns a new WordList rather than a plain list
            return self.__class__(item)
        else:
            return item

    def __getslice__(self, i, j):
        # Only used by Python 2; kept for backwards compatibility
        return self.__class__(super().__getslice__(i, j))

    def __setitem__(self, index, obj):
        """Places object at given index, replacing existing item. If the object
        is a string, inserts a :class:`Word` object.
        """
        if isinstance(obj, basestring):
            super().__setitem__(index, Word(obj))
        else:
            super().__setitem__(index, obj)

    def count(self, strg, case_sensitive=False, *args, **kwargs):
        """Get the count of a word or phrase ``strg`` within this WordList.

        :param strg: The string to count.
        :param case_sensitive: A boolean, whether or not the search is
            case-sensitive.
        """
        if not case_sensitive:
            return [word.lower() for word in self].count(strg.lower(), *args, **kwargs)
        return super().count(strg, *args, **kwargs)

    def append(self, obj):
        """Append an object to end. If the object is a string, appends a
        :class:`Word` object.
        """
        if isinstance(obj, basestring):
            super().append(Word(obj))
        else:
            super().append(obj)

    def extend(self, iterable):
        """Extend WordList by appending elements from ``iterable``. If an
        element is a string, appends a :class:`Word` object.
        """
        for e in iterable:
            self.append(e)

    def upper(self):
        """Return a new WordList with each word upper-cased."""
        return self.__class__([word.upper() for word in self])

    def lower(self):
        """Return a new WordList with each word lower-cased."""
        return self.__class__([word.lower() for word in self])

    def singularize(self):
        """Return the singular version of each word in this WordList."""
        return self.__class__([word.singularize() for word in self])

    def pluralize(self):
        """Return the plural version of each word in this WordList."""
        return self.__class__([word.pluralize() for word in self])

    def lemmatize(self):
        """Return the lemma of each word in this WordList."""
        return self.__class__([word.lemmatize() for word in self])

    def stem(self, *args, **kwargs):
        """Return the stem for each word in this WordList."""
        return self.__class__([word.stem(*args, **kwargs) for word in self])


def _validated_param(obj, name, base_class, default, base_class_name=None):
    """Validates a parameter passed to __init__. Makes sure that obj is
    the correct class. Returns obj if it is not None; otherwise falls back
    to default.

    :param obj: The object passed in.
    :param name: The name of the parameter.
    :param base_class: The class that obj must inherit from.
    :param default: The default object to fall back upon if obj is None.
    """
    base_class_name = base_class_name if base_class_name else base_class.__name__
    if obj is not None and not isinstance(obj, base_class):
        raise ValueError(f"{name} must be an instance of {base_class_name}")
    return obj or default


def _initialize_models(
    obj, tokenizer, pos_tagger, np_extractor, analyzer, parser, classifier
):
    """Common initialization between BaseBlob and Blobber classes."""
    # The tokenizer may be a textblob tokenizer or an NLTK tokenizer
    obj.tokenizer = _validated_param(
        tokenizer,
        "tokenizer",
        base_class=(BaseTokenizer, nltk.tokenize.api.TokenizerI),
        default=BaseBlob.tokenizer,
        base_class_name="BaseTokenizer",
    )
    obj.np_extractor = _validated_param(
        np_extractor,
        "np_extractor",
        base_class=BaseNPExtractor,
        default=BaseBlob.np_extractor,
    )
    obj.pos_tagger = _validated_param(
        pos_tagger, "pos_tagger", BaseTagger, BaseBlob.pos_tagger
    )
    obj.analyzer = _validated_param(
        analyzer, "analyzer", BaseSentimentAnalyzer, BaseBlob.analyzer
    )
    obj.parser = _validated_param(parser, "parser", BaseParser, BaseBlob.parser)
    obj.classifier = classifier


class BaseBlob(StringlikeMixin, BlobComparableMixin):
    """An abstract base class that all textblob classes will inherit from.
    Includes words, POS tag, NP, and word count properties. Also includes
    basic dunder and string methods for making objects like Python strings.

    :param text: A string.
    :param tokenizer: (optional) A tokenizer instance. If ``None``,
        defaults to a :class:`WordTokenizer` instance.
    :param np_extractor: (optional) An NPExtractor instance. If ``None``,
        defaults to a :class:`FastNPExtractor` instance.
    :param pos_tagger: (optional) A Tagger instance. If ``None``,
        defaults to a :class:`NLTKTagger` instance.
    :param analyzer: (optional) A sentiment analyzer. If ``None``,
        defaults to a :class:`PatternAnalyzer` instance.
    :param parser: A parser. If ``None``, defaults to a
        :class:`PatternParser` instance.
    :param classifier: A classifier.

    .. versionchanged:: 0.6.0
        ``clean_html`` parameter deprecated, as it was in NLTK.
    """

    np_extractor = FastNPExtractor()
    pos_tagger = NLTKTagger()
    tokenizer = WordTokenizer()
    analyzer = PatternAnalyzer()
    parser = PatternParser()

    def __init__(
        self,
        text,
        tokenizer=None,
        pos_tagger=None,
        np_extractor=None,
        analyzer=None,
        parser=None,
        classifier=None,
        clean_html=False,
    ):
        if not isinstance(text, basestring):
            raise TypeError(
                "The `text` argument passed to `__init__(text)` "
                f"must be a string, not {type(text)}"
            )
        if clean_html:
            raise NotImplementedError(
                "clean_html has been deprecated. "
                "To remove HTML markup, use BeautifulSoup's "
                "get_text() function"
            )
        self.raw = self.string = text
        self.stripped = lowerstrip(self.raw, all=True)
        _initialize_models(
            self, tokenizer, pos_tagger, np_extractor, analyzer, parser, classifier
        )

    @cached_property
    def words(self):
        """Return a list of word tokens. This excludes punctuation characters.
        If you want to include punctuation characters, access the ``tokens``
        property.

        :returns: A :class:`WordList` of word tokens.
        """
        return WordList(word_tokenize(self.raw, include_punc=False))

    @cached_property
    def tokens(self):
        """Return a list of tokens, using this blob's tokenizer object
        (defaults to :class:`WordTokenizer`).
        """
        return WordList(self.tokenizer.tokenize(self.raw))

    def tokenize(self, tokenizer=None):
        """Return a list of tokens, using ``tokenizer``.

        :param tokenizer: (optional) A tokenizer object. If ``None``, defaults
            to this blob's default tokenizer.
        """
        t = tokenizer if tokenizer is not None else self.tokenizer
        return WordList(t.tokenize(self.raw))

    def parse(self, parser=None):
        """Parse the text.

        :param parser: (optional) A parser instance. If ``None``, defaults to
            this blob's default parser.

        .. versionadded:: 0.6.0
        """
        p = parser if parser is not None else self.parser
        return p.parse(self.raw)

    def classify(self):
        """Classify the blob using the blob's ``classifier``."""
        if self.classifier is None:
            raise NameError("This blob has no classifier. Train one first!")
        return self.classifier.classify(self.raw)

    @cached_property
    def sentiment(self):
        """Return a tuple of form (polarity, subjectivity) where polarity
        is a float within the range [-1.0, 1.0] and subjectivity is a float
        within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is
        very subjective.

        :rtype: namedtuple of the form ``Sentiment(polarity, subjectivity)``
        """
        return self.analyzer.analyze(self.raw)

    @cached_property
    def sentiment_assessments(self):
        """Return a tuple of form (polarity, subjectivity, assessments) where
        polarity is a float within the range [-1.0, 1.0], subjectivity is a
        float within the range [0.0, 1.0] where 0.0 is very objective and 1.0
        is very subjective, and assessments is a list of polarity and
        subjectivity scores for the assessed tokens.

        :rtype: namedtuple of the form ``Sentiment(polarity, subjectivity,
            assessments)``
        """
        return self.analyzer.analyze(self.raw, keep_assessments=True)

    @cached_property
    def polarity(self):
        """Return the polarity score as a float within the range [-1.0, 1.0].

        :rtype: float
        """
        return PatternAnalyzer().analyze(self.raw)[0]

    @cached_property
    def subjectivity(self):
        """Return the subjectivity score as a float within the range
        [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.

        :rtype: float
        """
        return PatternAnalyzer().analyze(self.raw)[1]

    @cached_property
    def noun_phrases(self):
        """Returns a list of noun phrases for this blob."""
        return WordList(
            [
                phrase.strip().lower()
                for phrase in self.np_extractor.extract(self.raw)
                if len(phrase) > 1
            ]
        )

    @cached_property
    def pos_tags(self):
        """Returns a list of tuples of the form (word, POS tag).

        Example:
        ::

            [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
                    ('Thursday', 'NNP'), ('morning', 'NN')]

        :rtype: list of tuples
        """
        if isinstance(self, TextBlob):
            # Tag sentence by sentence, then flatten the results
            return [
                val
                for sublist in [s.pos_tags for s in self.sentences]
                for val in sublist
            ]
        else:
            # Skip tokens whose tag is punctuation
            return [
                (Word(str(word), pos_tag=t), str(t))
                for word, t in self.pos_tagger.tag(self)
                if not PUNCTUATION_REGEX.match(str(t))
            ]

    tags = pos_tags

    @cached_property
    def word_counts(self):
        """Dictionary of word frequencies in this text."""
        counts = defaultdict(int)
        stripped_words = [lowerstrip(word) for word in self.words]
        for word in stripped_words:
            counts[word] += 1
        return counts

    @cached_property
    def np_counts(self):
        """Dictionary of noun phrase frequencies in this text."""
        counts = defaultdict(int)
        for phrase in self.noun_phrases:
            counts[phrase] += 1
        return counts

    def ngrams(self, n=3):
        """Return a list of n-grams (tuples of n successive words) for this
        blob.

        :rtype: List of :class:`WordList` objects
        """
        if n <= 0:
            return []
        grams = [
            WordList(self.words[i : i + n]) for i in range(len(self.words) - n + 1)
        ]
        return grams

    def correct(self):
        """Attempt to correct the spelling of a blob.

        .. versionadded:: 0.6.0

        :rtype: :class:`BaseBlob`
        """
        # Tokens are words, single punctuation marks, or whitespace, so that
        # joining the corrected tokens preserves the original spacing
        tokens = nltk.tokenize.regexp_tokenize(self.raw, r"\w+|[^\w\s]|\s")
        corrected = (Word(w).correct() for w in tokens)
        ret = "".join(corrected)
        return self.__class__(ret)

    def _cmpkey(self):
        """Key used by ComparableMixin to implement all rich comparison
        operators.
        """
        return self.raw

    def _strkey(self):
        """Key used by StringlikeMixin to implement string methods."""
        return self.raw

    def __hash__(self):
        return hash(self._cmpkey())

    def __add__(self, other):
        """Concatenates two text objects the same way Python strings are
        concatenated.

        Arguments:
        - `other`: a string or a text object
        """
        if isinstance(other, basestring):
            return self.__class__(self.raw + other)
        elif isinstance(other, BaseBlob):
            return self.__class__(self.raw + other.raw)
        else:
            raise TypeError(
                f"Operands must be either strings or {self.__class__.__name__} objects"
            )

    def split(self, sep=None, maxsplit=sys.maxsize):
        """Behaves like the built-in str.split() except returns a
        WordList.

        :rtype: :class:`WordList`
        """
        return WordList(self._strkey().split(sep, maxsplit))


class TextBlob(BaseBlob):
    """A general text block, meant for larger bodies of text (especially
    those containing sentences). Inherits from :class:`BaseBlob`.

    :param str text: A string.
    :param tokenizer: (optional) A tokenizer instance. If ``None``, defaults
        to a :class:`WordTokenizer` instance.
    :param np_extractor: (optional) An NPExtractor instance. If ``None``,
        defaults to a :class:`FastNPExtractor` instance.
    :param pos_tagger: (optional) A Tagger instance. If ``None``, defaults to
        a :class:`NLTKTagger` instance.
    :param analyzer: (optional) A sentiment analyzer. If ``None``, defaults to
        a :class:`PatternAnalyzer` instance.
    :param classifier: (optional) A classifier.
    """

    @cached_property
    def sentences(self):
        """Return list of :class:`Sentence` objects."""
        return self._create_sentence_objects()

    @cached_property
    def words(self):
        """Return a list of word tokens. This excludes punctuation characters.
        If you want to include punctuation characters, access the ``tokens``
        property.

        :returns: A :class:`WordList` of word tokens.
        """
        return WordList(word_tokenize(self.raw, include_punc=False))

    @property
    def raw_sentences(self):
        """List of strings, the raw sentences in the blob."""
        return [sentence.raw for sentence in self.sentences]

    @property
    def serialized(self):
        """Returns a list of each sentence's dict representation."""
        return [sentence.dict for sentence in self.sentences]

    def to_json(self, *args, **kwargs):
        """Return a json representation (str) of this blob.
        Takes the same arguments as json.dumps.

        .. versionadded:: 0.5.1
        """
        return json.dumps(self.serialized, *args, **kwargs)

    @property
    def json(self):
        """The json representation of this blob.

        .. versionchanged:: 0.5.1
            Made ``json`` a property instead of a method to restore backwards
            compatibility that was broken after version 0.4.0.
        """
        return self.to_json()

    def _create_sentence_objects(self):
        """Returns a list of Sentence objects from the raw text."""
        sentence_objects = []
        sentences = sent_tokenize(self.raw)
        char_index = 0  # Keeps track of the character index within the blob
        for sent in sentences:
            # Compute the start and end indices of the sentence within the blob
            start_index = self.raw.index(sent, char_index)
            char_index += len(sent)
            end_index = start_index + len(sent)
            # Sentences share the same models as their parent blob
            s = Sentence(
                sent,
                start_index=start_index,
                end_index=end_index,
                tokenizer=self.tokenizer,
                np_extractor=self.np_extractor,
                pos_tagger=self.pos_tagger,
                analyzer=self.analyzer,
                parser=self.parser,
                classifier=self.classifier,
            )
            sentence_objects.append(s)
        return sentence_objects


class Sentence(BaseBlob):
    """A sentence within a TextBlob. Inherits from :class:`BaseBlob`.

    :param sentence: A string, the raw sentence.
    :param start_index: An int, the index where this sentence begins
        in a TextBlob. If not given, defaults to 0.
    :param end_index: An int, the index where this sentence ends in
        a TextBlob. If not given, defaults to the length of the
        sentence - 1.
    """

    def __init__(self, sentence, start_index=0, end_index=None, *args, **kwargs):
        super().__init__(sentence, *args, **kwargs)
        #: The start index within a TextBlob
        self.start = self.start_index = start_index
        #: The end index within a TextBlob
        self.end = self.end_index = end_index or len(sentence) - 1

    @property
    def dict(self):
        """The dict representation of this sentence."""
        return {
            "raw": self.raw,
            "start_index": self.start_index,
            "end_index": self.end_index,
            "stripped": self.stripped,
            "noun_phrases": self.noun_phrases,
            "polarity": self.polarity,
            "subjectivity": self.subjectivity,
        }


class Blobber:
    """A factory for TextBlobs that all share the same tagger,
    tokenizer, parser, classifier, and np_extractor.

    Usage:

        >>> from textblob import Blobber
        >>> from textblob.taggers import NLTKTagger
        >>> from textblob.tokenizers import SentenceTokenizer
        >>> tb = Blobber(pos_tagger=NLTKTagger(), tokenizer=SentenceTokenizer())
        >>> blob1 = tb("This is one blob.")
        >>> blob2 = tb("This blob has the same tagger and tokenizer.")
        >>> blob1.pos_tagger is blob2.pos_tagger
        True

    :param tokenizer: (optional) A tokenizer instance. If ``None``,
        defaults to a :class:`WordTokenizer` instance.
    :param np_extractor: (optional) An NPExtractor instance. If ``None``,
        defaults to a :class:`FastNPExtractor` instance.
    :param pos_tagger: (optional) A Tagger instance. If ``None``,
        defaults to a :class:`NLTKTagger` instance.
    :param analyzer: (optional) A sentiment analyzer. If ``None``,
        defaults to a :class:`PatternAnalyzer` instance.
    :param parser: A parser. If ``None``, defaults to a
        :class:`PatternParser` instance.
    :param classifier: A classifier.

    .. versionadded:: 0.4.0
    """

    np_extractor = FastNPExtractor()
    pos_tagger = NLTKTagger()
    tokenizer = WordTokenizer()
    analyzer = PatternAnalyzer()
    parser = PatternParser()

    def __init__(
        self,
        tokenizer=None,
        pos_tagger=None,
        np_extractor=None,
        analyzer=None,
        parser=None,
        classifier=None,
    ):
        _initialize_models(
            self, tokenizer, pos_tagger, np_extractor, analyzer, parser, classifier
        )

    def __call__(self, text):
        """Return a new TextBlob object with this Blobber's ``np_extractor``,
        ``pos_tagger``, ``tokenizer``, ``analyzer``, ``parser``, and
        ``classifier``.

        :returns: A new :class:`TextBlob`.
        """
        return TextBlob(
            text,
            tokenizer=self.tokenizer,
            pos_tagger=self.pos_tagger,
            np_extractor=self.np_extractor,
            analyzer=self.analyzer,
            parser=self.parser,
            classifier=self.classifier,
        )

    def __repr__(self):
        classifier_name = (
            self.classifier.__class__.__name__ + "()" if self.classifier else "None"
        )
        return (
            f"Blobber(tokenizer={self.tokenizer.__class__.__name__}(), "
            f"pos_tagger={self.pos_tagger.__class__.__name__}(), "
            f"np_extractor={self.np_extractor.__class__.__name__}(), "
            f"analyzer={self.analyzer.__class__.__name__}(), "
            f"parser={self.parser.__class__.__name__}(), "
            f"classifier={classifier_name})"
        )

    __str__ = __repr__