"""Various tokenizer implementations.

.. versionadded:: 0.4.0
"""
from itertools import chain

import nltk

from textblob.base import BaseTokenizer
from textblob.decorators import requires_nltk_corpus
from textblob.utils import strip_punc


class WordTokenizer(BaseTokenizer):
    """NLTK's recommended word tokenizer (currently the TreebankWordTokenizer).
    Uses regular expressions to tokenize text. Assumes text has already been
    segmented into sentences.

    Performs the following steps:

    * split standard contractions, e.g. don't -> do n't
    * split commas and single quotes
    * separate periods that appear at the end of line
    """

    def tokenize(self, text, include_punc=True):
        """Return a list of word tokens.

        :param text: string of text.
        :param include_punc: (optional) whether to include punctuation as separate
            tokens. Default to True.
        """
        tokens = nltk.tokenize.word_tokenize(text)
        if include_punc:
            return tokens
        else:
            # Return each word token, stripping punctuation unless the token
            # comes from a contraction, e.g.
            #   "Let's"  => ["Let", "'s"]
            #   "Can't"  => ["Ca", "n't"]
            #   "home."  => ["home"]
            return [
                word if word.startswith("'") else strip_punc(word, all=False)
                for word in tokens
                if strip_punc(word, all=False)
            ]


class SentenceTokenizer(BaseTokenizer):
    """NLTK's sentence tokenizer (currently PunktSentenceTokenizer).
    Uses an unsupervised algorithm to build a model for abbreviation words,
    collocations, and words that start sentences, then uses that to find
    sentence boundaries.
    """

    @requires_nltk_corpus
    def tokenize(self, text):
        """Return a list of sentences."""
        return nltk.tokenize.sent_tokenize(text)


#: Convenience function for tokenizing sentences
sent_tokenize = SentenceTokenizer().itokenize

_word_tokenizer = WordTokenizer()  # Singleton word tokenizer


def word_tokenize(text, include_punc=True, *args, **kwargs):
    """Convenience function for tokenizing text into words.

    NOTE: NLTK's word tokenizer expects sentences as input, so the text will be
    tokenized to sentences before being tokenized to words.
    """
    words = chain.from_iterable(
        _word_tokenizer.itokenize(sentence, include_punc, *args, **kwargs)
        for sentence in sent_tokenize(text)
    )
    return words
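# ---------------------------------------------------------------------------
# Usage sketch (illustrative only, not part of the library). Assumes textblob
# and the NLTK "punkt" sentence-tokenizer data are installed -- e.g. via
# `python -m textblob.download_corpora`.
if __name__ == "__main__":
    from textblob.tokenizers import SentenceTokenizer, WordTokenizer, word_tokenize

    wt = WordTokenizer()
    # With include_punc=True (the default), punctuation tokens are kept.
    tokens = wt.tokenize("Let's code, carefully.")
    # With include_punc=False, punctuation is stripped, but contraction
    # fragments such as "'s" survive because they start with an apostrophe.
    bare = wt.tokenize("Let's code, carefully.", include_punc=False)

    st = SentenceTokenizer()
    sentences = st.tokenize("First sentence. Second one!")

    # word_tokenize() chains per-sentence token iterators lazily
    # (chain.from_iterable), so materialize it with list():
    words = list(word_tokenize("First sentence. Second one!"))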