"""
Tokenizer Interface
"""

from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple

from nltk.internals import overridden
from nltk.tokenize.util import string_span_tokenize


class TokenizerI(ABC):
    """
    A processing interface for tokenizing a string.
    Subclasses must define ``tokenize()`` or ``tokenize_sents()`` (or both).
    """

    @abstractmethod
    def tokenize(self, s: str) -> List[str]:
        """
        Return a tokenized copy of *s*.

        :rtype: List[str]
        """
        if overridden(self.tokenize_sents):
            return self.tokenize_sents([s])[0]

    def span_tokenize(self, s: str) -> Iterator[Tuple[int, int]]:
        """
        Identify the tokens using integer offsets ``(start_i, end_i)``,
        where ``s[start_i:end_i]`` is the corresponding token.

        :rtype: Iterator[Tuple[int, int]]
        """
        raise NotImplementedError()

    def tokenize_sents(self, strings: List[str]) -> List[List[str]]:
        """
        Apply ``self.tokenize()`` to each element of ``strings``.  I.e.:

            return [self.tokenize(s) for s in strings]

        :rtype: List[List[str]]
        """
        return [self.tokenize(s) for s in strings]

    def span_tokenize_sents(
        self, strings: List[str]
    ) -> Iterator[List[Tuple[int, int]]]:
        """
        Apply ``self.span_tokenize()`` to each element of ``strings``.  I.e.:

            return [self.span_tokenize(s) for s in strings]

        :yield: List[Tuple[int, int]]
        """
        for s in strings:
            yield list(self.span_tokenize(s))


class StringTokenizer(TokenizerI):
    """A tokenizer that divides a string into substrings by splitting
    on the specified string (defined in subclasses).
    """

    @property
    @abstractmethod
    def _string(self):
        raise NotImplementedError

    def tokenize(self, s):
        return s.split(self._string)

    def span_tokenize(self, s):
        yield from string_span_tokenize(s, self._string)
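

# ----------------------------------------------------------------------
# Example usage (an illustrative sketch, not part of the NLTK source):
# a minimal StringTokenizer subclass only needs to supply ``_string``;
# ``tokenize()`` and ``span_tokenize()`` are inherited.  The class name
# ``CommaTokenizer`` is hypothetical.
class CommaTokenizer(StringTokenizer):
    # Split on a literal comma; spans cover the text between separators.
    _string = ","

# CommaTokenizer().tokenize("a,b,c")             -> ['a', 'b', 'c']
# list(CommaTokenizer().span_tokenize("a,b,c"))  -> [(0, 1), (2, 3), (4, 5)]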