nltk.stem.snowball

Snowball stemmers

This module provides a port of the Snowball stemmers
developed by Martin Porter.

There is also a demo function: `snowball.demo()`.

The module imports re, stopwords (from nltk.corpus), porter (from nltk.stem),
StemmerI (from nltk.stem.api), and prefix_replace and suffix_replace (from
nltk.stem.util). It defines the function demo() and the following classes:

    SnowballStemmer(StemmerI)
    _LanguageSpecificStemmer(StemmerI)
    PorterStemmer(_LanguageSpecificStemmer, porter.PorterStemmer)
    _ScandinavianStemmer(_LanguageSpecificStemmer)
    _StandardStemmer(_LanguageSpecificStemmer)
    ArabicStemmer(_StandardStemmer)
    DanishStemmer(_ScandinavianStemmer)
    DutchStemmer(_StandardStemmer)
    EnglishStemmer(_StandardStemmer)
    FinnishStemmer(_StandardStemmer)
    FrenchStemmer(_StandardStemmer)
    GermanStemmer(_StandardStemmer)
    HungarianStemmer(_LanguageSpecificStemmer)
    ItalianStemmer(_StandardStemmer)
    NorwegianStemmer(_ScandinavianStemmer)
    PortugueseStemmer(_StandardStemmer)
    RomanianStemmer(_StandardStemmer)
    RussianStemmer(_LanguageSpecificStemmer)
    SpanishStemmer(_StandardStemmer)
    SwedishStemmer(_ScandinavianStemmer)

class SnowballStemmer(StemmerI)

    Snowball Stemmer

    The following languages are supported:
    Arabic, Danish, Dutch, English, Finnish, French, German,
    Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian,
    Spanish and Swedish. The name "porter" selects the original
    Porter algorithm.

    The original Porter algorithm, on which the English stemmer
    is based, is documented here:

        Porter, M. "An algorithm for suffix stripping."
        Program 14.3 (1980): 130-137.

    Most of the algorithms were developed by Martin Porter.
    These stemmers are called Snowball because Porter created
    a programming language with this name for creating
    new stemming algorithms. There is more information available
    at http://snowball.tartarus.org/

    The stemmer is invoked as shown below:

    >>> from nltk.stem import SnowballStemmer # See which languages are supported
    >>> print(" ".join(SnowballStemmer.languages)) # doctest: +NORMALIZE_WHITESPACE
    arabic danish dutch english finnish french german hungarian
    italian norwegian porter portuguese romanian russian
    spanish swedish
    >>> stemmer = SnowballStemmer("german") # Choose a language
    >>> stemmer.stem("Autobahnen") # Stem a word
    'autobahn'

    Invoking the stemmers this way is useful if the language to be
    stemmed is not known until runtime. Alternatively, if you already
    know the language, you can invoke the language-specific stemmer
    directly:

    >>> from nltk.stem.snowball import GermanStemmer
    >>> stemmer = GermanStemmer()
    >>> stemmer.stem("Autobahnen")
    'autobahn'

    :param language: The language whose subclass is instantiated.
    :type language: str or unicode
    :param ignore_stopwords: If set to True, stopwords are
                             not stemmed and returned unchanged.
                             Set to False by default.
    :type ignore_stopwords: bool
    :raise ValueError: If there is no stemmer for the specified
                       language, a ValueError is raised.

    languages = ("arabic", "danish", "dutch", "english", "finnish",
                 "french", "german", "hungarian", "italian", "norwegian",
                 "porter", "portuguese", "romanian", "russian", "spanish",
                 "swedish")

    __init__(self, language, ignore_stopwords=False)
        Raises ValueError("The language '<language>' is not supported.")
        if language is not in languages. Otherwise, looks up the class
        named language.capitalize() + "Stemmer" in this module,
        instantiates it with ignore_stopwords, and exposes that
        instance's stem method and stopwords set as its own.

    stem(self, token)
        Stems token with the wrapped language-specific stemmer.

class _LanguageSpecificStemmer(StemmerI)

    This helper subclass offers the possibility to invoke a specific
    stemmer directly. This is useful if the language to be stemmed is
    already known when the code is written.

    Create an instance of the Snowball stemmer.

    :param ignore_stopwords: If set to True, stopwords are
                             not stemmed and returned unchanged.
                             Set to False by default.
    :type ignore_stopwords: bool

    __init__(self, ignore_stopwords=False)
        Derives the language name from the class name (for example,
        GermanStemmer gives "german"). If ignore_stopwords is True, it
        loads that language's stopword list into self.stopwords. If no
        list is available, it raises a ValueError stating that the
        language has no list of stopwords and asking the caller to set
        'ignore_stopwords' to 'False'.

    __repr__(self)
        Returns "<ClassName>", for example "<GermanStemmer>".

class PorterStemmer(_LanguageSpecificStemmer, porter.PorterStemmer)

    A word stemmer based on the original Porter stemming algorithm.

        Porter, M. "An algorithm for suffix stripping."
        Program 14.3 (1980): 130-137.

    A few minor modifications have been made to Porter's basic
    algorithm. See the source code of the module nltk.stem.porter
    for more information.

    __init__(self, ignore_stopwords=False)
        Initializes both base classes.

class _ScandinavianStemmer(_LanguageSpecificStemmer)

    This subclass encapsulates a method for defining the string region R1.
    It is used by the Danish, Norwegian, and Swedish stemmers.

    _r1_scandinavian(self, word, vowels)
        Return the region R1 that is used by the Scandinavian stemmers.

        R1 is the region after the first non-vowel following a vowel,
        or is the null region at the end of the word if there is no
        such non-vowel. R1 is then adjusted so that the region before
        it contains at least three letters.

        :param word: The word whose region R1 is determined.
        :type word: str or unicode
        :param vowels: The vowels of the respective language that are
                       used to determine the region R1.
        :type vowels: unicode
        :return: the region R1 for the respective word.
        :rtype: unicode
        :note: This helper method is invoked by the respective stem method of
               the subclasses DanishStemmer, NorwegianStemmer, and
               SwedishStemmer. It is not to be invoked directly!

class _StandardStemmer(_LanguageSpecificStemmer)

    This subclass encapsulates two methods for defining the standard
    versions of the string regions R1, R2, and RV.

    _r1r2_standard(self, word, vowels)
        Return the standard interpretations of the string regions R1 and R2.

        R1 is the region after the first non-vowel following a vowel,
        or is the null region at the end of the word if there is no
        such non-vowel.

        R2 is the region after the first non-vowel following a vowel
        in R1, or is the null region at the end of the word if there
        is no such non-vowel.

        :param word: The word whose regions R1 and R2 are determined.
        :type word: str or unicode
        :param vowels: The vowels of the respective language that are
                       used to determine the regions R1 and R2.
        :type vowels: unicode
        :return: (r1, r2), the regions R1 and R2 for the respective word.
        :rtype: tuple
        :note: This helper method is invoked by the respective stem method of
               the subclasses DutchStemmer, EnglishStemmer, FinnishStemmer,
               FrenchStemmer, GermanStemmer, ItalianStemmer,
               PortugueseStemmer, RomanianStemmer, and SpanishStemmer.
               It is not to be invoked directly!
        :note: A detailed description of how to define R1 and R2
               can be found at http://snowball.tartarus.org/texts/r1r2.html

    _rv_standard(self, word, vowels)
        Return the standard interpretation of the string region RV.

        If the second letter is a consonant, RV is the region after the
        next following vowel. If the first two letters are vowels, RV is
        the region after the next following consonant. Otherwise, RV is
        the region after the third letter. If the required vowel or
        consonant cannot be found, or the word has fewer than two
        letters, RV is the null region at the end of the word.

        :param word: The word whose region RV is determined.
        :type word: str or unicode
        :param vowels: The vowels of the respective language that are
                       used to determine the region RV.
        :type vowels: unicode
        :return: the region RV for the respective word.
        :rtype: unicode
        :note: This helper method is invoked by the respective stem method of
               the subclasses ItalianStemmer, PortugueseStemmer,
               RomanianStemmer, and SpanishStemmer. It is not to be
               invoked directly!
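The three region definitions above can be checked with a small standalone sketch. The function names are hypothetical. They mirror the private helpers but are not part of nltk. The English and Danish vowel strings are the ones this module lists for those stemmers. The Spanish vowel set is an assumption made for the RV examples. The sample words are ordinary test cases, not output copied from nltk.

```python
def r1r2_standard(word: str, vowels: str) -> tuple[str, str]:
    """Return (R1, R2) following the _r1r2_standard definition."""

    def after_first_nonvowel_after_vowel(s: str) -> str:
        # Region after the first non-vowel that follows a vowel; "" if none.
        for i in range(1, len(s)):
            if s[i] not in vowels and s[i - 1] in vowels:
                return s[i + 1:]
        return ""

    r1 = after_first_nonvowel_after_vowel(word)
    r2 = after_first_nonvowel_after_vowel(r1)  # the same rule, applied inside R1
    return r1, r2


def rv_standard(word: str, vowels: str) -> str:
    """Return RV following the _rv_standard definition."""
    if len(word) < 2:
        return ""
    if word[1] not in vowels:
        # Second letter is a consonant: RV starts after the next vowel.
        for i in range(2, len(word)):
            if word[i] in vowels:
                return word[i + 1:]
        return ""
    if word[0] in vowels:
        # First two letters are vowels: RV starts after the next consonant.
        for i in range(2, len(word)):
            if word[i] not in vowels:
                return word[i + 1:]
        return ""
    # Consonant followed by a vowel: RV starts after the third letter.
    return word[3:]


def r1_scandinavian(word: str, vowels: str) -> str:
    """Standard R1, moved right if needed so at least 3 letters precede it."""
    for i in range(1, len(word)):
        if word[i] not in vowels and word[i - 1] in vowels:
            return word[max(i + 1, 3):]
    return ""


ENGLISH_VOWELS = "aeiouy"
DANISH_VOWELS = "aeiouyæåø"
SPANISH_VOWELS = "aeiouáéíóúü"  # assumed vowel set for the RV examples

print(r1r2_standard("beautiful", ENGLISH_VOWELS))    # ('iful', 'ul')
print(rv_standard("macho", SPANISH_VOWELS))          # ho
print(r1_scandinavian("opdragelse", DANISH_VOWELS))  # ragelse
```

Computing R2 as a second pass over R1 follows the docstring wording directly: R2 is the R1 rule applied again, starting inside R1.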
class ArabicStemmer(_StandardStemmer)

    https://github.com/snowballstem/snowball/blob/master/algorithms/arabic/stem_Unicode.sbl (Original Algorithm)
    The Snowball Arabic light Stemmer

    Algorithm authors:
        - Assem Chelli
        - Abdelkrim Aries
        - Lakhdar Benzahia

    NLTK version author:
        - Lakhdar Benzahia

    __vocalization = re.compile(r"[\u064b-\u064c-\u064d-\u064e-\u064f-\u0650-\u0651-\u0652]")
    __kasheeda = re.compile(r"[\u0640]")
    __arabic_punctuation_marks = re.compile(r"[\u060C-\u061B-\u061F]")
    __last_hamzat = ("أ", "إ", "آ", "ؤ", "ئ")
    __initial_hamzat = re.compile(r"^[\u0622\u0623\u0625]")
    __waw_hamza = re.compile(r"[\u0624]")
    __yeh_hamza = re.compile(r"[\u0626]")
    __alefat = re.compile(r"[\u0623\u0622\u0625]")

    __normalize_pre(self, token)
        :param token: string
        :return: the normalized token (string)

        Removes vocalization marks, the kasheeda, and Arabic punctuation
        marks.

    __normalize_post(self, token)
        Replaces a word-final hamza form from __last_hamzat with a bare
        hamza (ء), then normalizes initial hamza, waw-hamza, yeh-hamza,
        and alef forms.

    __checks_1(self, token)
        If the token starts with one of the article prefixes كال, بال,
        ال, or لل and is long enough, marks it as a defined noun
        (is_noun and is_defined are set, is_verb is cleared).
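The sketch below approximates what __normalize_pre does. It assumes the intent is to remove the harakat (U+064B to U+0652), the tatweel (U+0640), and Arabic punctuation. Read as Python regular expressions, the hyphen-separated patterns above form character ranges and also match a literal '-'. The punctuation pattern covers the whole range U+060C to U+061B plus U+061F. The sketch is deliberately narrower: it uses one explicit diacritic range and only the three punctuation marks named in that pattern. normalize_pre is a hypothetical standalone function, not the private method.

```python
import re

# Assumed character sets, written as explicit ranges:
HARAKAT = re.compile("[\u064B-\u0652]")           # tanween, short vowels, shadda, sukun
TATWEEL = re.compile("\u0640")                    # kasheeda (tatweel)
PUNCTUATION = re.compile("[\u060C\u061B\u061F]")  # Arabic comma, semicolon, question mark


def normalize_pre(token: str) -> str:
    """Remove diacritics, tatweel and three Arabic punctuation marks."""
    token = HARAKAT.sub("", token)
    token = TATWEEL.sub("", token)
    return PUNCTUATION.sub("", token)


# "kitaabun" written with full vocalization (kasra, fatha, dammatan).
vocalized = "\u0643\u0650\u062a\u064e\u0627\u0628\u064c"
print(normalize_pre(vocalized) == "\u0643\u062a\u0627\u0628")  # True
print(normalize_pre("a-b"))  # a-b (hyphens are kept)
```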
    __checks_2(self, token)
        If the token ends in ة or ات and is long enough, marks it as a
        noun (is_noun is set, is_verb is cleared).

    Suffix helpers. Each takes a token and returns it, possibly with one
    suffix removed. Several also set a success flag (for example
    suffix_verb_step2a_success) that stem uses to choose the next step:
    __Suffix_Verb_Step1, __Suffix_Verb_Step2a, __Suffix_Verb_Step2c,
    __Suffix_Verb_Step2b, __Suffix_Noun_Step2c2, and __Suffix_Noun_Step1a.
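The suffix helpers share a common control flow. Candidate suffixes are tried in order. A suffix is removed only if the token meets a minimum length. A success flag then decides which step runs next. The sketch below shows that control flow only. The Latin-script suffixes, the thresholds, and the function names are hypothetical stand-ins for the Arabic suffix tables.

```python
# Hypothetical rules: (suffix, minimum token length including the suffix).
# A longer suffix is tried before a shorter one.
STEP1_RULES = [("ing", 6), ("ed", 5), ("s", 4)]
STEP2_RULES = [("ly", 5)]


def strip_first_suffix(token: str, rules: list[tuple[str, int]]) -> tuple[str, bool]:
    """Remove the first suffix that matches and satisfies its length guard."""
    for suffix, min_len in rules:
        if token.endswith(suffix) and len(token) >= min_len:
            return token[: -len(suffix)], True
    return token, False


def toy_stem(token: str) -> str:
    # The success flag decides whether the fallback step runs,
    # in the spirit of flags such as suffix_verb_step2a_success.
    token, step1_success = strip_first_suffix(token, STEP1_RULES)
    if not step1_success:
        token, _ = strip_first_suffix(token, STEP2_RULES)
    return token


print(toy_stem("walking"), toy_stem("quickly"), toy_stem("sing"))  # walk quick sing
```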
    Further suffix helpers: __Suffix_Noun_Step2a, __Suffix_Noun_Step2b,
    __Suffix_Noun_Step2c1, __Suffix_Noun_Step1b, __Suffix_Noun_Step3, and
    __Suffix_All_alef_maqsura, which rewrites a word-final alef maqsura
    (ى) with suffix_replace.
    Prefix helpers: __Prefix_Step1, __Prefix_Step2a, __Prefix_Step2b,
    __Prefix_Step3a_Noun, __Prefix_Step3b_Noun, __Prefix_Step3_Verb, and
    __Prefix_Step4_Verb.

    stem(self, word)
        Stem an Arabic word and return the stemmed form.
        :param word: string
        :return: string

        The helpers run in this order:
            1. __checks_1 and __checks_2, which guess whether the word
               is a noun or a verb.
            2. __normalize_pre. Stopwords and very short words are
               returned at this point.
            3. The verb-suffix steps, when the word may be a verb.
            4. The noun-suffix steps, when it may be a noun.
            5. __Suffix_All_alef_maqsura.
            6. The prefix steps.
            7. __normalize_post.

class DanishStemmer(_ScandinavianStemmer)

    The Danish Snowball stemmer.

    :cvar __vowels: The Danish vowels.
    :type __vowels: unicode
    :cvar __consonants: The Danish consonants.
    :type __consonants: unicode
    :cvar __double_consonants: The Danish double consonants.
    :type __double_consonants: tuple
    :cvar __s_ending: Letters that may directly appear before a word final 's'.
    :type __s_ending: unicode
    :cvar __step1_suffixes: Suffixes to be deleted in step 1 of the algorithm.
    :type __step1_suffixes: tuple
    :cvar __step2_suffixes: Suffixes to be deleted in step 2 of the algorithm.
    :type __step2_suffixes: tuple
    :cvar __step3_suffixes: Suffixes to be deleted in step 3 of the algorithm.
    :type __step3_suffixes: tuple
    :note: A detailed description of the Danish
           stemming algorithm can be found under
           http://snowball.tartarus.org/algorithms/danish/stemmer.html

    __vowels = "aeiouyæåø"
    __consonants = "bcdfghjklmnpqrstvwxz"
    __double_consonants = ("bb", "cc", "dd", "ff", "gg", "hh", "jj", "kk",
                           "ll", "mm", "nn", "pp", "qq", "rr", "ss", "tt",
                           "vv", "ww", "xx", "zz")
    __s_ending = "abcdfghjklmnoprtvyzå"
    __step1_suffixes = ("erendes", "erende", "hedens", "ethed", "erede",
                        "heden", "heder", "endes", "ernes", "erens",
                        "erets", "ered", "ende", "erne", "eren", "erer",
                        "heds", "enes", "eres", "eret", "hed", "ene",
                        "ere", "ens", "ers", "ets", "en", "er", "es",
                        "et", "e", "s")
    __step2_suffixes = ("gd", "dt", "gt", "kt")
    __step3_suffixes = ("elig", "løst", "lig", "els", "ig")

    stem(self, word)
        Stem a Danish word and return the stemmed form.

        :param word: The word that is stemmed.
        :type word: str or unicode
        :return: The stemmed form.
        :rtype: unicode

class DutchStemmer(_StandardStemmer)

    The Dutch Snowball stemmer.

    :cvar __vowels: The Dutch vowels.
    :type __vowels: unicode
    :cvar __step1_suffixes: Suffixes to be deleted in step 1 of the algorithm.
    :type __step1_suffixes: tuple
    :cvar __step3b_suffixes: Suffixes to be deleted in step 3b of the algorithm.
    :type __step3b_suffixes: tuple
    :note: A detailed description of the Dutch
           stemming algorithm can be found under
           http://snowball.tartarus.org/algorithms/dutch/stemmer.html

    __vowels = "aeiouyè"
    __step1_suffixes = ("heden", "ene", "en", "se", "s")
    __step3b_suffixes = ("baar", "lijk", "bar", "end", "ing", "ig")

    stem(self, word)
        Stem a Dutch word and return the stemmed form.

        :param word: The word that is stemmed.
        :type word: str or unicode
        :return: The stemmed form.
        :rtype: unicode
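Before its suffix steps, the Dutch stemmer normalizes the word. First, accented vowels are folded to plain vowels. Then three letters are upper-cased so that later steps treat them as consonants: an initial y, a y after a vowel, and an i between two vowels. The following is a standalone sketch of that pass. dutch_preprocess is a hypothetical name. The accent list (ä, á, ë, é, ï, í, ö, ó, ü, ú) is the one this stemmer folds.

```python
DUTCH_VOWELS = "aeiouyè"  # __vowels of DutchStemmer
# Accented vowels folded to their plain forms.
UNACCENT = str.maketrans("äáëéïíöóüú", "aaeeiioouu")


def dutch_preprocess(word: str) -> str:
    """Lower-case, remove accents, and mark consonantal y and i."""
    chars = list(word.lower().translate(UNACCENT))
    # An initial y is treated as a consonant.
    if chars and chars[0] == "y":
        chars[0] = "Y"
    # A y directly after a vowel is treated as a consonant.
    for i in range(1, len(chars)):
        if chars[i] == "y" and chars[i - 1] in DUTCH_VOWELS:
            chars[i] = "Y"
    # An i between two vowels is treated as a consonant.
    for i in range(1, len(chars) - 1):
        if chars[i] == "i" and chars[i - 1] in DUTCH_VOWELS and chars[i + 1] in DUTCH_VOWELS:
            chars[i] = "I"
    return "".join(chars)


print(dutch_preprocess("Fraaie"), dutch_preprocess("yoghurt"), dutch_preprocess("café"))
# fraaIe Yoghurt cafe
```

Y and I are not in the vowel string, so the region computations that follow treat them as consonants. The stemmer lower-cases them again at the end.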
??3 77Cab?+Dq#d)$ ?AAE{dmm+Q3wwRa#tAEG}=> ?q#d)a-( ?AQU t}},GsNQK4==0wwRa#tAEG}=>  ?$$T4==9Bq#d)$ AAwdmm+QU t}}0Ls4!a%=)-A-abBgA'1,K  ++ F{{6"W$)$?D'FF;B{{7++B?m+ MM'2c&k\A-.dmmCc&k\A-V =F3v;,/DNs6{l+BNs6{l+B}}%78#CRyWWk)c&k\A-.dmmCc&k\A-.#53v;,/DNs6{l+BNs6{l+B= B ;;s R = M9DCRBCRB}}/0CRyWW ;;v 48s?9DCRBCRB D!HDMM1BK5(CRyWW==!349DCRBCRB,, F{{6"^+9DCRB{{4(T"X_#CRy==);<#'9Dt^RC9Dv%9DCRB{{3'DHDMM,I#CRy==);<#'9Dv%9Du_9D; @ t9>Bxt}},bS2;"::Bxt}}4!wwSb 48T"X'FG||C%--c37 r'N)r-r.r/r0rrrr r1r'r%rjrj-s  H8Chr'rjc$eZdZdZdZdZdZdZdZdZ dZ d Z d Z d Z id d ddddddddddddddddddd d!ddd"d"d#d#d$d$d%d%d&d&id'd'd(d(d)d(d*d*d+d*d,d,d-d,d.d.d/d.d0d0d1d0d2d2d3d2d4d2d5d2d6d6d7d6d6d6d8d8d8d8d9Zd:Zy;)<EnglishStemmera The English Snowball stemmer. :cvar __vowels: The English vowels. :type __vowels: unicode :cvar __double_consonants: The English double consonants. :type __double_consonants: tuple :cvar __li_ending: Letters that may directly appear before a word final 'li'. :type __li_ending: unicode :cvar __step0_suffixes: Suffixes to be deleted in step 0 of the algorithm. :type __step0_suffixes: tuple :cvar __step1a_suffixes: Suffixes to be deleted in step 1a of the algorithm. :type __step1a_suffixes: tuple :cvar __step1b_suffixes: Suffixes to be deleted in step 1b of the algorithm. :type __step1b_suffixes: tuple :cvar __step2_suffixes: Suffixes to be deleted in step 2 of the algorithm. :type __step2_suffixes: tuple :cvar __step3_suffixes: Suffixes to be deleted in step 3 of the algorithm. :type __step3_suffixes: tuple :cvar __step4_suffixes: Suffixes to be deleted in step 4 of the algorithm. :type __step4_suffixes: tuple :cvar __step5_suffixes: Suffixes to be deleted in step 5 of the algorithm. :type __step5_suffixes: tuple :cvar __special_words: A dictionary containing words which have to be stemmed specially. 
    :type __special_words: dict
    :note: A detailed description of the English
           stemming algorithm can be found under
           http://snowball.tartarus.org/algorithms/english/stemmer.html

    __vowels = "aeiouy"
    __double_consonants = ("bb", "dd", "ff", "gg", "mm", "nn", "pp", "rr", "tt")
    __li_ending = "cdeghkmnrt"
    __step0_suffixes = ("'s'", "'s", "'")
    __step1a_suffixes = ("sses", "ied", "ies", "us", "ss", "s")
    __step1b_suffixes = ("eedly", "ingly", "edly", "eed", "ing", "ed")
    __step2_suffixes = ("ization", "ational", "fulness", "ousness",
                        "iveness", "tional", "biliti", "lessli", "entli",
                        "ation", "alism", "aliti", "ousli", "iviti",
                        "fulli", "enci", "anci", "abli", "izer", "ator",
                        "alli", "bli", "ogi", "li")
    __step3_suffixes = ("ational", "tional", "alize", "icate", "iciti",
                        "ative", "ical", "ness", "ful")
    __step4_suffixes = ("ement", "ance", "ence", "able", "ible", "ment",
                        "ant", "ent", "ism", "ate", "iti", "ous", "ive",
                        "ize", "ion", "al", "er", "ic")
    __step5_suffixes = ("e", "l")
    __special_words = {
        "skis": "ski", "skies": "sky", "dying": "die", "lying": "lie",
        "tying": "tie", "idly": "idl", "gently": "gentl", "ugly": "ugli",
        "early": "earli", "only": "onli", "singly": "singl",
        "sky": "sky", "news": "news", "howe": "howe", "atlas": "atlas",
        "cosmos": "cosmos", "bias": "bias", "andes": "andes",
        "inning": "inning", "innings": "inning",
        "outing": "outing", "outings": "outing",
        "canning": "canning", "cannings": "canning",
        "herring": "herring", "herrings": "herring",
        "earring": "earring", "earrings": "earring",
        "proceed": "proceed", "proceeds": "proceed",
        "proceeded": "proceed", "proceeding": "proceed",
        "exceed": "exceed", "exceeds": "exceed",
        "exceeded": "exceed", "exceeding": "exceed",
        "succeed": "succeed", "succeeds": "succeed",
        "succeeded": "succeed", "succeeding": "succeed",
    }
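The sketch below shows how __special_words and __step1a_suffixes are used. It implements step 1a of the English (Porter2) algorithm, preceded by the special-word lookup. It assumes the word has already passed step 0 (apostrophe removal). The function names are hypothetical. SPECIAL_WORDS is a four-entry subset of the table above.

```python
VOWELS = "aeiouy"  # __vowels of EnglishStemmer
# Four entries taken from __special_words, for illustration only.
SPECIAL_WORDS = {"skies": "sky", "dying": "die", "news": "news", "atlas": "atlas"}


def step1a(word: str) -> str:
    """Step 1a: act on the first matching suffix in __step1a_suffixes order."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith(("ied", "ies")):
        # "i" if more than one letter precedes the suffix, otherwise "ie"
        return word[:-2] if len(word) > 4 else word[:-1]
    if word.endswith(("us", "ss")):
        return word  # these endings are left alone
    if word.endswith("s") and any(c in VOWELS for c in word[:-2]):
        # delete s if a vowel occurs somewhere before the letter preceding it
        return word[:-1]
    return word


def stem_step1a(word: str) -> str:
    word = word.lower()
    if word in SPECIAL_WORDS:  # exceptions bypass every rule
        return SPECIAL_WORDS[word]
    return step1a(word)


print([stem_step1a(w) for w in ["caresses", "ties", "cries", "gaps", "gas", "Skies"]])
# ['caress', 'tie', 'cri', 'gap', 'gas', 'sky']
```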
    stem(self, word)
        Stem an English word and return the stemmed form.

        :param word: The word that is stemmed.
        :type word: str or unicode
        :return: The stemmed form.
        :rtype: unicode

        Stopwords, words of at most two letters, and entries of
        __special_words are returned before any rule is applied.
        Typographic apostrophes (’, ‘, ‛) are replaced with "'".
        For words beginning with "gener", "commun", or "arsen", R1 is
        the region after that prefix and R2 is computed within it.
        Otherwise, _r1r2_standard supplies both regions.

class FinnishStemmer(_StandardStemmer)

    The Finnish Snowball stemmer.

    :cvar __vowels: The Finnish vowels.
    :type __vowels: unicode
    :cvar __restricted_vowels: A subset of the Finnish vowels.
    :type __restricted_vowels: unicode
    :cvar __long_vowels: The Finnish vowels in their long forms.
    :type __long_vowels: tuple
    :cvar __consonants: The Finnish consonants.
    :type __consonants: unicode
    :cvar __double_consonants: The Finnish double consonants.
    :type __double_consonants: tuple
    :cvar __step1_suffixes: Suffixes to be deleted in step 1 of the algorithm.
    :type __step1_suffixes: tuple
    :cvar __step2_suffixes: Suffixes to be deleted in step 2 of the algorithm.
    :type __step2_suffixes: tuple
    :cvar __step3_suffixes: Suffixes to be deleted in step 3 of the algorithm.
    :type __step3_suffixes: tuple
    :cvar __step4_suffixes: Suffixes to be deleted in step 4 of the algorithm.
    :type __step4_suffixes: tuple
    :note: A detailed description of the Finnish
           stemming algorithm can be found under
           http://snowball.tartarus.org/algorithms/finnish/stemmer.html

    __vowels = "aeiouyäö"
    __restricted_vowels = "aeiouäö"
    __long_vowels = ("aa", "ee", "ii", "oo", "uu", "ää", "öö")
    __consonants = "bcdfghjklmnpqrstvwxz"
    __double_consonants = ("bb", "cc", "dd", "ff", "gg", "hh", "jj", "kk",
                           "ll", "mm", "nn", "pp", "qq", "rr", "ss", "tt",
                           "vv", "ww", "xx", "zz")
    __step1_suffixes = ("kaan", "kään", "sti", "kin", "han", "hän",
                        "ko", "kö", "pa", "pä")
    __step2_suffixes = ("nsa", "nsä", "mme", "nne", "si", "ni", "an", "än", "en")
    __step3_suffixes = ("siin", "tten", "seen", "han", "hen", "hin", "hon",
                        "hän", "hön", "den", "tta", "ttä", "ssa", "ssä",
                        "sta", "stä", "lla", "llä", "lta", "ltä", "lle",
                        "ksi", "ine", "ta", "tä", "na", "nä", "a", "ä", "n")
    __step4_suffixes = ("impi", "impa", "impä", "immi", "imma", "immä",
                        "mpi", "mpa", "mpä", "mmi", "mma", "mmä",
                        "eja", "ejä")

    stem(self, word)
        Stem a Finnish word and return the stemmed form.

        :param word: The word that is stemmed.
        :type word: str or unicode
        :return: The stemmed form.
        :rtype: unicode

class FrenchStemmer(_StandardStemmer)

    The French Snowball stemmer.

    :cvar __vowels: The French vowels.
    :type __vowels: unicode
    :cvar __step1_suffixes: Suffixes to be deleted in step 1 of the algorithm.
    :type __step1_suffixes: tuple
    :cvar __step2a_suffixes: Suffixes to be deleted in step 2a of the algorithm.
    :type __step2a_suffixes: tuple
    :cvar __step2b_suffixes: Suffixes to be deleted in step 2b of the algorithm.
    :type __step2b_suffixes: tuple
    :cvar __step4_suffixes: Suffixes to be deleted in step 4 of the algorithm.
    :type __step4_suffixes: tuple
    :note: A detailed description of the French
           stemming algorithm can be found under
           http://snowball.tartarus.org/algorithms/french/stemmer.html

    __vowels = "aeiouyâàëéêèïîôûù"
    __step1_suffixes = ("issements", "issement", "atrices", "atrice",
                        "ateurs", "ations", "logies", "usions", "utions",
                        "ements", "amment", "emment", "ances", "iqUes",
                        "ismes", "ables", "istes", "ateur", "ation",
                        "logie", "usion", "ution", "ences", "ement",
                        "euses", "ments", "ance", "iqUe", "isme", "able",
                        "iste", "ence", "ités", "ives", "eaux", "euse",
                        "ment", "eux", "ité", "ive", "ifs", "aux", "if")
    __step2a_suffixes = ("issaIent", "issantes", "iraIent", "issante",
                         "issants", "issions", "irions", "issais",
                         "issait", "issant", "issent", "issiez", "issons",
                         "irais", "irait", "irent", "iriez", "irons",
                         "iront", "isses", "issez", "îmes", "îtes",
                         "irai", "iras", "irez", "isse", "ies", "ira",
                         "ît", "ie", "ir", "is", "it", "i")
    __step2b_suffixes = ("eraIent", "assions", "erions", "assent", "assiez",
                         "èrent", "erais", "erait", "eriez", "erons",
                         "eront", "aIent", "antes", "asses", "ions",
                         "erai", "eras", "erez", "âmes", "âtes", "ante",
                         "ants", "asse", "ées", "era", "iez", "ais", "ait",
                         "ant", "ée", "és", "er", "ez", "ât", "ai", "as",
                         "é", "a")
    __step4_suffixes = ("ière", "Ière", "ier", "Ier", "e", "ë")

    stem(self, word)
        Stem a French word and return the stemmed form.
        :param word: The word that is stemmed.
        :type word: str or unicode
        :return: The stemmed form.
        :rtype: unicode

    __rv_french(self, word, vowels)
        Return the region RV that is used by the French stemmer.

        If the word begins with two vowels, RV is the region after
        the third letter. Otherwise, it is the region after the first
        vowel not at the beginning of the word, or the end of the word
        if these positions cannot be found. (Exceptionally, 'par',
        'col', or 'tap' at the beginning of a word is also taken to
        define RV as the region to their right.)

        :param word: The French word whose region RV is determined.
        :type word: str or unicode
        :param vowels: The French vowels that are used to determine
                       the region RV.
        :type vowels: unicode
        :return: the region RV for the respective French word.
:rtype: unicode :note: This helper method is invoked by the stem method of the subclass FrenchStemmer. It is not to be invoked directly! rIrY)parcoltaprrJrKN)rNrrMrZs r% __rv_frenchzFrenchStemmer.__rv_french s, t9>45Q6!d1g&7!"X q#d),AAw&(!!a%']   r'N) r-r.r/r0rrrrrr rr1r'r%rhrhFs?$DH,Z$J'PPvp "r'rhc.eZdZdZdZdZdZdZdZdZ dZ y ) GermanStemmeraD The German Snowball stemmer. :cvar __vowels: The German vowels. :type __vowels: unicode :cvar __s_ending: Letters that may directly appear before a word final 's'. :type __s_ending: unicode :cvar __st_ending: Letter that may directly appear before a word final 'st'. :type __st_ending: unicode :cvar __step1_suffixes: Suffixes to be deleted in step 1 of the algorithm. :type __step1_suffixes: tuple :cvar __step2_suffixes: Suffixes to be deleted in step 2 of the algorithm. :type __step2_suffixes: tuple :cvar __step3_suffixes: Suffixes to be deleted in step 3 of the algorithm. :type __step3_suffixes: tuple :note: A detailed description of the German stemming algorithm can be found under http://snowball.tartarus.org/algorithms/german/stemmer.html u aeiouyäöü bdfghklmnrt bdfghklmnt)ernemrRrQrSr>rU)estrQrRr )ischlichheitkeitroungr^ikcN |j}||jvr|S|jdd}tdt |dz D]x}||dz |j vs||dz|j vs-||dk(rdj |d|d||dzdf}S||dk(s\dj |d|d ||dzdf}z|j||j \}}tdt |D]e}|||j vs||dz |j vs*d t |d|dzcxkDrd kDr nn|d d}nt |d|dzd k(r|cSn|jD]}|j|s|d vrV|t | d z t | dk(r7|dt | dz }|dt | dz }|dt | dz }nS|dk(r!|d|jvr=|dd}|dd}|dd}n-|dt | }|dt | }|dt | }n|jD]y}|j|s|dk(r2|d|jvrNt |ddd k\r=|dd}|dd}|dd}n-|dt | }|dt | }|dt | }n|jD]}|j|s|dvrdd|t | dz t | vr5d|t | d z t | dz vr|dt | dz }n|dt | }n |dvr.d|t | dz t | vr|dt | }n|dvr_d|t | dz t | vsd|t | dz t | vr|dt | dz }n|dt | }nv|dk(rqd|t | d z t | vr|dt | d z }n@d|t | dz t | vr|dt | dz }n|dt | }n|jddjd d!jd"djddjd d}|S)#z Stem a German word and return the stemmed form. :param word: The word that is stemmed. :type word: str or unicode :return: The stemmed form. 
:rtype: unicode ßr1rJr}rINrrrrKr)rQrSr>rnissrUrrr r)rorr^rYr>)r^rr)rrrRrQrrrrrsryrzr|)r6rrrMrN_GermanStemmer__vowelsrrW_GermanStemmer__step1_suffixesr7_GermanStemmer__s_ending_GermanStemmer__step2_suffixes_GermanStemmer__st_ending_GermanStemmer__step3_suffixes)r!r=rQrPrVrs r%r zGermanStemmer.stemQ sKzz| 4>> !K||FD)q#d)a-( CAAE{dmm+QU t}}0L7c>77D!Hc4A=#ABD!W^77D!Hc4A=#ABD  C$$T4==9Bq#d)$ AAwdmm+QU t}}0Ls4!a%=)-A-abBgA'1,K  ++ F{{6"//c&k\A-V =G 23v;,"23D.s6{lQ./B.s6{lQ./Bs]Bx4??2#CRyWW3v;,/DNs6{l+BNs6{l+B' ,++ F{{6"T>Bx4#3#33D"I!8K#CRyWW3v;,/DNs6{l+BNs6{l+B ++" F{{6"^+CKrzrtrv)*jaitokjeitekjainkjeinkaitokeitekáitokéitekjaimjeimjaidjeideinkainkitekjeikjaikáinkéinkaimeimaideidjaijeiinkaikeikáimáidáikéiméidéikimidreiráir4rQ)ákékuökokekakrSc |j}||jvr|S|j||j|j}|j |j rm|jD]^}|dt|z d|k(sdj|dd|df}|dt|z d|k(rdj|dd|df}n|jD]}|j |s|j |r|dt| }|dt| }|j dr$dj|dddf}t|dd}n4|j d r#dj|ddd f}t|d d }n|jD]O}|j |s|d k(rt||d }t||d }nt||d}t||d}n|jD]s}|j |s|d k(rt||d}t||d}n>|d k(rt||d }t||d }n|dt| }|dt| }n|jD]}|j |s|jD]^}|dt|z d|k(sdj|dd|df}|dt|z d|k(rdj|dd|df}|jD]q}|j |s|dvrt||d}t||d}n=|dvrt||d }t||d }n|dt| }|dt| }n|j D]}|j |s|j |r\|dvrt||d}t||d}n=|dvrt||d }t||d }n|dt| }|dt| }n|j"D]}|j |s|j |r\|dvrt||d}t||d}n=|dvrt||d }t||d }n|dt| }|dt| }n|j$D]`}|j |s|j |r9|dk(rt||d}|S|dk(rt||d }|S|dt| }|S|S)z Stem an Hungarian word and return the stemmed form. :param word: The word that is stemmed. :type word: str or unicode :return: The stemmed form. 
:rtype: unicode rrINrrrtrrsrvr>r+r-r.)r0r1)r2r3r5)r:r6rDrErt)r;r7rFrGrv)rfrgror[rPrh)rirjr4r\rQrkrprq)r6r_HungarianStemmer__r1_hungarian_HungarianStemmer__vowels_HungarianStemmer__digraphsr7!_HungarianStemmer__step1_suffixes$_HungarianStemmer__double_consonantsrNr!_HungarianStemmer__step2_suffixesr!_HungarianStemmer__step3_suffixes!_HungarianStemmer__step4_suffixes!_HungarianStemmer__step5_suffixes!_HungarianStemmer__step6_suffixes!_HungarianStemmer__step7_suffixes!_HungarianStemmer__step8_suffixes!_HungarianStemmer__step9_suffixes)r!r=rPrgrs r%r zHungarianStemmer.stem szz| 4>> !K  t}}doo F ;;t,, -#77  S--3{B77D"ItBx#89D"s;//"5DWWb"gr"v%67  ++ F}}V$;;v&3v;,/DNs6{l+B{{6*!wwSb 3'78+B<V,!wwSb 3'78+B<  ++ F{{6"W$)$rTrUrrr)osrrr)rrrr)rrrrr)rr{)rrrrr)rrterrrr)rrrr)rsr>rQrzr!rr"r#r)chgh)r6rrrMrNr_ItalianStemmer__vowelsrWr\_ItalianStemmer__step0_suffixesr7r_ItalianStemmer__step1_suffixes_ItalianStemmer__step2_suffixes)r!r=rrQrPrVr[rs r%r zItalianStemmer.stem3 sazz| 4>> !K  LL ( WVV $ WVV $ WVV $ WVV $ q#d)$ ?AAE{c!d1gnwwRa#tAEG}=> ? q#d)a-( CAAE{dmm+QU t}}0L7c>77D!Hc4A=#ABD!W^77D!Hc4A=#ABD  C$$T4==9B   tT]] 3++ F{{6"s6{lQ&#f+6:JJ3v;,/DNs6{l+BNs6{l+BNs6{l+BV q(CK<8anderCedesrHerteedeanerLrNrOrPhetastertrQrrRrrSrTrsr>rU)rWvt) hetslovelegrZelovslovlegeigr\r]lovr^cx|j}||jvr|S|j||j}|jD]}|j |s|dvrt ||d}t ||d}nX|dk(r5|d|jvs|ddk(r:|d|jvr)|dd}|dd}n|dt| }|dt| }n|jD]}|j |s|dd}|dd}n|jD]%}|j |s|dt| }|S|S) z Stem a Norwegian word and return the stemmed form. :param word: The word that is stemmed. :type word: str or unicode :return: The stemmed form. 
:rtype: unicode )r8r=rRrUrrSrNr) r6rrR_NorwegianStemmer__vowels!_NorwegianStemmer__step1_suffixesr7r_NorwegianStemmer__s_endingrN!_NorwegianStemmer__step2_suffixes!_NorwegianStemmer__step3_suffixesr!r=rPrs r%r zNorwegianStemmer.stem'skzz| 4>> !K  " "4 7++ F{{6"_,)$=D'FD9Bs]Bx4??2RCDHDMM,I#CRyW3v;,/DNs6{l+B $++ F{{6"CRyW  ++ F{{6"Ns6{l+    r'N) r-r.r/r0rGrIrHrJrKr r1r'r%r/r/ s1&$H$J@$ 0r'r/c&eZdZdZdZdZdZdZdZy)PortugueseStemmerav The Portuguese Snowball stemmer. :cvar __vowels: The Portuguese vowels. :type __vowels: unicode :cvar __step1_suffixes: Suffixes to be deleted in step 1 of the algorithm. :type __step1_suffixes: tuple :cvar __step2_suffixes: Suffixes to be deleted in step 2 of the algorithm. :type __step2_suffixes: tuple :cvar __step4_suffixes: Suffixes to be deleted in step 4 of the algorithm. :type __step4_suffixes: tuple :note: A detailed description of the Portuguese stemming algorithm can be found under http://snowball.tartarus.org/algorithms/portuguese/stemmer.html uaeiouáéíóúâêô)/amentosimentosuço~esrradorasadoresuaço~eslogiasênciasridadesuançasismosistasadorauaça~oruânciaruça~oênciaridadeuançaezasicosicasruáveluívelrosososasadorrivasivosrezarrrrrrr)xaríamoseríamosiríamosuássemosuêssemosuíssemosuaríeisueríeisuiríeisuásseisuésseisuísseisáramosuéramosuíramosuávamosaremoseremosiremosariameriamiriamassemessemissemzara~ozera~ozira~oariaseriasiriasardeserdesirdesressesrastesestesryuáreisareisuéreisereisuíreisireisuáveisuíamosarmosermosirmosariaeriairiaresserasteesterareirrarameramiramavamaremeremiremrrindoadasidasarásaraserásrirásavasaresrIiresuíeisadosidosuámosamosemosimosradaidaaráaraeráriráriamadoidoiasreisriarnr?rrrRrrrSreuiuou)r%rsrQrzrtrwr{c|j}||jvr|Sd}d}|jddjddjddjdd }|j||j\}}|j ||j}|j D]b}|j|s|d k(r~|j|rmd }|d d }|d d }|d d }|jdr-|d d}|d d}|d d}|jdr|d d}|d d}n|jdr|d d}|d d}n|dvrN|j|r=|t| dz t| dk(rd }t||d}t||d}nx|j|rfd }|dvrt||d}t||d}nD|dvrt||d}t||d}n$|dvrt||d}t||d}n|dk(r+|d d}|d d}|d d}|jdr|d d}|d d}n|d vre|d t| }|d t| }|d t| }|jd!r |d d}|d 
d}n|jd"rv|d d}|d d}nk|d#vrI|d t| }|d t| }|d t| }|jdr)|d d}|d d}n|d t| }|d t| }n|sD|jD]5}|j|sd }|d t| }|d t| }n|s|r#|jd$r|dd%k(r |d d&}|d d&}|sD|sB|jD]3}|j|s|d t| }|d t| }n|jd'rT|d d&}|d d&}|jd r|jds"|jd(r5|jd$r$|d d&}n|jd)r t|d)d%}|jddjdd}|S)*z Stem a Portuguese word and return the stemmed form. :param word: The word that is stemmed. :type word: str or unicode :return: The stemmed form. :rtype: unicode Fãza~õzo~uqüquugürrTNrUrrrr%rrA)rrrJr>r)rrTr)rZrQr})r[rUenterr)ravelivelr)r\rVr'r)rrrcrdrQrr)r>rvêrr) r6rrrW_PortugueseStemmer__vowelsr\"_PortugueseStemmer__step1_suffixesr7rNr"_PortugueseStemmer__step2_suffixes"_PortugueseStemmer__step4_suffixes)r!r=rrrPrVr[rs r%r zPortugueseStemmer.stemszz| 4>> !K   LL & WVT " WWd # WWd # $$T4==9B   tT]] 3++P F}}V$X%"++f*=$(M9DCRBCRB{{4(#CRyWW;;t,#'9D!#CRB%78#CRyWo- F+c&k\A-V =D$(M)$=D'FD9B[[($(M!44-dFEB+B>#88-dFC@+B<#<<-dFFC+B?7*#CRyWW;;'?@#'9D!#CRB#66#Ns6{l33v;,/3v;,/;;|4#'9D!#CRB[[0#'9D!#CRB#AA#Ns6{l33v;,/3v;,/;;t,#'9D!#CRB#Ns6{l33v;,/aP f// ;;v&$(M3v;,/DNs6{l+B   M{{3DHOCRyW]// ;;v&3v;,/DNs6{l+B   ;;, -9DCRB d# C(8 d# C(8CRy ]]6 "!$4D||D&)11$? r'N) r-r.r/r0rrrrr r1r'r%rNrNZs0"7H0bytEVr'rNc*eZdZdZdZdZdZdZdZdZ y) RomanianStemmera The Romanian Snowball stemmer. :cvar __vowels: The Romanian vowels. :type __vowels: unicode :cvar __step0_suffixes: Suffixes to be deleted in step 0 of the algorithm. :type __step0_suffixes: tuple :cvar __step1_suffixes: Suffixes to be deleted in step 1 of the algorithm. :type __step1_suffixes: tuple :cvar __step2_suffixes: Suffixes to be deleted in step 2 of the algorithm. :type __step2_suffixes: tuple :cvar __step3_suffixes: Suffixes to be deleted in step 3 of the algorithm. :type __step3_suffixes: tuple :note: A detailed description of the Romanian stemming algorithm can be found under http://snowball.tartarus.org/algorithms/romanian/stemmer.html u aeiouăâî)iiloruluieloriileilorateiaţieaţiaauaeleiuaieiiler)ear). 
abilitate abilitati abilităţi ibilitate abilităiivitateivitati ivităţiicitateicitati icităţiicatoriivităiicităiicatoraţiuneatoareătoareiţiuneitoareicivaiciveiciviicivăicalaicaleicaliicalăativarativiativărătoriitivaitiveitiviitivăitoriicivrativrătoritivitor)>abilarruabilăibilarruibilăritateitatiuităţiribiloasauoasăoaseantarruantăruităiiuneiunirrrristăiştir uatăr rruutărrruitărrrrruicăruoşirrrruivăristrutrrr%r)^ seserăţiu aserăţiu iserăţiu âserăţiu userăţiseserămuaserămuiserămuâserămuuserămserăţiseseşiseserăueascăuarăţiuurăţiuirăţiuârăţiuaseşiuaserăuiseşiuiserăuâseşiuâserăuuseşiuuserăserămseseminduuânduueazăueştiueşteuăştiuăşteueaţiuiaţiuarămuurămuirămuârămasemisemuâsemusemseşiserăseser rMruâreinduândezeeziescuăsceameaieauriaiiauuaşiuarăuuşiuurăuişiuirăuâşiuârăaseiseuâseuseaţieţiiţiâţiseirr?raurruiuâiămrrlâmrkcv|j}||jvr|Sd}d}tdt|dz D]x}||dz |jvs||dz|jvs-||dk(rdj |d|d||dzdf}S||dk(s\dj |d|d||dzdf}z|j ||j\}}|j||j}|jD]}|j|s||vr|d vr&|dt| }||vr|dt| }nud}nr|d k(s|d k(s |d k(r|d ddk7r|dd}nU|dvr"t||d}||vrt||d}n2d}n/|dvr"t||d}||vrt||d}n d}n |dvr|dd}n d} |jD]}|j|s||vrd}d} |dvrt||d}nw|dk(r|dd }nl|dvrt||d}nZ|dvrt||d}nH|dvrt||d}||vr3t||d}n%|d vr!t||d!}||vrt||d!}nd}n| sn|jD]_}|j|s||vrFd}|d"vr|d d#k(r8dj |dd d$f}n!|d%vrt||d&}n|dt| }n|s|s|jD]v}|j|s||vs|d'vr|dt| }|dt| }n9|j|s(||j|dz d(vr|dt| }nd)D](}|j|s||vr|dt| }n|j!ddj!dd}|S)*z Stem a Romanian word and return the stemmed form. :param word: The word that is stemmed. :type word: str or unicode :return: The stemmed form. 
:rtype: unicode FrJr}rINrrQr)r)rrrrrrabr)rrrr>)rrrrrr)rrrT)rrrrrr)rrrrr)rrrrrrrrrrrrrrrrr) rrrrrrrrrrrrr) rrrrrrrrrr)rruţrY)rrrrrrrrr)rrrrrrr rrrrrrr r!r$rrlr%rku aeioăâî)rVrsr>rQuă)r6rrMrN_RomanianStemmer__vowelsrrWr\ _RomanianStemmer__step0_suffixesr7r _RomanianStemmer__step1_suffixes _RomanianStemmer__step2_suffixes _RomanianStemmer__step3_suffixesrrr) r!r=rrrQrPrVr[rreplacement_dones r%r zRomanianStemmer.stemszz| 4>> !K  q#d)a-( CAAE{dmm+QU t}}0L7c>77D!Hc4A=#ABD!W^77D!Hc4A=#ABD  C$$T4==9B   tT]] 3++$ F}}V$R</#Ns6{l3!R)zsi^azs').esh'zei`tezui`tezui^utzish'ri`tei^utnnoilaylaenariliyliiloyloenozi^atuetenyzit'zyt'rrrDrrrnorTrt'rzui`ilylrlrrQryti^ui`rrF)$zii^amizii^akhzi^amizii^amzi^akhamiziei`zi^amiemakhzii^uz'i^uzii^az'i^aevovrVz'ernrrrrrr?r>ri^arsr>rQrrzr}rr)zei`shezei`sh)zost'ostc||jvr|Sd}tt|D]}t||dkDsd}n|s|S|j |}d}d}d}d}d}|j |\} } |j D]} | j| s| dvrp| t|  dz t|  dk(s | t|  dz t|  dk(sX|d t|  }| d t|  } | d t|  } d}n2|d t|  }| d t|  } | d t|  } d}n|s6|jD]B} | j| s|d t|  }| d t|  } | d t|  } n|jD]} | j| s| d vrp| t|  dz t|  dk(s | t|  dz t|  dk(sX|d t|  }| d t|  } | d t|  } d}n2|d t|  }| d t|  } | d t|  } d}n|s|jD]} | j| s| d vrp| t|  dz t|  dk(s | t|  dz t|  dk(sX|d t|  }| d t|  } | d t|  } d}n2|d t|  }| d t|  } | d t|  } d}n|sS|sQ|jD]B} | j| s|d t|  }| d t|  } | d t|  } n| jd r |d d }| d d } |jD]$} | j| s|d t|  }n|jdr|d d }d}|sK|jD]&} |j| s|d t|  }d}n|jdr|d d }|s|s|jdr|d d }|j|}|S)z Stem a Russian word and return the stemmed form. :param word: The word that is stemmed. :type word: str or unicode :return: The stemmed form. 
:rtype: unicode FT)r5r3r0rKrrJrsN)r6r7r@rArBrCr8r9r:r;r<r=r>r?rPrQrDrErFrGrHrIrJrKrLrMrNrOrnrorZr[r\r]rrrrrRrSrTrUrVrWrXrYrrrrrprqrrrsrtrurvrwrxryrzr{rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr)rrDrrrrrrrFrrrTrrrrrrQrr-r)rrMrNord"_RussianStemmer__cyrillic_to_roman _RussianStemmer__regions_russian+_RussianStemmer__perfective_gerund_suffixesr7#_RussianStemmer__reflexive_suffixes$_RussianStemmer__adjectival_suffixes_RussianStemmer__verb_suffixes_RussianStemmer__noun_suffixes&_RussianStemmer__derivational_suffixes%_RussianStemmer__superlative_suffixes"_RussianStemmer__roman_to_cyrillic) r!r= chr_exceededrQradjectival_removed verb_removedundouble_successsuperlative_removedr[rVrs r%r zRussianStemmer.stems+ 4>> !K s4y! A47|c!#   K''- "  #''-B77 F{{6"44CK#E#Ns6{l33v;,/3v;,/(, 3v;,/DNs6{l+BNs6{l+B$(M# &33 ;;v&3v;,/DNs6{l+BNs6{l+B  44T ;;v&C"CJF |a/3v;,?5H!3v;,"2c&k\BcI#'3v;,#7D!#Ns6{l!3B!#Ns6{l!3B15.!#Ns6{l33v;,/3v;,/-1*iT l&"22#"F{{6*!&(!#CKrQrzr}rrrrre`rrJN)rrMrN)r!r=rPrVr[rOrQs r%__regions_russianz RussianStemmer.__regions_russiansi*  >||E3'//s;CCD#Nq#d)$ AAwf$a!e)>!a%']  q#b'" A!uF"r!a%yF':A[  s4y! AAw& !a%']  ZZU # + +C 7 ? ?T J ZZU # + +C 7 ? ?T JBxr'cJ|jddjddjddjddjddjd djd d jd d jd djddjddjddjddjddjddjddjddjddjddjddjddjd djd!d"jd#d"jd$d%jd&d%jd'd(jd)d(jd*d+jd,d+jd-d.jd/d.jd0d1jd2d1jd3d4jd5d4jd6d7jd8d7jd9d:jd;d:jdd=jd?d@jdAd@jdBdCjdDdCjdEdFjdGdFjdHdIjdJdIjdKdLjdMdLjdNdOjdPdOjdQdRjdSdRjdTdUjdVdUjdWdXjdYdXjdZd[jd\d[jd]d^jd_d^jd`dajdbda}|S)ca# Transliterate a Russian word into the Roman alphabet. A Russian word whose letters consist of the Cyrillic alphabet are transliterated into the Roman alphabet in order to ease the forthcoming stemming process. :param word: The word that is transliterated. :type word: unicode :return: the transliterated word. :rtype: unicode :note: This helper method is invoked by the stem method of the subclass RussianStemmer. It is not to be invoked directly! 
uАrsаuБbбuВr5вuГgгuДrIдuЕr>еuЁuёuЖzhжuЗzзuИrQиuЙrйuКrSкuЛrлuМrHмuНrFнuОrzоuПpпuРrрuСrUсuТrYтuУr}уuФfфuХkhхuЦt^sцuЧr(чuШshшuЩshchщuЪ''ъuЫrыuЬrьuЭrэuЮrюuЯrяrr!r=s r%__cyrillic_to_romanz"RussianStemmer.__cyrillic_to_romanNs," LL3 ' WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXt $ WXt $ WXs # WXs # WXs # WXs # WXt $ WXt $ WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXs # WXt $ WXt $ WXu % WXu % WXt $ WXt $ WXt $ WXt $ WXv & WXv & WXt $ WXt $ WXs # WXs # WXs # WXs # WXt $ WXt $ WXu % WXu % WXu % WXu %E J r'c*|jddjddjddjddjd d jd d jd djddjddjddjddjddjddjddjddjdd jd!d"jddjd#d$jd%d&jd'd(jd)d*jd+d,jd-d.jd/d0jd1d2jd3d4jd5d6jd7d8jd9d:jd;d<jd=d>jd?d@}|S)AaH Transliterate a Russian word back into the Cyrillic alphabet. A Russian word formerly transliterated into the Roman alphabet in order to ease the stemming process, is transliterated back into the Cyrillic alphabet, its original form. :param word: The word that is transliterated. :type word: str or unicode :return: word, the transliterated word. :rtype: unicode :note: This helper method is invoked by the stem method of the subclass RussianStemmer. It is not to be invoked directly! rrErrFr>r?r7r8r9r:r(r;rrDrr(r<r=rSr)r>r"r#r$rsrrrr5rrr rIr!r%r&rQr'rr*rHr+rFr,rzr-r.r/r0r1rUr2rYr3r}r4r5r6r@rArrBrrCrGrHs r%__roman_to_cyrillicz"RussianStemmer.__roman_to_cyrillics" LL ) WUH % WVX & WT8 $ WUH % WT8 $ WT8 $ WT8 $ WT8 $ WS( # WS( # WT8 $ WS( # WS( # WS( # WS( # WS( # WS( # WS( # WS( # WS( # WS( # WS( # WS( # WS( # WS( # WS( # WS( # WS( # WS( # WT8 $ WS( # WS( #C H r'N)r-r.r/r0rr r r r rr r rrrr1r'r%r/r/sX. $ kX*/O`%OL1-_B .`Un4r'r/c4eZdZdZdZdZdZdZdZdZ dZ d Z y ) SpanishStemmeraR The Spanish Snowball stemmer. :cvar __vowels: The Spanish vowels. :type __vowels: unicode :cvar __step0_suffixes: Suffixes to be deleted in step 0 of the algorithm. 
:type __step0_suffixes: tuple :cvar __step1_suffixes: Suffixes to be deleted in step 1 of the algorithm. :type __step1_suffixes: tuple :cvar __step2a_suffixes: Suffixes to be deleted in step 2a of the algorithm. :type __step2a_suffixes: tuple :cvar __step2b_suffixes: Suffixes to be deleted in step 2b of the algorithm. :type __step2b_suffixes: tuple :cvar __step3_suffixes: Suffixes to be deleted in step 3 of the algorithm. :type __step3_suffixes: tuple :note: A detailed description of the Spanish stemming algorithm can be found under http://snowball.tartarus.org/algorithms/spanish/stemmer.html uaeiouáéíóúü) selasselosselaselolasleslosnosmerkrrr)/amientosimientosamientoimientoacionacionesucionesrRrSanciaslogíasenciasrrVanzasrWrxiblesrXrYaciónrancialogíauciónenciarrr^r_rrrrr`rarbridadrcrdrrrrrr) yeronyendoyamosyaisyanyenyasyesyaryouyó)`rfrgrhuiéramosuiésemosuaríaisrjueríaisrkuiríaisrlieraisieseisasteisisteisuábamosriuásemosuaríanuaríasuaréisueríanueríasueréisuiríanuiríasuiréisieraniesenieroniendoierasiesesabaisaraisaseisuéamosuaránruaríaueránrueríauiránruiríaieraieserrabanaranasenaronrabasrrrasesuíaisrrrrrruarérueréruiréabarrrruíanrruíasuáiséisuíarArrmr+uiórrRrruísrQrS)r%rsr>rzrtrvrwr{c~ |j}||jvr|Sd}|j||j\}}|j ||j}|j D]}|j |r|j |s&|dt| j ds<|dt| j dr|dt| j drx|j|dt| }|j|dt| }|j|dt| }|j|dt| }n|jD]}}|j |s|dk(r~|j |rmd}|dd}|dd}|dd}|j d r-|dd }|dd }|dd }|j d r|dd }|dd }n|j d r|dd }|dd }n|j |rd}|d vrK|dt| }|dt| }|dt| }|j dr|dd }|dd }n|dvrt||d}t||d}nb|dvrt||d}t||d}nB|dvrt||d}t||d}n"|dk(rI|dt| }|dt| }|dt| }|j dr|dd}|dd}n|dvre|dt| }|dt| }|dt| }dD]2}|j |s|dt| }|dt| }4nk|dvrI|dt| }|dt| }|dt| }|j d r)|dd }|dd }n|dt| }|dt| }n|s|jD]S}|j |s|t| dz t| dk(s5|dt| }|dt| }n|jD]c}|j |s|dt| }|dt| }|dvr,|j dr|dd}|j dr|dd}n|jD]U}|j |s|dt| }|dvr-|dt| }|d ddk(r|j dr|dd}n|j|}|S) z Stem a Spanish word and return the stemmed form. :param word: The word that is stemmed. :type word: str or unicode :return: The stemmed form. 
:rtype: unicode FN) ruándoruárrRuérrzuiéndoruírrjuyendorTrUrrrr) rYrbrcrRrSr[r\rrrdr^r)rer_r)rfr]r})rgr`rr)rrrr)rhrV)rrr)rrrdrcrJ)rQrSrrrr)r>rv)r6rrW_SpanishStemmer__vowelsr\_SpanishStemmer__step0_suffixesr7rN!_SpanishStemmer__replace_accented_SpanishStemmer__step1_suffixesr _SpanishStemmer__step2a_suffixes _SpanishStemmer__step2b_suffixes_SpanishStemmer__step3_suffixes)r!r=rrPrVr[rpre_suffs r%r zSpanishStemmer.stemszz| 4>> !K $$T4==9B   tT]] 3++ FMM&)bkk&.A>c&k\"++  >c&k\"++G4CK<(11(;..tNs6{l/CD,,R3v;,-?@,,R3v;,-?@,,R3v;,-?@ 7 <++W F==(!bkk&&9 $ CRyWW;;t$9DCRBCRB{{4(#CRyW[[!349DCRBV$ $    3v;,/DNs6{l+BNs6{l+B{{4(#CRyW88)$>D'FE:B66)$rwrQr{rzr~r}rGrHs r%__replace_accentedz!SpanishStemmer.__replace_accentedUsG LL % WVS ! WVS ! WVS ! WVS !  r'N) r-r.r/r0rrrrrrr rr1r'r%rMrMsF*/H0b aDMn` r'rMc*eZdZdZdZdZdZdZdZdZ y) SwedishStemmera The Swedish Snowball stemmer. :cvar __vowels: The Swedish vowels. :type __vowels: unicode :cvar __s_ending: Letters that may directly appear before a word final 's'. :type __s_ending: unicode :cvar __step1_suffixes: Suffixes to be deleted in step 1 of the algorithm. :type __step1_suffixes: tuple :cvar __step2_suffixes: Suffixes to be deleted in step 2 of the algorithm. :type __step2_suffixes: tuple :cvar __step3_suffixes: Suffixes to be deleted in step 3 of the algorithm. :type __step3_suffixes: tuple :note: A detailed description of the Swedish stemming algorithm can be found under http://snowball.tartarus.org/algorithms/swedish/stemmer.html u aeiouyäåöbcdfghjklmnoprtvy)%heternar3r4r5andenarnasernasornasrandetarensarnaernaornar6arnerarenadesernsader rrNr;r<rArQrrRorrrSrrsr>rU)r%rVr-rWrXrYr2)fulltlöstr]r\r^c(|j}||jvr|S|j||j}|jD]T}|j |s|dk(r|d|j vr)|dd}|dd}n|dt| }|dt| }n|jD]}|j |s|dd}|dd}n|jD]5}|j |s|dvr|dt| }|S|dvr|dd}|S|S)z Stem a Swedish word and return the stemmed form. :param word: The word that is stemmed. :type word: str or unicode :return: The stemmed form. 
:rtype: unicode rUrNr)r]r\r^)rr) r6rrR_SwedishStemmer__vowels_SwedishStemmer__step1_suffixesr7_SwedishStemmer__s_endingrN_SwedishStemmer__step2_suffixes_SwedishStemmer__step3_suffixesrLs r%r zSwedishStemmer.stemsVzz| 4>> !K  " "4 7++ F{{6"S=Bx4??2#CRyW3v;,/DNs6{l+B ++ F{{6"CRyW  ++ F{{6"113v;,/D 339D   r'N) r-r.r/r0rrrrrr r1r'r%rris0&$H$J&NB?-r'rcddlm}iddddddd d d d d ddddddddddd ddddddddd d!}td"td#td$td# td%d&j t j zd"zd'z}|d(k(ry*|t j vr td)Vt ||j||d*d+}d,j fd-|D}tjd.d/|d,zj}d,j |}tjd.d/|d,zj}td"td0td1jd2t|td3td4jd2t|td0td"u)5a< This function provides a demonstration of the Snowball stemmers. After invoking this function and specifying a language, it stems an excerpt of the Universal Declaration of Human Rights (which is a part of the NLTK corpus collection) and then prints out the original and the stemmed text. r)udhrr zArabic_Alarabia-Arabicr zDanish_Dansk-Latin1r zDutch_Nederlands-Latin1r zEnglish-Latin1rzFinnish_Suomi-Latin1rzFrench_Francais-Latin1rzGerman_Deutsch-Latin1rzHungarian_Magyar-UTF8rzItalian_Italiano-Latin1rzNorwegian-Latin1rrzPortuguese_Portugues-Latin1rzRomanian_Romana-Latin2rz Russian-UTF8rzSpanish-Latin1rzSwedish_Svenska-Latin1 z******************************zDemo for the Snowball stemmersz9Please enter the name of the language to be demonstrated /z"(enter 'exit' in order to leave): exitz@ Oops, there is no stemmer for this language. Please try again. Ni, c3@K|]}j|ywr))r ).0r=rs r% zdemo..sB$7<<-Bsz (.{,70})\sz\1\nzF----------------------------------------------------------------------ORIGINALFz zSTEMMED RESULTS) nltk.corpusrprintinputrr rr9rrrstripcenter)r udhr_corpusr"excerptstemmedrs @r%demorsJ!*' * #  )  *  ) , , ' " 3 , > # +!K& $K *+ *+ *+  %hh001 2 3  3  v   ?44 4 (  !(+**[23DS9((B'BB&&3?FFH((7#&&3?FFH d  h j#$ g f  &&r*+ g h d G r') r0rrr nltk.stemr nltk.stem.apirnltk.stem.utilrrr r3rDrGrTr^rrjrrrhrr rr/rNrr/rMrrr1r'r%rs`  !"9N.hN.b$*x$*N,,f.B.B,$'3'TW/WtY $Y x^(^B|#|~A%AHS%Sl j$jZX$XvY/Yx t%tn v+vrV(Vr ^&^BE -E PJ %J Z m)m`Hr'