import unittest
from contextlib import closing

from nltk import data
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer


class SnowballTest(unittest.TestCase):
    def test_arabic(self):
        """
        Unit tests for the Snowball Arabic light stemmer, which strips
        both prefixes and suffixes.
        """
        # ignore_stopwords=True: stopwords are returned unchanged.
        ar_stemmer = SnowballStemmer("arabic", True)
        assert ar_stemmer.stem("الْعَرَبِــــــيَّة") == "عرب"
        assert ar_stemmer.stem("العربية") == "عرب"
        assert ar_stemmer.stem("فقالوا") == "قال"
        assert ar_stemmer.stem("الطالبات") == "طالب"
        assert ar_stemmer.stem("فالطالبات") == "طالب"
        assert ar_stemmer.stem("والطالبات") == "طالب"
        assert ar_stemmer.stem("الطالبون") == "طالب"
        assert ar_stemmer.stem("اللذان") == "اللذان"
        assert ar_stemmer.stem("من") == "من"

        # ignore_stopwords=False: stopwords are stemmed like any other word.
        ar_stemmer = SnowballStemmer("arabic", False)
        assert ar_stemmer.stem("اللذان") == "اللذ"
        assert ar_stemmer.stem("الطالبات") == "طالب"
        assert ar_stemmer.stem("الكلمات") == "كلم"

        # No ignore_stopwords argument: the default (False) is used.
        ar_stemmer = SnowballStemmer("arabic")
        assert ar_stemmer.stem("الْعَرَبِــــــيَّة") == "عرب"
        assert ar_stemmer.stem("العربية") == "عرب"
        assert ar_stemmer.stem("فقالوا") == "قال"
        assert ar_stemmer.stem("الطالبات") == "طالب"
        assert ar_stemmer.stem("الكلمات") == "كلم"

    def test_russian(self):
        stemmer_russian = SnowballStemmer("russian")
        assert stemmer_russian.stem("авантненькая") == "авантненьк"

    def test_german(self):
        stemmer_german = SnowballStemmer("german")
        stemmer_german2 = SnowballStemmer("german", ignore_stopwords=True)

        assert stemmer_german.stem("Schränke") == "schrank"
        assert stemmer_german2.stem("Schränke") == "schrank"

        # "keinen" is a stopword: it is only left unstemmed when
        # ignore_stopwords=True.
        assert stemmer_german.stem("keinen") == "kein"
        assert stemmer_german2.stem("keinen") == "keinen"

    def test_spanish(self):
        stemmer = SnowballStemmer("spanish")

        assert stemmer.stem("Visionado") == "vision"
        assert stemmer.stem("algue") == "algu"

    def test_short_strings_bug(self):
        stemmer = SnowballStemmer("english")
        assert stemmer.stem("y's") == "y"


class PorterTest(unittest.TestCase):
    def _vocabulary(self):
        with closing(
            data.find("stemmers/porter_test/porter_vocabulary.txt").open(
                encoding="utf-8"
            )
        ) as fp:
            return fp.read().splitlines()

    def _test_against_expected_output(self, stemmer_mode, expected_stems):
        stemmer = PorterStemmer(mode=stemmer_mode)
        for word, true_stem in zip(self._vocabulary(), expected_stems):
            our_stem = stemmer.stem(word)
            assert (
                our_stem == true_stem
            ), "{} should stem to {} in {} mode but got {}".format(
                word, true_stem, stemmer_mode, our_stem
            )

    def test_vocabulary_martin_mode(self):
        """Tests all words from the test vocabulary provided by M. Porter.

        The sample vocabulary and output were sourced from
        https://tartarus.org/martin/PorterStemmer/voc.txt and
        https://tartarus.org/martin/PorterStemmer/output.txt
        and are linked to from the Porter Stemmer algorithm's homepage
        at https://tartarus.org/martin/PorterStemmer/
        """
        with closing(
            data.find("stemmers/porter_test/porter_martin_output.txt").open(
                encoding="utf-8"
            )
        ) as fp:
            self._test_against_expected_output(
                PorterStemmer.MARTIN_EXTENSIONS, fp.read().splitlines()
            )

    def test_vocabulary_nltk_mode(self):
        with closing(
            data.find("stemmers/porter_test/porter_nltk_output.txt").open(
                encoding="utf-8"
            )
        ) as fp:
            self._test_against_expected_output(
                PorterStemmer.NLTK_EXTENSIONS, fp.read().splitlines()
            )

    def test_vocabulary_original_mode(self):
        with closing(
            data.find("stemmers/porter_test/porter_original_output.txt").open(
                encoding="utf-8"
            )
        ) as fp:
            self._test_against_expected_output(
                PorterStemmer.ORIGINAL_ALGORITHM, fp.read().splitlines()
            )

        self._test_against_expected_output(
            PorterStemmer.ORIGINAL_ALGORITHM,
            data.find("stemmers/porter_test/porter_original_output.txt")
            .open(encoding="utf-8")
            .read()
            .splitlines(),
        )

    def test_oed_bug(self):
        """Test for bug https://github.com/nltk/nltk/issues/1581

        Ensures that 'oed' can be stemmed without throwing an error.
        """
        assert PorterStemmer().stem("oed") == "o"

    def test_lowercase_option(self):
        """Test for improvement on https://github.com/nltk/nltk/issues/2507

        Ensures that stems are lowercased when `to_lowercase=True`.
        """
        porter = PorterStemmer()
        assert porter.stem("On") == "on"
        assert porter.stem("I") == "i"
        assert porter.stem("I", to_lowercase=False) == "I"
        assert porter.stem("Github") == "github"
        assert porter.stem("Github", to_lowercase=False) == "Github"
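

# Hedged sketch, not part of NLTK: _test_against_expected_output pairs
# vocabulary words with expected stems using zip(). zip() stops at the
# shorter input, so a truncated output file would silently test fewer
# words. The assert also stops at the first mismatch. The hypothetical
# helper below shows one alternative. zip(..., strict=True) (Python 3.10+)
# raises ValueError on a length mismatch, and every mismatch is collected
# so that a single run reports all of them. `stem` is assumed to be any
# str -> str callable, so no NLTK data files are needed.


def compare_against_golden(stem, words, expected_stems):
    """Return (word, expected, got) for every word whose stem differs."""
    mismatches = []
    # strict=True turns a vocabulary/output length mismatch into an error.
    for word, true_stem in zip(words, expected_stems, strict=True):
        our_stem = stem(word)
        if our_stem != true_stem:
            mismatches.append((word, true_stem, our_stem))
    return mismatches


def toy_strip_s(word):
    """Hypothetical stand-in stemmer: drop one trailing "s"."""
    return word[:-1] if word.endswith("s") else word


# Usage: in a test, assert that the returned list is empty.
# compare_against_golden(toy_strip_s, ["cats", "dogs", "bus"], ["cat", "dog", "bus"])
# returns [("bus", "bus", "bu")]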
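

# Hedged sketch, not NLTK's implementation: the ignore_stopwords cases in
# SnowballTest expect stopwords to come back unstemmed ("keinen" stays
# "keinen" in German, "من" stays "من" in Arabic). With
# ignore_stopwords=False, they are stemmed ("keinen" becomes "kein").
# The hypothetical wrapper below reproduces that contract around any
# str -> str stemmer. Lowercasing the input first is an assumption,
# loosely inferred from "Schränke" stemming to "schrank".


def make_guarded_stemmer(stem, stopwords):
    """Wrap `stem` so that words in `stopwords` are returned as-is."""
    stopwords = {w.lower() for w in stopwords}

    def guarded(word):
        lowered = word.lower()
        if lowered in stopwords:
            return lowered  # stopword: skip stemming entirely
        return stem(lowered)

    return guarded


def toy_strip_en(word):
    """Hypothetical stand-in stemmer: drop a trailing "en"."""
    return word[:-2] if word.endswith("en") else word


# Usage:
# guarded = make_guarded_stemmer(toy_strip_en, {"keinen"})
# guarded("Keinen") returns "keinen" and guarded("laufen") returns "lauf",
# while the unguarded toy_strip_en("keinen") returns "kein".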