"""
ARLSTem2 Arabic Light Stemmer

The details about the implementation of this algorithm are described in:
K. Abainia and H. Rebbani, Comparing the Effectiveness of the Improved ARLSTem
Algorithm with Existing Arabic Light Stemmers, International Conference on
Theoretical and Applicative Aspects of Computer Science (ICTAACS'19), Skikda,
Algeria, December 15-16, 2019.

ARLSTem2 is an Arabic light stemmer based on removing the affixes from
the words (i.e. prefixes, suffixes and infixes). It is an improvement
of the previous Arabic light stemmer (ARLSTem). The new version was compared to
the original algorithm and several existing Arabic light stemmers, and the
results showed that it considerably reduces the under-stemming errors that are
common to light stemmers. Both ARLSTem and ARLSTem2 can be run online and do
not use any dictionary.
"""
import re

from nltk.stem.api import StemmerI


# Only the signatures, docstrings, and regular expressions are shown here;
# the affix tables and the method bodies are not reproduced.
class ARLSTem2(StemmerI):
    """
    Return a stemmed Arabic word after removing affixes. This is an improved
    version of the previous algorithm, which reduces under-stemming errors.
    Typically used in Arabic search engines, information retrieval and NLP.

        >>> from nltk.stem.arlstem2 import ARLSTem2
        >>> stemmer = ARLSTem2()
        >>> word = stemmer.stem('يعمل')
        >>> print(word)
        عمل

    :param token: The input Arabic word (unicode) to be stemmed
    :type token: unicode
    :return: A unicode Arabic word
    """

    def __init__(self):
        # Hamzated forms of Alif, Alif Maqsura, and Arabic diacritics.
        self.re_hamzated_alif = re.compile(r"[\u0622\u0623\u0625]")
        self.re_alifMaqsura = re.compile(r"[\u0649]")
        self.re_diacritics = re.compile(r"[\u064B-\u065F]")
        # Affix tables also set here: pr2, pr3, pr32, pr4, su2, su22, su3,
        # su32, pl_si2, pl_si3, verb_su2, verb_pr2, verb_pr22, verb_pr33,
        # verb_suf3, verb_suf2, verb_suf1.

    def stem1(self, token):
        """
        Call this function to get the first stem.

        Raises ValueError("The word could not be stemmed, because it is
        empty !") when no word is given.
        """
        ...

    def stem(self, token):
        ...

    def norm(self, token):
        """
        Normalize the word by removing diacritics, replacing hamzated Alif
        with bare Alif, replacing Alif Maqsura with Yaa, and removing Waaw
        at the beginning.
        """
        ...

    def pref(self, token):
        """
        Remove prefixes from the word's beginning.
        """
        ...

    def adjective(self, token):
        """
        Remove the infixes from adjectives.
        """
        ...

    def suff(self, token):
        """
        Remove the suffixes from the word's ending.
        """
        ...

    def fem2masc(self, token):
        """
        Transform the word from the feminine form to the masculine form.
        """
        ...

    def plur2sing(self, token):
        """
        Transform the word from the plural form to the singular form.
        """
        ...

    def verb(self, token):
        """
        Stem the verb prefixes, suffixes, or both.
        """
        ...

    def verb_t1(self, token):
        """
        Stem the present tense co-occurring prefixes and suffixes.
        """
        ...

    def verb_t2(self, token):
        """
        Stem the future tense co-occurring prefixes and suffixes.
        """
        ...

    def verb_t3(self, token):
        """
        Stem the present tense suffixes.
        """
        ...

    def verb_t4(self, token):
        """
        Stem the present tense prefixes.
        """
        ...

    def verb_t5(self, token):
        """
        Stem the future tense prefixes.
        """
        ...

    def verb_t6(self, token):
        """
        Stem the imperative tense prefixes.
        """
        ...
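The normalization step described in the `norm` docstring can be sketched on its own with the three regular expressions that appear in `ARLSTem2.__init__`. This is a minimal, standalone illustration rather than the library's implementation: the function name `normalize`, the constant names, and the `min_len_for_waaw` threshold (a leading Waaw is dropped only when the word is longer than three characters) are assumptions made for this example.

```python
import re

# Character classes taken from ARLSTem2.__init__.
RE_DIACRITICS = re.compile(r"[\u064B-\u065F]")
RE_HAMZATED_ALIF = re.compile(r"[\u0622\u0623\u0625]")
RE_ALIF_MAQSURA = re.compile(r"[\u0649]")

ALIF = "\u0627"  # bare Alif
YAA = "\u064A"   # Yaa
WAAW = "\u0648"  # Waaw


def normalize(token: str, min_len_for_waaw: int = 3) -> str:
    """Hypothetical helper mirroring the steps listed in the norm docstring."""
    # 1. Strip Arabic diacritics.
    token = RE_DIACRITICS.sub("", token)
    # 2. Replace hamzated Alif variants with bare Alif.
    token = RE_HAMZATED_ALIF.sub(ALIF, token)
    # 3. Replace Alif Maqsura with Yaa.
    token = RE_ALIF_MAQSURA.sub(YAA, token)
    # 4. Drop a leading Waaw. Assumption: only when the word is longer than
    #    min_len_for_waaw characters, so short roots keep their first letter.
    if token.startswith(WAAW) and len(token) > min_len_for_waaw:
        token = token[1:]
    return token


# Hamzated Alif with diacritics becomes a bare, undiacritized word.
print(normalize("\u0623\u064E\u0643\u064E\u0644\u064E"))
```

Keeping the length threshold as a parameter makes the assumption explicit and easy to change if the real threshold differs.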