JL iKCdZddlZddlmZddlmZej dZdJdZdKdZ dZ d Z dJd Z d Z d Zejeej Zid ddddddddddddddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.id/d0d1d2d3d4d5d6d7d8d9d:d;d<d=d>d?d@dAdBdCdDdEdFdGdHdIdJdKdLdMdNdOdPidQdRdSdTdUdVdWdXdYdZd[d\d]d^d_d`dadbdcdddedfdgdhdidjdkdldmdndodpdqdridsdtdudvdwdxdydzd{d|d}d~ddddddddddddddddddddddiddddddddddddddddddddddddddddddddddidddddddddddd“ddēddƓddȓddʓdd̓ddΓddГddғddԓdd֓ddؓiddړddܓddޓddddddddddddddddddddddddddddiddddddddddddddd d d d d dddddddddddddddidddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.d/d0d1d2d3d4d5d6d7d8d9d:d;d<d=d>id?d@dAdBdCdDdEdFdGdHdIdJdKdLdMdNdOdPdQdRdSdTdUdVdWdXdYdZd[d\d]d^d_d`dadbdcdddedfdgdhdidjdkdldmdndoZidpdqdrdsdtdudvdwdxdydzd{d|d}d~dddddddddddddddddddiddddddddddddddddddddddddddddddddddiddddddddddddddddÓdĐdœdƐdǓdȐdɓdʐd˓d̐d͓dΐdϓdАdѓdҐdӓdԐdՓid֐dדdؐdٓdڐdۓdܐdݓdސdߓddddddddddddddddddddddddidddddddddddddddddd d d d d ddddddddddddiddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.d/d0d1d2d3d4d5d6d7d8d9d:d;id<d=d>d?d@dAdBdCdDdEdFdGdHdIdJdKdLdMdNdOdPdQdRdSdTdUdVdWdXdYdZd[d\d]id^d_d`dadbdcdddedfdgdhdidjdkdldmdndodpdqdrdsdtdudvdwdxdydzd{d|d}d~diddddddddddddddddddddddddddddddddddiddddddddddddddddddddddddddddddddddÓidĐdœdƐdǓdȐdɓdʐd˓d̐d͓dΐdϓdАdѓdҐdӓdԐdՓd֐dדdؐdٓdڐdۓdܐdݓdސdߓddddddiddddddddddddddddddddddddddddddddddidd d d d d ddddddddddddddddddd d!d"d#d$d%d&d'd(d)id*d+d,d-d.d/d0d1d2d3d4d5d6d7d8d9d:d;d<d=d>d?d@dAdBdCdDdEdFdGdHdIdJdKidLdMdNdOdPdQdRdSdTdUdVdWdXdYdZd[d\d]d^d_d`dadbdcdddedfdgdhdidjdkdldmidndodpdqdrdsdtdudvdwdxdydzd{d|d}d~dddddddddddddddddiddddddddddddddddddddddddddddddddddiddddddddddddddddddÓdĐdœdƐdǓdȐdɓdʐd˓d̐d͓dΐdϓdАdѓdҐdӓidԐdՓd֐dדdؐdٓdڐdۓdܐdݓdސdߓddddddddddddddddddddddidddddddddddddddddddd d d d d ddddddddddiddddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.d/d0d1d2d3d4d5d6d7d8d9d:d;d<d=d>d?d@dAdBdCdDdEdFdGdHdIZeeZeeZy(La Translate between language names and language codes. The iso639-3 language codes were downloaded from the registration authority at https://iso639-3.sil.org/ The iso639-3 codeset is evolving, so retired language codes are kept in the "iso639retired" dictionary, which is used as fallback by the wrapper functions "langname" and "langcode", in order to support the lookup of retired codes. The "langcode" function returns the current iso639-3 code if there is one, and falls back to the retired code otherwise. As specified by BCP-47, it returns the shortest (2-letter) code by default, but 3-letter codes are also available: >>> import nltk.langnames as lgn >>> lgn.langname('fri') #'fri' is a retired code 'Western Frisian' The current code is different from the retired one: >>> lgn.langcode('Western Frisian') 'fy' >>> lgn.langcode('Western Frisian', typ = 3) 'fry' N)warn)bcp47z[a-z][a-z][a-z]?c|jd}|dj}tj|r|tvr t|S|t vr4t |}t d|d|ddj|g|ddz}tj|}|d k(r|S|r|jd dSyt d |dy) z Convert a composite BCP-47 tag to a language name >>> from nltk.langnames import langname >>> langname('ca-Latn-ES-valencia') 'Catalan: Latin: Spain: Valencian' >>> langname('ca-Latn-ES-valencia', typ="short") 'Catalan' -rz Shortening z to  stacklevelNfull:zCould not find code in ) splitlower codepattern fullmatch iso639retired iso639shortrjoinrname)tagtyptagscodecode2rs T/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/nltk/langnames.pylangnamer.s 99S>D 7==?DT" =  & & [ %E ;thd5)4 C((E7T!"X-.Czz# &=K ::c?1% % &th /A>c|tjvr+tj|}|dk(r|tvr t|}|S|tvr t|St d|dy)ai Convert language name to iso639-3 language code. Returns the short 2-letter code by default, if one is available, and the 3-letter code otherwise: >>> from nltk.langnames import langcode >>> langcode('Modern Greek (1453-)') 'el' Specify 'typ=3' to get the 3-letter code: >>> langcode('Modern Greek (1453-)', typ=3) 'ell' zCould not find language in rrN)rlangcode iso639longiso639code_retiredr)rrrs rrrKsa u~~~~d# !8 *d#D # #!$'' *4( 3Brc(tj|S)z^ Convert BCP-47 tag to Wikidata Q-code >>> tag2q('nds-u-sd-demv') 'Q4289225' )rwiki_q)rs rtag2qr$is << rct|S)z^ Convert Wikidata Q-code to BCP-47 tag >>> q2tag('Q4289225') 'nds-u-sd-demv' ) wiki_bcp47)qcodes rq2tagr(ss e rc,tt||S)z Convert Wikidata Q-code to BCP-47 (full or short) language name >>> q2name('Q4289225') 'Low German: Mecklenburg-Vorpommern' >>> q2name('Q4289225', "short") 'Low German' )rr()r'rs rq2namer*}s E%L# &&rc*tt|S)zd Convert simple language name to Wikidata Q-code >>> lang2q('Low German') 'Q25433' )r$r)rs rlang2qr,s $  rct|jtt|jk(r$|j Dcic]\}}|| c}}St dycc}}w)z3Return inverse mapping, but only if it is bijectivez1This dictionary has no bijective inverse mapping.N)lenkeyssetvaluesitemsr)dickeyvals r inverse_dictr6sQ 388:#c#**,/00+.99;7Zc3S77 @A8s A-aaraaabkabafrafakaakamhamaraararganasmasavaavaveaeaymayazeazbakbabambmbelbebenbnbisbibodbobosbsbrebrbulbgcatcacescschachchecechucuchvcvcorkwcoscocrecrcymcydandadeudedivdvdzodzellelengenepoeoesteteuseueweeefaofofasfafijfjfinfifrafrfryfyfulffglagdglegaglgglglvgvgrngngujguhaththauhahbsshhebheherhzhinhihmohohrvhrhunhuhyehyiboigidoioiiiiiikuiuileieinaiaindidipkikislisitaitjavjvjpnjakalklkanknkaskskatkakaukrkazkkkhmkmkikkikinrwkirkykomkvkonkgkorkokuakjkurkulaololatlalavlvlimlilinlnlitltltzlblubluluglgmahmhmalmlmarmrmkdmkmlgmgmltmtmonmnmrimimsamsmyamynaunanavnvnblnrndendndongnepnenldnlnnonnnobnbnornonyanyociocojiojoriorormomossospanpaplipipolplporptpuspsquequrohrmronrorunrnrusrusagsgsansasinsislkskslvslsmesesmosmsnasnsndsdsomsosotstspaessqisqsrdscsrpsrsswsssunsuswaswswesvtahtytamtatatttteltetgktgtgltlthathtirtitontotsntntsotstuktkturtrtwitwuigugukuruzvevivowawoxhyiyozazhzu)ukrurduzbvenvievolwlnwolxhoyidyorzhazhozulfrizWestern Frisianauv AuvergnatgscGasconlmsLimousinlnc Languedocienprvu Provençalamdu Amapá CreolebghBoganbnhuBanawábvszBelgian Sign LanguageccyzSouthern Zhuangcit Chittagonianflmz Falam ChinjapuJaruárakob KohoroxitarimobMoinbamzfAikunhjzTlalitzlipa NahuatlnhszSoutheastern Puebla Nahuatlocc OccidentaltmxTomyangtotzPatla-Chicontla TotonacxmiuMiarrãyibYinglishztczLachirioag ZapotecatfAtuencebqezNavarro-Labourdin BasquebszzSouletin BasqueaexAmeraxaheAheaizAariaknAmikoanaarfArafundiazrAdzerabcxPamonabiiBisubkeBengkulubluz Hmong Njuabocz Bakung KenyahbsdzSarawak BisayabwvzBahau River KenyahbxtBuxinhuabyuBuyangccxzNorthern Zhuangcruu Carútanadatz Darang Dengdykz Land DayakeniEnimfizIzeregenz Geman Denggghz Garreh-AjuranituItutangkdszLahu ShiknhzKayan River Kenyahkrgz North KorowaikrqKruikxgKatinganlmtLematanglntLintanglodBerawanmbguNorthern NambikuáramdozSouthwest Gbayamhv ArakanesemivMimimqdMadangnkyzKhiamniungan NaganxjNyaduognOganorkOrokaivapajz Ipeka-TapuiapeczSouthern PesisirpenPenesakplm Palembangpojz Lower PokomopunPubianraeRanaurjb RajbanshirwsRawassddSemendosdizSindang KelingisklSelakoslbzKahumamahon SaluansrjSerawaisufTarpiasuhSubasuuSungkaiszkSizakitlezSouthern MarakwettnjTanjongttxzTutong 1ubmzUpper Baram Kenyahvkyz Kayu Agungvmoz Muko-MukowreWarexahKahayanxkmzMahakam KenyahxufKunfalyiozDayao YiymjzMuji YiyplzPula YiypwzPuwa Yiywmz Wumeng YiyymzYuanjiang-Mojiang YimlyzMalay (individual language)muwMundarixstzSilt'eopez Old PersiansccSerbianscrCroatianxskSakanmol MoldavianaayAariyaaccu Cubulco Achícbmz Yepocapa Southwestern CakchiquelchsChumashckczNorthern CakchiquelckdzSouth Central CakchiquelckezEastern CakchiquelckfzSouthern Cakchiquelckiu!Santa María De Jesús Cakchiquelckjz Santo Domingo Xenacoj Cakchiquelckkz"Acatenango Southwestern CakchiquelckwzWestern Cakchiquelcnmu Ixtatán Chujctiz Tila CholcunuCunén QuichéemlzEmiliano-Romagnoloeur EuropantogmozGamo-Gofa-DawrohsfzSoutheastern HuastechvauSan Luís Potosí Huastecixiz Nebaj Ixilixjz Chajul IxiljaizWestern Jacaltecommsz Southern Mammpfz Tajumulco MammtzTacanecmvcz Central MammvjuTodos Santos Cuchumatán MampoazEastern PokomampobuWestern PokomchípouzSouthern PokomamppvuPapavôqujuJoyabaj QuichéqutuWest Central QuichéquuuEastern QuichéqxiuSan Andrés Quichésic Malinguatstcz Santa CruztlzzToala'tzbuBachajón TzeltaltzczChamula TzotziltzeuChenalhó TzotziltzsuSan Andrés Larrainzar TzotziltztzWestern TzutujiltzuuHuixtán TzotziltzzuZinacantán TzotzilvlrVatratayuszChan Santa Cruz MayanfgNyengnfkShakaraagpParananbhkzAlbay BicolanobkbFinalligbtbzBeti (Cameroon)cjr ChorotegacmkChimakumdrhDarkhatdrwDarwazigav GabutamonmofzMohegan-Montauk-NarragansettmstzCataelano MandayamytzSangab MandayarmruCalósglzSanglechi-Ishkashimisul Surigaononsumz Sumo-Mayangnatnf TangshewiwgwWagawagaayxz Ayi (China)bjqzSouthern Betsimisaraka MalagasydhazDhanwar (India)dklzKolum So DogonmjaMaheinbfNaxinooNootkatieTingaltkkTakpabazTunenbjd Bandjigaliccq ChaungthackazKhumi Awa Chindapz Nisi (India)dwlzWalo Kumbe Dogonelp ElpaputihgbcGarawagioGelaohrrHoruruibiIbilojarzJarawa (Nigeria)kdvKadokghzUpper Tanudan Kalingakppz Paku KarenkzhzKenuzi-DongolalcqLuhumgxOmatinlnzDurango NahuatlpbzPalupgyPongyongscaSansutlwz South WemaleunpWororawiwWiranguybdYangbyeyenYendangymaYamphedafDandjlDjiwarliggrzAghu TharnggaluilwTalurizizIzi-Ezaa-Ikwo-MgbomegMeamldMalakhelmntMaykulanmwdMudburamyqzForest ManinkanbxNguranlrNgarlapcrPanangpprPirutggTanggawitWintuxiaXiandaoyiyz Yir YorontyosYosemoEmokggmz Gugu MinilegLengualmmLamammhhz Maskoy Pidginpuzz Purum Nagasapu SanapanáyuuYughaamAramanikadpAdapaueuǂKxʼauǁʼeinbmyz$Bemba (Democratic Republic of Congo)bxxz$Borna (Democratic Republic of Congo)byyBuyadzdDazagfxuMangetti Dune ǃXunggtizGbati-riime ImeraguenkbfKakauhuakojz Sara DunjokwqKwakkxeKakihumliiLingkhimmwjMaligonnxNgongounuǃOǃungpmuzMirpur PanjabisgoSongathxThetsfzSouthwestern TamanguokUokhaxsjSubiydszYiddish Sign LanguageymtzMator-Taygi-KaragasynhYanghobgmz Baga MbotenibtlBhatolacbe ChipiajescbhCaguacoyCoyaimacquzChilean QuechuacumCumeraldujDhuwalggnzEastern GurungggozSouthern GondiguvGeyiapIapamaillIranunkgcKassengkoxCoximaktrzKota Marudu TinagaskvsKunggarakzjzCoastal KadazankztzTambunan DusunnadNijadalints NatagaimasomeOmejespmcPalumatapodPonaresppaPaopryzPray 3rnaRunasvrSavaratduzTempasuk Dusunthcz Tai Hang TongtidTidongtmpu Tai MènetnezTinoc KallahantoeTomedesxbazKamba (Brazil)xbxuKabixíxipu Xipináwaxkh KarahawyanayriuYaríjegJengkgdKataangkrmKrimprbzLua'pukzPu KorieRienrsizRennellese Sign LanguageskkSoksnhShinabolsgzLyons Sign LanguagemwxMediakmwyMosironcpNdaktupaisz Nataoran AmisasdAsasditDiraridudz Hun-SaarelbaLuilloKhlormydMarambamyiz Mina (India)nnsNingyeaohArmaayyz Tayabas AytabbzzBabalia Creole Arabicbpb BarbacoasccaCaucacdgChamaridguDegarudrrDororoekczEastern KarnicgliGuligulikjfKhalajkxlz Nepali Kuruxkxuz Kui (India)lmzLumbeenxuNarauplpPalpasdm SemandangtbbTapebaxrqKarrangaxtz TasmanianzirZiriyathwThudambicBikarubijzVaghat-Ya-Bijim-LegeriblgBalaugjiGejiMuyaNgoni PapitalaizIja-ZubaWarapuzJudeo-Tunisian Arabic ChungmbokozLaka (Nigeria)zLango (South Sudan)PiniSamaSebuyauz Kulon-PazehWardujiWyandot)mvmngopatvkiwraajtcuglaklnopiismdsnbuunwrdwya)r )r)__doc__rewarningsr nltk.corpusrcompilerrrr$r(r*r,r6 load_wiki_qr#r&rrr r!rrrs8 bjj+, ?:C< '!B %,, ' y 4y 4y 4y 4 y  4 y  4 y 4y 4y 4y 4y 4y 4y 4y 4y 4y  4!y" 4#y$ 4%y& 4'y( 4)y* 4+y, 4-y. 4/y0 41y2 43y4 45y6 47y8 49y: 4;y< 4=y> 4?y@ 4AyB 4CyD 4EyF 4GyH 4IyJ 4KyL 4MyN 4OyP 4QyR 4SyT 4UyV 4WyX 4YyZ 4[y\ 4]y^ 4_y` 4ayb 4cyd 4eyf 4gyh 4iyj 4kyl 4myn 4oyp 4qyr 4syt 4uyv 4wyx 4yyz 4{y| 4}y~ 4y@ 4AyB 4CyD 4EyF 4GyH 4IyJ 4KyL 4MyN 4OyP 4QyR 4SyT 4UyV 4WyX 4YyZ 4[y\ 4]y^ 4_y` 4ayb 4cyd 4eyf 4gyh 4iyj 4kyl 4myn 4oyp 4qyr 4syt 4uyv 4wyx 4yyz 4{y| 4}y~ 4y@ 4AyB 4CyD 4EyF 4GyH 4IyJ 4KyL 4MyN 4OyP 4QyR 4SyT 4UyV 4WyX 4YyZ 4[y\ 4]y^ 4_y` 4ayb 4cyd 4eyf 4gyh 4iyj 4kyl 4myn 4oyp 4qyr 4syt 4uyv 4wyx 4yyz 4{y| 4}y~ 4y@ 4AyB 4CyD 4EyF 4GyH 4IyJ 4KyL 4MyN 4OyP 4QyR 4SyT 4UyV 4WyX 4YyZ 4[y\ 4]y^ 4_y` 4ayb 4cyd 4eyf 4gyh 4iyj 4kyl 4myn 4oyp 4qyr 4syt 4uyv 4wyx 4yyz 4{y| 4}y~ 4y@ 4AyB 4CyD 4EyF 4GyH 4IyJ 4KyL 4MyN 4OyP 4QyR 4SyT 4UyV              qy xu u ;u 8u : u  > u  < u ?u 7u 9u "u u >u <u :u >u  8!u" 6#u$ %u& ('u( <)u* 9+u, $-u. 9/u0 :1u2 3u4 95u6 %7u8 9u: 8;u< 5=u> 6?u@ :AuB :CuD 8EuF 8GuH 6IuJ :KuL MuN OuP 9QuR ;SuT >UuV 8WuX 7YuZ ;[u\ 7]u^ 9_u` aub 8cud euf 9guh 8iuj 6kul 9mun 8oup qur 9sut :uuv wux OuP ?QuR 9SuT =UuV )WuX YuZ [u\ ]u^ 9_u` aub !cud euf guh ;iuj CuD EuF ;GuH 8IuJ 7KuL 8MuN 7OuP QuR 6SuT "UuV guh 8iuj 9kul 9mun 9oup 8qur 5sut :uuv wux 7yuz {u| 5}u~ :u@ :AuB 9CuD EuF 7GuH 8IuJ 8KuL 6MuN 8OuP 7QuR 9SuT _u` 9aub ;cud 7euf 9guh iuj 9kul 8mun oup qur 5sut 8uuv 8wux 9yuz 8{u| }u~ :u@  A uB  C uD  :E uF  U uV  8W uX  6Y uZ  >[ u\  "] u^  ;_ u`  7a ub  9c ud  8e uf  8g uh  i uj  :k ul  8m un  >o up  =q ur  8s ut  7u uv  7w ux  ;y uz  8{ u|  :} u~  ; u@  8A uB  8C uD  8E uF  #G uH  7I uJ  6K uL      "        i u p + & !-0r