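"""

A port of the Gale-Church Aligner.

Gale & Church (1993), A Program for Aligning Sentences in Bilingual Corpora.
https://aclweb.org/anthology/J93-1004.pdf

"""

import math

try:
    from scipy.stats import norm

    # Log survival function of the standard normal: log(1 - norm.cdf(x)).
    norm_logsf = norm.logsf
except ImportError:
    # Pure-Python fallback used when SciPy is not installed.

    def erfcc(x):
        """Complementary error function."""
        # Rational approximation from Numerical Recipes
        # (fractional error below 1.2e-7 everywhere).
        z = abs(x)
        t = 1 / (1 + 0.5 * z)
        r = t * math.exp(
            -z * z
            - 1.26551223
            + t * (1.00002368
            + t * (0.37409196
            + t * (0.09678418
            + t * (-0.18628806
            + t * (0.27886807
            + t * (-1.13520398
            + t * (1.48851587
            + t * (-0.82215223
            + t * 0.17087277))))))))
        )
        return r if x >= 0.0 else 2.0 - r

    def norm_cdf(x):
        """Return the area under the normal distribution from M{-∞..x}."""
        return 1 - 0.5 * erfcc(x / math.sqrt(2))

    def norm_logsf(x):
        """Return log(1 - norm_cdf(x)), or -inf if that probability underflows to 0."""
        try:
            return math.log(1 - norm_cdf(x))
        except ValueError:
            return float("-inf")


LOG2 = math.log(2)


class LanguageIndependent:
    # Language-independent parameters from Gale & Church (1993).

    # Prior probability of each alignment type
    # (number of source sentences, number of target sentences).
    PRIORS = {
        (1, 0): 0.0099,
        (0, 1): 0.0099,
        (1, 1): 0.89,
        (2, 1): 0.089,
        (1, 2): 0.089,
        (2, 2): 0.011,
    }

    # Mean (c) and variance (s^2) of the number of target characters
    # per source character.
    AVERAGE_CHARACTERS = 1
    VARIANCE_CHARACTERS = 6.8


def trace(backlinks, source_sents_lens, target_sents_lens):
    """
    Follow the backlinks from the end of both blocks back to the start and
    retrieve the aligned sentence index pairs.

    :param backlinks: A dictionary mapping each alignment point (i, j) to the
        alignment type that reached it with the lowest cost (a key of
        LanguageIndependent.PRIORS), or None if no type reached it with a finite cost
    :type backlinks: dict
    :param source_sents_lens: A list of source sentences' lengths
    :type source_sents_lens: list(int)
    :param target_sents_lens: A list of target sentences' lengths
    :type target_sents_lens: list(int)
    :return: The aligned (source index, target index) pairs, in text order
    :rtype: list(tuple(int, int))
    """
    links = []
    position = (len(source_sents_lens), len(target_sents_lens))
    while position != (0, 0) and all(p >= 0 for p in position):
        try:
            s, t = backlinks[position]
        except TypeError:
            # No alignment type was recorded (None): step back diagonally.
            position = (position[0] - 1, position[1] - 1)
            continue
        # Link every source sentence in the bead to every target sentence in it.
        for i in range(s):
            for j in range(t):
                links.append((position[0] - i - 1, position[1] - j - 1))
        position = (position[0] - s, position[1] - t)

    return links[::-1]


def align_log_prob(i, j, source_sents, target_sents, alignment, params):
    """Returns the negative log probability (the alignment cost) of aligning the
    C{alignment[0]} source sentences that end at C{source_sents[i - 1]} with the
    C{alignment[1]} target sentences that end at C{target_sents[j - 1]}.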
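
    @param i: The source position after the bead; the bead ends at C{source_sents[i - 1]}.
    @param j: The target position after the bead; the bead ends at C{target_sents[j - 1]}.
    @param source_sents: The list of source sentence lengths.
    @param target_sents: The list of target sentence lengths.
    @param alignment: The alignment type, a tuple of two integers
        (number of source sentences, number of target sentences).
    @param params: The sentence alignment parameters.

    @returns: The negative log probability of a specific alignment between the
        sentences, given the parameters; lower values are better.
    """
    # Total character length on each side of the bead.
    l_s = sum(source_sents[i - offset - 1] for offset in range(alignment[0]))
    l_t = sum(target_sents[j - offset - 1] for offset in range(alignment[1]))
    try:
        # Normalise by the mean length of both sides rather than by l_s alone,
        # so that insertions (l_s == 0) can still be scored.
        m = (l_s + l_t / params.AVERAGE_CHARACTERS) / 2
        delta = (l_s * params.AVERAGE_CHARACTERS - l_t) / math.sqrt(
            m * params.VARIANCE_CHARACTERS
        )
    except ZeroDivisionError:
        return float("-inf")

    # -log(2 * (1 - Phi(|delta|)) * prior)
    return -(LOG2 + norm_logsf(abs(delta)) + math.log(params.PRIORS[alignment]))


def align_blocks(source_sents_lens, target_sents_lens, params=LanguageIndependent):
    """Return the sentence alignment of two text blocks (usually paragraphs).

        >>> align_blocks([5,5,5], [7,7,7])
        [(0, 0), (1, 1), (2, 2)]
        >>> align_blocks([10,5,5], [12,20])
        [(0, 0), (1, 1), (2, 1)]
        >>> align_blocks([12,20], [10,5,5])
        [(0, 0), (1, 1), (1, 2)]
        >>> align_blocks([10,2,10,10,2,10], [12,3,20,3,12])
        [(0, 0), (1, 1), (2, 2), (3, 2), (4, 3), (5, 4)]

    @param source_sents_lens: The list of source sentence lengths.
    @param target_sents_lens: The list of target sentence lengths.
    @param params: the sentence alignment parameters.
    @return: The sentence alignments, a list of index pairs.
    """

    alignment_types = list(params.PRIORS.keys())

    # D holds at most the last three rows of the cost table: rows i - 2 and
    # i - 1, plus row i, which is being filled (D[-1]).
    D = [[]]

    backlinks = {}

    for i in range(len(source_sents_lens) + 1):
        for j in range(len(target_sents_lens) + 1):
            min_dist = float("inf")
            min_align = None
            for a in alignment_types:
                # Relative row in D and absolute column of the cell this type extends.
                prev_i = -1 - a[0]
                prev_j = j - a[1]
                if prev_i < -len(D) or prev_j < 0:
                    continue
                p = D[prev_i][prev_j] + align_log_prob(
                    i, j, source_sents_lens, target_sents_lens, a, params
                )
                if p < min_dist:
                    min_dist = p
                    min_align = a

            if min_dist == float("inf"):
                min_dist = 0

            backlinks[(i, j)] = min_align
            D[-1].append(min_dist)

        if len(D) > 2:
            D.pop(0)
        D.append([])

    return trace(backlinks, source_sents_lens, target_sents_lens)


def align_texts(source_blocks, target_blocks, params=LanguageIndependent):
    """Creates the sentence alignment of two texts.

    Texts can consist of several blocks. Block boundaries cannot be crossed by sentence
    alignment links.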

    Each block consists of a list that contains the lengths (in characters) of the sentences
    in this block.

    @param source_blocks: The list of blocks in the source text.
    @param target_blocks: The list of blocks in the target text.
    @param params: the sentence alignment parameters.

    @returns: A list of sentence alignment lists, one per pair of blocks.
    """
    if len(source_blocks) != len(target_blocks):
        raise ValueError(
            "Source and target texts do not have the same number of blocks."
        )

    return [
        align_blocks(source_block, target_block, params)
        for source_block, target_block in zip(source_blocks, target_blocks)
    ]


def split_at(it, split_value):
    """Splits an iterator C{it} at values of C{split_value}.

    Each instance of C{split_value} is swallowed. The iterator produces
    subiterators which need to be consumed fully before the next subiterator
    can be used.
    """
    it = iter(it)
    sentinel = object()

    def _chunk_iterator(first):
        v = first
        while v != split_value:
            yield v
            # End the chunk quietly when the input runs out: since PEP 479 a
            # StopIteration escaping a generator becomes a RuntimeError.
            v = next(it, split_value)

    while True:
        first = next(it, sentinel)
        if first is sentinel:
            return
        yield _chunk_iterator(first)


def parse_token_stream(stream, soft_delimiter, hard_delimiter):
    """Parses a stream of tokens and splits it into sentences (using C{soft_delimiter} tokens)
    and blocks (using C{hard_delimiter} tokens) for use with the L{align_texts} function.

    Each sentence is represented by the total character length of its tokens.
    """
    return [
        [
            sum(len(token) for token in sentence_it)
            for sentence_it in split_at(block_it, soft_delimiter)
        ]
        for block_it in split_at(stream, hard_delimiter)
    ]
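

# Worked example of the per-bead cost (an illustrative sketch, not part of the
# NLTK API). _sketch_bead_cost is a hypothetical helper that restates the
# formula in align_log_prob using only the standard library. Because
# 2 * (1 - Phi(|delta|)) == math.erfc(|delta| / sqrt(2)), the LOG2 term and the
# factor 0.5 inside norm_logsf cancel, so the cost reduces to
# -log(erfc(|delta| / sqrt(2))) - log(prior). With the pure-Python fallback,
# align_log_prob agrees with this up to the approximation error of erfcc.
# The defaults c=1.0 and s2=6.8 are assumed to mirror
# LanguageIndependent.AVERAGE_CHARACTERS and VARIANCE_CHARACTERS.
import math


def _sketch_bead_cost(l_s, l_t, prior, c=1.0, s2=6.8):
    """Hypothetical helper: -log P(bead) for l_s source and l_t target characters."""
    m = (l_s + l_t / c) / 2  # mean length of both sides, as in align_log_prob
    if m == 0:
        # align_log_prob returns float("-inf") here; the sketch rejects it instead.
        raise ValueError("a bead must cover at least one character")
    delta = (l_s * c - l_t) / math.sqrt(m * s2)
    tail = math.erfc(abs(delta) / math.sqrt(2))  # two-sided tail probability
    if tail == 0.0:
        return math.inf  # length mismatch too large to represent
    return -(math.log(tail) + math.log(prior))


# Equal lengths give delta == 0, so only the 1-1 prior contributes:
# _sketch_bead_cost(10, 10, 0.89) == -math.log(0.89), about 0.1165.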