JL i{dZddlZddlZddlmZddlmZddlmZddl m Z ddl m Z Gdd Z dd ZGd d eZd ZddZdZdZdZdZdZdZedk(reyy)z Tools for reading and writing dependency trees. The input is assumed to be in Malt-TAB format (https://stp.lingfil.uu.se/~nivre/research/MaltXML.html). N) defaultdict)chain)pformat) find_binary)TreeceZdZdZ ddZdZdZdZdZdZ d Z d Z d Z d Z d Ze ddZdZdZdZ ddZd dZdZdZd!dZdZdZdZdZdZdZy)"DependencyGraphzQ A container for the nodes and labelled edges of a dependency structure. Nctd|_|jdjddddd|_|r|j |||||yy)aDependency graph. We place a dummy `TOP` node with the index 0, since the root node is often assigned 0 as its head. This also means that the indexing of the nodes corresponds directly to the Malt-TAB format, which starts at 1. If zero-based is True, then Malt-TAB-like input with node numbers starting at 0 and the root node assigned -1 (as produced by, e.g., zpar). :param str cell_separator: the cell separator. If not provided, cells are split by whitespace. :param str top_relation_label: the label by which the top relation is identified, for examlple, `ROOT`, `null` or `TOP`. c 4dddddddttdd S)N) addresswordlemmactagtagfeatsheaddepsrel)rlist`/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/nltk/parse/dependencygraph.pyz*DependencyGraph.__init__..=s)#D) rrTOP)rrr N)cell_extractor zero_basedcell_separatortop_relation_label)rnodesupdateroot_parse)selftree_strrrrrs r__init__zDependencyGraph.__init__$sc0!     1 eEaHI  KK-%-#5   rc|j|=y)zw Removes the node with the given address. References to this node in others will still exist. Nr)r#r s rremove_by_addressz!DependencyGraph.remove_by_addressWs JJw rc|jjD]:}g}|dD])}||vr|j||j|+||d<<y)zp Redirects arcs to any of the nodes in the originals list to the redirect node address. 
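The constructor docstring says a dummy `TOP` node sits at address 0, so node indices line up with Malt-TAB's 1-based numbering. The bytecode also suggests that `nodes` is a `defaultdict` whose factory builds a node with `address`, `word`, `lemma`, `ctag`, `tag`, `feats`, `head`, `deps` and `rel` keys. Below is a minimal plain-Python sketch of that layout. It assumes `deps` is a `defaultdict(list)` keyed by relation label. `make_nodes` is a hypothetical helper, not part of NLTK.

```python
from collections import defaultdict


def make_nodes():
    """Build a node store shaped like DependencyGraph.nodes (illustrative sketch)."""
    nodes = defaultdict(
        lambda: {
            "address": None,
            "word": None,
            "lemma": None,
            "ctag": None,
            "tag": None,
            "feats": None,
            "head": None,
            "deps": defaultdict(list),  # relation label -> list of dependent addresses
            "rel": None,
        }
    )
    # Dummy TOP node at address 0, so real tokens keep their 1-based Malt-TAB indices.
    nodes[0].update({"ctag": "TOP", "tag": "TOP", "address": 0})
    return nodes


nodes = make_nodes()
print(nodes[0]["tag"], nodes[0]["address"])  # TOP 0
print(nodes[5]["address"])  # None: a fresh default node is created on first access
```

Because the factory is a lambda, every new node gets its own `deps` dictionary. Merely reading an unknown address creates an empty node for it.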
rN)rvaluesappend)r# originalsredirectnodenew_depsdeps r redirect_arcszDependencyGraph.redirect_arcs^sc JJ%%' $DHF| ))#OOH-OOC(  ) $DL $rc|j|d}|j|dj|g|j|d|j|y)zw Adds an arc from the node specified by head_address to the node specified by the mod address. rrN)r setdefaultr+)r# head_address mod_addressrelations radd_arczDependencyGraph.add_arclsV ::k*51 < (33HbA < (299+Frc|jjD]j}|jjD]K}|d|dk7s|ddk7s|d}|dj|g|d|j|dMly)zr Fully connects all non-root nodes. All nodes are set to be dependents of the root node. r rrrN)rr*r3r+)r#node1node2r6s r connect_graphzDependencyGraph.connect_graphvs ZZ&&( EE**, E#uY'77E%LE>> dg = DependencyGraph( ... 'John N 2\n' ... 'loves V 0\n' ... 'Mary N 2' ... ) >>> print(dg.to_dot()) digraph G{ edge [dir=forward] node [shape=plaintext] 0 [label="0 (None)"] 0 -> 2 [label="ROOT"] 1 [label="1 (John)"] 2 [label="2 (loves)"] 2 -> 1 [label=""] 2 -> 3 [label=""] 3 [label="3 (Mary)"] } z digraph G{ zedge [dir=forward] znode [shape=plaintext] c |dSNr r)vs rrz(DependencyGraph.to_dot..s a lr)keyz {} [label="{} ({})"]r r rz {} -> {} [label="{}"]z {} -> {} z })sortedrr*formatitems)r#sr.rrr0s rto_dotzDependencyGraph.to_dots0  ## ''4::,,.4JK HD )00YYV  A "&\//1 H THC6==d9osTWXX]11$y/3GG H H H U rc8|j}t|S)aShow SVG representation of the transducer (IPython magic). >>> from nltk.test.setup_fixt import check_binary >>> check_binary('dot') >>> dg = DependencyGraph( ... 'John N 2\n' ... 'loves V 0\n' ... 'Mary N 2' ... ) >>> dg._repr_svg_().split('\n')[0] '' )rKdot2img)r# dot_strings r _repr_svg_zDependencyGraph._repr_svg_s[[] z""rc,t|jSN)rrr#s r__str__zDependencyGraph.__str__stzz""rc4dt|jdS)Nz)lenrrRs r__repr__zDependencyGraph.__repr__s'DJJ'8@@rc t|5}|jjdDcgc]}t||||c}cdddScc}w#1swYyxYw)a :param filename: a name of a file in Malt-TAB format :param zero_based: nodes in the input file are numbered starting from 0 rather than 1 (as produced by, e.g., zpar) :param str cell_separator: the cell separator. If not provided, cells are split by whitespace. 
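The bytecode suggests that `add_arc` files a dependent under `deps[rel]`, where `rel` is taken from the dependent node. `to_dot` then visits the nodes in address order and emits one labelled edge per dependent. The standalone sketch below works on trimmed-down node dicts rather than a `DependencyGraph`. `new_node`, `add_arc` and `to_dot` here are hypothetical stand-ins. Unlike the original, this version does not special-case a `None` relation.

```python
from collections import defaultdict


def new_node(address, word=None, rel=None):
    # Hypothetical helper: a node with only the keys this sketch needs.
    return {"address": address, "word": word, "rel": rel, "deps": defaultdict(list)}


def add_arc(nodes, head_address, mod_address):
    """Record mod_address as a dependent of head_address, keyed by the dependent's relation."""
    relation = nodes[mod_address]["rel"]
    nodes[head_address]["deps"][relation].append(mod_address)


def to_dot(nodes):
    """Render the graph in Graphviz DOT syntax, visiting nodes in address order."""
    lines = ["digraph G{", "edge [dir=forward]", "node [shape=plaintext]"]
    for node in sorted(nodes.values(), key=lambda n: n["address"]):
        lines.append('{0} [label="{0} ({1})"]'.format(node["address"], node["word"]))
        for rel, deps in node["deps"].items():
            for dep in deps:
                lines.append('{} -> {} [label="{}"]'.format(node["address"], dep, rel))
    lines.append("}")
    return "\n".join(lines)


nodes = {
    0: new_node(0),
    1: new_node(1, "John", ""),
    2: new_node(2, "loves", "ROOT"),
    3: new_node(3, "Mary", ""),
}
add_arc(nodes, 0, 2)
add_arc(nodes, 2, 1)
add_arc(nodes, 2, 3)
print(to_dot(nodes))
```

For the John/loves/Mary input this prints the same lines as the `to_dot` docstring example.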
:param str top_relation_label: the label by which the top relation is identified, for examlple, `ROOT`, `null` or `TOP`. :return: a list of DependencyGraphs  )rrrN)openreadsplitr )filenamerrrinfiler$s rloadzDependencyGraph.loadsh (^ v!' 3 3F ;  )#1'9     s"AAAAActj|j|dj}|j|dt fd|DS)zl Returns the number of left children under the node specified by the given address. rr c3.K|] }|ks dywNr.0cindexs r z0DependencyGraph.left_children..4!e)14 r from_iterablerr*sumr# node_indexchildrenrfs @r left_childrenzDependencyGraph.left_childrenP &&tzz*'=f'E'L'L'NO :&y14h444rctj|j|dj}|j|dt fd|DS)zm Returns the number of right children under the node specified by the given address. rr c3.K|] }|kDs dywrarrcs rrgz1DependencyGraph.right_children..rhrirjrms @rright_childrenzDependencyGraph.right_childrenrqrcp|j|ds"|j|dj|yyrD)rArr )r#r.s radd_nodezDependencyGraph.add_nodes4$$T)_5 JJtI ' . .t 46rc d}d}d}d} |||| d} t|trd|jdD}d|D} d | D} d } t| d D]\} }|j|}| t |} n| t |k(sJ| | | } ||| \} }}}}}}}|dk(rVt|}|r|d z }|j| j| |||||||d| dk(r|dk(r|}|j|d|j| |jdd|r4|jdd|d}|j||_ ||_y tj dy #t $r }t d j| |d }~wwxYw#tt f$r||\}}}}}}}YwxYw)aParse a sentence. :param extractor: a function that given a tuple of cells returns a 7-tuple, where the values are ``word, lemma, ctag, tag, feats, head, rel``. :param str cell_separator: the cell separator. If not provided, cells are split by whitespace. :param str top_relation_label: the label by which the top relation is identified, for examlple, `ROOT`, `null` or `TOP`. c"|\}}}|||||d|dfSNr)cellsrfr rrs rextract_3_cellsz/DependencyGraph._parse..extract_3_cellss$#OD#t$c3D"< .extract_4_cellss'#( D#tS$c3D#= =rc`|\}}}}}}} t|}|||||d||fS#t$rYwxYwryint ValueError) r{rf line_indexr rr_rrs rextract_7_cellsz/DependencyGraph._parse..extract_7_cells sQ9> 6JeS!T3 J$sCT3> >  s ! 
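`left_children` and `right_children` count a node's dependents that come before or after it in the sentence. The bytecode suggests they flatten `deps` with `itertools.chain.from_iterable` and compare each dependent's address with the head's. Here is a standalone sketch on plain dicts. The relation labels `SUB` and `OBJ` are illustrative only.

```python
from itertools import chain


def left_children(nodes, node_index):
    """Count dependents of node_index with a smaller address (to its left)."""
    children = chain.from_iterable(nodes[node_index]["deps"].values())
    index = nodes[node_index]["address"]
    return sum(1 for c in children if c < index)


def right_children(nodes, node_index):
    """Count dependents of node_index with a larger address (to its right)."""
    children = chain.from_iterable(nodes[node_index]["deps"].values())
    index = nodes[node_index]["address"]
    return sum(1 for c in children if c > index)


# "John loves Mary": loves (2) heads John (1) and Mary (3).
nodes = {
    1: {"address": 1, "deps": {}},
    2: {"address": 2, "deps": {"SUB": [1], "OBJ": [3]}},
    3: {"address": 3, "deps": {}},
}
print(left_children(nodes, 2), right_children(nodes, 2))  # 1 1
```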
--c f|\ }}}}}}}} } } t|}|||||||| fS#t$rYwxYwrQr) r{rfrr rrrrrrrs rextract_10_cellsz0DependencyGraph._parse..extract_10_cells)sWIN FJeT3tS!Q J$tS%sB B  s $ 00) c3 K|]}|ywrQr)rdlines rrgz)DependencyGraph._parse..:s:td:s  c3<K|]}|jywrQ)rstriprdls rrgz)DependencyGraph._parse..<s,,sc3&K|] }|s| ywrQrrs rrgz)DependencyGraph._parse..=s'qQ'sNrb)startTNumber of tab-delimited fields ({}) not supported by CoNLL(10) or Malt-Tab(4) formatr)r r rrrrrrrrrzBThe graph doesn't contain a node that depends on the root element.) isinstancestrr[ enumeraterUKeyErrorrrH TypeErrorrrr r+r!rwarningswarn)r#input_rrrrr|r~rr extractorslines cell_numberrfrr{er rrrrrr root_addresss rr"zDependencyGraph._parses9, = > ? C   fc ":v||D'9:F,V,'E' $U!41 8KE4JJ~.E"!%j "c%j000%%/ %>> dg = DependencyGraph(treebank_data) >>> dg.contains_cycle() False >>> cyclic_dg = DependencyGraph() >>> top = {'word': None, 'deps': [1], 'rel': 'TOP', 'address': 0} >>> child1 = {'word': None, 'deps': [2], 'rel': 'NTOP', 'address': 1} >>> child2 = {'word': None, 'deps': [4], 'rel': 'NTOP', 'address': 2} >>> child3 = {'word': None, 'deps': [1], 'rel': 'NTOP', 'address': 3} >>> child4 = {'word': None, 'deps': [3], 'rel': 'NTOP', 'address': 4} >>> cyclic_dg.nodes = { ... 0: top, ... 1: child1, ... 2: child2, ... 3: child3, ... 4: child4, ... } >>> cyclic_dg.root = top >>> cyclic_dg.contains_cycle() [1, 2, 4, 3] rr rbrF)rr*tupleget_cycle_pathr?) r# distancesr.r0rFr new_entriespair1pair2pairpaths rcontains_cyclezDependencyGraph.contains_cycles/4 JJ%%' #DF| #T)_c23!" # # #  AK" O&OEQx58+#U1XuQx$89+4U+;i>N+N C(O O $ "-d"3 $7d1g%..t/B/B47/KTRSWUDK   rc|dD]}||k(s |dgcS|dD]J}|j|j||}t|dkDs3|jd|d|cSgS)Nrr r)rr?rUinsert)r# curr_nodegoal_node_indexr0rs rrzDependencyGraph.get_cycle_pathsV$ .Co%!),-- .V$ C&&t':':3'?QD4y1} Ay34    rc|dk(rdn*|dk(rdn"|dk(rdntdj|djfd t|jj DS) z The dependency graph in CoNLL format. 
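The `contains_cycle` docstring builds a graph in which node 1 governs 2, 2 governs 4, 4 governs 3 and 3 governs 1, and it expects the cycle `[1, 2, 4, 3]`. The bytecode suggests NLTK grows a transitive closure of head/dependent pairs until some node reaches itself, then recovers the path with `get_cycle_path`. The sketch below swaps in a different but standard technique: a depth-first search that marks nodes on the current path. `find_cycle` is a hypothetical name, and it returns `None` where the original returns `False`.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of addresses, or None if acyclic.

    `deps` maps each address to the list of its dependents' addresses.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {addr: WHITE for addr in deps}
    path = []

    def visit(addr):
        color[addr] = GREY
        path.append(addr)
        for dep in deps.get(addr, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: the cycle is the part of the path starting at dep.
                return path[path.index(dep):]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[addr] = BLACK
        return None

    for addr in sorted(deps):
        if color[addr] == WHITE:
            found = visit(addr)
            if found:
                return found
    return None


# Same shape as the cyclic example in the contains_cycle docstring.
print(find_cycle({0: [1], 1: [2], 2: [4], 3: [1], 4: [3]}))  # [1, 2, 4, 3]
print(find_cycle({0: [2], 2: [1, 3], 1: [], 3: []}))  # None
```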
:param style: the style to use for the format (3, 4, 10 columns) :type style: int :rtype: str rz{word} {tag} {head} rz{word} {tag} {head} {rel} rz9{i} {word} {lemma} {ctag} {tag} {feats} {head} {rel} _ _ rrzc3\K|]#\}}|ddk7rjdd|i|%yw)rrrNr)rH)rdrr.templates rrgz+DependencyGraph.to_conll.. s< 4E{e# HOO (a (4 ( s),)rrHjoinrGrrI)r#stylers @rto_conllzDependencyGraph.to_conlls{ A:0H aZ7H b[U 228&-  ww !$**"2"2"45   rcddl}ttdt|j}|Dcgc]7}|j |s||j ||j |f9}}i|_|D]!}|j|d|j|<#|j}|j||j||Scc}w)zJConvert the data in a ``nodelist`` into a networkx labeled directed graph.rNrbr ) networkxrrangerUrrr nx_labels MultiDiGraphadd_nodes_fromadd_edges_from)r#r nx_nodelistn nx_edgelistgs rnx_graphzDependencyGraph.nx_graphs5C O45 4? /0488A;Q TYYq\ *   6A $ 1 f 5DNN1  6  ! ! # % % s C%C)NNFNROOT)FNr)NFNr)TrQ)__name__ __module__ __qualname____doc__r%r(r1r7r;r?rArKrOrSrV staticmethodr^rprtrvr"rrrrrrrrrrrrrr r s ! 1f  $G E(*+Z# #ALR4555! xt  = . 0d  :rr c@ td |dvrtjdd|zgd|d}n'tjdd|zgt|d}|jS#t d j |xYw#t$r}t d |d }~wwxYw) a Create image representation fom dot_string, using the 'dot' program from the Graphviz package. Use the 't' argument to specify the image file format, for ex. 'jpeg', 'eps', 'json', 'png' or 'webp' (Running 'dot -T:' lists all available formats). Note that the "capture_output" option of subprocess.run() is only available with text formats (like svg), but not with binary image formats (like png). 
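`to_conll` chooses a row template by column count (3, 4 or 10). It then formats every node except the dummy `TOP` node in address order and passes the address as `i`. The sketch below reproduces that idea on plain dicts. The error message is this sketch's own, and the sample tokens and `SUB` label are illustrative.

```python
TEMPLATES = {
    3: "{word}\t{tag}\t{head}\n",
    4: "{word}\t{tag}\t{head}\t{rel}\n",
    10: "{i}\t{word}\t{lemma}\t{ctag}\t{tag}\t{feats}\t{head}\t{rel}\t_\t_\n",
}


def to_conll(nodes, style):
    """Serialize node dicts as 3-, 4- or 10-column CoNLL/Malt-TAB text (sketch)."""
    if style not in TEMPLATES:
        raise ValueError("Number of columns ({}) is not supported".format(style))
    template = TEMPLATES[style]
    return "".join(
        template.format(i=i, **node)
        for i, node in sorted(nodes.items())
        if node["tag"] != "TOP"  # skip the dummy root node
    )


nodes = {
    0: {"address": 0, "word": None, "lemma": None, "ctag": "TOP", "tag": "TOP",
        "feats": None, "head": None, "rel": None},
    1: {"address": 1, "word": "John", "lemma": "john", "ctag": "N", "tag": "N",
        "feats": "_", "head": 2, "rel": "SUB"},
    2: {"address": 2, "word": "loves", "lemma": "love", "ctag": "V", "tag": "V",
        "feats": "_", "head": 0, "rel": "ROOT"},
}
print(to_conll(nodes, 4), end="")
```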
dot)rdot_jsonjsonsvgz-T%sT)capture_outputinputtextutf8)encoding)rzACannot create image representation by running dot from string: {}z0Cannot find the dot binary from Graphviz packageN)r subprocessrunbytesstdout ExceptionrHOSError)rNtprocrs rrMrM&sSE 66!~~FQJ'#'$ "~~FQJ' V<;;  6*%  SJKQRRSs) BAA$$BB B BBceZdZdZy)DependencyGraphErrorzDependency graph exception.N)rrrrrrrrrKs%rrcTttttyrQ) malt_demo conll_democonll_file_democycle_finding_demorrrdemorOs KLrctd}|j}|j|rddl}ddlm}|j }|j|j|d}|j||d|j|||j|jg|jg|jd |jyy) zw A demonstration of the result of reading a dependency version of the first sentence of the Penn Treebank. Pierre NNP 2 NMOD Vinken NNP 8 SUB , , 2 P 61 CD 5 NMOD years NNS 6 AMOD old JJ 2 NMOD , , 2 P will MD 0 ROOT join VB 8 VC the DT 11 NMOD board NN 9 OBJ as IN 9 VMOD a DT 15 NMOD nonexecutive JJ 15 NMOD director NN 12 PMOD Nov. NNP 9 VMOD 29 CD 16 NMOD . . 9 VMOD rN)pylabrb)dim2) node_sizeztree.png)r rpprintr matplotlibrrinfo spring_layoutdraw_networkx_nodesdraw_networkx_labelsrxticksytickssavefigshow)nxdgrrrrposs rrrVs   B* 779DKKM $ KKM $$QA$.$$Qr$:%%abll; R R j!  rctt}|j}|jt |t |j dy)zg A demonstration of how to read a string representation of a CoNLL format dependency tree. 
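`malt_demo` feeds `DependencyGraph` a 4-column Malt-TAB sentence (word, tag, head, relation). `_parse` picks a cell extractor from the number of cells per line (3, 4, 7 or 10) and records the token whose head is 0 with the top relation label as the root. Below is a minimal sketch of the 4-column case only. It assumes whitespace-separated cells and `ROOT` as the top relation label. `parse_malt_tab` is a hypothetical helper, and the `SUB`/`OBJ` labels are illustrative.

```python
def parse_malt_tab(text, top_relation_label="ROOT"):
    """Parse 4-column Malt-TAB lines into node dicts keyed by 1-based address."""
    # Address 0 is a dummy TOP node, so token addresses start at 1 as in Malt-TAB.
    nodes = {0: {"address": 0, "word": None, "tag": "TOP", "head": None,
                 "rel": None, "deps": []}}
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for address, line in enumerate(lines, start=1):
        cells = line.split()  # no cell separator given: split on whitespace
        if len(cells) != 4:
            raise ValueError("expected 4 cells, got {}: {!r}".format(len(cells), line))
        word, tag, head, rel = cells
        nodes[address] = {"address": address, "word": word, "tag": tag,
                          "head": int(head), "rel": rel, "deps": []}
    # Second pass: heads may appear later in the sentence than their dependents.
    root = None
    for address in range(1, len(lines) + 1):
        node = nodes[address]
        nodes[node["head"]]["deps"].append(address)
        if node["head"] == 0 and node["rel"] == top_relation_label:
            root = node
    return nodes, root


nodes, root = parse_malt_tab("John N 2 SUB\nloves V 0 ROOT\nMary N 2 OBJ\n")
print(root["word"], nodes[2]["deps"])  # loves [1, 3]
```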
rN)r conll_data1rrprintr)rrs rrrs9  %B 779DKKM "I "++a.rctdtjdDcgc]}|st|}}|D]-}|j }td|j /ycc}w)NzMass conll_read demo...rXr)r  conll_data2r[r rr)entrygraphsgraphrs rrrs` #$2=2C2CF2K Uuoe$ UF Uzz| d  Vs A- A-ctt}t|jt}|j ddgddd|j ddgddd|j ddgddd|j ddgddd|j ddgdddt|jy) Nrbrr)r rrr NTOPrr)r treebank_datar rrv)r cyclic_dgs rrrs  'B "   !I qc%ANO qc&QOP qc&QOP qc&QOP qc&QOP ) " " $%rra/ 1 Ze ze Pron Pron per|3|evofmv|nom 2 su _ _ 2 had heb V V trans|ovt|1of2of3|ev 0 ROOT _ _ 3 met met Prep Prep voor 8 mod _ _ 4 haar haar Pron Pron bez|3|ev|neut|attr 5 det _ _ 5 moeder moeder N N soort|ev|neut 3 obj1 _ _ 6 kunnen kan V V hulp|ott|1of2of3|mv 2 vc _ _ 7 gaan ga V V hulp|inf 6 vc _ _ 8 winkelen winkel V V intrans|inf 11 cnj _ _ 9 , , Punc Punc komma 8 punct _ _ 10 zwemmen zwem V V intrans|inf 11 cnj _ _ 11 of of Conj Conj neven 7 vc _ _ 12 terrassen terras N N soort|mv|neut 11 cnj _ _ 13 . . Punc Punc punt 12 punct _ _ a1 Cathy Cathy N N eigen|ev|neut 2 su _ _ 2 zag zie V V trans|ovt|1of2of3|ev 0 ROOT _ _ 3 hen hen Pron Pron per|3|mv|datofacc 2 obj1 _ _ 4 wild wild Adj Adj attr|stell|onverv 5 mod _ _ 5 zwaaien zwaai N N soort|mv|neut 2 vc _ _ 6 . . Punc Punc punt 5 punct _ _ 1 Ze ze Pron Pron per|3|evofmv|nom 2 su _ _ 2 had heb V V trans|ovt|1of2of3|ev 0 ROOT _ _ 3 met met Prep Prep voor 8 mod _ _ 4 haar haar Pron Pron bez|3|ev|neut|attr 5 det _ _ 5 moeder moeder N N soort|ev|neut 3 obj1 _ _ 6 kunnen kan V V hulp|ott|1of2of3|mv 2 vc _ _ 7 gaan ga V V hulp|inf 6 vc _ _ 8 winkelen winkel V V intrans|inf 11 cnj _ _ 9 , , Punc Punc komma 8 punct _ _ 10 zwemmen zwem V V intrans|inf 11 cnj _ _ 11 of of Conj Conj neven 7 vc _ _ 12 terrassen terras N N soort|mv|neut 11 cnj _ _ 13 . . 
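`conll_file_demo` builds one `DependencyGraph` per blank-line-separated entry of `conll_data2`. Each row carries 10 CoNLL columns: index, word, lemma, coarse tag, tag, features, head, relation and two unused `_` columns. The sketch below covers only the reading step, assuming whitespace-separated cells. `read_conll_sentences` is hypothetical and returns lists of token dicts rather than graphs. The two sample sentences are copied from `conll_data2`.

```python
SAMPLE = """\
1 Cathy Cathy N N eigen|ev|neut 2 su _ _
2 zag zie V V trans|ovt|1of2of3|ev 0 ROOT _ _
3 hen hen Pron Pron per|3|mv|datofacc 2 obj1 _ _
4 wild wild Adj Adj attr|stell|onverv 5 mod _ _
5 zwaaien zwaai N N soort|mv|neut 2 vc _ _
6 . . Punc Punc punt 5 punct _ _

1 Dat dat Pron Pron aanw|neut|attr 2 det _ _
2 werkwoord werkwoord N N soort|ev|neut 6 obj1 _ _
3 had heb V V hulp|ovt|1of2of3|ev 0 ROOT _ _
4 ze ze Pron Pron per|3|evofmv|nom 6 su _ _
5 zelf zelf Pron Pron aanw|neut|attr|wzelf 3 predm _ _
6 uitgevonden vind V V trans|verldw|onverv 3 vc _ _
7 . . Punc Punc punt 6 punct _ _
"""


def read_conll_sentences(text):
    """Group 10-column CoNLL rows into sentences; a blank line ends a sentence."""
    sentences, current = [], []
    for line in text.splitlines():
        cells = line.split()
        if not cells:
            if current:
                sentences.append(current)
                current = []
            continue
        if len(cells) != 10:
            raise ValueError("expected 10 cells, got {}: {!r}".format(len(cells), line))
        index, word, lemma, ctag, tag, feats, head, rel = cells[:8]
        current.append({"address": int(index), "word": word, "lemma": lemma,
                        "ctag": ctag, "tag": tag, "feats": feats,
                        "head": int(head), "rel": rel})
    if current:
        sentences.append(current)
    return sentences


for sentence in read_conll_sentences(SAMPLE):
    print([token["word"] for token in sentence if token["head"] == 0])
```

Grouping line by line, rather than splitting on `"\n\n"`, also tolerates blank lines that contain stray spaces.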
Punc Punc punt 12 punct _ _ 1 Dat dat Pron Pron aanw|neut|attr 2 det _ _ 2 werkwoord werkwoord N N soort|ev|neut 6 obj1 _ _ 3 had heb V V hulp|ovt|1of2of3|ev 0 ROOT _ _ 4 ze ze Pron Pron per|3|evofmv|nom 6 su _ _ 5 zelf zelf Pron Pron aanw|neut|attr|wzelf 3 predm _ _ 6 uitgevonden vind V V trans|verldw|onverv 3 vc _ _ 7 . . Punc Punc punt 6 punct _ _ 1 Het het Pron Pron onbep|neut|zelfst 2 su _ _ 2 hoorde hoor V V trans|ovt|1of2of3|ev 0 ROOT _ _ 3 bij bij Prep Prep voor 2 ld _ _ 4 de de Art Art bep|zijdofmv|neut 6 det _ _ 5 warme warm Adj Adj attr|stell|vervneut 6 mod _ _ 6 zomerdag zomerdag N N soort|ev|neut 3 obj1 _ _ 7 die die Pron Pron betr|neut|zelfst 6 mod _ _ 8 ze ze Pron Pron per|3|evofmv|nom 12 su _ _ 9 ginds ginds Adv Adv gew|aanw 12 mod _ _ 10 achter achter Adv Adv gew|geenfunc|stell|onverv 12 svp _ _ 11 had heb V V hulp|ovt|1of2of3|ev 7 body _ _ 12 gelaten laat V V trans|verldw|onverv 11 vc _ _ 13 . . Punc Punc punt 12 punct _ _ 1 Ze ze Pron Pron per|3|evofmv|nom 2 su _ _ 2 hadden heb V V trans|ovt|1of2of3|mv 0 ROOT _ _ 3 languit languit Adv Adv gew|geenfunc|stell|onverv 11 mod _ _ 4 naast naast Prep Prep voor 11 mod _ _ 5 elkaar elkaar Pron Pron rec|neut 4 obj1 _ _ 6 op op Prep Prep voor 11 ld _ _ 7 de de Art Art bep|zijdofmv|neut 8 det _ _ 8 strandstoelen strandstoel N N soort|mv|neut 6 obj1 _ _ 9 kunnen kan V V hulp|inf 2 vc _ _ 10 gaan ga V V hulp|inf 9 vc _ _ 11 liggen lig V V intrans|inf 10 vc _ _ 12 . . Punc Punc punt 11 punct _ _ 1 Zij zij Pron Pron per|3|evofmv|nom 2 su _ _ 2 zou zal V V hulp|ovt|1of2of3|ev 7 cnj _ _ 3 mams mams N N soort|ev|neut 4 det _ _ 4 rug rug N N soort|ev|neut 5 obj1 _ _ 5 ingewreven wrijf V V trans|verldw|onverv 6 vc _ _ 6 hebben heb V V hulp|inf 2 vc _ _ 7 en en Conj Conj neven 0 ROOT _ _ 8 mam mam V V trans|ovt|1of2of3|ev 7 cnj _ _ 9 de de Art Art bep|zijdofmv|neut 10 det _ _ 10 hare hare Pron Pron bez|3|ev|neut|attr 8 obj1 _ _ 11 . . 
Punc Punc punt 10 punct _ _ 1 Of of Conj Conj onder|metfin 0 ROOT _ _ 2 ze ze Pron Pron per|3|evofmv|nom 3 su _ _ 3 had heb V V hulp|ovt|1of2of3|ev 0 ROOT _ _ 4 gewoon gewoon Adj Adj adv|stell|onverv 10 mod _ _ 5 met met Prep Prep voor 10 mod _ _ 6 haar haar Pron Pron bez|3|ev|neut|attr 7 det _ _ 7 vriendinnen vriendin N N soort|mv|neut 5 obj1 _ _ 8 rond rond Adv Adv deelv 10 svp _ _ 9 kunnen kan V V hulp|inf 3 vc _ _ 10 slenteren slenter V V intrans|inf 9 vc _ _ 11 in in Prep Prep voor 10 mod _ _ 12 de de Art Art bep|zijdofmv|neut 13 det _ _ 13 buurt buurt N N soort|ev|neut 11 obj1 _ _ 14 van van Prep Prep voor 13 mod _ _ 15 Trafalgar_Square Trafalgar_Square MWU N_N eigen|ev|neut_eigen|ev|neut 14 obj1 _ _ 16 . . Punc Punc punt 15 punct _ _ __main__)r)F)rrr collectionsr itertoolsrrrnltk.internalsr nltk.treerr rMrrrrrrrrr rrrrrrs #&DDN"SJ&9&*Z  & ( T l zFr