============================================
 TGrep search implementation for NLTK trees
============================================

This module supports TGrep2 syntax for matching parts of NLTK Trees.
Note that many tgrep operators require the tree passed to be a
``ParentedTree``.

External links:

- `Tgrep tutorial `_
- `Tgrep2 manual `_
- `Tgrep2 source `_

Usage
=====

>>> from nltk.tree import ParentedTree
>>> from nltk.tgrep import tgrep_nodes, tgrep_positions
>>> tree = ParentedTree.fromstring('(S (NP (DT the) (JJ big) (NN dog)) (VP bit) (NP (DT a) (NN cat)))')
>>> list(tgrep_nodes('NN', [tree]))
[[ParentedTree('NN', ['dog']), ParentedTree('NN', ['cat'])]]
>>> list(tgrep_positions('NN', [tree]))
[[(0, 2), (2, 1)]]
>>> list(tgrep_nodes('DT', [tree]))
[[ParentedTree('DT', ['the']), ParentedTree('DT', ['a'])]]
>>> list(tgrep_nodes('DT $ JJ', [tree]))
[[ParentedTree('DT', ['the'])]]

This implementation adds syntax to select nodes based on their NLTK
tree position. This syntax is ``N`` plus a Python tuple representing
the tree position. For instance, ``N()``, ``N(0,)``, ``N(0,0)`` are
valid node selectors. Example:

>>> tree = ParentedTree.fromstring('(S (NP (DT the) (JJ big) (NN dog)) (VP bit) (NP (DT a) (NN cat)))')
>>> tree[0,0]
ParentedTree('DT', ['the'])
>>> tree[0,0].treeposition()
(0, 0)
>>> list(tgrep_nodes('N(0,0)', [tree]))
[[ParentedTree('DT', ['the'])]]

Caveats:
========

- Link modifiers: "?" and "=" are not implemented.
- Tgrep compatibility: the aliases "@" for "!", "{" for "<", and "}"
  for ">" are not implemented.
- The "=" and "~" links are not implemented.

Known Issues:
=============

- There are some issues with link relations involving leaf nodes
  (which are represented as bare strings in NLTK trees).
  For instance, consider the tree::

      (S (A x))

  The search string ``* !>> S`` should select all nodes which are not
  dominated in some way by an ``S`` node (i.e., all nodes which are
  not descendants of an ``S``). Clearly, in this tree, the only node
  which fulfills this criterion is the top node (since it is not
  dominated by anything). However, the code here will find both the
  top node and the leaf node ``x``. This is because we cannot recover
  the parent of the leaf, since it is stored as a bare string.

  A possible workaround, when performing this kind of search, would be
  to filter out all leaf nodes (``tgrep_nodes`` and ``tgrep_positions``
  do this when called with ``search_leaves=False``).

Implementation notes
====================

This implementation is (somewhat awkwardly) based on lambda functions
which are predicates on a node. A predicate is a function which returns
either True or False; using a predicate function, we can identify sets
of nodes with particular properties. A predicate function could, for
instance, return True only if a particular node has a label matching a
particular regular expression, and has a daughter node which has no
sisters. Because tgrep2 search strings can do things statefully (such
as substituting in macros, and binding nodes with node labels), the
actual predicate function is declared with three arguments::

    pred = lambda n, m=None, l=None: True  # some logic here

``n``
    is a node in a tree; this argument must always be given

``m``
    contains a dictionary, mapping macro names onto predicate functions

``l``
    is a dictionary to map node labels onto nodes in the tree

``m`` and ``l`` are declared to default to ``None``, and so need not be
specified in a call to a predicate. Predicates which call other
predicates must always pass the value of these arguments on. The
top-level predicate (constructed by ``_tgrep_exprs_action``) binds the
macro definitions to ``m`` and initialises ``l`` to an empty dictionary.
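The three-argument predicate convention can be illustrated with a small, self-contained sketch. It does not use NLTK. The ``Node`` class and the helper names (``label_matches``, ``only_child``, ``conjoin``, ``macro``, ``top_level``) are hypothetical and were chosen for illustration only. The sketch shows three things: ``m`` and ``l`` default to ``None``, combined predicates pass both on, and a top-level predicate supplies the macro table and an empty label dictionary.

```python
import re


class Node:
    """Hypothetical minimal tree node (a stand-in for nltk's ParentedTree)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def label_matches(pattern):
    """Predicate: the node's label matches a regular expression."""
    regex = re.compile(pattern)
    return lambda n, m=None, l=None: regex.search(n.label) is not None


def only_child(pred):
    """Predicate: the node has exactly one daughter, and that daughter satisfies pred."""
    return lambda n, m=None, l=None: len(n.children) == 1 and pred(n.children[0], m, l)


def conjoin(*preds):
    """All sub-predicates must hold; m and l are passed on unchanged."""
    return lambda n, m=None, l=None: all(p(n, m, l) for p in preds)


def macro(name):
    """Look the named predicate up in the macro dictionary m at match time."""
    return lambda n, m=None, l=None: m[name](n, m, l)


def top_level(pred, macros):
    """Bind the macro definitions to m and start from an empty label dictionary l."""
    return lambda n, m=None, l=None: pred(n, macros, {})


# "An NP whose only daughter is a noun", with the noun test supplied as a macro.
np_with_only_noun = top_level(
    conjoin(label_matches(r"^NP"), only_child(macro("NOUN"))),
    {"NOUN": label_matches(r"^NN")},
)
print(np_with_only_noun(Node("NP", [Node("NN", ["dog"])])))  # True
```

Because every helper returns a closure with the same ``(n, m=None, l=None)`` signature, they compose freely, which is the property the notes above rely on.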
If the ``pyparsing`` package cannot be imported, importing this module
prints "Warning: nltk.tgrep will not work without the `pyparsing`
package installed." instead of raising ``ImportError``.

Module contents
===============

``TgrepException``
    Tgrep exception type.

``ancestors(node)``
    Returns the list of all nodes dominating the given tree node. For a
    leaf node it returns an empty list, since there is no way to recover
    the parent of a leaf.

``unique_ancestors(node)``
    Returns the list of all nodes dominating the given node, where
    there is only a single path of descent.

``_descendants(node)``
    Returns the list of all nodes which are descended from the given
    tree node in some way.

``_leftmost_descendants(node)``
    Returns the list of all nodes descended in some way through left
    branches from this node.

``_rightmost_descendants(node)``
    Returns the list of all nodes descended in some way through right
    branches from this node.

``_istree(obj)``
    Predicate to check whether ``obj`` is an ``nltk.tree.Tree``.

``_unique_descendants(node)``
    Returns the list of all nodes descended from the given node, where
    there is only a single path of descent.

``_before(node)``
    Returns the list of all nodes that are before the given node.
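The navigation helpers listed above can be approximated in terms of tree positions. The sketch below is an illustration only. It assumes a hypothetical encoding that is not NLTK's: an internal node is a ``(label, [children])`` tuple and a leaf is a bare string. Its functions return positions rather than nodes. The ``before`` rule is modelled on a comparison of truncated positions.

```python
def treepositions(tree, prefix=()):
    """Yield every position in preorder, like nltk's Tree.treepositions()."""
    yield prefix
    if isinstance(tree, tuple):  # internal node: (label, [children])
        for i, child in enumerate(tree[1]):
            yield from treepositions(child, prefix + (i,))


def descendants(tree):
    # Every position except the root position ().
    return list(treepositions(tree))[1:]


def leftmost_descendants(tree):
    # Reached by taking the first child (index 0) at every step.
    return [p for p in descendants(tree) if all(i == 0 for i in p)]


def rightmost_descendants(tree):
    # The lexicographically largest position is the rightmost leaf; its
    # non-empty prefixes lie on the right-branching path down to it.
    last = max(treepositions(tree))
    return [last[:i] for i in range(1, len(last) + 1)]


def before(tree, pos):
    # p precedes pos when, compared up to the shorter length, p sorts first;
    # ancestors and descendants of pos compare equal and are excluded.
    return [p for p in treepositions(tree) if p[: len(pos)] < pos[: len(p)]]


t = ("S", [("NP", [("DT", ["the"]), ("NN", ["dog"])]), ("VP", ["bit"])])
print(rightmost_descendants(t))  # [(1,), (1, 0)]
print(before(t, (0, 1)))         # [(0, 0), (0, 0, 0)]
```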
``_immediately_before(node)``
    Returns the list of all nodes that are immediately before the given
    node.

    Tree node A immediately precedes node B if the last terminal symbol
    (word) produced by A immediately precedes the first terminal symbol
    produced by B.

``_after(node)``
    Returns the list of all nodes that are after the given node.

``_immediately_after(node)``
    Returns the list of all nodes that are immediately after the given
    node.

    Tree node A immediately follows node B if the first terminal symbol
    (word) produced by A immediately follows the last terminal symbol
    produced by B.

``_tgrep_node_action(_s, _l, tokens)``
    Builds a lambda function representing a predicate on a tree node
    depending on the name of its node.
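As a simplified sketch of the kinds of node names ``_tgrep_node_action`` distinguishes, the function below handles a bare literal, a ``"quoted"`` literal, a ``/regex/``, the ``i@`` case-insensitive prefix, the ``*`` and ``__`` wildcards, and alternatives such as ``NN|NNS``. Several assumptions apply. It takes a label string rather than a tree node. Alternatives are passed as separate arguments instead of being parsed from ``|``. The escape handling is approximate. The name ``node_name_predicate`` is hypothetical.

```python
import re


def node_name_predicate(*alternatives):
    """Build a predicate on a node label from simplified tgrep node specs.

    Several alternatives (the pieces of ``NN|NNS``) are OR-ed together.
    """
    if len(alternatives) > 1:
        preds = [node_name_predicate(a) for a in alternatives]
        return lambda label: any(p(label) for p in preds)
    (spec,) = alternatives
    if spec in ("*", "__"):  # wildcards match any node
        return lambda label: True
    if spec.startswith("i@"):  # case-insensitive version of the rest of the spec
        inner = node_name_predicate(spec[2:].lower())
        return lambda label: inner(label.lower())
    if len(spec) >= 2 and spec[0] == spec[-1] == '"':  # quoted literal with escapes
        literal = spec[1:-1].replace('\\"', '"').replace("\\\\", "\\")
        return lambda label: label == literal
    if len(spec) >= 2 and spec[0] == spec[-1] == "/":  # regular expression, searched
        regex = re.compile(spec[1:-1])
        return lambda label: regex.search(label) is not None
    return lambda label: label == spec  # bare literal: exact match


print(node_name_predicate("/^NP/")("NP-SBJ"))  # True
print(node_name_predicate("i@np")("NP"))       # True
print(node_name_predicate("NN", "NNS")("VB"))  # False
```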
``_tgrep_parens_action(_s, _l, tokens)``
    Builds a lambda function representing a predicate on a tree node
    from a parenthetical notation.

``_tgrep_nltk_tree_pos_action(_s, _l, tokens)``
    Builds a lambda function representing a predicate on a tree node
    which returns true if the node is located at a specific tree
    position.
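The ``N(...)`` tree-position selector can be sketched in two steps. First, the selector is parsed into a tuple of child indices. Second, a node matches when its ``treeposition()`` equals that tuple. The regular expression used for the selector syntax, the ``FakeNode`` stand-in for a ``ParentedTree``, and the function names are all hypothetical. The module's own grammar may accept slightly different spellings.

```python
import re


def tree_position_selector(selector):
    """Parse a selector such as 'N(0,0)' into a tuple of child indices."""
    if re.fullmatch(r"N\((\s*\d+\s*(,\s*\d+\s*)*,?)?\)", selector) is None:
        raise ValueError(f"not a tree-position selector: {selector!r}")
    return tuple(int(x) for x in re.findall(r"\d+", selector))


def position_predicate(selector):
    target = tree_position_selector(selector)

    def pred(n, m=None, l=None):
        # Leaves are bare strings without treeposition(), so they never match.
        return hasattr(n, "treeposition") and n.treeposition() == target

    return pred


class FakeNode:
    """Hypothetical stand-in for a ParentedTree that knows its position."""

    def __init__(self, pos):
        self.pos = pos

    def treeposition(self):
        return self.pos


print(tree_position_selector("N(0,)"))                  # (0,)
print(position_predicate("N(0,0)")(FakeNode((0, 0))))  # True
```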
``_tgrep_relation_action(_s, _l, tokens)``
    Builds a lambda function representing a predicate on a tree node
    depending on its relation to other nodes in the tree. An operator
    prefixed with ``!`` produces the negated predicate, and an operator
    that is not recognised raises ``TgrepException`` ("cannot interpret
    tgrep operator ...").
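A minimal sketch of how a relation operator can be turned into a predicate is shown below. It covers five TGrep2 operators: ``<`` (immediately dominates), ``>`` (is immediately dominated by), ``<<`` (dominates), ``>>`` (is dominated by) and ``$`` (is a sister of). It also supports ``!`` negation. The sketch makes several assumptions. ``Node`` is a hypothetical class with a ``parent`` attribute, whereas ``ParentedTree`` uses a ``parent()`` method. The operator set is deliberately incomplete. ``ValueError`` stands in for ``TgrepException``.

```python
class Node:
    """Hypothetical stand-in for nltk's ParentedTree: a label plus parent links."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def descendants(n):
    for child in n.children:
        yield child
        yield from descendants(child)


def ancestors(n):
    while n.parent is not None:
        n = n.parent
        yield n


def relation(operator, pred):
    """Return a predicate testing `operator` against nodes matching `pred`."""
    negated = operator.startswith("!")
    op = operator[1:] if negated else operator
    if op == "<":  # immediately dominates
        rel = lambda n, m=None, l=None: any(pred(c, m, l) for c in n.children)
    elif op == ">":  # is immediately dominated by
        rel = lambda n, m=None, l=None: n.parent is not None and pred(n.parent, m, l)
    elif op == "<<":  # dominates
        rel = lambda n, m=None, l=None: any(pred(d, m, l) for d in descendants(n))
    elif op == ">>":  # is dominated by
        rel = lambda n, m=None, l=None: any(pred(a, m, l) for a in ancestors(n))
    elif op == "$":  # is a sister of (a node is not its own sister)
        rel = lambda n, m=None, l=None: n.parent is not None and any(
            pred(s, m, l) for s in n.parent.children if s is not n
        )
    else:
        raise ValueError(f'cannot interpret tgrep operator "{operator}"')
    if negated:
        return lambda n, m=None, l=None: not rel(n, m, l)
    return rel


def label_is(label):
    return lambda n, m=None, l=None: n.label == label


dt, nn = Node("DT"), Node("NN")
np_ = Node("NP", [dt, nn])
s = Node("S", [np_, Node("VP")])
print(relation("<<", label_is("NN"))(s))  # True: S dominates NN
print(relation("!>>", label_is("S"))(s))  # True: nothing dominates the root
```

The last line corresponds to the ``* !>> S`` example in the Known Issues section. With parent links available, only the root satisfies the negated relation.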
``_tgrep_conjunction_action(_s, _l, tokens, join_char="&")``
    Builds a lambda function representing a predicate on a tree node
    from the conjunction of several other such lambda functions.

    This is prototypically called for expressions like
    (`tgrep_rel_conjunction`)::

        < NP & < AP < VP

    where tokens is a list of predicates representing the relations
    (`< NP`, `< AP`, and `< VP`), possibly with the character `&`
    included (as in the example here).

    This is also called for expressions like (`tgrep_node_expr2`)::

        NP < NN
        S=s < /NP/=n : s < /VP/=v : n .. v

    tokens[0] is a tgrep_expr predicate; tokens[1:] are an (optional)
    list of segmented patterns (`tgrep_expr_labeled`, processed by
    `_tgrep_segmented_pattern_action`).
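The conjunction step can be sketched in a few lines. The optional ``&`` separators are dropped, a lone predicate is returned unchanged, and otherwise every predicate must hold for the same node. The function name is hypothetical. The predicates in the usage example test plain integers instead of tree nodes, purely to keep the example short.

```python
def conjunction(tokens, join_char="&"):
    """AND together predicates; the optional '&' separators are dropped first."""
    preds = [t for t in tokens if t != join_char]
    if len(preds) == 1:
        return preds[0]  # nothing to combine
    return lambda n, m=None, l=None: all(p(n, m, l) for p in preds)


positive = lambda n, m=None, l=None: n > 0
even = lambda n, m=None, l=None: n % 2 == 0
both = conjunction([positive, "&", even])  # shaped like '< NP & < AP'
print(both(4), both(3), both(-2))  # True False False
```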
``_tgrep_segmented_pattern_action(_s, _l, tokens)``
    Builds a lambda function representing a segmented pattern.

    Called for expressions like (`tgrep_expr_labeled`)::

        =s .. =v < =n

    This is a segmented pattern, a tgrep2 expression which begins with
    a node label.

    The problem is that for a segmented pattern (': =v < =s'), the
    first element (in this case, =v) is specifically selected by virtue
    of matching a particular node in the tree; to retrieve the node, we
    need the label, not a lambda function. For node labels inside a
    tgrep_node_expr, we need a lambda function which returns true if
    the node visited is the same as =v.

    We solve this by creating two copies of a node_label_use in the
    grammar; the label use inside a tgrep_expr_labeled has a separate
    parse action from the pred use inside a node_expr. See
    `_tgrep_node_label_use_action` and
    `_tgrep_node_label_pred_use_action`.

    The returned predicate ignores its node argument, and raises
    ``TgrepException`` if the label has not been bound in the pattern.

``_tgrep_node_label_use_action(_s, _l, tokens)``
    Returns the node label used to begin a tgrep_expr_labeled. See
    `_tgrep_segmented_pattern_action`.

    Called for expressions like (`tgrep_node_label_use`)::

        =s

    when they appear as the first element of a `tgrep_expr_labeled`
    expression (see `_tgrep_segmented_pattern_action`).

    It returns the node label.
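The node-label machinery can be sketched as three closures over the label dictionary ``l``:

- ``bind_label`` records a matching node under a label, as in ``@NP=n``.
- ``use_label`` is true only for that very node (identity, not equality), as in ``=n`` inside a relation.
- ``segmented_pattern`` ignores its node argument and evaluates its relations at the bound node, as in ``: =n ...``.

All three names are hypothetical. ``KeyError`` and ``ValueError`` stand in for ``TgrepException``. Lists are used as "nodes" because two equal lists can still be distinct objects, which makes the identity check visible.

```python
def bind_label(node_pred, node_label):
    """Like '@NP=n': when node_pred matches, record the node under node_label in l."""

    def node_label_bind_pred(n, m=None, l=None):
        if not node_pred(n, m, l):
            return False
        if l is None:
            raise ValueError(f"cannot bind node_label {node_label}: label_dict is None")
        l[node_label] = n
        return True

    return node_label_bind_pred


def _lookup(node_label, l):
    if l is None or node_label not in l:
        raise KeyError(f"node_label ={node_label} not bound in pattern")
    return l[node_label]


def use_label(node_label):
    """Like '=n' inside a relation: true only for the very node bound to node_label."""
    return lambda n, m=None, l=None: n is _lookup(node_label, l)


def segmented_pattern(node_label, *reln_preds):
    """Like ': =n ...': test the relations at the bound node, ignoring n."""

    def pattern_segment_pred(n, m=None, l=None):
        node = _lookup(node_label, l)  # the node argument n is not used
        return all(pred(node, m, l) for pred in reln_preds)

    return pattern_segment_pred


first, second = ["NP"], ["NP"]  # equal contents, but distinct objects
is_np = lambda n, m=None, l=None: n == ["NP"]
labels = {}
print(bind_label(is_np, "n")(first, None, labels))             # True
print(use_label("n")(first, None, labels))                     # True
print(use_label("n")(second, None, labels))                    # False
print(segmented_pattern("n", is_np)("ignored", None, labels))  # True
```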
``_tgrep_node_label_pred_use_action(_s, _l, tokens)``
    Builds a lambda function representing a predicate on a tree node
    which describes the use of a previously bound node label.

    Called for expressions like (`tgrep_node_label_use_pred`)::

        =s

    when they appear inside a tgrep_node_expr (for example, inside a
    relation). The predicate returns true if and only if its node
    argument is identical to the node looked up in the node label
    dictionary under that label.

``_tgrep_bind_node_label_action(_s, _l, tokens)``
    Builds a lambda function representing a predicate on a tree node
    which can optionally bind a matching node into the tgrep2 string's
    label_dict.

    Called for expressions like (`tgrep_node_expr2`)::

        /NP/
        @NP=n

    Binding raises ``TgrepException`` ("cannot bind node_label ...:
    label_dict is None") if no label dictionary was passed.

``_tgrep_rel_disjunction_action(_s, _l, tokens)``
    Builds a lambda function representing a predicate on a tree node
    from the disjunction of several other such lambda functions.

``_macro_defn_action(_s, _l, tokens)``
    Builds a dictionary structure which defines the given macro.

``_tgrep_exprs_action(_s, _l, tokens)``
    This is the top-level node in a tgrep2 search string; the predicate
    function it returns binds together all the state of a tgrep2 search
    string.

    Builds a lambda function representing a predicate on a tree node
    from the disjunction of several tgrep expressions. Also handles
    macro definitions and macro name binding, and node label
    definitions and node label binding.

``_build_tgrep_parser(set_parse_actions=True)``
    Builds a pyparsing-based parser object for tokenizing and
    interpreting tgrep search strings. When ``set_parse_actions`` is
    true, the ``_tgrep_*_action`` functions are attached to its grammar
    elements as pyparsing parse actions, which is why they take
    ``(_s, _l, tokens)`` arguments. ``tgrep_tokenize`` builds the parser
    without these actions, and ``tgrep_compile`` builds it with them.
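The top-level combination performed by ``_tgrep_exprs_action`` can be sketched as follows. ``;`` separators are dropped. Dictionaries are treated as macro definitions and merged. The remaining predicates are OR-ed together, with a fresh label dictionary for each node tested. The function name ``top_level`` and the integer "nodes" in the usage example are hypothetical illustrations, not the module's code.

```python
def top_level(tokens):
    """Combine ';'-separated macro definitions (dicts) and expressions (predicates)."""
    if len(tokens) == 1:
        only = tokens[0]
        return lambda n, m=None, l=None: only(n, None, {})
    tokens = [t for t in tokens if t != ";"]
    macros = {}
    for tok in tokens:
        if isinstance(tok, dict):
            macros.update(tok)
    exprs = [t for t in tokens if not isinstance(t, dict)]

    def top_level_pred(n, m=macros, l=None):
        label_dict = {}  # fresh label bindings for every node tested
        return any(pred(n, m, label_dict) for pred in exprs)

    return top_level_pred


macro_def = {"BIG": lambda n, m=None, l=None: n > 100}  # stands in for a macro definition
use_big = lambda n, m=None, l=None: m["BIG"](n, m, l)   # stands in for a use of @BIG
is_zero = lambda n, m=None, l=None: n == 0
pred = top_level([macro_def, ";", use_big, ";", is_zero])
print(pred(500), pred(0), pred(5))  # True True False
```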
``tgrep_tokenize(tgrep_string)``
    Tokenizes a TGrep search string into separate tokens.

``tgrep_compile(tgrep_string)``
    Parses (and tokenizes, if necessary) a TGrep search string into a
    lambda function.

    Both functions accept ``bytes`` as well as ``str``; a ``bytes``
    search string is decoded first.

``treepositions_no_leaves(tree)``
    Returns all the tree positions in the given tree which are not leaf
    nodes.

``tgrep_nodes(pattern, trees, search_leaves=True)``
    Return the tree nodes in the trees which match the given pattern.

    :param pattern: a tgrep search pattern
    :type pattern: str or output of tgrep_compile()
    :param trees: a sequence of NLTK trees (usually ParentedTrees)
    :type trees: iter(ParentedTree) or iter(Tree)
    :param search_leaves: whether to return matching leaf nodes
    :type search_leaves: bool
    :rtype: iter(tree nodes)

    One list of matching nodes is yielded per input tree; a tree that
    does not support ``treepositions()`` yields an empty list.
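The search driver and the ``search_leaves=False`` option can be sketched together. For each tree, the driver collects the candidate positions, optionally keeps only non-leaf positions, and yields the positions (or nodes) that satisfy a predicate. A position is treated as a non-leaf when it is a proper prefix of some other position. By this rule, a subtree with no children also counts as a leaf. The sketch assumes a hypothetical ``(label, [children])`` tuple encoding with bare-string leaves, and it takes an already-compiled predicate. All function names are illustrative.

```python
def treepositions(tree, prefix=()):
    """Yield every position in preorder, like nltk's Tree.treepositions()."""
    yield prefix
    if isinstance(tree, tuple):  # internal node: (label, [children])
        for i, child in enumerate(tree[1]):
            yield from treepositions(child, prefix + (i,))


def subtree(tree, pos):
    """Follow a position (a tuple of child indices) down from the root."""
    for i in pos:
        tree = tree[1][i]
    return tree


def positions_no_leaves(tree):
    """Keep only positions that are a proper prefix of some other position."""
    positions = list(treepositions(tree))
    prefixes = set()
    for pos in positions:
        for length in range(len(pos)):
            prefixes.add(pos[:length])
    return [pos for pos in positions if pos in prefixes]


def matching_positions(pred, trees, search_leaves=True):
    """Yield, per tree, the positions whose node satisfies pred."""
    for tree in trees:
        if search_leaves:
            positions = list(treepositions(tree))
        else:
            positions = positions_no_leaves(tree)
        yield [p for p in positions if pred(subtree(tree, p))]


def matching_nodes(pred, trees, search_leaves=True):
    """Like matching_positions, but yield the matching nodes themselves."""
    for tree in trees:
        positions = next(matching_positions(pred, [tree], search_leaves))
        yield [subtree(tree, p) for p in positions]


# The tree (S (A x)) from the Known Issues section, in this sketch's encoding.
t = ("S", [("A", ["x"])])
everything = lambda node: True
print(list(matching_positions(everything, [t])))                       # [[(), (0,), (0, 0)]]
print(list(matching_positions(everything, [t], search_leaves=False)))  # [[(), (0,)]]
```

With ``search_leaves=False`` the leaf ``x`` is never tested. This is the leaf-filtering workaround suggested in Known Issues.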