K ixdZddlZddlZddlZddlmZddlmZddlZ ddl m Z gdZ e jdddd d  dd Ze jddd  dd Ze jddd  ddZe jdddd d  ddZe j ddZ ddZ ddZe jd ddZe de jd ddddZy)aOFunctions measuring similarity using graph edit distance. The graph edit distance is the number of edge/node changes needed to make two graphs isomorphic. The default algorithm/implementation is sub-optimal for some graphs. The problem of finding the exact Graph Edit Distance (GED) is NP-hard so it is often slow. If the simple interface `graph_edit_distance` takes too long for your graph, try `optimize_graph_edit_distance` and/or `optimize_edit_paths`. At the same time, I encourage capable people to investigate alternative GED algorithms, in order to improve the choices available. N) dataclass)product)np_random_state)graph_edit_distanceoptimal_edit_pathsoptimize_graph_edit_distanceoptimize_edit_pathssimrank_similaritypanther_similaritygenerate_random_paths)G1G2T)graphspreserve_edge_attrspreserve_node_attrsc Pd} t|||||||||| | d| | D]\}}}|} | S)aReturns GED (graph edit distance) between graphs G1 and G2. Graph edit distance is a graph similarity measure analogous to Levenshtein distance for strings. It is defined as minimum cost of edit path (sequence of node and edge edit operations) transforming graph G1 to graph isomorphic to G2. Parameters ---------- G1, G2: graphs The two graphs G1 and G2 must be of the same type. node_match : callable A function that returns True if node n1 in G1 and n2 in G2 should be considered equal during matching. The function will be called like node_match(G1.nodes[n1], G2.nodes[n2]). That is, the function will receive the node attribute dictionaries for n1 and n2 as inputs. Ignored if node_subst_cost is specified. If neither node_match nor node_subst_cost are specified then node attributes are not considered. edge_match : callable A function that returns True if the edge attribute dictionaries for the pair of nodes (u1, v1) in G1 and (u2, v2) in G2 should be considered equal during matching. 
The function will be called like edge_match(G1[u1][v1], G2[u2][v2]). That is, the function will receive the edge attribute dictionaries of the edges under consideration. Ignored if edge_subst_cost is specified. If neither edge_match nor edge_subst_cost are specified then edge attributes are not considered. node_subst_cost, node_del_cost, node_ins_cost : callable Functions that return the costs of node substitution, node deletion, and node insertion, respectively. The functions will be called like node_subst_cost(G1.nodes[n1], G2.nodes[n2]), node_del_cost(G1.nodes[n1]), node_ins_cost(G2.nodes[n2]). That is, the functions will receive the node attribute dictionaries as inputs. The functions are expected to return positive numeric values. Function node_subst_cost overrides node_match if specified. If neither node_match nor node_subst_cost are specified then default node substitution cost of 0 is used (node attributes are not considered during matching). If node_del_cost is not specified then default node deletion cost of 1 is used. If node_ins_cost is not specified then default node insertion cost of 1 is used. edge_subst_cost, edge_del_cost, edge_ins_cost : callable Functions that return the costs of edge substitution, edge deletion, and edge insertion, respectively. The functions will be called like edge_subst_cost(G1[u1][v1], G2[u2][v2]), edge_del_cost(G1[u1][v1]), edge_ins_cost(G2[u2][v2]). That is, the functions will receive the edge attribute dictionaries as inputs. The functions are expected to return positive numeric values. Function edge_subst_cost overrides edge_match if specified. If neither edge_match nor edge_subst_cost are specified then default edge substitution cost of 0 is used (edge attributes are not considered during matching). If edge_del_cost is not specified then default edge deletion cost of 1 is used. If edge_ins_cost is not specified then default edge insertion cost of 1 is used. 
roots : 2-tuple
    Tuple where first element is a node in G1 and the second
    is a node in G2.
    These nodes are forced to be matched in the comparison to
    allow comparison between rooted graphs.

upper_bound : numeric
    Maximum edit distance to consider. Return None if no edit
    distance under or equal to upper_bound exists.

timeout : numeric
    Maximum number of seconds to execute.
    After timeout is met, the current best GED is returned.

Examples
--------
>>> G1 = nx.cycle_graph(6)
>>> G2 = nx.wheel_graph(7)
>>> nx.graph_edit_distance(G1, G2)
7.0

>>> G1 = nx.star_graph(5)
>>> G2 = nx.star_graph(5)
>>> nx.graph_edit_distance(G1, G2, roots=(0, 0))
0.0
>>> nx.graph_edit_distance(G1, G2, roots=(1, 0))
8.0

See Also
--------
optimal_edit_paths, optimize_graph_edit_distance,

is_isomorphic: test for graph edit distance of 0

References
----------
.. [1] Zeina Abu-Aisheh, Romain Raveaux, Jean-Yves Ramel, Patrick
   Martineau. An Exact Graph Edit Distance Algorithm for Solving
   Pattern Recognition Problems. 4th International Conference on
   Pattern Recognition Applications and Methods 2015, Jan 2015,
   Lisbon, Portugal. 2015,
   <10.5220/0005209202710278>.
   https://hal.archives-ouvertes.fr/hal-01168816

Returns all minimum-cost edit paths transforming G1 to G2.

Graph edit path is a sequence of node and edge edit operations
transforming graph G1 to graph isomorphic to G2. Edit operations
include substitutions, deletions, and insertions.

Parameters
----------
G1, G2: graphs
    The two graphs G1 and G2 must be of the same type.

node_match : callable
    A function that returns True if node n1 in G1 and n2 in G2
    should be considered equal during matching.
The function will be called like node_match(G1.nodes[n1], G2.nodes[n2]). That is, the function will receive the node attribute dictionaries for n1 and n2 as inputs. Ignored if node_subst_cost is specified. If neither node_match nor node_subst_cost are specified then node attributes are not considered. edge_match : callable A function that returns True if the edge attribute dictionaries for the pair of nodes (u1, v1) in G1 and (u2, v2) in G2 should be considered equal during matching. The function will be called like edge_match(G1[u1][v1], G2[u2][v2]). That is, the function will receive the edge attribute dictionaries of the edges under consideration. Ignored if edge_subst_cost is specified. If neither edge_match nor edge_subst_cost are specified then edge attributes are not considered. node_subst_cost, node_del_cost, node_ins_cost : callable Functions that return the costs of node substitution, node deletion, and node insertion, respectively. The functions will be called like node_subst_cost(G1.nodes[n1], G2.nodes[n2]), node_del_cost(G1.nodes[n1]), node_ins_cost(G2.nodes[n2]). That is, the functions will receive the node attribute dictionaries as inputs. The functions are expected to return positive numeric values. Function node_subst_cost overrides node_match if specified. If neither node_match nor node_subst_cost are specified then default node substitution cost of 0 is used (node attributes are not considered during matching). If node_del_cost is not specified then default node deletion cost of 1 is used. If node_ins_cost is not specified then default node insertion cost of 1 is used. edge_subst_cost, edge_del_cost, edge_ins_cost : callable Functions that return the costs of edge substitution, edge deletion, and edge insertion, respectively. The functions will be called like edge_subst_cost(G1[u1][v1], G2[u2][v2]), edge_del_cost(G1[u1][v1]), edge_ins_cost(G2[u2][v2]). That is, the functions will receive the edge attribute dictionaries as inputs. 
The functions are expected to return positive numeric values. Function edge_subst_cost overrides edge_match if specified. If neither edge_match nor edge_subst_cost are specified then default edge substitution cost of 0 is used (edge attributes are not considered during matching). If edge_del_cost is not specified then default edge deletion cost of 1 is used. If edge_ins_cost is not specified then default edge insertion cost of 1 is used. upper_bound : numeric Maximum edit distance to consider. Returns ------- edit_paths : list of tuples (node_edit_path, edge_edit_path) - node_edit_path : list of tuples ``(u, v)`` indicating node transformations between `G1` and `G2`. ``u`` is `None` for insertion, ``v`` is `None` for deletion. - edge_edit_path : list of tuples ``((u1, v1), (u2, v2))`` indicating edge transformations between `G1` and `G2`. ``(None, (u2,v2))`` for insertion and ``((u1,v1), None)`` for deletion. cost : numeric Optimal edit path cost (graph edit distance). When the cost is zero, it indicates that `G1` and `G2` are isomorphic. Examples -------- >>> G1 = nx.cycle_graph(4) >>> G2 = nx.wheel_graph(5) >>> paths, cost = nx.optimal_edit_paths(G1, G2) >>> len(paths) 40 >>> cost 5.0 Notes ----- To transform `G1` into a graph isomorphic to `G2`, apply the node and edge edits in the returned ``edit_paths``. In the case of isomorphic graphs, the cost is zero, and the paths represent different isomorphic mappings (isomorphisms). That is, the edits involve renaming nodes and edges to match the structure of `G2`. See Also -------- graph_edit_distance, optimize_edit_paths References ---------- .. [1] Zeina Abu-Aisheh, Romain Raveaux, Jean-Yves Ramel, Patrick Martineau. An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. 4th International Conference on Pattern Recognition Applications and Methods 2015, Jan 2015, Lisbon, Portugal. 2015, <10.5220/0005209202710278>. 
https://hal.archives-ouvertes.fr/hal-01168816 NF)r append)rrrrrrrrrrrpathsr vertex_path edge_pathr"s r#rrsp EH(;    )$ Y  D8OE k9-.%& (?r$c #RKt|||||||||| | d D] \} } } |  yw)aReturns consecutive approximations of GED (graph edit distance) between graphs G1 and G2. Graph edit distance is a graph similarity measure analogous to Levenshtein distance for strings. It is defined as minimum cost of edit path (sequence of node and edge edit operations) transforming graph G1 to graph isomorphic to G2. Parameters ---------- G1, G2: graphs The two graphs G1 and G2 must be of the same type. node_match : callable A function that returns True if node n1 in G1 and n2 in G2 should be considered equal during matching. The function will be called like node_match(G1.nodes[n1], G2.nodes[n2]). That is, the function will receive the node attribute dictionaries for n1 and n2 as inputs. Ignored if node_subst_cost is specified. If neither node_match nor node_subst_cost are specified then node attributes are not considered. edge_match : callable A function that returns True if the edge attribute dictionaries for the pair of nodes (u1, v1) in G1 and (u2, v2) in G2 should be considered equal during matching. The function will be called like edge_match(G1[u1][v1], G2[u2][v2]). That is, the function will receive the edge attribute dictionaries of the edges under consideration. Ignored if edge_subst_cost is specified. If neither edge_match nor edge_subst_cost are specified then edge attributes are not considered. node_subst_cost, node_del_cost, node_ins_cost : callable Functions that return the costs of node substitution, node deletion, and node insertion, respectively. The functions will be called like node_subst_cost(G1.nodes[n1], G2.nodes[n2]), node_del_cost(G1.nodes[n1]), node_ins_cost(G2.nodes[n2]). That is, the functions will receive the node attribute dictionaries as inputs. The functions are expected to return positive numeric values. 
Function node_subst_cost overrides node_match if specified. If neither node_match nor node_subst_cost are specified then default node substitution cost of 0 is used (node attributes are not considered during matching). If node_del_cost is not specified then default node deletion cost of 1 is used. If node_ins_cost is not specified then default node insertion cost of 1 is used. edge_subst_cost, edge_del_cost, edge_ins_cost : callable Functions that return the costs of edge substitution, edge deletion, and edge insertion, respectively. The functions will be called like edge_subst_cost(G1[u1][v1], G2[u2][v2]), edge_del_cost(G1[u1][v1]), edge_ins_cost(G2[u2][v2]). That is, the functions will receive the edge attribute dictionaries as inputs. The functions are expected to return positive numeric values. Function edge_subst_cost overrides edge_match if specified. If neither edge_match nor edge_subst_cost are specified then default edge substitution cost of 0 is used (edge attributes are not considered during matching). If edge_del_cost is not specified then default edge deletion cost of 1 is used. If edge_ins_cost is not specified then default edge insertion cost of 1 is used. upper_bound : numeric Maximum edit distance to consider. Returns ------- Generator of consecutive approximations of graph edit distance. Examples -------- >>> G1 = nx.cycle_graph(6) >>> G2 = nx.wheel_graph(7) >>> for v in nx.optimize_graph_edit_distance(G1, G2): ... minv = v >>> minv 7.0 See Also -------- graph_edit_distance, optimize_edit_paths References ---------- .. [1] Zeina Abu-Aisheh, Romain Raveaux, Jean-Yves Ramel, Patrick Martineau. An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. 4th International Conference on Pattern Recognition Applications and Methods 2015, Jan 2015, Lisbon, Portugal. 2015, <10.5220/0005209202710278>. 
https://hal.archives-ouvertes.fr/hal-01168816 TNr) rrrrrrrrrrrr!r"s r#rrsNL*     1d s%'c#B &'()*+,-./01234Kddl.ddl3tGdd&&3fd+d'd0d2d&'*+.fd ,+0fd 1&+,/012fd (()-/fd )tj}tj}d}| rD| \}}||vs||vrt j d |j||j|t|}t|}.j||z||zf}|r.j|Dcgc],}|D]%}|j|j|'.c}}j|||d|d|f<| r|jj}n|r.j|Dcgc]8}|D]1}dt|j|j|z 3:c}}j|||d|d|f<| r(d|jjz }n |r#|Dcgc]}|j|}}ndgt|z}|r#|Dcgc]}|j|}}ndgt|z}|d|d|fjt|zt|zdz*.jt|Dcgc]}t|D]}||k(r||n* c}}j|||d||||zf<.jt|Dcgc]}t|D]}||k(r||n* c}}j||||||zd|f<+|||}tj}tj}t|}t|}.j||z||zf}|rc.j|D cgc],}|D]%} |j|j| '.c} }j|||d|d|f<nr|ro.j|D cgc]8}|D]1} dt|j|j| z 3:c} }j|||d|d|f<n |r#|Dcgc]}|j|}}ndgt|z}| r#|D cgc]} | j| }} ndgt|z}|d|d|fjt|zt|zdz*.jt|Dcgc]}t|D]}||k(r||n* c}}j|||d||||zf<.jt|Dcgc]}t|D]}||k(r||n* c}}j||||||zd|f<+|||}!|j j|!j jzdz- . dkrt j"dt%j&4-4 fd/| gn| g}")|"|||g|||!| D](\}#}$}%t|#t|$t)|%f*ycc}}wcc}}wcc}wcc}wcc}}wcc}}wcc} }wcc} }wcc}wcc} wcc}}wcc}}ww)aGED (graph edit distance) calculation: advanced interface. Graph edit path is a sequence of node and edge edit operations transforming graph G1 to graph isomorphic to G2. Edit operations include substitutions, deletions, and insertions. Graph edit distance is defined as minimum cost of edit path. Parameters ---------- G1, G2: graphs The two graphs G1 and G2 must be of the same type. node_match : callable A function that returns True if node n1 in G1 and n2 in G2 should be considered equal during matching. The function will be called like node_match(G1.nodes[n1], G2.nodes[n2]). That is, the function will receive the node attribute dictionaries for n1 and n2 as inputs. Ignored if node_subst_cost is specified. If neither node_match nor node_subst_cost are specified then node attributes are not considered. edge_match : callable A function that returns True if the edge attribute dictionaries for the pair of nodes (u1, v1) in G1 and (u2, v2) in G2 should be considered equal during matching. The function will be called like edge_match(G1[u1][v1], G2[u2][v2]). 
That is, the function will receive the edge attribute dictionaries of the edges under consideration. Ignored if edge_subst_cost is specified. If neither edge_match nor edge_subst_cost are specified then edge attributes are not considered. node_subst_cost, node_del_cost, node_ins_cost : callable Functions that return the costs of node substitution, node deletion, and node insertion, respectively. The functions will be called like node_subst_cost(G1.nodes[n1], G2.nodes[n2]), node_del_cost(G1.nodes[n1]), node_ins_cost(G2.nodes[n2]). That is, the functions will receive the node attribute dictionaries as inputs. The functions are expected to return positive numeric values. Function node_subst_cost overrides node_match if specified. If neither node_match nor node_subst_cost are specified then default node substitution cost of 0 is used (node attributes are not considered during matching). If node_del_cost is not specified then default node deletion cost of 1 is used. If node_ins_cost is not specified then default node insertion cost of 1 is used. edge_subst_cost, edge_del_cost, edge_ins_cost : callable Functions that return the costs of edge substitution, edge deletion, and edge insertion, respectively. The functions will be called like edge_subst_cost(G1[u1][v1], G2[u2][v2]), edge_del_cost(G1[u1][v1]), edge_ins_cost(G2[u2][v2]). That is, the functions will receive the edge attribute dictionaries as inputs. The functions are expected to return positive numeric values. Function edge_subst_cost overrides edge_match if specified. If neither edge_match nor edge_subst_cost are specified then default edge substitution cost of 0 is used (edge attributes are not considered during matching). If edge_del_cost is not specified then default edge deletion cost of 1 is used. If edge_ins_cost is not specified then default edge insertion cost of 1 is used. upper_bound : numeric Maximum edit distance to consider. 
strictly_decreasing : bool If True, return consecutive approximations of strictly decreasing cost. Otherwise, return all edit paths of cost less than or equal to the previous minimum cost. roots : 2-tuple Tuple where first element is a node in G1 and the second is a node in G2. These nodes are forced to be matched in the comparison to allow comparison between rooted graphs. timeout : numeric Maximum number of seconds to execute. After timeout is met, the current best GED is returned. Returns ------- Generator of tuples (node_edit_path, edge_edit_path, cost) node_edit_path : list of tuples (u, v) edge_edit_path : list of tuples ((u1, v1), (u2, v2)) cost : numeric See Also -------- graph_edit_distance, optimize_graph_edit_distance, optimal_edit_paths References ---------- .. [1] Zeina Abu-Aisheh, Romain Raveaux, Jean-Yves Ramel, Patrick Martineau. An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. 4th International Conference on Pattern Recognition Applications and Methods 2015, Jan 2015, Lisbon, Portugal. 2015, <10.5220/0005209202710278>. https://hal.archives-ouvertes.fr/hal-01168816 rNc6eZdZUded<ded<ded<ded<y)'optimize_edit_paths..CostMatrix.C lsa_row_ind lsa_col_indlsN)__name__ __module__ __qualname____annotations__r$r# CostMatrixr-s  r$r7cjj|\}}||k||kz}||k\||k\z}|||z||<|||z||<||||||fjSN)optimizelinear_sum_assignmentsum) r.mnr/r0is_substis_dummyr7sps r#make_CostMatrixz,optimize_edit_paths..make_CostMatrixs#%;;#D#DQ#G [  !O a81$)9:!,H 5 9 H +H 5 9 H {K; +C)D)H)H)J  r$ct||zDcgc]}||vxs||z |v}}t||zDcgc]}||vxs||z |v}}||ddfdd|fScc}wcc}wr9ranger.ijr=r>krow_indcol_inds r# extract_Cz&optimize_edit_paths..extract_Cs16q1u>A16'QUaZ'>>16q1u>A16'QUaZ'>>!}QZ((?> AA"ct||zDcgc]}||vxr||z |v}}t||zDcgc]}||vxr||z |v}}||ddfdd|fScc}wcc}wr9rDrFs r#reduce_Cz%optimize_edit_paths..reduce_Cs:?A,GQ1A:0!a%q.0GG:?A,GQ1A:0!a%q.0GG!}QZ((HGrMcz||Dcgc]}||vc}}t|D]}|||k\xxdzcc<|Scc}w)Nr )set)indrGrIrinds r# reduce_indz'optimize_edit_paths..reduce_indsJ,1AQJ,-Q !A Oq O ! 
-s 8c t}t}|t|dk(rg}g} nxt|D cgc]'  ddfk(st fd|Dr )}} t|D cgc]'  ddfk(st fd|Dr )} } t|} t| } | s| rQ|j|| ||}t |D]\}  ddt | D]\}  ddt j st j rtfd|DrPtfd|Drhfk(stfd|Drfk(stfd |Dr|||f<|| | }t|j|jDcgc]3\}}|| ks|| kr$|| kr||n|| |z|| kr| |n|||zf5}}}||fSg}jd ggd}||fScc} wcc} wcc}}w) a Parameters: u, v: matched vertices, u=None or v=None for deletion/insertion pending_g, pending_h: lists of edges not yet mapped Ce: CostMatrix of pending edge mappings matched_uv: partial vertex edit path list of tuples (u, v) of previously matched vertex mappings u<->v, u=None or v=None for deletion/insertion Returns: list of (i, j): indices of edge mappings g<->h localCe: local CostMatrix of edge mappings (basically submatrix of Ce at cross of rows i, cols j) Nrc3JK|]\}}dd|f|f||ffvywNrVr6).0pqrG pending_gus r# z;optimize_edit_paths..match_edges..=EIQIaL!$!Q!Q!Q(@@ #c3JK|]\}}dd|f|f||ffvywrXr6)rYrZr[rH pending_hvs r#r^z;optimize_edit_paths..match_edges..r_r`c3hK|])\}}|fk(xr|fk(xs|fk(xr|fk(+ywr9r6rYrZr[ghr]rcs r#r^z;optimize_edit_paths..match_edges..sR $1!QK7A!QKV1A;;V1QRTUPV;Vs/2c3PK|]\}}|f|ffvxr |f|ffvywr9r6res r#r^z;optimize_edit_paths..match_edges..!sI $11a&1a&!11KaQFQF;K6KKs#&c32K|]\}}||fk(ywr9r6)rYrZr[rfs r#r^z;optimize_edit_paths..match_edges..&)M$!Q!1v+)Mc32K|]\}}||fk(ywr9r6)rYrZr[rgs r#r^z;optimize_edit_paths..match_edges..(rjrk)rr) lenrEanyr. enumeratenx is_directedzipr/r0empty)r]rcr\rbCe matched_uvMNg_indh_indrGrHr=r>r.rIllocalCeijrfrgr7rrrLinfrBnps```` `` @@r# match_edgesz(optimize_edit_paths..match_edgess"  N  N  ZA!5EEqQ<#1v-MWEqQ<#1v-MWE J J "$$ua3A "%( "1aL!$%e,"DAq!! Ra(A~~b)R^^B-?(2%(2%QF{c)M*)M&M QF{c)M*)M&M !AadG%" ",&aA.G   3 3W5H5HI  Aqq5AE !"AE!H1uQx< !AE!H1uQx<B7{B &!12r1=G7{}Rs,H9?,H>8Ic t|rWt|\}}tfd|Dz }tfd|Dz } |j||||S|S)Nc3.K|] }|ks dywr Nr6)rYtr=s r#r^z9optimize_edit_paths..reduce_Ce..?0!a%!0 c3.K|] }|ks dywrr6)rYrr>s r#r^z9optimize_edit_paths..reduce_Ce..@rr)rmrrr<r.) 
rtr|r=r>rGrHm_in_jrBrOs `` r# reduce_Cez&optimize_edit_paths..reduce_Ce<sf r78DAqc0Q000Cc0Q000C"8BDD!Q1#=sCH H r$c 3rKt|t|tfdt|j|jD\}} |kr||nd| kr|| nd||||\} } || t|t|} ||j z| j z| j zrn|j |f| f|j|| zf|j| |zf|j |j || fz } || f| | | |j || f| j zfg}|| ckrfdtzD}nfdtzD}|D]\}} ||j || fz|j zr/|j |f| f|krdz n| krdz n} ||j || fz| j z|j zr|kr||nd| kr|| nd||||\} } ||j || fz| j z| j zr|| t|t|} ||j || fz| j z| j z| j zrQ|j|| f| | | |j || f| j zft|dEd{y7w)a Parameters: matched_uv: partial vertex edit path list of tuples (u, v) of vertex mappings u<->v, u=None or v=None for deletion/insertion pending_u, pending_v: lists of vertices not yet mapped Cv: CostMatrix of pending vertex mappings pending_g, pending_h: lists of edges not yet mapped Ce: CostMatrix of pending edge mappings matched_cost: cost of partial edit path Returns: sequence of (i, j): indices of vertex mapping u<->v Cv_ij: reduced CostMatrix of pending vertex mappings (basically Cv with row i, col j removed) list of (x, y): indices of edge mappings g<->h Ce_xy: reduced CostMatrix of pending edge mappings (basically Ce with rows x, cols y removed) cost: total cost of edit operation NOTE: most promising ops first c3BK|]\}}|ks|ks||fywr9r6)rYrIrzr=r>s r#r^z.get_edit_ops..bs+ q!a!eqSTuQF s Nc3JK|]}|k7r|ks|zk(r|fywr9r6)rYrfixed_ifixed_jr=s r#r^z.get_edit_ops..s5G r`c3JK|]}|k7r|ks|zk(r|fywr9r6)rYrrrr>s r#r^z.get_edit_ops..s5! r`r cL|d|djz|djzS)Nr )r1)rs r#z;optimize_edit_paths..get_edit_ops..s#qtadgg~!/Gr$)key) rmminrrr/r0r1r.rEr&sorted)ru pending_u pending_vCvr\rbrt matched_costrGrHxyr{Ce_xyCv_ijother candidatesrrr=r>r7rBrprunerOrrTs @@@@r# get_edit_opsz)optimize_edit_paths..get_edit_opsDsC2  N  N "2>>2>>B  1"EIaLtEIaLt      G"b#i.#i.A % 2UXX= > taT1a02>>Aq1u:62>>Aq1u:6QT " E a&%UBDDAJ,CC Ca 6q1uJ q1uJ  NDAq\BDDAJ.67#taT1a0QAAQAAE \BDDAJ.9BEEAB% !A ! 4 !A ! 
4 KB\BDDAJ.9GJJFGb"c)nc)nEE\BDDAJ.9GJJFQR LL1a&%UBDDAJ4KL M3 N6%%GHHHsL(L7/L50L7c 3ZK!||jz|jzrytt|t|st | |||fy||||||||} | D]0\} } } } }| \}}!||z| jz| jzr5|t|kr|j |nd}|t|kr|j |nd}|j ||f| D]B\}}t|}t|}|j ||kr||nd||kr||ndfDt d| D}t d| D}t|Dcgc]#}|t|kr|j |nd%}}t|Dcgc]#}|t|kr|j |nd%}}|||| |||| ||z Ed{||j||||j|||j t|t|D]\}}| |j||t|t|D]\}}| |j||| D]}|j 3ycc}wcc}w7ŭw)a Parameters: matched_uv: partial vertex edit path list of tuples (u, v) of vertex mappings u<->v, u=None or v=None for deletion/insertion pending_u, pending_v: lists of vertices not yet mapped Cv: CostMatrix of pending vertex mappings matched_gh: partial edge edit path list of tuples (g, h) of edge mappings g<->h, g=None or h=None for deletion/insertion pending_g, pending_h: lists of edges not yet mapped Ce: CostMatrix of pending edge mappings matched_cost: cost of partial edit path Returns: sequence of (vertex_path, edge_path, cost) vertex_path: complete vertex edit path list of tuples (u, v) of vertex mappings u<->v, u=None or v=None for deletion/insertion edge_path: complete edge edit path list of tuples (g, h) of edge mappings g<->h, g=None or h=None for deletion/insertion cost: total cost of edit path NOTE: path costs are non-increasing Nc3&K|] \}}| ywr9r6rYxys r#r^z>optimize_edit_paths..get_edit_paths.. 2tq! 2c3&K|] \}}| ywr9r6rs r#r^z>optimize_edit_paths..get_edit_paths..rr) r1maxrmrpopr&rreversedinsertrr)"rurrr matched_ghr\rbrtredit_opsr|rrr edit_costrGrHr]rcrrlen_glen_hsortedxsortedyGHrfrgr!rget_edit_paths maxcost_valuers" r#rz+optimize_edit_paths..get_edit_pathssH %- . 3y>3y>2  |1J+0:J+zRoot node not in graph.r z$Timeout value must be greater than 0cltjz kDry|kDry|kDryr|k\ryy)NTF)time perf_counter)r"rstartstrictly_decreasingrrs r#rz"optimize_edit_paths..prunesN    "U*W4  "k! -  4=#8r$r9)numpyscipyrlistnodesrp NodeNotFoundremovermzerosarrayreshapeintr<rEedgesr. 
NetworkXErrorrrfloat)5rrrrrrrrrrrrrrrr initial_costroot_uroot_vr=r>r.r]rc del_costs ins_costsrGrHrr\rbrfrgrtdone_uvr(r)r"r7rLrrr}rBrrr~rrOrrTrArs5`` `` ` @@@@@@@@@@@@@@@r#r r s^n  &) ) ZZxaIaIFr%lRXXIRXXIL   "fI&=//";< <    IA IA !a%Q Ahh# "   RXXa[9 9  '!Q- !A#qs(  *288F+;RXXf=MNL hh# " C 288A; <== =  '!Q- !A#qs(  z"((6*:BHHV$ Y;i%+== >_  D D N N  D D N NsD\1[ A\#=[! A\1[' \%[,A\ #[1 =\#[7 #B\<1[= -4\!=\ )\\ #\;\A\6#\ =\#\ 9D&\c @ddl}t|}|/||vrtjd|d|j |}nd}|/||vrtjd|d|j |} nd} t ||| |||} t | |jrs| jdk(r#tt|| jSt|| jD cic]\} } | tt|| c} } St| Scc} } w)a Returns the SimRank similarity of nodes in the graph ``G``. SimRank is a similarity metric that says "two objects are considered to be similar if they are referenced by similar objects." [1]_. The pseudo-code definition from the paper is:: def simrank(G, u, v): in_neighbors_u = G.predecessors(u) in_neighbors_v = G.predecessors(v) scale = C / (len(in_neighbors_u) * len(in_neighbors_v)) return scale * sum( simrank(G, w, x) for w, x in product(in_neighbors_u, in_neighbors_v) ) where ``G`` is the graph, ``u`` is the source, ``v`` is the target, and ``C`` is a float decay or importance factor between 0 and 1. The SimRank algorithm for determining node similarity is defined in [2]_. Parameters ---------- G : NetworkX graph A NetworkX graph source : node If this is specified, the returned dictionary maps each node ``v`` in the graph to the similarity between ``source`` and ``v``. target : node If both ``source`` and ``target`` are specified, the similarity value between ``source`` and ``target`` is returned. If ``target`` is specified but ``source`` is not, this argument is ignored. importance_factor : float The relative importance of indirect neighbors with respect to direct neighbors. max_iterations : integer Maximum number of iterations. tolerance : float Error tolerance used to check convergence. When an iteration of the algorithm finds that no similarity value changes more than this amount, the algorithm halts. 
Returns
-------
similarity : dictionary or float
    If ``source`` and ``target`` are both ``None``, this returns a
    dictionary of dictionaries keyed by node, where ``sim[u][v]`` is
    the similarity of the pair of nodes ``(u, v)``.

    If ``source`` is not ``None`` but ``target`` is, this returns a
    dictionary mapping each node to the similarity of ``source`` and
    that node.

    If neither ``source`` nor ``target`` is ``None``, this returns
    the similarity value for the given pair of nodes.

Raises
------
ExceededMaxIterations
    If the algorithm does not converge within ``max_iterations``.

NodeNotFound
    If either ``source`` or ``target`` is not in `G`.

Examples
--------
>>> G = nx.cycle_graph(2)
>>> nx.simrank_similarity(G)
{0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
>>> nx.simrank_similarity(G, source=0)
{0: 1.0, 1: 0.0}
>>> nx.simrank_similarity(G, source=0, target=0)
1.0

The result of this function can be converted to a numpy array
representing the SimRank matrix by using the node order of the
graph to determine which row and column represent each node.
Other orderings of nodes are also possible.

>>> import numpy as np
>>> sim = nx.simrank_similarity(G)
>>> np.array([[sim[u][v] for v in G] for u in G])
array([[1., 0.],
       [0., 1.]])
>>> sim_1d = nx.simrank_similarity(G, source=0)
>>> np.array([sim_1d[v] for v in G])
array([1., 0.])

References
----------
.. [1] https://en.wikipedia.org/wiki/SimRank
.. [2] G. Jeh and J. Widom.
       "SimRank: a measure of structural-context similarity",
       In KDD'02: Proceedings of the Eighth ACM SIGKDD
       International Conference on Knowledge Discovery and Data Mining,
       pp. 538--543. ACM Press, 2002.
66,ni A!RZZ 66Q;Aqxxz*+ +36q!((*3EFC4As $$FF 8OGs+ Dc0 |Dcic]}||Dcic] }|||k(rdndc}c}}fd |jr |jn |j fd}t|D][} } |Dcic]"}||Dcic]}|||k7r |||ndc}$c}}t fd| j D} | s[n dz|k(rt jd|d| |||S||SScc}wcc}}wcc}wcc}}w)aReturns the SimRank similarity of nodes in the graph ``G``. This pure Python version is provided for pedagogical purposes. Examples -------- >>> G = nx.cycle_graph(2) >>> nx.similarity._simrank_similarity_python(G) {0: {0: 1, 1: 0.0}, 1: {0: 0.0, 1: 1}} >>> nx.similarity._simrank_similarity_python(G, source=0) {0: 1, 1: 0.0} >>> nx.similarity._simrank_similarity_python(G, source=0, target=0) 1 r rcL|r tfd|Dt|z SdS)Nc34K|]\}}||ywr9r6)rYwrnewsims r#r^z>_simrank_similarity_python..avg_sim..Ws0FQ6!9Q<0sg)r<rm)srs r#avg_simz+_simrank_similarity_python..avg_simVs$=>s0a003q69GCGr$c Ltt||zSr9)rr)r]rcGadjrrs r#simz'_simrank_similarity_python..sim[s' 74Qa0I+J#KKKr$c3nK|]+\}tfd|jD-yw)c3pK|]-\}}t||z dt|zzk/ywr)abs)rYrcoldrrr]s r#r^z7_simrank_similarity_python...bsAAsF1IaL3&'9CH +EEs36N)allitems)rYnbrsr]rrs @r#r^z-_simrank_similarity_python..as7 4 "jjl  s15simrank did not converge after iterations.)rqpredadjrErrrpExceededMaxIterations)rrrrrrr]rcritsoldsimis_closerrrs ` ` @@@r#_simrank_similarity_pythonr;s=.>? ?a3A!!q&Qa'33 ?FH]]_166!%%DL^$ IJKA!Q?aa1fQ!3??K "<<>       Qw. &&-n-=\ J  f0f~f%% f~ ME4 ?@Ks- DDD8 DD DD Dcddl}tj|}|j|j d}d||dk(<||z}|j t ||j} t|D]O} | j} ||j| z|zz} |j| d|j| | |sOn dz|k(rtjd|d ||t| ||fS|| |S| S) a_Calculate SimRank of nodes in ``G`` using matrices with ``numpy``. The SimRank algorithm for determining node similarity is defined in [1]_. Parameters ---------- G : NetworkX graph A NetworkX graph source : node If this is specified, the returned dictionary maps each node ``v`` in the graph to the similarity between ``source`` and ``v``. target : node If both ``source`` and ``target`` are specified, the similarity value between ``source`` and ``target`` is returned. If ``target`` is specified but ``source`` is not, this argument is ignored. 
importance_factor : float
    The relative importance of indirect neighbors with respect to
    direct neighbors.

max_iterations : integer
    Maximum number of iterations.

tolerance : float
    Error tolerance used to check convergence. When an iteration of
    the algorithm finds that no similarity value changes more than
    this amount, the algorithm halts.

Returns
-------
similarity : numpy array or float
    If ``source`` and ``target`` are both ``None``, this returns a
    2D array containing SimRank scores of the nodes.

    If ``source`` is not ``None`` but ``target`` is, this returns a
    1D array containing the SimRank scores of ``source`` and each
    node.

    If neither ``source`` nor ``target`` is ``None``, this returns
    the similarity value for the given pair of nodes.

Examples
--------
>>> G = nx.cycle_graph(2)
>>> nx.similarity._simrank_similarity_numpy(G)
array([[1., 0.],
       [0., 1.]])
>>> nx.similarity._simrank_similarity_numpy(G, source=0)
array([1., 0.])
>>> nx.similarity._simrank_similarity_numpy(G, source=0, target=0)
1.0

References
----------
.. [1] G. Jeh and J. Widom.
       "SimRank: a measure of structural-context similarity",
       In KDD'02: Proceedings of the Eighth ACM SIGKDD
       International Conference on Knowledge Discovery and Data Mining,
       pp. 538--543. ACM Press, 2002.

Returns the Panther similarity of nodes in the graph `G` to node ``source``.

Panther is a similarity metric that says "two objects are considered
to be similar if they frequently appear on the same paths." [1]_.
Parameters ---------- G : NetworkX graph A NetworkX graph source : node Source node for which to find the top `k` similar other nodes k : int (default = 5) The number of most similar nodes to return. path_length : int (default = 5) How long the randomly generated paths should be (``T`` in [1]_) c : float (default = 0.5) A universal positive constant used to scale the number of sample random paths to generate. delta : float (default = 0.1) The probability that the similarity $S$ is not an epsilon-approximation to (R, phi), where $R$ is the number of random paths and $\phi$ is the probability that an element sampled from a set $A \subseteq D$, where $D$ is the domain. eps : float or None (default = None) The error bound. Per [1]_, a good value is ``sqrt(1/|E|)``. Therefore, if no value is provided, the recommended computed value will be used. weight : string or None, optional (default="weight") The name of an edge attribute that holds the numerical value used as a weight. If None then each edge has weight 1. Returns ------- similarity : dictionary Dictionary of nodes to similarity scores (as floats). Note: the self-similarity (i.e., ``v``) will not be included in the returned dictionary. So, for ``k = 5``, a dictionary of top 4 nodes and their similarity scores will be returned. Raises ------ NetworkXUnfeasible If `source` is an isolated node. NodeNotFound If `source` is not in `G`. Notes ----- The isolated nodes in `G` are ignored. Examples -------- >>> G = nx.star_graph(10) >>> sim = nx.panther_similarity(G, 0) References ---------- .. [1] Zhang, J., Tang, J., Ma, C., Tong, H., Jing, Y., & Li, J. Panther: Fast top-k similarity search on large networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Vol. 2015-August, pp. 1445–1454). Association for Computing Machinery. https://doi.org/10.1145/2783258.2783267. 
rNrrz?Panther similarity is not defined for the isolated source node .zNumber of nodes is z, but requested k is z. Setting k to number of nodes.rrVr ) path_length index_mapr )!rrprrQisolatesNetworkXUnfeasiblesubgraphrrnumber_of_nodeswarningswarnsqrtnumber_of_edgesrormathcombrlog2logrr rr intersectionrm argpartitionargsortrrrrr)rrrIrcdeltaepsr r~rnode num_nodesrname inv_node_mapnode_map t_choose_2 sample_sizerr!Sinv_sample_size source_pathsr' common_pathstop_k_unsorted top_k_sortedtop_k_with_vals r#r r sD Qoo VHI>??2;;q>"H ##MfXUV W   QWWETH0DDEFKKMA!!#I1} !),A!E, ,   {ggcA--//03"?@2FL H\ " ) ) +Q|_-C-C-EFN vt$ qFGs6 I5I5 I:)rc#Kddl}t||jjr |jn |j }t j||} |j| jdjdd} | | z} t|} |j} t|D]}||| }| |}n0|| vrt jd|d|}| j|}|g}|||vr||j!|n|h||<|}t|D]R}|j#| | | }|}| |}|j%||4||vr||j!|M|h||<T|yw) u| Randomly generate `sample_size` paths of length `path_length`. Parameters ---------- G : NetworkX graph A NetworkX graph sample_size : integer The number of paths to generate. This is ``R`` in [1]_. path_length : integer (default = 5) The maximum size of the path to randomly generate. This is ``T`` in [1]_. According to the paper, ``T >= 5`` is recommended. index_map : dictionary, optional If provided, this will be populated with the inverted index of nodes mapped to the set of generated random path indices within ``paths``. weight : string or None, optional (default="weight") The name of an edge attribute that holds the numerical value used as a weight. If None then each edge has weight 1. seed : integer, random_state, or None (default) Indicator of random number generation state. See :ref:`Randomness`. source : node, optional Node to use as the starting point for all generated paths. If None then starting nodes are selected at random with uniform probability. Returns ------- paths : generator of lists Generator of `sample_size` paths each with length `path_length`. 
Examples -------- The generator yields `sample_size` number of paths of length `path_length` drawn from `G`: >>> G = nx.complete_graph(5) >>> next(nx.generate_random_paths(G, sample_size=1, path_length=3, seed=42)) [3, 4, 2, 3] >>> list(nx.generate_random_paths(G, sample_size=3, path_length=4, seed=42)) [[3, 4, 2, 3, 0], [2, 0, 2, 1, 0], [2, 0, 4, 3, 0]] By passing a dictionary into `index_map`, it will build an inverted index mapping of nodes to the paths in which that node is present: >>> G = nx.wheel_graph(10) >>> index_map = {} >>> random_paths = list( ... nx.generate_random_paths(G, sample_size=3, index_map=index_map, seed=2771) ... ) >>> random_paths [[3, 2, 1, 9, 8, 7], [4, 0, 5, 6, 7, 8], [3, 0, 5, 0, 9, 8]] >>> paths_containing_node_0 = [ ... random_paths[path_idx] for path_idx in index_map.get(0, []) ... ] >>> paths_containing_node_0 [[4, 0, 5, 6, 7, 8], [3, 0, 5, 0, 9, 8]] References ---------- .. [1] Zhang, J., Tang, J., Ma, C., Tong, H., Jing, Y., & Li, J. Panther: Fast top-k similarity search on large networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Vol. 2015-August, pp. 1445–1454). Association for Computing Machinery. https://doi.org/10.1145/2783258.2783267. rN)r r rrz Initial node r)rZ)rrrandom Generatorintegersrandintrpr reciprocalr<rrrrErraddchoicer&)rr-rrr seedrr~ randint_fnadj_mat inv_row_sumstransition_probabilitiesr+r( path_index node_indexr'pathstarting_indexr! nbr_indexnbr_nodes r#r r ms\$D"))*=*=> DLL &1G==!!45==b!DL&5AwH!!#IK(- >#I.JJ'DX%oo fXY&GHHD!-Jv  y $##J/#-, $#{# 7A 5nE$I 'N *H KK !$y(h'++J7+5,Ih'' 7* [-s EF(F) NNNNNNNNNNN) NNNNNNNNN) NNNNNNNNNTNN)NNg?ig-C6?)rrg?g?Nr )rNr N)__doc__rrr dataclassesr itertoolsrnetworkxrpnetworkx.utilsr__all__ _dispatchablerrrr r rrr r r6r$r#rOs*  !* 1 4T   hhV+,l-l^+,S-Sl1 4T   L >L >^   LLb   9|   jZX&FNE'EPX&  H H'Hr$