import _operator
import itertools
from collections import defaultdict
from enum import Enum
from typing import Any, Callable

import torch
from torch._subclasses.fake_tensor import FakeTensor, FakeTensorMode
from torch.fx import Node
from torch.fx._compatibility import compatibility
from torch.multiprocessing.reductions import StorageWeakRef
from torch.utils import _pytree as pytree
from torch.utils._pytree import tree_map_only

__all__ = ["reinplace"]


class _ViewType(Enum):
    NonView = 0
    SingleOutputView = 1
    MultiOutputView = 2


def _is_view_op(tgt):
    if tgt is not None and isinstance(tgt, torch._ops.OpOverload):
        schema = tgt._schema
        if len(schema.arguments) > 0:
            first_arg = schema.arguments[0]
            # View ops alias their first argument without writing to it.
            return first_arg.alias_info is not None and not first_arg.alias_info.is_write


def _get_view_type(tgt) -> _ViewType:
    if tgt is not None and isinstance(tgt, torch._ops.OpOverload):
        schema = tgt._schema
        if len(schema.arguments) > 0:
            first_arg = schema.arguments[0]
            if first_arg.alias_info is not None and not first_arg.alias_info.is_write:
                # Multi-output views (e.g. aten.split.Tensor) use the wildcard alias set.
                if "*" in first_arg.alias_info.after_set:
                    return _ViewType.MultiOutputView
                else:
                    return _ViewType.SingleOutputView
    return _ViewType.NonView


@compatibility(is_backward_compatible=False)
class _FunctionalizationMetadataProp(torch.fx.Interpreter):
    """
    Interpreter that runs the graph on FakeTensors and annotates every node's meta
    with the information the re-inplacing pass needs: 'fake_result' (the FakeTensor
    output of the node), 'node_idx' (the node's position in the graph), and
    view/alias relationships between nodes ('view_of').  Its `run_node` override
    records the metadata, and `propagate(*sample_args)` drives the interpretation
    under a FakeTensorMode.
    """


def _schemas_match(functional_schema, inplace_schema):
    names_match = inplace_schema.name.endswith("_") and inplace_schema.name[:-1] == functional_schema.name
    arg_types_match = len(functional_schema.arguments) == len(inplace_schema.arguments) and all(
        a1.type == a2.type
        for a1, a2 in zip(functional_schema.arguments, inplace_schema.arguments)
    )
    # For the inplace op, its first argument should be mutable...
    assert inplace_schema.arguments[0].alias_info is not None and inplace_schema.arguments[0].alias_info.is_write
    # ...and its remaining arguments shouldn't be.
    assert all(a.alias_info is None for a in inplace_schema.arguments[1:])
    return names_match and arg_types_match


def _maybe_get_inplace_op(op):
    # Given a functional OpOverload such as aten.add.Tensor, return the matching
    # inplace overload (aten.add_.Tensor), or None if there is no compatible one.
    if not isinstance(op, torch._ops.OpOverload):
        return None
    # Some view ops have inplace variants (as_strided_, ...), but we never want the
    # re-inplacing pass to directly introduce those into the program.
    if _is_view_op(op):
        return None
    op_namespace = op.__module__.split(".")[-1]
    op_base_name = op.overloadpacket.__name__
    maybe_namespace_module = getattr(torch.ops, op_namespace)
    maybe_inplace_op = (
        None
        if maybe_namespace_module is None
        else getattr(maybe_namespace_module, f"{op_base_name}_", None)
    )
    if maybe_inplace_op is None:
        return None

    inplace_overloads = [
        getattr(maybe_inplace_op, overload_name)
        for overload_name in maybe_inplace_op.overloads()
    ]
    inplace_overloads_with_matching_schemas = [
        f for f in inplace_overloads if _schemas_match(op._schema, f._schema)
    ]
    # Just because foo() and foo_() both exist doesn't mean their schemas are compatible.
    if len(inplace_overloads_with_matching_schemas) == 0:
        return None
    assert len(inplace_overloads_with_matching_schemas) == 1
    inplace_op = inplace_overloads_with_matching_schemas[0]
    return inplace_op


_VIEW_INVERSE_MAP: dict[Callable[..., Any], Callable[..., Any]] = {
    torch.ops.aten.diagonal_scatter.default: torch.ops.aten.diagonal.default,
    torch.ops.aten.select_scatter.default: torch.ops.aten.select.int,
    torch.ops.aten.slice_scatter.default: torch.ops.aten.slice.Tensor,
    torch.ops.aten.as_strided_scatter.default: torch.ops.aten.as_strided.default,
}


def _get_all_later_node_usages(tensor_aliases: set[Node], op_index: int):
    """
    Returns every node that takes any member of `tensor_aliases` as an argument and
    that appears after position `op_index` in the graph, ignoring pure view ops whose
    outputs are never consumed by a non-view op later on.
    """
    ...


def _get_view_inverse_node_usages(later_node_usages: set[Node], self_aliases: set[Node]) -> set[Node]:
    """
    Out of `later_node_usages`, returns the subset that are {view}_scatter calls which
    exactly replay an earlier view of self (same view op, same args, matching
    size/stride/storage_offset metadata).  Those calls become redundant, and can be
    deleted, once the preceding functional op has been re-inplaced.
    """
    ...

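# Illustrative sketch (not actual nodes from any particular graph) of the
# {view}_scatter rewrite that reinplace() below performs when its usage analysis
# says the rewrite is safe:
#
#   before (what functionalization emits):
#       a_updated = torch.ops.aten.select_scatter.default(a, b, 0, 0)
#   after (what the pass produces, writing through a view of "a" instead):
#       a_slice = torch.ops.aten.select.int(a, 0, 0)
#       a_slice.copy_(b)
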
@compatibility(is_backward_compatible=True)
def reinplace(gm, *sample_args):
    """
    Given an fx.GraphModule, modifies it to perform "reinplacing",
    mutating the nodes of the graph.
    We look for out-of-place op call sites like `b = a.add(...)`,
    and convert them to be inplace (`b = a.add_(...)`),
    as long as the input to the current operator ("a") isn't reused
    anywhere later in the graph.

    This pass currently expects to operate on a **functional, ATen** graph.
    This can be obtained by running `make_fx(functionalize(f))`.

    Sample inputs are needed to determine aliasing relationships of the inputs.
    In general, we can't reinplace node `b = a.add(...)` if "a" aliases
    any of the inputs to the program.

    Given a node "b = foo(a, args...)", the algorithm for re-inplacing is as follows:

    (1) Perform some initial checks on the metadata of "a" and "args..."
        that can disqualify them from being reinplaced.

      (1a) Check that the self argument we're attempting to reinplace
           has acceptable dtype/size metadata to reinplace with.

           For example, if we have:
             a = torch.ones(1)
             b = torch.ones(10)
             out = torch.add(a, b)
           we can't turn that into
             a.add_(b)
           because that would require resizing "a".

           Similarly, we can't convert torch.ge(a, b) into a.ge_(b),
           because that would require changing a's dtype (from e.g. float32 to bool).
           Note that in this specific example, we could technically do better.
           If we see the pattern:
             a_1 = a.ge(b)
             a_2 = aten._to_copy(a_1, a.dtype)
           then it should be valid to completely re-inplace
           (this is exactly what functionalization will emit when it sees a.ge_(b)).

           This optimization is only really important for user programs
           that directly use inplace comparison ops though.

           We also cannot re-inplace on tensors that have overlapping memory,
           e.g. torch.ones(1).expand(4, 4).add_(1)

      (1b) Check if "a" is an alias of any of the program inputs.

           If it is, skip and move to the next node.
           Inplace'ing an op that would cause it to mutate a program input is not sound,
           because that would be a side effect visible to the user.

           NOTE: there's a future optimization that we should make:
           if "a" is a (alias of a) program input, but later in the program
           there is a node that looks like "a.copy_(...)",
           then re-inplacing is ok to do - we are temporarily reusing a's buffer,
           which will later be overwritten by the copy_() call.

           This will be an important optimization to have for programs that mutate
           their inputs. It currently isn't implemented though.

      (1c) Check if "a" and "args..." alias.

           For example, re-inplacing to create code like the below
           isn't guaranteed to be sound:
             aten.mul_(a, a)

    (2) Check that "a" and all of its outstanding aliases are not used anywhere
        later in the graph. If this is the case, then it's safe to re-inplace
        to "b = foo_(a)".

        There are a few caveats to this, explained in more detail below:
        (a) If "a" is used later as an argument to a view op, that is okay.
            It's only a problem if "a" (or that view) is later passed
            into a normal operator, or if it is returned as the program output.
        (b) If "a" is a repeat argument in `foo()`, then don't reinplace.
            Most ATen kernels don't make any guarantees that this is sound,
            e.g. if you do aten.mul_(a, a).
            So we'll just ban re-inplacing in this case.
        (c) If "a" is used as an input into a view "inverse" / "scatter" operator,
            it is potentially fine to re-inplace
            (and remove that scatter operator from the graph).
            See below for a more detailed example.

        NOTE: there is an optimization in this step that is crucial
        to fully recovering performance from functionalization.
        Given this program:
          def f(x):
              a = torch.ops.aten.add(x, x)
              b = torch.ops.aten.diagonal(a)
              torch.ops.aten.fill_(b, 0)
              return a

        Functionalization will emit the following:
          def f(x):
              a = torch.ops.aten.add(x, x)
              b = torch.ops.aten.diagonal(a, 0, 1)
              b_updated = torch.ops.aten.fill(b, 0)
              a_updated = torch.ops.aten.diagonal_scatter(a, b_updated, 0, 1)
              return a_updated

        Ordinarily, we would not be able to reinplace the fill,
        because "b" aliases with "a", which is used by the diagonal_scatter call.
        "re-inplacing" is on the hook for figuring out that it is ok to
        completely remove the expensive diagonal_scatter call if we re-inplace the fill().

        So, for every `alias in alias_set(a)`, instead of checking
        that "alias" is not used anywhere later in the graph,
        we check that EITHER:
          (a) alias is not used anywhere later in the graph
          OR:
          (b) alias is used exactly once later on in the graph,
              in the following op:

                out = foo_scatter(alias, x, args...)

              where the following must hold:
                (i) "foo_scatter" is the "inverse" operator for foo.
                    This only applies to "foo" ops that are view operators,
                    which view into a subset of the original tensor's memory.
                    In practice, there are ~4 operators where this applies:
                      diagonal -> diagonal_scatter
                      slice -> slice_scatter
                      select -> select_scatter
                      as_strided -> as_strided_scatter
                (ii) "args..." are the same between the foo() and foo_scatter() calls.

    (3) Perform the actual re-inplacing on foo!

        (3b) is the common case, but special care is needed for {view}_scatter ops (3a).

        (3a) {view}_scatter ops.

             Consider this program:
               a = torch.zeros(2, 2)
               b = torch.ones(2)
               a[0] = b

             Post functionalization, that will look like:
               a = torch.zeros(2, 2)
               b = torch.ones(2)
               a_updated = torch.select_scatter(a, b, 0, 0)

             In this case though, there is no "functional" op to re-inplace!
             Instead, we'd like to directly remove the select_scatter call.
             We already know from (2) that this is valid,
             because "a" has no later usages in the graph.

             We perform the re-inplacing on the {view}_scatter op like so:
             Before:
               a_updated = torch.select_scatter(a, b, args...)
             After:
               a_slice = torch.select(a, args...)
               a_slice.copy_(b)

        (3b) Otherwise, replace the functional op with its inplace variant.
             Before:
               b = foo(a, args...)
             After:
               a.foo_(args...)

    (4) Finally, after converting either:
          Before:
            b = foo(a)
          After:
            foo_(a)
        or
          Before:
            b = {slice}_scatter(a, mutated_slice, args...)
          After:
            slice = {slice}(a, args...)
            slice.copy_(mutated_slice)

        we now need to find all later nodes that use "b" as an argument
        and update them to take in "a" instead.

        Note that for the majority of inplace ops, this isn't actually necessary
        (because most inplace ops return "self" as their output).
        This isn't generally true for all mutable ops though, which is why
        we need to actually replace all of the arguments.

        We also need to update our metadata of Dict[StorageWeakRef, Set[Node]],
        which maps a given tensor storage to the set of all nodes that take in
        that storage as an input.
        Specifically, re-inplacing `b = foo(a)` causes "a" and "b"'s sets to get
        fused together.

    (5) Any "view_inverse/scatter" nodes that were identified as "it's ok to ignore them"
        during step (2) get manually deleted from the graph.
        Their outputs are no longer used, so technically standard DCE would be able
        to do this, but we can no longer run FX's DCE pass now that we have mutable
        ops in the graph.
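
    A minimal end-to-end sketch of how this pass is typically invoked (the traced
    function `f` and its input are made-up examples here; `make_fx(functionalize(f))`
    is the recipe mentioned above for producing a functional ATen graph):

        import torch
        from torch.fx.experimental.proxy_tensor import make_fx
        from torch.func import functionalize
        from torch.fx.passes.reinplace import reinplace

        def f(x):
            a = x.clone()
            return torch.ops.aten.add(a, 1)

        x = torch.randn(4)
        gm = make_fx(functionalize(f))(x)
        gm = reinplace(gm, x)  # aten.add is rewritten to aten.add_ where it is safe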
placeholderr5c~t|tr,t|jj yyr_)rrr rHr)rrstorage_to_nodess r _add_to_mapzreinplace.._add_to_map s3a,$^A4D4D4F%GHLLQO-rr7rrFTr6rNc|k(rS|Sr_r)rXnewolds r replace_argzreinplace..replace_args8#&J r)4r2rZgraphnodesrCrr<rrTr rHrrpytree tree_map_r>rr r"r!r#r` TensorTyper= tree_leavesrinumeldtype_debug_has_internal_overlapr?r@resizerBrrrinserting_before create_nodetuplerrArr~update itertoolschainrr r r erase_node recompile)(gm sample_argsr3input_storagesr&all_later_view_inverse_nodes_to_deleteself_argself_flattenednode_flattenedself_has_wrong_metadata self_meta node_metaself_arg_storagerrrlater_view_inverse_node_usages can_reinplaceview_opmutated_slice_noderemaining_slice_args slice_noderxcurr_node_storagernodes_to_updatenode_to_updaterold_flattened_resnode_flattened_resold_res_storagenode_res_storagenew_flattened_resnew_res_storagenew_refnode_ref to_deleterrrs( ` @@@rrrs.R1"2&00+>.HHNN  GG} $499]3U\\B tyy/>>@AN9DC8H XX^^A AFF " P   [!&&*? @A.1U*VU 77o %dkk5::+@+@A4;;&&001A5DKK''11!499:e>N>NNyy|H#// m0LMN#// -0HIN&+ #>"c.&99,/,O7(Iy (IOO,==26/ )//926/88CqH26/7'4;;%))..:O:O:W:W+W . m,;;=   >1tyy:!AMA:;a?- m,;;=  ,,<=L!;dii 3!  .K!<. * 14R RSWXXM  00 FF+DKK8 XX..t4)-1&+/99QR=(!#!5!5'! e,@&AA "J HH((' ,,44&.&7::4@$9#E #+. !/ -(779!  - . 5 5 !23  . / 6 6 !12  3 9 9. !v/MNB Uhhqk"yy#AFF:,>:AV,VA##'6=UN! +8k>+>+>+N'-:k>+@+@-N) )/(:(:388M;R(S%)/););&++M:*& "3'%a4'q'7'7'9:'O'"4(%a4'q'7'7'9:($("O,1 01Q6+/??,2,>,>sxx ?V,W)&7+ !)!Z8+1+;+;+=>++ #?3q888%4 &6  )299:J7:ST(1889I(9ST{=U B UiVUv<'  I&'LLN IkP;Zn#0' (,+s=A#^+ ^ 5^ .B ^%$^% ^%$0^* 0^/ 0^4 ^" )5rEr collectionsrenumrtypingrrrtorch._subclasses.fake_tensorrrtorch.fxr torch.fx._compatibilityr torch.multiprocessing.reductionsr torch.utilsr rtorch.utils._pytreer __all__rr*r/fx Interpreterr2rnr~r?r@diagonal_scatterrBdiagonalselect_scatterselectint slice_scatterslicerTas_strided_scatter as_stridedrdict__annotations__rrrrrrrrs# D1;)- -  9 0e,?+UXX%9%9?+-?+D+2#N IINN##++UYY^^-D-D-L-L IINN!!))599>>+@+@+D+D IINN  ((%))..*>*>*E*E IINN%%--uyy~~/H/H/P/P C4c*HS#X,>>?s4yCB14y103D 1Y1hd+R,Rr