import contextlib
import copy
from abc import ABC, abstractmethod
from collections.abc import Generator, Iterable, Sequence
from typing import Any, Callable, cast, Optional, Union

import torch.nn as nn


__all__ = [
    "always_wrap_policy",
    "lambda_auto_wrap_policy",
    "transformer_auto_wrap_policy",
    "size_based_auto_wrap_policy",
    "enable_wrap",
    "wrap",
    "CustomPolicy",
    "ModuleWrapPolicy",
]


def _post_order_apply(
    root_module: nn.Module,
    fn: Callable[[nn.Module], Optional[nn.Module]],
):
    """
    This applies ``fn`` to every module in the module tree of ``root_module``
    following a post-order traversal. If ``fn`` returns an :class:`nn.Module`,
    then this replaces the original module with the newly returned one in the
    tree. Otherwise, ``fn`` should return ``None``, in which case the module is
    not changed.
    """
    # Track visited modules so that shared modules are visited only once
    visited_modules: set[nn.Module] = {root_module}

    def _post_order_apply_inner(
        module: nn.Module,
        module_name: str,
        parent_module: Optional[nn.Module],
    ):
        for child_module_name, child_module in module.named_children():
            if child_module not in visited_modules:
                visited_modules.add(child_module)
                _post_order_apply_inner(child_module, child_module_name, module)
        optional_module = fn(module)
        if optional_module is not None:
            assert isinstance(parent_module, nn.Module), (
                "Non-root modules should have their parent module set but got "
                f"{parent_module} for {module}"
            )
            assert module_name, (
                "Non-root modules should have their module name set but got "
                f"an empty module name for {module}"
            )
            assert isinstance(optional_module, nn.Module), (
                f"fn should return None or an nn.Module but got {optional_module}"
            )
            setattr(parent_module, module_name, optional_module)

    _post_order_apply_inner(root_module, "", None)


def _construct_wrap_fn(
    root_module: nn.Module,
    target_module_to_kwargs: dict[nn.Module, dict[str, Any]],
    fsdp_fn: Callable,
) -> Callable[[nn.Module], Optional[nn.Module]]:
    """
    This constructs the "wrap" function to pass to :func:`_post_order_apply`
    based on ``target_module_to_kwargs``, which should be constructed from the
    wrapping policy.
    """

    def fn(module: nn.Module) -> Optional[nn.Module]:
        # Skip the root module; only non-root target modules are wrapped here
        if module in target_module_to_kwargs and module is not root_module:
            kwargs = target_module_to_kwargs[module]
            return fsdp_fn(module, **kwargs)
        return None

    return fn


def _run_mixed_precision_override_policy(
    root_module: nn.Module,
    module_classes: Iterable[type[nn.Module]],
    ignored_modules: set[nn.Module],
    root_kwargs: dict[str, Any],
    target_module_to_kwargs: dict[nn.Module, dict[str, Any]],
):
    module_classes_tuple = tuple(set(module_classes))
    for module in root_module.modules():
        if module in ignored_modules:
            continue
        elif isinstance(module, module_classes_tuple):
            if module not in target_module_to_kwargs:
                # Inherit the root kwargs only if none were already specified
                target_module_to_kwargs[module] = root_kwargs
            target_module_to_kwargs[module]["mixed_precision"] = None
    return target_module_to_kwargs


def always_wrap_policy(*args, **kwargs) -> bool:
    """
    A simple recursive wrap policy that always returns ``True``. This means
    that every submodule is wrapped by the wrapper class in
    :func:`_recursive_wrap`.
    """
    return True


class _Policy(ABC):
    """
    This defines an abstract base class that represents a policy for applying
    a module-level API.
    """

    @abstractmethod
    def _run_policy(
        self,
        root_module: nn.Module,
        ignored_modules: set[nn.Module],
        root_kwargs: dict[str, Any],
    ) -> dict[nn.Module, dict[str, Any]]:
        """
        This should return a dict ``target_module_to_kwargs`` that maps from
        each target module to wrap to its kwargs.
        """
        ...


def _module_wrap_policy(
    module: nn.Module,
    recurse: bool,
    nonwrapped_numel: int,
    module_classes: set[type[nn.Module]],
) -> bool:
    """
    This auto wrap policy wraps every module that is an instance of any type
    in ``module_classes`` as its own FSDP instance. The root module given by
    ``module`` is always wrapped as an FSDP instance regardless. Since the
    wrapping proceeds bottom up, each FSDP instance manages the parameters in
    its subtree excluding any already managed by a child FSDP instance.

    Args:
        module (nn.Module): Current module being considered.
        recurse (bool): If ``False``, then this function must decide whether
            ``module`` should be wrapped as an FSDP instance or not. If
            ``True``, then the function is still recursing down the module
            tree as a part of the DFS.
        nonwrapped_numel (int): Parameter numel not yet wrapped.
        module_classes (Set[Type[nn.Module]]): Set of module classes that are
            wrapped as FSDP instances.

    Returns:
        ``True`` if ``recurse=True``, and whether ``module`` should be wrapped
        if ``recurse=False``.
    """
    if recurse:
        return True  # always recurse
    return isinstance(module, tuple(module_classes))


class ModuleWrapPolicy(_Policy):
    """
    This policy applies to every module of the specified module classes,
    passing in the kwargs given to the root.
    """

    def __init__(self, module_classes: Iterable[type[nn.Module]]):
        module_classes_set = set(module_classes)
        self._module_classes = module_classes_set
        self._module_classes_str = str(module_classes_set)

    def _run_policy(
        self,
        root_module: nn.Module,
        ignored_modules: set[nn.Module],
        root_kwargs: dict[str, Any],
    ) -> dict[nn.Module, dict[str, Any]]:
        module_classes = tuple(self._module_classes)
        target_module_to_kwargs: dict[nn.Module, dict[str, Any]] = {}
        for module in root_module.modules():
            if module in ignored_modules:
                continue
            elif isinstance(module, module_classes):
                # Shallow copy so that changes are not coupled across modules
                target_module_to_kwargs[module] = copy.copy(root_kwargs)
        return target_module_to_kwargs

    def __call__(self, module, recurse, *args, **kwargs):
        # ``nonwrapped_numel`` is unused by ``_module_wrap_policy``
        return _module_wrap_policy(
            module, recurse, nonwrapped_numel=-1, module_classes=self._module_classes
        )

    def __repr__(self) -> str:
        return super().__repr__() + f"({self._module_classes_str})"


class CustomPolicy(_Policy):
    """
    This policy takes in a lambda function that maps a given ``nn.Module`` to
    either ``False``, ``True``, or a kwarg dictionary.
    - If the function returns ``False`` or an empty dictionary, then the module
      does not have the API applied.
    - If the function returns ``True``, then the module has the API applied
      with the root's kwargs.
    - If the function returns a non-empty dictionary, then the module has the
      API applied, and the dictionary overrides the root's kwargs.

    Example::

        >>> # xdoctest: +SKIP("undefined variables")
        >>> model = init_transformer_model(...)
        >>> def lambda_fn(module: nn.Module):
        >>>     if module is model.lm_head:
        >>>         return {"sharding_strategy": ShardingStrategy.SHARD_GRAD_OP}
        >>>     elif isinstance(module, TransformerBlock):
        >>>         return True
        >>>     return False
        >>> policy = CustomPolicy(lambda_fn)
        >>> fsdp_model = FSDP(model, auto_wrap_policy=policy)
    """

    def __init__(self, lambda_fn: Callable[[nn.Module], Union[bool, dict[str, Any]]]):
        self._lambda_fn = lambda_fn

    def _run_policy(
        self,
        root_module: nn.Module,
        ignored_modules: set[nn.Module],
        root_kwargs: dict[str, Any],
    ) -> dict[nn.Module, dict[str, Any]]:
        target_module_to_kwargs: dict[nn.Module, dict[str, Any]] = {}
        for module in root_module.modules():
            if module in ignored_modules:
                continue
            res = self._lambda_fn(module)
            if not isinstance(res, (dict, bool)):
                raise ValueError(
                    "The lambda_fn passed to CustomPolicy should return "
                    f"False/True or a kwarg dict, but it returned {res}"
                )
            if not res:
                continue
            kwargs = copy.copy(root_kwargs)
            if isinstance(res, dict):
                # The kwargs returned by the lambda override the root kwargs
                kwargs.update(res)
            target_module_to_kwargs[module] = kwargs
        return target_module_to_kwargs


def lambda_auto_wrap_policy(
    module: nn.Module, recurse: bool, nonwrapped_numel: int, lambda_fn: Callable
) -> bool:
    """
    A convenient auto wrap policy to wrap submodules based on an arbitrary user
    function. If ``lambda_fn(submodule) == True``, the submodule will be wrapped
    as a ``wrapper_cls`` unit.

    Return if a module should be wrapped during auto wrapping.

    The first three parameters are required by :func:`_recursive_wrap`.

    Args:
        module (nn.Module): Current module being considered.
        recurse (bool): If ``False``, then this function must decide whether
            ``module`` should be wrapped as an FSDP instance or not. If
            ``True``, then the function is still recursing down the module
            tree as a part of the DFS.
        nonwrapped_numel (int): Parameter numel not yet wrapped.

        lambda_fn (Callable[[nn.Module], bool]): If this returns ``True``, then
            this module will be wrapped.
    """
    if recurse:
        return True  # always recurse
    return lambda_fn(module)


def transformer_auto_wrap_policy(
    module: nn.Module,
    recurse: bool,
    nonwrapped_numel: int,
    transformer_layer_cls: set[type[nn.Module]],
) -> bool:
    """
    See :func:`_module_wrap_policy`, where ``transformer_layer_cls`` is the
    same as ``module_classes``. Note that shared parameters must be wrapped in
    the same FSDP instance, so this auto wrap policy can help wrap shared
    embeddings into the same FSDP instance for transformer models.
    """
    return _module_wrap_policy(module, recurse, nonwrapped_numel, transformer_layer_cls)


def _wrap_module_cls_individually(
    module: nn.Module, module_classes: Sequence[type], recurse: bool, *args, **kwargs
):
    if recurse:
        # Always recurse
        return True
    # When not recursing, wrap only if the module is one of ``module_classes``
    return isinstance(module, tuple(module_classes))


def _or_policy(
    module: nn.Module,
    recurse: bool,
    nonwrapped_numel: int,
    policies,
) -> bool:
    """
    A policy that wraps ``module`` if any policy in the passed in iterable of
    ``policies`` returns ``True``.
    """
    return any(
        policy(module=module, recurse=recurse, nonwrapped_numel=nonwrapped_numel)
        for policy in policies
    )


def size_based_auto_wrap_policy(
    module: nn.Module,
    recurse: bool,
    nonwrapped_numel: int,
    # Additional custom arguments
    min_num_params: int = int(1e8),
    force_leaf_modules: Optional[set[type[nn.Module]]] = None,
    exclude_wrap_modules: Optional[set[type[nn.Module]]] = None,
) -> bool:
    """
    A size-based auto wrap policy.

    Args:
        module (nn.Module): Current module being considered.
        recurse (bool): If ``False``, then this function must decide whether
            ``module`` should be wrapped as an FSDP instance or not. If
            ``True``, then the function is still recursing down the module
            tree as a part of the DFS.
        nonwrapped_numel (int): Parameter numel not yet wrapped.

        min_num_params (int): Customizable policy input that controls the size
            threshold over which a module is ready to be wrapped. This is in
            units of numel.
        force_leaf_modules (Optional[set[type[nn.Module]]]): Set of module types to keep
            as leaves, i.e. their children will never be wrapped.
        exclude_wrap_modules (Optional[set[type[nn.Module]]]): Set of module types to be
            excluded in wrapping.

    Returns:
        Whether ``module`` should be wrapped.
    """
    force_leaf_modules = (
        size_based_auto_wrap_policy.FORCE_LEAF_MODULES  # type: ignore[attr-defined]
        if force_leaf_modules is None
        else force_leaf_modules
    )
    exclude_wrap_modules = (
        size_based_auto_wrap_policy.EXCLUDE_WRAP_MODULES  # type: ignore[attr-defined]
        if exclude_wrap_modules is None
        else exclude_wrap_modules
    )

    # ``min_num_params`` is the minimum non-wrapped numel that triggers wrapping
    min_nonwrapped_numel = min_num_params
    is_large = nonwrapped_numel >= min_nonwrapped_numel
    if recurse:
        # Recurse if the module is large enough and is not a forced leaf
        return is_large and not isinstance(module, tuple(force_leaf_modules))
    else:
        # Not recursing: wrap if the module is large enough and not excluded
        return is_large and not isinstance(module, tuple(exclude_wrap_modules))


# Defaults attached to the policy function so that they are easy to import
size_based_auto_wrap_policy.EXCLUDE_WRAP_MODULES = {nn.ModuleList, nn.ModuleDict}  # type: ignore[attr-defined]
size_based_auto_wrap_policy.FORCE_LEAF_MODULES = {nn.MultiheadAttention}  # type: ignore[attr-defined]


@contextlib.contextmanager
def enable_wrap(
    *, wrapper_cls: Any, **wrapper_kwargs: Any
) -> Generator[None, None, None]:
    """
    Context manager to wrap modules using a wrapper.

    Useful for when you'd like to apply the same configuration arguments to all
    child modules that you wrap. A particularly important use case is wrapping
    large layers so that they get sharded (in-place) during initialization, to
    avoid running out of system memory. Large layers can indicate that they
    should be sharded via the ``wrap`` annotation and this context manager can
    provide the exact configuration for these nested instances.

    Usage::

        with enable_wrap(wrapper_cls, **params):
            # Wraps layer in FSDP by default if within context
            self.l1 = wrap(torch.nn.Linear(5, 5))

    Args:
        wrapper_cls:
            Class that `wrap` annotation will `wrap` modules with, such as
            `FullyShardedDataParallel`.
        **wrapper_kwargs:
            Configuration settings that will be passed to all ``wrap``
            instances inside the context
    """
    kwargs = {
        "wrapper_cls": wrapper_cls,
        **wrapper_kwargs,
    }
    with _ConfigAutoWrap(**kwargs):
        yield


def wrap(module: nn.Module, **wrap_overrides: Any) -> nn.Module:
    """
    Annotate that a module should be wrapped. Annotated modules will only be
    wrapped if inside of an :func:`enable_wrap` context manager. This allows
    a module to be initialized both with and without a wrapper without code
    change.

    The class that this function wraps the passed in ``nn.Module`` with is the
    passed in ``wrapper_cls`` argument into ``enable_wrap``. Both
    ``enable_wrap`` and ``wrap`` can take in kwargs specifying how to construct
    the ``wrapper_cls`` instance. In the case of duplicate kwargs in
    ``enable_wrap`` and ``wrap``, the argument passed into ``wrap`` will be
    respected.

    Usage::

        with enable_wrap(wrapper_cls=FSDP, **fsdp_config):
            # Wraps layer in FSDP by default if within context
            self.l1 = wrap(torch.nn.Linear(5, 5))

    Args:
        module (nn.Module): module to wrap (if in :func:`enable_wrap` context)
        **wrap_overrides: configuration overrides that will take priority over
            the values provided by the :func:`enable_wrap` context
    """
    if _ConfigAutoWrap.in_autowrap_context:
        assert _ConfigAutoWrap.wrapper_cls is not None

        wrap_overrides = {**_ConfigAutoWrap.kwargs, **wrap_overrides}
        return _wrap(
            module,
            _ConfigAutoWrap.wrapper_cls,
            **wrap_overrides,
        )
    return module


def _wrap(module: nn.Module, wrapper_cls: Callable, **kwargs) -> nn.Module:
    assert wrapper_cls is not None
    if hasattr(module, "_wrap_overrides"):
        # Per-module overrides take priority over the kwargs passed in
        overrides = {**kwargs, **module._wrap_overrides}  # type: ignore[arg-type]
        return wrapper_cls(module, **overrides)

    return wrapper_cls(module, **kwargs)


def _recursive_wrap(
    module: nn.Module,
    auto_wrap_policy: Callable,
    wrapper_cls: Callable,
    ignored_modules: set[nn.Module],
    ignored_params: set[nn.Parameter],
    only_wrap_children: bool = False,
    **kwargs: Any,
) -> tuple[nn.Module, int]:
    """
    Wraps submodules of ``module`` for which ``auto_wrap_policy`` returns
    ``True`` with ``wrapper_cls``.

    Args:
        module (nn.Module): Module to recursively wrap.
        auto_wrap_policy (Callable): A callable representing a policy that
            determines which modules to recursively wrap with ``wrapper_cls``.
        ignored_modules (set[torch.nn.Module]): Modules to ignore when
            wrapping.
        ignored_params (set[torch.nn.Parameter]): Parameters to ignore when
            wrapping; these should be the parameters contained in the modules
            in ``ignored_modules``.
    Returns:
        (nn.Module, int):
            ``module`` after wrapping and the numel recursively wrapped.
    """
    assert auto_wrap_policy is not None, "Must specify auto_wrap_policy."
    assert wrapper_cls is not None, "Must specify wrapper_cls"
    # Make sure that no child is already wrapped
    for _, child in module.named_modules():
        if child in ignored_modules:
            continue
        try:
            assert not isinstance(child, cast(type, wrapper_cls))
        except TypeError:
            # ``wrapper_cls`` is a function rather than a class; skip the check
            pass

    # Count all parameters, assuming that none of them is already wrapped
    nonwrapped_numel = sum(
        p.numel() for p in module.parameters() if p not in ignored_params
    )

    assert auto_wrap_policy is not None
    if auto_wrap_policy(module=module, recurse=True, nonwrapped_numel=nonwrapped_numel):
        total_wrapped_numel = 0
        # Iterate through the children, recursively wrapping where needed
        for name, child in module.named_children():
            if child in ignored_modules:
                continue
            wrapped_child, num_wrapped_params = _recursive_wrap(
                module=child,
                auto_wrap_policy=auto_wrap_policy,
                wrapper_cls=wrapper_cls,
                ignored_modules=ignored_modules,
                ignored_params=ignored_params,
                **kwargs,
            )
            setattr(module, name, wrapped_child)
            # Keep track of how many parameters have been wrapped
            total_wrapped_numel += num_wrapped_params
        # Decide whether to wrap the current module based on the parameters
        # left over after wrapping its children
        remainder = nonwrapped_numel - total_wrapped_numel
        if not only_wrap_children and auto_wrap_policy(
            module=module, recurse=False, nonwrapped_numel=remainder
        ):
            # Leaf node or final wrapping of the remainder both happen here
            return _wrap(module, wrapper_cls, **kwargs), nonwrapped_numel
        else:
            return module, total_wrapped_numel
    return module, 0


class _ConfigAutoWrap:
    """
    Helper class to wrap modules based on default config args via a context manager.
    See :func:`enable_wrap` for more information.
    """

    in_autowrap_context: bool = False  # Context flag
    wrapper_cls: Optional[Callable] = None  # The wrapper class
    kwargs: dict[str, Any] = {}  # Wrapper's args

    def __init__(self, **kwargs: dict[str, Any]):
        self.kwargs = kwargs

    @staticmethod
    def enable_autowrap_context(kwargs: Any) -> None:
        if _ConfigAutoWrap.in_autowrap_context:
            raise NotImplementedError(
                "You are already within an autowrap context and we currently do not support nested autowrap."
            )
        _ConfigAutoWrap.in_autowrap_context = True
        # Get and save the wrapper class for the context
        assert "wrapper_cls" in kwargs.keys(), (
            "Expected to pass in wrapper_cls arg into _ConfigAutoWrap."
        )
        _ConfigAutoWrap.wrapper_cls = cast(Callable, kwargs["wrapper_cls"])
        del kwargs["wrapper_cls"]
        # Save the remaining kwargs
        _ConfigAutoWrap.kwargs = kwargs

    @staticmethod
    def disable_autowrap_context() -> None:
        _ConfigAutoWrap.in_autowrap_context = False
        _ConfigAutoWrap.wrapper_cls = None
        _ConfigAutoWrap.kwargs = {}

    def __enter__(self) -> None:
        self.enable_autowrap_context(self.kwargs)

    def __exit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
        self.disable_autowrap_context()