import logging
import math
import os
import threading
import warnings
from collections.abc import Iterator
from functools import reduce
from itertools import chain, zip_longest
from typing import Optional, TYPE_CHECKING, Union

import torch
from torch.distributed import is_available
from torch.utils._typing_utils import not_none


__all__ = ["DeviceMesh", "init_device_mesh"]


if not is_available():
    import sys

    # Stub out the public names so that `torch.distributed.device_mesh` can still be
    # imported when torch.distributed itself is not available on this build.
    class _DeviceMeshStub:
        pass

    def _init_device_mesh_stub():
        pass

    sys.modules["torch.distributed.device_mesh"].DeviceMesh = _DeviceMeshStub  # type: ignore[attr-defined]
    sys.modules["torch.distributed.device_mesh"].init_device_mesh = _init_device_mesh_stub  # type: ignore[attr-defined]

else:
    from torch._C._distributed_c10d import Backend as C10dBackend
    from torch.distributed.distributed_c10d import (
        _get_default_group,
        _resolve_process_group,
        get_backend,
        get_process_group_ranks,
        get_rank,
        get_world_size,
        init_process_group,
        is_initialized,
        new_group,
        ProcessGroup,
        split_group,
    )

    logger = logging.getLogger(__name__)

    # Only import numpy typing when type checking.
    if TYPE_CHECKING:
        try:
            from numpy.typing import ArrayLike
        except ImportError:
            logger.warning(
                "DeviceMesh requires numpy >= 1.21 to be installed for type checking"
            )

    # (backend, backend options) pair used to override the process group created
    # for a single mesh dimension.
    BackendConfig = tuple[Optional[str], Optional[C10dBackend.Options]]

    class _MeshEnv(threading.local):
        def __init__(self) -> None:
            self.mesh_stack: list[DeviceMesh] = []
            self.child_to_root_mapping: dict[DeviceMesh, DeviceMesh] = {}
            self.mesh_dim_group_options: dict[
                int, tuple[str, Optional[C10dBackend.Options]]
            ] = {}
            self.root_to_flatten_mapping: dict[DeviceMesh, dict[str, DeviceMesh]] = {}
            # Record flatten mesh name to its mesh dim index in the root mesh.
            self.flatten_name_to_root_dims: dict[
                DeviceMesh, dict[str, tuple[int, ...]]
            ] = {}

        def get_current_mesh(self) -> "DeviceMesh":
            if len(self.mesh_stack) == 0:
                raise RuntimeError("No device mesh is currently active!")
            return self.mesh_stack[-1]

        def create_sub_mesh(
            self,
            device_mesh: "DeviceMesh",
            submesh_dim_names: tuple[str, ...],
            submesh_dims: list[tuple[int, ...]],
        ) -> "DeviceMesh":
            # Build the submesh tensor: flatten dims that were themselves created by
            # flattening, permute the sliced dims to the back, and reshape so that
            # every row is the rank group of one submesh instance.  The instance
            # containing the current rank is returned, registered in
            # `child_to_root_mapping`, and reuses the already-created process groups
            # of the sliced dims via `_dim_group_names` instead of creating new ones.
            # (implementation not recoverable from this copy)
            ...

        def create_flatten_mesh(
            self,
            device_mesh: "DeviceMesh",
            mesh_dim_name: Optional[str] = None,
            backend_override: Optional[BackendConfig] = None,
        ) -> "DeviceMesh":
            # Create (or look up) the 1D mesh obtained by flattening all dims of
            # `device_mesh` into one named dim, record it in `root_to_flatten_mapping`
            # and `flatten_name_to_root_dims` on the root mesh, and return it.
            # (implementation not recoverable from this copy)
            ...

        def get_root_mesh(self, device_mesh: "DeviceMesh") -> "DeviceMesh":
            # If a mesh cannot be found in child_to_root_mapping, it is a root mesh
            # itself, i.e. it was not created through slicing.
            root_mesh = self.child_to_root_mapping.get(device_mesh, None)
            return device_mesh if not root_mesh else root_mesh

        def get_root_mesh_dim(self, device_mesh: "DeviceMesh") -> Optional[int]:
            """
            Returns the index of the mesh dim in the root mesh.
            The device_mesh passed in needs to be sliced out from the root mesh
            or submesh of the root mesh.
            """
            root_mesh = self.get_root_mesh(device_mesh)
            child_mesh_dim_names = device_mesh.mesh_dim_names
            if root_mesh and child_mesh_dim_names:
                assert len(child_mesh_dim_names) == 1, (
                    "The submesh can only be a 1D mesh."
                )
                child_mesh_dim_name = child_mesh_dim_names[0]
                return self.get_mesh_dim_by_name(root_mesh, child_mesh_dim_name)
            return None

        def get_mesh_dim_by_name(
            self, device_mesh: "DeviceMesh", mesh_dim_name: str
        ) -> int:
            if (
                device_mesh.mesh_dim_names is None
                or len(device_mesh.mesh_dim_names) == 0
            ):
                raise KeyError("No `mesh_dim_names` found.")
            if mesh_dim_name not in device_mesh.mesh_dim_names:
                raise KeyError(
                    f"Mesh dimension '{mesh_dim_name}' does not exist.",
                    f"Available mesh dimensions are: mesh_dim_names={device_mesh.mesh_dim_names}",
                )
            return not_none(device_mesh.mesh_dim_names.index(mesh_dim_name))

        def _set_mesh_dim_group_options(
            self,
            dim: int,
            backend: str,
            pg_options: Optional[C10dBackend.Options] = None,
        ) -> None:
            self.mesh_dim_group_options[dim] = (backend, pg_options)

        def _get_slice_mesh_dims(
            self, device_mesh: "DeviceMesh", mesh_dim_names: tuple[str, ...]
        ) -> list[tuple[int, ...]]:
            """
            Validate whether the mesh_dim_names is valid for slicing the given device_mesh.
            If valid, return dim indexes of the slice mesh in the device mesh.
            """
            if device_mesh != self.get_root_mesh(device_mesh):
                warnings.warn(
                    "You are attempting to slice a submesh from another submesh. While we support this operation, "
                    "it is users' responsibility to ensure that the submesh is consistently sliced across all ranks. "
                    "If not, this may result in some ranks receiving the submesh while others encounter errors."
                )

            # A sliced name must be either a dim of this mesh or a dim that was
            # previously flattened from it.
            flatten_name_to_root_dims = self.flatten_name_to_root_dims.get(
                device_mesh, {}
            )
            valid_mesh_dim_names = [
                *flatten_name_to_root_dims,
                *not_none(device_mesh.mesh_dim_names),
            ]
            if not all(
                mesh_dim_name in valid_mesh_dim_names
                for mesh_dim_name in mesh_dim_names
            ):
                raise KeyError(
                    f"Invalid mesh_dim_names {mesh_dim_names} specified. "
                    f"Valid mesh_dim_names are {valid_mesh_dim_names}."
                )

            # The resolved dim indices must be in ascending order, otherwise:
            #   "Invalid mesh_dim_names ... specified. Found mesh dim indices to
            #    slice: ... Mesh dim indices should be in ascending order."
            # (remainder of the implementation not recoverable from this copy)
            ...

    _mesh_resources: _MeshEnv = _MeshEnv()

    def _get_device_handle(device_type: str = "cuda"):
        """
        Get the module corresponding to the device_type which is cuda or cuda-like device.
        For example, when the device_type is cuda, the module `torch.cuda` is returned.
        Return None when there is no corresponding module for device_type, otherwise
        return the corresponding module.
        """
        return getattr(torch, device_type, None) if device_type != "cpu" else None
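    # Hedged usage sketch (not part of the upstream file): `DeviceMesh`, defined below,
    # is usable as a context manager; entering it pushes the mesh onto the thread-local
    # `_mesh_resources.mesh_stack`, so library code can look up the "current" mesh via
    # `_mesh_resources.get_current_mesh()` instead of threading it through every call.
    # The helper below only illustrates that contract and is never called by the module.
    def _demo_current_mesh_tracking(mesh: "DeviceMesh") -> bool:
        with mesh:
            # Inside the `with` block the entered mesh is the active one.
            is_active = _mesh_resources.get_current_mesh() is mesh
        # Leaving the block pops it off the stack again.
        return is_active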
    class DeviceMesh:
        """
        DeviceMesh represents a mesh of devices, where the layout of devices can be
        represented as an n-d dimension array, and each value of the n-d dimensional
        array is the global id of the default process group ranks.

        DeviceMesh can be used to describe the layout of devices across the cluster,
        and serves as a proxy for communication among the device lists within the cluster.

        DeviceMesh can be used as a context manager.

        .. note::
            DeviceMesh follows SPMD programming model, which means the same PyTorch Python program
            is running on all processes/ranks in the cluster. Therefore, users need to make sure the
            `mesh` array (which describes the layout of devices) is identical across all ranks.
            Inconsistent `mesh` will lead to silent hang.

        Args:
            device_type (str): The device type of the mesh. Currently supports: "cpu", "cuda/cuda-like".
                Passing in a device type with a GPU index, such as "cuda:0", is not allowed.
            mesh (ndarray): A multi-dimensional array or an integer tensor describing the layout
                of devices, where the IDs are global IDs of the default process group.

        Returns:
            DeviceMesh: A :class:`DeviceMesh` object representing the device layout.

        The following program runs on each process/rank in an SPMD manner. In this example, we have 2
        hosts with 4 GPUs each. A reduction over the first dimension of the mesh reduces across
        columns (0, 4), ..., (3, 7), and a reduction over the second dimension of the mesh reduces
        across rows (0, 1, 2, 3) and (4, 5, 6, 7).

        Example::
            >>> # xdoctest: +SKIP("no rank")
            >>> from torch.distributed.device_mesh import DeviceMesh
            >>>
            >>> # Initialize device mesh as (2, 4) to represent the topology
            >>> # of cross-host(dim 0), and within-host (dim 1).
            >>> mesh = DeviceMesh(device_type="cuda", mesh=[[0, 1, 2, 3],[4, 5, 6, 7]])
        """

        device_type: str
        mesh: torch.Tensor
        mesh_dim_names: Optional[tuple[str, ...]]

        def __init__(
            self,
            device_type: str,
            mesh: Union[torch.Tensor, "ArrayLike"],
            *,
            mesh_dim_names: Optional[tuple[str, ...]] = None,
            backend_override: Optional[tuple[BackendConfig, ...]] = None,
            _init_backend: bool = True,
        ) -> None:
            self.device_type = device_type
            if isinstance(mesh, torch.Tensor) and mesh.device.type != "cpu":
                raise ValueError(f"`mesh` must be a CPU tensor, got {mesh}")
            self.mesh = (
                mesh.detach().to(dtype=torch.int)
                if isinstance(mesh, torch.Tensor)
                else torch.tensor(mesh, device="cpu", dtype=torch.int)
            )
            self.mesh_dim_names = tuple(mesh_dim_names) if mesh_dim_names else None
            if backend_override is None:
                backend_override = ((None, None),) * self.mesh.ndim

            # Private field used to pre-compute DeviceMesh's hash.
            self._flatten_mesh_list = tuple(self.mesh.flatten().tolist())
            self._thread_id = None

            # Skip process group initialization for xla devices or when the caller
            # opts out via `_init_backend=False` (e.g. when wrapping existing groups).
            if device_type != "xla":
                # Always try to create the default (world) pg, even if it is not
                # initialized already. The world pg is used for device mesh identity
                # (rank) on each process: we need to know whether the current global
                # rank is part of the mesh or not.
                if _init_backend:
                    self._setup_world_group_and_device()
                    self._init_process_groups(backend_override)

                if is_initialized() and get_backend() == "threaded":
                    self._thread_id = threading.get_ident()

                # Calculate the coordinates of the current global rank on the mesh.
                rank_coords = (self.mesh == get_rank()).nonzero()
                assert rank_coords.size(0) in (0, 1)
                self._coordinate_on_dim: Optional[list[int]] = (
                    rank_coords[0].tolist() if rank_coords.size(0) > 0 else None
                )

        def _setup_world_group_and_device(self):
            default_initialized = is_initialized()
            if not default_initialized:
                init_process_group()

            world_size = get_world_size()
            if self.mesh.numel() > world_size:
                raise RuntimeError(
                    f"Mesh should not be bigger than default world size {world_size}, "
                    f"but found {self.mesh.numel()} ranks!"
                )

            # Only set the device if it has not been initialized yet; if the user
            # already set the device before DeviceMesh init, we respect that choice.
            device_handle = _get_device_handle(self.device_type)
            if device_handle and not device_handle.is_initialized():
                # If a launcher populated the LOCAL_RANK env variable, use it.
                if "LOCAL_RANK" in os.environ:
                    local_rank = int(os.environ["LOCAL_RANK"])
                    logger.info(
                        "Setting default device for the current process based on LOCAL_RANK=%s",
                        local_rank,
                    )
                    device_handle.set_device(local_rank)
                else:
                    warnings.warn(
                        "It seems like you did not set/select the default device for the current process "
                        "before the DeviceMesh initialization or use a launcher (i.e. torchrun) which populates "
                        "`LOCAL_RANK` environment variable. It is recommended to set the current device for the "
                        "process BEFORE the DeviceMesh initialization so that the underlying communicator "
                        "(i.e. NCCL) can be initialized properly. Given that the current process has no default "
                        "device selected, DeviceMesh will use a heuristic to set the device_id via "
                        "`global_rank % num_devices_per_host`, assuming homogeneous hardware cluster."
                    )
                    num_devices_per_host = device_handle.device_count()
                    if (
                        world_size > num_devices_per_host
                        and world_size % num_devices_per_host != 0
                    ):
                        raise RuntimeError(
                            f"DeviceMesh only supports homogeneous hardware, but found "
                            f"{world_size} ranks and {num_devices_per_host} {self.device_type} devices!"
                        )
                    device_handle.set_device(get_rank() % num_devices_per_host)

            return _get_default_group()

        def _init_process_groups(self, backend_override: tuple[BackendConfig, ...]):
            # One subgroup (and its registered group name) is created per mesh
            # dimension, and every rank belongs to exactly one subgroup per dimension.
            #
            # For a 1D mesh that spans the whole world with no backend override, the
            # default (world) group is reused directly; when CUDA is available and the
            # default backend is plain gloo, it is re-created as "cpu:gloo,cuda:nccl".
            # Otherwise each dimension's ranks (`self.mesh` reshaped so that each row
            # is one subgroup) are turned into a group via `split_group` or
            # `new_group`, honoring the backend/options from `backend_override` or
            # `_mesh_resources.mesh_dim_group_options` (specifying both raises
            # "Dimension ... present both in the backend_override argument and via
            # _mesh_resources._set_mesh_dim_group_options"), with a group_desc of
            # "mesh_<dim_name>" or "mesh_dim_<dim>".  If a rank ends up in more than
            # one subgroup for a dimension, a RuntimeError "Each device mesh dimension
            # should get only one process group ..." is raised.  The resulting group
            # names are stored in `self._dim_group_names`.
            # (implementation not recoverable from this copy)
            ...

        def __enter__(self) -> "DeviceMesh":
            # Set this mesh as the current mesh in the thread-local mesh env.
            _mesh_resources.mesh_stack.append(self)
            return self

        def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
            # Pop this mesh from the thread-local mesh env.
            _mesh_resources.mesh_stack.pop()

        def __repr__(self) -> str:
            device_mesh_repr = (
                f"({', '.join(f'{k}={v}' for k, v in zip(self.mesh_dim_names, self.mesh.shape))})"
                if self.mesh_dim_names
                else f"{tuple(self.mesh.shape)}"
            )
            device_mesh_repr = (
                f"DeviceMesh({device_mesh_repr}, device: '{self.device_type}', "
                f"stride: {self.mesh.stride()}"
            )
            # Only print the actual mesh tensor when debug mode is turned on.
            if os.environ.get("TORCH_DISTRIBUTED_DEBUG", "") == "DETAIL":
                device_mesh_repr += f", Mesh: {self.mesh.tolist()}"
            return f"{device_mesh_repr})"
        def __hash__(self):
            # Lazily compute the hash.
            self._hash = getattr(self, "_hash", None)
            if not self._hash:
                self._hash = hash(
                    (
                        self._flatten_mesh_list,
                        self.mesh.shape,
                        self.device_type,
                        self.mesh_dim_names,
                        self._thread_id,
                    )
                )
            return self._hash

        def __eq__(self, other: object) -> bool:
            if self is other:
                return True
            if not isinstance(other, DeviceMesh):
                return False
            return (
                self._flatten_mesh_list == other._flatten_mesh_list
                and self.mesh.shape == other.mesh.shape
                and self.device_type == other.device_type
                and self.mesh_dim_names == other.mesh_dim_names
                and self._thread_id == other._thread_id
            )

        def __getitem__(
            self, mesh_dim_names: Union[str, tuple[str, ...]]
        ) -> "DeviceMesh":
            """
            Slice the current DeviceMesh based on the mesh_dim_names given to create a submesh.
            The submesh created consists of the dimensions and the communicators indicated by
            ``mesh_dim_names``.

            Args:
                mesh_dim_names (Union[str, Tuple[str]]): the name or the tuple of names of the
                    mesh dimension of the DeviceMesh to create the submesh for.
            Returns:
                A :class:`DeviceMesh` object

            The following program runs on each process/rank in an SPMD manner in a world size of 8.
            In the first example:
                Calling mesh_2d["tp"] on rank 0, 1, 2, 3 returns a 1D submesh of DeviceMesh:([0, 1, 2, 3]).
                Calling mesh_2d["tp"] on rank 4, 5, 6, 7 returns a 1D submesh of DeviceMesh:([4, 5, 6, 7]).
                Calling mesh_2d["dp"] on rank 0, 4 returns a 1D submesh of DeviceMesh:([0, 4]).
                Calling mesh_2d["dp"] on rank 1, 5 returns a 1D submesh of DeviceMesh:([1, 5]).
                Calling mesh_2d["dp"] on rank 2, 6 returns a 1D submesh of DeviceMesh:([2, 6]).
                Calling mesh_2d["dp"] on rank 3, 7 returns a 1D submesh of DeviceMesh:([3, 7]).

            In the second example:
                Calling mesh_3d["dp", "cp"] on rank 0, 1, 4, 5 returns a 2D submesh of DeviceMesh:([[0, 1], [4, 5]]).
                Calling mesh_3d["dp", "cp"] on rank 2, 3, 6, 7 returns a 2D submesh of DeviceMesh:([[2, 3], [6, 7]]).
                Calling mesh_3d["cp", "dp"] on rank 0, 1, 4, 5 returns a 2D submesh of DeviceMesh:([[0, 4], [1, 5]]).
                Calling mesh_3d["cp", "dp"] on rank 2, 3, 6, 7 returns a 2D submesh of DeviceMesh:([[2, 6], [3, 7]]).

            Example::
                >>> # xdoctest: +SKIP("no rank")
                >>> from torch.distributed.device_mesh import init_device_mesh
                >>>
                >>> # Initialize a 2D device mesh as (2, 4) to represent the topology
                >>> # of cross-host(dim 0), and within-host (dim 1).
                >>> mesh_2d = init_device_mesh(device_type="cuda", mesh_shape=(2, 4), mesh_dim_names=("dp", "tp"))
                >>> tp_mesh = mesh_2d["tp"]
                >>> dp_mesh = mesh_2d["dp"]
                >>>
                >>> # Initialize a 3D mesh.
                >>> mesh_3d = init_device_mesh(device_type="cuda", mesh_shape=(2, 2, 2), mesh_dim_names=("dp", "pp", "cp"))
                >>> # The order of the mesh_dim_names provided determines the order of dimensions in the submesh.
                >>> dp_cp_mesh = mesh_3d["dp", "cp"]
                >>> cp_dp_mesh = mesh_3d["cp", "dp"]
            """
            if not self.mesh_dim_names:
                raise RuntimeError("Cannot slice a DeviceMesh without mesh_dim_names!")

            mesh_dim_names = (
                (mesh_dim_names,) if isinstance(mesh_dim_names, str) else mesh_dim_names
            )

            if mesh_dim_names == self.mesh_dim_names:
                return self

            slice_mesh_dims = _mesh_resources._get_slice_mesh_dims(self, mesh_dim_names)
            # When tracing with FakeTensorMode, `create_sub_mesh()` would fail because
            # it needs a real tensor to manipulate, so temporarily leave fake mode to
            # materialize the mesh tensors inside `_mesh_resources`.
            with torch._subclasses.fake_tensor.unset_fake_temporarily():
                submesh = _mesh_resources.create_sub_mesh(
                    self, mesh_dim_names, slice_mesh_dims
                )
            return submesh

        def get_group(
            self, mesh_dim: Optional[Union[int, str]] = None
        ) -> ProcessGroup:
            """
            Returns the single ProcessGroup specified by mesh_dim, or, if mesh_dim is not specified and the
            DeviceMesh is 1-dimensional, returns the only ProcessGroup in the mesh.

            Args:
                mesh_dim (str/int, optional): it can be the name of the mesh dimension or the index
                    of the mesh dimension. Default is None.

            Returns:
                A :class:`ProcessGroup` object.
            """
            if not hasattr(self, "_dim_group_names"):
                raise RuntimeError("DeviceMesh process groups not initialized!")

            if self.mesh.ndim > 1 and mesh_dim is None:
                raise RuntimeError(
                    f"Found the DeviceMesh has {self.mesh.ndim} dimensions",
                    "Optional kwarg `mesh_dim` needs to be specified when device_mesh.ndim > 1. "
                    "If you want to get the list of all the ProcessGroups in the DeviceMesh, "
                    "please use `get_all_groups()` instead.",
                )

            # Quick return for a 1D mesh: it only has one group.
            if self.mesh.ndim == 1 and mesh_dim is None:
                return not_none(_resolve_process_group(self._dim_group_names[0]))

            root_mesh = _mesh_resources.get_root_mesh(self)
            root_to_flatten_mapping = _mesh_resources.root_to_flatten_mapping.get(
                root_mesh, None
            )
            if root_to_flatten_mapping and mesh_dim in root_to_flatten_mapping.keys():
                dim_group_name = root_to_flatten_mapping[
                    mesh_dim  # type: ignore[index]
                ]._dim_group_names[0]
                return not_none(_resolve_process_group(dim_group_name))
            else:
                mesh_dim = (
                    _mesh_resources.get_mesh_dim_by_name(self, mesh_dim)
                    if isinstance(mesh_dim, str)
                    else mesh_dim
                )
                assert isinstance(mesh_dim, int)
                return not_none(
                    _resolve_process_group(self._dim_group_names[mesh_dim])
                )
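        # Hedged usage sketch (not part of the upstream file): the object returned by
        # `get_group` is an ordinary c10d ProcessGroup, and a named dimension resolves
        # to the same group as its positional index, e.g. for a ("dp", "tp") mesh:
        #   >>> # xdoctest: +SKIP("no rank")
        #   >>> tp_group = mesh_2d.get_group("tp")
        #   >>> tp_group is mesh_2d.get_group(1)
        #   True
        #   >>> torch.distributed.all_reduce(torch.ones(1, device="cuda"), group=tp_group)
        # Passing no `mesh_dim` is only valid for 1D meshes, where the single group
        # is returned directly.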
        def get_all_groups(self) -> list[ProcessGroup]:
            """
            Returns a list of ProcessGroups for all mesh dimensions.

            Returns:
                A list of :class:`ProcessGroup` objects.
            """
            return [self.get_group(i) for i in range(self.mesh.ndim)]

        @staticmethod
        def from_group(
            group: Union[ProcessGroup, list[ProcessGroup]],
            device_type: str,
            mesh: Optional[Union[torch.Tensor, "ArrayLike"]] = None,
            *,
            mesh_dim_names: Optional[tuple[str, ...]] = None,
        ) -> "DeviceMesh":
            """
            Constructs a :class:`DeviceMesh` with ``device_type`` from an existing
            :class:`ProcessGroup` or a list of existing :class:`ProcessGroup`.

            The constructed device mesh has a number of dimensions equal to the number of
            groups passed. For example, if a single process group is passed in, the
            resulting DeviceMesh is a 1D mesh. If a list of 2 process groups is passed in,
            the resulting DeviceMesh is a 2D mesh.

            If more than one group is passed, then the ``mesh`` and ``mesh_dim_names``
            arguments are required. The order of the process groups passed in determines
            the topology of the mesh. For example, the first process group will be the
            0th dimension of the DeviceMesh. The `mesh` tensor passed in must have the
            same number of dimensions as the number of process groups passed in, and the
            order of the dimensions in the `mesh` tensor must match the order in the
            process groups passed in.

            Args:
                group (ProcessGroup or list[ProcessGroup]): the existing ProcessGroup
                    or a list of existing ProcessGroups.
                device_type (str): The device type of the mesh. Currently supports: "cpu",
                    "cuda/cuda-like". Passing in a device type with a GPU index, such as "cuda:0",
                    is not allowed.
                mesh (torch.Tensor or ArrayLike, optional): A multi-dimensional array or an
                    integer tensor describing the layout of devices, where the IDs are global IDs
                    of the default process group. Default is None.
                mesh_dim_names (tuple[str], optional): A tuple of mesh dimension names to assign
                    to each dimension of the multi-dimensional array describing the layout of devices.
                    Its length must match the number of dimensions of `mesh`. Each string in
                    `mesh_dim_names` must be unique. Default is None.

            Returns:
                DeviceMesh: A :class:`DeviceMesh` object representing the device layout.
            """
            # 1D scenario: a single ProcessGroup becomes a 1D mesh over its ranks.
            if isinstance(group, ProcessGroup):
                group_ranks = get_process_group_ranks(group)
                if (
                    isinstance(mesh, torch.Tensor) and mesh.tolist() != group_ranks
                ) or (
                    mesh is not None
                    and not isinstance(mesh, torch.Tensor)
                    and mesh != group_ranks
                ):
                    raise ValueError(
                        f"Invalid mesh {str(mesh)} for ProcessGroup with ranks {group_ranks}"
                    )
                mesh = torch.tensor(group_ranks, device="cpu", dtype=torch.int)
                device_mesh = DeviceMesh(
                    device_type,
                    mesh,
                    mesh_dim_names=mesh_dim_names,
                    _init_backend=False,
                )
                device_mesh._dim_group_names = [group.group_name]
                return device_mesh

            # nD scenario: one ProcessGroup per mesh dimension.
            groups = list(group)
            if len(groups) == 0:
                raise ValueError("Expects at least one ProcessGroup to be passed")
            if mesh is None:
                raise ValueError("Must pass mesh if passing multiple ProcessGroups")
            if mesh_dim_names is None:
                raise ValueError(
                    "Must pass mesh_dim_names if passing multiple ProcessGroups"
                )
            mesh = (
                mesh.detach().to(dtype=torch.int, device="cpu")
                if isinstance(mesh, torch.Tensor)
                else torch.tensor(mesh, device="cpu", dtype=torch.int)
            )
            if mesh.ndim != len(groups):
                raise ValueError(
                    "Expects mesh with ndim equal to number of ProcessGroups but got "
                    f"mesh {mesh.tolist()} and {len(groups)} ProcessGroups"
                )
            device_mesh = DeviceMesh(
                device_type, mesh, mesh_dim_names=mesh_dim_names, _init_backend=False
            )
            device_mesh._dim_group_names = [group.group_name for group in groups]
            return device_mesh
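        # Hedged usage sketch (not part of the upstream file): `from_group` wraps
        # ProcessGroups that already exist instead of creating new ones, e.g. reusing
        # the default (world) group as a 1D mesh:
        #   >>> # xdoctest: +SKIP("no rank")
        #   >>> import torch.distributed as dist
        #   >>> dist.init_process_group("nccl")
        #   >>> world_mesh = DeviceMesh.from_group(dist.group.WORLD, "cuda")
        # The ranks of the group become the mesh tensor and no additional subgroups
        # are initialized, since the constructor is invoked with `_init_backend=False`.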
        def size(self, mesh_dim: Optional[int] = None) -> int:
            return self.mesh.numel() if mesh_dim is None else self.mesh.size(mesh_dim)

        @property
        def ndim(self) -> int:
            return self.mesh.ndim

        @property
        def shape(self) -> tuple[int, ...]:
            return tuple(self.mesh.shape)

        def get_rank(self) -> int:
            """
            Returns the current global rank.
            """
            return get_rank()

        def get_local_rank(self, mesh_dim: Optional[Union[int, str]] = None) -> int:
            """
            Returns the local rank of the given mesh_dim of the DeviceMesh.

            Args:
                mesh_dim (str/int, optional): it can be the name of the mesh dimension or the index
                    of the mesh dimension. Default is None.

            Returns:
                An integer that denotes the local rank.

            The following program runs on each process/rank in an SPMD manner. In this example, we have 2
            hosts with 4 GPUs each.
            Calling mesh_2d.get_local_rank(mesh_dim=0) on rank 0, 1, 2, 3 would return 0.
            Calling mesh_2d.get_local_rank(mesh_dim=0) on rank 4, 5, 6, 7 would return 1.
            Calling mesh_2d.get_local_rank(mesh_dim=1) on rank 0, 4 would return 0.
            Calling mesh_2d.get_local_rank(mesh_dim=1) on rank 1, 5 would return 1.
            Calling mesh_2d.get_local_rank(mesh_dim=1) on rank 2, 6 would return 2.
            Calling mesh_2d.get_local_rank(mesh_dim=1) on rank 3, 7 would return 3.

            Example::
                >>> # xdoctest: +SKIP("no rank")
                >>> from torch.distributed.device_mesh import DeviceMesh
                >>>
                >>> # Initialize device mesh as (2, 4) to represent the topology
                >>> # of cross-host(dim 0), and within-host (dim 1).
                >>> mesh_2d = DeviceMesh(device_type="cuda", mesh=[[0, 1, 2, 3],[4, 5, 6, 7]])
            """
            if self.ndim > 1 and mesh_dim is None:
                raise RuntimeError(
                    f"Found the DeviceMesh has {self.mesh.ndim} dimensions",
                    "Optional kwarg `mesh_dim` needs to be specified when device_mesh.ndim > 1.",
                )
            elif mesh_dim is None:
                mesh_dim = 0

            mesh_dim_group = not_none(self.get_group(mesh_dim))
            assert isinstance(mesh_dim_group, ProcessGroup), (
                "We expect ProcessGroup before calling `get_rank`!"
            )
            return not_none(get_rank(mesh_dim_group))

        def get_coordinate(self) -> Optional[list[int]]:
            """
            Return the relative indices of this rank relative to all
            dimensions of the mesh. If this rank is not part of the mesh, return None.
            """
            return self._coordinate_on_dim if self._coordinate_on_dim else None

        def _flatten(
            self,
            mesh_dim_name: Optional[str] = None,
            backend_override: Union[
                None, str, C10dBackend.Options, BackendConfig
            ] = None,
        ) -> "DeviceMesh":
            """
            Returns a 1D DeviceMesh by flattening the current DeviceMesh.

            If no mesh_dim_name is provided, the default is a string concatenating the
            mesh_dim_names of the given submesh with each mesh_dim_name separated by "_".
            For example, if we have a 3D mesh
            DeviceMesh([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], mesh_dim_names=("dp", "cp", "tp")),
            calling mesh_3d["dp", "cp"]._flatten() will create a 1D submesh
            DeviceMesh([0, 2, 4, 6], mesh_dim_names=("dp_cp",)) on rank 0, 2, 4, 6 and a 1D submesh
            DeviceMesh([1, 3, 5, 7], mesh_dim_names=("dp_cp",)) on rank 1, 3, 5, 7.

            After the flattened dimension is created, to access the flattened dimension in mesh_3d,
            one can use the existing slicing method to obtain the flattened mesh through calling
            mesh_3d["dp_cp"].
            """
            if not self.mesh_dim_names:
                raise RuntimeError(
                    "Cannot flatten a DeviceMesh without mesh_dim_names!"
                )

            if backend_override is not None:
                (backend_override_tuple,) = _normalize_backend_override(
                    {0: backend_override}, 1
                )
            else:
                backend_override_tuple = None

            return _mesh_resources.create_flatten_mesh(
                self, mesh_dim_name, backend_override_tuple
            )
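    # Hedged usage sketch (not part of the upstream file): `_flatten` fuses the named
    # dimensions of a sliced submesh into one new dimension on the root mesh, which
    # can then be sliced by its generated name.  The dimension names below are
    # illustrative and assume a 3D mesh built with mesh_dim_names=("dp", "cp", "tp").
    def _demo_flatten_mesh(mesh_3d: DeviceMesh) -> DeviceMesh:
        # Fuse "dp" and "cp" into a single "dp_cp" dimension ...
        mesh_3d["dp", "cp"]._flatten()
        # ... which is afterwards addressable on the original mesh by its new name.
        return mesh_3d["dp_cp"]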
    def _normalize_backend_override(
        backend_override: dict[
            Union[int, str],
            Union[str, C10dBackend.Options, tuple[str, C10dBackend.Options]],
        ],
        ndim: int,
        mesh_dim_names: Optional[tuple[str, ...]] = None,
    ) -> Iterator[BackendConfig]:
        # Turn a user-facing override dict (keyed by dim index or dim name, valued by
        # a backend name, an options object, or a (backend, options) tuple) into one
        # normalized (backend, options) pair per mesh dimension.
        backend_override = dict(backend_override)  # work on a copy
        if mesh_dim_names is None:
            mesh_dim_names = ()
        for dim_idx, dim_name in zip_longest(range(ndim), mesh_dim_names):
            if dim_name is not None and dim_name in backend_override:
                if dim_idx in backend_override:
                    raise RuntimeError(
                        f"Found redundant dim index {dim_idx} and name {dim_name} "
                        "in backend_override"
                    )
                val = backend_override.pop(dim_name)
            elif dim_idx in backend_override:
                val = backend_override.pop(dim_idx)
            else:
                yield (None, None)
                continue

            if isinstance(val, str):
                yield (val, None)
            elif isinstance(val, C10dBackend.Options):
                yield (None, val)
            else:
                yield val

        if backend_override:
            raise RuntimeError(
                f"Found invalid keys in backend_override: got {list(backend_override.keys())}, "
                f"expected integers in range [0, {ndim}) or one of {mesh_dim_names}"
            )

    def init_device_mesh(
        device_type: str,
        mesh_shape: tuple[int, ...],
        *,
        mesh_dim_names: Optional[tuple[str, ...]] = None,
        backend_override: Optional[
            dict[
                Union[int, str],
                Union[str, C10dBackend.Options, tuple[str, C10dBackend.Options]],
            ]
        ] = None,
    ) -> DeviceMesh:
        """
        Initializes a `DeviceMesh` based on `device_type`, `mesh_shape`, and `mesh_dim_names` parameters.

        This creates a DeviceMesh with an n-dimensional array layout, where `n` is the length of `mesh_shape`.
        If `mesh_dim_names` is provided, each dimension is labeled as `mesh_dim_names[i]`.

        .. note::
            `init_device_mesh` follows SPMD programming model, meaning the same PyTorch Python program
            runs on all processes/ranks in the cluster. Ensure `mesh_shape` (the dimensions of the nD array
            describing device layout) is identical across all ranks. Inconsistent `mesh_shape` may lead to hanging.

        .. note::
            If no process group is found, init_device_mesh will initialize distributed process group/groups
            required for distributed communications behind the scenes.

        Args:
            device_type (str): The device type of the mesh. Currently supports: "cpu", "cuda/cuda-like", "xpu".
                Passing in a device type with a GPU index, such as "cuda:0", is not allowed.
            mesh_shape (Tuple[int]): A tuple defining the dimensions of the multi-dimensional array
                describing the layout of devices.
            mesh_dim_names (Tuple[str], optional): A tuple of mesh dimension names to assign to each
                dimension of the multi-dimensional array describing the layout of devices. Its length must
                match the length of `mesh_shape`. Each string in `mesh_dim_names` must be unique.
            backend_override (Dict[int | str, tuple[str, Options] | str | Options], optional): Overrides for
                some or all of the ProcessGroups that will be created for each mesh dimension. Each key can be
                either the index of a dimension or its name (if mesh_dim_names is provided). Each value can be
                a tuple containing the name of the backend and its options, or just one of these two components
                (in which case the other will be set to its default value).

        Returns:
            DeviceMesh: A :class:`DeviceMesh` object representing the device layout.

        Example::
            >>> # xdoctest: +SKIP("no rank")
            >>> from torch.distributed.device_mesh import init_device_mesh
            >>>
            >>> mesh_1d = init_device_mesh("cuda", mesh_shape=(8,))
            >>> mesh_2d = init_device_mesh("cuda", mesh_shape=(2, 8), mesh_dim_names=("dp", "tp"))
        """
        if mesh_dim_names is not None:
            if len(set(mesh_dim_names)) != len(mesh_dim_names):
                raise RuntimeError(
                    "Each mesh_dim_name must be unique.",
                    f"Found repeated mesh_dim_name in mesh_dim_names {mesh_dim_names}",
                )

            if len(mesh_shape) != len(mesh_dim_names):
                raise RuntimeError(
                    "mesh_shape and mesh_dim_names should have same length!",
                    f"Found len(mesh_dim_names): {len(mesh_dim_names)} and len(mesh_shape): {len(mesh_shape)}.",
                )

        if backend_override is not None:
            backend_override_tuple = tuple(
                _normalize_backend_override(
                    backend_override, len(mesh_shape), mesh_dim_names
                )
            )
        else:
            backend_override_tuple = None

        # Assume valid device types are all letters.
        if device_type and not device_type.isalpha():
            raise RuntimeError(
                f"Device type with index is not supported but got {device_type}. ",
                "If you maintained a 'torch.device' object, it's recommended to pass in 'device.type'.",
            )

        # Always initialize the mesh's tensor on CPU, regardless of what the
        # external device type has been set to be (e.g. meta).
        with torch.device("cpu"):
            mesh = torch.arange(math.prod(mesh_shape), dtype=torch.int).view(mesh_shape)
        device_mesh = DeviceMesh(
            device_type=device_type,
            mesh=mesh,
            mesh_dim_names=mesh_dim_names,
            backend_override=backend_override_tuple,
        )

        return device_mesh
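    # Hedged usage sketch (not part of the upstream file): ties together
    # `init_device_mesh`, mesh slicing, and the per-dimension process groups.
    # It assumes the script is launched with a distributed launcher such as
    # `torchrun --nproc-per-node=8` on a host with 8 CUDA devices; the variable
    # and dimension names ("dp", "tp") are illustrative only and the helper is
    # never called by the module itself.
    def _demo_init_and_slice_mesh() -> None:
        # Build a 2 x 4 mesh: dim 0 ("dp") spans replicas, dim 1 ("tp") spans
        # the ranks that shard a model together.
        mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

        # Slicing by name returns a 1D submesh holding only the ranks that share
        # the same coordinate along the other dimension.
        tp_mesh = mesh_2d["tp"]

        # Each mesh dimension is backed by its own ProcessGroup, usable with any
        # c10d collective; `get_local_rank` gives the rank within that group.
        tp_group = mesh_2d.get_group("tp")
        t = torch.ones(1, device="cuda") * mesh_2d.get_local_rank("tp")
        torch.distributed.all_reduce(t, group=tp_group)

        logger.info("rank %d: tp submesh %s, reduced value %s", get_rank(), tp_mesh, t)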