import ctypes
import sys
from typing import Any, Optional, Union

import torch
from torch._utils import _get_device_index as _torch_get_device_index


def _get_cuda_library() -> ctypes.CDLL:
    if sys.platform == "win32":
        return ctypes.CDLL("nvcuda.dll")
    else:
        return ctypes.CDLL("libcuda.so.1")


def _check_cuda(result: int) -> None:
    if result == 0:
        return
    err_str = ctypes.c_char_p()
    libcuda = _get_cuda_library()
    libcuda.cuGetErrorString(result, ctypes.byref(err_str))
    error_message = (
        err_str.value.decode() if err_str.value is not None else "Unknown CUDA error"
    )
    raise RuntimeError(f"CUDA error: {error_message}")


def _get_nvrtc_library() -> ctypes.CDLL:
    major_version = int(torch.version.cuda.split(".")[0])
    if sys.platform == "win32":
        nvrtc_libs = [f"nvrtc64_{major_version}0_0.dll"]
    else:
        nvrtc_libs = [f"libnvrtc.so.{major_version}", "libnvrtc.so"]
    for lib_name in nvrtc_libs:
        try:
            return ctypes.CDLL(lib_name)
        except OSError:
            continue
    raise RuntimeError("Could not find any NVRTC library")


def _nvrtc_compile(
    kernel_source: str,
    kernel_name: str,
    compute_capability: Optional[str] = None,
    header_code: str = "",
    cuda_include_dirs: Optional[list] = None,
    nvcc_options: Optional[list] = None,
) -> bytes:
    """
    Compiles a CUDA kernel using NVRTC and returns the PTX code.

    Args:
        kernel_source (str): The CUDA kernel source code as a string
        kernel_name (str): The name of the kernel function to compile
        compute_capability (str, None): The compute capability to target (e.g., "86").
            If None, will detect from current device.
        header_code (str, optional): Additional header code to prepend to the kernel source
        cuda_include_dirs (list, None): List of directories containing CUDA headers
        nvcc_options (list, None): Additional options to pass to NVRTC

    Returns:
        bytes: The compiled PTX code
    """
    import torch.cuda

    libnvrtc = _get_nvrtc_library()

    NVRTC_SUCCESS = 0

    def check_nvrtc(result: int) -> None:
        if result != NVRTC_SUCCESS:
            err_str = ctypes.c_char_p()
            libnvrtc.nvrtcGetErrorString(result, ctypes.byref(err_str))
            error_message = (
                err_str.value.decode() if err_str.value is not None else "Unknown error"
            )
            raise RuntimeError(f"NVRTC error: {error_message}")

    # Ensure extern "C" linkage so the kernel name is not mangled
    if not kernel_source.strip().startswith('extern "C"'):
        kernel_source = f'extern "C" {kernel_source}'

    # Combine header code and kernel source
    if header_code:
        full_source = header_code + "\n" + kernel_source
    else:
        full_source = kernel_source

    source_bytes = full_source.encode("utf-8")

    # Detect the compute capability of the current device if not given
    if compute_capability is None:
        props = torch.cuda.get_device_properties(torch.cuda.current_device())
        compute_capability = f"{props.major}{props.minor}"

    # Build NVRTC options
    options = []
    options.append(f"--gpu-architecture=sm_{compute_capability}".encode("utf-8"))
    if cuda_include_dirs:
        for directory in cuda_include_dirs:
            options.append(f"-I{directory}".encode("utf-8"))
    if nvcc_options:
        for option in nvcc_options:
            options.append(option.encode("utf-8"))

    from torch.utils.cpp_extension import COMMON_NVCC_FLAGS

    # NVRTC does not understand --expt-relaxed-constexpr; drop it
    nvrtc_compatible_flags = [
        flag for flag in COMMON_NVCC_FLAGS if flag != "--expt-relaxed-constexpr"
    ]
    options.extend([flag.encode("utf-8") for flag in nvrtc_compatible_flags])

    num_options = len(options)
    options_array = (ctypes.c_char_p * num_options)(*options)

    # Create the NVRTC program
    prog = ctypes.c_void_p()
    check_nvrtc(
        libnvrtc.nvrtcCreateProgram(
            ctypes.byref(prog),
            source_bytes,
            f"{kernel_name}.cu".encode("utf-8"),
            0,
            None,
            None,
        )
    )

    # Compile; on failure, surface the program log
    res = libnvrtc.nvrtcCompileProgram(prog, num_options, options_array)
    if res != NVRTC_SUCCESS:
        log_size = ctypes.c_size_t()
        check_nvrtc(libnvrtc.nvrtcGetProgramLogSize(prog, ctypes.byref(log_size)))
        log = ctypes.create_string_buffer(log_size.value)
        check_nvrtc(libnvrtc.nvrtcGetProgramLog(prog, log))
        raise RuntimeError(f"Kernel compilation failed:\n{log.value.decode()}")

    # Retrieve the PTX and destroy the program
    ptx_size = ctypes.c_size_t()
    check_nvrtc(libnvrtc.nvrtcGetPTXSize(prog, ctypes.byref(ptx_size)))
    ptx = ctypes.create_string_buffer(ptx_size.value)
    check_nvrtc(libnvrtc.nvrtcGetPTX(prog, ptx))
    check_nvrtc(libnvrtc.nvrtcDestroyProgram(ctypes.byref(prog)))
    return ptx.value


class _CudaModule:
    def __init__(self, module: ctypes.c_void_p) -> None:
        self._module = module
        self._kernels: dict[str, "_CudaKernel"] = {}

    def __getattr__(self, name: str) -> "_CudaKernel":
        if name in self._kernels:
            return self._kernels[name]

        from torch.cuda._utils import _check_cuda

        libcuda = _get_cuda_library()
        func = ctypes.c_void_p()
        try:
            _check_cuda(
                libcuda.cuModuleGetFunction(
                    ctypes.byref(func), self._module, name.encode("utf-8")
                )
            )
            kernel = _CudaKernel(func, self._module)
            self._kernels[name] = kernel
            return kernel
        except RuntimeError as err:
            raise AttributeError(f"No kernel named '{name}' in this module") from err


class _CudaKernel:
    """
    Represents a compiled CUDA kernel that can be called with PyTorch tensors.
    """

    def __init__(self, func: ctypes.c_void_p, module: ctypes.c_void_p) -> None:
        self.func = func
        self.module = module

    def __call__(
        self,
        grid: tuple[int, int, int] = (1, 1, 1),
        block: tuple[int, int, int] = (1, 1, 1),
        args: Optional[list] = None,
        shared_mem: int = 0,
        stream: Optional[Any] = None,
    ) -> None:
        """
        Call the compiled CUDA kernel

        Args:
            grid (tuple): Grid dimensions (grid_x, grid_y, grid_z)
            block (tuple): Block dimensions (block_x, block_y, block_z)
            args (list): List of arguments to pass to the kernel.
                PyTorch tensor arguments will be automatically converted to pointers.
            shared_mem (int): Shared memory size in bytes
            stream (torch.cuda.Stream): CUDA stream to use. If None, uses current stream.
        """
        import torch

        libcuda = torch.cuda._utils._get_cuda_library()

        if not args:
            args = []

        processed_args: list[ctypes.c_void_p] = []
        c_args = []

        for arg in args:
            if isinstance(arg, torch.Tensor):
                if not arg.is_cuda and not (arg.is_cpu and arg.is_pinned()):
                    raise ValueError(
                        "All tensor arguments must be CUDA tensors or pinned CPU tensors"
                    )
                # Get pointer to tensor data
                ptr = ctypes.c_void_p(arg.data_ptr())
                processed_args.append(ptr)
                c_args.append(ctypes.byref(ptr))
            elif isinstance(arg, int):
                # Convert integers to C int
                c_int = ctypes.c_int(arg)
                c_args.append(ctypes.byref(c_int))
            elif isinstance(arg, float):
                c_float = ctypes.c_float(arg)
                c_args.append(ctypes.byref(c_float))
            else:
                raise TypeError(f"Unsupported argument type: {type(arg)}")

        # Convert to array of void pointers (CUDA's kernelParams is a void**)
        c_args_array = (ctypes.c_void_p * len(c_args))()
        for i, arg in enumerate(c_args):
            c_args_array[i] = ctypes.cast(arg, ctypes.c_void_p)

        if not stream:
            import torch.cuda

            stream = torch.cuda.current_stream()

        _check_cuda(
            libcuda.cuLaunchKernel(
                self.func,
                grid[0],
                grid[1],
                grid[2],
                block[0],
                block[1],
                block[2],
                shared_mem,
                stream._as_parameter_,
                c_args_array,
                None,
            )
        )


def _cuda_load_module(
    ptx: Union[str, bytes], kernel_names: Optional[list[str]] = None
) -> Union[_CudaModule, dict[str, "_CudaKernel"]]:
    """
    Loads a CUDA module from PTX code and returns a module object that can access kernels.

    Args:
        ptx (bytes or str): The PTX code to load
        kernel_names (list, optional): List of kernel names to extract from the module.
            If None, will return a module object with __getattr__.

    Returns:
        object: If kernel_names is None, returns a module object with __getattr__
            to access kernels.
            If kernel_names is provided, returns a dict mapping kernel names to
            _CudaKernel objects.
    """
    import torch.cuda

    libcuda = _get_cuda_library()

    if isinstance(ptx, str):
        ptx = ptx.encode("utf-8")

    module = ctypes.c_void_p()
    stream = torch.cuda.current_stream()
    with stream:
        _check_cuda(libcuda.cuModuleLoadData(ctypes.byref(module), ptx))

    if not kernel_names:
        return _CudaModule(module)

    kernels = {}
    for name in kernel_names:
        func = ctypes.c_void_p()
        _check_cuda(
            libcuda.cuModuleGetFunction(
                ctypes.byref(func), module, name.encode("utf-8")
            )
        )
        kernels[name] = _CudaKernel(func, module)
    return kernels


def _get_device_index(
    device: Any, optional: bool = False, allow_cpu: bool = False
) -> int:
    r"""Get the device index from :attr:`device`, which can be a torch.device
    object, a Python integer, or ``None``.

    If :attr:`device` is a torch.device object, returns the device index if it
    is a CUDA device. Note that for a CUDA device without a specified index,
    i.e., ``torch.device('cuda')``, this will return the current default CUDA
    device if :attr:`optional` is ``True``. If :attr:`allow_cpu` is ``True``,
    CPU devices will be accepted and ``-1`` will be returned in this case.

    If :attr:`device` is a Python integer, it is returned as is.

    If :attr:`device` is ``None``, this will return the current default CUDA
    device if :attr:`optional` is ``True``.
    """
    if isinstance(device, int):
        return device
    if isinstance(device, str):
        device = torch.device(device)
    if isinstance(device, torch.device):
        if allow_cpu:
            if device.type not in ["cuda", "cpu"]:
                raise ValueError(f"Expected a cuda or cpu device, but got: {device}")
        elif device.type != "cuda":
            raise ValueError(f"Expected a cuda device, but got: {device}")
    if not torch.jit.is_scripting():
        if isinstance(device, torch.cuda.device):
            return device.idx
    return _torch_get_device_index(device, optional, allow_cpu)