Metadata-Version: 2.2
Name: nvidia-cusparselt-cu12
Version: 0.7.1
Summary: NVIDIA cuSPARSELt
Home-page: https://developer.nvidia.com/cusparselt
Author: NVIDIA Corporation
Author-email: cuda_installer@nvidia.com
License: NVIDIA Proprietary Software
Keywords: cuda,nvidia,machine learning,high-performance computing
Classifier: Topic :: Scientific/Engineering
Classifier: Environment :: GPU :: NVIDIA CUDA
Classifier: Environment :: GPU :: NVIDIA CUDA :: 12
Description-Content-Type: text/x-rst
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: summary

#####################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
#####################################################################################

**NVIDIA cuSPARSELt** is a high-performance CUDA library dedicated to general
matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

    D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \cdot scale

where :math:`op(A)/op(B)` refers to in-place operations such as
transpose/non-transpose, and :math:`\alpha, \beta, scale` are scalars.

The *cuSPARSELt APIs* allow flexibility in the algorithm/operation selection,
epilogue, and matrix characteristics, including memory layout, alignment, and
data types.

**Download:** `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

**Provide Feedback:** `Math-Libs-Feedback@nvidia.com <mailto:Math-Libs-Feedback@nvidia.com>`_

**Examples**: `cuSPARSELt Example 1 `_, `cuSPARSELt Example 2 `_

**Blog posts**:

- `Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt `_
- `Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines `__
- `Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture `__

================================================================================
Key Features
================================================================================

* *NVIDIA Sparse MMA tensor core* support
* Mixed-precision computation support:

  +--------------+----------------+-----------------+-------------+
  | Input A/B    | Input C        | Output D        | Compute     |
  +==============+================+=================+=============+
  | `FP32`       | `FP32`         | `FP32`          | `FP32`      |
  +--------------+----------------+-----------------+-------------+
  | `FP16`       | `FP16`         | `FP16`          | `FP32`      |
  +              +                +                 +-------------+
  |              |                |                 | `FP16`      |
  +--------------+----------------+-----------------+-------------+
  | `BF16`       | `BF16`         | `BF16`          | `FP32`      |
  +--------------+----------------+-----------------+-------------+
  | `INT8`       | `INT8`         | `INT8`          | `INT32`     |
  +              +----------------+-----------------+             +
  |              | `INT32`        | `INT32`         |             |
  +              +----------------+-----------------+             +
  |              | `FP16`         | `FP16`          |             |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `BF16`          |             |
  +--------------+----------------+-----------------+-------------+
  | `E4M3`       | `FP16`         | `E4M3`          | `FP32`      |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `E4M3`          |             |
  +              +----------------+-----------------+             +
  |              | `FP16`         | `FP16`          |             |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `BF16`          |             |
  +              +----------------+-----------------+             +
  |              | `FP32`         | `FP32`          |             |
  +--------------+----------------+-----------------+-------------+
  | `E5M2`       | `FP16`         | `E5M2`          | `FP32`      |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `E5M2`          |             |
  +              +----------------+-----------------+             +
  |              | `FP16`         | `FP16`          |             |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `BF16`          |             |
  +              +----------------+-----------------+             +
  |              | `FP32`         | `FP32`          |             |
  +--------------+----------------+-----------------+-------------+

* Matrix pruning and compression functionalities
* Activation functions, bias vector, and output scaling
* Batched computation (multiple matrices in a single run)
* GEMM Split-K mode
* Auto-tuning functionality (see `cusparseLtMatmulSearch()`)
* NVTX ranging and Logging functionalities
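
The sketch below shows how these features fit together in the documented
prune/compress/matmul workflow for a 2:4 structured-sparse `A` operand with
`FP16` inputs and `FP32` accumulation. It is a condensed outline, not a
complete program: the problem size, row-major layout, and omitted error
checking and cleanup are illustrative assumptions, and the calls should be
checked against the documentation of the installed cuSPARSELt version.

.. code-block:: cpp

    // Hedged sketch of the prune -> compress -> matmul workflow for a
    // 2:4 structured-sparse A (FP16 inputs, FP32 accumulation).
    // Sizes, layouts, and the missing error checking/cleanup are illustrative.
    #include <cuda_runtime.h>
    #include <cuda_fp16.h>
    #include <cusparseLt.h>
    #include <cstdint>

    int main() {
        constexpr int64_t m = 1024, n = 1024, k = 1024;   // illustrative problem size
        constexpr uint32_t alignment = 16;
        __half *dA, *dB, *dC;                             // device buffers; A is the sparse operand
        cudaMalloc(reinterpret_cast<void**>(&dA), m * k * sizeof(__half));
        cudaMalloc(reinterpret_cast<void**>(&dB), k * n * sizeof(__half));
        cudaMalloc(reinterpret_cast<void**>(&dC), m * n * sizeof(__half));
        // ... fill dA, dB, dC with data ...

        cusparseLtHandle_t handle;
        cusparseLtMatDescriptor_t matA, matB, matC;
        cusparseLtMatmulDescriptor_t matmul;
        cusparseLtMatmulAlgSelection_t alg_sel;
        cusparseLtMatmulPlan_t plan;
        cusparseLtInit(&handle);

        // Describe A as 50% (2:4) structured-sparse, B and C/D as dense, row-major.
        cusparseLtStructuredDescriptorInit(&handle, &matA, m, k, /*ld=*/k, alignment,
                                           CUDA_R_16F, CUSPARSE_ORDER_ROW,
                                           CUSPARSELT_SPARSITY_50_PERCENT);
        cusparseLtDenseDescriptorInit(&handle, &matB, k, n, /*ld=*/n, alignment,
                                      CUDA_R_16F, CUSPARSE_ORDER_ROW);
        cusparseLtDenseDescriptorInit(&handle, &matC, m, n, /*ld=*/n, alignment,
                                      CUDA_R_16F, CUSPARSE_ORDER_ROW);
        cusparseLtMatmulDescriptorInit(&handle, &matmul,
                                       CUSPARSE_OPERATION_NON_TRANSPOSE,
                                       CUSPARSE_OPERATION_NON_TRANSPOSE,
                                       &matA, &matB, &matC, /*matD=*/&matC,
                                       CUSPARSE_COMPUTE_32F);
        cusparseLtMatmulAlgSelectionInit(&handle, &alg_sel, &matmul,
                                         CUSPARSELT_MATMUL_ALG_DEFAULT);
        cusparseLtMatmulPlanInit(&handle, &plan, &matmul, &alg_sel);

        // Prune A in place to the supported sparsity pattern, then compress it.
        cusparseLtSpMMAPrune(&handle, &matmul, dA, dA,
                             CUSPARSELT_PRUNE_SPMMA_TILE, /*stream=*/nullptr);
        size_t compressed_size = 0, compress_buffer_size = 0;
        cusparseLtSpMMACompressedSize(&handle, &plan,
                                      &compressed_size, &compress_buffer_size);
        void *dA_compressed, *dA_compress_buffer;
        cudaMalloc(&dA_compressed, compressed_size);
        cudaMalloc(&dA_compress_buffer, compress_buffer_size);
        cusparseLtSpMMACompress(&handle, &plan, dA, dA_compressed,
                                dA_compress_buffer, /*stream=*/nullptr);

        // D = alpha * op(A) * op(B) + beta * op(C); here D aliases C.
        float alpha = 1.0f, beta = 0.0f;
        size_t workspace_size = 0;
        cusparseLtMatmulGetWorkspace(&handle, &plan, &workspace_size);
        void* d_workspace = nullptr;
        cudaMalloc(&d_workspace, workspace_size);
        cusparseLtMatmul(&handle, &plan, &alpha, dA_compressed, dB,
                         &beta, dC, /*dD=*/dC, d_workspace,
                         /*streams=*/nullptr, /*numStreams=*/0);

        cusparseLtMatmulPlanDestroy(&plan);
        cusparseLtDestroy(&handle);
        return 0;
    }

When the same plan is reused many times, the auto-tuning entry point
`cusparseLtMatmulSearch()` mentioned above can be used to pick the algorithm
instead of `CUSPARSELT_MATMUL_ALG_DEFAULT`.
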
================================================================================
Support
================================================================================

* *Supported SM Architectures*: `SM 8.0`, `SM 8.6`, `SM 8.9`, `SM 9.0`, `SM 10.0`, `SM 12.0`
* *Supported CPU architectures and operating systems*:

  +------------+--------------------+
  | OS         | CPU archs          |
  +============+====================+
  | `Windows`  | `x86_64`           |
  +------------+--------------------+
  | `Linux`    | `x86_64`, `Arm64`  |
  +------------+--------------------+

================================================================================
Documentation
================================================================================

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the
cuSPARSELt documentation.

================================================================================
Installation
================================================================================

The cuSPARSELt wheel can be installed as follows:

.. code-block:: bash

    pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version (currently only CUDA 12 is supported).
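
After installation, a small stand-alone program can serve as a smoke test that
the header and shared library shipped in the wheel are usable. This is a
hypothetical sketch, not part of the package; the include and library paths in
the comment are assumptions based on the common `site-packages/nvidia/cusparselt/`
layout of NVIDIA CUDA wheels and should be adapted to your environment.

.. code-block:: cpp

    // Hypothetical post-install smoke test (not shipped with the wheel).
    // The build/link paths below are assumptions based on the usual wheel layout
    // (site-packages/nvidia/cusparselt/{include,lib}); adapt them, e.g.:
    //   g++ check_cusparselt.cpp \
    //       -I<site-packages>/nvidia/cusparselt/include \
    //       -L<site-packages>/nvidia/cusparselt/lib \
    //       -lcusparseLt -lcudart -o check_cusparselt
    // At run time, the cuSPARSELt shared library must be discoverable
    // (e.g. via LD_LIBRARY_PATH pointing at the wheel's lib directory).
    #include <cstdio>
    #include <cusparseLt.h>

    int main() {
        cusparseLtHandle_t handle;
        cusparseStatus_t status = cusparseLtInit(&handle);   // requires a CUDA-capable GPU
        if (status != CUSPARSE_STATUS_SUCCESS) {
            std::printf("cusparseLtInit failed with status %d\n", static_cast<int>(status));
            return 1;
        }
        int version = 0;
        cusparseLtGetVersion(&handle, &version);              // reported library version
        std::printf("cuSPARSELt initialized, version %d\n", version);
        cusparseLtDestroy(&handle);
        return 0;
    }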