Metadata-Version: 2.2
Name: nvidia-cusparselt-cu12
Version: 0.7.1
Summary: NVIDIA cuSPARSELt
Home-page: https://developer.nvidia.com/cusparselt
Author: NVIDIA Corporation
Author-email: cuda_installer@nvidia.com
License: NVIDIA Proprietary Software
Keywords: cuda,nvidia,machine learning,high-performance computing
Classifier: Topic :: Scientific/Engineering
Classifier: Environment :: GPU :: NVIDIA CUDA
Classifier: Environment :: GPU :: NVIDIA CUDA :: 12
Description-Content-Type: text/x-rst
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: summary

#####################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
#####################################################################################

**NVIDIA cuSPARSELt** is a high-performance CUDA library dedicated to general
matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

    D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \cdot scale

where :math:`op(A)/op(B)` refers to in-place operations such as
transpose/non-transpose, and :math:`\alpha, \beta, scale` are scalars.

The *cuSPARSELt APIs* allow flexibility in the algorithm/operation selection,
epilogue, and matrix characteristics, including memory layout, alignment, and
data types.

**Download:** `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

**Provide Feedback:** `Math-Libs-Feedback@nvidia.com <mailto:Math-Libs-Feedback@nvidia.com>`_

**Examples**: `cuSPARSELt Example 1 `_, `cuSPARSELt Example 2 `_

**Blog posts**:

- `Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt `_
- `Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines `__
- `Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture `__

================================================================================
Key Features
================================================================================

* *NVIDIA Sparse MMA tensor core* support
* Mixed-precision computation support:

  +--------------+----------------+-----------------+-------------+
  | Input A/B    | Input C        | Output D        | Compute     |
  +==============+================+=================+=============+
  | `FP32`       | `FP32`         | `FP32`          | `FP32`      |
  +--------------+----------------+-----------------+-------------+
  | `FP16`       | `FP16`         | `FP16`          | `FP32`      |
  +              +                +                 +-------------+
  |              |                |                 | `FP16`      |
  +--------------+----------------+-----------------+-------------+
  | `BF16`       | `BF16`         | `BF16`          | `FP32`      |
  +--------------+----------------+-----------------+-------------+
  | `INT8`       | `INT8`         | `INT8`          | `INT32`     |
  +              +----------------+-----------------+             +
  |              | `INT32`        | `INT32`         |             |
  +              +----------------+-----------------+             +
  |              | `FP16`         | `FP16`          |             |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `BF16`          |             |
  +--------------+----------------+-----------------+-------------+
  | `E4M3`       | `FP16`         | `E4M3`          | `FP32`      |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `E4M3`          |             |
  +              +----------------+-----------------+             +
  |              | `FP16`         | `FP16`          |             |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `BF16`          |             |
  +              +----------------+-----------------+             +
  |              | `FP32`         | `FP32`          |             |
  +--------------+----------------+-----------------+-------------+
  | `E5M2`       | `FP16`         | `E5M2`          | `FP32`      |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `E5M2`          |             |
  +              +----------------+-----------------+             +
  |              | `FP16`         | `FP16`          |             |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `BF16`          |             |
  +              +----------------+-----------------+             +
  |              | `FP32`         | `FP32`          |             |
  +--------------+----------------+-----------------+-------------+

* Matrix pruning and compression functionalities
* Activation functions, bias vector, and output scaling
* Batched computation (multiple matrices in a single run)
* GEMM Split-K mode
* Auto-tuning functionality (see `cusparseLtMatmulSearch()`)
* NVTX ranging and Logging functionalities
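
The sketch below shows how these features fit together in the documented
prune/compress/matmul workflow for a 2:4 structured-sparse `A` operand with
`FP16` inputs and `FP32` accumulation. It is a condensed outline, not a
complete program: the problem size, row-major layout, and omitted error
checking and cleanup are illustrative assumptions, and the calls should be
checked against the documentation of the installed cuSPARSELt version.

.. code-block:: cpp

    // Hedged sketch of the prune -> compress -> matmul workflow for a
    // 2:4 structured-sparse A (FP16 inputs, FP32 accumulation).
    // Sizes, layouts, and the missing error checking/cleanup are illustrative.
    #include <cuda_runtime.h>
    #include <cuda_fp16.h>
    #include <cusparseLt.h>
    #include <cstdint>

    int main() {
        constexpr int64_t m = 1024, n = 1024, k = 1024;   // illustrative problem size
        constexpr uint32_t alignment = 16;
        __half *dA, *dB, *dC;                             // device buffers; A is the sparse operand
        cudaMalloc(reinterpret_cast<void**>(&dA), m * k * sizeof(__half));
        cudaMalloc(reinterpret_cast<void**>(&dB), k * n * sizeof(__half));
        cudaMalloc(reinterpret_cast<void**>(&dC), m * n * sizeof(__half));
        // ... fill dA, dB, dC with data ...

        cusparseLtHandle_t handle;
        cusparseLtMatDescriptor_t matA, matB, matC;
        cusparseLtMatmulDescriptor_t matmul;
        cusparseLtMatmulAlgSelection_t alg_sel;
        cusparseLtMatmulPlan_t plan;
        cusparseLtInit(&handle);

        // Describe A as 50% (2:4) structured-sparse, B and C/D as dense, row-major.
        cusparseLtStructuredDescriptorInit(&handle, &matA, m, k, /*ld=*/k, alignment,
                                           CUDA_R_16F, CUSPARSE_ORDER_ROW,
                                           CUSPARSELT_SPARSITY_50_PERCENT);
        cusparseLtDenseDescriptorInit(&handle, &matB, k, n, /*ld=*/n, alignment,
                                      CUDA_R_16F, CUSPARSE_ORDER_ROW);
        cusparseLtDenseDescriptorInit(&handle, &matC, m, n, /*ld=*/n, alignment,
                                      CUDA_R_16F, CUSPARSE_ORDER_ROW);
        cusparseLtMatmulDescriptorInit(&handle, &matmul,
                                       CUSPARSE_OPERATION_NON_TRANSPOSE,
                                       CUSPARSE_OPERATION_NON_TRANSPOSE,
                                       &matA, &matB, &matC, /*matD=*/&matC,
                                       CUSPARSE_COMPUTE_32F);
        cusparseLtMatmulAlgSelectionInit(&handle, &alg_sel, &matmul,
                                         CUSPARSELT_MATMUL_ALG_DEFAULT);
        cusparseLtMatmulPlanInit(&handle, &plan, &matmul, &alg_sel);

        // Prune A in place to the supported sparsity pattern, then compress it.
        cusparseLtSpMMAPrune(&handle, &matmul, dA, dA,
                             CUSPARSELT_PRUNE_SPMMA_TILE, /*stream=*/nullptr);
        size_t compressed_size = 0, compress_buffer_size = 0;
        cusparseLtSpMMACompressedSize(&handle, &plan,
                                      &compressed_size, &compress_buffer_size);
        void *dA_compressed, *dA_compress_buffer;
        cudaMalloc(&dA_compressed, compressed_size);
        cudaMalloc(&dA_compress_buffer, compress_buffer_size);
        cusparseLtSpMMACompress(&handle, &plan, dA, dA_compressed,
                                dA_compress_buffer, /*stream=*/nullptr);

        // D = alpha * op(A) * op(B) + beta * op(C); here D aliases C.
        float alpha = 1.0f, beta = 0.0f;
        size_t workspace_size = 0;
        cusparseLtMatmulGetWorkspace(&handle, &plan, &workspace_size);
        void* d_workspace = nullptr;
        cudaMalloc(&d_workspace, workspace_size);
        cusparseLtMatmul(&handle, &plan, &alpha, dA_compressed, dB,
                         &beta, dC, /*dD=*/dC, d_workspace,
                         /*streams=*/nullptr, /*numStreams=*/0);

        cusparseLtMatmulPlanDestroy(&plan);
        cusparseLtDestroy(&handle);
        return 0;
    }

When the same plan is reused many times, the auto-tuning entry point
`cusparseLtMatmulSearch()` mentioned above can be used to pick the algorithm
instead of `CUSPARSELT_MATMUL_ALG_DEFAULT`.
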
================================================================================
Support
================================================================================

* *Supported SM Architectures*: `SM 8.0`, `SM 8.6`, `SM 8.9`, `SM 9.0`, `SM 10.0`, `SM 12.0`
* *Supported CPU architectures and operating systems*:

  +------------+--------------------+
  | OS         | CPU archs          |
  +============+====================+
  | `Windows`  | `x86_64`           |
  +------------+--------------------+
  | `Linux`    | `x86_64`, `Arm64`  |
  +------------+--------------------+

================================================================================
Documentation
================================================================================

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the
cuSPARSELt documentation.

================================================================================
Installation
================================================================================

The cuSPARSELt wheel can be installed as follows:

.. code-block:: bash

    pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version (currently only CUDA 12 is supported).
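
After installation, a small stand-alone program can serve as a smoke test that
the header and shared library shipped in the wheel are usable. This is a
hypothetical sketch, not part of the package; the include and library paths in
the comment are assumptions based on the common `site-packages/nvidia/cusparselt/`
layout of NVIDIA CUDA wheels and should be adapted to your environment.

.. code-block:: cpp

    // Hypothetical post-install smoke test (not shipped with the wheel).
    // The build/link paths below are assumptions based on the usual wheel layout
    // (site-packages/nvidia/cusparselt/{include,lib}); adapt them, e.g.:
    //   g++ check_cusparselt.cpp \
    //       -I<site-packages>/nvidia/cusparselt/include \
    //       -L<site-packages>/nvidia/cusparselt/lib \
    //       -lcusparseLt -lcudart -o check_cusparselt
    // At run time, the cuSPARSELt shared library must be discoverable
    // (e.g. via LD_LIBRARY_PATH pointing at the wheel's lib directory).
    #include <cstdio>
    #include <cusparseLt.h>

    int main() {
        cusparseLtHandle_t handle;
        cusparseStatus_t status = cusparseLtInit(&handle);   // requires a CUDA-capable GPU
        if (status != CUSPARSE_STATUS_SUCCESS) {
            std::printf("cusparseLtInit failed with status %d\n", static_cast<int>(status));
            return 1;
        }
        int version = 0;
        cusparseLtGetVersion(&handle, &version);              // reported library version
        std::printf("cuSPARSELt initialized, version %d\n", version);
        cusparseLtDestroy(&handle);
        return 0;
    }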