L if=&ddlmZmZmZerddlmZerddlZddlmZddl mcm Z eje ZdZdej dej fdZej$d ej d ej&dej fd ZGd d ej*ZGddej.j0ZGddej.j0ZGddej6Z ddZ ddZy))is_accelerate_availableis_torch_availablelogging)init_empty_weightsNquantized_weightsreturnc|j}|dtzdz tz}t|dk(r|f}n |g|dd}|dz }tj||j tj }|jtj }tt|d|zdz}t|D]3}||z}t||z|d} |d| |z xxx||| d|zzzccc5|S)a Packs a tensor of quantized weights into a compact format using 2 bits per value. Parameters: ----------- quantized_weights : torch.Tensor A tensor containing ternary quantized weights with values in {-1, 0, 1}. These values are adjusted to {0, 1, 2} before being packed. Returns: -------- torch.Tensor A packed tensor where each element stores 4 quantized values (each using 2 bits) in an 8-bit format. rNdevicedtyper) shapeVALUES_PER_ITEMlentorchzerosruint8tominrange) r original_shaperow_dimpacked_tensor_shapepackedunpackeditistartends f/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/transformers/integrations/bitnet.py pack_weightsr#s  ',,Na ?2Q6?JG >a&j&<);< [[,5F5M5MUZU`U` aF ##EKK0H _~a0G;q@ AB 2Y@G %'/>!#45#+8E##6!a%#??@ Mrrc||j}t|dk(r|dtz}|f}n|dtz}|g|dd}tj||j tj }ttD](}||dz}||dz}dd|zz} || zd|zz |||*|j|dz S)u5 Unpacks a tensor of quantized weights that were stored in a packed format using 2 bits per value. Parameters: ----------- packed : torch.Tensor A tensor containing packed weights where each element represents 4 quantized values (using 2 bits per value). dtype : torch.dtype The dtype of the returned Tensor Returns: -------- torch.Tensor A tensor of unpacked weights, where each value is converted from its packed 2-bit representation. Example: -------- packed = torch.tensor([[0b10100001, 0b00011000], [0b10010000, 0b00001010]], dtype=torch.uint8) # Unpack the values unpacked = unpack_weights(packed) # Resulting unpacked tensor print(unpacked) # Output: tensor([[ 0, -1], [-1, 1], [-1, 1], [-1, 1], [ 1, 0], [ 0, -1], [ 1, -1], [ 1, -1]]) Explanation of the example: --------------------------- Let's take the first value for example 0b10100001, we we will only focus on the first column, because every element is unpacked across the first dimension - First 2 bits: `01` → 0 at [0][0] - Second 2 bits: `00` → -1 at [0][2] - Third 2 bits: `10` → 1 at [0][4] - Fourth 2 bits: `10` → 1 at [0][6] the second value of the same row (0b10010000) will give the values for [0][1], [0][3], [0][5], [0][7] We subtract 1 because during the packing process, it's easier to work with values like 0, 1, and 2. To make this possible, we add 1 to the original ternary weights (which are typically -1, 0, and 1) when packing them. When unpacking, we reverse this by subtracting 1 to restore the original ternary values. r rNr r) rrrrrrrrr) rr packed_shapeoriginal_row_dimunpacked_shaperrr r!masks r"unpack_weightsr+9sb<\!"-=>{{>&--u{{SH ? #9LO#l1o%QU|%}!a%8s 9 ;;u  !!r$c eZdZ d dededededef fd Zejd dZ ejdZ d Z xZ S) BitLinear in_features out_featuresbias use_rms_norm rms_norm_epsct |||_||_||_|j dt j|tz|ft j||j dt jd|||r)|j dt j|||nd|_ d|_ |rddl m}||||_ yy) Nweightrr weight_scaler r0r LlamaRMSNormeps)super__init__rr.r/register_bufferrrrronesr0rms_normmodels.llama.modeling_llamar8) selfr.r/r0rrr1r2r8 __class__s r"r<zBitLinear.__init__s  &(   KK0+>kk     JJ     l5Y_)` aDI  B(,GDM r$c.d|dz z }d|dz zdz }||jjddjjdz }||zj j||}|j t j|fS)aX Activation function : Performs symmetric, per-token quantization on the input activations. Parameters: ----------- x : torch.Tensor Input activations to be quantized. num_bits : int, optional (default=8) Number of bits to use for quantization, determining the quantization range. Returns: -------- result : torch.Tensor Quantized activation tensor, with values mapped to an `int8` range. scale : torch.Tensor The per-channel scaling factors used to quantize the tensor. rr Tdimkeepdimh㈵>r)absmaxvaluesclamproundrrint8)rAinputnum_bitsQnQpscaleresults r"activation_quantzBitLinear.activation_quants$X\" # 8a< 1 $UYY[__T_:AAGGDGQQ%-&&(..r26yy$e++r$c|||zz }|SN)rArP input_scaler6outs r"post_quant_processzBitLinear.post_quant_processs{\12 r$c|j|j|}|j}t||j}|j |\}}t j |j|j|}|j||j|}|j.||jjddj|z }|S)Nrr rD) r?r4r+rrVFlinearrr\r6r0view expand_as)rArPww_quant input_quantrZys r"forwardzBitLinear.forwards == $MM%(E KK $**5#'#8#8#? [ HH[^^DJJ/ 9  # #At'8'8+ F 99 2&003 3Ar$)NNFư>)) __name__ __module__ __qualname__intboolfloatr<rcompilerVr\rg __classcell__rBs@r"r-r-~s ""(H(H(H (H(H(HT ]],,. ]] r$r-cNeZdZdZeej dZedZy) WeightQuanta Implements a custom autograd function for weight quantization. This performs ternary quantization (-1, 0, 1) based on scaling by the mean absolute value of the weights. It uses the Straight-Through Estimator (STE) for the backward pass. c |j}|j}d|jjj dz }||zj j dd|z }|j|S)Ng?rHrIrDr )rrorJmeanclamp_rNrMr)ctxr4rrTs r"rgzWeightQuant.forwardsr fjjl'')00T0::5.'')//A6>yyr$c&|j}|SrXclonerx grad_output grad_inputs r"backwardzWeightQuant.backward &&( r$N rjrkrl__doc__ staticmethodrrprgrrYr$r"rtrts; ]]  r$rtcNeZdZdZeej dZedZy)ActQuanta8 Implements a custom autograd function for activation quantization. This performs symmetric 8-bit quantization (to the range [-128, 127]) based on the maximum absolute value along the last dimension (per-token/row scaling). It uses the Straight-Through Estimator (STE) for the backward pass. c$|j}|j}d|jjddjj dz }||zj jdd|z }|j|S)NrDTrErHrIi) rrorJrKrLrwrNrMr)rx activationrrTs r"rgzActQuant.forwards  %%' jnn&**r4*@GGNNSWNXX 5(//177cBUJ }}U##r$c&|j}|SrXrzr|s r"rzActQuant.backwardrr$NrrYr$r"rrs; ]]$$r$rc ReZdZ d dedededededef fd ZdZd ZxZ S) AutoBitLinearr.r/r0 online_quantr1r2c t ||||||_d|_|rddlm} | |||_|sD|j dtjd|||j|jyy)Nrr7r9r6r r5) r;r<rr?r@r8r=rr>"_register_load_state_dict_pre_hook load_hook) rAr.r/r0rrrr1r2r8rBs r"r<zAutoBitLinear.__init__s{ lD9(  B(,GDM   !   3 3DNN Cr$c|dz|vrV||dzj|jjk7r-t||dz|jj||dz<|S)Nr4r^)rr4r+)rA state_dictprefixargskwargss r"rzAutoBitLinear.load_hook"sh X * ,FXz._replace_with_bitnet_linear..Ts W3#((#344Ws  autobitlinearonline)r.r/r0rrrr1r2offlineFrh)r.r/r0rrr1r2Tr)modules_to_not_convertrquantization_confighas_been_replacedrD)named_childrenappendanyr isinstancennLinearr.r/ linear_classrr0r4rrquantization_moder1r2_modulesrequires_grad_r-rlistchildren_replace_with_bitnet_linearpop) modelrrrr pre_quantizednamemoduler.r/_s ` r"rr=s,,..! f  #! %W@VWW#% -fbii0TAW5W"("4"4K#)#6#6L*/B/O/OSb/b/<(3)5!'D!8#)==#7#7"(--"5"5*=*O*OS[*[)<)I)I)<)I)I 0t,/@@IM!NN40??F/8(3)5!'D!8#)==#7#7"(--"5"5M`)<)I)IfkM`)<)I)Ifj0t,t,;;EB(,%9 -< tFOO%& '! +#>'=!1$7"3 $ A  R ].!^ # ##Q - -s E: #  E    Lr$)NNNFF)NNNF)utilsrrr acceleraterrtorch.nnrtorch.nn.functional functionalr_ get_loggerrjrrTensorr#rprr+Moduler-autogradFunctionrtrrrrrrYr$r"rsHH-##   H %#ELL#U\\#LA"5<<A" A" A"A"HT Tn%..)).u~~&&.7BII7x  @$J  ,r$