from typing import TYPE_CHECKING, Optional, Union

from .base import HfQuantizer


if TYPE_CHECKING:
    from ..modeling_utils import PreTrainedModel

from ..utils import is_accelerate_available, is_torch_available, logging


if is_torch_available():
    import torch

logger = logging.get_logger(__name__)


class BitNetHfQuantizer(HfQuantizer):
    """
    1.58-bit quantization from the BitNet quantization method.
    Before loading, it converts the linear layers into BitLinear layers.

    Check out the paper introducing this method: https://huggingface.co/papers/2402.17764
    """

    requires_parameters_quantization = False
    requires_calibration = True

    required_packages = ["accelerate"]

    def __init__(self, quantization_config, **kwargs):
        super().__init__(quantization_config, **kwargs)
        self.quantization_config = quantization_config

    def validate_environment(self, *args, **kwargs):
        if not is_accelerate_available():
            raise ImportError("Loading a BitNet quantized model requires accelerate (`pip install accelerate`)")

        if kwargs.get("from_tf", False) or kwargs.get("from_flax", False):
            raise ValueError(
                "Loading ternary weights from tf/flax is currently not supported, please make sure the weights"
                " are in PyTorch format."
            )

        if not torch.cuda.is_available():
            logger.warning_once(
                "You don't have a GPU available to load the model, the inference will be slow because of weight unpacking"
            )
            return

        device_map = kwargs.get("device_map", None)
        if device_map is None:
            logger.warning_once(
                "You have loaded a BitNet model on CPU and have a CUDA device available, make sure to set "
                "your model on a GPU device in order to run your model."
            )
        elif isinstance(device_map, dict) and ("cpu" in device_map.values() or "disk" in device_map.values()):
            raise ValueError(
                "You are attempting to load a BitNet model with a device_map that contains a CPU or disk device. "
                "This is not supported. Please remove the CPU or disk device from the device_map."
            )

    def _process_model_after_weight_loading(self, model: "PreTrainedModel", **kwargs):
        return model

    def _process_model_before_weight_loading(
        self,
        model: "PreTrainedModel",
        keep_in_fp32_modules: Optional[list[str]] = None,
        **kwargs,
    ):
        from ..integrations import replace_with_bitnet_linear

        self.modules_to_not_convert = self.get_modules_to_not_convert(
            model, self.quantization_config.modules_to_not_convert, keep_in_fp32_modules
        )

        model = replace_with_bitnet_linear(
            model,
            modules_to_not_convert=self.modules_to_not_convert,
            quantization_config=self.quantization_config,
            pre_quantized=self.pre_quantized,
        )

    def adjust_max_memory(self, max_memory: dict[str, Union[int, str]]) -> dict[str, Union[int, str]]:
        max_memory = {key: val * 0.90 for key, val in max_memory.items()}
        return max_memory

    def adjust_target_dtype(self, target_dtype: "torch.dtype") -> "torch.dtype":
        target_dtype = torch.int8
        return target_dtype

    def is_serializable(self, safe_serialization=None):
        return True

    @property
    def is_trainable(self) -> bool:
        return (
            self.quantization_config.linear_class == "autobitlinear"
            and self.quantization_config.quantization_mode == "online"
        )

    @property
    def is_qat_trainable(self) -> bool:
        """Flag indicating whether the quantized model can carry out quantization aware training"""
        return (
            self.quantization_config.linear_class == "autobitlinear"
            and self.quantization_config.quantization_mode == "online"
        )
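
# ---------------------------------------------------------------------------
# Illustrative sketch, not part of this module's public API. The actual layer
# conversion happens in `replace_with_bitnet_linear` (imported inside
# `_process_model_before_weight_loading` above), which swaps `nn.Linear`
# modules for BitLinear layers. The helper below is a hypothetical, simplified
# rendering of the absmean ternarization described in the BitNet b1.58 paper
# linked in the class docstring; it is not the implementation this library
# ships, and the name is made up for this sketch.
def _absmean_ternarize_sketch(weight, eps=1e-5):
    """Quantize `weight` to {-1, 0, 1} with a per-tensor absmean scale.

    Returns `(quantized, scale)` such that `quantized * scale` approximates
    the original full-precision tensor.
    """
    scale = weight.abs().mean().clamp(min=eps)  # gamma: mean absolute weight
    quantized = (weight / scale).round().clamp_(-1, 1)  # ternary values
    return quantized, scale


# Usage sketch (hypothetical checkpoint name): a prequantized BitNet checkpoint
# loaded through `from_pretrained` routes into this quantizer automatically, so
# linear layers are converted before the ternary weights are loaded:
#
#     from transformers import AutoModelForCausalLM
#     model = AutoModelForCausalLM.from_pretrained("org/bitnet-1.58-model", device_map="cuda")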