from typing import TYPE_CHECKING, Optional

from .base import HfQuantizer


if TYPE_CHECKING:
    from ..modeling_utils import PreTrainedModel

from ..utils import is_accelerate_available, is_torch_available, is_vptq_available, logging
from ..utils.quantization_config import QuantizationConfigMixin


if is_torch_available():
    import torch

logger = logging.get_logger(__name__)


class VptqHfQuantizer(HfQuantizer):
    """
    Quantizer of the VPTQ method. Enables the loading of prequantized models.
    """

    requires_calibration = True
    required_packages = ["vptq"]

    def __init__(self, quantization_config: QuantizationConfigMixin, **kwargs):
        super().__init__(quantization_config, **kwargs)
        self.quantization_config = quantization_config

    def validate_environment(self, *args, **kwargs):
        if not is_accelerate_available():
            raise ImportError("Using `vptq` quantization requires Accelerate: `pip install accelerate`")

        if not is_vptq_available():
            raise ImportError("Using `vptq` quantization requires VPTQ>=0.0.4: `pip install -U vptq`")

    def update_dtype(self, dtype: "torch.dtype") -> "torch.dtype":
        if dtype is None:
            if torch.cuda.is_available():
                dtype = torch.float16
                logger.info(
                    "CUDA available. Assuming VPTQ inference on GPU and loading the model in `torch.float16`. "
                    "To overwrite it, set `dtype` manually."
                )
            else:
                import vptq

                device_availability = getattr(vptq, "device_availability", lambda device: False)
                if device_availability("cpu") is True:
                    raise RuntimeError("No GPU found. Please wait for the next release of VPTQ to use CPU inference")
                dtype = torch.float32
                logger.info("No GPU found. Assuming VPTQ inference on CPU and loading the model in `torch.float32`.")
        return dtype

    def _process_model_before_weight_loading(
        self,
        model: "PreTrainedModel",
        keep_in_fp32_modules: Optional[list[str]] = None,
        **kwargs,
    ):
        """
        We don't have a param like `modules_to_not_convert` to indicate which layers should not be
        quantized, because `quantization_config` includes the layers that should be quantized.
        """
        from ..integrations import replace_with_vptq_linear

        self.modules_to_not_convert = self.get_modules_to_not_convert(
            model, self.quantization_config.modules_to_not_convert, keep_in_fp32_modules
        )

        replace_with_vptq_linear(
            model,
            quantization_config=self.quantization_config,
            modules_to_not_convert=self.modules_to_not_convert,
        )
        model.config.quantization_config = self.quantization_config

    def _process_model_after_weight_loading(self, model: "PreTrainedModel", **kwargs):
        return model

    @property
    def is_trainable(self) -> bool:
        return False

    def is_serializable(self, safe_serialization=None):
        return True
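A minimal standalone sketch of the dtype-selection rule that `update_dtype` applies, with no torch or vptq dependency. `select_vptq_dtype`, `cuda_available`, and the string dtypes are illustrative stand-ins (not transformers APIs) for `torch.cuda.is_available()` and the real `torch.dtype` values; `device_availability` plays the role of `vptq.device_availability`.

```python
# Hedged sketch: mirrors the dtype-selection logic of VptqHfQuantizer.update_dtype.
# `select_vptq_dtype` is a hypothetical helper, not part of transformers or vptq.
def select_vptq_dtype(dtype, cuda_available, device_availability=lambda device: False):
    """Pick the dtype (represented as a string here) for VPTQ inference."""
    if dtype is not None:
        return dtype  # a caller-supplied dtype always wins
    if cuda_available:
        return "float16"  # GPU inference defaults to half precision
    if device_availability("cpu") is True:
        # Mirrors the quantizer: CPU inference is gated until a future VPTQ release.
        raise RuntimeError("No GPU found. Please wait for the next release of VPTQ to use CPU inference")
    return "float32"  # CPU fallback loads weights in full precision
```

The stand-alone form makes the precedence explicit: explicit dtype, then CUDA float16, then the CPU float32 fallback.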