from typing import TYPE_CHECKING, Optional

from .base import HfQuantizer


if TYPE_CHECKING:
    from ..modeling_utils import PreTrainedModel

from ..utils import is_accelerate_available, is_eetq_available, is_torch_available, logging
from .quantizers_utils import get_module_from_name


if is_torch_available():
    import torch

logger = logging.get_logger(__name__)


class EetqHfQuantizer(HfQuantizer):
    """
    8-bit quantization from EETQ quantization method:
        before loading: converts transformer layers into W8A16Linear
        during loading: load 16bit weight and pass to the layer object
        after: quantizes individual weights in Linear8bitLt into 8bit at first .cuda() call
    """

    requires_parameters_quantization = True
    requires_calibration = False

    required_packages = ["eetq", "accelerate"]

    def __init__(self, quantization_config, **kwargs):
        super().__init__(quantization_config, **kwargs)
        self.quantization_config = quantization_config

    def validate_environment(self, *args, **kwargs):
        if not is_eetq_available():
            raise ImportError(
                "Using `eetq` 8-bit quantization requires eetq. "
                "Please install the latest version of eetq from: https://github.com/NetEase-FuXi/EETQ"
            )

        try:
            import eetq  # noqa: F401
        except ImportError as exc:
            if "shard_checkpoint" in str(exc):
                # EETQ v1.0.0 imports the `shard_checkpoint` helper that was removed from recent
                # transformers versions, so the import itself can fail even when eetq is installed.
                raise ImportError(
                    "You are using a version of EETQ that is incompatible with the current transformers version. "
                    "Either downgrade transformers to <= v4.46.3 or, if available, upgrade EETQ to > v1.0.0."
                ) from exc
            raise

        if not is_accelerate_available():
            raise ImportError("Loading an EETQ quantized model requires accelerate (`pip install accelerate`)")

        if kwargs.get("from_tf", False) or kwargs.get("from_flax", False):
            raise ValueError(
                "Converting into 8-bit weights from tf/flax weights is currently not supported, please make "
                "sure the weights are in PyTorch format."
            )

        if not torch.cuda.is_available():
            raise RuntimeError("No GPU found. A GPU is needed for quantization.")

        device_map = kwargs.get("device_map", None)
        if device_map is None:
            logger.warning_once(
                "You have loaded an EETQ model on CPU and have a CUDA device available, make sure to set "
                "your model on a GPU device in order to run your model."
            )
        elif device_map is not None:
            if isinstance(device_map, dict) and ("cpu" in device_map.values() or "disk" in device_map.values()):
                raise ValueError(
                    "You are attempting to load an EETQ model with a device_map that contains a CPU or disk device. "
                    "This is not supported. Please remove the CPU or disk device from the device_map."
                )

    def update_dtype(self, dtype: "torch.dtype") -> "torch.dtype":
        if dtype is None:
            dtype = torch.float16
            logger.info(
                "Overriding dtype=%s with `dtype=torch.float16` due to requirements of `eetq` to enable model "
                "loading in 8-bit. Pass your own dtype to specify the dtype of the remaining non-linear layers, "
                "or pass dtype=torch.float16 to remove this warning.",
                dtype,
            )
        elif dtype != torch.float16:
            logger.info("We suggest you to set `dtype=torch.float16` for better efficiency with EETQ.")
        return dtype

    def param_needs_quantization(self, model: "PreTrainedModel", param_name: str, **kwargs) -> bool:
        from eetq import EetqLinear

        module, tensor_name = get_module_from_name(model, param_name)
        if isinstance(module, EetqLinear):
            # Already-quantized checkpoints and bias tensors are loaded as-is.
            if self.pre_quantized or tensor_name == "bias":
                return False
            return True
        return False

    def create_quantized_param(
        self,
        model: "PreTrainedModel",
        param_value: "torch.Tensor",
        param_name: str,
        target_device: "torch.device",
        **kwargs,
    ):
        from eetq import EetqLinear, quantize_and_preprocess_weights

        module, tensor_name = get_module_from_name(model, param_name)
        new_value, weight_scale = quantize_and_preprocess_weights(param_value)

        if isinstance(module, EetqLinear):
            if self.pre_quantized or tensor_name == "bias":
                if tensor_name == "weight" and param_value.dtype != torch.int8:
                    raise ValueError("Expect quantized weights but got an unquantized weight")
            elif tensor_name == "weight_scale":
                raise ValueError("Expect unquantized weights but got a quantized weight_scale")

        module._buffers[tensor_name] = new_value.to(target_device)
        module.register("weight_scales", weight_scale.to(target_device))

    def _process_model_after_weight_loading(self, model: "PreTrainedModel", **kwargs):
        return model

    def _process_model_before_weight_loading(
        self,
        model: "PreTrainedModel",
        keep_in_fp32_modules: Optional[list[str]] = None,
        **kwargs,
    ):
        from ..integrations import replace_with_eetq_linear

        self.modules_to_not_convert = self.get_modules_to_not_convert(
            model, self.quantization_config.modules_to_not_convert, keep_in_fp32_modules
        )

        model = replace_with_eetq_linear(
            model,
            modules_to_not_convert=self.modules_to_not_convert,
            quantization_config=self.quantization_config,
            pre_quantized=self.pre_quantized,
        )

        model.config.quantization_config = self.quantization_config

    def is_serializable(self, safe_serialization=None):
        return True

    @property
    def is_trainable(self) -> bool:
        return True
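
# ---------------------------------------------------------------------------
# Usage sketch (illustrative, not part of this module): this quantizer is
# selected automatically by `from_pretrained` when an `EetqConfig` is passed.
# The checkpoint name below is a hypothetical example; any PyTorch causal LM
# checkpoint works the same way, provided a CUDA device is available (see
# `validate_environment` above).
#
#   from transformers import AutoModelForCausalLM, EetqConfig
#
#   quantization_config = EetqConfig("int8")
#   model = AutoModelForCausalLM.from_pretrained(
#       "facebook/opt-350m",                    # hypothetical example checkpoint
#       device_map="auto",                      # EETQ needs a GPU; cpu/disk offload is rejected
#       quantization_config=quantization_config,
#   )
# ---------------------------------------------------------------------------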