from abc import ABC, abstractmethod
from typing import TYPE_CHECKING, Any, Optional, Union

from ..utils import is_torch_available, logging
from ..utils.quantization_config import QuantizationConfigMixin, QuantizationMethod
from .quantizers_utils import get_module_from_name


if TYPE_CHECKING:
    from ..modeling_utils import PreTrainedModel

if is_torch_available():
    import torch
    from torch.nn import ModuleList
else:
    ModuleList = object

logger = logging.get_logger(__name__)


class HfQuantizer(ABC):
    """
    Abstract class of the HuggingFace quantizer. For now, supports quantizing HF transformers models for inference.
    This class is used only inside `transformers.PreTrainedModel.from_pretrained` and cannot easily be used outside
    the scope of that method yet.

    Attributes
        quantization_config (`transformers.utils.quantization_config.QuantizationConfigMixin`):
            The quantization config that defines the quantization parameters of the model you want to quantize.
        modules_to_not_convert (`list[str]`, *optional*):
            The list of module names not to convert when quantizing the model.
        required_packages (`list[str]`, *optional*):
            The list of required pip packages to install prior to using the quantizer.
        requires_calibration (`bool`):
            Whether the quantization method requires calibrating the model before using it.
        requires_parameters_quantization (`bool`):
            Whether the quantization method requires creating a new Parameter. For example, bitsandbytes requires
            creating a new xxxParameter in order to properly quantize the model.
    """

    requires_calibration = False
    required_packages = None
    requires_parameters_quantization = False

    def __init__(self, quantization_config: QuantizationConfigMixin, **kwargs):
        self.quantization_config = quantization_config

        # Handle extra kwargs below
        self.modules_to_not_convert = kwargs.pop("modules_to_not_convert", [])
        self.pre_quantized = kwargs.pop("pre_quantized", True)

        if not self.pre_quantized and self.requires_calibration:
            raise ValueError(
                f"The quantization method {quantization_config.quant_method} requires the model to be pre-quantized."
                f" You explicitly passed `pre_quantized=False` meaning your model weights are not quantized. Make sure "
                f"to pass `pre_quantized=True` while knowing what you are doing."
            )

    def update_torch_dtype(self, dtype: "torch.dtype") -> "torch.dtype":
        """
        Deprecated in favor of `update_dtype`!

        Args:
            dtype (`torch.dtype`):
                The input dtype that is passed in `from_pretrained`
        """
        logger.warning_once(
            "`update_torch_dtype` is deprecated in favor of `update_dtype`! It will be removed in version v4.57"
        )
        return self.update_dtype(dtype)

    def update_dtype(self, dtype: "torch.dtype") -> "torch.dtype":
        """
        Some quantization methods require to explicitly set the dtype of the model to a target dtype. Override
        this method if you want to make sure that behavior is preserved.

        Args:
            dtype (`torch.dtype`):
                The input dtype that is passed in `from_pretrained`
        """
        return dtype

    def update_device_map(self, device_map: Optional[dict[str, Any]]) -> Optional[dict[str, Any]]:
        """
        Override this method if you want to override the existing device map with a new one. E.g. for bitsandbytes,
        since `accelerate` is a hard requirement, if no device_map is passed the device_map is set to `"auto"`.

        Args:
            device_map (`Union[dict, str]`, *optional*):
                The device_map that is passed through the `from_pretrained` method.
        """
        return device_map
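
    # Illustrative sketch (not part of the library): a bitsandbytes-style
    # override of `update_device_map` as described in the docstring above,
    # falling back to "auto" when no device map is given.
    #
    #     def update_device_map(self, device_map):
    #         return device_map if device_map is not None else "auto"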

    def adjust_target_dtype(self, dtype: "torch.dtype") -> "torch.dtype":
        """
        Override this method if you want to adjust the `target_dtype` variable used in `from_pretrained` to compute
        the device_map in case the device_map is a `str`. E.g. for bitsandbytes we force-set `target_dtype` to
        `torch.int8`, and for 4-bit we pass a custom enum `accelerate.CustomDtype.int4`.

        Args:
            dtype (`torch.dtype`, *optional*):
                The dtype that is used to compute the device_map.
        """
        return dtype

    def update_missing_keys(self, model, missing_keys: list[str], prefix: str) -> list[str]:
        """
        Override this method if you want to adjust the `missing_keys`.

        Args:
            missing_keys (`list[str]`, *optional*):
                The list of missing keys in the checkpoint compared to the state dict of the model
        """
        return missing_keys

    def update_expected_keys(self, model, expected_keys: list[str], loaded_keys: list[str]) -> list[str]:
        """
        Override this method if you want to adjust the `expected_keys`.

        Args:
            expected_keys (`list[str]`, *optional*):
                The list of the expected keys in the initialized model.
            loaded_keys (`list[str]`, *optional*):
                The list of the loaded keys in the checkpoint.
        """
        return expected_keys

    def update_unexpected_keys(self, model, unexpected_keys: list[str]) -> list[str]:
        return unexpected_keys

    def get_special_dtypes_update(self, model, dtype: "torch.dtype") -> dict[str, "torch.dtype"]:
        """
        Returns dtypes for modules that are not quantized - used for the computation of the device_map in case one
        passes a str as a device_map. The method will use the `modules_to_not_convert` that is modified in
        `_process_model_before_weight_loading`.

        Args:
            model (`~transformers.PreTrainedModel`):
                The model to quantize
            dtype (`torch.dtype`):
                The dtype passed in `from_pretrained` method.
        """
        return {
            name: dtype
            for name, _ in model.named_parameters()
            if any(m in name for m in self.modules_to_not_convert)
        }

    def adjust_max_memory(self, max_memory: dict[str, Union[int, str]]) -> dict[str, Union[int, str]]:
        """Adjust the `max_memory` argument for `infer_auto_device_map()` if extra memory is needed for quantization."""
        return max_memory

    def check_quantized_param(self, *args, **kwargs) -> bool:
        """DEPRECATED -> remove in v5"""
        logger.warning_once(
            "`check_quantized_param` is deprecated in favor of `param_needs_quantization`, which is a much more "
            "self-explanatory name for what the method achieves. It will be removed in v5"
        )
        return self.param_needs_quantization(*args, **kwargs)

    def param_needs_quantization(self, model: "PreTrainedModel", param_name: str, **kwargs) -> bool:
        """
        Check whether a given param needs quantization as defined by `create_quantized_param`.
        """
        return False

    def create_quantized_param(self, *args, **kwargs):
        """
        Take the needed components from the state dict (those for which `param_needs_quantization` is True) and
        create a quantized param. It usually also loads the new param directly into the `model`.
        Note: only applicable if `requires_parameters_quantization == True`.
        """
        if not self.requires_parameters_quantization:
            raise AttributeError(
                f"`.create_quantized_param()` method is not supported by quantizer class {self.__class__.__name__}."
            )

    def validate_environment(self, *args, **kwargs):
        """
        This method is used to check for potential conflicts with arguments that are passed in `from_pretrained`.
        You need to define it for all future quantizers that are integrated with transformers. If no explicit check
        is needed, simply return nothing.
        """
        return

    def update_tp_plan(self, config):
        """Updates the tensor-parallel plan for the scales."""
        return config

    def update_ep_plan(self, config):
        """Updates the expert-parallel plan for the scales."""
        return config
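
    # Illustrative sketch (hypothetical subclass logic): `param_needs_quantization`
    # selects which parameters get quantized, and `create_quantized_param` is then
    # called for each of them to materialize the quantized tensor on the model.
    #
    #     def param_needs_quantization(self, model, param_name, **kwargs):
    #         return param_name.endswith(".weight") and not any(
    #             m in param_name for m in self.modules_to_not_convert
    #         )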

    def preprocess_model(self, model: "PreTrainedModel", **kwargs):
        """
        Set model attributes and/or convert the model before weights loading. At this point the model should be
        initialized on the meta device, so you can freely manipulate the skeleton of the model in order to replace
        modules in-place. Make sure to override the abstract method `_process_model_before_weight_loading`.

        Args:
            model (`~transformers.PreTrainedModel`):
                The model to quantize
            kwargs (`dict`, *optional*):
                The keyword arguments that are passed along `_process_model_before_weight_loading`.
        """
        model.is_quantized = True
        model.quantization_method = self.quantization_config.quant_method
        if self.pre_quantized:
            self._convert_model_for_quantization(model)
        return self._process_model_before_weight_loading(model, **kwargs)

    def postprocess_model(self, model: "PreTrainedModel", **kwargs):
        """
        Post-process the model after weights loading. Make sure to override the abstract method
        `_process_model_after_weight_loading`.

        Args:
            model (`~transformers.PreTrainedModel`):
                The model to quantize
            kwargs (`dict`, *optional*):
                The keyword arguments that are passed along `_process_model_after_weight_loading`.
        """
        return self._process_model_after_weight_loading(model, **kwargs)

    def remove_quantization_config(self, model):
        """
        Remove the quantization config from the model.
        """
        if hasattr(model, "hf_quantizer"):
            del model.hf_quantizer
        if hasattr(model.config, "_pre_quantization_dtype"):
            del model.config._pre_quantization_dtype
        if hasattr(model.config, "quantization_config"):
            del model.config.quantization_config
        if hasattr(model, "quantization_method"):
            del model.quantization_method
        model.is_quantized = False

    @staticmethod
    def get_modules_to_not_convert(
        model: "PreTrainedModel",
        skip_modules: Optional[list[str]] = None,
        keep_in_fp32_modules: Optional[list[str]] = None,
        add_default_skips: bool = False,
    ):
        from ..integrations import get_keys_to_not_convert

        if skip_modules is None or add_default_skips:
            modules_to_not_convert = get_keys_to_not_convert(model)
        else:
            modules_to_not_convert = []

        if skip_modules is not None:
            modules_to_not_convert.extend(skip_modules)

        if keep_in_fp32_modules is not None:
            modules_to_not_convert.extend(keep_in_fp32_modules)

        return modules_to_not_convert

    @property
    def is_qat_trainable(self) -> bool:
        """Flag indicating whether the quantized model can carry out quantization-aware training."""
        return False

    @property
    def is_compileable(self) -> bool:
        """Flag indicating whether the quantized model can be compiled."""
        return False

    def get_state_dict_and_metadata(self, model, safe_serialization=False):
        """Get state dict and metadata. Useful when we need to modify the state dict a bit due to quantization."""
        return None, {}

    def update_state_dict_with_metadata(self, state_dict, metadata):
        """Update the state dict with metadata. The default behaviour returns the state_dict unchanged."""
        return state_dict

    @abstractmethod
    def _process_model_before_weight_loading(self, model, **kwargs): ...

    @abstractmethod
    def _process_model_after_weight_loading(self, model, **kwargs): ...

    @abstractmethod
    def is_serializable(self, safe_serialization=None): ...

    @property
    @abstractmethod
    def is_trainable(self): ...

    def _convert_model_for_quantization(self, model):
        from accelerate import init_empty_weights

        for name, module in model.named_modules():
            module_class_name = module.__class__.__name__
            if (
                module_class_name in MODULES_TO_PATCH_FOR_QUANTIZATION
                and self.quantization_config.quant_method
                in MODULES_TO_PATCH_FOR_QUANTIZATION[module_class_name]["quantization_methods"]
            ):
                with init_empty_weights():
                    parent_module, name = get_module_from_name(model, name)
                    parent_module._modules[name] = MODULES_TO_PATCH_FOR_QUANTIZATION[module_class_name][
                        "module_name"
                    ](model.config.get_text_config())
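

# Illustrative sketch (not part of the library): the minimal surface a concrete
# quantizer must implement on top of `HfQuantizer`. The class name
# `NoOpQuantizer` is hypothetical.
#
#     class NoOpQuantizer(HfQuantizer):
#         def _process_model_before_weight_loading(self, model, **kwargs):
#             return model
#
#         def _process_model_after_weight_loading(self, model, **kwargs):
#             return model
#
#         def is_serializable(self, safe_serialization=None):
#             return True
#
#         @property
#         def is_trainable(self):
#             return False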


class SequentialLlama4TextExperts(ModuleList):
    """
    A module that implements a compressed version of a list of expert modules.
    This is specifically designed to work with Llama4TextExperts in MoE layers.
    """

    def __init__(self, config):
        from transformers.models.llama4.modeling_llama4 import Llama4TextMLP

        super().__init__([Llama4TextMLP(config) for _ in range(config.num_local_experts)])
        self.num_experts = config.num_local_experts

    def forward(self, hidden_states: "torch.Tensor") -> "torch.Tensor":
        # Split the flat token stream into one contiguous slice per expert, then
        # run each expert MLP on its own slice.
        hidden_states = hidden_states.reshape(self.num_experts, -1, hidden_states.shape[-1])
        routed_out = torch.zeros_like(hidden_states)
        for expert_idx in range(self.num_experts):
            routed_out[expert_idx] = self[expert_idx](hidden_states[expert_idx])
        return routed_out


MODULES_TO_PATCH_FOR_QUANTIZATION = {
    "Llama4TextExperts": {
        "module_name": SequentialLlama4TextExperts,
        "quantization_methods": [QuantizationMethod.COMPRESSED_TENSORS, QuantizationMethod.BITS_AND_BYTES],
    }
}
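
# Illustrative usage sketch (hypothetical shapes): the forward pass expects a
# token stream that reshapes to (num_experts, tokens_per_expert, hidden_dim).
#
#     experts = SequentialLlama4TextExperts(config)  # assume config.num_local_experts == 4
#     x = torch.randn(4 * 3, 8)                      # 3 tokens per expert, hidden size 8
#     out = experts(x)                               # out.shape == (4, 3, 8)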