import importlib.metadata
from typing import TYPE_CHECKING

from packaging import version

from .base import HfQuantizer


if TYPE_CHECKING:
    from ..modeling_utils import PreTrainedModel

from ..utils import is_auto_gptq_available, is_gptqmodel_available, is_optimum_available, is_torch_available, logging
from ..utils.quantization_config import GPTQConfig, QuantizationConfigMixin


if is_torch_available():
    import torch

logger = logging.get_logger(__name__)


class GptqHfQuantizer(HfQuantizer):
    """
    Quantizer of the GPTQ method - for GPTQ, the quantizer supports calibration of the model through the
    `auto_gptq` or `gptqmodel` package. Quantization is done under the hood for users if they load a
    non-prequantized model.
    """

    requires_calibration = False
    required_packages = ["optimum", "auto_gptq", "gptqmodel"]
    optimum_quantizer = None

    def __init__(self, quantization_config: QuantizationConfigMixin, **kwargs):
        super().__init__(quantization_config, **kwargs)

        if not is_optimum_available():
            raise ImportError("Loading a GPTQ quantized model requires optimum (`pip install optimum`)")
        from optimum.gptq import GPTQQuantizer

        self.optimum_quantizer = GPTQQuantizer.from_dict(self.quantization_config.to_dict_optimum())

    def validate_environment(self, *args, **kwargs):
        if not is_optimum_available():
            raise ImportError("Loading a GPTQ quantized model requires optimum (`pip install optimum`)")
        if is_auto_gptq_available() and is_gptqmodel_available():
            logger.warning("Detected gptqmodel and auto-gptq, will use gptqmodel")

        # CPU execution is supported by gptqmodel, or by auto-gptq versions newer than 0.4.2;
        # otherwise a GPU is required.
        gptq_supports_cpu = (
            is_auto_gptq_available()
            and version.parse(importlib.metadata.version("auto-gptq")) > version.parse("0.4.2")
        ) or is_gptqmodel_available()
        if not gptq_supports_cpu and not torch.cuda.is_available():
            raise RuntimeError("GPU is required to quantize or run quantize model.")
        elif not (is_auto_gptq_available() or is_gptqmodel_available()):
            raise ImportError(
                "Loading a GPTQ quantized model requires the gptqmodel (`pip install gptqmodel`) or "
                "auto-gptq (`pip install auto-gptq`) library."
            )
        elif is_auto_gptq_available() and version.parse(importlib.metadata.version("auto_gptq")) < version.parse(
            "0.4.2"
        ):
            raise ImportError(
                "You need a version of auto_gptq >= 0.4.2 to use GPTQ: `pip install --upgrade auto-gptq`, "
                "or use gptqmodel instead: `pip install gptqmodel>=1.4.3`."
            )
        elif is_gptqmodel_available() and (
            version.parse(importlib.metadata.version("gptqmodel")) < version.parse("1.4.3")
            or version.parse(importlib.metadata.version("optimum")) < version.parse("1.23.99")
        ):
            raise ImportError("The gptqmodel version should be >= 1.4.3 and the optimum version should be >= 1.24.0")

    def update_dtype(self, dtype: "torch.dtype") -> "torch.dtype":
        if dtype is None:
            dtype = torch.float16
            logger.info("Loading the model in `torch.float16`. To overwrite it, set `dtype` manually.")
        elif dtype != torch.float16:
            logger.info("We suggest you to set `dtype=torch.float16` for better efficiency with GPTQ.")
        return dtype

    def update_device_map(self, device_map):
        if device_map is None:
            device_map = {"": torch.device("cpu")}
        # Only auto-gptq lacks CPU support: with it, move the model to the first CUDA device.
        if not is_gptqmodel_available() and device_map in ("cpu", {"": torch.device("cpu")}):
            device_map = {"": 0}
        return device_map

    def _process_model_before_weight_loading(self, model: "PreTrainedModel", **kwargs):
        if model.__class__.main_input_name != "input_ids":
            raise RuntimeError("We can only quantize pure text model.")

        if self.pre_quantized:
            # Compatibility shim: optimum versions after 1.23.99 (the gptqmodel refactor)
            # accept extra keyword arguments in `convert_model`.
            if version.parse(importlib.metadata.version("optimum")) <= version.parse("1.23.99"):
                model = self.optimum_quantizer.convert_model(model)
            else:
                model = self.optimum_quantizer.convert_model(model, **kwargs)

    def _process_model_after_weight_loading(self, model: "PreTrainedModel", **kwargs):
        if self.pre_quantized:
            model = self.optimum_quantizer.post_init_model(model)
        else:
            if self.quantization_config.tokenizer is None:
                self.quantization_config.tokenizer = model.name_or_path

            self.optimum_quantizer.quantize_model(model, self.quantization_config.tokenizer)
            model.config.quantization_config = GPTQConfig.from_dict(self.optimum_quantizer.to_dict())

    @property
    def is_trainable(self) -> bool:
        return True

    def is_serializable(self, safe_serialization=None):
        return True
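

# ---------------------------------------------------------------------------
# Illustrative usage sketch (editor's note, not part of the library source):
# this quantizer is selected automatically by `from_pretrained` when a
# `GPTQConfig` is passed, or when the checkpoint was saved with one. The
# model ids below are placeholders; any causal LM repository should work.
#
#   from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
#
#   # Quantize a full-precision model on the fly (exercises `quantize_model`):
#   tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
#   gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
#   model = AutoModelForCausalLM.from_pretrained(
#       "facebook/opt-125m", device_map="auto", quantization_config=gptq_config
#   )
#
#   # Load an already-quantized checkpoint (exercises the `pre_quantized` path):
#   model = AutoModelForCausalLM.from_pretrained(
#       "<some-gptq-quantized-repo>", device_map="auto"
#   )
# ---------------------------------------------------------------------------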