import importlib.metadata
import inspect
import warnings
from copy import deepcopy
from inspect import signature

from packaging import version

from ..utils import (
    get_available_devices,
    is_accelerate_available,
    is_bitsandbytes_available,
    is_bitsandbytes_multi_backend_available,
    is_torch_available,
    logging,
)


if is_bitsandbytes_available():
    import bitsandbytes as bnb
    import torch
    import torch.nn as nn

    from ..pytorch_utils import Conv1D

if is_accelerate_available():
    import accelerate
    from accelerate import init_empty_weights
    from accelerate.hooks import add_hook_to_module, remove_hook_from_module
    from accelerate.utils import find_tied_parameters

logger = logging.get_logger(__name__)


def set_module_quantized_tensor_to_device(module, tensor_name, device, value=None, quantized_stats=None):
    """
    A helper function to set a given tensor (parameter or buffer) of a module on a specific device (note that doing
    `param.to(device)` creates a new tensor not linked to the parameter, which is why we need this function).

    The function is adapted from the `set_module_tensor_to_device` function from accelerate, extended to support the
    class `Int8Params` from `bitsandbytes`.

    Args:
        module (`torch.nn.Module`):
            The module in which the tensor we want to move lives.
        tensor_name (`str`):
            The full name of the parameter/buffer.
        device (`int`, `str` or `torch.device`):
            The device on which to set the tensor.
        value (`torch.Tensor`, *optional*):
            The value of the tensor (useful when going from the meta device to any other device).
        quantized_stats (`dict[str, Any]`, *optional*):
            Dict with items for either 4-bit or 8-bit serialization
    """
    # Recurse through dotted names (e.g. "transformer.h.0.attn.weight") down to the owning submodule.
    if "." in tensor_name:
        splits = tensor_name.split(".")
        for split in splits[:-1]:
            new_module = getattr(module, split)
            if new_module is None:
                raise ValueError(f"{module} has no attribute {split}.")
            module = new_module
        tensor_name = splits[-1]

    if tensor_name not in module._parameters and tensor_name not in module._buffers:
        raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
    is_buffer = tensor_name in module._buffers
    old_value = getattr(module, tensor_name)

    if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None:
        raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")

    prequantized_loading = quantized_stats is not None
    if is_buffer or not is_bitsandbytes_available():
        is_8bit = False
        is_4bit = False
    else:
        is_4bit = hasattr(bnb.nn, "Params4bit") and isinstance(module._parameters[tensor_name], bnb.nn.Params4bit)
        is_8bit = isinstance(module._parameters[tensor_name], bnb.nn.Int8Params)

    if is_8bit or is_4bit:
        param = module._parameters[tensor_name]
        if param.device.type != "cuda":
            if value is None:
                new_value = old_value.to(device)
            elif isinstance(value, torch.Tensor):
                new_value = value.to("cpu")
            else:
                new_value = torch.tensor(value, device="cpu")

            # Support models using `Conv1D` in place of `nn.Linear` (e.g. gpt2) by transposing the weight matrix
            # prior to quantization. Since prequantized weights are already saved in the right "orientation",
            # we skip transposing when loading them.
            if issubclass(module.source_cls, Conv1D) and not prequantized_loading:
                new_value = new_value.T

            kwargs = old_value.__dict__

            if prequantized_loading != (new_value.dtype in (torch.int8, torch.uint8)):
                raise ValueError(
                    f"Value dtype `{new_value.dtype}` is not compatible with parameter quantization status."
                )

            if is_8bit:
                is_8bit_serializable = version.parse(importlib.metadata.version("bitsandbytes")) > version.parse(
                    "0.37.2"
                )
                if new_value.dtype in (torch.int8, torch.uint8) and not is_8bit_serializable:
                    raise ValueError(
                        "Detected int8 weights but the version of bitsandbytes is not compatible with int8 "
                        "serialization. Make sure to download the latest `bitsandbytes` version. "
                        "`pip install --upgrade bitsandbytes`."
                    )
                new_value = bnb.nn.Int8Params(new_value, requires_grad=False, **kwargs).to(device)
                if prequantized_loading:
                    setattr(new_value, "SCB", quantized_stats["SCB"].to(device))
            elif is_4bit:
                if prequantized_loading:
                    is_4bit_serializable = version.parse(importlib.metadata.version("bitsandbytes")) >= version.parse(
                        "0.41.3"
                    )
                    if new_value.dtype in (torch.int8, torch.uint8) and not is_4bit_serializable:
                        raise ValueError(
                            "Detected 4-bit weights but the version of bitsandbytes is not compatible with 4-bit "
                            "serialization. Make sure to download the latest `bitsandbytes` version. "
                            "`pip install --upgrade bitsandbytes`."
                        )
                    new_value = bnb.nn.Params4bit.from_prequantized(
                        data=new_value,
                        quantized_stats=quantized_stats,
                        requires_grad=False,
                        device=device,
                        **kwargs,
                    )
                else:
                    new_value = bnb.nn.Params4bit(new_value, requires_grad=False, **kwargs).to(device)
            module._parameters[tensor_name] = new_value
    else:
        if value is None:
            new_value = old_value.to(device)
        elif isinstance(value, torch.Tensor):
            new_value = value.to(device)
        else:
            new_value = torch.tensor(value, device=device)

        if is_buffer:
            module._buffers[tensor_name] = new_value
        else:
            new_value = nn.Parameter(new_value, requires_grad=old_value.requires_grad)
            module._parameters[tensor_name] = new_value

def _replace_with_bnb_linear(
    model,
    modules_to_not_convert=None,
    current_key_name=None,
    quantization_config=None,
    has_been_replaced=False,
):
    """
    Private method that wraps the recursion for module replacement.

    Returns the converted model and a boolean that indicates if the conversion has been successful or not.
    """
    for name, module in model.named_children():
        if current_key_name is None:
            current_key_name = []
        current_key_name.append(name)

        if (isinstance(module, nn.Linear) or isinstance(module, Conv1D)) and name not in modules_to_not_convert:
            # Check that the full dotted key is not in `modules_to_not_convert` either
            current_key_name_str = ".".join(current_key_name)
            if not any(
                (key + "." in current_key_name_str) or (key == current_key_name_str) for key in modules_to_not_convert
            ):
                with init_empty_weights():
                    if isinstance(module, Conv1D):
                        in_features, out_features = module.weight.shape
                    else:
                        in_features = module.in_features
                        out_features = module.out_features

                    if quantization_config.quantization_method() == "llm_int8":
                        model._modules[name] = bnb.nn.Linear8bitLt(
                            in_features,
                            out_features,
                            module.bias is not None,
                            has_fp16_weights=quantization_config.llm_int8_has_fp16_weights,
                            threshold=quantization_config.llm_int8_threshold,
                        )
                        has_been_replaced = True
                    else:
                        if (
                            quantization_config.llm_int8_skip_modules is not None
                            and name in quantization_config.llm_int8_skip_modules
                        ):
                            pass
                        else:
                            extra_kwargs = (
                                {"quant_storage": quantization_config.bnb_4bit_quant_storage}
                                if "quant_storage" in list(signature(bnb.nn.Linear4bit).parameters)
                                else {}
                            )
                            model._modules[name] = bnb.nn.Linear4bit(
                                in_features,
                                out_features,
                                module.bias is not None,
                                quantization_config.bnb_4bit_compute_dtype,
                                compress_statistics=quantization_config.bnb_4bit_use_double_quant,
                                quant_type=quantization_config.bnb_4bit_quant_type,
                                **extra_kwargs,
                            )
                            has_been_replaced = True
                    # Store the module class in case we need to transpose the weight later
                    model._modules[name].source_cls = type(module)
                    # Force requires_grad to False to avoid unexpected errors
                    model._modules[name].requires_grad_(False)
        if len(list(module.children())) > 0:
            _, has_been_replaced = _replace_with_bnb_linear(
                module,
                modules_to_not_convert,
                current_key_name,
                quantization_config,
                has_been_replaced=has_been_replaced,
            )
        # Remove the last key for the next sibling in the recursion
        current_key_name.pop(-1)
    return model, has_been_replaced


def replace_with_bnb_linear(model, modules_to_not_convert=None, current_key_name=None, quantization_config=None):
    """
    A helper function to replace all `torch.nn.Linear` modules by `bnb.nn.Linear8bitLt` modules from the
    `bitsandbytes` library. This will enable running your models using mixed int8 precision as described by the paper
    `LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale`. Make sure `bitsandbytes` compiled with the
    correct CUDA version of your hardware is installed before running this function.
    `pip install -i https://test.pypi.org/simple/ bitsandbytes`

    The function will be run recursively and replace all `torch.nn.Linear` modules except for the `lm_head` that
    should be kept as a `torch.nn.Linear` module. The replacement is done under the `init_empty_weights` context
    manager so no CPU/GPU memory is required to run this function. Int8 mixed-precision matrix decomposition works by
    separating a matrix multiplication into two streams: (1) a systematic feature outlier stream matrix multiplied in
    fp16 (0.01%), (2) a regular stream of int8 matrix multiplication (99.9%). With this method, int8 inference with no
    predictive degradation is possible for very large models (>=176B parameters).

    Parameters:
        model (`torch.nn.Module`):
            Input model or `torch.nn.Module` as the function is run recursively.
        modules_to_not_convert (`list[str]`, *optional*, defaults to `["lm_head"]`):
            Names of the modules to not convert in `Linear8bitLt`. In practice we keep the `lm_head` in full precision
            for numerical stability reasons.
        current_key_name (`list[str]`, *optional*):
            An array to track the current key of the recursion. This is used to check whether the current key (part
            of it) is not in the list of modules to not convert (for instance modules that are offloaded to `cpu` or
            `disk`).
        quantization_config (`transformers.utils.quantization_config.BitsAndBytesConfig`):
            To configure and manage settings related to quantization, a technique used to compress neural network
            models by reducing the precision of the weights and activations, thus making models more efficient in
            terms of both storage and computation.
    """
    modules_to_not_convert = ["lm_head"] if modules_to_not_convert is None else modules_to_not_convert
    model, has_been_replaced = _replace_with_bnb_linear(
        model, modules_to_not_convert, current_key_name, quantization_config
    )

    if not has_been_replaced:
        logger.warning(
            "You are loading your model in 8bit or 4bit but no linear modules were found in your model."
            " Please double check your model architecture, or submit an issue on github if you think this is a bug."
        )

    return model

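# ---------------------------------------------------------------------------------------------------------------
# Illustrative usage sketch (editor addition, not part of the original module API): a minimal example of how the
# replacement pass above is typically driven. The `GPT2LMHeadModel` skeleton, the config values and the helper
# name `_example_replace_with_bnb_linear` are arbitrary examples; in user code the supported entry point is
# `from_pretrained(..., quantization_config=...)`, which performs this low-level swap internally. Requires
# `bitsandbytes` and `accelerate` to be installed; the helper is never called by the library itself.
def _example_replace_with_bnb_linear():
    from transformers import BitsAndBytesConfig, GPT2Config, GPT2LMHeadModel

    bnb_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)

    # Build an empty (meta-device) model so the swap costs no CPU/GPU memory; GPT-2 uses `Conv1D` layers,
    # which are converted just like `nn.Linear`.
    with init_empty_weights():
        model = GPT2LMHeadModel(GPT2Config())

    model = replace_with_bnb_linear(
        model,
        modules_to_not_convert=["lm_head"],  # keep the output head in full precision
        quantization_config=bnb_config,
    )
    # The linear/Conv1D submodules are now `bnb.nn.Linear8bitLt`; their (still empty) weights are filled in
    # later, e.g. with `set_module_quantized_tensor_to_device`.
    return model
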
def replace_8bit_linear(*args, **kwargs):
    warnings.warn(
        "`replace_8bit_linear` will be deprecated in a future version, please use `replace_with_bnb_linear` instead",
        FutureWarning,
    )
    return replace_with_bnb_linear(*args, **kwargs)


def set_module_8bit_tensor_to_device(*args, **kwargs):
    warnings.warn(
        "`set_module_8bit_tensor_to_device` will be deprecated in a future version, please use"
        " `set_module_quantized_tensor_to_device` instead",
        FutureWarning,
    )
    return set_module_quantized_tensor_to_device(*args, **kwargs)


def get_keys_to_not_convert(model):
    r"""
    A utility function to get the keys of the modules to keep in full precision, if any. For example, for CausalLM
    modules we may want to keep the `lm_head` in full precision for numerical stability reasons. For other
    architectures, we want to keep the tied weights of the model. The function will return a list of the keys of the
    modules to not convert in int8.

    Parameters:
        model (`torch.nn.Module`):
            Input model
    """
    # Create a copy of the model and tie the weights, then check if it contains tied weights
    tied_model = deepcopy(model)  # this has 0 cost since it is done inside the `init_empty_weights` context manager
    tied_model.tie_weights()

    tied_params = find_tied_parameters(tied_model)
    # For compatibility with Accelerate < 0.18
    if isinstance(tied_params, dict):
        tied_keys = sum(list(tied_params.values()), []) + list(tied_params.keys())
    else:
        tied_keys = sum(tied_params, [])
    has_tied_params = len(tied_keys) > 0

    # If there are no tied weights, we want to keep the lm_head (output embedding) in full precision
    if not has_tied_params:
        output_emb = model.get_output_embeddings()
        if output_emb is not None:
            list_last_module = [name for name, module in model.named_modules() if id(module) == id(output_emb)]
            return list_last_module

    # Otherwise, keep the last module in full precision, together with the tied weights
    list_modules = list(model.named_parameters())
    list_last_module = [list_modules[-1][0]]
    intersection = set(list_last_module) - set(tied_keys)
    list_untouched = list(set(tied_keys)) + list(intersection)

    # Strip ".weight" and ".bias" from the parameter keys to obtain module names
    names_to_remove = [".weight", ".bias"]
    filtered_module_names = []
    for name in list_untouched:
        for name_to_remove in names_to_remove:
            if name_to_remove in name:
                name = name.replace(name_to_remove, "")
        filtered_module_names.append(name)

    return filtered_module_names


def dequantize_bnb_weight(weight: "torch.nn.Parameter", dtype: "torch.dtype", state=None):
    """
    Helper function to dequantize 4bit or 8bit bnb weights.

    If the weight is not a bnb quantized weight, it will be returned as is.
    """
    if not isinstance(weight, torch.nn.Parameter):
        raise TypeError(f"Input weight should be of type nn.Parameter, got {type(weight)} instead")

    cls_name = weight.__class__.__name__
    if cls_name not in ("Params4bit", "Int8Params"):
        return weight

    if cls_name == "Params4bit":
        output_tensor = bnb.functional.dequantize_4bit(weight.data, weight.quant_state)
        logger.warning_once(
            f"The model is going to be dequantized in {output_tensor.dtype} - if you want to upcast it to another "
            "dtype, make sure to pass the desired dtype when quantizing the model through `bnb_4bit_quant_type` "
            "argument of `BitsAndBytesConfig`"
        )
        return output_tensor.to(dtype)

    if state.SCB is None:
        state.SCB = weight.SCB

    if hasattr(bnb.functional, "int8_vectorwise_dequant"):
        # Use the bitsandbytes API if available (requires v0.45.0+)
        dequantized = bnb.functional.int8_vectorwise_dequant(weight.data, state.SCB)
    else:
        # Multiply by (scale/127) to dequantize
        dequantized = weight.data * state.SCB.view(-1, 1) * 7.874015718698502e-03

    return dequantized.to(dtype)

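# ---------------------------------------------------------------------------------------------------------------
# Illustrative usage sketch (editor addition, not part of the original module API) for `get_keys_to_not_convert`
# defined above: on a model with tied input/output embeddings the tied modules (plus the last module) are
# reported so they can be kept in full precision. The `GPT2LMHeadModel` skeleton and the helper name
# `_example_get_keys_to_not_convert` are arbitrary examples; the helper is never called by the library itself.
def _example_get_keys_to_not_convert():
    from transformers import GPT2Config, GPT2LMHeadModel

    with init_empty_weights():
        model = GPT2LMHeadModel(GPT2Config())

    keep_in_full_precision = get_keys_to_not_convert(model)
    # Typically contains "lm_head" and/or its tied "transformer.wte" embedding; the exact list depends on the
    # architecture. It is then forwarded as `modules_to_not_convert` to `replace_with_bnb_linear`.
    return keep_in_full_precision
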
rXNrc3:K|]}|dzvxs|k(ywrPrrQs rKrUz*_dequantize_and_replace..rVrWrk)rk_hf_hookTrr^r)"rir'r(rjrqr`rar)rcrdr!rerrr%rbrgrhrr;rrkr&rrrrr,rnrxrpry_dequantize_and_replacer{)r|r3r}r~rr_ quant_method target_clsrr<rkrr@rrrrrTs @rKrrs'::> # #N 3    & &~ 69 MM( #9$;?K K.07HH 2? CCrM)NN)NNNF)NNN)N)reztorch.nn.Parameterr3z torch.dtype)F).importlib.metadatar7rrcopyrr packagingrutilsrrr r r r rr'r%torch.nnr( pytorch_utilsr rraccelerate.hooksrraccelerate.utilsr get_loggerrrrLrzrrrrrrrrrrrrrMrKrs&-L5   H %p8j  K$\*\4B/!f!B(  E$T (., DrM