"""
Integration with Deepspeed
"""

import copy
import importlib.metadata as importlib_metadata
import importlib.util
import weakref
from functools import partialmethod

from ..dependency_versions_check import dep_version_check
from ..utils import is_accelerate_available, is_torch_available, logging


if is_torch_available():
    import torch
    from torch import nn


logger = logging.get_logger(__name__)


def is_deepspeed_available():
    package_exists = importlib.util.find_spec("deepspeed") is not None

    # Check that an actual "deepspeed" distribution is installed (and not just a stray
    # `deepspeed` directory on the path) by trying to read its metadata.
    if package_exists:
        try:
            _ = importlib_metadata.metadata("deepspeed")
            return True
        except importlib_metadata.PackageNotFoundError:
            return False
    return False


if is_accelerate_available() and is_deepspeed_available():
    from accelerate.utils.deepspeed import HfDeepSpeedConfig as DeepSpeedConfig
else:
    # Inherit from a dummy `object` when accelerate is not available so that importing this file
    # still succeeds; the glue code below never uses the dummy base because it checks that
    # accelerate/deepspeed are available first.
    from builtins import object as DeepSpeedConfig


class HfDeepSpeedConfig(DeepSpeedConfig):
    """
    This object contains a DeepSpeed configuration dictionary and can be quickly queried for things like zero stage.

    A `weakref` of this object is stored in the module's globals to be able to access the config from areas where
    things like the Trainer object is not available (e.g. `from_pretrained` and `_get_resized_embeddings`). Therefore
    it's important that this object remains alive while the program is still running.

    [`Trainer`] uses the `HfTrainerDeepSpeedConfig` subclass instead. That subclass has logic to sync the
    configuration with values of [`TrainingArguments`] by replacing special placeholder values: `"auto"`. Without this
    special logic the DeepSpeed configuration is not modified in any way.

    Args:
        config_file_or_dict (`Union[str, Dict]`): path to DeepSpeed config file or dict.
    """

    def __init__(self, config_file_or_dict):
        # set global weakref object
        set_hf_deepspeed_config(self)
        dep_version_check("accelerate")
        dep_version_check("deepspeed")
        super().__init__(config_file_or_dict)
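
# --- illustrative sketch, not part of the original module --------------------------------------
# When DeepSpeed ZeRO-3 is used outside of `Trainer`, an `HfDeepSpeedConfig` has to be created
# *before* `from_pretrained` and kept alive, so the global weakref set above lets weight loading
# partition parameters directly across the participating GPUs. The model name and config values
# below are placeholders, not prescribed by this module.
def _example_zero3_config_before_from_pretrained():
    from transformers import AutoModel

    ds_config = {
        "train_micro_batch_size_per_gpu": 1,
        "zero_optimization": {"stage": 3},
        "bf16": {"enabled": True},
    }
    # must stay alive for as long as the model is in use
    dschf = HfDeepSpeedConfig(ds_config)  # noqa: F841
    model = AutoModel.from_pretrained("gpt2")
    return model
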
class HfTrainerDeepSpeedConfig(HfDeepSpeedConfig):
    """
    The `HfTrainerDeepSpeedConfig` object is meant to be created during `TrainingArguments` object creation and has
    the same lifespan as the latter.
    """

    def __init__(self, config_file_or_dict):
        super().__init__(config_file_or_dict)
        self._dtype = None
        self.mismatches = []

    def dtype(self):
        if self._dtype is None:
            raise ValueError("trainer_config_process() wasn't called yet to tell dtype")
        return self._dtype

    def is_auto(self, ds_key_long):
        val = self.get_value(ds_key_long)
        if val is None:
            return False
        return val == "auto"

    def fill_match(self, ds_key_long, hf_val, hf_key=None, must_match=True):
        """
        A utility method that massages the config file and can optionally verify that the values match.

        1. Replace "auto" values with `TrainingArguments` value.

        2. If it wasn't "auto" and `must_match` is true, then check that DS config matches Trainer config values and
        if mismatched add the entry to `self.mismatches` - will assert during `trainer_config_finalize` for one or
        more mismatches.
        """
        config, ds_key = self.find_config_node(ds_key_long)
        if config is None:
            return

        if config.get(ds_key) == "auto":
            config[ds_key] = hf_val
            return

        if not must_match:
            return

        ds_val = config.get(ds_key)
        if ds_val is not None and ds_val != hf_val:
            self.mismatches.append(f"- ds {ds_key_long}={ds_val} vs hf {hf_key}={hf_val}")

    fill_only = partialmethod(fill_match, must_match=False)

    def trainer_config_process(self, args, auto_find_batch_size=False):
        """
        Adjust the config with `TrainingArguments` values. This stage is run during `TrainingArguments` object
        creation.
        """
        # DeepSpeed does: train_batch_size = world_size * train_micro_batch_size_per_gpu * gradient_accumulation_steps
        train_batch_size = args.world_size * args.per_device_train_batch_size * args.gradient_accumulation_steps
        self.fill_match(
            "train_micro_batch_size_per_gpu",
            args.per_device_train_batch_size,
            "per_device_train_batch_size",
            not auto_find_batch_size,
        )
        self.fill_match(
            "gradient_accumulation_steps", args.gradient_accumulation_steps, "gradient_accumulation_steps"
        )
        self.fill_match(
            "train_batch_size", train_batch_size, "train_batch_size (calculated)", not auto_find_batch_size
        )
        self.fill_match("gradient_clipping", args.max_grad_norm, "max_grad_norm")

        self.fill_match("optimizer.params.lr", args.learning_rate, "learning_rate")
        self.fill_match("optimizer.params.betas", [args.adam_beta1, args.adam_beta2], "adam_beta1+adam_beta2")
        self.fill_match("optimizer.params.eps", args.adam_epsilon, "adam_epsilon")
        self.fill_match("optimizer.params.weight_decay", args.weight_decay, "weight_decay")

        self.fill_only("scheduler.params.warmup_min_lr", 0)  # not a trainer arg
        self.fill_match("scheduler.params.warmup_max_lr", args.learning_rate, "learning_rate")
        # total_num_steps - will get set in trainer_config_finalize

        # fp16
        if args.fp16 or args.fp16_full_eval:
            fp16_backend = "apex" if args.fp16_backend == "apex" else "amp"
        else:
            fp16_backend = None

        if args.save_on_each_node:
            # deepspeed uses shared storage by default - override the checkpoint engine setting when each node saves locally
            self.config["checkpoint"] = self.config.get("checkpoint", {})
            self.config["checkpoint"]["use_node_local_storage"] = args.save_on_each_node

        # amp: similar to the pytorch native amp - it has a bunch of optional params but we won't set any here unless
        # the user did the work
        self.fill_match(
            "fp16.enabled",
            ((args.fp16 or args.fp16_full_eval) and fp16_backend == "amp"),
            "fp16|fp16_full_eval+fp16_backend(amp)",
        )

        # apex: delegates amp work to apex (which needs to be available), but it cannot be used with any ZeRO features
        self.fill_match("amp.enabled", fp16_backend == "apex", "fp16+fp16_backend(apex)")
        self.fill_match("amp.opt_level", args.fp16_opt_level, "fp16_opt_level")

        self.fill_match("bf16.enabled", (args.bf16 or args.bf16_full_eval), "bf16|bf16_full_eval")

        # deepspeed's default mode is fp16 unless there is a config that says differently
        if self.is_true("bf16.enabled"):
            self._dtype = torch.bfloat16
        elif self.is_false("fp16.enabled"):
            self._dtype = torch.float32
        else:
            self._dtype = torch.float16

    def trainer_config_finalize(self, args, model, num_training_steps):
        """
        This stage is run after we have the model and know num_training_steps.

        Now we can complete the configuration process.
        """
        # deal with config keys that use `auto` value and rely on model's hidden_size
        hidden_size_based_keys = [
            "zero_optimization.reduce_bucket_size",
            "zero_optimization.stage3_prefetch_bucket_size",
            "zero_optimization.stage3_param_persistence_threshold",
        ]
        hidden_size_auto_keys = [x for x in hidden_size_based_keys if self.is_auto(x)]

        if len(hidden_size_auto_keys) > 0:
            hidden_size = None
            if hasattr(model, "config"):
                if hasattr(model.config, "hidden_size"):
                    hidden_size = model.config.hidden_size
                elif hasattr(model.config, "hidden_sizes"):
                    # if there are many hidden sizes pick the largest one
                    hidden_size = max(model.config.hidden_sizes)
                elif hasattr(model.config, "text_config") and hasattr(model.config.text_config, "hidden_size"):
                    hidden_size = model.config.text_config.hidden_size
                elif hasattr(model.config, "text_config") and hasattr(model.config.text_config, "hidden_sizes"):
                    hidden_size = max(model.config.text_config.hidden_sizes)

            if hidden_size is None:
                raise ValueError(
                    "The model's config file has neither `hidden_size` nor `hidden_sizes` entry, therefore it's not"
                    " possible to automatically fill out the following `auto` entries in the DeepSpeed config file:"
                    f" {hidden_size_auto_keys}. You can fix that by replacing `auto` values for these keys with an"
                    " integer value of your choice."
                )

            self.fill_only("zero_optimization.reduce_bucket_size", hidden_size * hidden_size)
            if self.is_zero3():
                # automatically assign the optimal config values based on model config
                self.fill_only(
                    "zero_optimization.stage3_prefetch_bucket_size", int(0.9 * hidden_size * hidden_size)
                )
                self.fill_only("zero_optimization.stage3_param_persistence_threshold", 10 * hidden_size)

        # scheduler
        self.fill_match("scheduler.params.total_num_steps", num_training_steps, "num_training_steps (calculated)")
        self.fill_match(
            "scheduler.params.warmup_num_steps", args.get_warmup_steps(num_training_steps), "warmup_steps"
        )

        if len(self.mismatches) > 0:
            mismatches = "\n".join(self.mismatches)
            raise ValueError(
                "Please correct the following DeepSpeed config values that mismatch TrainingArguments"
                f" values:\n{mismatches}\nThe easiest method is to set these DeepSpeed config values to 'auto'."
            )


# keep the config object global to be able to access it anywhere during TrainingArguments life-cycle
_hf_deepspeed_config_weak_ref = None


def set_hf_deepspeed_config(hf_deepspeed_config_obj):
    # this is a special weakref global object to allow us to get to Deepspeed config from APIs
    # that don't have an easy way to get to the Deepspeed config outside of the Trainer domain.
    global _hf_deepspeed_config_weak_ref
    # will go away automatically when HfDeepSpeedConfig is destroyed (when TrainingArguments is destroyed)
    _hf_deepspeed_config_weak_ref = weakref.ref(hf_deepspeed_config_obj)


def unset_hf_deepspeed_config():
    # useful for unit tests to ensure the global state doesn't leak - call from `tearDown` method
    global _hf_deepspeed_config_weak_ref
    _hf_deepspeed_config_weak_ref = None


def is_deepspeed_zero3_enabled():
    if _hf_deepspeed_config_weak_ref is not None and _hf_deepspeed_config_weak_ref() is not None:
        return _hf_deepspeed_config_weak_ref().is_zero3()
    else:
        return False


def deepspeed_config():
    if _hf_deepspeed_config_weak_ref is not None and _hf_deepspeed_config_weak_ref() is not None:
        return _hf_deepspeed_config_weak_ref().config
    else:
        return None
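
# --- illustrative sketch, not part of the original module --------------------------------------
# `trainer_config_process` / `trainer_config_finalize` only rewrite entries whose value is the
# string "auto"; everything else is validated against `TrainingArguments` via `fill_match`. A
# typical ZeRO-2 config that defers to the Trainer therefore looks roughly like the dict below
# (key names follow the DeepSpeed config schema; the dict itself is only an example).
def _example_auto_ds_config():
    return {
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "optimizer": {
            "type": "AdamW",
            "params": {"lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto"},
        },
        "scheduler": {
            "type": "WarmupLR",
            "params": {"warmup_min_lr": "auto", "warmup_max_lr": "auto", "warmup_num_steps": "auto"},
        },
        "zero_optimization": {"stage": 2},
        "bf16": {"enabled": "auto"},
    }
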
def _load_state_dict_into_zero3_model(model_to_load, state_dict, start_prefix=""):
    """
    Loads state dict into a model specifically for Zero3, since DeepSpeed does not support the `transformers` tensor
    parallelism API.

    Nearly identical code to PyTorch's `_load_from_state_dict`
    """
    # copy state_dict so the recursive `load` below can consume it without affecting the caller
    metadata = getattr(state_dict, "_metadata", None)
    state_dict = state_dict.copy()
    if metadata is not None:
        state_dict._metadata = metadata

    error_msgs = []

    # PyTorch's `_load_from_state_dict` does not copy parameters in a module's descendants,
    # so the function has to be applied recursively.
    def load(module: nn.Module, state_dict, prefix="", assign_to_params_buffers=False):
        local_metadata = {} if metadata is None else metadata.get(prefix[:-1], {})
        local_metadata["assign_to_params_buffers"] = assign_to_params_buffers
        args = (state_dict, prefix, local_metadata, True, [], [], error_msgs)
        # Parameters of module and children will start with prefix. We can exit early if there are none in this
        # state_dict
        if is_deepspeed_zero3_enabled() and len([key for key in state_dict if key.startswith(prefix)]) > 0:
            import deepspeed

            # In sharded models, each shard has only part of the full state_dict, so only gather
            # parameters that are in the current state_dict.
            named_parameters = dict(module.named_parameters(prefix=prefix[:-1], recurse=False))
            params_to_gather = [named_parameters[k] for k in state_dict.keys() if k in named_parameters]
            if len(params_to_gather) > 0:
                # because zero3 puts placeholders in model params, this context manager gathers (unpartitions) the
                # params of the current layer, then loads from the state dict and then re-partitions them again
                with deepspeed.zero.GatheredParameters(params_to_gather, modifier_rank=0):
                    if torch.distributed.get_rank() == 0:
                        module._load_from_state_dict(*args)
        else:
            module._load_from_state_dict(*args)

        for name, child in module._modules.items():
            if child is not None:
                load(child, state_dict, prefix + name + ".", assign_to_params_buffers)

    load(model_to_load, state_dict, prefix=start_prefix)

    return error_msgs


def deepspeed_optim_sched(trainer, hf_deepspeed_config, args, num_training_steps, model_parameters):
    """
    A convenience wrapper that deals with optimizer and lr scheduler configuration.
    """
    from accelerate.utils import DummyOptim, DummyScheduler

    config = hf_deepspeed_config.config

    # Mixing and matching DS schedulers and optimizers is supported unless Offload is enabled, in which case only a
    # DS optimizer (or an HF optimizer that has both CPU and GPU implementations, except LAMB) will work.
    optimizer = None
    if "optimizer" in config:
        if args.adafactor:
            raise ValueError(
                "--adafactor was passed, but also found `optimizer` configured in the DeepSpeed config. "
                "Only one optimizer can be configured."
            )
        optimizer = DummyOptim(params=model_parameters)
    else:
        if hf_deepspeed_config.is_offload():
            logger.info(
                "Detected ZeRO Offload and non-DeepSpeed optimizers: This combination should work as long as the"
                " custom optimizer has both CPU and GPU implementation (except LAMB)"
            )

        # ds supports Adam, OneBitAdam, and Lamb optimizers and can import other optimizers from torch.
        # But trainer uses AdamW by default.
        optimizer = trainer.create_optimizer()
        # To use other optimizers requires voiding warranty with: `zero_allow_untested_optimizer`
        config["zero_allow_untested_optimizer"] = True

    lr_scheduler = None
    if "scheduler" in config:
        lr_scheduler = DummyScheduler(optimizer)
    else:
        if isinstance(optimizer, DummyOptim):

            def _lr_scheduler_callable(optimizer):
                # create a shallow copy first, so later modifications do not affect the original trainer
                trainer_copy = copy.copy(trainer)
                # at the time _lr_scheduler_callable is called, trainer.lr_scheduler has already been set -
                # update it to None so that we can re-create a new scheduler
                trainer_copy.lr_scheduler = None
                lr_scheduler = trainer_copy.create_scheduler(
                    num_training_steps=num_training_steps, optimizer=optimizer
                )
                return lr_scheduler

            lr_scheduler = DummyScheduler(optimizer, lr_scheduler_callable=_lr_scheduler_callable)
        else:
            lr_scheduler = trainer.create_scheduler(num_training_steps=num_training_steps, optimizer=optimizer)

    return optimizer, lr_scheduler


def deepspeed_init(trainer, num_training_steps, inference=False):
    """
    Init DeepSpeed, after updating the DeepSpeed configuration with any relevant Trainer's args.

    If `resume_from_checkpoint` was passed then an attempt to resume from a previously saved checkpoint will be made.

    Args:
        trainer: Trainer object
        num_training_steps: per single gpu
        resume_from_checkpoint: path to a checkpoint if to resume from after normal DeepSpeedEngine load
        inference: launch in inference mode (no optimizer and no lr scheduler)
        auto_find_batch_size: whether to ignore the `train_micro_batch_size_per_gpu` argument as it's being
            set automatically by the auto batch size finder

    Returns: optimizer, lr_scheduler

    We may use `deepspeed_init` more than once during the life of Trainer, when we do - it's a temp hack based on:
    https://github.com/deepspeedai/DeepSpeed/issues/1394#issuecomment-937405374 until Deepspeed fixes a bug where it
    can't resume from a checkpoint after it did some stepping https://github.com/deepspeedai/DeepSpeed/issues/1612
    """
    from deepspeed.utils import logger as ds_logger

    model = trainer.model
    args = trainer.args

    hf_deepspeed_config = trainer.accelerator.state.deepspeed_plugin.hf_ds_config

    # resume config update - some bits like `model` and `num_training_steps` only become available during train
    hf_deepspeed_config.trainer_config_finalize(args, model, num_training_steps)

    # set the Deepspeed log level consistent with the Trainer
    ds_logger.setLevel(args.get_process_log_level())

    if inference:
        # only Z3 makes sense for the inference
        if not hf_deepspeed_config.is_zero3():
            raise ValueError("ZeRO inference only makes sense with ZeRO Stage 3 - please adjust your config")

        # in case the training config is re-used for inference
        hf_deepspeed_config.del_config_sub_tree("optimizer")
        hf_deepspeed_config.del_config_sub_tree("lr_scheduler")
        optimizer, lr_scheduler = None, None
        model_parameters = None
    else:
        trainer.optimizer = None  # important for when deepspeed_init is used as re-init
        tp_size = hf_deepspeed_config.config.get("tensor_parallel", {}).get("autotp_size", 0)
        if tp_size > 1:
            import deepspeed

            model = deepspeed.tp_model_init(model=model, tp_size=tp_size, dtype=hf_deepspeed_config.dtype())
        model_parameters = list(filter(lambda p: p.requires_grad, model.parameters()))
        optimizer, lr_scheduler = deepspeed_optim_sched(
            trainer, hf_deepspeed_config, args, num_training_steps, model_parameters
        )

    return optimizer, lr_scheduler
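
# --- illustrative sketch, not part of the original module --------------------------------------
# How a `Trainer`-like caller consumes `deepspeed_init`: in training mode it gets back (possibly
# dummy) optimizer/scheduler objects to hand to the accelerator's prepare step; in inference mode
# both are None and only the ZeRO-3 model wrapping matters. The `trainer` argument is assumed to
# expose the attributes described in the docstring above (`model`, `args`, and an `accelerator`
# configured with a DeepSpeed plugin).
def _example_deepspeed_init_usage(trainer, num_training_steps, inference=False):
    optimizer, lr_scheduler = deepspeed_init(trainer, num_training_steps, inference=inference)
    if inference:
        assert optimizer is None and lr_scheduler is None
    return optimizer, lr_scheduler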