"""
PyTorch-independent utilities for the Trainer class.
"""

import copy
import functools
import gc
import inspect
import os
import random
import re
import threading
import time
from typing import Any, Callable, NamedTuple, Optional, Union

import numpy as np

from .utils import (
    ExplicitEnum,
    is_psutil_available,
    is_tf_available,
    is_torch_available,
    is_torch_cuda_available,
    is_torch_hpu_available,
    is_torch_mlu_available,
    is_torch_mps_available,
    is_torch_musa_available,
    is_torch_npu_available,
    is_torch_xla_available,
    is_torch_xpu_available,
    requires_backends,
)


if is_torch_available():
    import torch


def seed_worker(worker_id: int, num_workers: int, rank: int):
    """
    Helper function to set worker seed during Dataloader initialization.
    """
    init_seed = torch.initial_seed() % 2**32
    worker_seed = num_workers * rank + init_seed
    set_seed(worker_seed)


def enable_full_determinism(seed: int, warn_only: bool = False):
    """
    Helper function for reproducible behavior during distributed training. See
    - https://pytorch.org/docs/stable/notes/randomness.html for pytorch
    - https://www.tensorflow.org/api_docs/python/tf/config/experimental/enable_op_determinism for tensorflow
    """
    # set seed first
    set_seed(seed)

    if is_torch_available():
        # Enable PyTorch deterministic mode. This potentially requires either the environment
        # variable `CUDA_LAUNCH_BLOCKING` or `CUBLAS_WORKSPACE_CONFIG` to be set,
        # depending on the CUDA version, so we set them both here.
        os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"
        # The environment variables required to enable deterministic mode on Ascend NPUs.
        os.environ["ASCEND_LAUNCH_BLOCKING"] = "1"
        os.environ["HCCL_DETERMINISTIC"] = "1"
        os.environ["FLASH_ATTENTION_DETERMINISTIC"] = "1"
        torch.use_deterministic_algorithms(True, warn_only=warn_only)

        # Enable CUDNN deterministic mode
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

    if is_tf_available():
        import tensorflow as tf

        tf.config.experimental.enable_op_determinism()


def set_seed(seed: int, deterministic: bool = False):
    """
    Helper function for reproducible behavior to set the seed in `random`, `numpy`, `torch` and/or `tf` (if installed).

    Args:
        seed (`int`):
            The seed to set.
        deterministic (`bool`, *optional*, defaults to `False`):
            Whether to use deterministic algorithms where available. Can slow down training.
    """
    random.seed(seed)
    np.random.seed(seed)
    if is_torch_available():
        torch.manual_seed(seed)
        # safe to call even when CUDA is not available
        torch.cuda.manual_seed_all(seed)
        if deterministic:
            torch.use_deterministic_algorithms(True)
    if is_torch_mlu_available():
        torch.mlu.manual_seed_all(seed)
    if is_torch_musa_available():
        torch.musa.manual_seed_all(seed)
    if is_torch_npu_available():
        torch.npu.manual_seed_all(seed)
    if is_torch_hpu_available():
        torch.hpu.manual_seed_all(seed)
    if is_torch_xpu_available():
        torch.xpu.manual_seed_all(seed)
    if is_tf_available():
        import tensorflow as tf

        tf.random.set_seed(seed)
        if deterministic:
            tf.config.experimental.enable_op_determinism()
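
# Example (illustrative; not part of the original module): wiring `seed_worker` into a
# DataLoader so each worker derives a reproducible seed. The dataset and the
# `num_workers`/`rank` values below are placeholders.
#
#     from functools import partial
#     from torch.utils.data import DataLoader
#
#     set_seed(42)
#     loader = DataLoader(
#         dataset,
#         num_workers=4,
#         worker_init_fn=partial(seed_worker, num_workers=4, rank=0),
#     )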


def neftune_post_forward_hook(module, input, output):
    """
    Implements the NEFTune forward pass for the model using forward hooks. Note this works only for torch.nn.Embedding
    layers. This method is slightly adapted from the original source code that can be found here:
    https://github.com/neelsjain/NEFTune

    Simply add it to your model as follows:
    ```python
    model = ...
    model.embed_tokens.neftune_noise_alpha = 0.1
    model.embed_tokens.register_forward_hook(neftune_post_forward_hook)
    ```

    Args:
        module (`torch.nn.Module`):
            The embedding module where the hook is attached. Note that you need to set `module.neftune_noise_alpha` to
            the desired noise alpha value.
        input (`torch.Tensor`):
            The input tensor to the model.
        output (`torch.Tensor`):
            The output tensor of the model (i.e. the embeddings).
    """
    if module.training:
        dims = torch.tensor(output.size(1) * output.size(2))
        mag_norm = module.neftune_noise_alpha / torch.sqrt(dims)
        output = output + torch.zeros_like(output).uniform_(-mag_norm, mag_norm)
    return output


class EvalPrediction:
    """
    Evaluation output (always contains labels), to be used to compute metrics.

    Parameters:
        predictions (`np.ndarray`): Predictions of the model.
        label_ids (`np.ndarray`): Targets to be matched.
        inputs (`np.ndarray`, *optional*): Input data passed to the model.
        losses (`np.ndarray`, *optional*): Loss values computed during evaluation.
    """

    def __init__(
        self,
        predictions: Union[np.ndarray, tuple[np.ndarray]],
        label_ids: Union[np.ndarray, tuple[np.ndarray]],
        inputs: Optional[Union[np.ndarray, tuple[np.ndarray]]] = None,
        losses: Optional[Union[np.ndarray, tuple[np.ndarray]]] = None,
    ):
        self.predictions = predictions
        self.label_ids = label_ids
        self.inputs = inputs
        self.losses = losses
        self.elements = (self.predictions, self.label_ids)
        if self.inputs is not None:
            self.elements += (self.inputs,)
        if self.losses is not None:
            self.elements += (self.losses,)

    def __iter__(self):
        return iter(self.elements)

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self.elements):
            raise IndexError("tuple index out of range")
        return self.elements[idx]


class EvalLoopOutput(NamedTuple):
    predictions: Union[np.ndarray, tuple[np.ndarray]]
    label_ids: Optional[Union[np.ndarray, tuple[np.ndarray]]]
    metrics: Optional[dict[str, float]]
    num_samples: Optional[int]


class PredictionOutput(NamedTuple):
    predictions: Union[np.ndarray, tuple[np.ndarray]]
    label_ids: Optional[Union[np.ndarray, tuple[np.ndarray]]]
    metrics: Optional[dict[str, float]]


class TrainOutput(NamedTuple):
    global_step: int
    training_loss: float
    metrics: dict[str, float]


PREFIX_CHECKPOINT_DIR = "checkpoint"
_re_checkpoint = re.compile(r"^" + PREFIX_CHECKPOINT_DIR + r"\-(\d+)$")


def get_last_checkpoint(folder):
    content = os.listdir(folder)
    checkpoints = [
        path
        for path in content
        if _re_checkpoint.search(path) is not None and os.path.isdir(os.path.join(folder, path))
    ]
    if len(checkpoints) == 0:
        return
    return os.path.join(folder, max(checkpoints, key=lambda x: int(_re_checkpoint.search(x).groups()[0])))


class IntervalStrategy(ExplicitEnum):
    NO = "no"
    STEPS = "steps"
    EPOCH = "epoch"


class SaveStrategy(ExplicitEnum):
    NO = "no"
    STEPS = "steps"
    EPOCH = "epoch"
    BEST = "best"


class EvaluationStrategy(ExplicitEnum):
    NO = "no"
    STEPS = "steps"
    EPOCH = "epoch"


class HubStrategy(ExplicitEnum):
    END = "end"
    EVERY_SAVE = "every_save"
    CHECKPOINT = "checkpoint"
    ALL_CHECKPOINTS = "all_checkpoints"


class BestRun(NamedTuple):
    """
    The best run found by a hyperparameter search (see [`~Trainer.hyperparameter_search`]).

    Parameters:
        run_id (`str`):
            The id of the best run (if models were saved, the corresponding checkpoint will be in the folder ending
            with run-{run_id}).
        objective (`float`):
            The objective that was obtained for this run.
        hyperparameters (`dict[str, Any]`):
            The hyperparameters picked to get this run.
        run_summary (`Optional[Any]`):
            A summary of tuning experiments. `ray.tune.ExperimentAnalysis` object for Ray backend.
    """

    run_id: str
    objective: Union[float, list[float]]
    hyperparameters: dict[str, Any]
    run_summary: Optional[Any] = None


def default_compute_objective(metrics: dict[str, float]) -> float:
    """
    The default objective to maximize/minimize when doing an hyperparameter search. It is the evaluation loss if no
    metrics are provided to the [`Trainer`], the sum of all metrics otherwise.

    Args:
        metrics (`dict[str, float]`): The metrics returned by the evaluate method.

    Return:
        `float`: The objective to minimize or maximize
    """
    metrics = copy.deepcopy(metrics)
    loss = metrics.pop("eval_loss", None)
    _ = metrics.pop("epoch", None)
    # Remove speed metrics
    speed_metrics = [
        m
        for m in metrics.keys()
        if m.endswith("_runtime") or m.endswith("_per_second") or m.endswith("_compilation_time")
    ]
    for sm in speed_metrics:
        _ = metrics.pop(sm, None)
    return loss if len(metrics) == 0 else sum(metrics.values())
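
# Example (illustrative; not part of the original module): a `compute_metrics`
# callable as the Trainer would invoke it, consuming an `EvalPrediction`.
#
#     def compute_metrics(eval_pred: EvalPrediction) -> dict[str, float]:
#         predictions = np.argmax(eval_pred.predictions, axis=-1)
#         accuracy = (predictions == eval_pred.label_ids).mean()
#         return {"accuracy": float(accuracy)}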


def default_hp_space_optuna(trial) -> dict[str, float]:
    from .integrations import is_optuna_available

    assert is_optuna_available(), "This function needs Optuna installed: `pip install optuna`"
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 5),
        "seed": trial.suggest_int("seed", 1, 40),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [4, 8, 16, 32, 64]),
    }


def default_hp_space_ray(trial) -> dict[str, float]:
    from .integrations import is_ray_tune_available

    assert is_ray_tune_available(), "This function needs ray installed: `pip install ray[tune]`"
    from ray import tune

    return {
        "learning_rate": tune.loguniform(1e-6, 1e-4),
        "num_train_epochs": tune.choice(list(range(1, 6))),
        "seed": tune.uniform(1, 40),
        "per_device_train_batch_size": tune.choice([4, 8, 16, 32, 64]),
    }


def default_hp_space_sigopt(trial):
    return [
        {"bounds": {"min": 1e-6, "max": 1e-4}, "name": "learning_rate", "type": "double", "transformation": "log"},
        {"bounds": {"min": 1, "max": 6}, "name": "num_train_epochs", "type": "int"},
        {"bounds": {"min": 1, "max": 40}, "name": "seed", "type": "int"},
        {
            "categorical_values": ["4", "8", "16", "32", "64"],
            "name": "per_device_train_batch_size",
            "type": "categorical",
        },
    ]


def default_hp_space_wandb(trial) -> dict[str, float]:
    from .integrations import is_wandb_available

    if not is_wandb_available():
        raise ImportError("This function needs wandb installed: `pip install wandb`")

    return {
        "method": "random",
        "metric": {"name": "objective", "goal": "minimize"},
        "parameters": {
            "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
            "num_train_epochs": {"distribution": "int_uniform", "min": 1, "max": 6},
            "seed": {"distribution": "int_uniform", "min": 1, "max": 40},
            "per_device_train_batch_size": {"values": [4, 8, 16, 32, 64]},
        },
    }


class HPSearchBackend(ExplicitEnum):
    OPTUNA = "optuna"
    RAY = "ray"
    SIGOPT = "sigopt"
    WANDB = "wandb"


def is_main_process(local_rank):
    """
    Whether or not the current process is the local process, based on `xr.global_ordinal()` (for TPUs) first, then on
    `local_rank`.
    """
    if is_torch_xla_available():
        import torch_xla.runtime as xr

        return xr.global_ordinal() == 0
    return local_rank in [-1, 0]


def total_processes_number(local_rank):
    """
    Return the number of processes launched in parallel. Works with `torch.distributed` and TPUs.
    """
    if is_torch_xla_available():
        import torch_xla.runtime as xr

        return xr.world_size()
    elif local_rank != -1 and is_torch_available():
        import torch

        return torch.distributed.get_world_size()
    return 1


def speed_metrics(split, start_time, num_samples=None, num_steps=None, num_tokens=None):
    """
    Measure and return speed performance metrics.

    This function requires a time snapshot `start_time` before the operation to be measured starts and this function
    should be run immediately after the operation to be measured has completed.

    Args:
    - split: name to prefix metric (like train, eval, test...)
    - start_time: operation start time
    - num_samples: number of samples processed
    - num_steps: number of steps processed
    - num_tokens: number of tokens processed
    """
    runtime = time.time() - start_time
    result = {f"{split}_runtime": round(runtime, 4)}
    if runtime == 0:
        return result
    if num_samples is not None:
        samples_per_second = num_samples / runtime
        result[f"{split}_samples_per_second"] = round(samples_per_second, 3)
    if num_steps is not None:
        steps_per_second = num_steps / runtime
        result[f"{split}_steps_per_second"] = round(steps_per_second, 3)
    if num_tokens is not None:
        tokens_per_second = num_tokens / runtime
        result[f"{split}_tokens_per_second"] = round(tokens_per_second, 3)
    return result
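
# Example (illustrative; not part of the original module): timing an evaluation loop.
# The sample/step counts are placeholders.
#
#     start = time.time()
#     # ... run evaluation over 1000 samples in 125 steps ...
#     metrics = speed_metrics("eval", start, num_samples=1000, num_steps=125)
#     # -> {"eval_runtime": ..., "eval_samples_per_second": ..., "eval_steps_per_second": ...}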


class SchedulerType(ExplicitEnum):
    """
    Scheduler names for the parameter `lr_scheduler_type` in [`TrainingArguments`].
    By default, it uses "linear". Internally, this retrieves `get_linear_schedule_with_warmup` scheduler from
    [`Trainer`].

    Scheduler types:
       - "linear" = [`get_linear_schedule_with_warmup`]
       - "cosine" = [`get_cosine_schedule_with_warmup`]
       - "cosine_with_restarts" = [`get_cosine_with_hard_restarts_schedule_with_warmup`]
       - "polynomial" = [`get_polynomial_decay_schedule_with_warmup`]
       - "constant" = [`get_constant_schedule`]
       - "constant_with_warmup" = [`get_constant_schedule_with_warmup`]
       - "inverse_sqrt" = [`get_inverse_sqrt_schedule`]
       - "reduce_lr_on_plateau" = [`get_reduce_on_plateau_schedule`]
       - "cosine_with_min_lr" = [`get_cosine_with_min_lr_schedule_with_warmup`]
       - "cosine_warmup_with_min_lr" = [`get_cosine_with_min_lr_schedule_with_warmup_lr_rate`]
       - "warmup_stable_decay" = [`get_wsd_schedule`]
    """

    LINEAR = "linear"
    COSINE = "cosine"
    COSINE_WITH_RESTARTS = "cosine_with_restarts"
    POLYNOMIAL = "polynomial"
    CONSTANT = "constant"
    CONSTANT_WITH_WARMUP = "constant_with_warmup"
    INVERSE_SQRT = "inverse_sqrt"
    REDUCE_ON_PLATEAU = "reduce_lr_on_plateau"
    COSINE_WITH_MIN_LR = "cosine_with_min_lr"
    COSINE_WARMUP_WITH_MIN_LR = "cosine_warmup_with_min_lr"
    WARMUP_STABLE_DECAY = "warmup_stable_decay"
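
# Example (illustrative; not part of the original module): `ExplicitEnum` members
# round-trip through their string values, which is how `TrainingArguments` accepts
# `lr_scheduler_type` either as a string or as an enum member.
#
#     assert SchedulerType("cosine") is SchedulerType.COSINE
#     assert SchedulerType.COSINE.value == "cosine"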


class TrainerMemoryTracker:
    """
    A helper class that tracks cpu and gpu memory.

    This class will silently skip unless `psutil` is available. Install with `pip install psutil`.

    When a stage completes, it can pass metrics dict to update with the memory metrics gathered during this stage.

    Example :

    ```python
    self._memory_tracker = TrainerMemoryTracker(self.args.skip_memory_metrics)
    self._memory_tracker.start()
    # code ...
    metrics = {"train_runtime": 10.5}
    self._memory_tracker.stop_and_update_metrics(metrics)
    ```

    At the moment GPU tracking is only for `pytorch`, but can be extended to support `tensorflow`.

    To understand this class' intricacies please read the documentation of [`~Trainer.log_metrics`].
    """

    # map trainer methods to metrics prefix
    stages = {
        "__init__": "init",
        "train": "train",
        "_inner_training_loop": "train",
        "evaluate": "eval",
        "predict": "test",
    }

    def __init__(self, skip_memory_metrics=False):
        self.skip_memory_metrics = skip_memory_metrics

        if not is_psutil_available():
            # soft dependency on psutil
            self.skip_memory_metrics = True

        if self.skip_memory_metrics:
            return

        import psutil  # noqa

        if is_torch_cuda_available() or is_torch_mlu_available() or is_torch_musa_available():
            import torch

            self.torch = torch
            self.gpu = {}
        elif is_torch_mps_available():
            import torch

            self.torch = torch
            self.gpu = {}
        elif is_torch_xpu_available():
            import torch

            self.torch = torch
            self.gpu = {}
        elif is_torch_npu_available():
            import torch

            self.torch = torch
            self.gpu = {}
        elif is_torch_hpu_available():
            import torch

            self.torch = torch
            self.gpu = {}
        else:
            self.torch = None

        self.process = psutil.Process()

        self.cur_stage = None
        self.cpu = {}
        self.init_reported = False

    def derive_stage(self):
        """derives the stage/caller name automatically"""
        caller = inspect.currentframe().f_back.f_back.f_code.co_name
        if caller in self.stages:
            return self.stages[caller]
        else:
            raise ValueError(
                f"was called from {caller}, but only expect to be called from one of {self.stages.keys()}"
            )

    def cpu_mem_used(self):
        """get resident set size memory for the current process"""
        return self.process.memory_info().rss

    def peak_monitor_func(self):
        self.cpu_mem_used_peak = -1

        while True:
            self.cpu_mem_used_peak = max(self.cpu_mem_used(), self.cpu_mem_used_peak)

            # can't sleep or will not catch the peak right (this comment is here on purpose)
            # time.sleep(0.001)  # 1msec

            if not self.peak_monitoring:
                break

    def start(self):
        """start tracking for the caller's stage"""
        if self.skip_memory_metrics:
            return

        stage = self.derive_stage()
        # deal with nested calls of eval during train - simply ignore those
        if self.cur_stage is not None and self.cur_stage != stage:
            return

        self.cur_stage = stage

        gc.collect()

        if self.torch is not None:
            if torch.cuda.is_available():
                self.torch.cuda.reset_peak_memory_stats()
                self.torch.cuda.empty_cache()
            elif is_torch_mlu_available():
                self.torch.mlu.reset_peak_memory_stats()
                self.torch.mlu.empty_cache()
            elif is_torch_musa_available():
                self.torch.musa.reset_peak_memory_stats()
                self.torch.musa.empty_cache()
            elif is_torch_xpu_available():
                self.torch.xpu.reset_peak_memory_stats()
                self.torch.xpu.empty_cache()
            elif is_torch_npu_available():
                self.torch.npu.reset_peak_memory_stats()
                self.torch.npu.empty_cache()
            elif is_torch_hpu_available():
                self.torch.hpu.reset_peak_memory_stats()
            elif is_torch_mps_available():
                self.torch.mps.empty_cache()

        # gpu
        if self.torch is not None:
            if torch.cuda.is_available():
                self.gpu_mem_used_at_start = self.torch.cuda.memory_allocated()
            elif is_torch_mlu_available():
                self.gpu_mem_used_at_start = self.torch.mlu.memory_allocated()
            elif is_torch_musa_available():
                self.gpu_mem_used_at_start = self.torch.musa.memory_allocated()
            elif is_torch_xpu_available():
                self.gpu_mem_used_at_start = self.torch.xpu.memory_allocated()
            elif is_torch_npu_available():
                self.gpu_mem_used_at_start = self.torch.npu.memory_allocated()
            elif is_torch_hpu_available():
                self.gpu_mem_used_at_start = self.torch.hpu.memory_allocated()
            elif is_torch_mps_available():
                self.gpu_mem_used_at_start = self.torch.mps.current_allocated_memory()

        # cpu
        self.cpu_mem_used_at_start = self.cpu_mem_used()

        self.peak_monitoring = True
        peak_monitor_thread = threading.Thread(target=self.peak_monitor_func)
        peak_monitor_thread.daemon = True
        peak_monitor_thread.start()

    def stop(self, stage):
        """stop tracking for the passed stage"""
        # deal with nested calls of eval during train - simply ignore those
        if self.cur_stage is not None and self.cur_stage != stage:
            return

        # this sends a signal to peak_monitor_func to complete its loop
        self.peak_monitoring = False

        # first ensure all objects get collected and their memory is freed
        gc.collect()

        if self.torch is not None:
            if torch.cuda.is_available():
                self.torch.cuda.empty_cache()
            elif is_torch_mlu_available():
                self.torch.mlu.empty_cache()
            elif is_torch_musa_available():
                self.torch.musa.empty_cache()
            elif is_torch_xpu_available():
                self.torch.xpu.empty_cache()
            elif is_torch_npu_available():
                self.torch.npu.empty_cache()
            elif is_torch_hpu_available():
                # not available on hpu as it reserves all device memory for the current process
                pass
            elif is_torch_mps_available():
                self.torch.mps.empty_cache()

        # concepts:
        # - alloc_delta:  the difference of allocated memory between the end and the start
        # - peaked_delta: the difference between the peak memory and the current memory
        # in order to know how much memory the measured code consumed one needs to sum these two

        # gpu
        if self.torch is not None:
            if torch.cuda.is_available():
                self.gpu_mem_used_now = self.torch.cuda.memory_allocated()
                self.gpu_mem_used_peak = self.torch.cuda.max_memory_allocated()
            elif is_torch_mlu_available():
                self.gpu_mem_used_now = self.torch.mlu.memory_allocated()
                self.gpu_mem_used_peak = self.torch.mlu.max_memory_allocated()
            elif is_torch_musa_available():
                self.gpu_mem_used_now = self.torch.musa.memory_allocated()
                self.gpu_mem_used_peak = self.torch.musa.max_memory_allocated()
            elif is_torch_xpu_available():
                self.gpu_mem_used_now = self.torch.xpu.memory_allocated()
                self.gpu_mem_used_peak = self.torch.xpu.max_memory_allocated()
            elif is_torch_npu_available():
                self.gpu_mem_used_now = self.torch.npu.memory_allocated()
                self.gpu_mem_used_peak = self.torch.npu.max_memory_allocated()
            elif is_torch_hpu_available():
                self.gpu_mem_used_now = self.torch.hpu.memory_allocated()
                self.gpu_mem_used_peak = self.torch.hpu.max_memory_allocated()
            elif is_torch_mps_available():
                self.gpu_mem_used_now = self.torch.mps.current_allocated_memory()
                # no `max_memory_allocated` equivalent exposed for mps, so no peak is reported
                self.gpu_mem_used_peak = None
            else:
                raise ValueError("No available GPU device found!")

            self.gpu[self.cur_stage] = {
                "begin": self.gpu_mem_used_at_start,
                "end": self.gpu_mem_used_now,
                "alloc": (self.gpu_mem_used_now - self.gpu_mem_used_at_start),
            }
            if self.gpu_mem_used_peak is not None:
                self.gpu[self.cur_stage]["peaked"] = max(0, self.gpu_mem_used_peak - self.gpu_mem_used_now)
            else:
                self.gpu[self.cur_stage]["peaked"] = "Not available"

        # cpu
        self.cpu_mem_used_now = self.cpu_mem_used()
        self.cpu[self.cur_stage] = {
            "begin": self.cpu_mem_used_at_start,
            "end": self.cpu_mem_used_now,
            "alloc": (self.cpu_mem_used_now - self.cpu_mem_used_at_start),
            "peaked": max(0, self.cpu_mem_used_peak - self.cpu_mem_used_now),
        }

        # reset - cycle finished
        self.cur_stage = None

    def update_metrics(self, stage, metrics):
        """updates the metrics"""
        if self.skip_memory_metrics:
            return

        # deal with nested calls of eval during train - simply ignore those
        if self.cur_stage is not None and self.cur_stage != stage:
            return

        # since we don't have a way to return init metrics, we push them into the first of train/val/predict
        stages = [stage]
        if not self.init_reported:
            stages.insert(0, "init")
            self.init_reported = True

        # populate metrics
        for stage in stages:
            for t in ["alloc", "peaked"]:
                if stage in self.cpu and t in self.cpu[stage]:
                    metrics[f"{stage}_mem_cpu_{t}_delta"] = self.cpu[stage][t]
                if self.torch is not None and stage in self.gpu and t in self.gpu[stage]:
                    metrics[f"{stage}_mem_gpu_{t}_delta"] = self.gpu[stage][t]

        # since memory can be allocated before init, and it might be difficult to track overall memory usage, in
        # particular for GPU, let's report memory usage at the point init was called
        if stages[0] == "init":
            metrics["before_init_mem_cpu"] = self.cpu["init"]["begin"]
            if self.torch is not None:
                metrics["before_init_mem_gpu"] = self.gpu["init"]["begin"]

    def stop_and_update_metrics(self, metrics=None):
        """combine stop and metrics update in one call for simpler code"""
        if self.skip_memory_metrics:
            return

        stage = self.derive_stage()
        self.stop(stage)

        # init doesn't have metrics to update so we just save that data for later stages to retrieve
        if metrics is not None:
            self.update_metrics(stage, metrics)


def has_length(dataset):
    """
    Checks if the dataset implements __len__() and it doesn't raise an error
    """
    try:
        return len(dataset) is not None
    except TypeError:
        # TypeError: len() of unsized object
        return False


def denumpify_detensorize(metrics):
    """
    Recursively calls `.item()` on the element of the dictionary passed
    """
    if isinstance(metrics, (list, tuple)):
        return type(metrics)(denumpify_detensorize(m) for m in metrics)
    elif isinstance(metrics, dict):
        return type(metrics)({k: denumpify_detensorize(v) for k, v in metrics.items()})
    elif isinstance(metrics, np.generic):
        return metrics.item()
    elif is_torch_available() and isinstance(metrics, torch.Tensor) and metrics.numel() == 1:
        return metrics.item()
    return metrics


def number_of_arguments(func):
    """
    Return the number of arguments of the passed function, even if it's a partial function.
    """
    if isinstance(func, functools.partial):
        total_args = len(inspect.signature(func.func).parameters)
        return total_args - len(func.args) - len(func.keywords)
    return len(inspect.signature(func).parameters)
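
# Example (illustrative; not part of the original module): `denumpify_detensorize`
# turns numpy scalars (and 1-element torch tensors) nested in containers into plain
# Python numbers, which makes a metrics dict JSON-serializable.
#
#     metrics = {"eval_loss": np.float32(0.25), "sizes": [np.int64(3), 7]}
#     denumpify_detensorize(metrics)  # -> {"eval_loss": 0.25, "sizes": [3, 7]}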


def find_executable_batch_size(
    function: Optional[Callable] = None, starting_batch_size: int = 128, auto_find_batch_size: bool = False
):
    """
    A basic decorator that will try to execute `function`. If it fails from exceptions related to out-of-memory or
    CUDNN, the batch size is multiplied by 0.9 and passed to `function`. `function` must take in a `batch_size`
    parameter as its first argument.

    Args:
        function (`Callable`, *optional*)
            A function to wrap
        starting_batch_size (`int`, *optional*)
            The batch size to try and fit into memory
        auto_find_batch_size (`bool`, *optional*)
            If False, will just execute `function`
    """
    if function is None:
        return functools.partial(
            find_executable_batch_size,
            starting_batch_size=starting_batch_size,
            auto_find_batch_size=auto_find_batch_size,
        )

    if auto_find_batch_size:
        requires_backends(find_executable_batch_size, "accelerate")
        from accelerate.utils import find_executable_batch_size as accelerate_find_executable_batch_size

        return accelerate_find_executable_batch_size(function=function, starting_batch_size=starting_batch_size)

    return functools.partial(function, batch_size=starting_batch_size)


class FSDPOption(ExplicitEnum):
    FULL_SHARD = "full_shard"
    SHARD_GRAD_OP = "shard_grad_op"
    NO_SHARD = "no_shard"
    HYBRID_SHARD = "hybrid_shard"
    HYBRID_SHARD_ZERO2 = "hybrid_shard_zero2"
    OFFLOAD = "offload"
    AUTO_WRAP = "auto_wrap"


class RemoveColumnsCollator:
    """Wrap the data collator to remove unused columns before they are passed to the collator."""

    def __init__(
        self,
        data_collator,
        signature_columns,
        logger=None,
        model_name: Optional[str] = None,
        description: Optional[str] = None,
    ):
        self.data_collator = data_collator
        self.signature_columns = signature_columns
        self.logger = logger
        self.description = description
        self.model_name = model_name
        self.message_logged = False

    def _remove_columns(self, feature: dict) -> dict:
        if not isinstance(feature, dict):
            return feature
        if not self.message_logged and self.logger and self.model_name:
            ignored_columns = list(set(feature.keys()) - set(self.signature_columns))
            if len(ignored_columns) > 0:
                dset_description = "" if self.description is None else f"in the {self.description} set"
                self.logger.info(
                    f"The following columns {dset_description} don't have a corresponding argument in "
                    f"`{self.model_name}.forward` and have been ignored: {', '.join(ignored_columns)}."
                    f" If {', '.join(ignored_columns)} are not expected by `{self.model_name}.forward`, "
                    " you can safely ignore this message."
                )
                self.message_logged = True
        return {k: v for k, v in feature.items() if k in self.signature_columns}

    def __call__(self, features: list[dict]):
        features = [self._remove_columns(feature) for feature in features]
        return self.data_collator(features)


def check_target_module_exists(optim_target_modules, key: str, return_is_regex: bool = False):
    """A helper method to check if the passed module's key name matches any of the target modules in the optim_target_modules.

    Args:
        optim_target_modules (`Union[str, list[str]]`):
            A list of strings to try to match. Can be also a full string.
        key (`str`):
            A key to search any matches in optim_target_modules
        return_is_regex (`bool`):
            If set to `True`, the method will return whether the passed `optim_target_modules`
            is a regex or not.

    Returns:
        `bool` : True of match object if key matches any target modules from config, False or
        None if no match found
        `bool` : If the matched target module is a regex to silence out the warnings in Trainer
        for extra modules being found (only if `target_module_found=True` for an array of regex).
    """
    target_module_found = False
    is_regex = False

    if isinstance(optim_target_modules, str):
        # a single string is treated as a full-match regex pattern
        target_module_found = bool(re.fullmatch(optim_target_modules, key))
        is_regex = optim_target_modules != key
    elif key in optim_target_modules:
        # this module is specified directly in target_modules
        target_module_found = True
    elif any(target_key in key for target_key in optim_target_modules):
        # substring match against any of the listed targets
        target_module_found = True
    elif any(bool(re.fullmatch(optim_target_module, key)) for optim_target_module in optim_target_modules):
        # full-match regex against any of the listed targets
        target_module_found = True
        is_regex = True

    if return_is_regex:
        return target_module_found, is_regex

    return target_module_found
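
# Example (illustrative; not part of the original module): matching optimizer target
# modules by substring or by regex. The parameter names are placeholders.
#
#     check_target_module_exists(["q_proj", "v_proj"], "model.layers.0.self_attn.q_proj")  # True
#     check_target_module_exists(r"model\.layers\.\d+\.mlp\.gate_proj", "model.layers.3.mlp.gate_proj")  # True
#     check_target_module_exists(["k_proj"], "model.embed_tokens")  # False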