import contextlib
from copy import deepcopy
from pathlib import Path
from typing import TYPE_CHECKING, Any, Callable, Optional, Union

import torch
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel
from torch.utils.data import Dataset

from .generation.configuration_utils import GenerationConfig
from .integrations.deepspeed import is_deepspeed_zero3_enabled
from .integrations.fsdp import is_fsdp_managed_module
from .trainer import Trainer
from .utils import is_datasets_available, logging
from .utils.deprecation import deprecate_kwarg


if is_datasets_available():
    import datasets

if TYPE_CHECKING:
    from torch.utils.data import IterableDataset

    from .data.data_collator import DataCollator
    from .feature_extraction_utils import FeatureExtractionMixin
    from .image_processing_utils import BaseImageProcessor
    from .modeling_utils import PreTrainedModel
    from .processing_utils import ProcessorMixin
    from .tokenization_utils_base import PreTrainedTokenizerBase
    from .trainer_callback import TrainerCallback
    from .trainer_utils import EvalPrediction, PredictionOutput
    from .training_args import TrainingArguments


logger = logging.get_logger(__name__)


class Seq2SeqTrainer(Trainer):
    @deprecate_kwarg("tokenizer", new_name="processing_class", version="5.0.0", raise_if_both_names=True)
    def __init__(
        self,
        model: Union["PreTrainedModel", nn.Module] = None,
        args: "TrainingArguments" = None,
        data_collator: Optional["DataCollator"] = None,
        train_dataset: Optional[Union[Dataset, "IterableDataset", "datasets.Dataset"]] = None,
        eval_dataset: Optional[Union[Dataset, dict[str, Dataset]]] = None,
        processing_class: Optional[
            Union["PreTrainedTokenizerBase", "BaseImageProcessor", "FeatureExtractionMixin", "ProcessorMixin"]
        ] = None,
        model_init: Optional[Callable[[], "PreTrainedModel"]] = None,
        compute_loss_func: Optional[Callable] = None,
        compute_metrics: Optional[Callable[["EvalPrediction"], dict]] = None,
        callbacks: Optional[list["TrainerCallback"]] = None,
        optimizers: tuple[Optional[torch.optim.Optimizer], Optional[torch.optim.lr_scheduler.LambdaLR]] = (None, None),
        preprocess_logits_for_metrics: Optional[Callable[[torch.Tensor, torch.Tensor], torch.Tensor]] = None,
    ):
        super().__init__(
            model=model,
            args=args,
            data_collator=data_collator,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            processing_class=processing_class,
            model_init=model_init,
            compute_loss_func=compute_loss_func,
            compute_metrics=compute_metrics,
            callbacks=callbacks,
            optimizers=optimizers,
            preprocess_logits_for_metrics=preprocess_logits_for_metrics,
        )

        # Override self.model.generation_config if a `GenerationConfig` is specified in args
        if self.args.generation_config is not None:
            gen_config = self.load_generation_config(self.args.generation_config)
            self.model.generation_config = gen_config

    @staticmethod
    def load_generation_config(gen_config_arg: Union[str, GenerationConfig]) -> GenerationConfig:
        """
        Loads a `~generation.GenerationConfig` from the `Seq2SeqTrainingArguments.generation_config` arguments.

        Args:
            gen_config_arg (`str` or [`~generation.GenerationConfig`]):
                `Seq2SeqTrainingArguments.generation_config` argument.

        Returns:
            A `~generation.GenerationConfig`.
        """

        # GenerationConfig provided: nothing to do beyond copying it
        if isinstance(gen_config_arg, GenerationConfig):
            gen_config = deepcopy(gen_config_arg)
        else:
            # str or Path
            pretrained_model_name = Path(gen_config_arg) if isinstance(gen_config_arg, str) else gen_config_arg
            config_file_name = None

            # Figure out whether the argument is a path to a file, a path to a directory, or a model id/URL
            if pretrained_model_name.is_file():
                config_file_name = pretrained_model_name.name
                pretrained_model_name = pretrained_model_name.parent
            # dir path
            elif pretrained_model_name.is_dir():
                pass
            # model id or URL
            else:
                pretrained_model_name = gen_config_arg

            gen_config = GenerationConfig.from_pretrained(pretrained_model_name, config_file_name)

        # Strict validation to fail early on problems that would otherwise only surface when the config is saved
        # at the end of training.
        try:
            gen_config.validate(strict=True)
        except ValueError as exc:
            raise ValueError(str(exc) + " Fix these issues to train your model.")

        return gen_config
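    # Illustrative sketch (editorial addition, not part of the original module): the input
    # forms `load_generation_config` accepts, with hypothetical paths as placeholders.
    #
    #   Seq2SeqTrainer.load_generation_config(GenerationConfig(max_new_tokens=64))  # config object
    #   Seq2SeqTrainer.load_generation_config("path/to/generation_config.json")     # file on disk
    #   Seq2SeqTrainer.load_generation_config("path/to/checkpoint_dir")             # directory
    #   Seq2SeqTrainer.load_generation_config("google-t5/t5-small")                 # Hub model id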
    def evaluate(
        self,
        eval_dataset: Optional[Dataset] = None,
        ignore_keys: Optional[list[str]] = None,
        metric_key_prefix: str = "eval",
        **gen_kwargs,
    ) -> dict[str, float]:
        """
        Run evaluation and return metrics.

        The calling script will be responsible for providing a method to compute metrics, as they are task-dependent
        (pass it to the init `compute_metrics` argument).

        You can also subclass and override this method to inject custom behavior.

        Args:
            eval_dataset (`Dataset`, *optional*):
                Pass a dataset if you wish to override `self.eval_dataset`. If it is an [`~datasets.Dataset`], columns
                not accepted by the `model.forward()` method are automatically removed. It must implement the
                `__len__` method.
            ignore_keys (`list[str]`, *optional*):
                A list of keys in the output of your model (if it is a dictionary) that should be ignored when
                gathering predictions.
            metric_key_prefix (`str`, *optional*, defaults to `"eval"`):
                An optional prefix to be used as the metrics key prefix. For example the metric "bleu" will be named
                "eval_bleu" if the prefix is `"eval"` (default).
            max_length (`int`, *optional*):
                The maximum target length to use when predicting with the generate method.
            num_beams (`int`, *optional*):
                Number of beams for beam search that will be used when predicting with the generate method. 1 means no
                beam search.
            gen_kwargs:
                Additional `generate` specific kwargs.

        Returns:
            A dictionary containing the evaluation loss and the potential metrics computed from the predictions. The
            dictionary also contains the epoch number which comes from the training state.
        """

        gen_kwargs = gen_kwargs.copy()

        # Use legacy argument setting if a) the option is not explicitly passed; and b) the argument is set in the
        # training args
        if (
            gen_kwargs.get("max_length") is None
            and gen_kwargs.get("max_new_tokens") is None
            and self.args.generation_max_length is not None
        ):
            gen_kwargs["max_length"] = self.args.generation_max_length
        if gen_kwargs.get("num_beams") is None and self.args.generation_num_beams is not None:
            gen_kwargs["num_beams"] = self.args.generation_num_beams

        # We don't want to drop samples in general
        self.gather_function = self.accelerator.gather
        self._gen_kwargs = gen_kwargs

        return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)

    def predict(
        self,
        test_dataset: Dataset,
        ignore_keys: Optional[list[str]] = None,
        metric_key_prefix: str = "test",
        **gen_kwargs,
    ) -> "PredictionOutput":
        """
        Run prediction and return predictions and potential metrics.

        Depending on the dataset and your use case, your test dataset may contain labels. In that case, this method
        will also return metrics, like in `evaluate()`.

        Args:
            test_dataset (`Dataset`):
                Dataset to run the predictions on. If it is a [`~datasets.Dataset`], columns not accepted by the
                `model.forward()` method are automatically removed. Has to implement the method `__len__`.
            ignore_keys (`list[str]`, *optional*):
                A list of keys in the output of your model (if it is a dictionary) that should be ignored when
                gathering predictions.
            metric_key_prefix (`str`, *optional*, defaults to `"test"`):
                An optional prefix to be used as the metrics key prefix. For example the metric "bleu" will be named
                "test_bleu" if the prefix is `"test"` (default).
            max_length (`int`, *optional*):
                The maximum target length to use when predicting with the generate method.
            num_beams (`int`, *optional*):
                Number of beams for beam search that will be used when predicting with the generate method. 1 means no
                beam search.
            gen_kwargs:
                Additional `generate` specific kwargs.

        If your predictions or labels have different sequence lengths (for instance because you're doing dynamic
        padding in a token classification task) the predictions will be padded (on the right) to allow for
        concatenation into one array. The padding index is -100.

        Returns: *NamedTuple* A namedtuple with the following keys:

            - predictions (`np.ndarray`): The predictions on `test_dataset`.
            - label_ids (`np.ndarray`, *optional*): The labels (if the dataset contained some).
            - metrics (`dict[str, float]`, *optional*): The potential dictionary of metrics (if the dataset contained
              labels).
        """

        gen_kwargs = gen_kwargs.copy()

        # Use legacy argument setting if a) the option is not explicitly passed; and b) the argument is set in the
        # training args
        if (
            gen_kwargs.get("max_length") is None
            and gen_kwargs.get("max_new_tokens") is None
            and self.args.generation_max_length is not None
        ):
            gen_kwargs["max_length"] = self.args.generation_max_length
        if gen_kwargs.get("num_beams") is None and self.args.generation_num_beams is not None:
            gen_kwargs["num_beams"] = self.args.generation_num_beams

        self.gather_function = self.accelerator.gather
        self._gen_kwargs = gen_kwargs

        return super().predict(test_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
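    # Note (editorial addition, not from the original source): `evaluate` and `predict` stash
    # their generation kwargs in `self._gen_kwargs`, which `prediction_step` below picks up
    # when it is called with no explicit gen_kwargs. A hypothetical call such as
    #
    #   trainer.predict(test_ds, max_new_tokens=128, num_beams=4)
    #
    # therefore takes priority over `args.generation_max_length` / `args.generation_num_beams`.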
    def prediction_step(
        self,
        model: nn.Module,
        inputs: dict[str, Union[torch.Tensor, Any]],
        prediction_loss_only: bool,
        ignore_keys: Optional[list[str]] = None,
        **gen_kwargs,
    ) -> tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]:
        """
        Perform an evaluation step on `model` using `inputs`.

        Subclass and override to inject custom behavior.

        Args:
            model (`nn.Module`):
                The model to evaluate.
            inputs (`dict[str, Union[torch.Tensor, Any]]`):
                The inputs and targets of the model.

                The dictionary will be unpacked before being fed to the model. Most models expect the targets under
                the argument `labels`. Check your model's documentation for all accepted arguments.
            prediction_loss_only (`bool`):
                Whether or not to return the loss only.
            gen_kwargs:
                Additional `generate` specific kwargs.

        Return:
            tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]: A tuple with the loss, logits and
            labels (each being optional).
        """

        if not self.args.predict_with_generate or prediction_loss_only:
            return super().prediction_step(
                model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys
            )

        has_labels = "labels" in inputs
        inputs = self._prepare_inputs(inputs)

        # Priority (handled in generate):
        # non-`None` gen_kwargs > model.generation_config > default GenerationConfig()
        if len(gen_kwargs) == 0 and hasattr(self, "_gen_kwargs"):
            gen_kwargs = self._gen_kwargs.copy()
        if "num_beams" in gen_kwargs and gen_kwargs["num_beams"] is None:
            gen_kwargs.pop("num_beams")
        if "max_length" in gen_kwargs and gen_kwargs["max_length"] is None:
            gen_kwargs.pop("max_length")

        default_synced_gpus = is_deepspeed_zero3_enabled() or is_fsdp_managed_module(self.model)
        gen_kwargs["synced_gpus"] = gen_kwargs.get("synced_gpus", default_synced_gpus)

        generation_inputs = inputs.copy()
        # If the `decoder_input_ids` was created from `labels`, evict the former, so that the model can freely
        # generate (otherwise it would continue generating from the padded `decoder_input_ids`)
        if (
            "labels" in generation_inputs
            and "decoder_input_ids" in generation_inputs
            and generation_inputs["labels"].shape == generation_inputs["decoder_input_ids"].shape
        ):
            generation_inputs = {
                k: v for k, v in generation_inputs.items() if k not in ("decoder_input_ids", "decoder_attention_mask")
            }

        # Under FSDP, the full parameters must be summoned before calling `generate`
        summon_full_params_context = (
            FullyShardedDataParallel.summon_full_params(self.model)
            if isinstance(self.model, FullyShardedDataParallel)
            else contextlib.nullcontext()
        )

        with summon_full_params_context:
            generated_tokens = self.model.generate(**generation_inputs, **gen_kwargs)

        # Temporary hack to ensure the generation config is not initialized for each iteration of the evaluation loop
        if self.model.generation_config._from_model_config:
            self.model.generation_config._from_model_config = False

        # Retrieve the GenerationConfig from model.generation_config
        gen_config = self.model.generation_config
        # in case the batch is shorter than max length, the output should be padded
        if generated_tokens.shape[-1] < gen_config.max_length:
            generated_tokens = self._pad_tensors_to_max_len(generated_tokens, gen_config.max_length)
        elif gen_config.max_new_tokens is not None and generated_tokens.shape[-1] < gen_config.max_new_tokens + 1:
            generated_tokens = self._pad_tensors_to_max_len(generated_tokens, gen_config.max_new_tokens + 1)

        with torch.no_grad():
            if has_labels:
                with self.compute_loss_context_manager():
                    outputs = model(**inputs)
                if self.label_smoother is not None:
                    loss = self.label_smoother(outputs, inputs["labels"]).detach().mean()
                else:
                    loss = (outputs["loss"] if isinstance(outputs, dict) else outputs[0]).detach().mean()
            else:
                loss = None

        if self.args.prediction_loss_only:
            return loss, None, None

        if has_labels:
            labels = inputs["labels"]
            if labels.shape[-1] < gen_config.max_length:
                labels = self._pad_tensors_to_max_len(labels, gen_config.max_length)
            elif gen_config.max_new_tokens is not None and labels.shape[-1] < gen_config.max_new_tokens + 1:
                labels = self._pad_tensors_to_max_len(labels, gen_config.max_new_tokens + 1)
        else:
            labels = None

        return loss, generated_tokens, labels

    def _pad_tensors_to_max_len(self, tensor, max_length):
        if self.processing_class is not None and hasattr(self.processing_class, "pad_token_id"):
            # If the PAD token is not defined, at least the EOS token has to be defined
            pad_token_id = (
                self.processing_class.pad_token_id
                if self.processing_class.pad_token_id is not None
                else self.processing_class.eos_token_id
            )
        elif self.model.config.pad_token_id is not None:
            pad_token_id = self.model.config.pad_token_id
        else:
            raise ValueError("Pad_token_id must be set in the configuration of the model, in order to pad tensors")

        padded_tensor = pad_token_id * torch.ones(
            (tensor.shape[0], max_length), dtype=tensor.dtype, device=tensor.device
        )
        padded_tensor[:, : tensor.shape[-1]] = tensor
        return padded_tensor
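# ---------------------------------------------------------------------------
# Minimal usage sketch (editorial addition, illustrative only, not part of the
# original module). `train_ds`/`eval_ds` are hypothetical tokenized datasets
# with "input_ids", "attention_mask" and "labels" columns:
#
#   from transformers import (
#       AutoModelForSeq2SeqLM,
#       AutoTokenizer,
#       DataCollatorForSeq2Seq,
#       Seq2SeqTrainer,
#       Seq2SeqTrainingArguments,
#   )
#
#   tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
#   model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-small")
#   args = Seq2SeqTrainingArguments(
#       output_dir="out",
#       predict_with_generate=True,  # route evaluation through the generate-based prediction_step
#       generation_max_length=64,
#   )
#   trainer = Seq2SeqTrainer(
#       model=model,
#       args=args,
#       train_dataset=train_ds,
#       eval_dataset=eval_ds,
#       processing_class=tokenizer,
#       data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
#   )
#   trainer.train()
#   metrics = trainer.evaluate(num_beams=4)
# ---------------------------------------------------------------------------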