import warnings
from dataclasses import dataclass, field
from functools import cached_property
from typing import Optional

from .training_args import TrainingArguments
from .utils import is_tf_available, logging, requires_backends


logger = logging.get_logger(__name__)

if is_tf_available():
    import tensorflow as tf

    from .modeling_tf_utils import keras


@dataclass
class TFTrainingArguments(TrainingArguments):
    """
    TrainingArguments is the subset of the arguments we use in our example scripts **which relate to the training
    loop itself**.

    Using [`HfArgumentParser`] we can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        output_dir (`str`):
            The output directory where the model predictions and checkpoints will be written.
        overwrite_output_dir (`bool`, *optional*, defaults to `False`):
            If `True`, overwrite the content of the output directory. Use this to continue training if `output_dir`
            points to a checkpoint directory.
        do_train (`bool`, *optional*, defaults to `False`):
            Whether to run training or not. This argument is not directly used by [`Trainer`], it's intended to be
            used by your training/evaluation scripts instead. See the
            [example scripts](https://github.com/huggingface/transformers/tree/main/examples) for more details.
        do_eval (`bool`, *optional*):
            Whether to run evaluation on the validation set or not. Will be set to `True` if `eval_strategy` is
            different from `"no"`. This argument is not directly used by [`Trainer`], it's intended to be used by
            your training/evaluation scripts instead. See the
            [example scripts](https://github.com/huggingface/transformers/tree/main/examples) for more details.
        do_predict (`bool`, *optional*, defaults to `False`):
            Whether to run predictions on the test set or not. This argument is not directly used by [`Trainer`],
            it's intended to be used by your training/evaluation scripts instead. See the
            [example scripts](https://github.com/huggingface/transformers/tree/main/examples) for more details.
        eval_strategy (`str` or [`~trainer_utils.IntervalStrategy`], *optional*, defaults to `"no"`):
            The evaluation strategy to adopt during training. Possible values are:

                - `"no"`: No evaluation is done during training.
                - `"steps"`: Evaluation is done (and logged) every `eval_steps`.
                - `"epoch"`: Evaluation is done at the end of each epoch.

        per_device_train_batch_size (`int`, *optional*, defaults to 8):
            The batch size per GPU/TPU core/CPU for training.
        per_device_eval_batch_size (`int`, *optional*, defaults to 8):
            The batch size per GPU/TPU core/CPU for evaluation.
        gradient_accumulation_steps (`int`, *optional*, defaults to 1):
            Number of update steps to accumulate the gradients for, before performing a backward/update pass. When
            using gradient accumulation, one step is counted as one step with backward pass. Therefore, logging,
            evaluation and save will be conducted every `gradient_accumulation_steps * xxx_step` training examples.
        learning_rate (`float`, *optional*, defaults to 5e-5):
            The initial learning rate for Adam.
        weight_decay (`float`, *optional*, defaults to 0):
            The weight decay to apply (if not zero).
        adam_beta1 (`float`, *optional*, defaults to 0.9):
            The beta1 hyperparameter for the Adam optimizer.
        adam_beta2 (`float`, *optional*, defaults to 0.999):
            The beta2 hyperparameter for the Adam optimizer.
        adam_epsilon (`float`, *optional*, defaults to 1e-8):
            The epsilon hyperparameter for the Adam optimizer.
        max_grad_norm (`float`, *optional*, defaults to 1.0):
            Maximum gradient norm (for gradient clipping).
        num_train_epochs (`float`, *optional*, defaults to 3.0):
            Total number of training epochs to perform.
        max_steps (`int`, *optional*, defaults to -1):
            If set to a positive number, the total number of training steps to perform. Overrides `num_train_epochs`.
            For a finite dataset, training is reiterated through the dataset (if all data is exhausted) until
            `max_steps` is reached.
        warmup_ratio (`float`, *optional*, defaults to 0.0):
            Ratio of total training steps used for a linear warmup from 0 to `learning_rate`.
        warmup_steps (`int`, *optional*, defaults to 0):
            Number of steps used for a linear warmup from 0 to `learning_rate`. Overrides any effect of
            `warmup_ratio`.
        logging_dir (`str`, *optional*):
            [TensorBoard](https://www.tensorflow.org/tensorboard) log directory. Will default to
            *runs/**CURRENT_DATETIME_HOSTNAME***.
        logging_strategy (`str` or [`~trainer_utils.IntervalStrategy`], *optional*, defaults to `"steps"`):
            The logging strategy to adopt during training. Possible values are:

                - `"no"`: No logging is done during training.
                - `"epoch"`: Logging is done at the end of each epoch.
                - `"steps"`: Logging is done every `logging_steps`.

        logging_first_step (`bool`, *optional*, defaults to `False`):
            Whether to log and evaluate the first `global_step` or not.
        logging_steps (`int`, *optional*, defaults to 500):
            Number of update steps between two logs if `logging_strategy="steps"`.
        save_strategy (`str` or [`~trainer_utils.SaveStrategy`], *optional*, defaults to `"steps"`):
            The checkpoint save strategy to adopt during training. Possible values are:

                - `"no"`: No save is done during training.
                - `"epoch"`: Save is done at the end of each epoch.
                - `"steps"`: Save is done every `save_steps`.

        save_steps (`int`, *optional*, defaults to 500):
            Number of update steps between two checkpoint saves if `save_strategy="steps"`.
        save_total_limit (`int`, *optional*):
            If a value is passed, will limit the total amount of checkpoints. Deletes the older checkpoints in
            `output_dir`.
        no_cuda (`bool`, *optional*, defaults to `False`):
            Whether to avoid using CUDA even when it is available.
        seed (`int`, *optional*, defaults to 42):
            Random seed that will be set at the beginning of training.
        fp16 (`bool`, *optional*, defaults to `False`):
            Whether to use 16-bit (mixed) precision training (through NVIDIA Apex) instead of 32-bit training.
        fp16_opt_level (`str`, *optional*, defaults to 'O1'):
            For `fp16` training, Apex AMP optimization level selected in ['O0', 'O1', 'O2', and 'O3']. See details on
            the [Apex documentation](https://nvidia.github.io/apex/amp).
        local_rank (`int`, *optional*, defaults to -1):
            During distributed training, the rank of the process.
        tpu_num_cores (`int`, *optional*):
            When training on TPU, the number of TPU cores (automatically passed by the launcher script).
        debug (`bool`, *optional*, defaults to `False`):
            Whether to activate the trace to record computation graphs and profiling information or not.
        dataloader_drop_last (`bool`, *optional*, defaults to `False`):
            Whether to drop the last incomplete batch (if the length of the dataset is not divisible by the batch
            size) or not.
        eval_steps (`int`, *optional*, defaults to 1000):
            Number of update steps between two evaluations.
        past_index (`int`, *optional*, defaults to -1):
            Some models like [TransformerXL](../model_doc/transformerxl) or [XLNet](../model_doc/xlnet) can make use
            of the past hidden states for their predictions.
            If this argument is set to a positive int, the `Trainer` will use the corresponding output (usually index
            2) as the past state and feed it to the model at the next training step under the keyword argument
            `mems`.
        tpu_name (`str`, *optional*):
            The name of the TPU the process is running on.
        tpu_zone (`str`, *optional*):
            The zone of the TPU the process is running on. If not specified, we will attempt to automatically detect
            from metadata.
        gcp_project (`str`, *optional*):
            Google Cloud Project name for the Cloud TPU-enabled project. If not specified, we will attempt to
            automatically detect from metadata.
        run_name (`str`, *optional*):
            A descriptor for the run. Notably used for trackio, wandb, mlflow, comet and swanlab logging.
        xla (`bool`, *optional*):
            Whether to activate the XLA compilation or not.
    """

    framework = "tf"
    tpu_name: Optional[str] = field(
        default=None,
        metadata={"help": "Name of TPU"},
    )
    tpu_zone: Optional[str] = field(
        default=None,
        metadata={"help": "Zone of TPU"},
    )
    gcp_project: Optional[str] = field(
        default=None,
        metadata={"help": "Name of Cloud TPU-enabled project"},
    )
    poly_power: float = field(
        default=1.0,
        metadata={"help": "Power for the Polynomial decay LR scheduler."},
    )
    xla: bool = field(default=False, metadata={"help": "Whether to activate the XLA compilation or not"})

    @cached_property
    def _setup_strategy(self) -> "tf.distribute.Strategy":
        requires_backends(self, ["tf"])
        logger.info("Tensorflow: setting up strategy")

        gpus = tf.config.list_physical_devices("GPU")

        # Start from the float16 mixed-precision policy; switched to bfloat16 below if a TPU is found.
        if self.fp16:
            keras.mixed_precision.set_global_policy("mixed_float16")

        if self.no_cuda:
            strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")
        else:
            try:
                if self.tpu_name:
                    tpu = tf.distribute.cluster_resolver.TPUClusterResolver(
                        self.tpu_name, zone=self.tpu_zone, project=self.gcp_project
                    )
                else:
                    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
            except ValueError:
                if self.tpu_name:
                    raise RuntimeError(f"Couldn't connect to TPU {self.tpu_name}!")
                else:
                    tpu = None

            if tpu:
                # On TPU, bfloat16 is the supported mixed-precision policy.
                if self.fp16:
                    keras.mixed_precision.set_global_policy("mixed_bfloat16")

                tf.config.experimental_connect_to_cluster(tpu)
                tf.tpu.experimental.initialize_tpu_system(tpu)

                strategy = tf.distribute.TPUStrategy(tpu)
            elif len(gpus) == 0:
                strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")
            elif len(gpus) == 1:
                strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")
            elif len(gpus) > 1:
                # To use only a specific subset of GPUs, set `CUDA_VISIBLE_DEVICES` accordingly.
                strategy = tf.distribute.MirroredStrategy()
            else:
                raise ValueError("Cannot find the proper strategy, please check your environment properties.")

        return strategy

    @property
    def strategy(self) -> "tf.distribute.Strategy":
        """
        The strategy used for distributed training.
        """
        requires_backends(self, ["tf"])
        return self._setup_strategy

    @property
    def n_replicas(self) -> int:
        """
        The number of replicas (CPUs, GPUs or TPU cores) used in this training.
        """
        requires_backends(self, ["tf"])
        return self._setup_strategy.num_replicas_in_sync

    @property
    def should_log(self):
        """
        Whether or not the current process should produce log.
        """
        return False  # TF logging is handled by Keras rather than the Trainer

    @property
    def train_batch_size(self) -> int:
        """
        The actual batch size for training (may differ from `per_gpu_train_batch_size` in distributed training).
        """
        if self.per_gpu_train_batch_size:
            logger.warning(
                "Using deprecated `--per_gpu_train_batch_size` argument which will be removed in a future "
                "version. Using `--per_device_train_batch_size` is preferred."
            )
        per_device_batch_size = self.per_gpu_train_batch_size or self.per_device_train_batch_size
        return per_device_batch_size * self.n_replicas
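
    # Worked example (illustrative, not part of the original file): with the default
    # `per_device_train_batch_size=8` and a `MirroredStrategy` spanning 4 GPUs, `n_replicas` is 4,
    # so `train_batch_size` evaluates to 8 * 4 = 32 samples per optimizer step.
    # `eval_batch_size` below applies the same per-replica scaling to the evaluation batch size.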
    @property
    def eval_batch_size(self) -> int:
        """
        The actual batch size for evaluation (may differ from `per_gpu_eval_batch_size` in distributed training).
        """
        if self.per_gpu_eval_batch_size:
            logger.warning(
                "Using deprecated `--per_gpu_eval_batch_size` argument which will be removed in a future "
                "version. Using `--per_device_eval_batch_size` is preferred."
            )
        per_device_batch_size = self.per_gpu_eval_batch_size or self.per_device_eval_batch_size
        return per_device_batch_size * self.n_replicas

    @property
    def n_gpu(self) -> int:
        """
        The number of replicas (CPUs, GPUs or TPU cores) used in this training.
        """
        requires_backends(self, ["tf"])
        warnings.warn(
            "The n_gpu argument is deprecated and will be removed in a future version, use n_replicas instead.",
            FutureWarning,
        )
        return self._setup_strategy.num_replicas_in_sync
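
# A minimal usage sketch (illustrative only, not part of this module). It assumes the `transformers`
# package is importable and TensorFlow is installed; `build_model()` is a hypothetical helper that
# returns a Keras model, and `train_dataset` a hypothetical `tf.data.Dataset` of (features, labels).
#
#     from transformers import HfArgumentParser, TFTrainingArguments
#
#     parser = HfArgumentParser(TFTrainingArguments)
#     (training_args,) = parser.parse_args_into_dataclasses()
#
#     # Model creation and compilation happen inside the scope of the distribution strategy
#     # (CPU, single GPU, MirroredStrategy or TPUStrategy, as resolved by `_setup_strategy` above).
#     with training_args.strategy.scope():
#         model = build_model()
#         model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
#
#     model.fit(
#         train_dataset.batch(training_args.train_batch_size),
#         epochs=int(training_args.num_train_epochs),
#     )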