import os
import time
import warnings
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Union

import torch
from filelock import FileLock
from torch.utils.data import Dataset

from ...tokenization_utils_base import PreTrainedTokenizerBase
from ...utils import check_torch_load_is_safe, logging
from ..processors.glue import glue_convert_examples_to_features, glue_output_modes, glue_processors
from ..processors.utils import InputFeatures


logger = logging.get_logger(__name__)


@dataclass
class GlueDataTrainingArguments:
    """
    Arguments pertaining to what data we are going to input our model for training and eval.

    Using `HfArgumentParser` we can turn this class into argparse arguments to be able to specify them on the command
    line.
    """

    task_name: str = field(metadata={"help": "The name of the task to train on: " + ", ".join(glue_processors.keys())})
    data_dir: str = field(
        metadata={"help": "The input data dir. Should contain the .tsv files (or other data files) for the task."}
    )
    max_seq_length: int = field(
        default=128,
        metadata={
            "help": (
                "The maximum total input sequence length after tokenization. Sequences longer "
                "than this will be truncated, sequences shorter will be padded."
            )
        },
    )
    overwrite_cache: bool = field(
        default=False, metadata={"help": "Overwrite the cached training and evaluation sets"}
    )

    def __post_init__(self):
        self.task_name = self.task_name.lower()


class Split(Enum):
    train = "train"
    dev = "dev"
    test = "test"


class GlueDataset(Dataset):
    """
    This will be superseded by a framework-agnostic approach soon.
    """

    args: GlueDataTrainingArguments
    output_mode: str
    features: list[InputFeatures]

    def __init__(
        self,
        args: GlueDataTrainingArguments,
        tokenizer: PreTrainedTokenizerBase,
        limit_length: Optional[int] = None,
        mode: Union[str, Split] = Split.train,
        cache_dir: Optional[str] = None,
    ):
        warnings.warn(
            "This dataset will be removed from the library soon, preprocessing should be handled with the 🤗 Datasets "
            "library. You can have a look at this example script for pointers: "
            "https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_glue.py",
            FutureWarning,
        )
        self.args = args
        self.processor = glue_processors[args.task_name]()
        self.output_mode = glue_output_modes[args.task_name]
        if isinstance(mode, str):
            try:
                mode = Split[mode]
            except KeyError:
                raise KeyError("mode is not a valid split name")
        # Load data features from cache or dataset file
        cached_features_file = os.path.join(
            cache_dir if cache_dir is not None else args.data_dir,
            f"cached_{mode.value}_{tokenizer.__class__.__name__}_{args.max_seq_length}_{args.task_name}",
        )
        label_list = self.processor.get_labels()
        if args.task_name in ["mnli", "mnli-mm"] and tokenizer.__class__.__name__ in (
            "RobertaTokenizer",
            "RobertaTokenizerFast",
            "XLMRobertaTokenizer",
            "BartTokenizer",
            "BartTokenizerFast",
        ):
            # HACK: label indices are swapped in the RoBERTa pretrained model
            label_list[1], label_list[2] = label_list[2], label_list[1]
        self.label_list = label_list

        # Make sure only the first process in distributed training processes the dataset,
        # and the others will use the cache.
        lock_path = cached_features_file + ".lock"
        with FileLock(lock_path):
            if os.path.exists(cached_features_file) and not args.overwrite_cache:
                start = time.time()
                check_torch_load_is_safe()
                self.features = torch.load(cached_features_file, weights_only=True)
                logger.info(
                    f"Loading features from cached file {cached_features_file} [took %.3f s]", time.time() - start
                )
            else:
                logger.info(f"Creating features from dataset file at {args.data_dir}")

                if mode == Split.dev:
                    examples = self.processor.get_dev_examples(args.data_dir)
                elif mode == Split.test:
                    examples = self.processor.get_test_examples(args.data_dir)
                else:
                    examples = self.processor.get_train_examples(args.data_dir)
                if limit_length is not None:
                    examples = examples[:limit_length]
                self.features = glue_convert_examples_to_features(
                    examples,
                    tokenizer,
                    max_length=args.max_seq_length,
                    label_list=label_list,
                    output_mode=self.output_mode,
                )
                start = time.time()
                torch.save(self.features, cached_features_file)
                logger.info(
                    f"Saving features into cached file {cached_features_file} [took {time.time() - start:.3f} s]"
                )

    def __len__(self):
        return len(self.features)

    def __getitem__(self, i) -> InputFeatures:
        return self.features[i]

    def get_labels(self):
        return self.label_list
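

# ---------------------------------------------------------------------------------------------
# Usage sketch (not part of the library module): shows how GlueDataTrainingArguments and
# GlueDataset are typically wired together with a tokenizer. It assumes the GLUE MRPC .tsv
# files have been downloaded to ./glue_data/MRPC and that the "bert-base-uncased" checkpoint
# is reachable; the path and model name are illustrative only. Kept commented out so that
# importing this module has no side effects.
#
# from transformers import AutoTokenizer
# from transformers.data.datasets.glue import GlueDataset, GlueDataTrainingArguments, Split
#
# tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# data_args = GlueDataTrainingArguments(
#     task_name="mrpc",              # one of glue_processors.keys()
#     data_dir="./glue_data/MRPC",   # directory containing the task's .tsv files
#     max_seq_length=128,
#     overwrite_cache=False,         # reuse cached features when they exist
# )
# train_dataset = GlueDataset(data_args, tokenizer=tokenizer, mode=Split.train)
# print(len(train_dataset), train_dataset.get_labels())
# ---------------------------------------------------------------------------------------------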