import bisect
import itertools
import math
import warnings
from collections.abc import Sequence
from typing import cast, Generic, Iterable, Optional, TypeVar, Union

from typing_extensions import deprecated

from torch import default_generator, Generator, randperm, Tensor

__all__ = [
    "Dataset",
    "IterableDataset",
    "TensorDataset",
    "StackDataset",
    "ConcatDataset",
    "ChainDataset",
    "Subset",
    "random_split",
]

_T = TypeVar("_T")
_T_co = TypeVar("_T_co", covariant=True)
_T_dict = dict[str, _T_co]
_T_tuple = tuple[_T_co, ...]
_T_stack = TypeVar("_T_stack", _T_tuple, _T_dict)


class Dataset(Generic[_T_co]):
    r"""An abstract class representing a :class:`Dataset`.

    All datasets that represent a map from keys to data samples should subclass
    it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a
    data sample for a given key. Subclasses could also optionally overwrite
    :meth:`__len__`, which is expected to return the size of the dataset by many
    :class:`~torch.utils.data.Sampler` implementations and the default options
    of :class:`~torch.utils.data.DataLoader`. Subclasses could also
    optionally implement :meth:`__getitems__`, for speeding up batched sample
    loading. This method accepts a list of sample indices for a batch and
    returns a list of samples.

    .. note::
      :class:`~torch.utils.data.DataLoader` by default constructs an index
      sampler that yields integral indices.  To make it work with a map-style
      dataset with non-integral indices/keys, a custom sampler must be provided.
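
    Example (a minimal, illustrative sketch of a map-style dataset; the class
    and values here are placeholders rather than part of this module)::

        >>> # xdoctest: +SKIP
        >>> class SquaresDataset(Dataset):
        ...     def __init__(self, n):
        ...         self.n = n
        ...
        ...     def __len__(self):
        ...         return self.n
        ...
        ...     def __getitem__(self, index):
        ...         return index * index
        ...
        >>> ds = SquaresDataset(4)
        >>> len(ds), ds[2]
        (4, 4)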
    """

    def __getitem__(self, index) -> _T_co:
        raise NotImplementedError(
            "Subclasses of Dataset should implement __getitem__."
        )

    def __add__(self, other: "Dataset[_T_co]") -> "ConcatDataset[_T_co]":
        return ConcatDataset([self, other])


class IterableDataset(Dataset[_T_co], Iterable[_T_co]):
    r"""An iterable Dataset.

    All datasets that represent an iterable of data samples should subclass it.
    Such form of datasets is particularly useful when data come from a stream.

    All subclasses should overwrite :meth:`__iter__`, which would return an
    iterator of samples in this dataset.

    When a subclass is used with :class:`~torch.utils.data.DataLoader`, each
    item in the dataset will be yielded from the :class:`~torch.utils.data.DataLoader`
    iterator. When :attr:`num_workers > 0`, each worker process will have a
    different copy of the dataset object, so it is often desired to configure
    each copy independently to avoid having duplicate data returned from the
    workers. :func:`~torch.utils.data.get_worker_info`, when called in a worker
    process, returns information about the worker. It can be used in either the
    dataset's :meth:`__iter__` method or the :class:`~torch.utils.data.DataLoader` 's
    :attr:`worker_init_fn` option to modify each copy's behavior.

    Example 1: splitting workload across all workers in :meth:`__iter__`::

        >>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_DATALOADER)
        >>> # xdoctest: +SKIP("Fails on MacOS12")
        >>> class MyIterableDataset(torch.utils.data.IterableDataset):
        ...     def __init__(self, start, end):
        ...         super().__init__()
        ...         assert end > start, "this example only works with end >= start"
        ...         self.start = start
        ...         self.end = end
        ...
        ...     def __iter__(self):
        ...         worker_info = torch.utils.data.get_worker_info()
        ...         if worker_info is None:  # single-process data loading, return the full iterator
        ...             iter_start = self.start
        ...             iter_end = self.end
        ...         else:  # in a worker process
        ...             # split workload
        ...             per_worker = int(math.ceil((self.end - self.start) / float(worker_info.num_workers)))
        ...             worker_id = worker_info.id
        ...             iter_start = self.start + worker_id * per_worker
        ...             iter_end = min(iter_start + per_worker, self.end)
        ...         return iter(range(iter_start, iter_end))
        ...
        >>> # should give same set of data as range(3, 7), i.e., [3, 4, 5, 6].
        >>> ds = MyIterableDataset(start=3, end=7)

        >>> # Single-process loading
        >>> print(list(torch.utils.data.DataLoader(ds, num_workers=0)))
        [tensor([3]), tensor([4]), tensor([5]), tensor([6])]

        >>> # xdoctest: +REQUIRES(POSIX)
        >>> # Multi-process loading with two worker processes
        >>> # Worker 0 fetched [3, 4].  Worker 1 fetched [5, 6].
        >>> # xdoctest: +IGNORE_WANT("non deterministic")
        >>> print(list(torch.utils.data.DataLoader(ds, num_workers=2)))
        [tensor([3]), tensor([5]), tensor([4]), tensor([6])]

        >>> # With even more workers
        >>> # xdoctest: +IGNORE_WANT("non deterministic")
        >>> print(list(torch.utils.data.DataLoader(ds, num_workers=12)))
        [tensor([3]), tensor([5]), tensor([4]), tensor([6])]

    Example 2: splitting workload across all workers using :attr:`worker_init_fn`::

        >>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_DATALOADER)
        >>> class MyIterableDataset(torch.utils.data.IterableDataset):
        ...     def __init__(self, start, end):
        ...         super().__init__()
        ...         assert end > start, "this example only works with end >= start"
        ...         self.start = start
        ...         self.end = end
        ...
        ...     def __iter__(self):
        ...         return iter(range(self.start, self.end))
        ...
        >>> # should give same set of data as range(3, 7), i.e., [3, 4, 5, 6].
        >>> ds = MyIterableDataset(start=3, end=7)

        >>> # Single-process loading
        >>> print(list(torch.utils.data.DataLoader(ds, num_workers=0)))
        [3, 4, 5, 6]
        >>>
        >>> # Directly doing multi-process loading yields duplicate data
        >>> print(list(torch.utils.data.DataLoader(ds, num_workers=2)))
        [3, 3, 4, 4, 5, 5, 6, 6]

        >>> # Define a `worker_init_fn` that configures each dataset copy differently
        >>> def worker_init_fn(worker_id):
        ...     worker_info = torch.utils.data.get_worker_info()
        ...     dataset = worker_info.dataset  # the dataset copy in this worker process
        ...     overall_start = dataset.start
        ...     overall_end = dataset.end
        ...     # configure the dataset to only process the split workload
        ...     per_worker = int(math.ceil((overall_end - overall_start) / float(worker_info.num_workers)))
        ...     worker_id = worker_info.id
        ...     dataset.start = overall_start + worker_id * per_worker
        ...     dataset.end = min(dataset.start + per_worker, overall_end)
        ...

        >>> # Multi-process loading with the custom `worker_init_fn`
        >>> # Worker 0 fetched [3, 4].  Worker 1 fetched [5, 6].
        >>> print(list(torch.utils.data.DataLoader(ds, num_workers=2, worker_init_fn=worker_init_fn)))
        [3, 5, 4, 6]

        >>> # With even more workers
        >>> print(list(torch.utils.data.DataLoader(ds, num_workers=12, worker_init_fn=worker_init_fn)))
        [3, 4, 5, 6]
    """

    def __add__(self, other: Dataset[_T_co]):
        return ChainDataset([self, other])


class TensorDataset(Dataset[tuple[Tensor, ...]]):
    r"""Dataset wrapping tensors.

    Each sample will be retrieved by indexing tensors along the first dimension.

    Args:
        *tensors (Tensor): tensors that have the same size of the first dimension.
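
    Example (an illustrative sketch; the tensor shapes and names below are
    placeholders)::

        >>> # xdoctest: +SKIP
        >>> features = torch.randn(100, 5)
        >>> labels = torch.randint(0, 2, (100,))
        >>> ds = TensorDataset(features, labels)
        >>> x, y = ds[0]  # both tensors are indexed along the first dimension
        >>> len(ds)
        100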
    """

    tensors: tuple[Tensor, ...]

    def __init__(self, *tensors: Tensor) -> None:
        assert all(
            tensors[0].size(0) == tensor.size(0) for tensor in tensors
        ), "Size mismatch between tensors"
        self.tensors = tensors

    def __getitem__(self, index):
        return tuple(tensor[index] for tensor in self.tensors)

    def __len__(self):
        return self.tensors[0].size(0)


class StackDataset(Dataset[_T_stack]):
    r"""Dataset as a stacking of multiple datasets.

    This class is useful to assemble different parts of complex input data, given as datasets.

    Example:
        >>> # xdoctest: +SKIP
        >>> images = ImageDataset()
        >>> texts = TextDataset()
        >>> tuple_stack = StackDataset(images, texts)
        >>> tuple_stack[0] == (images[0], texts[0])
        >>> dict_stack = StackDataset(image=images, text=texts)
        >>> dict_stack[0] == {"image": images[0], "text": texts[0]}

    Args:
        *args (Dataset): Datasets for stacking returned as tuple.
        **kwargs (Dataset): Datasets for stacking returned as dict.
    """

    datasets: Union[tuple, dict]

    def __init__(self, *args: Dataset[_T_co], **kwargs: Dataset[_T_co]) -> None:
        if args:
            if kwargs:
                raise ValueError(
                    "Supported either ``tuple``- (via ``args``) or "
                    "``dict``- (via ``kwargs``) like input/output, "
                    "but both types are given."
                )
            self._length = len(args[0])
            if any(self._length != len(dataset) for dataset in args):
                raise ValueError("Size mismatch between datasets")
            self.datasets = args
        elif kwargs:
            tmp = list(kwargs.values())
            self._length = len(tmp[0])
            if any(self._length != len(dataset) for dataset in tmp):
                raise ValueError("Size mismatch between datasets")
            self.datasets = kwargs
        else:
            raise ValueError("At least one dataset should be passed")

    def __getitem__(self, index):
        if isinstance(self.datasets, dict):
            return {k: dataset[index] for k, dataset in self.datasets.items()}
        return tuple(dataset[index] for dataset in self.datasets)

    def __getitems__(self, indices: list):
        # forward batched fetching to the nested datasets when they support it
        if isinstance(self.datasets, dict):
            dict_batch: list[_T_dict] = [{} for _ in indices]
            for k, dataset in self.datasets.items():
                if callable(getattr(dataset, "__getitems__", None)):
                    items = dataset.__getitems__(indices)
                    if len(items) != len(indices):
                        raise ValueError(
                            "Nested dataset's output size mismatch."
                            f" Expected {len(indices)}, got {len(items)}"
                        )
                    for data, d_sample in zip(items, dict_batch):
                        d_sample[k] = data
                else:
                    for idx, d_sample in zip(indices, dict_batch):
                        d_sample[k] = dataset[idx]
            return dict_batch

        # tuple output
        list_batch: list[list] = [[] for _ in indices]
        for dataset in self.datasets:
            if callable(getattr(dataset, "__getitems__", None)):
                items = dataset.__getitems__(indices)
                if len(items) != len(indices):
                    raise ValueError(
                        "Nested dataset's output size mismatch."
                        f" Expected {len(indices)}, got {len(items)}"
                    )
                for data, t_sample in zip(items, list_batch):
                    t_sample.append(data)
            else:
                for idx, t_sample in zip(indices, list_batch):
                    t_sample.append(dataset[idx])
        tuple_batch: list[_T_tuple] = [tuple(sample) for sample in list_batch]
        return tuple_batch

    def __len__(self):
        return self._length


class ConcatDataset(Dataset[_T_co]):
    r"""Dataset as a concatenation of multiple datasets.

    This class is useful to assemble different existing datasets.

    Args:
        datasets (sequence): List of datasets to be concatenated
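
    Example (an illustrative sketch; the wrapped datasets below are
    placeholders)::

        >>> # xdoctest: +SKIP
        >>> first = TensorDataset(torch.arange(3))
        >>> second = TensorDataset(torch.arange(5))
        >>> combined = ConcatDataset([first, second])
        >>> len(combined)
        8
        >>> combined[3]  # the first element of ``second``
        (tensor(0),)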
    """

    datasets: list[Dataset[_T_co]]
    cumulative_sizes: list[int]

    @staticmethod
    def cumsum(sequence):
        # running total of the lengths of the wrapped datasets
        r, s = [], 0
        for e in sequence:
            l = len(e)
            r.append(l + s)
            s += l
        return r

    def __init__(self, datasets: Iterable[Dataset]) -> None:
        super().__init__()
        self.datasets = list(datasets)
        assert len(self.datasets) > 0, "datasets should not be an empty iterable"
        for d in self.datasets:
            assert not isinstance(
                d, IterableDataset
            ), "ConcatDataset does not support IterableDataset"
        self.cumulative_sizes = self.cumsum(self.datasets)

    def __len__(self):
        return self.cumulative_sizes[-1]

    def __getitem__(self, idx):
        if idx < 0:
            if -idx > len(self):
                raise ValueError(
                    "absolute value of index should not exceed dataset length"
                )
            idx = len(self) + idx
        # locate the wrapped dataset containing ``idx`` and the offset within it
        dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        if dataset_idx == 0:
            sample_idx = idx
        else:
            sample_idx = idx - self.cumulative_sizes[dataset_idx - 1]
        return self.datasets[dataset_idx][sample_idx]

    @property
    @deprecated(
        "`cummulative_sizes` attribute is renamed to `cumulative_sizes`",
        category=FutureWarning,
    )
    def cummulative_sizes(self):
        return self.cumulative_sizes


class ChainDataset(IterableDataset):
    r"""Dataset for chaining multiple :class:`IterableDataset` s.

    This class is useful to assemble different existing dataset streams. The
    chaining operation is done on-the-fly, so concatenating large-scale
    datasets with this class will be efficient.

    Args:
        datasets (iterable of IterableDataset): datasets to be chained together
    """

    def __init__(self, datasets: Iterable[Dataset]) -> None:
        super().__init__()
        self.datasets = datasets

    def __iter__(self):
        for d in self.datasets:
            assert isinstance(
                d, IterableDataset
            ), "ChainDataset only supports IterableDataset"
            yield from d

    def __len__(self):
        total = 0
        for d in self.datasets:
            assert isinstance(
                d, IterableDataset
            ), "ChainDataset only supports IterableDataset"
            total += len(d)
        return total


class Subset(Dataset[_T_co]):
    r"""
    Subset of a dataset at specified indices.

    Args:
        dataset (Dataset): The whole Dataset
        indices (sequence): Indices in the whole set selected for subset
    """

    dataset: Dataset[_T_co]
    indices: Sequence[int]

    def __init__(self, dataset: Dataset[_T_co], indices: Sequence[int]) -> None:
        self.dataset = dataset
        self.indices = indices

    def __getitem__(self, idx):
        if isinstance(idx, list):
            return self.dataset[[self.indices[i] for i in idx]]
        return self.dataset[self.indices[idx]]

    def __getitems__(self, indices: list[int]) -> list[_T_co]:
        # forward batched fetching to the wrapped dataset when it supports it
        if callable(getattr(self.dataset, "__getitems__", None)):
            return self.dataset.__getitems__([self.indices[idx] for idx in indices])
        return [self.dataset[self.indices[idx]] for idx in indices]

    def __len__(self):
        return len(self.indices)


def random_split(
    dataset: Dataset[_T],
    lengths: Sequence[Union[int, float]],
    generator: Optional[Generator] = default_generator,
) -> list[Subset[_T]]:
    r"""
    Randomly split a dataset into non-overlapping new datasets of given lengths.

    If a list of fractions that sum up to 1 is given,
    the lengths will be computed automatically as
    floor(frac * len(dataset)) for each fraction provided.

    After computing the lengths, if there are any remainders, 1 count will be
    distributed in round-robin fashion to the lengths
    until there are no remainders left.

    Optionally fix the generator for reproducible results, e.g.:

    Example:
        >>> # xdoctest: +SKIP
        >>> generator1 = torch.Generator().manual_seed(42)
        >>> generator2 = torch.Generator().manual_seed(42)
        >>> random_split(range(10), [3, 7], generator=generator1)
        >>> random_split(range(30), [0.3, 0.3, 0.4], generator=generator2)

    Args:
        dataset (Dataset): Dataset to be split
        lengths (sequence): lengths or fractions of splits to be produced
        generator (Generator): Generator used for the random permutation.
    """
    if math.isclose(sum(lengths), 1) and sum(lengths) <= 1:
        # fractional lengths: convert to integer counts
        subset_lengths: list[int] = []
        for i, frac in enumerate(lengths):
            if frac < 0 or frac > 1:
                raise ValueError(f"Fraction at index {i} is not between 0 and 1")
            n_items_in_split = int(math.floor(len(dataset) * frac))
            subset_lengths.append(n_items_in_split)
        remainder = len(dataset) - sum(subset_lengths)
        # add 1 to the lengths in round-robin fashion until the remainder is 0
        for i in range(remainder):
            idx_to_add_at = i % len(subset_lengths)
            subset_lengths[idx_to_add_at] += 1
        lengths = subset_lengths
        for i, length in enumerate(lengths):
            if length == 0:
                warnings.warn(
                    f"Length of split at index {i} is 0. "
                    "This might result in an empty dataset."
                )

    if sum(lengths) != len(dataset):
        raise ValueError(
            "Sum of input lengths does not equal the length of the input dataset!"
        )

    indices = randperm(sum(lengths), generator=generator).tolist()
    lengths = cast(Sequence[int], lengths)
    return [
        Subset(dataset, indices[offset - length : offset])
        for offset, length in zip(itertools.accumulate(lengths), lengths)
    ]
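

# Usage sketch (illustrative only, not part of this module's API): splitting a
# TensorDataset into reproducible train/validation Subset objects with
# random_split. The variable names and sizes are placeholders.
#
#     import torch
#     from torch.utils.data import TensorDataset, random_split
#
#     full = TensorDataset(torch.randn(10, 3), torch.arange(10))
#     train, val = random_split(
#         full, [0.8, 0.2], generator=torch.Generator().manual_seed(0)
#     )
#     assert len(train) + len(val) == len(full)
#     x, y = train[0]  # indexes back into ``full`` through the subset's indices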