`L i, dZddlZddlZddlmZmZddlmZddlm Z ddl m Z ddl m Z mZddlmZmZddlZdd lmZd d lmZmZmZmZd d lmZmZmZd d lm Z m!Z!m"Z"d dl#m$Z$d dl%m&Z&d dl'm(Z(d dl)m*Z*m+Z+m,Z,gdZ-GddZ.Gdde&Z/Gdde&eZ0Gdde.e0Z1Gdde.e0Z2Gdde0eZ3Gd d!e.e3Z4Gd"d#e/e3Z5Gd$d%e3Z6Gd&d'e/e3Z7Gd(d)e3Z8Gd*d+e/e0Z9Gd,d-e/e0Z:Gd.d/e&eZ;Gd0d1e.e;Z<Gd2d3e.e;Z=Gd4d5e&eZ>Gd6d7e.e>Z?Gd8d9e/e>Z@Gd:d;e>ZAdSd<ZBGd=d>e0ZCGd?d@e0ZDdTdAdBdCZEe"e e!ddDdEFe ejdDddGFdge e!ddDdEFe ejdDddGFdgdHgdIgdJdgdKdLMddddLddKdNZGeHeGdOdAdeIfdPZJdQZKdRZLy)Uz The :mod:`sklearn.model_selection._split` module includes classes and functions to split the data based on a preset strategy. N)ABCMetaabstractmethod) defaultdict)Iterable) signature)chain combinations)ceilfloor)comb)_safe_indexingcheck_random_state indexablemetadata_routing)_convert_to_numpyensure_common_namespace_device get_namespace)Interval RealNotIntvalidate_params)_approximate_mode)_MetadataRequester)type_of_target) _num_samples check_array column_or_1d)BaseCrossValidator GroupKFoldGroupShuffleSplitKFoldLeaveOneGroupOut LeaveOneOutLeavePGroupsOut LeavePOutPredefinedSplit RepeatedKFoldRepeatedStratifiedKFold ShuffleSplitStratifiedGroupKFoldStratifiedKFoldStratifiedShuffleSplitcheck_cvtrain_test_splitc$eZdZdZdfd ZxZS)_UnsupportedGroupCVMixinz/Mixin for splitters that do not support Groups.c|1tjd|jjtt ||||S)aGenerate indices to split data into training and test set. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. y : array-like of shape (n_samples,) The target variable for supervised learning problems. groups : object Always ignored, exists for compatibility. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. 
#The groups parameter is ignored by groups)warningswarn __class____name__ UserWarningsupersplitselfXyr4r7s d/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/sklearn/model_selection/_split.pyr;z_UnsupportedGroupCVMixin.split>sF.   MM5dnn6M6M5NO w}Q&}11NN)r8 __module__ __qualname____doc__r; __classcell__r7s@r@r0r0;s922rAr0ceZdZdZddiZy)GroupsConsumerMixinzA Mixin to ``groups`` by default. This Mixin makes the object to request ``groups`` by default as ``True``. .. versionadded:: 1.3 r4TN)r8rCrDrE-_GroupsConsumerMixin__metadata_request__splitrAr@rIrI]s"*4 0rArIc\eZdZdZdej iZd dZd dZd dZ e d dZ dZ y) rzvBase class for all cross-validators. Implementations must define `_iter_test_masks` or `_iter_test_indices`. r4Nc#Kt|||\}}}tjt|}|j |||D]%}|tj |}||}||f'yw)aGenerate indices to split data into training and test set. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. y : array-like of shape (n_samples,) The target variable for supervised learning problems. groups : array-like of shape (n_samples,), default=None Group labels for the samples used while splitting the dataset into train/test set. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. N)rnparanger_iter_test_masks logical_not)r=r>r?r4indices test_index train_indexs r@r;zBaseCrossValidator.splittst0!Av. 1f))LO,//1f= *J!".."<=K ,Jz) ) *sA,A.c#K|j|||D]/}tjt|t}d||<|1yw)zGenerates boolean masks corresponding to test sets. By default, delegates to _iter_test_indices(X, y, groups) dtypeTN)_iter_test_indicesrNzerosrbool)r=r>r?r4rS test_masks r@rPz#BaseCrossValidator._iter_test_maskssI 11!Q? 
Ja=I$(Ij !O sAA ct)z5Generates integer indices corresponding to test sets.)NotImplementedErrorr=r>r?r4s r@rXz%BaseCrossValidator._iter_test_indicess!!rAcy)zBReturns the number of splitting iterations in the cross-validator.NrKr^s r@ get_n_splitszBaseCrossValidator.get_n_splitssrAct|SN _build_reprr=s r@__repr__zBaseCrossValidator.__repr__ 4  rArBNNN) r8rCrDrErUNUSED,_BaseCrossValidator__metadata_request__splitr;rPrXrr`rfrKrAr@rrhsF"*+;+B+B C*B"QQ!rAr) metaclassc eZdZdZddZddZy)r#aLeave-One-Out cross-validator. Provides train/test indices to split data in train/test sets. Each sample is used once as a test set (singleton) while the remaining samples form the training set. Note: ``LeaveOneOut()`` is equivalent to ``KFold(n_splits=n)`` and ``LeavePOut(p=1)`` where ``n`` is the number of samples. Due to the high number of test sets (which is the same as the number of samples) this cross-validation method can be very costly. For large datasets one should favor :class:`KFold`, :class:`ShuffleSplit` or :class:`StratifiedKFold`. Read more in the :ref:`User Guide `. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import LeaveOneOut >>> X = np.array([[1, 2], [3, 4]]) >>> y = np.array([1, 2]) >>> loo = LeaveOneOut() >>> loo.get_n_splits(X) 2 >>> print(loo) LeaveOneOut() >>> for i, (train_index, test_index) in enumerate(loo.split(X)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" Test: index={test_index}") Fold 0: Train: index=[1] Test: index=[0] Fold 1: Train: index=[0] Test: index=[1] See Also -------- LeaveOneGroupOut : For splitting the data according to explicit, domain-specific stratification of the dataset. GroupKFold : K-fold iterator variant with non-overlapping groups. Nclt|}|dkrtdj|t|S)Nz-Cannot perform LeaveOneOut with n_samples={}.)r ValueErrorformatrange)r=r>r?r4 n_sampless r@rXzLeaveOneOut._iter_test_indicess: O >?FFyQ YrAc2| tdt|S)aHReturns the number of splitting iterations in the cross-validator. 
Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns ------- n_splits : int Returns the number of splitting iterations in the cross-validator. %The 'X' parameter should not be None.)rorr^s r@r`zLeaveOneOut.get_n_splitss( 9DE EArArB)r8rCrDrErXr`rKrAr@r#r#s+Z rAr#c&eZdZdZdZddZddZy)r%aLeave-P-Out cross-validator. Provides train/test indices to split data in train/test sets. This results in testing on all distinct samples of size p, while the remaining n - p samples form the training set in each iteration. Note: ``LeavePOut(p)`` is NOT equivalent to ``KFold(n_splits=n_samples // p)`` which creates non-overlapping test sets. Due to the high number of iterations which grows combinatorically with the number of samples this cross-validation method can be very costly. For large datasets one should favor :class:`KFold`, :class:`StratifiedKFold` or :class:`ShuffleSplit`. Read more in the :ref:`User Guide `. Parameters ---------- p : int Size of the test sets. Must be strictly less than the number of samples. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import LeavePOut >>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]]) >>> y = np.array([1, 2, 3, 4]) >>> lpo = LeavePOut(2) >>> lpo.get_n_splits(X) 6 >>> print(lpo) LeavePOut(p=2) >>> for i, (train_index, test_index) in enumerate(lpo.split(X)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... 
print(f" Test: index={test_index}") Fold 0: Train: index=[2 3] Test: index=[0 1] Fold 1: Train: index=[1 3] Test: index=[0 2] Fold 2: Train: index=[1 2] Test: index=[0 3] Fold 3: Train: index=[0 3] Test: index=[1 2] Fold 4: Train: index=[0 2] Test: index=[1 3] Fold 5: Train: index=[0 1] Test: index=[2 3] c||_yrb)p)r=rws r@__init__zLeavePOut.__init__4s rANc#Kt|}||jkr%tdj|j|t t ||jD]}t j|yw)Nz8p={} must be strictly less than the number of samples={})rrwrorpr rqrNarray)r=r>r?r4rr combinations r@rXzLeavePOut._iter_test_indices7so O  JQQFFI  (i(8$&&A (K((;' ' (sAcp| tdttt||jdS)aReturns the number of splitting iterations in the cross-validator. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. rtTexact)rointr rrwr^s r@r`zLeavePOut.get_n_splitsBs1 9DE E4 Qt<==rArB)r8rCrDrErxrXr`rKrAr@r%r%s7r (>rAr%c<eZdZdZedZdfd ZddZxZS) _BaseKFoldz;Base class for K-Fold cross-validators and TimeSeriesSplit.c^t|tjstd|dt |dt |}|dkrtdj |t|tstdj ||s | td||_ ||_ ||_ y)Nz.The number of folds must be of Integral type. z of type z was passed.rnzok-fold cross-validation requires at least one train/test split by setting n_splits=2 or more, got n_splits={0}.z&shuffle must be True or False; got {0}zSetting a random_state has no effect since shuffle is False. You should leave random_state to its default (None), or set shuffle=True.) isinstancenumbersIntegralrotyperrprZ TypeErrorn_splitsshuffle random_state)r=rrrs r@rxz_BaseKFold.__init__Ys(G$4$45/7hI x= q=%%+VH%5  '4(DKKGTU U<3O !  (rAc#Kt|||\}}}t|}|j|kDr%tdj |j|t ||||D] \}}||f yw)aGenerate indices to split data into training and test set. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. 
y : array-like of shape (n_samples,), default=None The target variable for supervised learning problems. groups : array-like of shape (n_samples,), default=None Group labels for the samples used while splitting the dataset into train/test set. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. z\Cannot have number of splits n_splits={0} greater than the number of samples: n_samples={1}.N)rrrrorpr:r;)r=r>r?r4rrtraintestr7s r@r;z_BaseKFold.splitys0!Av. 1f O ==9 $B& 2  !7=Av6 KE4+  sA1A4c|jSaReturns the number of splitting iterations in the cross-validator. Parameters ---------- X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns ------- n_splits : int Returns the number of splitting iterations in the cross-validator. rr^s r@r`z_BaseKFold.get_n_splits&}}rArBrh) r8rCrDrErrxr;r`rFrGs@r@rrVs#E))>#JrArc4eZdZdZddddfd ZddZxZS) r!aB K-Fold cross-validator. Provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default). Each fold is then used once as a validation while the k - 1 remaining folds form the training set. Read more in the :ref:`User Guide `. For visualisation of cross-validation behaviour and comparison between common scikit-learn split methods refer to :ref:`sphx_glr_auto_examples_model_selection_plot_cv_indices.py` Parameters ---------- n_splits : int, default=5 Number of folds. Must be at least 2. .. versionchanged:: 0.22 ``n_splits`` default value changed from 3 to 5. shuffle : bool, default=False Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled. random_state : int, RandomState instance or None, default=None When `shuffle` is True, `random_state` affects the ordering of the indices, which controls the randomness of each fold. 
Otherwise, this parameter has no effect. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import KFold >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]]) >>> y = np.array([1, 2, 3, 4]) >>> kf = KFold(n_splits=2) >>> kf.get_n_splits(X) 2 >>> print(kf) KFold(n_splits=2, random_state=None, shuffle=False) >>> for i, (train_index, test_index) in enumerate(kf.split(X)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" Test: index={test_index}") Fold 0: Train: index=[2 3] Test: index=[0 1] Fold 1: Train: index=[0 1] Test: index=[2 3] Notes ----- The first ``n_samples % n_splits`` folds have size ``n_samples // n_splits + 1``, other folds have size ``n_samples // n_splits``, where ``n_samples`` is the number of samples. Randomized CV splitters may return different results for each call of split. You can make the results identical by setting `random_state` to an integer. See Also -------- StratifiedKFold : Takes class information into account to avoid building folds with imbalanced class distributions (for binary or multiclass classification tasks). GroupKFold : K-fold iterator variant with non-overlapping groups. RepeatedKFold : Repeats K-Fold n times. FNrrc*t||||yN)rrrr:rxr=rrrr7s r@rxzKFold.__init__ (G,WrAc#VKt|}tj|}|jr$t |j j||j }tj|||zt}|d||zxxxdz cccd}|D]} ||| z} } || | | }yw)NrVrnr) rrNrOrrrrfullr) r=r>r?r4rrrRr fold_sizescurrent fold_sizestartstops r@rXzKFold._iter_test_indicess O ))I& << t00 1 9 9' B==WWXyH'eZdZdZddddfd ZdZd fd ZxZS) ra K-fold iterator variant with non-overlapping groups. Each group will appear exactly once in the test set across all folds (the number of distinct groups has to be at least equal to the number of folds). The folds are approximately balanced in the sense that the number of samples is approximately the same in each test fold when `shuffle` is True. Read more in the :ref:`User Guide `. 
For visualisation of cross-validation behaviour and comparison between common scikit-learn split methods refer to :ref:`sphx_glr_auto_examples_model_selection_plot_cv_indices.py` Parameters ---------- n_splits : int, default=5 Number of folds. Must be at least 2. .. versionchanged:: 0.22 ``n_splits`` default value changed from 3 to 5. shuffle : bool, default=False Whether to shuffle the groups before splitting into batches. Note that the samples within each split will not be shuffled. .. versionadded:: 1.6 random_state : int, RandomState instance or None, default=None When `shuffle` is True, `random_state` affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. .. versionadded:: 1.6 Notes ----- Groups appear in an arbitrary order throughout the folds. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import GroupKFold >>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]) >>> y = np.array([1, 2, 3, 4, 5, 6]) >>> groups = np.array([0, 0, 2, 2, 3, 3]) >>> group_kfold = GroupKFold(n_splits=2) >>> group_kfold.get_n_splits(X, y, groups) 2 >>> print(group_kfold) GroupKFold(n_splits=2, random_state=None, shuffle=False) >>> for i, (train_index, test_index) in enumerate(group_kfold.split(X, y, groups)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}, group={groups[train_index]}") ... print(f" Test: index={test_index}, group={groups[test_index]}") Fold 0: Train: index=[2 3], group=[2 2] Test: index=[0 1 4 5], group=[0 0 3 3] Fold 1: Train: index=[0 1 4 5], group=[0 0 3 3] Test: index=[2 3], group=[2 2] See Also -------- LeaveOneGroupOut : For splitting the data according to explicit domain-specific stratification of the dataset. 
StratifiedKFold : Takes class information into account to avoid building folds with imbalanced class proportions (for binary or multiclass classification tasks). FNrc*t||||y)Nrrrs r@rxzGroupKFold.__init___s 7NrAc#K| tdt|ddd}tj|d\}}t |}|j |kDrtd|j |fz|j r~t|j}|j|}tj||j }|D]2} tj|| } tj| d4ytj|} tj| ddd } | | } tj|j } tjt |}t!| D]/\}}tj"| }| |xx|z cc<||| |<1||} t%|j D]}tj| |k(d!yw) N*The 'groups' parameter should not be None.r4F input_name ensure_2drWTreturn_inversezOCannot have number of splits n_splits=%d greater than the number of groups: %d.r)rorrNuniquelenrrrr permutation array_splitisinwherebincountargsortrY enumerateargminrq)r=r>r?r4 unique_groups group_idxn_groupsrng split_groupstest_group_idsr[n_samples_per_grouprRn_samples_per_fold group_to_fold group_indexweight lightest_foldfs r@rXzGroupKFold._iter_test_indicesbs >IJ JVEQUV#%99VD#I y}% ==8 #259]]H4MN  <<$T%6%67COOM:M>>-GL". -GGFN; hhy)!,, - #%++i"8 jj!45dd;G"5g"> "$$--!8 HHS%78M(11D'E D# V " *< = "=1V;16C gk23 D $I.G4==) 0hhw!|,Q// 0sG!G#c&t||||SaGenerate indices to split data into training and test set. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. y : array-like of shape (n_samples,), default=None The target variable for supervised learning problems. groups : array-like of shape (n_samples,) Group labels for the samples used while splitting the dataset into train/test set. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. r:r;r<s r@r;zGroupKFold.split0w}Q6**rArrB)r8rCrDrErxrXr;rFrGs@r@rrs*HTOe$O/0b++rArcHeZdZdZd dddfd Zd dZd dZd fd ZxZS) r+a Class-wise stratified K-Fold cross-validator. Provides train/test indices to split data in train/test sets. This cross-validation object is a variation of KFold that returns stratified folds. 
The folds are made by preserving the percentage of samples for each class in `y` in a binary or multiclass classification setting. Read more in the :ref:`User Guide `. For visualisation of cross-validation behaviour and comparison between common scikit-learn split methods refer to :ref:`sphx_glr_auto_examples_model_selection_plot_cv_indices.py` .. note:: Stratification on the class label solves an engineering problem rather than a statistical one. See :ref:`stratification` for more details. Parameters ---------- n_splits : int, default=5 Number of folds. Must be at least 2. .. versionchanged:: 0.22 ``n_splits`` default value changed from 3 to 5. shuffle : bool, default=False Whether to shuffle each class's samples before splitting into batches. Note that the samples within each split will not be shuffled. random_state : int, RandomState instance or None, default=None When `shuffle` is True, `random_state` affects the ordering of the indices, which controls the randomness of each fold for each class. Otherwise, leave `random_state` as `None`. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import StratifiedKFold >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]]) >>> y = np.array([0, 0, 1, 1]) >>> skf = StratifiedKFold(n_splits=2) >>> skf.get_n_splits(X, y) 2 >>> print(skf) StratifiedKFold(n_splits=2, random_state=None, shuffle=False) >>> for i, (train_index, test_index) in enumerate(skf.split(X, y)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" Test: index={test_index}") Fold 0: Train: index=[1 3] Test: index=[0 2] Fold 1: Train: index=[0 2] Test: index=[1 3] Notes ----- The implementation is designed to: * Generate test sets such that all contain the same distribution of classes, or as close as possible. 
* Be invariant to class label: relabelling ``y = ["Happy", "Sad"]`` to ``y = [1, 0]`` should not change the indices generated. * Preserve order dependencies in the dataset ordering, when ``shuffle=False``: all samples from class k in some test set were contiguous in y, or separated in y by samples from classes other than k. * Generate test sets where the smallest and largest differ by at most one sample. .. versionchanged:: 0.22 The previous implementation did not follow the last constraint. See Also -------- RepeatedStratifiedKFold : Repeats Stratified K-Fold n times. FNrc*t||||yrrrs r@rxzStratifiedKFold.__init__rrAc t|j}t|\}}|r t||}nt j |}t |}d}||vrtdj||t|}t j|dd\}} } t j| d\}} | | } t| } t j| }t j|}t j|j|kDrtd|jz|j|kDr)t!j"d||jfzt$t j&| }t j t)|jDcgc])}t j||d|j| +c}}t j*t|d }t)| D]\}t j,|jj/|dd|f}|j0r|j1|||| |k(<^|Scc}w) Nbinary multiclass1Supported target types are: {}. Got {!r} instead.T) return_indexrrGn_splits=%d cannot be greater than the number of members in each class.SThe least populated class in y has only %d members, which is less than n_splits=%d.) minlengthirV)rrrrrNasarrayrrorprrrrminallrr5r6r9sortrqemptyrOrepeatr)r=r>r?rxp is_array_apitype_of_target_yallowed_target_types_y_idxy_inv class_perm y_encoded n_classesy_counts min_groupsy_orderr allocation test_foldskfolds_for_classs r@_make_test_foldsz StratifiedKFold._make_test_foldss& !2!23 )+L !!R(A 1 A)!,7 #7 7CJJ(*:  O))ADN5% %= :u% J ;;y)VVH% 66$--(* +47;}}F  ==: % MM<t}}-.  '')$ZZt}}-  GA$6$679M  XXc!fC0 y! 9A!ii 6==jA>NOO|| O,)8JyA~ & 9% s.Ic#rK|j||}t|jD] }||k( ywrb)rrqr)r=r>r?r4rrs r@rPz StratifiedKFold._iter_test_masksKs:**1a0 t}}% "A/ ! "s57c|1tjd|jjtt |ddd}t ||||S)mGenerate indices to split data into training and test set. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. 
Note that providing ``y`` is sufficient to generate the splits and hence ``np.zeros(n_samples)`` may be used as a placeholder for ``X`` instead of actual training data. y : array-like of shape (n_samples,) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. Notes ----- Randomized CV splitters may return different results for each call of split. You can make the results identical by setting `random_state` to an integer. Nr2r?Frr5r6r7r8r9rr:r;r<s r@r;zStratifiedKFold.splitPUD   MM5dnn6M6M5NO  cU$ Gw}Q6**rArrbrB) r8rCrDrErxrrPr;rFrGs@r@r+r+s0QfXe$XDL" (+(+rAr+c0eZdZdZdfd ZdZdZxZS)r*aqClass-wise stratified K-Fold iterator variant with non-overlapping groups. This cross-validation object is a variation of StratifiedKFold attempts to return stratified folds with non-overlapping groups. The folds are made by preserving the percentage of samples for each class in `y` in a binary or multiclass classification setting. Each group will appear exactly once in the test set across all folds (the number of distinct groups has to be at least equal to the number of folds). The difference between :class:`GroupKFold` and `StratifiedGroupKFold` is that the former attempts to create balanced folds such that the number of distinct groups is approximately the same in each fold, whereas `StratifiedGroupKFold` attempts to create folds which preserve the percentage of samples for each class as much as possible given the constraint of non-overlapping groups between splits. Read more in the :ref:`User Guide `. For visualisation of cross-validation behaviour and comparison between common scikit-learn split methods refer to :ref:`sphx_glr_auto_examples_model_selection_plot_cv_indices.py` .. 
note:: Stratification on the class label solves an engineering problem rather than a statistical one. See :ref:`stratification` for more details. Parameters ---------- n_splits : int, default=5 Number of folds. Must be at least 2. shuffle : bool, default=False Whether to shuffle each class's samples before splitting into batches. Note that the samples within each split will not be shuffled. This implementation can only shuffle groups that have approximately the same y distribution, no global shuffle will be performed. random_state : int or RandomState instance, default=None When `shuffle` is True, `random_state` affects the ordering of the indices, which controls the randomness of each fold for each class. Otherwise, leave `random_state` as `None`. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import StratifiedGroupKFold >>> X = np.ones((17, 2)) >>> y = np.array([0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]) >>> groups = np.array([1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 5, 6, 6, 7, 8, 8]) >>> sgkf = StratifiedGroupKFold(n_splits=3) >>> sgkf.get_n_splits(X, y) 3 >>> print(sgkf) StratifiedGroupKFold(n_splits=3, random_state=None, shuffle=False) >>> for i, (train_index, test_index) in enumerate(sgkf.split(X, y, groups)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" group={groups[train_index]}") ... print(f" Test: index={test_index}") ... 
print(f" group={groups[test_index]}") Fold 0: Train: index=[ 0 1 2 3 7 8 9 10 11 15 16] group=[1 1 2 2 4 5 5 5 5 8 8] Test: index=[ 4 5 6 12 13 14] group=[3 3 3 6 6 7] Fold 1: Train: index=[ 4 5 6 7 8 9 10 11 12 13 14] group=[3 3 3 4 5 5 5 5 6 6 7] Test: index=[ 0 1 2 3 15 16] group=[1 1 2 2 8 8] Fold 2: Train: index=[ 0 1 2 3 4 5 6 12 13 14 15 16] group=[1 1 2 2 3 3 3 6 6 7 8 8] Test: index=[ 7 8 9 10 11] group=[4 5 5 5 5] Notes ----- The implementation is designed to: * Mimic the behavior of StratifiedKFold as much as possible for trivial groups (e.g. when each group contains only one sample). * Be invariant to class label: relabelling ``y = ["Happy", "Sad"]`` to ``y = [1, 0]`` should not change the indices generated. * Stratify based on samples as much as possible while keeping non-overlapping groups constraint. That means that in some cases when there is a small number of groups containing a large number of samples the stratification will not be possible and the behavior will be close to GroupKFold. See also -------- StratifiedKFold: Takes class information into account to build folds which retain class distributions (for binary or multiclass classification tasks). GroupKFold: K-fold iterator variant with non-overlapping groups. 
c*t||||yrrrs r@rxzStratifiedGroupKFold.__init__rrAc#Kt|j}tj|}t |}d}||vrt dj ||t|}tj|dd\}}} tj|j| kDrt d|jztj| } |j| kDr)tjd| |jfztt| } tj|dd\}} } tj t| | f}t#|| D]\}}|||fxxdz cc<tj |j| f}t%t&}|j(r|j)|tj*tj,|d d }|D]<}||}|j/|| | }||xx|z cc<||j1|>t3|jD]*}t5| Dcgc]\}}|||vr|}}}|,ycc}}ww) NrrT)r return_countsrrrnaxis mergesortkind)y_counts_per_foldy_cntgroup_y_counts)rrrNrrrorprrrrrr5r6r9rrYziprsetrrstd_find_best_foldaddrqr)r=r>r?r4rrrrrrn_smallest_classr groups_inv groups_cnty_counts_per_group class_idxrrgroups_per_foldsorted_groups_idxr best_foldridx test_indicess r@rXz'StratifiedGroupKFold._iter_test_indicess !!2!23 JJqM)!,7 #7 7CJJ(*:  O))Ad$O5% 66$--%' (47;}}F 66%= ==+ + MM<#T]]34   J $&II 4t% !:z XXs: &BC$'z$: : Iy y)3 4 9 4 :HHdmmY%?@%c* << KK* +JJ VV&Q / /k + 6I/ :N,,"3--I i (N : ( I & * *9 5 6t}}% A'0 &;"C 22L    sII5I/%I5cd}tj}tj}t|jD]}||xx|z cc<tj||j ddz d}||xx|zcc<tj |} tj||} | |kxstj| |xr| |k} | s| }| }|}|S)Nrnrrr) rNinfrqrrreshapemeansumisclose) r=rrrrmin_evalmin_samples_in_foldr std_per_class fold_evalsamples_in_foldis_current_fold_betters r@rz$StratifiedGroupKFold._find_best_fold8s 66 fft}}% A a N 2 FF#4u}}Q7K#KRSTM a N 2  .I ff%6q%9:O%.%9& 9h/:#&99 #&$&5#  rA)rFN)r8rCrDrErxrXrrFrGs@r@r*r*{sfPXObrAr*c<eZdZdZdddddfd Zd dZdZxZS) TimeSeriesSplitaqTime Series cross-validator. Provides train/test indices to split time-ordered data, where other cross-validation methods are inappropriate, as they would lead to training on future data and evaluating on past data. To ensure comparable metrics across folds, samples must be equally spaced. Once this condition is met, each test set covers the same time duration, while the train set size accumulates data from previous splits. This cross-validation object is a variation of :class:`KFold`. In the k-th split, it returns the first k folds as the train set and the (k+1)-th fold as the test set. Note that, unlike standard cross-validation methods, successive training sets are supersets of those that come before them. 
Read more in the :ref:`User Guide `. For visualisation of cross-validation behaviour and comparison between common scikit-learn split methods refer to :ref:`sphx_glr_auto_examples_model_selection_plot_cv_indices.py` .. versionadded:: 0.18 Parameters ---------- n_splits : int, default=5 Number of splits. Must be at least 2. .. versionchanged:: 0.22 ``n_splits`` default value changed from 3 to 5. max_train_size : int, default=None Maximum size for a single training set. test_size : int, default=None Used to limit the size of the test set. Defaults to ``n_samples // (n_splits + 1)``, which is the maximum allowed value with ``gap=0``. .. versionadded:: 0.24 gap : int, default=0 Number of samples to exclude from the end of each train set before the test set. .. versionadded:: 0.24 Examples -------- >>> import numpy as np >>> from sklearn.model_selection import TimeSeriesSplit >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]]) >>> y = np.array([1, 2, 3, 4, 5, 6]) >>> tscv = TimeSeriesSplit() >>> print(tscv) TimeSeriesSplit(gap=0, max_train_size=None, n_splits=5, test_size=None) >>> for i, (train_index, test_index) in enumerate(tscv.split(X)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" Test: index={test_index}") Fold 0: Train: index=[0] Test: index=[1] Fold 1: Train: index=[0 1] Test: index=[2] Fold 2: Train: index=[0 1 2] Test: index=[3] Fold 3: Train: index=[0 1 2 3] Test: index=[4] Fold 4: Train: index=[0 1 2 3 4] Test: index=[5] >>> # Fix test_size to 2 with 12 samples >>> X = np.random.randn(12, 2) >>> y = np.random.randint(0, 2, 12) >>> tscv = TimeSeriesSplit(n_splits=3, test_size=2) >>> for i, (train_index, test_index) in enumerate(tscv.split(X)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... 
print(f" Test: index={test_index}") Fold 0: Train: index=[0 1 2 3 4 5] Test: index=[6 7] Fold 1: Train: index=[0 1 2 3 4 5 6 7] Test: index=[8 9] Fold 2: Train: index=[0 1 2 3 4 5 6 7 8 9] Test: index=[10 11] >>> # Add in a 2 period gap >>> tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=2) >>> for i, (train_index, test_index) in enumerate(tscv.split(X)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" Test: index={test_index}") Fold 0: Train: index=[0 1 2 3] Test: index=[6 7] Fold 1: Train: index=[0 1 2 3 4 5] Test: index=[8 9] Fold 2: Train: index=[0 1 2 3 4 5 6 7] Test: index=[10 11] For a more extended example see :ref:`sphx_glr_auto_examples_applications_plot_cyclical_feature_engineering.py`. Notes ----- The training set has size ``i * n_samples // (n_splits + 1) + n_samples % (n_splits + 1)`` in the ``i`` th split, with a test set of size ``n_samples//(n_splits + 1)`` by default, where ``n_samples`` is the number of samples. Note that this formula is only valid when ``test_size`` and ``max_train_size`` are left to their default values. Nr)max_train_size test_sizegapcTt||dd||_||_||_y)NFr)r:rxrrr)r=rrrrr7s r@rxzTimeSeriesSplit.__init__s- 5tD,"rAc|1tjd|jjt|j |S)aGenerate indices to split data into training and test set. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. y : array-like of shape (n_samples,) Always ignored, exists for compatibility. groups : array-like of shape (n_samples,) Always ignored, exists for compatibility. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. r2r5r6r7r8r9_splitr^s r@r;zTimeSeriesSplit.splits>.   
MM5dnn6M6M5NO {{1~rAc # Kt|\}t|}|j}|dz}|j}|j |jn||z}||kDrt d|d|d||z ||zz dkrt d|d|d |d |d t j|}t|||zz ||}|D]N} | |z } |jr,|j| kr|| |jz | || | |zf@|d| || | |zfPyw) aGenerate indices to split data into training and test set. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. rnNzCannot have number of folds=z$ greater than the number of samples=.rzToo many splits=z for number of samples=z with test_size=z and gap=) rrrrrrorNrOrqr) r=r>rrrn_foldsrrrR test_starts test_start train_ends r@rzTimeSeriesSplit._splitsw"| O ==Q,hh"nn8DNNi7>R  Y .wi8//8k<  s?i(2 3q 8"8*-;.yk3%qJ  ))I&I9(<`. Notes ----- Splits are ordered according to the index of the group left out. The first split has testing set consisting of the group whose index in `groups` is lowest, and so on. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import LeaveOneGroupOut >>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]]) >>> y = np.array([1, 2, 1, 2]) >>> groups = np.array([1, 1, 2, 2]) >>> logo = LeaveOneGroupOut() >>> logo.get_n_splits(X, y, groups) 2 >>> logo.get_n_splits(groups=groups) # 'groups' is always required 2 >>> print(logo) LeaveOneGroupOut() >>> for i, (train_index, test_index) in enumerate(logo.split(X, y, groups)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}, group={groups[train_index]}") ... print(f" Test: index={test_index}, group={groups[test_index]}") Fold 0: Train: index=[2 3], group=[2 2] Test: index=[0 1], group=[1 1] Fold 1: Train: index=[0 1], group=[1 1] Test: index=[2 3], group=[2 2] See also -------- GroupKFold: K-fold iterator variant with non-overlapping groups. c#K| tdt|dddd}tj|}t |dkrtd|z|D] }||k( yw)Nrr4TFrcopyrrWrnzcThe groups parameter contains fewer than 2 unique groups (%s). 
LeaveOneGroupOut expects at least 2.)rorrNrr)r=r>r?r4rrs r@rPz!LeaveOneGroupOut._iter_test_masksUs >IJ J xde4  &) }  "=?LM  AA+  sAA cv| tdt|ddd}ttj|S)Returns the number of splitting iterations in the cross-validator. Parameters ---------- X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : array-like of shape (n_samples,) Group labels for the samples used while splitting the dataset into train/test set. This 'groups' parameter must always be specified to calculate the number of splits, though the other parameters can be omitted. Returns ------- n_splits : int Returns the number of splitting iterations in the cross-validator. Nrr4Fr)rorrrNrr^s r@r`zLeaveOneGroupOut.get_n_splitses:, >IJ JVEQUV299V$%%rAc&t||||Srrr<s r@r;zLeaveOneGroupOut.splitrrArhrB)r8rCrDrErPr`r;rFrGs@r@r"r"$s.` &6++rAr"c8eZdZdZdZdZddZdfd ZxZS)r$acLeave P Group(s) Out cross-validator. Provides train/test indices to split data according to a third-party provided group. This group information can be used to encode arbitrary domain specific stratifications of the samples as integers. For instance the groups could be the year of collection of the samples and thus allow for cross-validation against time-based splits. The difference between LeavePGroupsOut and LeaveOneGroupOut is that the former builds the test sets with all the samples assigned to ``p`` different values of the groups while the latter uses samples all assigned the same groups. Read more in the :ref:`User Guide `. Parameters ---------- n_groups : int Number of groups (``p``) to leave out in the test split. 
Examples -------- >>> import numpy as np >>> from sklearn.model_selection import LeavePGroupsOut >>> X = np.array([[1, 2], [3, 4], [5, 6]]) >>> y = np.array([1, 2, 1]) >>> groups = np.array([1, 2, 3]) >>> lpgo = LeavePGroupsOut(n_groups=2) >>> lpgo.get_n_splits(X, y, groups) 3 >>> lpgo.get_n_splits(groups=groups) # 'groups' is always required 3 >>> print(lpgo) LeavePGroupsOut(n_groups=2) >>> for i, (train_index, test_index) in enumerate(lpgo.split(X, y, groups)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}, group={groups[train_index]}") ... print(f" Test: index={test_index}, group={groups[test_index]}") Fold 0: Train: index=[2], group=[3] Test: index=[0 1], group=[1 2] Fold 1: Train: index=[1], group=[2] Test: index=[0 2], group=[1 3] Fold 2: Train: index=[0], group=[1] Test: index=[1 2], group=[2 3] See Also -------- GroupKFold : K-fold iterator variant with non-overlapping groups. c||_yrb)r)r=rs r@rxzLeavePGroupsOut.__init__s   rAc#K| tdt|dddd}tj|}|jt |k\r(td|j||jdzfzt tt ||j}|D]O}tjt|t}|tj|D] }d|||k(< |Qyw) Nrr4TFr(zThe groups parameter contains fewer than (or equal to) n_groups (%d) numbers of unique groups (%s). LeavePGroupsOut expects that at least n_groups + 1 (%d) unique groups be presentrnrV) rorrNrrrr rqrYrrZrz) r=r>r?r4rcombirRrSls r@rPz LeavePGroupsOut._iter_test_maskss >IJ J xde4  &) ==C . ."]]M4==1;LMN  U3}#56 F G,q/>J"288G#45 /*. 6Q;' /   sC0C2c | tdt|ddd}ttt t j ||jdS)r+Nrr4FrTr})rorrr rrNrrr^s r@r`zLeavePGroupsOut.get_n_splitssL, >IJ JVEQUV4BIIf-. TJKKrAc&t||||Srrr<s r@r;zLeavePGroupsOut.splitrrArhrB) r8rCrDrErxrPr`r;rFrGs@r@r$r$s#4l!*L6++rAr$cPeZdZdZdej iZddddZd dZd dZ d Z y) _RepeatedSplitsaRepeated splits for an arbitrary randomized CV splitter. Repeats splits for cross-validators n times with different randomization in each repetition. Parameters ---------- cv : callable Cross-validator class. n_repeats : int, default=10 Number of times cross-validator needs to be repeated. 
random_state : int, RandomState instance or None, default=None Passes `random_state` to the arbitrary repeating cross validator. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. **cvargs : additional params Constructor parameters for cv. Must not contain random_state and shuffle. r4 N) n_repeatsrc t|tjs td|dkr tdt fddDr td||_||_||_|_y)Nz/Number of repetitions must be of Integral type.rz-Number of repetitions must be greater than 0.c3&K|]}|v ywrbrK).0keycvargss r@ z+_RepeatedSplits.__init__..EsDsf}Dsrrz0cvargs must not contain random_state or shuffle.) rrrroanycvr7rr<)r=r@r7rr<s `r@rxz_RepeatedSplits.__init__>sj)W%5%56NO O >LM M D(CD DOP P"( rAc#K|j}t|j}t|D]B}|jd|dd|j }|j |||D] \}} || f Dyw)aGenerates indices to split data into training and test set. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. y : array-like of shape (n_samples,) The target variable for supervised learning problems. groups : array-like of shape (n_samples,), default=None Group labels for the samples used while splitting the dataset into train/test set. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. Tr>NrK)r7rrrqr@r<r;) r=r>r?r4r7rr r@rTrSs r@r;z_RepeatedSplits.splitMs0NN  !2!23# .CGc4G4;;GB+-88Aq&+A .' Z!:-- . .sA3A5ct|j}|jd|dd|j}|j ||||j zS)aReturns the number of splitting iterations in the cross-validator. Parameters ---------- X : object Always ignored, exists for compatibility. ``np.zeros(n_samples)`` may be used as a placeholder. y : object Always ignored, exists for compatibility. ``np.zeros(n_samples)`` may be used as a placeholder. groups : array-like of shape (n_samples,), default=None Group labels for the samples used while splitting the dataset into train/test set. 
Returns ------- n_splits : int Returns the number of splitting iterations in the cross-validator. Tr>rK)rrr@r<r`r7)r=r>r?r4rr@s r@r`z_RepeatedSplits.get_n_splitsmsN,!!2!23 TWW C#t Ct{{ Cq!V,t~~==rAct|Srbrcres r@rfz_RepeatedSplits.__repr__rgrArBrh) r8rCrDrErri(_RepeatedSplits__metadata_request__splitrxr;r`rfrKrAr@r5r5 s56"*+;+B+B C(* .@>4!rAr5c,eZdZdZddddfd ZxZS)r'aRepeated K-Fold cross validator. Repeats K-Fold `n_repeats` times with different randomization in each repetition. Read more in the :ref:`User Guide `. Parameters ---------- n_splits : int, default=5 Number of folds. Must be at least 2. n_repeats : int, default=10 Number of times cross-validator needs to be repeated. random_state : int, RandomState instance or None, default=None Controls the randomness of each repeated cross-validation instance. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import RepeatedKFold >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]]) >>> y = np.array([0, 0, 1, 1]) >>> rkf = RepeatedKFold(n_splits=2, n_repeats=2, random_state=2652124) >>> rkf.get_n_splits(X, y) 4 >>> print(rkf) RepeatedKFold(n_repeats=2, n_splits=2, random_state=2652124) >>> for i, (train_index, test_index) in enumerate(rkf.split(X)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" Test: index={test_index}") ... Fold 0: Train: index=[0 1] Test: index=[2 3] Fold 1: Train: index=[2 3] Test: index=[0 1] Fold 2: Train: index=[1 2] Test: index=[0 3] Fold 3: Train: index=[0 3] Test: index=[1 2] Notes ----- Randomized CV splitters may return different results for each call of split. You can make the results identical by setting `random_state` to an integer. See Also -------- RepeatedStratifiedKFold : Repeats Stratified K-Fold n times. 
rr6Nrr7rc4t|t|||yN)r7rr)r:rxr!r=rr7rr7s r@rxzRepeatedKFold.__init__s  Y\H  rAr8rCrDrErxrFrGs@r@r'r's9v$%  rAr'c8eZdZdZddddfd Zdfd ZxZS) r(aRepeated class-wise stratified K-Fold cross validator. Repeats Stratified K-Fold n times with different randomization in each repetition. Read more in the :ref:`User Guide `. .. note:: Stratification on the class label solves an engineering problem rather than a statistical one. See :ref:`stratification` for more details. Parameters ---------- n_splits : int, default=5 Number of folds. Must be at least 2. n_repeats : int, default=10 Number of times cross-validator needs to be repeated. random_state : int, RandomState instance or None, default=None Controls the generation of the random states for each repetition. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import RepeatedStratifiedKFold >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]]) >>> y = np.array([0, 0, 1, 1]) >>> rskf = RepeatedStratifiedKFold(n_splits=2, n_repeats=2, ... random_state=36851234) >>> rskf.get_n_splits(X, y) 4 >>> print(rskf) RepeatedStratifiedKFold(n_repeats=2, n_splits=2, random_state=36851234) >>> for i, (train_index, test_index) in enumerate(rskf.split(X, y)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" Test: index={test_index}") ... Fold 0: Train: index=[1 2] Test: index=[0 3] Fold 1: Train: index=[0 3] Test: index=[1 2] Fold 2: Train: index=[1 3] Test: index=[0 2] Fold 3: Train: index=[0 2] Test: index=[1 3] Notes ----- Randomized CV splitters may return different results for each call of split. You can make the results identical by setting `random_state` to an integer. See Also -------- RepeatedKFold : Repeats K-Fold n times. rr6NrFc4t|t|||yrH)r:rxr+rIs r@rxz RepeatedStratifiedKFold.__init__s!  
%  rAcFt|ddd}t| |||S)rr?FNrr3)rr:r;r<s r@r;zRepeatedStratifiedKFold.splits+D cU$ Gw}Q&}11rArb)r8rCrDrErxr;rFrGs@r@r(r(s"@D$% #2#2rAr(c^eZdZdZdej iZ d dddddZd dZd dZ d dZ d Z y) BaseShuffleSplita[Base class for *ShuffleSplit. Parameters ---------- n_splits : int, default=10 Number of re-shuffling & splitting iterations. test_size : float or int, default=None If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If ``train_size`` is also None, it will be set to 0.1. train_size : float or int, default=None If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size. random_state : int, RandomState instance or None, default=None Controls the randomness of the training and testing indices produced. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. r4Nr train_sizercJ||_||_||_||_d|_y)N皙?)rrrQr_default_test_size)r=rrrQrs r@rxzBaseShuffleSplit.__init___s)! "$("%rAc#pKt|||\}}}|j|||D] \}}||f yw)aGenerate indices to split data into training and test set. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. y : array-like of shape (n_samples,) The target variable for supervised learning problems. groups : array-like of shape (n_samples,), default=None Group labels for the samples used while splitting the dataset into train/test set. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. Notes ----- Randomized CV splitters may return different results for each call of split. 
You can make the results identical by setting `random_state` to an integer. N)r _iter_indicesr=r>r?r4rrs r@r;zBaseShuffleSplit.splithsH<!Av. 1f--aF; KE4+  s46c#(Kt|}t||j|j|j\}}t |j }t|jD]&}|j|} | d|} | |||z} | | f(yw)zGenerate (train, test) indicesdefault_test_sizeN) r_validate_shuffle_splitrrQrTrrrqrr) r=r>r?r4rrn_trainn_testrrrind_test ind_trains r@rVzBaseShuffleSplit._iter_indicess O 1  NN OO"55  !!2!23t}}% &A//)4K"7F+H#Ffw.>@IX% %  &sBBc|jSrrr^s r@r`zBaseShuffleSplit.get_n_splitsrrAct|Srbrcres r@rfzBaseShuffleSplit.__repr__rgrAr6rBrh) r8rCrDrErri*_BaseShuffleSplit__metadata_request__splitrxr;rVr`rfrKrAr@rOrO>sF<"*+;+B+B C&(,D& D&$*!rArOc0eZdZdZ dddddfd ZxZS)r)a Random permutation cross-validator. Yields indices to split data into training and test sets. Note: contrary to other cross-validation strategies, random splits do not guarantee that test sets across all folds will be mutually exclusive, and might include overlapping samples. However, this is still very likely for sizeable datasets. Read more in the :ref:`User Guide `. For visualisation of cross-validation behaviour and comparison between common scikit-learn split methods refer to :ref:`sphx_glr_auto_examples_model_selection_plot_cv_indices.py` Parameters ---------- n_splits : int, default=10 Number of re-shuffling & splitting iterations. test_size : float or int, default=None If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If ``train_size`` is also None, it will be set to 0.1. train_size : float or int, default=None If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size. 
random_state : int, RandomState instance or None, default=None Controls the randomness of the training and testing indices produced. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import ShuffleSplit >>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [3, 4], [5, 6]]) >>> y = np.array([1, 2, 1, 2, 1, 2]) >>> rs = ShuffleSplit(n_splits=5, test_size=.25, random_state=0) >>> rs.get_n_splits(X) 5 >>> print(rs) ShuffleSplit(n_splits=5, random_state=0, test_size=0.25, train_size=None) >>> for i, (train_index, test_index) in enumerate(rs.split(X)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" Test: index={test_index}") Fold 0: Train: index=[1 3 0 4] Test: index=[5 2] Fold 1: Train: index=[4 0 2 5] Test: index=[1 3] Fold 2: Train: index=[1 2 4 0] Test: index=[3 5] Fold 3: Train: index=[3 4 1 0] Test: index=[5 2] Fold 4: Train: index=[3 5 1 0] Test: index=[2 4] >>> # Specify train and test size >>> rs = ShuffleSplit(n_splits=5, train_size=0.5, test_size=.25, ... random_state=0) >>> for i, (train_index, test_index) in enumerate(rs.split(X)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" Test: index={test_index}") Fold 0: Train: index=[1 3 0] Test: index=[5 2] Fold 1: Train: index=[4 0 2] Test: index=[1 3] Fold 2: Train: index=[1 2 4] Test: index=[3 5] Fold 3: Train: index=[3 4 1] Test: index=[5 2] Fold 4: Train: index=[3 5 1] Test: index=[2 4] NrPc:t|||||d|_yNrrrQrrSr:rxrTr=rrrQrr7s r@rxzShuffleSplit.__init__- !%  #&rArbrJrGs@r@r)r)s%Zz &(,D & &rAr)cFeZdZdZ dddddfd ZfdZdfd ZxZS) r aShuffle-Group(s)-Out cross-validation iterator. Provides randomized train/test indices to split data according to a third-party provided group. This group information can be used to encode arbitrary domain specific stratifications of the samples as integers. 
 For instance the groups could be the year of collection of the samples and thus allow for cross-validation against time-based splits. The difference between :class:`LeavePGroupsOut` and ``GroupShuffleSplit`` is that the former generates splits using all subsets of ``p`` unique groups, whereas ``GroupShuffleSplit`` generates a user-determined number of random test splits, each with a user-determined fraction of unique groups. For example, a less computationally intensive alternative to ``LeavePGroupsOut(n_groups=10)`` would be ``GroupShuffleSplit(test_size=10, n_splits=100)``. Contrary to other cross-validation strategies, the random splits do not guarantee that test sets across all folds will be mutually exclusive, and might include overlapping samples. However, this is still very likely for sizeable datasets. Note: The parameters ``test_size`` and ``train_size`` refer to groups, and not to samples as in :class:`ShuffleSplit`. Read more in the :ref:`User Guide `. For visualisation of cross-validation behaviour and comparison between common scikit-learn split methods refer to :ref:`sphx_glr_auto_examples_model_selection_plot_cv_indices.py` Parameters ---------- n_splits : int, default=5 Number of re-shuffling & splitting iterations. test_size : float or int, default=None If float, should be between 0.0 and 1.0 and represent the proportion of groups to include in the test split (rounded up). If int, represents the absolute number of test groups. If None, the value is set to the complement of the train size. If ``train_size`` is also None, it will be set to 0.2. train_size : float or int, default=None If float, should be between 0.0 and 1.0 and represent the proportion of the groups to include in the train split. If int, represents the absolute number of train groups. If None, the value is automatically set to the complement of the test size. 
random_state : int, RandomState instance or None, default=None Controls the randomness of the training and testing indices produced. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import GroupShuffleSplit >>> X = np.ones(shape=(8, 2)) >>> y = np.ones(shape=(8, 1)) >>> groups = np.array([1, 1, 2, 2, 2, 3, 3, 3]) >>> print(groups.shape) (8,) >>> gss = GroupShuffleSplit(n_splits=2, train_size=.7, random_state=42) >>> gss.get_n_splits() 2 >>> print(gss) GroupShuffleSplit(n_splits=2, random_state=42, test_size=None, train_size=0.7) >>> for i, (train_index, test_index) in enumerate(gss.split(X, y, groups)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}, group={groups[train_index]}") ... print(f" Test: index={test_index}, group={groups[test_index]}") Fold 0: Train: index=[2 3 4 5 6 7], group=[2 2 2 3 3 3] Test: index=[0 1], group=[1 1] Fold 1: Train: index=[0 1 5 6 7], group=[1 1 3 3 3] Test: index=[2 3 4], group=[2 2 2] See Also -------- ShuffleSplit : Shuffles samples to create independent test/train sets. LeavePGroupsOut : Train set leaves out all possible subsets of `p` groups. NrPc:t|||||d|_y)Nrgg?rhris r@rxzGroupShuffleSplit.__init__vrjrAc#ZK| tdt|ddd}tj|d\}}t ||D]]\}}tj tj||}tj tj||} || f_yw)Nrr4FrTr)r>)rorrNrr:rV flatnonzeror) r=r>r?r4classes group_indices group_train group_testrrr7s r@rVzGroupShuffleSplit._iter_indicess >IJ JVEQUV!#6$!G',w'>"''-"DED+  sB(B+c&t||||S)aGenerate indices to split data into training and test set. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. y : array-like of shape (n_samples,), default=None The target variable for supervised learning problems. groups : array-like of shape (n_samples,) Group labels for the samples used while splitting the dataset into train/test set. 
Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. Notes ----- Randomized CV splitters may return different results for each call of split. You can make the results identical by setting `random_state` to an integer. rr<s r@r;zGroupShuffleSplit.splits<w}Q6**rArrBr8rCrDrErxrVr;rFrGs@r@r r s/Up &'+4 & ++rAr cDeZdZdZ dddddfd ZddZdfd ZxZS) r,aL Class-wise stratified ShuffleSplit cross-validator. Provides train/test indices to split data in train/test sets. This cross-validation object is a merge of :class:`StratifiedKFold` and :class:`ShuffleSplit`, which returns stratified randomized folds. The folds are made by preserving the percentage of samples for each class in `y` in a binary or multiclass classification setting. Note: like the :class:`ShuffleSplit` strategy, stratified random splits do not guarantee that test sets across all folds will be mutually exclusive, and might include overlapping samples. However, this is still very likely for sizeable datasets. Read more in the :ref:`User Guide `. For visualisation of cross-validation behaviour and comparison between common scikit-learn split methods refer to :ref:`sphx_glr_auto_examples_model_selection_plot_cv_indices.py` .. note:: Stratification on the class label solves an engineering problem rather than a statistical one. See :ref:`stratification` for more details. Parameters ---------- n_splits : int, default=10 Number of re-shuffling & splitting iterations. test_size : float or int, default=None If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If ``train_size`` is also None, it will be set to 0.1. train_size : float or int, default=None If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. 
If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size. random_state : int, RandomState instance or None, default=None Controls the randomness of the training and testing indices produced. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import StratifiedShuffleSplit >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]]) >>> y = np.array([0, 0, 0, 1, 1, 1]) >>> sss = StratifiedShuffleSplit(n_splits=5, test_size=0.5, random_state=0) >>> sss.get_n_splits(X, y) 5 >>> print(sss) StratifiedShuffleSplit(n_splits=5, random_state=0, ...) >>> for i, (train_index, test_index) in enumerate(sss.split(X, y)): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" Test: index={test_index}") Fold 0: Train: index=[5 2 3] Test: index=[4 1 0] Fold 1: Train: index=[5 1 4] Test: index=[0 2 3] Fold 2: Train: index=[5 0 2] Test: index=[4 3 1] Fold 3: Train: index=[4 1 0] Test: index=[2 3 5] Fold 4: Train: index=[0 5 1] Test: index=[3 4 2] NrPc:t|||||d|_yrfrhris r@rxzStratifiedShuffleSplit.__init__ rjrAc #Kt|}t|ddd}t||j|j|j \}}t |\}}t||}|jdk(rAtj|D cgc]"} dj| jd$c} }tj|d \} } | jd } tj| } tj | dkr t#d || krt#d || fz|| krt#d|| fztj$tj&| dtj(| dd}t+|j,}t/|j0D]}t3| ||}| |z }t3|||}g}g}t/| D]c}|j5| |}||j7|d}|j9|d|||j9|||||||ze|j5|}|j5|}||fycc} ww)Nr?FrrY)rr  strTrrzThe least populated class in y has only 1 member, which is too few. 
The minimum number of groups for any class cannot be less than 2.zLThe train_size = %d should be greater or equal to the number of classes = %dzKThe test_size = %d should be greater or equal to the number of classes = %drrrclip)mode)rrr[rrQrTrrndimrNrzjoinastypershaperrror;rcumsumrrrqrrrtakeextend)r=r>r?r4rrr\r]rrrowro y_indicesr class_counts class_indicesrn_iclass_counts_remainingt_irrrrperm_indices_class_is r@rVz$StratifiedShuffleSplit._iter_indices s O cU$ G1  NN OO"55  a A aB ' 66Q;C##((3::e#45CDAYYq>MM!$ {{9- 66, ! ##  Y 69@)8LM  I 69?8KL  JJy{ 3RYY|5LSb5Q !!2!23t}}% A$L'3?C%1C%7 "#$:FCHCED9% L!ool1o> '4Q'7'<'<[v'<'V$ 1(CF;< 0Q#a&3q6/JK  LOOE*E??4(D+ ) CDsBI5 'I01GI5c|1tjd|jjtt |ddd}t ||||S)aGenerate indices to split data into training and test set. Parameters ---------- X : array-like of shape (n_samples, n_features) Training data, where `n_samples` is the number of samples and `n_features` is the number of features. Note that providing ``y`` is sufficient to generate the splits and hence ``np.zeros(n_samples)`` may be used as a placeholder for ``X`` instead of actual training data. y : array-like of shape (n_samples,) or (n_samples, n_labels) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. Notes ----- Randomized CV splitters may return different results for each call of split. You can make the results identical by setting `random_state` to an integer. 
Nr2r?Frrr<s r@r;zStratifiedShuffleSplit.splitV rrArbrbrtrGs@r@r,r,s1Nb &(,D &HT(+(+rAr,c|||}tj|jj}tj|jj}|dk(r ||k\s|dks|dk(r%|dks|dk\rt dj |||dk(r ||k\s|dks|dk(r%|dks|dk\rt dj ||||dvrt dj |||dvrt d j ||dk(r*|dk(r%||zdkDrt d j ||z|dk(rt ||z}n|dk(r t|}|dk(rt||z}n|dk(r t|}||z }n||z }z|kDrt d ||z|fzt|t|}}|dk(rt d j |||||fS) zx Validation helper to check if the train/test sizes are meaningful w.r.t. the size of the data (n_samples). rrrrnzqtest_size={0} should be either positive and smaller than the number of samples {1} or a float in the (0, 1) rangezrtrain_size={0} should be either positive and smaller than the number of samples {1} or a float in the (0, 1) range)rrz Invalid value for train_size: {}zInvalid value for test_size: {}zlThe sum of test_size and train_size = {}, should be in the (0, 1) range. Reduce test_size and/or train_size.z~The sum of train_size and test_size = %d, should be smaller than the number of samples %d. Reduce test_size and/or train_size.zWith n_samples={}, test_size={} and train_size={}, the resulting train set will be empty. Adjust any of the aforementioned parameters.) rNrrWrrorpr floatr r)rrrrQrZtest_size_typetrain_size_typer]r\s r@r[r[ sL Z/% ZZ *0055Njj,2277O#9 #9Y!^#9>Y!^ !6)Y7  3J)$;zQ3J!OzQ !6*i8  /"C;BB:NOOz!A:AA)LMM#.C"7J`. .. versionadded:: 0.16 Parameters ---------- test_fold : array-like of shape (n_samples,) The entry ``test_fold[i]`` represents the index of the test set that sample ``i`` belongs to. It is possible to exclude sample ``i`` from any test set (i.e. include sample ``i`` in every training set) by setting ``test_fold[i]`` equal to -1. 
Examples -------- >>> import numpy as np >>> from sklearn.model_selection import PredefinedSplit >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]]) >>> y = np.array([0, 0, 1, 1]) >>> test_fold = [0, 1, -1, 1] >>> ps = PredefinedSplit(test_fold) >>> ps.get_n_splits() 2 >>> print(ps) PredefinedSplit(test_fold=array([ 0, 1, -1, 1])) >>> for i, (train_index, test_index) in enumerate(ps.split()): ... print(f"Fold {i}:") ... print(f" Train: index={train_index}") ... print(f" Test: index={test_index}") Fold 0: Train: index=[1 2 3] Test: index=[0] Fold 1: Train: index=[0 2] Test: index=[1 3] ctj|t|_t |j|_tj |j|_|j |j dk7|_y)NrVr)rNrzr test_foldrr unique_folds)r=rs r@rxzPredefinedSplit.__init__ sW)37%dnn5IIdnn5 --d.?.?2.EFrANc|1tjd|jjt|j S)Generate indices to split data into training and test set. Parameters ---------- X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. r2rr^s r@r;zPredefinedSplit.split s<,   MM5dnn6M6M5NO {{}rAc#Ktjt|j}|j D]%}|tj |}||}||f'yw)zGenerate indices to split data into training and test set. Yields ------ train : ndarray The training set indices for that split. test : ndarray The testing set indices for that split. 
N)rNrOrrrPrQ)r=indrSrTs r@rzPredefinedSplit._split s^iiDNN+,//1 *JbnnZ89KZJz) ) *sA"A$c#K|jD]^}tj|j|k(d}tjt |jt }d||<|`yw)z3Generates boolean masks corresponding to test sets.rrVTN)rrNrrrYrrZ)r=rrSr[s r@rPz PredefinedSplit._iter_test_masks* s_"" A$..A"56q9JT^^!4DAI$(Ij !O  sA/A1c,t|jSr)rrr^s r@r`zPredefinedSplit.get_n_splits2 s&4$$%%rArh) r8rCrDrErxr;rrPr`rKrAr@r&r& s"'RG :*"&rAr&c&eZdZdZdZddZddZy)_CVIterableWrapperz5Wrapper class for old style cv objects and iterables.c$t||_yrb)listr@)r=r@s r@rxz_CVIterableWrapper.__init__K s r(rANc,t|jSr)rr@r^s r@r`z_CVIterableWrapper.get_n_splitsN s&477|rAc#@K|jD] \}}||f yw)rN)r@rWs r@r;z_CVIterableWrapper.splitc s), 77 KE4+  srh)r8rCrDrErxr`r;rKrAr@rrH s?*rArF) classifiercB|dn|}t|tjr)|r|t|ddvr t |St |St |drt|tr9t|trt|trtd|zt|S|S)aInput checker utility for building a cross-validator. Parameters ---------- cv : int, cross-validation generator, iterable or None, default=5 Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 5-fold cross validation, - integer, to specify the number of folds. - :term:`CV splitter`, - An iterable that generates (train, test) splits as arrays of indices. For integer/None inputs, if classifier is True and ``y`` is either binary or multiclass, :class:`StratifiedKFold` is used. In all other cases, :class:`KFold` is used. Refer :ref:`User Guide ` for the various cross-validation strategies that can be used here. .. versionchanged:: 0.22 ``cv`` default value changed from 3-fold to 5-fold. y : array-like, default=None The target variable for supervised learning problems. classifier : bool, default=False Whether the task is a classification task, in which case stratified KFold will be used. Returns ------- checked_cv : a cross-validator instance. The return value is a cross-validator which generates the train/test splits via the ``split`` method. 
Examples -------- >>> from sklearn.model_selection import check_cv >>> check_cv(cv=5, y=None, classifier=False) KFold(...) >>> check_cv(cv=5, y=[1, 1, 0, 0, 0, 0], classifier=True) StratifiedKFold(...) rr?)rrr;ziExpected cv as an integer, cross-validation object (from sklearn.model_selection) or an iterable. Got %s.) rrrrr+r!hasattrryrror)r@r?rs r@r-r-} sXjbB"g&&' c26NN"2& &9  2w :b##6"h':b#+>*,./  ""%% IrArnneither)closedleftrbooleanz array-like)rrQrrstratifyT)prefer_skip_nested_validationc t|}|dk(r tdt|}t|d}t |||d\}} |dur<| tdt j | t j ||| z n<|t} nt} | | ||} t| j|d|\ t|d \ ttj fd |DS) aSplit arrays or matrices into random train and test subsets. Quick utility that wraps input validation, ``next(ShuffleSplit().split(X, y))``, and application to input data into a single call for splitting (and optionally subsampling) data into a one-liner. Read more in the :ref:`User Guide `. Parameters ---------- *arrays : sequence of indexables with same length / shape[0] Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes. test_size : float or int, default=None If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If ``train_size`` is also None, it will be set to 0.25. train_size : float or int, default=None If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size. random_state : int, RandomState instance or None, default=None Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. shuffle : bool, default=True Whether or not to shuffle the data before splitting. 
If shuffle=False then stratify must be None. stratify : array-like, default=None If not None, data is split in a stratified fashion, using this as the class labels. Read more in the :ref:`User Guide `. Returns ------- splitting : list, length=2 * len(arrays) List containing train-test split of inputs. .. versionadded:: 0.16 If the input is sparse, the output will be a ``scipy.sparse.csr_matrix``. Else, output type is the same as the input type. Examples -------- >>> import numpy as np >>> from sklearn.model_selection import train_test_split >>> X, y = np.arange(10).reshape((5, 2)), range(5) >>> X array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]) >>> list(y) [0, 1, 2, 3, 4] >>> X_train, X_test, y_train, y_test = train_test_split( ... X, y, test_size=0.33, random_state=42) ... >>> X_train array([[4, 5], [0, 1], [6, 7]]) >>> y_train [2, 0, 3] >>> X_test array([[2, 3], [8, 9]]) >>> y_test [1, 4] >>> train_test_split(y, shuffle=False) [[0, 1, 2], [3, 4]] >>> from sklearn import datasets >>> iris = datasets.load_iris(as_frame=True) >>> X, y = iris['data'], iris['target'] >>> X.head() sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) 0 5.1 3.5 1.4 0.2 1 4.9 3.0 1.4 0.2 2 4.7 3.2 1.3 0.2 3 4.6 3.1 1.5 0.2 4 5.0 3.6 1.4 0.2 >>> y.head() 0 0 1 0 2 0 3 0 4 0 ... >>> X_train, X_test, y_train, y_test = train_test_split( ... X, y, test_size=0.33, random_state=42) ... >>> X_train.head() sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) 96 5.7 2.9 4.2 1.3 105 7.6 3.0 6.6 2.1 66 5.6 3.0 4.5 1.5 0 5.1 3.5 1.4 0.2 122 7.7 2.8 6.7 2.0 >>> y_train.head() 96 1 105 2 66 1 0 0 122 2 ... >>> X_test.head() sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) 73 6.1 2.8 4.7 1.2 18 5.7 3.8 1.7 0.3 118 7.7 2.6 6.9 2.3 78 6.0 2.9 4.5 1.5 76 6.8 2.8 4.8 1.4 >>> y_test.head() 73 1 18 0 118 2 78 1 76 1 ... 
rz$At least one array required as inputg?rYFz@Stratified train/test split is not implemented for shuffle=FalserP)r>r?c3NK|]}t|t|fywrb)r)r:arrs r@r=z#train_test_split.. s) DE^Au %~a'> ? s"%)rrorrr[rNrOr,r)nextr;rrr from_iterable)rrQrrrarraysn_arraysrrr\r]CVClassr@rrs @@r@r.r. s@6{H1}?@@  FVAY'I-9jDOGV%  R  '"yy'F"23  ,G"G v' U288fQi88<= t0E4HKE4   IO   rA__test__ctj}tjdddt}|}dd|dzzdzz}t t |j D]\}\}} t| tr|dt| } n |d|| } t| d kDr| d d d z| d d z} |dkDrH|t| zdk\sd| vr|j|t|}n|jd|dz }|j| |t| z }tjdi|dj|} djd| jdD} | S)afPretty print the dictionary 'params' Parameters ---------- params : dict The dictionary to pretty print offset : int, default=0 The offset in characters to add at the begin of each line. printer : callable, default=repr The function to convert entries to strings, typically the builtin str or repr r@r ) precision threshold edgeitemsz, rnrx=iNi,z...irK z, c3>K|]}|jdyw)rxN)rstrip)r:r1s r@r=z_pprint.. s?ahhsm?srK)rNget_printoptionsset_printoptionsrrsorteditemsrrryrappendr}r;) paramsoffsetprinteroptions params_listthis_line_lengthline_seprrv this_reprliness r@_pprintr sq"!!#G!rQ?&KFaK3..Hvflln56+ 6Aq a $%c!f-I$%gaj1I y>C !$3%/)DE2BBI q5#i.0B6$):K""8,#&x= ""4( A% 9%C N*'+*"'" GGK E II?U[[->? ?E LrAc|j}t|jd|j}t|}|tjurg}nct |j jDcgc]6}|jdk7r%|j|jk7r |j8c}}|jj}t}|D]}tjdt tj d5} t||d} | (t#|dr|j$j'|d} dddt) r6| dj*tur! tj,j/dtj,j/d ||<|dt1|t)| d Scc}w#1swYxYw#tj,j/dwxYw) Ndeprecated_originalr=alwaysT)recordr<r()r))r7getattrrxrobjectr parametersvaluesnamer VAR_KEYWORDr8dictr5 simplefilter FutureWarningcatch_warningsrr<getrcategoryfilterspopr) r=clsinitinit_signatureargsrw class_namerr;wvalues r@rdrd s ..C 3<rs'#$)  LK-7-FF *22D1,1@!+w@!FL*,>L^Y>(*<Y>x[#w[|] $j]@W+$jW+tJ+jJ+ZP. PfSjSlt+*,>t+nB+)+=B+Jh!(Gh!V? ,o? Dn26n2bt!)Wt!nf&+-=f&RO++-=O+dN+-N+bHVy&(y&x2+2j@@F ZAi 8 W%%q$v >  ZAi 8 W%%q$v >  ((;!4(#'!(   r%$rp*e,d/d%LPErA