import array
import itertools
import warnings
from collections import defaultdict
from numbers import Integral

import numpy as np
import scipy.sparse as sp

from ..base import BaseEstimator, TransformerMixin, _fit_context
from ..utils import column_or_1d
from ..utils._array_api import _setdiff1d, device, get_namespace
from ..utils._encode import _encode, _unique
from ..utils._param_validation import Interval, validate_params
from ..utils.multiclass import type_of_target, unique_labels
from ..utils.sparsefuncs import min_max_axis
from ..utils.validation import _num_samples, check_array, check_is_fitted

__all__ = [
    "label_binarize",
    "LabelBinarizer",
    "LabelEncoder",
    "MultiLabelBinarizer",
]


class LabelEncoder(TransformerMixin, BaseEstimator, auto_wrap_output_keys=None):
    """Encode target labels with value between 0 and n_classes-1.

    This transformer should be used to encode target values, *i.e.* `y`, and
    not the input `X`.

    Read more in the :ref:`User Guide <preprocessing_targets>`.

    .. versionadded:: 0.12

    Attributes
    ----------
    classes_ : ndarray of shape (n_classes,)
        Holds the label for each class.

    See Also
    --------
    OrdinalEncoder : Encode categorical features using an ordinal encoding
        scheme.

    OneHotEncoder : Encode categorical features as a one-hot numeric array.

    Examples
    --------
    `LabelEncoder` can be used to normalize labels.

    >>> from sklearn.preprocessing import LabelEncoder
    >>> le = LabelEncoder()
    >>> le.fit([1, 2, 2, 6])
    LabelEncoder()
    >>> le.classes_
    array([1, 2, 6])
    >>> le.transform([1, 1, 2, 6])
    array([0, 0, 1, 2]...)
    >>> le.inverse_transform([0, 0, 1, 2])
    array([1, 1, 2, 6])

    It can also be used to transform non-numerical labels (as long as they are
    hashable and comparable) to numerical labels.

    >>> le = LabelEncoder()
    >>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
    LabelEncoder()
    >>> list(le.classes_)
    [np.str_('amsterdam'), np.str_('paris'), np.str_('tokyo')]
    >>> le.transform(["tokyo", "tokyo", "paris"])
    array([2, 2, 1]...)
    >>> list(le.inverse_transform([2, 2, 1]))
    [np.str_('tokyo'), np.str_('tokyo'), np.str_('paris')]
    """

    def fit(self, y):
        """Fit label encoder.

        Parameters
        ----------
        y : array-like of shape (n_samples,)
            Target values.

        Returns
        -------
        self : returns an instance of self.
            Fitted label encoder.
        """
        y = column_or_1d(y, warn=True)
        self.classes_ = _unique(y)
        return self

    def fit_transform(self, y):
        """Fit label encoder and return encoded labels.

        Parameters
        ----------
        y : array-like of shape (n_samples,)
            Target values.

        Returns
        -------
        y : array-like of shape (n_samples,)
            Encoded labels.
        """
        y = column_or_1d(y, warn=True)
        self.classes_, y = _unique(y, return_inverse=True)
        return y

    def transform(self, y):
        """Transform labels to normalized encoding.

        Parameters
        ----------
        y : array-like of shape (n_samples,)
            Target values.

        Returns
        -------
        y : array-like of shape (n_samples,)
            Labels as normalized encodings.
        """
        check_is_fitted(self)
        xp, _ = get_namespace(y)
        y = column_or_1d(y, dtype=self.classes_.dtype, warn=True)
        # transform of empty array is empty array
        if _num_samples(y) == 0:
            return xp.asarray([])

        return _encode(y, uniques=self.classes_)

    def inverse_transform(self, y):
        """Transform labels back to original encoding.

        Parameters
        ----------
        y : ndarray of shape (n_samples,)
            Target values.

        Returns
        -------
        y_original : ndarray of shape (n_samples,)
            Original encoding.
        """
        check_is_fitted(self)
        xp, _ = get_namespace(y)
        y = column_or_1d(y, warn=True)
        # inverse transform of empty array is empty array
        if _num_samples(y) == 0:
            return xp.asarray([])

        diff = _setdiff1d(
            ar1=y,
            ar2=xp.arange(self.classes_.shape[0], device=device(y)),
            xp=xp,
        )
        if diff.shape[0]:
            raise ValueError("y contains previously unseen labels: %s" % str(diff))
        y = xp.asarray(y)
        return xp.take(self.classes_, y, axis=0)

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.array_api_support = True
        tags.input_tags.two_d_array = False
        tags.target_tags.one_d_labels = True
        return tags

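# Illustrative sketch (added for readability, not from the original source): two
# behaviours that follow directly from the LabelEncoder code above. An empty
# input is returned as an empty array, and unseen encoded labels raise a
# ValueError. Kept in comments so nothing executes at import time.
#
#     le = LabelEncoder().fit([2, 1, 2])   # le.classes_ == array([1, 2])
#     le.transform([])                     # -> array([], dtype=float64)
#     le.inverse_transform([5])            # -> ValueError: y contains previously
#                                          #    unseen labels: [5]
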
class LabelBinarizer(TransformerMixin, BaseEstimator, auto_wrap_output_keys=None):
    """Binarize labels in a one-vs-all fashion.

    Several regression and binary classification algorithms are available in
    scikit-learn. A simple way to extend these algorithms to the multi-class
    classification case is to use the so-called one-vs-all scheme.

    At learning time, this simply consists in learning one regressor or binary
    classifier per class. In doing so, one needs to convert multi-class labels
    to binary labels (belong or does not belong to the class). `LabelBinarizer`
    makes this process easy with the transform method.

    At prediction time, one assigns the class for which the corresponding model
    gave the greatest confidence. `LabelBinarizer` makes this easy with the
    :meth:`inverse_transform` method.

    Read more in the :ref:`User Guide <preprocessing_targets>`.

    Parameters
    ----------
    neg_label : int, default=0
        Value with which negative labels must be encoded.

    pos_label : int, default=1
        Value with which positive labels must be encoded.

    sparse_output : bool, default=False
        True if the returned array from transform is desired to be in sparse
        CSR format.

    Attributes
    ----------
    classes_ : ndarray of shape (n_classes,)
        Holds the label for each class.

    y_type_ : str
        Represents the type of the target data as evaluated by
        :func:`~sklearn.utils.multiclass.type_of_target`. Possible type are
        'continuous', 'continuous-multioutput', 'binary', 'multiclass',
        'multiclass-multioutput', 'multilabel-indicator', and 'unknown'.

    sparse_input_ : bool
        `True` if the input data to transform is given as a sparse matrix,
        `False` otherwise.

    See Also
    --------
    label_binarize : Function to perform the transform operation of
        LabelBinarizer with fixed classes.
    OneHotEncoder : Encode categorical features using a one-hot aka one-of-K
        scheme.

    Examples
    --------
    >>> from sklearn.preprocessing import LabelBinarizer
    >>> lb = LabelBinarizer()
    >>> lb.fit([1, 2, 6, 4, 2])
    LabelBinarizer()
    >>> lb.classes_
    array([1, 2, 4, 6])
    >>> lb.transform([1, 6])
    array([[1, 0, 0, 0],
           [0, 0, 0, 1]])

    Binary targets transform to a column vector

    >>> lb = LabelBinarizer()
    >>> lb.fit_transform(['yes', 'no', 'no', 'yes'])
    array([[1],
           [0],
           [0],
           [1]])

    Passing a 2D matrix for multilabel classification

    >>> import numpy as np
    >>> lb.fit(np.array([[0, 1, 1], [1, 0, 0]]))
    LabelBinarizer()
    >>> lb.classes_
    array([0, 1, 2])
    >>> lb.transform([0, 1, 2, 1])
    array([[1, 0, 0],
           [0, 1, 0],
           [0, 0, 1],
           [0, 1, 0]])
    """

    _parameter_constraints: dict = {
        "neg_label": [Integral],
        "pos_label": [Integral],
        "sparse_output": ["boolean"],
    }

    def __init__(self, *, neg_label=0, pos_label=1, sparse_output=False):
        self.neg_label = neg_label
        self.pos_label = pos_label
        self.sparse_output = sparse_output

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(self, y):
        """Fit label binarizer.

        Parameters
        ----------
        y : ndarray of shape (n_samples,) or (n_samples, n_classes)
            Target values. The 2-d matrix should only contain 0 and 1,
            represents multilabel classification.

        Returns
        -------
        self : object
            Returns the instance itself.
        """
        if self.neg_label >= self.pos_label:
            raise ValueError(
                f"neg_label={self.neg_label} must be strictly less than "
                f"pos_label={self.pos_label}."
            )

        if self.sparse_output and (self.pos_label == 0 or self.neg_label != 0):
            raise ValueError(
                "Sparse binarization is only supported with non zero pos_label "
                f"and zero neg_label, got pos_label={self.pos_label} and "
                f"neg_label={self.neg_label}"
            )

        self.y_type_ = type_of_target(y, input_name="y")

        if "multioutput" in self.y_type_:
            raise ValueError(
                "Multioutput target data is not supported with label binarization"
            )
        if _num_samples(y) == 0:
            raise ValueError("y has 0 samples: %r" % y)

        self.sparse_input_ = sp.issparse(y)
        self.classes_ = unique_labels(y)
        return self

    def fit_transform(self, y):
        """Fit label binarizer/transform multi-class labels to binary labels.

        The output of transform is sometimes referred to as
        the 1-of-K coding scheme.

        Parameters
        ----------
        y : {ndarray, sparse matrix} of shape (n_samples,) or (n_samples, n_classes)
            Target values. The 2-d matrix should only contain 0 and 1,
            represents multilabel classification. Sparse matrix can be
            CSR, CSC, COO, DOK, or LIL.

        Returns
        -------
        Y : {ndarray, sparse matrix} of shape (n_samples, n_classes)
            Shape will be (n_samples, 1) for binary problems. Sparse matrix
            will be of CSR format.
        """
        return self.fit(y).transform(y)

    def transform(self, y):
        """Transform multi-class labels to binary labels.

        The output of transform is sometimes referred to by some authors as
        the 1-of-K coding scheme.

        Parameters
        ----------
        y : {ndarray, sparse matrix} of shape (n_samples,) or (n_samples, n_classes)
            Target values. The 2-d matrix should only contain 0 and 1,
            represents multilabel classification. Sparse matrix can be
            CSR, CSC, COO, DOK, or LIL.

        Returns
        -------
        Y : {ndarray, sparse matrix} of shape (n_samples, n_classes)
            Shape will be (n_samples, 1) for binary problems. Sparse matrix
            will be of CSR format.
        """
        check_is_fitted(self)

        y_is_multilabel = type_of_target(y).startswith("multilabel")
        if y_is_multilabel and not self.y_type_.startswith("multilabel"):
            raise ValueError("The object was not fitted with multilabel input.")

        return label_binarize(
            y,
            classes=self.classes_,
            pos_label=self.pos_label,
            neg_label=self.neg_label,
            sparse_output=self.sparse_output,
        )

    def inverse_transform(self, Y, threshold=None):
        """Transform binary labels back to multi-class labels.

        Parameters
        ----------
        Y : {ndarray, sparse matrix} of shape (n_samples, n_classes)
            Target values. All sparse matrices are converted to CSR before
            inverse transformation.

        threshold : float, default=None
            Threshold used in the binary and multi-label cases.

            Use 0 when ``Y`` contains the output of :term:`decision_function`
            (classifier).
            Use 0.5 when ``Y`` contains the output of :term:`predict_proba`.

            If None, the threshold is assumed to be half way between
            neg_label and pos_label.

        Returns
        -------
        y_original : {ndarray, sparse matrix} of shape (n_samples,)
            Target values. Sparse matrix will be of CSR format.

        Notes
        -----
        In the case when the binary labels are fractional
        (probabilistic), :meth:`inverse_transform` chooses the class with the
        greatest value. Typically, this allows to use the output of a
        linear model's :term:`decision_function` method directly as the input
        of :meth:`inverse_transform`.
        """
        check_is_fitted(self)

        if threshold is None:
            threshold = (self.pos_label + self.neg_label) / 2.0

        if self.y_type_ == "multiclass":
            y_inv = _inverse_binarize_multiclass(Y, self.classes_)
        else:
            y_inv = _inverse_binarize_thresholding(
                Y, self.y_type_, self.classes_, threshold
            )

        if self.sparse_input_:
            y_inv = sp.csr_matrix(y_inv)
        elif sp.issparse(y_inv):
            y_inv = y_inv.toarray()

        return y_inv

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.input_tags.two_d_array = False
        tags.target_tags.one_d_labels = True
        return tags

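# Illustrative sketch (added for readability, not from the original source) of
# the `threshold` argument handled by LabelBinarizer.inverse_transform above:
# use 0 for decision_function scores and 0.5 for predict_proba outputs, as the
# docstring describes. Kept in comments so nothing executes at import time.
#
#     lb = LabelBinarizer().fit(["no", "yes"])
#     lb.transform(["yes", "no"])                                     # [[1], [0]]
#     lb.inverse_transform(np.array([[0.8], [0.3]]), threshold=0.5)   # ['yes', 'no']
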
@validate_params(
    {
        "y": ["array-like", "sparse matrix"],
        "classes": ["array-like"],
        "neg_label": [Interval(Integral, None, None, closed="neither")],
        "pos_label": [Interval(Integral, None, None, closed="neither")],
        "sparse_output": ["boolean"],
    },
    prefer_skip_nested_validation=True,
)
def label_binarize(y, *, classes, neg_label=0, pos_label=1, sparse_output=False):
    """Binarize labels in a one-vs-all fashion.

    Several regression and binary classification algorithms are available in
    scikit-learn. A simple way to extend these algorithms to the multi-class
    classification case is to use the so-called one-vs-all scheme.

    This function makes it possible to compute this transformation for a
    fixed set of class labels known ahead of time.

    Parameters
    ----------
    y : array-like or sparse matrix
        Sequence of integer labels or multilabel data to encode.

    classes : array-like of shape (n_classes,)
        Uniquely holds the label for each class.

    neg_label : int, default=0
        Value with which negative labels must be encoded.

    pos_label : int, default=1
        Value with which positive labels must be encoded.

    sparse_output : bool, default=False
        Set to true if output binary array is desired in CSR sparse format.

    Returns
    -------
    Y : {ndarray, sparse matrix} of shape (n_samples, n_classes)
        Shape will be (n_samples, 1) for binary problems. Sparse matrix will
        be of CSR format.

    See Also
    --------
    LabelBinarizer : Class used to wrap the functionality of label_binarize
        and allow for fitting to classes independently of the transform
        operation.

    Examples
    --------
    >>> from sklearn.preprocessing import label_binarize
    >>> label_binarize([1, 6], classes=[1, 2, 4, 6])
    array([[1, 0, 0, 0],
           [0, 0, 0, 1]])

    The class ordering is preserved:

    >>> label_binarize([1, 6], classes=[1, 6, 4, 2])
    array([[1, 0, 0, 0],
           [0, 1, 0, 0]])

    Binary targets transform to a column vector

    >>> label_binarize(['yes', 'no', 'no', 'yes'], classes=['no', 'yes'])
    array([[1],
           [0],
           [0],
           [1]])
    """
    if not isinstance(y, list):
        # XXX Workaround that will be removed when list of list format is
        # dropped
        y = check_array(
            y, input_name="y", accept_sparse="csr", ensure_2d=False, dtype=None
        )
    else:
        if _num_samples(y) == 0:
            raise ValueError("y has 0 samples: %r" % y)
    if neg_label >= pos_label:
        raise ValueError(
            "neg_label={0} must be strictly less than pos_label={1}.".format(
                neg_label, pos_label
            )
        )

    if sparse_output and (pos_label == 0 or neg_label != 0):
        raise ValueError(
            "Sparse binarization is only supported with non "
            "zero pos_label and zero neg_label, got "
            "pos_label={0} and neg_label={1}"
            "".format(pos_label, neg_label)
        )

    # To account for pos_label == 0 in the dense case
    pos_switch = pos_label == 0
    if pos_switch:
        pos_label = -neg_label

    y_type = type_of_target(y)
    if "multioutput" in y_type:
        raise ValueError(
            "Multioutput target data is not supported with label binarization"
        )
    if y_type == "unknown":
        raise ValueError("The type of target data is not known")

    n_samples = y.shape[0] if sp.issparse(y) else len(y)
    n_classes = len(classes)
    classes = np.asarray(classes)

    if y_type == "binary":
        if n_classes == 1:
            if sparse_output:
                return sp.csr_matrix((n_samples, 1), dtype=int)
            else:
                Y = np.zeros((len(y), 1), dtype=int)
                Y += neg_label
                return Y
        elif len(classes) >= 3:
            y_type = "multiclass"

    sorted_class = np.sort(classes)
    if y_type == "multilabel-indicator":
        y_n_classes = y.shape[1] if hasattr(y, "shape") else len(y[0])
        if classes.size != y_n_classes:
            raise ValueError(
                "classes {0} mismatch with the labels {1} found in the data".format(
                    classes, unique_labels(y)
                )
            )

    if y_type in ("binary", "multiclass"):
        y = column_or_1d(y)

        # pick out the known labels from y
        y_in_classes = np.isin(y, classes)
        y_seen = y[y_in_classes]
        indices = np.searchsorted(sorted_class, y_seen)
        indptr = np.hstack((0, np.cumsum(y_in_classes)))

        data = np.empty_like(indices)
        data.fill(pos_label)
        Y = sp.csr_matrix((data, indices, indptr), shape=(n_samples, n_classes))
    elif y_type == "multilabel-indicator":
        Y = sp.csr_matrix(y)
        if pos_label != 1:
            data = np.empty_like(Y.data)
            data.fill(pos_label)
            Y.data = data
    else:
        raise ValueError(
            "%s target data is not supported with label binarization" % y_type
        )

    if not sparse_output:
        Y = Y.toarray()
        Y = Y.astype(int, copy=False)

        if neg_label != 0:
            Y[Y == 0] = neg_label

        if pos_switch:
            Y[Y == pos_label] = 0
    else:
        Y.data = Y.data.astype(int, copy=False)

    # preserve label ordering
    if np.any(classes != sorted_class):
        indices = np.searchsorted(sorted_class, classes)
        Y = Y[:, indices]

    if y_type == "binary":
        if sparse_output:
            Y = Y.getcol(-1)
        else:
            Y = Y[:, -1].reshape((-1, 1))

    return Y


def _inverse_binarize_multiclass(y, classes):
    """Inverse label binarization transformation for multiclass.

    Multiclass uses the maximal score instead of a threshold.
    """
    classes = np.asarray(classes)

    if sp.issparse(y):
        # Find the argmax for each row in y where y is a CSR matrix
        y = y.tocsr()
        n_samples, n_outputs = y.shape
        outputs = np.arange(n_outputs)
        row_max = min_max_axis(y, 1)[1]
        row_nnz = np.diff(y.indptr)

        y_data_repeated_max = np.repeat(row_max, row_nnz)
        # picks out all indices obtaining the maximum per row
        y_i_all_argmax = np.flatnonzero(y_data_repeated_max == y.data)

        # For corner case where last row has a max of 0
        if row_max[-1] == 0:
            y_i_all_argmax = np.append(y_i_all_argmax, [len(y.data)])

        # Gets the index of the first argmax in each row from y_i_all_argmax
        index_first_argmax = np.searchsorted(y_i_all_argmax, y.indptr[:-1])
        # first argmax of each row
        y_ind_ext = np.append(y.indices, [0])
        y_i_argmax = y_ind_ext[y_i_all_argmax[index_first_argmax]]
        # Handle rows of all 0
        y_i_argmax[np.where(row_nnz == 0)[0]] = 0

        # Handles rows with max of 0 that contain negative numbers
        samples = np.arange(n_samples)[(row_nnz > 0) & (row_max.ravel() == 0)]
        for i in samples:
            ind = y.indices[y.indptr[i] : y.indptr[i + 1]]
            y_i_argmax[i] = classes[np.setdiff1d(outputs, ind)][0]

        return classes[y_i_argmax]
    else:
        return classes.take(y.argmax(axis=1), mode="clip")


def _inverse_binarize_thresholding(y, output_type, classes, threshold):
    """Inverse label binarization transformation using thresholding."""
    if output_type == "binary" and y.ndim == 2 and y.shape[1] > 2:
        raise ValueError("output_type='binary', but y.shape = {0}".format(y.shape))

    if output_type != "binary" and y.shape[1] != len(classes):
        raise ValueError(
            "The number of class is not equal to the number of dimension of y."
        )

    classes = np.asarray(classes)

    # Perform thresholding
    if sp.issparse(y):
        if threshold > 0:
            if y.format not in ("csr", "csc"):
                y = y.tocsr()
            y.data = np.array(y.data > threshold, dtype=int)
            y.eliminate_zeros()
        else:
            y = np.array(y.toarray() > threshold, dtype=int)
    else:
        y = np.array(y > threshold, dtype=int)

    # Inverse transform data
    if output_type == "binary":
        if sp.issparse(y):
            y = y.toarray()
        if y.ndim == 2 and y.shape[1] == 2:
            return classes[y[:, 1]]
        else:
            if len(classes) == 1:
                return np.repeat(classes[0], len(y))
            else:
                return classes[y.ravel()]

    elif output_type == "multilabel-indicator":
        return y

    else:
        raise ValueError("{0} format is not supported".format(output_type))

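# Illustrative sketch (added for readability, not from the original source) of
# what the two private helpers above compute: `_inverse_binarize_multiclass`
# takes the per-row argmax, while `_inverse_binarize_thresholding` binarizes the
# scores first and then maps columns back to class labels. Kept in comments so
# nothing executes at import time.
#
#     scores = np.array([[0.1, 0.8, 0.1],
#                        [0.6, 0.2, 0.2]])
#     _inverse_binarize_multiclass(scores, classes=np.array([3, 5, 7]))
#     # -> array([5, 3])
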
class MultiLabelBinarizer(TransformerMixin, BaseEstimator, auto_wrap_output_keys=None):
    """Transform between iterable of iterables and a multilabel format.

    Although a list of sets or tuples is a very intuitive format for multilabel
    data, it is unwieldy to process. This transformer converts between this
    intuitive format and the supported multilabel format: a (samples x classes)
    binary matrix indicating the presence of a class label.

    Parameters
    ----------
    classes : array-like of shape (n_classes,), default=None
        Indicates an ordering for the class labels.
        All entries should be unique (cannot contain duplicate classes).

    sparse_output : bool, default=False
        Set to True if output binary array is desired in CSR sparse format.

    Attributes
    ----------
    classes_ : ndarray of shape (n_classes,)
        A copy of the `classes` parameter when provided.
        Otherwise it corresponds to the sorted set of classes found
        when fitting.

    See Also
    --------
    OneHotEncoder : Encode categorical features using a one-hot aka one-of-K
        scheme.

    Examples
    --------
    >>> from sklearn.preprocessing import MultiLabelBinarizer
    >>> mlb = MultiLabelBinarizer()
    >>> mlb.fit_transform([(1, 2), (3,)])
    array([[1, 1, 0],
           [0, 0, 1]])
    >>> mlb.classes_
    array([1, 2, 3])

    >>> mlb.fit_transform([{'sci-fi', 'thriller'}, {'comedy'}])
    array([[0, 1, 1],
           [1, 0, 0]])
    >>> list(mlb.classes_)
    ['comedy', 'sci-fi', 'thriller']

    A common mistake is to pass in a list, which leads to the following issue:

    >>> mlb = MultiLabelBinarizer()
    >>> mlb.fit(['sci-fi', 'thriller', 'comedy'])
    MultiLabelBinarizer()
    >>> mlb.classes_
    array(['-', 'c', 'd', 'e', 'f', 'h', 'i', 'l', 'm', 'o', 'r', 's', 't',
        'y'], dtype=object)

    To correct this, the list of labels should be passed in as:

    >>> mlb = MultiLabelBinarizer()
    >>> mlb.fit([['sci-fi', 'thriller', 'comedy']])
    MultiLabelBinarizer()
    >>> mlb.classes_
    array(['comedy', 'sci-fi', 'thriller'], dtype=object)
    """

    _parameter_constraints: dict = {
        "classes": ["array-like", None],
        "sparse_output": ["boolean"],
    }

    def __init__(self, *, classes=None, sparse_output=False):
        self.classes = classes
        self.sparse_output = sparse_output

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(self, y):
        """Fit the label sets binarizer, storing :term:`classes_`.

        Parameters
        ----------
        y : iterable of iterables
            A set of labels (any orderable and hashable object) for each
            sample. If the `classes` parameter is set, `y` will not be
            iterated.

        Returns
        -------
        self : object
            Fitted estimator.
        """
        self._cached_dict = None

        if self.classes is None:
            classes = sorted(set(itertools.chain.from_iterable(y)))
        elif len(set(self.classes)) < len(self.classes):
            raise ValueError(
                "The classes argument contains duplicate "
                "classes. Remove these duplicates before passing "
                "them to MultiLabelBinarizer."
            )
        else:
            classes = self.classes
        dtype = int if all(isinstance(c, int) for c in classes) else object
        self.classes_ = np.empty(len(classes), dtype=dtype)
        self.classes_[:] = classes
        return self

    @_fit_context(prefer_skip_nested_validation=True)
    def fit_transform(self, y):
        """Fit the label sets binarizer and transform the given label sets.

        Parameters
        ----------
        y : iterable of iterables
            A set of labels (any orderable and hashable object) for each
            sample. If the `classes` parameter is set, `y` will not be
            iterated.

        Returns
        -------
        y_indicator : {ndarray, sparse matrix} of shape (n_samples, n_classes)
            A matrix such that `y_indicator[i, j] = 1` iff `classes_[j]`
            is in `y[i]`, and 0 otherwise. Sparse matrix will be of CSR
            format.
        """
        if self.classes is not None:
            return self.fit(y).transform(y)

        self._cached_dict = None

        # Automatically increment on new class
        class_mapping = defaultdict(int)
        class_mapping.default_factory = class_mapping.__len__
        yt = self._transform(y, class_mapping)

        # sort classes and reorder columns
        tmp = sorted(class_mapping, key=class_mapping.get)

        # (make safe for tuples)
        dtype = int if all(isinstance(c, int) for c in tmp) else object
        class_mapping = np.empty(len(tmp), dtype=dtype)
        class_mapping[:] = tmp
        self.classes_, inverse = np.unique(class_mapping, return_inverse=True)
        # ensure yt.indices keeps its current dtype
        yt.indices = np.asarray(inverse[yt.indices], dtype=yt.indices.dtype)

        if not self.sparse_output:
            yt = yt.toarray()

        return yt

    def transform(self, y):
        """Transform the given label sets.

        Parameters
        ----------
        y : iterable of iterables
            A set of labels (any orderable and hashable object) for each
            sample. If the `classes` parameter is set, `y` will not be
            iterated.

        Returns
        -------
        y_indicator : array or CSR matrix, shape (n_samples, n_classes)
            A matrix such that `y_indicator[i, j] = 1` iff `classes_[j]` is in
            `y[i]`, and 0 otherwise.
        """
        check_is_fitted(self)

        class_to_index = self._build_cache()
        yt = self._transform(y, class_to_index)

        if not self.sparse_output:
            yt = yt.toarray()

        return yt

    def _build_cache(self):
        if self._cached_dict is None:
            self._cached_dict = dict(zip(self.classes_, range(len(self.classes_))))

        return self._cached_dict

    def _transform(self, y, class_mapping):
        """Transforms the label sets with a given mapping.

        Parameters
        ----------
        y : iterable of iterables
            A set of labels (any orderable and hashable object) for each
            sample. If the `classes` parameter is set, `y` will not be
            iterated.

        class_mapping : Mapping
            Maps from label to column index in label indicator matrix.

        Returns
        -------
        y_indicator : sparse matrix of shape (n_samples, n_classes)
            Label indicator matrix. Will be of CSR format.
        """
        indices = array.array("i")
        indptr = array.array("i", [0])
        unknown = set()
        for labels in y:
            index = set()
            for label in labels:
                try:
                    index.add(class_mapping[label])
                except KeyError:
                    unknown.add(label)
            indices.extend(index)
            indptr.append(len(indices))
        if unknown:
            warnings.warn(
                "unknown class(es) {0} will be ignored".format(
                    sorted(unknown, key=str)
                )
            )
        data = np.ones(len(indices), dtype=int)

        return sp.csr_matrix(
            (data, indices, indptr), shape=(len(indptr) - 1, len(class_mapping))
        )

    def inverse_transform(self, yt):
        """Transform the given indicator matrix into label sets.

        Parameters
        ----------
        yt : {ndarray, sparse matrix} of shape (n_samples, n_classes)
            A matrix containing only 1s and 0s.

        Returns
        -------
        y_original : list of tuples
            The set of labels for each sample such that `y[i]` consists of
            `classes_[j]` for each `yt[i, j] == 1`.
        """
        check_is_fitted(self)

        if yt.shape[1] != len(self.classes_):
            raise ValueError(
                "Expected indicator for {0} classes, but got {1}".format(
                    len(self.classes_), yt.shape[1]
                )
            )

        if sp.issparse(yt):
            yt = yt.tocsr()
            if len(yt.data) != 0 and len(np.setdiff1d(yt.data, [0, 1])) > 0:
                raise ValueError("Expected only 0s and 1s in label indicator.")
            return [
                tuple(self.classes_.take(yt.indices[start:end]))
                for start, end in zip(yt.indptr[:-1], yt.indptr[1:])
            ]
        else:
            unexpected = np.setdiff1d(yt, [0, 1])
            if len(unexpected) > 0:
                raise ValueError(
                    "Expected only 0s and 1s in label indicator. Also got {0}".format(
                        unexpected
                    )
                )
            return [tuple(self.classes_.compress(indicators)) for indicators in yt]

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.input_tags.two_d_array = False
        tags.target_tags.two_d_labels = True
        return tags

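# Illustrative sketch (added for readability, not from the original source):
# labels unknown at transform time are dropped with a warning, as implemented in
# `_transform` above. Kept in comments so nothing executes at import time.
#
#     mlb = MultiLabelBinarizer().fit([{"a", "b"}])
#     mlb.transform([{"a", "c"}])   # UserWarning: unknown class(es) ['c'] will be
#                                   # ignored; result: array([[1, 0]])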