`L i$ddlZddlZddlmZddlmZddlmZddlZ ddl m Z ddl m ZddlmZmZmZddlmZdd lmZmZdd lmZmZdd lmZdd lmZdd lm Z m!Z!m"Z"m#Z#m$Z$dZ%dZ&dZ'GddeeZ(Gdde(Z)GddeeZ*y)N)Counter)partial)Callable)sparse) BaseEstimatorTransformerMixin _fit_context) _get_mask) is_pandas_na is_scalar_nan) MissingValues StrOptions)_mode) _get_median) FLOAT_DTYPES_check_feature_names_in_check_n_featurescheck_is_fitted validate_datact|ry|jjdvrIt|tj s.t dj|jt|yy)N)fiuzn'X' and 'missing_values' types are expected to be both numerical. Got X.dtype={} and type(missing_values)={}.) r dtypekind isinstancenumbersReal ValueErrorformattype)Xmissing_valuess Z/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/sklearn/impute/_base.py_check_inputs_dtyper&sZN#ww||&z.',,/W ((.qww^8L(M  0X&c~ t|S#t$r&}dt|vrt|dcYd}~Sd}~wwxYw)zCompute the minimum of a list of potentially non-comparable values. If values cannot be directly compared due to type incompatibility, the object with the lowest string representation is returned. z'<' not supported betweenc@tt|t|fSN)strr")xs r%z_safe_min..3sSa\3q6,Br')keyN)min TypeErrorr+)itemses r% _safe_minr3)s> 5z  &#a& 0u"BC C s <7<7<c|jdkDr|jtk(rTt|}|j ddd}t |j Dcgc] \}}||k(r|c}}}n t|}|dd}|dd}nd}d}|dk(r|dk(rtjS||kr|S||kDr|S||k(r t ||gSycc}}w)zCompute the most frequent value in a 1d array extended with [extra_value] * n_repeat, where extra_value is assumed to be not part of the array.rN) sizerobjectr most_commonr3r1rnpnan) array extra_valuen_repeatcountermost_frequent_countvaluecountmost_frequent_valuemodes r%_most_frequentrD7s  zzA~ ;;& enG")"5"5a"8";A"> "+)0 $u 33# `. .. versionadded:: 0.20 `SimpleImputer` replaces the previous `sklearn.preprocessing.Imputer` estimator which is now removed. Parameters ---------- missing_values : int, float, str, np.nan, None or pandas.NA, default=np.nan The placeholder for the missing values. All occurrences of `missing_values` will be imputed. For pandas' dataframes with nullable integer dtypes with missing values, `missing_values` can be set to either `np.nan` or `pd.NA`. strategy : str or Callable, default='mean' The imputation strategy. - If "mean", then replace missing values using the mean along each column. Can only be used with numeric data. - If "median", then replace missing values using the median along each column. Can only be used with numeric data. - If "most_frequent", then replace missing using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is returned. - If "constant", then replace missing values with fill_value. Can be used with strings or numeric data. - If an instance of Callable, then replace missing values using the scalar statistic returned by running the callable over a dense 1d array containing non-missing values of each column. .. versionadded:: 0.20 strategy="constant" for fixed value imputation. .. versionadded:: 1.5 strategy=callable for custom value imputation. fill_value : str or numerical value, default=None When strategy == "constant", `fill_value` is used to replace all occurrences of missing_values. For string or object data types, `fill_value` must be a string. If `None`, `fill_value` will be 0 when imputing numerical data and "missing_value" for strings or object data types. copy : bool, default=True If True, a copy of X will be created. If False, imputation will be done in-place whenever possible. Note that, in the following cases, a new copy will always be made, even if `copy=False`: - If `X` is not an array of floating values; - If `X` is encoded as a CSR matrix; - If `add_indicator=True`. add_indicator : bool, default=False If True, a :class:`MissingIndicator` transform will stack onto output of the imputer's transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won't appear on the missing indicator even if there are missing values at transform/test time. keep_empty_features : bool, default=False If True, features that consist exclusively of missing values when `fit` is called are returned in results when `transform` is called. The imputed value is always `0` except when `strategy="constant"` in which case `fill_value` will be used instead. .. versionadded:: 1.2 .. versionchanged:: 1.6 Currently, when `keep_empty_feature=False` and `strategy="constant"`, empty features are not dropped. This behaviour will change in version 1.8. Set `keep_empty_feature=True` to preserve this behaviour. Attributes ---------- statistics_ : array of shape (n_features,) The imputation fill value for each feature. Computing statistics can result in `np.nan` values. During :meth:`transform`, features corresponding to `np.nan` statistics will be discarded. indicator_ : :class:`~sklearn.impute.MissingIndicator` Indicator used to add binary indicators for missing values. `None` if `add_indicator=False`. n_features_in_ : int Number of features seen during :term:`fit`. .. versionadded:: 0.24 feature_names_in_ : ndarray of shape (`n_features_in_`,) Names of features seen during :term:`fit`. Defined only when `X` has feature names that are all strings. .. versionadded:: 1.0 See Also -------- IterativeImputer : Multivariate imputer that estimates values to impute for each feature with missing values from all the others. KNNImputer : Multivariate imputer that estimates missing features using nearest samples. Notes ----- Columns which only contained missing values at :meth:`fit` are discarded upon :meth:`transform` if strategy is not `"constant"`. In a prediction context, simple imputation usually performs poorly when associated with a weak learner. However, with a powerful learner, it can lead to as good or better performance than complex imputation such as :class:`~sklearn.impute.IterativeImputer` or :class:`~sklearn.impute.KNNImputer`. Examples -------- >>> import numpy as np >>> from sklearn.impute import SimpleImputer >>> imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean') >>> imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]]) SimpleImputer() >>> X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]] >>> print(imp_mean.transform(X)) [[ 7. 2. 3. ] [ 4. 3.5 6. ] [10. 3.5 9. ]] For a more detailed example see :ref:`sphx_glr_auto_examples_impute_plot_missing_values.py`. >meanmedianconstant most_frequent no_validationrG)strategy fill_valuecopyrKr{NTF)r$rrrrIrJcTt||||||_||_||_y)NrH)rjrNrrr)rMr$rrrrIrJrps r%rNzSimpleImputer.__init__<s7 )' 3  ! $ r'c |jdvr,t|trtd|Drt}n d}nt }|s%|j jdk(r |j }t|jst|jrd}nd} t|||d||sdnd||j}|r|j |_t#||j|j jd vr$td j|j t%j&|r|jd k(r td |jdk(r|rB|j(6t+|j(}d|j(d|d|j d}n8|s*|j,j }d|d|j d}n |j }t/j0||j ds t|S#t$r<}dt|vr(td j|j|}|d|d}~wwxYw)N)r~r}c3JK|]}|D]}t|tywr*)rr+).0rowelems r% z0SimpleImputer._validate_input..Us.+*-s+7; 4%+%+s!#O allow-nanTcsc)reset accept_sparserforce_writeableensure_all_finiterzcould not convertz0Cannot use {} strategy with non-numeric data: {}rrrrzSimpleImputer does not support data with dtype {0}. Please provide either a numeric array (with a floating point or integer dtype) or categorical data represented either as an array with integer dtype or an array of string values with an object dtype.rdImputation not possible when missing_values == 0 and input is sparse. Provide a dense array instead.r}z fill_value=z (of type z+) cannot be cast to the input data that is z. If fill_value is a Python scalar, instead pass a numpy scalar (e.g. fill_value=np.uint8(0) if your data is of type np.uint8). Make sure that both dtypes are of the same kind.z%The dtype of the filling value (i.e. z]. Make sure that the dtypes of the input data are of the same kind between fit and transform. same_kind)casting)rrlistanyr7r _fit_dtyperr r$r rrr r+r!rr&r\r]rr" statistics_r9can_cast) rMr#in_fitrrvenew_vefill_value_dtypeerr_msgs r%_validate_inputzSimpleImputer._validate_inputOsT ==9 9 !T"s+12+( E$//..#5OOE ++ , d>Q>Q0R +  $  #,2"3YY A* ggDOAt223 77<<3 3( )/qww   ;;q>d11Q6!  ==J &$//5#'#8 !$//!4J?O>RS@@A{KGG#'#3#3#9#9 ;#//4==$"5"5z D  $4==$"5"5z D  r'c t||}|j}|jdtj|j z }tj |jd}|dk(r|jsctt|jdD cgc]} t|dd| fj c} rtjdt|j|nOt|jdD]3} |j|j | |j | dz} ||j | |j | dz} | | } t| d} | | } | j} || | z}t!| dk(r|jrd|| <|dk(r;| j"|z}|dk(rtj$n| j|z || <|dk(rt'| ||| <|dk(rt)| d||| <t+|t,s |j/| || <6t0|e||Scc} w) z#Fit the transformer on sparse data.rr5r}NCurrently, when `keep_empty_feature=False` and `strategy="constant"`, empty features are not dropped. This behaviour will change in version 1.8. Set `keep_empty_feature=True` to preserve this behaviour.r{r|r~)r datashaper9diffindptremptyrJrrangeallwarningswarn FutureWarningfillsumlenr6r:rrDrrrrjrV)rMr#rr$r missing_mask mask_datan_implicit_zeros statisticsrcolumn mask_column mask_zerosn_explicit_zerosn_zerossrps r%rzSimpleImputer._sparse_fits$ N3  %% 771:(99XXaggaj) z !++7<\=O=OPQ=R7ST!\!Q$',,-T1 L"  OOJ '1771:& > ahhq1uo>' ahhq1uoF  -'vq1  ,#->>#3 *1-0@@v;!#(@(@$%JqM6)"KK'123q&fjjlQ>N 1 !X-(3FG(D 1 !_4(6vq'(J 1 #Hh7(, f(= 1 5 >8 |,YUs#Ict||}tj||}t|||dk(rt jj |d}t jj|}|jrdnt j|t jj|<|S|dk(rt jj|d} t jj| } |jrdnt j| t jj| <| S|dk(r|j}|j} |jjdk(r)t j |j"dt$} n"t j |j"d} t't)|d d | d d D]s\} \}}t j*|j-t.}||}t1|dk(r|jrd| | <Vt3|t jd| | <u| S|d k(r|jsMtj|j5dj7rt9j:d t<t j>|j"d ||jStA|tBrjt j |j"d }tE|j"d D]+} |jG|d d | fjI|| <-|Sy ) z"Fit the transformer on dense data.)maskr{raxisr|r~rrNr}rr5)%r ma masked_arrayrjrVr9r{getdatarJr:getmaskr| getmaskarray transposerrrrr7 enumeratezip logical_notastypeboolrrDrrrrrfullrrrr compressed)rMr#rr$rrmasked_X mean_maskedr{ median_maskedr|rr~rrrow_maskrrps r%rzSimpleImputer._dense_fits N3 ??1<8 |, v %%**XA*6K55==-D484L4LqRTRXRXD{+ ,K !EELLL:MUU]]=1F--266 255%%m4 5M (  A))+Dww||s" "6 B " 4 &/AaD$q'0B&C F"?C>>(3::4@(ms8q=T%=%='(M!$'5c2661'EM!$  F!  #++ 80D0H0Ha0H0P0T0T0V L" 771771:zA A( +(.."34J8>>!,- K $ hq!tn.G.G.I J 1  K  ,r'ct||j|d}|j}|jd|jdk7r4t d|jd|jjdfzt ||j }|jdk(s |jr|}d}nt |tj}tj|}||}tj|}|jrotj|jd|}t|dr|j |}t#j$d |d |jd |dd|f}t'j(|r|j dk(r t d | |j*} n t |j*|j } tj,tjt/|j0dz t2 tj4|j0| } || j7|j8d|j*| <ni||} n |dd|f} tj:| d} tj,|| } tj<| j?ddd}| ||<t@||}t@|||S)ahImpute all missing values in `X`. Parameters ---------- X : {array-like, sparse matrix}, shape (n_samples, n_features) The input data to complete. Returns ------- X_imputed : {ndarray, sparse matrix} of shape (n_samples, n_features_out) `X` with imputed values. Frr5rz)X has %d features per sample, expected %dr}Nfeature_names_in_z/Skipping features without any observed values: zI. At least one non-missing value is needed for imputation with strategy='z'.rr)rr)#rrrrr r r$rrJr9r:r flatnonzerorarangerXrrrr\r]rrepeatrrintrrrrwhererrjrZra)rMr#rrvalid_statisticsvalid_statistics_indexes invalid_mask valid_maskinvalid_featuresrindexesmask_valid_features n_missingvalues coordinatesr`rps r%rYzSimpleImputer.transformas   5 1%% 771:))!, ,;771:t//55a89:  !D$7$78  ==J &$*B*B) '+ $%Z8L 5J)*5 ')~~j'A $!#%99QWWQZ#8#F 4!45'+'='=>N'O$ ()*66:mm_BH a112 ;;q>""a' %,3',,D$QVVT-@-@AD))IIc!((ma/s;RWWQXX=N 08??e?Tt (/&2#&216N3N&O#2;IYY/;F((#6#@#@#BCDbDIK#AkNg2<@ w-a==r'ct||jstd|jdt|jj }|j d|z }|ddd|fj}|dd|dfjt}t|j}|j d|f}tj|}||dd|jj f<|jt} d\} } | t|jkr[tj|dd| fs!|j| |dd| f<| dz } | dz } n| dz } | t|jkr[|j|| <|S)a=Convert the data back to the original representation. Inverts the `transform` operation performed on an array. This operation can only be performed after :class:`SimpleImputer` is instantiated with `add_indicator=True`. Note that `inverse_transform` can only invert the transform in features that have binary indicators for missing values. If a feature has no missing values at `fit` time, the feature won't have a binary indicator, and the imputation done at `transform` time won't be inverted. .. versionadded:: 0.24 Parameters ---------- X : array-like of shape (n_samples, n_features + n_features_missing_indicator) The imputed data to be reverted to original data. It has to be an augmented array of imputed data and the missing indicator mask. Returns ------- X_original : ndarray of shape (n_samples, n_features) The original `X` with missing values as it was prior to imputation. zr'inverse_transform' works only when 'SimpleImputer' is instantiated with 'add_indicator=True'. Got 'add_indicator=z ' instead.r5Nr)rr)rrIr rrS features_rrrrrr9zerosTrr$) rMr#n_features_missingnon_empty_feature_count array_imputedrn_features_originalshape_original X_original full_mask imputed_idx original_idxs r%inverse_transformzSimpleImputer.inverse_transforms8 !!&'+&8&8%9: !!:!:;"#''!*/A"A!55556;;= 3445<>)D4D4Dbff*MN/0<n<|Tr'rzceZdZUdZegeddhgdedhgdgdZeed<e jddddd Z d Z d Z dd Zed ddZdZed ddZddZfdZxZS)rRa Binary indicators for missing values. Note that this component typically should not be used in a vanilla :class:`~sklearn.pipeline.Pipeline` consisting of transformers and a classifier, but rather could be added using a :class:`~sklearn.pipeline.FeatureUnion` or :class:`~sklearn.compose.ColumnTransformer`. Read more in the :ref:`User Guide `. .. versionadded:: 0.20 Parameters ---------- missing_values : int, float, str, np.nan or None, default=np.nan The placeholder for the missing values. All occurrences of `missing_values` will be imputed. For pandas' dataframes with nullable integer dtypes with missing values, `missing_values` should be set to `np.nan`, since `pd.NA` will be converted to `np.nan`. features : {'missing-only', 'all'}, default='missing-only' Whether the imputer mask should represent all or a subset of features. - If `'missing-only'` (default), the imputer mask will only represent features containing missing values during fit time. - If `'all'`, the imputer mask will represent all features. sparse : bool or 'auto', default='auto' Whether the imputer mask format should be sparse or dense. - If `'auto'` (default), the imputer mask will be of same type as input. - If `True`, the imputer mask will be a sparse matrix. - If `False`, the imputer mask will be a numpy array. error_on_new : bool, default=True If `True`, :meth:`transform` will raise an error when there are features with missing values that have no missing values in :meth:`fit`. This is applicable only when `features='missing-only'`. Attributes ---------- features_ : ndarray of shape (n_missing_features,) or (n_features,) The features indices which will be returned when calling :meth:`transform`. They are computed during :meth:`fit`. If `features='all'`, `features_` is equal to `range(n_features)`. n_features_in_ : int Number of features seen during :term:`fit`. .. versionadded:: 0.24 feature_names_in_ : ndarray of shape (`n_features_in_`,) Names of features seen during :term:`fit`. Defined only when `X` has feature names that are all strings. .. versionadded:: 1.0 See Also -------- SimpleImputer : Univariate imputation of missing values. IterativeImputer : Multivariate imputation of missing values. Examples -------- >>> import numpy as np >>> from sklearn.impute import MissingIndicator >>> X1 = np.array([[np.nan, 1, 3], ... [4, 0, np.nan], ... [8, 1, 0]]) >>> X2 = np.array([[5, 1, np.nan], ... [np.nan, 2, 3], ... [2, 4, 0]]) >>> indicator = MissingIndicator() >>> indicator.fit(X1) MissingIndicator() >>> X2_tr = indicator.transform(X2) >>> X2_tr array([[False, True], [ True, False], [False, False]]) missing-onlyrrGautor$featuresrrPrKTc<||_||_||_||_yr*r)rMr$rrrPs r%rNzMissingIndicator.__init__us"-   (r'c|jst||j}n|}tj|rp|j |j dk(r|jd}|jdur|j}n|jdk(rz|j}ni|jst||j}n|}|j dk(r|jd}|jdurtj|}|j dk(r&tj|jd}||fStj }||fS) aCompute the imputer mask and the indices of the features containing missing values. Parameters ---------- X : {ndarray, sparse matrix} of shape (n_samples, n_features) The input data with missing values. Note that `X` has been checked in :meth:`fit` and :meth:`transform` before to call this function. Returns ------- imputer_mask : {ndarray, sparse matrix} of shape (n_samples, n_features) The imputer mask of the original data. features_with_missing : ndarray of shape (n_features_with_missing) The features containing missing values. rrrFcsrTrr5) _precomputedr r$r\r]eliminate_zerosrrrtoarrayr!tocsc csc_matrixr9rrr)rMr# imputer_maskrfeatures_indicess r%_get_missing_features_infoz+MissingIndicator._get_missing_features_infos7(  $Q(;(;  ( ( *}}.(,,!,4 {{e#+335 $$-+113 $$(D,?,?@  }}.(,,!,4 {{d"!}}\: ==E !!yy4 --- "~~i8 ---r'c^t|jsd}nd}t|||dd|}t||j|jj dvr$t dj|jtj|r|jdk(r t d|S) NTr)rr)rrrrrzMissingIndicator does not support data with dtype {0}. Please provide either a numeric array (with a floating point or integer dtype) or categorical data represented either as an array with integer dtype or an array of string values with an object dtype.rzSSparse input with missing_values=0 is not supported. Provide a dense array instead.) r r$rr&rrr r!r\r])rMr#rrs r%rz MissingIndicator._validate_inputsT001 $  +    (/   At223 77<<3 3( )/qww   ;;q>d11Q6!  r'cH|r8t|dr|jjdk(s tdd|_nd|_|js|j |d}nt ||d|jd|_|j|}|d|_ |d S) aOFit the transformer on `X`. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples, n_features) Input data, where `n_samples` is the number of samples and `n_features` is the number of features. If `precomputed=True`, then `X` is a mask of the input data. precomputed : bool Whether the input data is a mask. Returns ------- imputer_mask : {ndarray, sparse matrix} of shape (n_samples, n_features) The imputer mask of the original data. rb4precomputed is True but the input data is not a maskTFr)rr5r) rXrrr rrrr _n_featuresrr)rMr#rrQmissing_features_infos r%rTzMissingIndicator._fits& Aw'AGGLLC,? !WXX $D  %D   $$Qt$4A dAT 2771: $ ? ? B.q1$Q''r'rc*|j|||S)aFit the transformer on `X`. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples, n_features) Input data, where `n_samples` is the number of samples and `n_features` is the number of features. y : Ignored Not used, present for API consistency by convention. Returns ------- self : object Fitted estimator. )rT)rMr#rs r%rzMissingIndicator.fits$ !Q r'ct||js|j|d}n0t|dr|jj dk(s t d|j|\}}|jdk(rtj||j}|jr)|jdkDrt dj||jj|jkr|d d |jf}|S) aGenerate missing values indicator for `X`. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples, n_features) The input data to complete. Returns ------- Xt : {ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing) The missing indicator for input data. The data type of `Xt` will be boolean. FrrrrrrzSThe features {} have missing values in transform but have no missing values in fit.N)rrrrXrrr rrr9 setdiff1drrPr6r!r)rMr#r rfeatures_diff_fit_transs r%rYzMissingIndicator.transforms   $$Qu$5AAw'AGGLLC,? !WXX!%!@!@!C h ==N *&(ll8T^^&L #  %<%A%AA%E $f%<= ~~""T%5%55+At~~,=> r'c|j||}|jj|jkr|dd|jf}|S)aGenerate missing values indicator for `X`. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples, n_features) The input data to complete. y : Ignored Not used, present for API consistency by convention. Returns ------- Xt : {ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing) The missing indicator for input data. The data type of `Xt` will be boolean. N)rTrr6r)rMr#rr s r% fit_transformzMissingIndicator.fit_transformDsD&yyA >>  !1!1 1'4>>(9:Lr'ct|dt||}|jjj }t j ||jDcgc] }|d| c}tScc}w)rr_r) rrrprqlowerr9asarrayrr7)rMrfprefix feature_names r%rcz&MissingIndicator.get_feature_names_out^sz( ./0~F((..0zz%34>>$B  (!L>*     sA8ct|}d|j_d|j_d|j_g|j _|Sr)rjrkrlrmstringrtransformer_tagspreserves_dtyperns r%rkz!MissingIndicator.__sklearn_tags__}sGw')$(!!%!%02- r')NFr*)rqrrrsrtrrrKrurvr9r:rNrrrTr rrYrrcrkrwrxs@r%rRrRsRj)?+ 789j&23" $Dvv )5.n!F'(R56*'R562 >r'rR)+rr collectionsr functoolsrtypingrnumpyr9numpy.marscipyrr\baserr r utils._maskr utils._missingr r utils._param_validationrr utils.fixesrutils.sparsefuncsrutils.validationrrrrrr&r3rDrFrzrRr'r%r3s@@#8?+   $=NI#]IXl TLl T^j'jr'