"""Dummy estimators that implement simple rules of thumb."""

import warnings
from numbers import Integral, Real

import numpy as np
import scipy.sparse as sp

from .base import (
    BaseEstimator,
    ClassifierMixin,
    MultiOutputMixin,
    RegressorMixin,
    _fit_context,
)
from .utils import check_random_state
from .utils._param_validation import Interval, StrOptions
from .utils.multiclass import class_distribution
from .utils.random import _random_choice_csc
from .utils.stats import _weighted_percentile
from .utils.validation import (
    _check_sample_weight,
    _num_samples,
    check_array,
    check_consistent_length,
    check_is_fitted,
    validate_data,
)


class DummyClassifier(MultiOutputMixin, ClassifierMixin, BaseEstimator):
    """DummyClassifier makes predictions that ignore the input features.

    This classifier serves as a simple baseline to compare against other more
    complex classifiers.

    The specific behavior of the baseline is selected with the `strategy`
    parameter.

    All strategies make predictions that ignore the input feature values
    passed as the `X` argument to `fit` and `predict`. The predictions,
    however, typically depend on values observed in the `y` parameter passed
    to `fit`.

    Note that the "stratified" and "uniform" strategies lead to
    non-deterministic predictions that can be rendered deterministic by
    setting the `random_state` parameter if needed. The other strategies are
    naturally deterministic and, once fit, always return the same constant
    prediction for any value of `X`.

    Read more in the :ref:`User Guide <dummy_estimators>`.

    .. versionadded:: 0.13

    Parameters
    ----------
    strategy : {"most_frequent", "prior", "stratified", "uniform", "constant"}, default="prior"
        Strategy to use to generate predictions.

        * "most_frequent": the `predict` method always returns the most
          frequent class label in the observed `y` argument passed to `fit`.
          The `predict_proba` method returns the matching one-hot encoded
          vector.
        * "prior": the `predict` method always returns the most frequent
          class label in the observed `y` argument passed to `fit` (like
          "most_frequent"). ``predict_proba`` always returns the empirical
          class distribution of `y`, also known as the empirical class prior
          distribution.
        * "stratified": the `predict_proba` method randomly samples one-hot
          vectors from a multinomial distribution parametrized by the
          empirical class prior probabilities. The `predict` method returns
          the class label which got probability one in the one-hot vector of
          `predict_proba`. Each sampled row of both methods is therefore
          independent and identically distributed.
        * "uniform": generates predictions uniformly at random from the list
          of unique classes observed in `y`, i.e. each class has equal
          probability.
        * "constant": always predicts a constant label that is provided by
          the user. This is useful for metrics that evaluate a non-majority
          class.

        .. versionchanged:: 0.24
           The default value of `strategy` has changed to "prior" in version
           0.24.

    random_state : int, RandomState instance or None, default=None
        Controls the randomness to generate the predictions when
        ``strategy='stratified'`` or ``strategy='uniform'``.
        Pass an int for reproducible output across multiple function calls.
        See :term:`Glossary <random_state>`.

    constant : int or str or array-like of shape (n_outputs,), default=None
        The explicit constant as predicted by the "constant" strategy. This
        parameter is useful only for the "constant" strategy.

    Attributes
    ----------
    classes_ : ndarray of shape (n_classes,) or list of such arrays
        Unique class labels observed in `y`. For multi-output classification
        problems, this attribute is a list of arrays as each output has an
        independent set of possible classes.

    n_classes_ : int or list of int
        Number of labels for each output.
    class_prior_ : ndarray of shape (n_classes,) or list of such arrays
        Frequency of each class observed in `y`. For multioutput
        classification problems, this is computed independently for each
        output.

    n_features_in_ : int
        Number of features seen during :term:`fit`.

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during :term:`fit`. Defined only when `X` has
        feature names that are all strings.

    n_outputs_ : int
        Number of outputs.

    sparse_output_ : bool
        True if the array returned from predict is to be in sparse CSC
        format. Is automatically set to True if the input `y` is passed in
        sparse format.

    See Also
    --------
    DummyRegressor : Regressor that makes predictions using simple rules.

    Examples
    --------
    >>> import numpy as np
    >>> from sklearn.dummy import DummyClassifier
    >>> X = np.array([-1, 1, 1, 1])
    >>> y = np.array([0, 1, 1, 1])
    >>> dummy_clf = DummyClassifier(strategy="most_frequent")
    >>> dummy_clf.fit(X, y)
    DummyClassifier(strategy='most_frequent')
    >>> dummy_clf.predict(X)
    array([1, 1, 1, 1])
    >>> dummy_clf.score(X, y)
    0.75
    """

    _parameter_constraints: dict = {
        "strategy": [
            StrOptions(
                {"most_frequent", "prior", "stratified", "uniform", "constant"}
            )
        ],
        "random_state": ["random_state"],
        "constant": [Integral, str, "array-like", None],
    }

    def __init__(self, *, strategy="prior", random_state=None, constant=None):
        self.strategy = strategy
        self.random_state = random_state
        self.constant = constant

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(self, X, y, sample_weight=None):
        """Fit the baseline classifier.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.

        y : array-like of shape (n_samples,) or (n_samples, n_outputs)
            Target values.

        sample_weight : array-like of shape (n_samples,), default=None
            Sample weights.

        Returns
        -------
        self : object
            Returns the instance itself.
        """
        validate_data(self, X, skip_check_array=True)

        self._strategy = self.strategy

        if self._strategy == "uniform" and sp.issparse(y):
            y = y.toarray()
            warnings.warn(
                (
                    "A local copy of the target data has been converted "
                    "to a numpy array. Predicting on sparse target data "
                    "with the uniform strategy would not save memory "
                    "and would be slower."
                ),
                UserWarning,
            )

        self.sparse_output_ = sp.issparse(y)

        if not self.sparse_output_:
            y = np.asarray(y)
            y = np.atleast_1d(y)

        if y.ndim == 1:
            y = np.reshape(y, (-1, 1))

        self.n_outputs_ = y.shape[1]

        check_consistent_length(X, y)

        if sample_weight is not None:
            sample_weight = _check_sample_weight(sample_weight, X)

        if self._strategy == "constant":
            if self.constant is None:
                raise ValueError(
                    "Constant target value has to be specified "
                    "when the constant strategy is used."
                )
            else:
                constant = np.reshape(np.atleast_1d(self.constant), (-1, 1))
                if constant.shape[0] != self.n_outputs_:
                    raise ValueError(
                        "Constant target value should have shape (%d, 1)."
                        % self.n_outputs_
                    )

        (self.classes_, self.n_classes_, self.class_prior_) = class_distribution(
            y, sample_weight
        )

        if self._strategy == "constant":
            # The user-provided constant must be one of the labels observed
            # in the training data, for every output.
            for k in range(self.n_outputs_):
                if not any(constant[k][0] == c for c in self.classes_[k]):
                    err_msg = (
                        "The constant target value must be present in "
                        "the training data. You provided constant={}. "
                        "Possible values are: {}.".format(
                            self.constant, self.classes_[k].tolist()
                        )
                    )
                    raise ValueError(err_msg)

        if self.n_outputs_ == 1:
            self.n_classes_ = self.n_classes_[0]
            self.classes_ = self.classes_[0]
            self.class_prior_ = self.class_prior_[0]

        return self

    def predict(self, X):
        """Perform classification on test vectors X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Test data.

        Returns
        -------
        y : array-like of shape (n_samples,) or (n_samples, n_outputs)
            Predicted target values for X.
        """
        check_is_fitted(self)
        n_samples = _num_samples(X)
        rs = check_random_state(self.random_state)

        n_classes_ = self.n_classes_
        classes_ = self.classes_
        class_prior_ = self.class_prior_
        constant = self.constant
        if self.n_outputs_ == 1:
            # Wrap in lists so single- and multi-output share one code path.
            n_classes_ = [n_classes_]
            classes_ = [classes_]
            class_prior_ = [class_prior_]
            constant = [constant]

        # Compute the probabilities only once for the "stratified" strategy.
        if self._strategy == "stratified":
            proba = self.predict_proba(X)
            if self.n_outputs_ == 1:
                proba = [proba]

        if self.sparse_output_:
            class_prob = None
            if self._strategy in ("most_frequent", "prior"):
                classes_ = [np.array([cp.argmax()]) for cp in class_prior_]
            elif self._strategy == "stratified":
                class_prob = class_prior_
            elif self._strategy == "uniform":
                raise ValueError(
                    "Sparse target prediction is not "
                    "supported with the uniform strategy"
                )
            elif self._strategy == "constant":
                classes_ = [np.array([c]) for c in constant]

            y = _random_choice_csc(n_samples, classes_, class_prob, self.random_state)
        else:
            if self._strategy in ("most_frequent", "prior"):
                y = np.tile(
                    [
                        classes_[k][class_prior_[k].argmax()]
                        for k in range(self.n_outputs_)
                    ],
                    [n_samples, 1],
                )
            elif self._strategy == "stratified":
                y = np.vstack(
                    [
                        classes_[k][proba[k].argmax(axis=1)]
                        for k in range(self.n_outputs_)
                    ]
                ).T
            elif self._strategy == "uniform":
                ret = [
                    classes_[k][rs.randint(n_classes_[k], size=n_samples)]
                    for k in range(self.n_outputs_)
                ]
                y = np.vstack(ret).T
            elif self._strategy == "constant":
                y = np.tile(self.constant, (n_samples, 1))

            if self.n_outputs_ == 1:
                y = np.ravel(y)

        return y

    def predict_proba(self, X):
        """
        Return probability estimates for the test vectors X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Test data.

        Returns
        -------
        P : ndarray of shape (n_samples, n_classes) or list of such arrays
            Returns the probability of the sample for each class in
            the model, where classes are ordered arithmetically, for each
            output.
        """
        check_is_fitted(self)
        n_samples = _num_samples(X)
        rs = check_random_state(self.random_state)

        n_classes_ = self.n_classes_
        classes_ = self.classes_
        class_prior_ = self.class_prior_
        constant = self.constant
        if self.n_outputs_ == 1:
            # Wrap in lists so single- and multi-output share one code path.
            n_classes_ = [n_classes_]
            classes_ = [classes_]
            class_prior_ = [class_prior_]
            constant = [constant]

        P = []
        for k in range(self.n_outputs_):
            if self._strategy == "most_frequent":
                ind = class_prior_[k].argmax()
                out = np.zeros((n_samples, n_classes_[k]), dtype=np.float64)
                out[:, ind] = 1.0
            elif self._strategy == "prior":
                out = np.ones((n_samples, 1)) * class_prior_[k]
            elif self._strategy == "stratified":
                out = rs.multinomial(1, class_prior_[k], size=n_samples)
                out = out.astype(np.float64)
            elif self._strategy == "uniform":
                out = np.ones((n_samples, n_classes_[k]), dtype=np.float64)
                out /= n_classes_[k]
            elif self._strategy == "constant":
                ind = np.where(classes_[k] == constant[k])
                out = np.zeros((n_samples, n_classes_[k]), dtype=np.float64)
                out[:, ind] = 1.0

            P.append(out)

        if self.n_outputs_ == 1:
            P = P[0]

        return P

    def predict_log_proba(self, X):
        """
        Return log probability estimates for the test vectors X.

        Parameters
        ----------
        X : {array-like, object with finite length or shape}
            Training data.

        Returns
        -------
        P : ndarray of shape (n_samples, n_classes) or list of such arrays
            Returns the log probability of the sample for each class in
            the model, where classes are ordered arithmetically for each
            output.
        """
        proba = self.predict_proba(X)
        if self.n_outputs_ == 1:
            return np.log(proba)
        else:
            return [np.log(p) for p in proba]

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.input_tags.sparse = True
        tags.classifier_tags.poor_score = True
        tags.no_validation = True
        return tags

    def score(self, X, y, sample_weight=None):
        """Return the mean accuracy on the given test data and labels.

        In multi-label classification, this is the subset accuracy
        which is a harsh metric since you require for each sample that
        each label set be correctly predicted.

        Parameters
        ----------
        X : None or array-like of shape (n_samples, n_features)
            Test samples. Passing None as test samples gives the same result
            as passing real test samples, since DummyClassifier
            operates independently of the sampled observations.

        y : array-like of shape (n_samples,) or (n_samples, n_outputs)
            True labels for X.

        sample_weight : array-like of shape (n_samples,), default=None
            Sample weights.

        Returns
        -------
        score : float
            Mean accuracy of self.predict(X) w.r.t. y.
        """
        if X is None:
            X = np.zeros(shape=(len(y), 1))
        return super().score(X, y, sample_weight)


class DummyRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
    """Regressor that makes predictions using simple rules.

    This regressor is useful as a simple baseline to compare with other
    (real) regressors. Do not use it for real problems.

    Read more in the :ref:`User Guide <dummy_estimators>`.

    .. versionadded:: 0.13

    Parameters
    ----------
    strategy : {"mean", "median", "quantile", "constant"}, default="mean"
        Strategy to use to generate predictions.

        * "mean": always predicts the mean of the training set
        * "median": always predicts the median of the training set
        * "quantile": always predicts a specified quantile of the training
          set, provided with the quantile parameter.
        * "constant": always predicts a constant value that is provided by
          the user.

    constant : int or float or array-like of shape (n_outputs,), default=None
        The explicit constant as predicted by the "constant" strategy. This
        parameter is useful only for the "constant" strategy.

    quantile : float in [0.0, 1.0], default=None
        The quantile to predict using the "quantile" strategy. A quantile of
        0.5 corresponds to the median, while 0.0 to the minimum and 1.0 to
        the maximum.

    Attributes
    ----------
    constant_ : ndarray of shape (1, n_outputs)
        Mean or median or quantile of the training targets or constant value
        given by the user.

    n_features_in_ : int
        Number of features seen during :term:`fit`.

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during :term:`fit`. Defined only when `X` has
        feature names that are all strings.

    n_outputs_ : int
        Number of outputs.

    See Also
    --------
    DummyClassifier : Classifier that makes predictions using simple rules.

    Examples
    --------
    >>> import numpy as np
    >>> from sklearn.dummy import DummyRegressor
    >>> X = np.array([1.0, 2.0, 3.0, 4.0])
    >>> y = np.array([2.0, 3.0, 5.0, 10.0])
    >>> dummy_regr = DummyRegressor(strategy="mean")
    >>> dummy_regr.fit(X, y)
    DummyRegressor()
    >>> dummy_regr.predict(X)
    array([5., 5., 5., 5.])
    >>> dummy_regr.score(X, y)
    0.0
    """

    _parameter_constraints: dict = {
        "strategy": [StrOptions({"mean", "median", "quantile", "constant"})],
        "quantile": [Interval(Real, 0.0, 1.0, closed="both"), None],
        "constant": [
            Interval(Real, None, None, closed="neither"),
            "array-like",
            None,
        ],
    }

    def __init__(self, *, strategy="mean", constant=None, quantile=None):
        self.strategy = strategy
        self.constant = constant
        self.quantile = quantile

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(self, X, y, sample_weight=None):
        """Fit the baseline regressor.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.

        y : array-like of shape (n_samples,) or (n_samples, n_outputs)
            Target values.

        sample_weight : array-like of shape (n_samples,), default=None
            Sample weights.

        Returns
        -------
        self : object
            Fitted estimator.
        """
        validate_data(self, X, skip_check_array=True)

        y = check_array(y, ensure_2d=False, input_name="y")
        if len(y) == 0:
            raise ValueError("y must not be empty.")

        if y.ndim == 1:
            y = np.reshape(y, (-1, 1))
        self.n_outputs_ = y.shape[1]

        check_consistent_length(X, y, sample_weight)

        if sample_weight is not None:
            sample_weight = _check_sample_weight(sample_weight, X)

        if self.strategy == "mean":
            self.constant_ = np.average(y, axis=0, weights=sample_weight)

        elif self.strategy == "median":
            if sample_weight is None:
                self.constant_ = np.median(y, axis=0)
            else:
                self.constant_ = [
                    _weighted_percentile(y[:, k], sample_weight, 50.0)
                    for k in range(self.n_outputs_)
                ]

        elif self.strategy == "quantile":
            if self.quantile is None:
                raise ValueError(
                    "When using `strategy='quantile', you have to specify "
                    "the desired quantile in the range [0, 1]."
                )
            percentile = self.quantile * 100.0
            if sample_weight is None:
                self.constant_ = np.percentile(y, axis=0, q=percentile)
            else:
                self.constant_ = [
                    _weighted_percentile(y[:, k], sample_weight, percentile)
                    for k in range(self.n_outputs_)
                ]

        elif self.strategy == "constant":
            if self.constant is None:
                raise TypeError(
                    "Constant target value has to be specified "
                    "when the constant strategy is used."
                )

            self.constant_ = check_array(
                self.constant,
                accept_sparse=["csr", "csc", "coo"],
                ensure_2d=False,
                ensure_min_samples=0,
            )

            if self.n_outputs_ != 1 and self.constant_.shape[0] != y.shape[1]:
                raise ValueError(
                    "Constant target value should have shape (%d, 1)."
                    % y.shape[1]
                )

        self.constant_ = np.reshape(self.constant_, (1, -1))
        return self

    def predict(self, X, return_std=False):
        """Perform regression on test vectors X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Test data.

        return_std : bool, default=False
            Whether to return the standard deviation of posterior prediction.
            All zeros in this case.

            .. versionadded:: 0.20

        Returns
        -------
        y : array-like of shape (n_samples,) or (n_samples, n_outputs)
            Predicted target values for X.

        y_std : array-like of shape (n_samples,) or (n_samples, n_outputs)
            Standard deviation of predictive distribution of query points.
        """
        check_is_fitted(self)
        n_samples = _num_samples(X)

        # The fitted constant is broadcast to one row per test sample.
        y = np.full(
            (n_samples, self.n_outputs_),
            self.constant_,
            dtype=np.array(self.constant_).dtype,
        )
        y_std = np.zeros((n_samples, self.n_outputs_))

        if self.n_outputs_ == 1:
            y = np.ravel(y)
            y_std = np.ravel(y_std)

        return (y, y_std) if return_std else y

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.input_tags.sparse = True
        tags.regressor_tags.poor_score = True
        tags.no_validation = True
        return tags

    def score(self, X, y, sample_weight=None):
        """Return the coefficient of determination R^2 of the prediction.

        The coefficient R^2 is defined as `(1 - u/v)`, where `u` is the
        residual sum of squares `((y_true - y_pred) ** 2).sum()` and `v` is
        the total sum of squares `((y_true - y_true.mean()) ** 2).sum()`.
        The best possible score is 1.0 and it can be negative (because the
        model can be arbitrarily worse). A constant model that always
        predicts the expected value of y, disregarding the input features,
        would get an R^2 score of 0.0.

        Parameters
        ----------
        X : None or array-like of shape (n_samples, n_features)
            Test samples. Passing None as test samples gives the same result
            as passing real test samples, since `DummyRegressor`
            operates independently of the sampled observations.

        y : array-like of shape (n_samples,) or (n_samples, n_outputs)
            True values for X.

        sample_weight : array-like of shape (n_samples,), default=None
            Sample weights.

        Returns
        -------
        score : float
            R^2 of `self.predict(X)` w.r.t. y.
        """
        if X is None:
            X = np.zeros(shape=(len(y), 1))
        return super().score(X, y, sample_weight)
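

# ---------------------------------------------------------------------------
# Usage sketch (illustrative addition, not part of the estimators above):
# shows how the two dummy estimators are typically used as baselines against
# which a real model is compared. It relies on scikit-learn's
# `make_classification`, `make_regression` and `train_test_split` helpers and
# is guarded under ``__main__`` so importing this module stays side-effect
# free.
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    from sklearn.datasets import make_classification, make_regression
    from sklearn.model_selection import train_test_split

    # Classification baseline: "most_frequent" always predicts the majority
    # class, so its accuracy equals the majority-class frequency in the test
    # split.
    X, y = make_classification(n_samples=200, weights=[0.7, 0.3], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    print("baseline accuracy:", clf.score(X_test, y_test))

    # Regression baseline: "mean" always predicts the training-set mean, which
    # yields an R^2 close to 0.0 (often slightly negative on held-out data).
    X, y = make_regression(n_samples=200, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    reg = DummyRegressor(strategy="mean").fit(X_train, y_train)
    print("baseline R^2:", reg.score(X_test, y_test))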