"""Ridge regression."""

import numbers
import warnings
from abc import ABCMeta, abstractmethod
from functools import partial
from numbers import Integral, Real

import numpy as np
from scipy import linalg, optimize, sparse
from scipy.sparse import linalg as sp_linalg

from ..base import MultiOutputMixin, RegressorMixin, _fit_context, is_classifier
from ..exceptions import ConvergenceWarning
from ..metrics import check_scoring, get_scorer_names
from ..model_selection import GridSearchCV
from ..preprocessing import LabelBinarizer
from ..utils import (
    Bunch,
    check_array,
    check_consistent_length,
    check_scalar,
    column_or_1d,
    compute_sample_weight,
)
from ..utils._array_api import (
    _is_numpy_namespace,
    _ravel,
    device,
    get_namespace,
    get_namespace_and_device,
)
from ..utils._param_validation import Interval, StrOptions, validate_params
from ..utils.extmath import row_norms, safe_sparse_dot
from ..utils.fixes import _sparse_linalg_cg
from ..utils.metadata_routing import (
    MetadataRouter,
    MethodMapping,
    _raise_for_params,
    _routing_enabled,
    process_routing,
)
from ..utils.sparsefuncs import mean_variance_axis
from ..utils.validation import _check_sample_weight, check_is_fitted, validate_data
from ._base import LinearClassifierMixin, LinearModel, _preprocess_data, _rescale_data
from ._sag import sag_solver


def _get_rescaled_operator(X, X_offset, sample_weight_sqrt):
    """Create LinearOperator for matrix products with implicit centering.

    Matrix product `LinearOperator @ coef` returns `(X - X_offset) @ coef`.
    """

    def matvec(b):
        return X.dot(b) - sample_weight_sqrt * b.dot(X_offset)

    def rmatvec(b):
        return X.T.dot(b) - X_offset * b.dot(sample_weight_sqrt)

    return sp_linalg.LinearOperator(shape=X.shape, matvec=matvec, rmatvec=rmatvec)


def _solve_sparse_cg(
    X,
    y,
    alpha,
    max_iter=None,
    tol=1e-4,
    verbose=0,
    X_offset=None,
    X_scale=None,
    sample_weight_sqrt=None,
):
    # Conjugate-gradient solver: one linear system is solved per target, with a
    # warning ("sparse_cg did not converge after %d iterations") raised as a
    # ConvergenceWarning when the iteration limit is reached.
    ...


def _solve_lsqr(
    X,
    y,
    *,
    alpha,
    fit_intercept=True,
    max_iter=None,
    tol=1e-4,
    X_offset=None,
    X_scale=None,
    sample_weight_sqrt=None,
):
    """Solve Ridge regression via LSQR.

    We expect that y is always mean centered.
    If X is dense, we expect it to be mean centered such that we can solve
        ||y - Xw||_2^2 + alpha * ||w||_2^2

    If X is sparse, we expect X_offset to be given such that we can solve
        ||y - (X - X_offset)w||_2^2 + alpha * ||w||_2^2

    With sample weights S=diag(sample_weight), this becomes
        ||sqrt(S) (y - (X - X_offset) w)||_2^2 + alpha * ||w||_2^2
    and we expect y and X to already be rescaled, i.e. sqrt(S) @ y, sqrt(S) @ X.
    In this case, X_offset is the sample_weight weighted mean of X before scaling
    by sqrt(S). The objective then reads
        ||y - (X - sqrt(S) X_offset) w||_2^2 + alpha * ||w||_2^2
    """
    ...


def _solve_cholesky(X, y, alpha):
    # Closed-form solution of the normal equations (X^T X + alpha * Id) w = X^T y,
    # solved once per distinct penalty via scipy.linalg.solve(..., assume_a="pos").
    ...


def _solve_cholesky_kernel(K, y, alpha, sample_weight=None, copy=False):
    # Dual (kernel) variant: solves (K + alpha * Id) dual_coef = y. Falls back to a
    # least-squares solution and warns ("Singular matrix in solving dual problem.
    # Using least-squares solution instead.") when the matrix is singular.
    ...


def _solve_svd(X, y, alpha, xp=np):
    # Solves the ridge problem from the SVD of X: the coefficients are obtained by
    # shrinking the singular values with s / (s^2 + alpha), ignoring directions with
    # (numerically) zero singular values.
    ...


def _solve_lbfgs(
    X,
    y,
    alpha,
    positive=True,
    max_iter=None,
    tol=1e-4,
    X_offset=None,
    X_scale=None,
    sample_weight_sqrt=None,
):
    # L-BFGS-B minimization of the ridge objective, used when `positive=True` (bounds
    # constrain the coefficients to be non-negative). Warns "The lbfgs solver did not
    # converge. Try increasing max_iter or tol." with the current settings on failure.
    ...


def _get_valid_accept_sparse(is_X_sparse, solver):
    if is_X_sparse and solver in ["auto", "sag", "saga"]:
        return "csr"
    else:
        return ["csr", "csc", "coo"]
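
# Illustrative sketch (not part of scikit-learn): the implicit-centering trick used by
# `_get_rescaled_operator` above, checked against an explicitly centered dense copy.
# The helper name `_demo_centered_operator`, the toy data and the unit sample weights
# are assumptions made for this example only.
def _demo_centered_operator():
    import numpy as np
    from scipy import sparse as sp
    from scipy.sparse import linalg as spla

    rng = np.random.RandomState(0)
    X = sp.random(20, 5, density=0.3, format="csr", random_state=rng)
    X_offset = np.asarray(X.mean(axis=0)).ravel()
    sqrt_sw = np.ones(X.shape[0])  # unit sample weights keep the example simple

    # Same products as the centered matrix (X - X_offset), without densifying X.
    op = spla.LinearOperator(
        shape=X.shape,
        matvec=lambda b: X.dot(b) - sqrt_sw * b.dot(X_offset),
        rmatvec=lambda b: X.T.dot(b) - X_offset * b.dot(sqrt_sw),
    )
    coef = rng.randn(X.shape[1])
    dense_centered = X.toarray() - X_offset
    assert np.allclose(op.matvec(coef), dense_centered @ coef)
    assert np.allclose(op.rmatvec(sqrt_sw), dense_centered.T @ sqrt_sw)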

@validate_params(
    {
        "X": ["array-like", "sparse matrix", sp_linalg.LinearOperator],
        "y": ["array-like"],
        "alpha": [Interval(Real, 0, None, closed="left"), "array-like"],
        "sample_weight": [
            Interval(Real, None, None, closed="neither"),
            "array-like",
            None,
        ],
        "solver": [
            StrOptions(
                {"auto", "svd", "cholesky", "lsqr", "sparse_cg", "sag", "saga", "lbfgs"}
            )
        ],
        "max_iter": [Interval(Integral, 0, None, closed="left"), None],
        "tol": [Interval(Real, 0, None, closed="left")],
        "verbose": ["verbose"],
        "positive": ["boolean"],
        "random_state": ["random_state"],
        "return_n_iter": ["boolean"],
        "return_intercept": ["boolean"],
        "check_input": ["boolean"],
    },
    prefer_skip_nested_validation=True,
)
def ridge_regression(
    X,
    y,
    alpha,
    *,
    sample_weight=None,
    solver="auto",
    max_iter=None,
    tol=1e-4,
    verbose=0,
    positive=False,
    random_state=None,
    return_n_iter=False,
    return_intercept=False,
    check_input=True,
):
    """Solve the ridge equation by the method of normal equations.

    Read more in the :ref:`User Guide <ridge_regression>`.

    Parameters
    ----------
    X : {array-like, sparse matrix, LinearOperator} of shape \
        (n_samples, n_features)
        Training data.

    y : array-like of shape (n_samples,) or (n_samples, n_targets)
        Target values.

    alpha : float or array-like of shape (n_targets,)
        Constant that multiplies the L2 term, controlling regularization strength.
        `alpha` must be a non-negative float i.e. in `[0, inf)`. For `alpha = 0`
        prefer :class:`LinearRegression` for numerical reasons. If an array is
        passed, penalties are assumed to be specific to the targets and must
        correspond in number.

    sample_weight : float or array-like of shape (n_samples,), default=None
        Individual weights for each sample. If given a float, every sample will
        have the same weight. If sample_weight is not None and solver='auto',
        the solver will be set to 'cholesky'.

        .. versionadded:: 0.17

    solver : {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', \
        'lbfgs'}, default='auto'
        Solver to use in the computational routines:

        - 'auto' chooses the solver automatically based on the type of data.
        - 'svd' uses a Singular Value Decomposition of X; most stable, in
          particular for singular matrices, at the cost of being slower.
        - 'cholesky' uses scipy.linalg.solve to obtain a closed-form solution
          via a Cholesky decomposition of dot(X.T, X).
        - 'sparse_cg' uses the conjugate gradient solver of
          scipy.sparse.linalg.cg; appropriate for large-scale data.
        - 'lsqr' uses the dedicated regularized least-squares routine
          scipy.sparse.linalg.lsqr; the fastest, iterative.
        - 'sag' uses a Stochastic Average Gradient descent, and 'saga' its
          improved, unbiased version; often faster when both n_samples and
          n_features are large, but fast convergence is only guaranteed on
          features with approximately the same scale.
        - 'lbfgs' uses the L-BFGS-B algorithm from scipy.optimize.minimize; it
          can be used only when `positive` is True.

        All solvers except 'svd' support both dense and sparse data. However,
        only 'lsqr', 'sag', 'sparse_cg', and 'lbfgs' support sparse input when
        `fit_intercept` is True.

        .. versionadded:: 0.17
           Stochastic Average Gradient descent solver.
        .. versionadded:: 0.19
           SAGA solver.

    max_iter : int, default=None
        Maximum number of iterations for conjugate gradient solver. For the
        'sparse_cg' and 'lsqr' solvers, the default value is determined by
        scipy.sparse.linalg. For 'sag' and saga solver, the default value is
        1000. For 'lbfgs' solver, the default value is 15000.

    tol : float, default=1e-4
        Precision of the solution. Note that `tol` has no effect for solvers
        'svd' and 'cholesky'.

        .. versionchanged:: 1.2
           Default value changed from 1e-3 to 1e-4 for consistency with other
           linear models.

    verbose : int, default=0
        Verbosity level. Setting verbose > 0 will display additional
        information depending on the solver used.

    positive : bool, default=False
        When set to ``True``, forces the coefficients to be positive.
        Only 'lbfgs' solver is supported in this case.

    random_state : int, RandomState instance, default=None
        Used when ``solver`` == 'sag' or 'saga' to shuffle the data.
        See :term:`Glossary <random_state>` for details.

    return_n_iter : bool, default=False
        If True, the method also returns `n_iter`, the actual number of
        iteration performed by the solver.

        .. versionadded:: 0.17

    return_intercept : bool, default=False
        If True and if X is sparse, the method also returns the intercept,
        and the solver is automatically changed to 'sag'. This is only a
        temporary fix for fitting the intercept with sparse data. For dense
        data, use sklearn.linear_model._preprocess_data before your regression.

        .. versionadded:: 0.17

    check_input : bool, default=True
        If False, the input arrays X and y will not be checked.

        .. versionadded:: 0.21

    Returns
    -------
    coef : ndarray of shape (n_features,) or (n_targets, n_features)
        Weight vector(s).

    n_iter : int, optional
        The actual number of iteration performed by the solver.
        Only returned if `return_n_iter` is True.

    intercept : float or ndarray of shape (n_targets,)
        The intercept of the model. Only returned if `return_intercept`
        is True and if X is a scipy sparse array.

    Notes
    -----
    This function won't compute the intercept.

    Regularization improves the conditioning of the problem and reduces the
    variance of the estimates. Larger values specify stronger regularization.
    Alpha corresponds to ``1 / (2C)`` in other linear models such as
    :class:`~sklearn.linear_model.LogisticRegression` or
    :class:`~sklearn.svm.LinearSVC`. If an array is passed, penalties are
    assumed to be specific to the targets. Hence they must correspond in number.

    Examples
    --------
    >>> import numpy as np
    >>> from sklearn.datasets import make_regression
    >>> from sklearn.linear_model import ridge_regression
    >>> rng = np.random.RandomState(0)
    >>> X = rng.randn(100, 4)
    >>> y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(100)
    >>> coef, intercept = ridge_regression(X, y, alpha=1.0, return_intercept=True,
    ...                                    random_state=0)
    >>> coef
    array([ 1.97, -1., -2.69e-3, -9.27e-4 ])
    >>> intercept
    np.float64(-.0012)
    """
    return _ridge_regression(
        X,
        y,
        alpha,
        sample_weight=sample_weight,
        solver=solver,
        max_iter=max_iter,
        tol=tol,
        verbose=verbose,
        positive=positive,
        random_state=random_state,
        return_n_iter=return_n_iter,
        return_intercept=return_intercept,
        check_input=check_input,
    )


def _ridge_regression(
    X,
    y,
    alpha,
    sample_weight=None,
    solver="auto",
    max_iter=None,
    tol=1e-4,
    verbose=0,
    positive=False,
    random_state=None,
    return_n_iter=False,
    return_intercept=False,
    return_solver=False,
    check_input=True,
    fit_intercept=False,
    X_scale=None,
    X_offset=None,
):
    # Shared implementation behind `ridge_regression`, `Ridge` and `RidgeClassifier`:
    # validates shapes ("Target y has the wrong shape", matching numbers of samples,
    # targets and penalties), resolves solver="auto", rescales the data by
    # sqrt(sample_weight) when needed, and dispatches to one of the `_solve_*`
    # helpers above. Returns the coefficients plus, on request, n_iter, the
    # intercept (sparse input with 'sag'/'saga' only) and the solver actually used.
    ...
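
# Illustrative sketch (not part of scikit-learn): the 'cholesky' route described above
# is the closed-form solution of the normal equations (X^T X + alpha * I) w = X^T y.
# The helper name `_demo_normal_equations` and the toy data are assumptions made for
# this example only.
def _demo_normal_equations():
    import numpy as np
    from sklearn.linear_model import ridge_regression

    rng = np.random.RandomState(0)
    X = rng.randn(50, 3)
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.randn(50)
    alpha = 0.7

    # Closed form: w = (X^T X + alpha * I)^{-1} X^T y
    w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
    w_ridge = ridge_regression(X, y, alpha=alpha, solver="cholesky")
    # ridge_regression does not fit an intercept, so both solve the same problem.
    assert np.allclose(w_closed, w_ridge)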

def resolve_solver(solver, positive, return_intercept, is_sparse, xp):
    # Resolves solver="auto" to a concrete solver. Non-numpy (Array API) namespaces
    # only support "svd"; a warning is emitted when the chosen solver may differ from
    # the one that would be preferred on a numpy array.
    ...


def resolve_solver_for_numpy(positive, return_intercept):
    # "auto" resolution on numpy inputs: "lbfgs" when positive=True, "sag" when the
    # intercept must be returned for sparse input, otherwise "cholesky".
    ...


class _BaseRidge(LinearModel, metaclass=ABCMeta):
    _parameter_constraints: dict = {
        "alpha": [Interval(Real, 0, None, closed="left"), np.ndarray],
        "fit_intercept": ["boolean"],
        "copy_X": ["boolean"],
        "max_iter": [Interval(Integral, 1, None, closed="left"), None],
        "tol": [Interval(Real, 0, None, closed="left")],
        "solver": [
            StrOptions(
                {"auto", "svd", "cholesky", "lsqr", "sparse_cg", "sag", "saga", "lbfgs"}
            )
        ],
        "positive": ["boolean"],
        "random_state": ["random_state"],
    }

    @abstractmethod
    def __init__(
        self,
        alpha=1.0,
        *,
        fit_intercept=True,
        copy_X=True,
        max_iter=None,
        tol=1e-4,
        solver="auto",
        positive=False,
        random_state=None,
    ):
        self.alpha = alpha
        self.fit_intercept = fit_intercept
        self.copy_X = copy_X
        self.max_iter = max_iter
        self.tol = tol
        self.solver = solver
        self.positive = positive
        self.random_state = random_state

    def fit(self, X, y, sample_weight=None):
        # Validates the solver / positive / sparse-intercept combinations (e.g.
        # "'lbfgs' solver can be used only when positive=True", "only 'sag' can
        # directly fit the intercept" on sparse input), preprocesses the data and
        # delegates to `_ridge_regression`; sets coef_, n_iter_, solver_ and
        # intercept_.
        ...


class Ridge(MultiOutputMixin, RegressorMixin, _BaseRidge):
    """Linear least squares with l2 regularization.

    Minimizes the objective function::

        ||y - Xw||^2_2 + alpha * ||w||^2_2

    This model solves a regression model where the loss function is
    the linear least squares function and regularization is given by
    the l2-norm. Also known as Ridge Regression or Tikhonov regularization.
    This estimator has built-in support for multi-variate regression
    (i.e., when y is a 2d-array of shape (n_samples, n_targets)).

    Read more in the :ref:`User Guide <ridge_regression>`.

    Parameters
    ----------
    alpha : {float, ndarray of shape (n_targets,)}, default=1.0
        Constant that multiplies the L2 term, controlling regularization
        strength. `alpha` must be a non-negative float i.e. in `[0, inf)`.
        For `alpha = 0` prefer :class:`LinearRegression`. If an array is
        passed, penalties are assumed to be specific to the targets and must
        correspond in number.

    fit_intercept : bool, default=True
        Whether to fit the intercept for this model. If set to false, no
        intercept will be used in calculations (i.e. ``X`` and ``y`` are
        expected to be centered).

    copy_X : bool, default=True
        If True, X will be copied; else, it may be overwritten.

    max_iter : int, default=None
        Maximum number of iterations for conjugate gradient solver. For
        'sparse_cg' and 'lsqr' solvers, the default value is determined by
        scipy.sparse.linalg. For 'sag' solver, the default value is 1000.
        For 'lbfgs' solver, the default value is 15000.

    tol : float, default=1e-4
        The precision of the solution (`coef_`) is determined by `tol`, whose
        exact meaning depends on the solver: no impact for 'svd' and
        'cholesky', norm of the residuals for 'sparse_cg', atol/btol of
        scipy.sparse.linalg.lsqr for 'lsqr', relative change of coef for
        'sag'/'saga', maximum absolute projected gradient for 'lbfgs'.

        .. versionchanged:: 1.2
           Default value changed from 1e-3 to 1e-4 for consistency with other
           linear models.

    solver : {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', \
        'lbfgs'}, default='auto'
        Solver to use in the computational routines (see `ridge_regression`
        for a description of each option). All solvers except 'svd' support
        both dense and sparse data. However, only 'lsqr', 'sag', 'sparse_cg',
        and 'lbfgs' support sparse input when `fit_intercept` is True.

        .. versionadded:: 0.17
           Stochastic Average Gradient descent solver.
        .. versionadded:: 0.19
           SAGA solver.

    positive : bool, default=False
        When set to ``True``, forces the coefficients to be positive.
        Only 'lbfgs' solver is supported in this case.

    random_state : int, RandomState instance, default=None
        Used when ``solver`` == 'sag' or 'saga' to shuffle the data.
        See :term:`Glossary <random_state>` for details.

        .. versionadded:: 0.17
           `random_state` to support Stochastic Average Gradient.

    Attributes
    ----------
    coef_ : ndarray of shape (n_features,) or (n_targets, n_features)
        Weight vector(s).

    intercept_ : float or ndarray of shape (n_targets,)
        Independent term in decision function. Set to 0.0 if
        ``fit_intercept = False``.

    n_iter_ : None or ndarray of shape (n_targets,)
        Actual number of iterations for each target. Available only for
        sag and lsqr solvers. Other solvers will return None.

        .. versionadded:: 0.17

    n_features_in_ : int
        Number of features seen during :term:`fit`.

        .. versionadded:: 0.24

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during :term:`fit`. Defined only when `X`
        has feature names that are all strings.

        .. versionadded:: 1.0

    solver_ : str
        The solver that was used at fit time by the computational routines.

        .. versionadded:: 1.5

    See Also
    --------
    RidgeClassifier : Ridge classifier.
    RidgeCV : Ridge regression with built-in cross validation.
    :class:`~sklearn.kernel_ridge.KernelRidge` : Kernel ridge regression
        combines ridge regression with the kernel trick.

    Notes
    -----
    Regularization improves the conditioning of the problem and reduces the
    variance of the estimates. Larger values specify stronger regularization.
    Alpha corresponds to ``1 / (2C)`` in other linear models such as
    :class:`~sklearn.linear_model.LogisticRegression` or
    :class:`~sklearn.svm.LinearSVC`.

    Examples
    --------
    >>> from sklearn.linear_model import Ridge
    >>> import numpy as np
    >>> n_samples, n_features = 10, 5
    >>> rng = np.random.RandomState(0)
    >>> y = rng.randn(n_samples)
    >>> X = rng.randn(n_samples, n_features)
    >>> clf = Ridge(alpha=1.0)
    >>> clf.fit(X, y)
    Ridge()
    """

    def __init__(
        self,
        alpha=1.0,
        *,
        fit_intercept=True,
        copy_X=True,
        max_iter=None,
        tol=1e-4,
        solver="auto",
        positive=False,
        random_state=None,
    ):
        super().__init__(
            alpha=alpha,
            fit_intercept=fit_intercept,
            copy_X=copy_X,
            max_iter=max_iter,
            tol=tol,
            solver=solver,
            positive=positive,
            random_state=random_state,
        )

    def fit(self, X, y, sample_weight=None):
        """Fit Ridge regression model.

        Parameters
        ----------
        X : {ndarray, sparse matrix} of shape (n_samples, n_features)
            Training data.

        y : ndarray of shape (n_samples,) or (n_samples, n_targets)
            Target values.

        sample_weight : float or ndarray of shape (n_samples,), default=None
            Individual weights for each sample. If given a float, every sample
            will have the same weight.

        Returns
        -------
        self : object
            Fitted estimator.
        """
        ...

    def __sklearn_tags__(self):
        ...
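
# Illustrative sketch (not part of scikit-learn): the docstring above states that
# 'lsqr' (like 'sag', 'sparse_cg' and 'lbfgs') supports sparse input together with an
# intercept. The helper name `_demo_ridge_sparse_intercept` and the toy data are
# assumptions for this example; the dense and sparse fits should agree up to the
# solver tolerance.
def _demo_ridge_sparse_intercept():
    import numpy as np
    from scipy import sparse as sp
    from sklearn.linear_model import Ridge

    rng = np.random.RandomState(0)
    X = rng.randn(30, 4)
    y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + 3.0 + 0.01 * rng.randn(30)

    dense = Ridge(alpha=1.0, solver="lsqr").fit(X, y)
    sparse_fit = Ridge(alpha=1.0, solver="lsqr").fit(sp.csr_matrix(X), y)
    assert np.allclose(dense.coef_, sparse_fit.coef_, atol=1e-3)
    assert np.isclose(dense.intercept_, sparse_fit.intercept_, atol=1e-3)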

class _RidgeClassifierMixin(LinearClassifierMixin):
    def _prepare_data(self, X, y, sample_weight, solver):
        """Validate `X` and `y` and binarize `y`.

        Returns the validated ``X``, ``y``, ``sample_weight`` and the
        binarized targets ``Y`` produced by a
        :class:`~sklearn.preprocessing.LabelBinarizer` with ``pos_label=1`` and
        ``neg_label=-1`` (one column per class, a single column in the binary
        case). When ``class_weight`` is set, the sample weights are multiplied
        by ``compute_sample_weight(self.class_weight, y)``.
        """
        ...

    def predict(self, X):
        """Predict class labels for samples in `X`.

        The sign of the decision function is mapped back to the original
        labels with the fitted label binarizer. In binary and multiclass
        problems the result is a vector of shape (n_samples,); in a multilabel
        problem it is a matrix of shape (n_samples, n_outputs).
        """
        ...

    @property
    def classes_(self):
        """Classes labels."""
        return self._label_binarizer.classes_


class RidgeClassifier(_RidgeClassifierMixin, _BaseRidge):
    """Classifier using Ridge regression.

    This classifier first converts the target values into ``{-1, 1}`` and
    then treats the problem as a regression task (multi-output regression in
    the multiclass case).

    Read more in the :ref:`User Guide <ridge_regression>`.

    Parameters
    ----------
    alpha : float, default=1.0
        Regularization strength; must be a positive float. Larger values
        specify stronger regularization. Alpha corresponds to ``1 / (2C)`` in
        other linear models such as
        :class:`~sklearn.linear_model.LogisticRegression` or
        :class:`~sklearn.svm.LinearSVC`.

    fit_intercept : bool, default=True
        Whether to calculate the intercept for this model.

    copy_X : bool, default=True
        If True, X will be copied; else, it may be overwritten.

    max_iter : int, default=None
        Maximum number of iterations for conjugate gradient solver.

    tol : float, default=1e-4
        The precision of the solution (`coef_`); its exact meaning depends on
        the solver, as for :class:`Ridge`.

        .. versionchanged:: 1.2
           Default value changed from 1e-3 to 1e-4 for consistency with other
           linear models.

    class_weight : dict or 'balanced', default=None
        Weights associated with classes in the form ``{class_label: weight}``.
        If not given, all classes are supposed to have weight one. The
        "balanced" mode uses the values of y to automatically adjust weights
        inversely proportional to class frequencies in the input data as
        ``n_samples / (n_classes * np.bincount(y))``.

    solver : {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', \
        'lbfgs'}, default='auto'
        Solver to use in the computational routines; same options as for
        :class:`Ridge`. Note that 'sag' and 'saga' fast convergence is only
        guaranteed on features with approximately the same scale. You can
        preprocess the data with a scaler from sklearn.preprocessing.

        .. versionadded:: 0.17
           Stochastic Average Gradient descent solver.
        .. versionadded:: 0.19
           SAGA solver.

    positive : bool, default=False
        When set to ``True``, forces the coefficients to be positive.
        Only 'lbfgs' solver is supported in this case.

    random_state : int, RandomState instance, default=None
        Used when ``solver`` == 'sag' or 'saga' to shuffle the data.

    Attributes
    ----------
    coef_ : ndarray of shape (1, n_features) or (n_classes, n_features)
        Coefficient of the features in the decision function. ``coef_`` is of
        shape (1, n_features) when the given problem is binary.

    intercept_ : float or ndarray of shape (n_targets,)
        Independent term in decision function. Set to 0.0 if
        ``fit_intercept = False``.

    n_iter_ : None or ndarray of shape (n_targets,)
        Actual number of iterations for each target. Available only for
        sag and lsqr solvers. Other solvers will return None.

    classes_ : ndarray of shape (n_classes,)
        The classes labels.

    n_features_in_ : int
        Number of features seen during :term:`fit`.

        .. versionadded:: 0.24

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during :term:`fit`. Defined only when `X`
        has feature names that are all strings.

        .. versionadded:: 1.0

    solver_ : str
        The solver that was used at fit time by the computational routines.

        .. versionadded:: 1.5

    See Also
    --------
    Ridge : Ridge regression.
    RidgeClassifierCV : Ridge classifier with built-in cross validation.

    Notes
    -----
    For multi-class classification, n_class classifiers are trained in
    a one-versus-all approach. Concretely, this is implemented by taking
    advantage of the multi-variate response support in Ridge.

    Examples
    --------
    >>> from sklearn.datasets import load_breast_cancer
    >>> from sklearn.linear_model import RidgeClassifier
    >>> X, y = load_breast_cancer(return_X_y=True)
    >>> clf = RidgeClassifier().fit(X, y)
    >>> clf.score(X, y)
    0.9595...
    """

    def __init__(
        self,
        alpha=1.0,
        *,
        fit_intercept=True,
        copy_X=True,
        max_iter=None,
        tol=1e-4,
        class_weight=None,
        solver="auto",
        positive=False,
        random_state=None,
    ):
        super().__init__(
            alpha=alpha,
            fit_intercept=fit_intercept,
            copy_X=copy_X,
            max_iter=max_iter,
            tol=tol,
            solver=solver,
            positive=positive,
            random_state=random_state,
        )
        self.class_weight = class_weight

    def fit(self, X, y, sample_weight=None):
        """Fit Ridge classifier model.

        Parameters
        ----------
        X : {ndarray, sparse matrix} of shape (n_samples, n_features)
            Training data.

        y : ndarray of shape (n_samples,)
            Target values.

        sample_weight : float or ndarray of shape (n_samples,), default=None
            Individual weights for each sample. If given a float, every sample
            will have the same weight.

            .. versionadded:: 0.17
               *sample_weight* support to RidgeClassifier.

        Returns
        -------
        self : object
            Instance of the estimator.
        """
        X, y, sample_weight, Y = self._prepare_data(X, y, sample_weight, self.solver)
        super().fit(X, Y, sample_weight=sample_weight)
        return self

    def __sklearn_tags__(self):
        ...


def _check_gcv_mode(X, gcv_mode):
    if gcv_mode in ["eigen", "svd"]:
        return gcv_mode
    # 'auto': pick the cheaper decomposition given the shape of the training data.
    if X.shape[0] > X.shape[1]:
        return "svd"
    return "eigen"


def _find_smallest_angle(query, vectors):
    """Find the column of vectors that is most aligned with the query.

    Both query and the columns of vectors must have their l2 norm equal to 1.

    Parameters
    ----------
    query : ndarray of shape (n_samples,)
        Normalized query vector.

    vectors : ndarray of shape (n_samples, n_features)
        Vectors to which we compare query, as columns. Must be normalized.
    """
    abs_cosine = np.abs(query.dot(vectors))
    index = np.argmax(abs_cosine)
    return index
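
# Illustrative sketch (not part of scikit-learn): RidgeClassifier regresses on {-1, 1}
# targets, so in the binary case the predicted class is just the sign of the decision
# function. The helper name `_demo_ridge_classifier_sign` and the synthetic data are
# assumptions made for this example only.
def _demo_ridge_classifier_sign():
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import RidgeClassifier

    X, y = make_classification(n_samples=60, n_features=5, random_state=0)
    clf = RidgeClassifier(alpha=1.0).fit(X, y)

    scores = clf.decision_function(X)  # shape (n_samples,) in the binary case
    assert np.array_equal(clf.predict(X), clf.classes_[(scores > 0).astype(int)])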

class _X_CenterStackOp(sp_linalg.LinearOperator):
    """Behaves as centered and scaled X with an added intercept column.

    This operator behaves as
    np.hstack([X - sqrt_sw[:, None] * X_mean, sqrt_sw[:, None]])
    """

    def __init__(self, X, X_mean, sqrt_sw):
        n_samples, n_features = X.shape
        super().__init__(X.dtype, (n_samples, n_features + 1))
        self.X = X
        self.X_mean = X_mean
        self.sqrt_sw = sqrt_sw

    def _matvec(self, v):
        ...

    def _matmat(self, v):
        ...

    def _transpose(self):
        return _XT_CenterStackOp(self.X, self.X_mean, self.sqrt_sw)


class _XT_CenterStackOp(sp_linalg.LinearOperator):
    """Behaves as transposed centered and scaled X with an intercept column.

    This operator behaves as
    np.hstack([X - sqrt_sw[:, None] * X_mean, sqrt_sw[:, None]]).T
    """

    def __init__(self, X, X_mean, sqrt_sw):
        n_samples, n_features = X.shape
        super().__init__(X.dtype, (n_features + 1, n_samples))
        self.X = X
        self.X_mean = X_mean
        self.sqrt_sw = sqrt_sw

    def _matvec(self, v):
        ...

    def _matmat(self, v):
        ...


class _IdentityRegressor:
    """Fake regressor which will directly output the prediction."""

    def decision_function(self, y_predict):
        return y_predict

    def predict(self, y_predict):
        return y_predict


class _IdentityClassifier(LinearClassifierMixin):
    """Fake classifier which will directly output the prediction.

    We inherit from LinearClassifierMixin to get the proper shape for the
    output `y`.
    """

    def __init__(self, classes):
        self.classes_ = classes

    def decision_function(self, y_predict):
        return y_predict


class _RidgeGCV(LinearModel):
    """Ridge regression with built-in Leave-one-out Cross-Validation.

    This class is not intended to be used directly. Use RidgeCV instead.

    `_RidgeGCV` uses a Generalized Cross-Validation for model selection. It's
    an efficient approximation of leave-one-out cross-validation (LOO-CV),
    where instead of computing multiple models by excluding one data point at
    a time, it uses an algebraic shortcut to approximate the LOO-CV error,
    making it faster and computationally more efficient.

    Using a naive grid-search approach with a leave-one-out cross-validation
    in contrast requires to fit `n_samples` models to compute the prediction
    error for each sample and then to repeat this process for each alpha in
    the grid. Here, the prediction error for each sample is computed by
    solving a **single** linear system (in other words a single model) via a
    matrix factorization (i.e. eigendecomposition or SVD), and this is
    repeated for each alpha in the grid. The detailed complexity is further
    discussed in Sect. 4 of [1]. This algebraic approach is only applicable
    for regularized least squares problems; it could potentially be extended
    to kernel ridge regression.

    Notes
    -----
    We want to solve (K + alpha*Id)c = y,
    where K = X X^T is the kernel matrix.

    Let G = (K + alpha*Id).

    Dual solution: c = G^-1y
    Primal solution: w = X^T c

    Compute eigendecomposition K = Q V Q^T.
    Then G^-1 = Q (V + alpha*Id)^-1 Q^T,
    where (V + alpha*Id) is diagonal.
    It is thus inexpensive to inverse for many alphas.

    Let loov be the vector of prediction values for each example
    when the model was fitted with all examples but this example.

    loov = (KG^-1Y - diag(KG^-1)Y) / diag(I-KG^-1)

    Let looe be the vector of prediction errors for each example
    when the model was fitted with all examples but this example.

    looe = y - loov = c / diag(G^-1)

    The best score (negative mean squared error or user-provided scoring) is
    stored in the `best_score_` attribute, and the selected hyperparameter in
    `alpha_`.

    References
    ----------
    [1] http://cbcl.mit.edu/publications/ps/MIT-CSAIL-TR-2007-025.pdf
    [2] https://www.mit.edu/~9.520/spring07/Classes/rlsslides.pdf
    """

    def __init__(
        self,
        alphas=(0.1, 1.0, 10.0),
        *,
        fit_intercept=True,
        scoring=None,
        copy_X=True,
        gcv_mode=None,
        store_cv_results=False,
        is_clf=False,
        alpha_per_target=False,
    ):
        self.alphas = alphas
        self.fit_intercept = fit_intercept
        self.scoring = scoring
        self.copy_X = copy_X
        self.gcv_mode = gcv_mode
        self.store_cv_results = store_cv_results
        self.is_clf = is_clf
        self.alpha_per_target = alpha_per_target

    @staticmethod
    def _decomp_diag(v_prime, Q):
        # Compute the diagonal of Q @ diag(v_prime) @ Q^T without forming the product.
        return (v_prime * Q**2).sum(axis=-1)

    @staticmethod
    def _diag_dot(D, B):
        # Compute diag(D) @ B where D is given as a 1-d array; handles B with ndim > 1.
        if len(B.shape) > 1:
            D = D[(slice(None),) + (np.newaxis,) * (len(B.shape) - 1)]
        return D * B

    def _compute_gram(self, X, sqrt_sw):
        # Gram matrix X.X^T with possible implicit centering when X is sparse.
        ...

    def _compute_covariance(self, X, sqrt_sw):
        """Computes covariance matrix X^TX with possible centering.

        Since X is sparse it has not been centered in preprocessing, but it has
        been scaled by sqrt(sample weights). When self.fit_intercept is False
        no centering is done. The centered X is never actually computed
        because centering would break the sparsity of X.
        """
        ...

    def _sparse_multidot_diag(self, X, A, X_mean, sqrt_sw):
        """Compute the diagonal of (X - X_mean).dot(A).dot((X - X_mean).T)
        without explicitly centering X nor computing X.dot(A)
        when X is sparse.
        """
        ...

    def _eigen_decompose_gram(self, X, y, sqrt_sw):
        """Eigendecomposition of X.X^T, used when n_samples <= n_features."""
        ...

    def _solve_eigen_gram(self, alpha, y, sqrt_sw, X_mean, eigvals, Q, QT_y):
        """Compute dual coefficients and diagonal of G^-1.

        Used when we have a decomposition of X.X^T
        (n_samples <= n_features).
        """
        ...

    def _eigen_decompose_covariance(self, X, y, sqrt_sw):
        """Eigendecomposition of X^T.X, used when n_samples > n_features
        and X is sparse.
        """
        ...

    def _solve_eigen_covariance_no_intercept(
        self, alpha, y, sqrt_sw, X_mean, eigvals, V, X
    ):
        """Compute dual coefficients and diagonal of G^-1.

        Used when we have a decomposition of X^T.X
        (n_samples > n_features and X is sparse), and not fitting an intercept.
        """
        ...

    def _solve_eigen_covariance_intercept(
        self, alpha, y, sqrt_sw, X_mean, eigvals, V, X
    ):
        """Compute dual coefficients and diagonal of G^-1.

        Used when we have a decomposition of X^T.X
        (n_samples > n_features and X is sparse), and we are fitting an intercept.
        """
        ...

    def _solve_eigen_covariance(self, alpha, y, sqrt_sw, X_mean, eigvals, V, X):
        """Compute dual coefficients and diagonal of G^-1.

        Used when we have a decomposition of X^T.X
        (n_samples > n_features and X is sparse). Dispatches to the intercept
        or no-intercept variant above.
        """
        ...

    def _svd_decompose_design_matrix(self, X, y, sqrt_sw):
        # SVD of the (possibly intercept-augmented) design matrix, used when
        # n_samples > n_features and X is dense.
        ...

    def _solve_svd_design_matrix(self, alpha, y, sqrt_sw, X_mean, singvals_sq, U, UT_y):
        """Compute dual coefficients and diagonal of G^-1.

        Used when we have an SVD decomposition of X
        (n_samples > n_features and X is dense).
        """
        ...

    def fit(self, X, y, sample_weight=None, score_params=None):
        """Fit Ridge regression model with gcv.

        Parameters
        ----------
        X : {ndarray, sparse matrix} of shape (n_samples, n_features)
            Training data. Will be cast to float64 if necessary.

        y : ndarray of shape (n_samples,) or (n_samples, n_targets)
            Target values. Will be cast to float64 if necessary.

        sample_weight : float or ndarray of shape (n_samples,), default=None
            Individual weights for each sample. If given a float, every sample
            will have the same weight. Note that the scale of `sample_weight`
            has an impact on the loss; i.e. multiplying all weights by `k`
            is equivalent to setting `alpha / k`.

        score_params : dict, default=None
            Parameters to be passed to the underlying scorer.

            .. versionadded:: 1.5
                See :ref:`Metadata Routing User Guide <metadata_routing>` for
                more details.

        Returns
        -------
        self : object

        Notes
        -----
        Selects `alpha_` (per target when `alpha_per_target=True`), stores the
        corresponding `best_score_`, the dual coefficients, `coef_`,
        `intercept_` and, when `store_cv_results=True`, the per-sample
        leave-one-out values in `cv_results_`.
        """
        ...

    def _score_without_scorer(self, squared_errors):
        """Performs scoring using squared errors when the scorer is None."""
        ...

    def _score(self, *, predictions, y, n_y, scorer, score_params):
        # Performs scoring with the specified scorer using the predictions.
        ...
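
# Illustrative sketch (not part of scikit-learn): numerical check of the identity
# quoted in the `_RidgeGCV` docstring, looe = c / diag(G^-1) with G = X X^T + alpha*Id,
# against an explicit leave-one-out refit of ridge without an intercept. The helper
# name `_demo_loo_shortcut` and the toy data are assumptions made for this example only.
def _demo_loo_shortcut():
    import numpy as np

    rng = np.random.RandomState(0)
    n_samples, n_features, alpha = 8, 12, 1.0
    X = rng.randn(n_samples, n_features)
    y = rng.randn(n_samples)

    G_inv = np.linalg.inv(X @ X.T + alpha * np.eye(n_samples))
    c = G_inv @ y              # dual solution
    looe = c / np.diag(G_inv)  # all leave-one-out errors from a single factorization

    for i in range(n_samples):
        mask = np.arange(n_samples) != i
        Xi, yi = X[mask], y[mask]
        # Ridge (no intercept) fitted on all samples but i, evaluated on sample i.
        w = np.linalg.solve(Xi.T @ Xi + alpha * np.eye(n_features), Xi.T @ yi)
        assert np.isclose(y[i] - X[i] @ w, looe[i])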

class _BaseRidgeCV(LinearModel):
    _parameter_constraints: dict = {
        "alphas": ["array-like", Interval(Real, 0, None, closed="neither")],
        "fit_intercept": ["boolean"],
        "scoring": [StrOptions(set(get_scorer_names())), callable, None],
        "cv": ["cv_object"],
        "gcv_mode": [StrOptions({"auto", "svd", "eigen"}), None],
        "store_cv_results": ["boolean"],
        "alpha_per_target": ["boolean"],
    }

    def __init__(
        self,
        alphas=(0.1, 1.0, 10.0),
        *,
        fit_intercept=True,
        scoring=None,
        cv=None,
        gcv_mode=None,
        store_cv_results=False,
        alpha_per_target=False,
    ):
        self.alphas = alphas
        self.fit_intercept = fit_intercept
        self.scoring = scoring
        self.cv = cv
        self.gcv_mode = gcv_mode
        self.store_cv_results = store_cv_results
        self.alpha_per_target = alpha_per_target

    def fit(self, X, y, sample_weight=None, **params):
        """Fit Ridge regression model with cv.

        When ``cv=None`` the efficient leave-one-out path (`_RidgeGCV`) is
        used; otherwise a :class:`~sklearn.model_selection.GridSearchCV` over
        the candidate alphas is run with the requested cross-validation.

        Parameters
        ----------
        X : ndarray of shape (n_samples, n_features)
            Training data. If using GCV, will be cast to float64 if necessary.

        y : ndarray of shape (n_samples,) or (n_samples, n_targets)
            Target values. Will be cast to X's dtype if necessary.

        sample_weight : float or ndarray of shape (n_samples,), default=None
            Individual weights for each sample. If given a float, every sample
            will have the same weight.

        **params : dict, default=None
            Extra parameters for the underlying scorer.

            .. versionadded:: 1.5
                Only available if `enable_metadata_routing=True`, which can be
                set by using ``sklearn.set_config(enable_metadata_routing=True)``.
                See :ref:`Metadata Routing User Guide <metadata_routing>` for
                more details.

        Returns
        -------
        self : object
            Fitted estimator.

        Notes
        -----
        When sample_weight is provided, the selected hyperparameter may depend
        on whether we use leave-one-out cross-validation (cv=None) or another
        form of cross-validation, because only leave-one-out cross-validation
        takes the sample weights into account when computing the validation
        score.
        """
        ...

    def get_metadata_routing(self):
        """Get metadata routing of this object.

        Please check :ref:`User Guide <metadata_routing>` on how the routing
        mechanism works.

        .. versionadded:: 1.5

        Returns
        -------
        routing : MetadataRouter
            A :class:`~sklearn.utils.metadata_routing.MetadataRouter`
            encapsulating routing information.
        """
        ...

    def _get_scorer(self):
        ...

    def __sklearn_tags__(self):
        ...


class RidgeCV(MultiOutputMixin, RegressorMixin, _BaseRidgeCV):
    """Ridge regression with built-in cross-validation.

    See glossary entry for :term:`cross-validation estimator`.

    By default, it performs efficient Leave-One-Out Cross-Validation.

    Read more in the :ref:`User Guide <ridge_regression>`.

    Parameters
    ----------
    alphas : array-like of shape (n_alphas,), default=(0.1, 1.0, 10.0)
        Array of alpha values to try. Regularization strength; must be a
        positive float. Larger values specify stronger regularization. Alpha
        corresponds to ``1 / (2C)`` in other linear models such as
        :class:`~sklearn.linear_model.LogisticRegression` or
        :class:`~sklearn.svm.LinearSVC`. If using Leave-One-Out
        cross-validation, alphas must be strictly positive.

    fit_intercept : bool, default=True
        Whether to calculate the intercept for this model. If set to false, no
        intercept will be used in calculations (i.e. data is expected to be
        centered).

    scoring : str, callable, default=None
        The scoring method to use for cross-validation. Options:

        - str: see :ref:`scoring_string_names` for options.
        - callable: a scorer callable object with signature
          ``scorer(estimator, X, y)``. See :ref:`scoring_callable` for details.
        - `None`: negative :ref:`mean squared error <mean_squared_error>` if cv
          is None (i.e. when using leave-one-out cross-validation), or
          :ref:`coefficient of determination <r2_score>` (:math:`R^2`)
          otherwise.

    cv : int, cross-validation generator or an iterable, default=None
        Determines the cross-validation splitting strategy. Possible inputs:

        - None, to use the efficient Leave-One-Out cross-validation
        - integer, to specify the number of folds.
        - :term:`CV splitter`,
        - An iterable yielding (train, test) splits as arrays of indices.

        Refer to the :ref:`User Guide <cross_validation>` for the various
        cross-validation strategies that can be used here.

    gcv_mode : {'auto', 'svd', 'eigen'}, default='auto'
        Flag indicating which strategy to use when performing Leave-One-Out
        Cross-Validation: 'auto' uses 'svd' if n_samples > n_features,
        otherwise 'eigen'; 'svd' forces an SVD of X when X is dense or an
        eigendecomposition of X^T.X when X is sparse; 'eigen' forces the
        computation via an eigendecomposition of X.X^T. 'auto' is intended to
        pick the cheaper option of the two depending on the shape of the
        training data.

    store_cv_results : bool, default=False
        Flag indicating if the cross-validation values corresponding to each
        alpha should be stored in the ``cv_results_`` attribute (see below).
        This flag is only compatible with ``cv=None`` (i.e. using
        Leave-One-Out Cross-Validation).

        .. versionchanged:: 1.5
            Parameter name changed from `store_cv_values` to
            `store_cv_results`.

    alpha_per_target : bool, default=False
        Flag indicating whether to optimize the alpha value (picked from the
        `alphas` parameter list) for each target separately (for multi-output
        settings). When set to `True`, after fitting, the `alpha_` attribute
        will contain a value for each target. When set to `False`, a single
        alpha is used for all targets.

        .. versionadded:: 0.24

    Attributes
    ----------
    cv_results_ : ndarray of shape (n_samples, n_alphas) or \
        (n_samples, n_targets, n_alphas), optional
        Cross-validation values for each alpha (only available if
        ``store_cv_results=True`` and ``cv=None``). After ``fit()`` has been
        called, this attribute will contain the mean squared errors if
        `scoring is None` otherwise it will contain standardized per point
        prediction values.

        .. versionchanged:: 1.5
            `cv_values_` changed to `cv_results_`.

    coef_ : ndarray of shape (n_features) or (n_targets, n_features)
        Weight vector(s).

    intercept_ : float or ndarray of shape (n_targets,)
        Independent term in decision function. Set to 0.0 if
        ``fit_intercept = False``.

    alpha_ : float or ndarray of shape (n_targets,)
        Estimated regularization parameter, or, if ``alpha_per_target=True``,
        the estimated regularization parameter for each target.

    best_score_ : float or ndarray of shape (n_targets,)
        Score of base estimator with best alpha, or, if
        ``alpha_per_target=True``, a score for each target.

        .. versionadded:: 0.23

    n_features_in_ : int
        Number of features seen during :term:`fit`.

        .. versionadded:: 0.24

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during :term:`fit`. Defined only when `X`
        has feature names that are all strings.

        .. versionadded:: 1.0

    See Also
    --------
    Ridge : Ridge regression.
    RidgeClassifier : Classifier based on ridge regression on {-1, 1} labels.
    RidgeClassifierCV : Ridge classifier with built-in cross validation.

    Examples
    --------
    >>> from sklearn.datasets import load_diabetes
    >>> from sklearn.linear_model import RidgeCV
    >>> X, y = load_diabetes(return_X_y=True)
    >>> clf = RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
    >>> clf.score(X, y)
    0.5166...
    """

    def fit(self, X, y, sample_weight=None, **params):
        """Fit Ridge regression model with cv.

        See :meth:`_BaseRidgeCV.fit` for the parameters; note in particular
        that with sample weights the selected hyperparameter may depend on
        whether leave-one-out (cv=None) or another form of cross-validation is
        used.
        """
        super().fit(X, y, sample_weight=sample_weight, **params)
        return self
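
# Illustrative sketch (not part of scikit-learn): the leave-one-out shortcut used by
# RidgeCV(cv=None) scores each alpha with per-sample squared errors, so it should
# select the same alpha as an explicit LeaveOneOut grid search scored with negative
# mean squared error. The helper name `_demo_ridgecv_loo`, the subsampling and the
# alpha grid are assumptions made for this example only.
def _demo_ridgecv_loo():
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge, RidgeCV
    from sklearn.model_selection import GridSearchCV, LeaveOneOut

    X, y = load_diabetes(return_X_y=True)
    X, y = X[:100], y[:100]  # keep the explicit grid search cheap
    alphas = [1e-2, 1e-1, 1.0, 10.0]

    reg = RidgeCV(alphas=alphas, store_cv_results=True).fit(X, y)
    # One column of leave-one-out values per candidate alpha.
    assert reg.cv_results_.shape == (X.shape[0], len(alphas))

    grid = GridSearchCV(
        Ridge(),
        {"alpha": alphas},
        cv=LeaveOneOut(),
        scoring="neg_mean_squared_error",
    ).fit(X, y)
    assert reg.alpha_ == grid.best_params_["alpha"]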

class RidgeClassifierCV(_RidgeClassifierMixin, _BaseRidgeCV):
    """Ridge classifier with built-in cross-validation.

    See glossary entry for :term:`cross-validation estimator`.

    By default, it performs Leave-One-Out Cross-Validation. Currently,
    only the n_features > n_samples case is handled efficiently.

    Read more in the :ref:`User Guide <ridge_regression>`.

    Parameters
    ----------
    alphas : array-like of shape (n_alphas,), default=(0.1, 1.0, 10.0)
        Array of alpha values to try. Regularization strength; must be a
        positive float. Larger values specify stronger regularization. Alpha
        corresponds to ``1 / (2C)`` in other linear models such as
        :class:`~sklearn.linear_model.LogisticRegression` or
        :class:`~sklearn.svm.LinearSVC`. If using Leave-One-Out
        cross-validation, alphas must be strictly positive.

    fit_intercept : bool, default=True
        Whether to calculate the intercept for this model. If set to false, no
        intercept will be used in calculations (i.e. data is expected to be
        centered).

    scoring : str, callable, default=None
        The scoring method to use for cross-validation. Options:

        - str: see :ref:`scoring_string_names` for options.
        - callable: a scorer callable object with signature
          ``scorer(estimator, X, y)``. See :ref:`scoring_callable` for details.
        - `None`: negative :ref:`mean squared error <mean_squared_error>` if cv
          is None (i.e. when using leave-one-out cross-validation), or
          :ref:`accuracy <accuracy_score>` otherwise.

    cv : int, cross-validation generator or an iterable, default=None
        Determines the cross-validation splitting strategy. Possible inputs:

        - None, to use the efficient Leave-One-Out cross-validation
        - integer, to specify the number of folds.
        - :term:`CV splitter`,
        - An iterable yielding (train, test) splits as arrays of indices.

        Refer to the :ref:`User Guide <cross_validation>` for the various
        cross-validation strategies that can be used here.

    class_weight : dict or 'balanced', default=None
        Weights associated with classes in the form ``{class_label: weight}``.
        If not given, all classes are supposed to have weight one. The
        "balanced" mode uses the values of y to automatically adjust weights
        inversely proportional to class frequencies in the input data as
        ``n_samples / (n_classes * np.bincount(y))``.

    store_cv_results : bool, default=False
        Flag indicating if the cross-validation results corresponding to each
        alpha should be stored in the ``cv_results_`` attribute (see below).
        This flag is only compatible with ``cv=None`` (i.e. using
        Leave-One-Out Cross-Validation).

        .. versionchanged:: 1.5
            Parameter name changed from `store_cv_values` to
            `store_cv_results`.

    Attributes
    ----------
    cv_results_ : ndarray of shape (n_samples, n_targets, n_alphas), optional
        Cross-validation results for each alpha (only if
        ``store_cv_results=True`` and ``cv=None``). After ``fit()`` has been
        called, this attribute will contain the mean squared errors if
        `scoring is None` otherwise it will contain standardized per point
        prediction values.

        .. versionchanged:: 1.5
            `cv_values_` changed to `cv_results_`.

    coef_ : ndarray of shape (1, n_features) or (n_targets, n_features)
        Coefficient of the features in the decision function. ``coef_`` is of
        shape (1, n_features) when the given problem is binary.

    intercept_ : float or ndarray of shape (n_targets,)
        Independent term in decision function. Set to 0.0 if
        ``fit_intercept = False``.

    alpha_ : float
        Estimated regularization parameter.

    best_score_ : float
        Score of base estimator with best alpha.

        .. versionadded:: 0.23

    classes_ : ndarray of shape (n_classes,)
        The classes labels.

    n_features_in_ : int
        Number of features seen during :term:`fit`.

        .. versionadded:: 0.24

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during :term:`fit`. Defined only when `X`
        has feature names that are all strings.

        .. versionadded:: 1.0

    See Also
    --------
    Ridge : Ridge regression.
    RidgeClassifier : Ridge classifier.
    RidgeCV : Ridge regression with built-in cross validation.

    Notes
    -----
    For multi-class classification, n_class classifiers are trained in
    a one-versus-all approach. Concretely, this is implemented by taking
    advantage of the multi-variate response support in Ridge.

    Examples
    --------
    >>> from sklearn.datasets import load_breast_cancer
    >>> from sklearn.linear_model import RidgeClassifierCV
    >>> X, y = load_breast_cancer(return_X_y=True)
    >>> clf = RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
    >>> clf.score(X, y)
    0.9630...
    """

    _parameter_constraints: dict = {
        **_BaseRidgeCV._parameter_constraints,
        "class_weight": [dict, StrOptions({"balanced"}), None],
    }
    for param in ("gcv_mode", "alpha_per_target"):
        _parameter_constraints.pop(param)

    def __init__(
        self,
        alphas=(0.1, 1.0, 10.0),
        *,
        fit_intercept=True,
        scoring=None,
        cv=None,
        class_weight=None,
        store_cv_results=False,
    ):
        super().__init__(
            alphas=alphas,
            fit_intercept=fit_intercept,
            scoring=scoring,
            cv=cv,
            store_cv_results=store_cv_results,
        )
        self.class_weight = class_weight

    def fit(self, X, y, sample_weight=None, **params):
        """Fit Ridge classifier with cv.

        Parameters
        ----------
        X : ndarray of shape (n_samples, n_features)
            Training vectors, where `n_samples` is the number of samples and
            `n_features` is the number of features. When using GCV, will be
            cast to float64 if necessary.

        y : ndarray of shape (n_samples,)
            Target values. Will be cast to X's dtype if necessary.

        sample_weight : float or ndarray of shape (n_samples,), default=None
            Individual weights for each sample. If given a float, every sample
            will have the same weight.

        **params : dict, default=None
            Parameters to be passed to the underlying scorer.

            .. versionadded:: 1.5
                Only available if `enable_metadata_routing=True`, which can be
                set by using ``sklearn.set_config(enable_metadata_routing=True)``.
                See :ref:`Metadata Routing User Guide <metadata_routing>` for
                more details.

        Returns
        -------
        self : object
            Fitted estimator.
        """
        X, y, sample_weight, Y = self._prepare_data(X, y, sample_weight, solver="eigen")
        # With the leave-one-out path (cv=None) the binarized targets are used
        # directly as a multi-output regression problem.
        target = Y if self.cv is None else y
        super().fit(X, target, sample_weight=sample_weight, **params)
        return self
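
# Illustrative sketch (not part of scikit-learn): RidgeClassifierCV with a balanced
# class weighting and stored leave-one-out results. The helper name
# `_demo_ridge_classifier_cv` and the alpha grid are assumptions made for this
# example only.
def _demo_ridge_classifier_cv():
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import RidgeClassifierCV

    X, y = load_breast_cancer(return_X_y=True)
    alphas = [1e-3, 1e-2, 1e-1, 1.0]
    clf = RidgeClassifierCV(
        alphas=alphas, class_weight="balanced", store_cv_results=True
    ).fit(X, y)

    # Documented layout: (n_samples, n_targets, n_alphas); one target in the
    # binary case.
    assert clf.cv_results_.shape[0] == X.shape[0]
    assert clf.cv_results_.shape[-1] == len(alphas)
    assert clf.alpha_ in alphas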