`L iFXdZddlZddlZddlmZddlmZddl m Z m Z ddl m Z ddlmZdd lmZdd lmZd Zd Zd ZdZddZdddZdZdZdZdZdZdZ ddZ!d dZ"dZ#dZ$dZ%dZ&y)!zBA collection of utilities to work with sparse matrices and arrays.N)LinearOperator)_sparse_min_max_sparse_nan_min_max)_check_sample_weight)csc_mean_variance_axis0)csr_mean_variance_axis0)incr_mean_variance_axis0cztj|r |jn t|}d|z}t |)z2Raises a TypeError if X is not a CSR or CSC matrixz,Expected a CSR or CSC sparse matrix, got %s.)spissparseformattype TypeError)X input_typeerrs _/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/sklearn/utils/sparsefuncs.py_raise_typeerrorrs/[[^aJ 8: EC C.c(|dvrtd|zy)N)rrz8Unknown axis value: %d. Use 0 for rows, or 1 for columns) ValueErroraxiss r_raise_error_wrong_axisr s$ 6 F M  rc|jd|jdk(sJ|xj|j|jdzc_y)aInplace column scaling of a CSR matrix. Scale each feature of the data matrix by multiplying with specific scale provided by the caller assuming a (n_samples, n_features) shape. Parameters ---------- X : sparse matrix of shape (n_samples, n_features) Matrix to normalize using the variance of the features. It should be of CSR format. scale : ndarray of shape (n_features,), dtype={np.float32, np.float64} Array of precomputed feature-wise values to use for scaling. Examples -------- >>> from sklearn.utils import sparsefuncs >>> from scipy import sparse >>> import numpy as np >>> indptr = np.array([0, 3, 4, 4, 4]) >>> indices = np.array([0, 1, 2, 2]) >>> data = np.array([8, 1, 2, 5]) >>> scale = np.array([2, 3, 2]) >>> csr = sparse.csr_matrix((data, indices, indptr)) >>> csr.todense() matrix([[8, 1, 2], [0, 0, 5], [0, 0, 0], [0, 0, 0]]) >>> sparsefuncs.inplace_csr_column_scale(csr, scale) >>> csr.todense() matrix([[16, 3, 4], [ 0, 0, 10], [ 0, 0, 0], [ 0, 0, 0]]) rrclip)modeN)shapedatatakeindicesrscales rinplace_csr_column_scaler&'sBJ ;;q>QWWQZ '' 'FFejjj00Frc|jd|jdk(sJ|xjtj|tj|j zc_y)aInplace row scaling of a CSR matrix. Scale each sample of the data matrix by multiplying with specific scale provided by the caller assuming a (n_samples, n_features) shape. Parameters ---------- X : sparse matrix of shape (n_samples, n_features) Matrix to be scaled. It should be of CSR format. scale : ndarray of float of shape (n_samples,) Array of precomputed sample-wise values to use for scaling. rN)r r!nprepeatdiffindptrr$s rinplace_csr_row_scaler,PsH ;;q>QWWQZ '' 'FFbiirwwqxx011Frclt|tj|r:|jdk(r+|dk(rt |||St |j ||Stj|r:|jdk(r+|dk(rt |||St |j ||St|y)a{Compute mean and variance along an axis on a CSR or CSC matrix. Parameters ---------- X : sparse matrix of shape (n_samples, n_features) Input data. It can be of CSR or CSC format. axis : {0, 1} Axis along which the axis should be computed. weights : ndarray of shape (n_samples,) or (n_features,), default=None If axis is set to 0 shape is (n_samples,) or if axis is set to 1 shape is (n_features,). If it is set to None, then samples are equally weighted. .. versionadded:: 0.24 return_sum_weights : bool, default=False If True, returns the sum of weights seen for each feature if `axis=0` or each sample if `axis=1`. .. versionadded:: 0.24 Returns ------- means : ndarray of shape (n_features,), dtype=floating Feature-wise means. variances : ndarray of shape (n_features,), dtype=floating Feature-wise variances. sum_weights : ndarray of shape (n_features,), dtype=floating Returned if `return_sum_weights` is `True`. Examples -------- >>> from sklearn.utils import sparsefuncs >>> from scipy import sparse >>> import numpy as np >>> indptr = np.array([0, 3, 4, 4, 4]) >>> indices = np.array([0, 1, 2, 2]) >>> data = np.array([8, 1, 2, 5]) >>> scale = np.array([2, 3, 2]) >>> csr = sparse.csr_matrix((data, indices, indptr)) >>> csr.todense() matrix([[8, 1, 2], [0, 0, 5], [0, 0, 0], [0, 0, 0]]) >>> sparsefuncs.mean_variance_axis(csr, axis=0) (array([2. , 0.25, 1.75]), array([12. , 0.1875, 4.1875])) csrr)weightsreturn_sum_weightscscN)rr rr_csr_mean_var_axis0_csc_mean_var_axis0Tr)rrr/r0s rmean_variance_axisr5bslD! {{1~!((e+ 19&77I 'W9K  QAHH- 19&77I 'W9K  r)r/ct|tj|r|jdvs t |t j |dk(r,t j|j||j}t j |t j |cxk(r"t j |k(stdtd|dk(rWt j ||jdk7rtd|jddt j |dt j ||jdk7r2td |jddt j |d|dk(r |jn|}|t|||j}t||||| S) a Compute incremental mean and variance along an axis on a CSR or CSC matrix. last_mean, last_var are the statistics computed at the last step by this function. Both must be initialized to 0-arrays of the proper size, i.e. the number of features in X. last_n is the number of samples encountered until now. Parameters ---------- X : CSR or CSC sparse matrix of shape (n_samples, n_features) Input data. axis : {0, 1} Axis along which the axis should be computed. last_mean : ndarray of shape (n_features,) or (n_samples,), dtype=floating Array of means to update with the new data X. Should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1. last_var : ndarray of shape (n_features,) or (n_samples,), dtype=floating Array of variances to update with the new data X. Should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1. last_n : float or ndarray of shape (n_features,) or (n_samples,), dtype=floating Sum of the weights seen so far, excluding the current weights If not float, it should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1. If float it corresponds to having same weights for all samples (or features). weights : ndarray of shape (n_samples,) or (n_features,), default=None If axis is set to 0 shape is (n_samples,) or if axis is set to 1 shape is (n_features,). If it is set to None, then samples are equally weighted. .. versionadded:: 0.24 Returns ------- means : ndarray of shape (n_features,) or (n_samples,), dtype=floating Updated feature-wise means if axis = 0 or sample-wise means if axis = 1. variances : ndarray of shape (n_features,) or (n_samples,), dtype=floating Updated feature-wise variances if axis = 0 or sample-wise variances if axis = 1. n : ndarray of shape (n_features,) or (n_samples,), dtype=integral Updated number of seen samples per feature if axis=0 or number of seen features per sample if axis=1. If weights is not None, n is a sum of the weights of the seen samples or features instead of the actual number of seen samples or features. Notes ----- NaNs are ignored in the algorithm. Examples -------- >>> from sklearn.utils import sparsefuncs >>> from scipy import sparse >>> import numpy as np >>> indptr = np.array([0, 3, 4, 4, 4]) >>> indices = np.array([0, 1, 2, 2]) >>> data = np.array([8, 1, 2, 5]) >>> scale = np.array([2, 3, 2]) >>> csr = sparse.csr_matrix((data, indices, indptr)) >>> csr.todense() matrix([[8, 1, 2], [0, 0, 5], [0, 0, 0], [0, 0, 0]]) >>> sparsefuncs.incr_mean_variance_axis( ... csr, axis=0, last_mean=np.zeros(3), last_var=np.zeros(3), last_n=2 ... ) (array([1.33, 0.167, 1.17]), array([8.88, 0.139, 3.47]), array([6., 6., 6.])) )r1r.r)dtypez8last_mean, last_var, last_n do not have the same shapes.rzHIf axis=1, then last_mean, last_n, last_var should be of size n_samples z (Got z).zIIf axis=0, then last_mean, last_n, last_var should be of size n_features ) last_meanlast_varlast_nr/)rr rrrr(sizefullr r7rr4r_incr_mean_var_axis0)rrr8r9r:r/s rincr_mean_variance_axisr>sbD! KKNqxx>9 wwv!& H GGI "''("3 Frwwv FSTT GSTT qy 779  +""#''!*VBGGI4F3GrK  779  +##$771:,fRWWY5G4HL  qyaA&wA  Y&' rctj|r&|jdk(rt|j|ytj|r|jdk(r t ||yt |y)aInplace column scaling of a CSC/CSR matrix. Scale each feature of the data matrix by multiplying with specific scale provided by the caller assuming a (n_samples, n_features) shape. Parameters ---------- X : sparse matrix of shape (n_samples, n_features) Matrix to normalize using the variance of the features. It should be of CSC or CSR format. scale : ndarray of shape (n_features,), dtype={np.float32, np.float64} Array of precomputed feature-wise values to use for scaling. Examples -------- >>> from sklearn.utils import sparsefuncs >>> from scipy import sparse >>> import numpy as np >>> indptr = np.array([0, 3, 4, 4, 4]) >>> indices = np.array([0, 1, 2, 2]) >>> data = np.array([8, 1, 2, 5]) >>> scale = np.array([2, 3, 2]) >>> csr = sparse.csr_matrix((data, indices, indptr)) >>> csr.todense() matrix([[8, 1, 2], [0, 0, 5], [0, 0, 0], [0, 0, 0]]) >>> sparsefuncs.inplace_column_scale(csr, scale) >>> csr.todense() matrix([[16, 3, 4], [ 0, 0, 10], [ 0, 0, 0], [ 0, 0, 0]]) r1r.N)r rrr,r4r&rr$s rinplace_column_scaler@#sQJ {{1~!((e+acc5) QAHH- E*rctj|r&|jdk(rt|j|ytj|r|jdk(r t ||yt |y)aInplace row scaling of a CSR or CSC matrix. Scale each row of the data matrix by multiplying with specific scale provided by the caller assuming a (n_samples, n_features) shape. Parameters ---------- X : sparse matrix of shape (n_samples, n_features) Matrix to be scaled. It should be of CSR or CSC format. scale : ndarray of shape (n_features,), dtype={np.float32, np.float64} Array of precomputed sample-wise values to use for scaling. Examples -------- >>> from sklearn.utils import sparsefuncs >>> from scipy import sparse >>> import numpy as np >>> indptr = np.array([0, 2, 3, 4, 5]) >>> indices = np.array([0, 1, 2, 3, 3]) >>> data = np.array([8, 1, 2, 5, 6]) >>> scale = np.array([2, 3, 4, 5]) >>> csr = sparse.csr_matrix((data, indices, indptr)) >>> csr.todense() matrix([[8, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 5], [0, 0, 0, 6]]) >>> sparsefuncs.inplace_row_scale(csr, scale) >>> csr.todense() matrix([[16, 2, 0, 0], [ 0, 0, 6, 0], [ 0, 0, 0, 20], [ 0, 0, 0, 30]]) r1r.N)r rrr&r4r,rr$s rinplace_row_scalerBPsQH {{1~!((e+ e, QAHH-a'rc0||fD]'}t|tjstd|dkr||jdz }|dkr||jdz }|j |k(}||j |j |k(<||j |<y)aKSwap two rows of a CSC matrix in-place. Parameters ---------- X : sparse matrix of shape (n_samples, n_features) Matrix whose two rows are to be swapped. It should be of CSC format. m : int Index of the row of X to be swapped. n : int Index of the row of X to be swapped. m and n should be valid integersrN) isinstancer(ndarrayrr r#)rmntm_masks rinplace_swap_row_cscrK|sV@ a $>? ?@ 1u QWWQZ1u QWWQZ YY!^F !AIIaii1nAIIfrc B||fD]'}t|tjstd|dkr||jdz }|dkr||jdz }||kDr||}}|j }||}||dz}||}||dz}||z } ||z } | | k7rE|j |dz|xxx| | z z ccc|| z|j |dz<|| z |j |<tj |jd||j|||j|||j|||j|dg|_tj |jd||j|||j|||j|||j|dg|_y)aKSwap two rows of a CSR matrix in-place. Parameters ---------- X : sparse matrix of shape (n_samples, n_features) Matrix whose two rows are to be swapped. It should be of CSR format. m : int Index of the row of X to be swapped. n : int Index of the row of X to be swapped. rDrrrN) rEr(rFrr r+ concatenater#r!) rrGrHrIr+m_startm_stopn_startn_stopnz_mnz_ns rinplace_swap_row_csrrTsV@ a $>? ?@ 1u QWWQZ1u QWWQZ 1u!1 XXFQiG AE]FQiG AE]F G D G D t| Qtd{*!D.Qtm  IIhw  IIgf % IIfW % IIgf % IIfg   AI^^ FF8G  FF76 " FF6' " FF76 " FF67O  AFrctj|r|jdk(rt|||ytj|r|jdk(rt |||yt |y)a Swap two rows of a CSC/CSR matrix in-place. Parameters ---------- X : sparse matrix of shape (n_samples, n_features) Matrix whose two rows are to be swapped. It should be of CSR or CSC format. m : int Index of the row of X to be swapped. n : int Index of the row of X to be swapped. Examples -------- >>> from sklearn.utils import sparsefuncs >>> from scipy import sparse >>> import numpy as np >>> indptr = np.array([0, 2, 3, 3, 3]) >>> indices = np.array([0, 2, 2]) >>> data = np.array([8, 2, 5]) >>> csr = sparse.csr_matrix((data, indices, indptr)) >>> csr.todense() matrix([[8, 0, 2], [0, 0, 5], [0, 0, 0], [0, 0, 0]]) >>> sparsefuncs.inplace_swap_row(csr, 0, 1) >>> csr.todense() matrix([[0, 0, 5], [8, 0, 2], [0, 0, 0], [0, 0, 0]]) r1r.N)r rrrKrTrrrGrHs rinplace_swap_rowrWsQJ {{1~!((e+Q1% QAHH-Q1%rc>|dkr||jdz }|dkr||jdz }tj|r|jdk(rt |||ytj|r|jdk(rt |||yt |y)a Swap two columns of a CSC/CSR matrix in-place. Parameters ---------- X : sparse matrix of shape (n_samples, n_features) Matrix whose two columns are to be swapped. It should be of CSR or CSC format. m : int Index of the column of X to be swapped. n : int Index of the column of X to be swapped. Examples -------- >>> from sklearn.utils import sparsefuncs >>> from scipy import sparse >>> import numpy as np >>> indptr = np.array([0, 2, 3, 3, 3]) >>> indices = np.array([0, 2, 2]) >>> data = np.array([8, 2, 5]) >>> csr = sparse.csr_matrix((data, indices, indptr)) >>> csr.todense() matrix([[8, 0, 2], [0, 0, 5], [0, 0, 0], [0, 0, 0]]) >>> sparsefuncs.inplace_swap_column(csr, 0, 1) >>> csr.todense() matrix([[0, 8, 2], [0, 0, 5], [0, 0, 0], [0, 0, 0]]) rrr1r.N)r r rrrTrKrrVs rinplace_swap_columnrYsJ 1u QWWQZ1u QWWQZ {{1~!((e+Q1% QAHH-Q1%rctj|r*|jdvr|r t||St ||St |y)aCompute minimum and maximum along an axis on a CSR or CSC matrix. Optionally ignore NaN values. Parameters ---------- X : sparse matrix of shape (n_samples, n_features) Input data. It should be of CSR or CSC format. axis : {0, 1} Axis along which the axis should be computed. ignore_nan : bool, default=False Ignore or passing through NaN values. .. versionadded:: 0.20 Returns ------- mins : ndarray of shape (n_features,), dtype={np.float32, np.float64} Feature-wise minima. maxs : ndarray of shape (n_features,), dtype={np.float32, np.float64} Feature-wise maxima. )r.r1rN)r rrrrr)rr ignore_nans r min_max_axisr\6s?6 {{1~!((n4 &qt4 4"140 0rc|dk(rd}n;|dk(rd}n3|jdk7r$tdj|j|A| |jStjtj |j |S|dk(r7tj |j }||jdS||zS|dk(r|.tj|j|jdStj|tj |j }tj|j|jd| Std j|) aA variant of X.getnnz() with extension to weighting on axis 0. Useful in efficiently calculating multilabel metrics. Parameters ---------- X : sparse matrix of shape (n_samples, n_labels) Input data. It should be of CSR format. axis : {0, 1}, default=None The axis on which the data is aggregated. sample_weight : array-like of shape (n_samples,), default=None Weight for each row of X. Returns ------- nnz : int, float, ndarray of shape (n_samples,) or ndarray of shape (n_features,) Number of non-zero values in the array along a given axis. Otherwise, the total number of non-zero values in the array is returned. rrr.z#Expected CSR sparse format, got {0}intp) minlength)rar/zUnsupported axis: {0}) rrnnzr(dotr*r+astypebincountr#r r)r)rr sample_weightoutr/s r count_nonzerorhZs-, rz  U =DDQXXNOO  |  55L66"''!((+]; ; ggahh  ::f% %]""   ;;qyyAGGAJ? ?ii rwwqxx/@AG;;qyyAGGAJP P077=>>rct||z}|stjStj|dk}t |d\}}|j |rt ||||St |dz |||t ||||zdz S)zCompute the median of data with n_zeros additional zeros. This function is used to support sparse matrices; it modifies data in-place. rrrg@)lenr(nanrhdivmodsort_get_elem_at_rank)r!n_zerosn_elems n_negativemiddleis_odds r _get_medianrts $i'!G vv !!$(+JGQ'NFFIIK  z7CC &1*dJ@ FD*g > ?   rc8||kr||S||z |kry|||z S)z@Find the value in data augmented with n_zeros for the given rankr)rankr!rqros rrnrns3 jDz j7" w rctj|r|jdk(std|jz|j}|j \}}t j|}ttj|D]H\}\}}t j|j||}||jz } t|| ||<J|S)aCFind the median across axis 0 of a CSC matrix. It is equivalent to doing np.median(X, axis=0). Parameters ---------- X : sparse matrix of shape (n_samples, n_features) Input data. It should be of CSC format. Returns ------- median : ndarray of shape (n_features,) Median. r1z%Expected matrix of CSC format, got %s)r rrrr+r r(zeros enumerate itertoolspairwisecopyr!r;rt) rr+ n_samples n_featuresmedianf_indstartendr!nzs rcsc_median_axis_0rs KKNqxx50?!((JKK XXFGGIz XXj !F(););F)CD.|swwqvveC()  "#D"-u . MrcdddfjtfdfdfdfdjjS)aACreate an implicitly offset linear operator. This is used by PCA on sparse data to avoid densifying the whole data matrix. Params ------ X : sparse matrix of shape (n_samples, n_features) offset : ndarray of shape (n_features,) Returns ------- centered : LinearOperator Nc|z|zz SNrvxroffsets rz)_implicit_column_offset..Q!+rc|z|zz Srrvrs rrz)_implicit_column_offset..rrc6|z|jzz Sr)sumrXTrs rrz)_implicit_column_offset..s"q&FQUUW$45rc\|zj|jddddfzz S)Nrr)r4rrs rrz)_implicit_column_offset..s,"q&688aeeemD!G.D#DDr)matvecmatmatrmatvecrmatmatr7r )r4rr7r )rrrs``@r_implicit_column_offsetrsBD!G_F B ++5Dgggg  r)NF)F)NN)'__doc__r{numpyr( scipy.sparsesparser scipy.sparse.linalgr utils.fixesrrutils.validationrsparsefuncs_fastr r3r r2r r=rrr&r,r5r>r@rBrKrTrWrYr\rhrtrnrrrvrrrsH .>3  &1R2$K\NRpf*Z)X:<~*Z.b!H3?l , >r