`L iMdZddlZddlZddlmZmZddlmZddlZ ddl m Z ddl m Z mZmZmZddlmZdd lmZdd lmZdd lmZdd lmZmZdd lmZddlmZddl m!Z!ddl"m#Z#ddl$m%Z%m&Z&m'Z'Gddeee Z(y)z! Neighborhood Component Analysis N)IntegralReal)warn)minimize) BaseEstimatorClassNamePrefixFeaturesOutMixinTransformerMixin _fit_context)PCA)ConvergenceWarning)pairwise_distances) LabelEncoder)Interval StrOptions)softmax)"_get_additional_lbfgs_options_dict)check_classification_targets)check_random_state) check_arraycheck_is_fitted validate_datac eZdZUdZeeddddgehdejgdgeedddgee dddge dgd gd gd Z e e d < dd ddddddddZeddZdZdZdZddZfdZedZxZS)NeighborhoodComponentsAnalysisaNeighborhood Components Analysis. Neighborhood Component Analysis (NCA) is a machine learning algorithm for metric learning. It learns a linear transformation in a supervised fashion to improve the classification accuracy of a stochastic nearest neighbors rule in the transformed space. Read more in the :ref:`User Guide `. Parameters ---------- n_components : int, default=None Preferred dimensionality of the projected space. If None it will be set to `n_features`. init : {'auto', 'pca', 'lda', 'identity', 'random'} or ndarray of shape (n_features_a, n_features_b), default='auto' Initialization of the linear transformation. Possible options are `'auto'`, `'pca'`, `'lda'`, `'identity'`, `'random'`, and a numpy array of shape `(n_features_a, n_features_b)`. - `'auto'` Depending on `n_components`, the most reasonable initialization is chosen. If `n_components <= min(n_features, n_classes - 1)` we use `'lda'`, as it uses labels information. If not, but `n_components < min(n_features, n_samples)`, we use `'pca'`, as it projects data in meaningful directions (those of higher variance). Otherwise, we just use `'identity'`. - `'pca'` `n_components` principal components of the inputs passed to :meth:`fit` will be used to initialize the transformation. 
            (See :class:`~sklearn.decomposition.PCA`)

        - `'lda'`
            `min(n_components, n_classes)` most discriminative
            components of the inputs passed to :meth:`fit` will be used to
            initialize the transformation. (If `n_components > n_classes`,
            the rest of the components will be zero.) (See
            :class:`~sklearn.discriminant_analysis.LinearDiscriminantAnalysis`)

        - `'identity'`
            If `n_components` is strictly smaller than the
            dimensionality of the inputs passed to :meth:`fit`, the identity
            matrix will be truncated to the first `n_components` rows.

        - `'random'`
            The initial transformation will be a random array of shape
            `(n_components, n_features)`. Each value is sampled from the
            standard normal distribution.

        - numpy array
            `n_features_b` must match the dimensionality of the inputs passed
            to :meth:`fit` and n_features_a must be less than or equal to that.
            If `n_components` is not `None`, `n_features_a` must match it.

    warm_start : bool, default=False
        If `True` and :meth:`fit` has been called before, the solution of the
        previous call to :meth:`fit` is used as the initial linear
        transformation (`n_components` and `init` will be ignored).

    max_iter : int, default=50
        Maximum number of iterations in the optimization.

    tol : float, default=1e-5
        Convergence tolerance for the optimization.

    callback : callable, default=None
        If not `None`, this function is called after every iteration of the
        optimizer, taking as arguments the current solution (flattened
        transformation matrix) and the number of iterations. This might be
        useful in case one wants to examine or store the transformation
        found after each iteration.

    verbose : int, default=0
        If 0, no progress messages will be printed.
        If 1, progress messages will be printed to stdout.
        If > 1, progress messages will be printed and the `disp`
        parameter of :func:`scipy.optimize.minimize` will be set to
        `verbose - 2`.

    random_state : int or numpy.RandomState, default=None
        A pseudo random number generator object or a seed for it if int.
        If `init='random'`, `random_state` is used to initialize the random
        transformation. If `init='pca'`, `random_state` is passed as an
        argument to PCA when initializing the transformation. Pass an int
        for reproducible results across multiple function calls.
        See :term:`Glossary `.

    Attributes
    ----------
    components_ : ndarray of shape (n_components, n_features)
        The linear transformation learned during fitting.

    n_features_in_ : int
        Number of features seen during :term:`fit`.

        .. versionadded:: 0.24

    n_iter_ : int
        Counts the number of iterations performed by the optimizer.

    random_state_ : numpy.RandomState
        Pseudo random number generator object used during initialization.

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during :term:`fit`. Defined only when `X`
        has feature names that are all strings.

        .. versionadded:: 1.0

    See Also
    --------
    sklearn.discriminant_analysis.LinearDiscriminantAnalysis : Linear
        Discriminant Analysis.
    sklearn.decomposition.PCA : Principal component analysis (PCA).

    References
    ----------
    .. [1] J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov.
           "Neighbourhood Components Analysis". Advances in Neural Information
           Processing Systems. 17, 513-520, 2005.
           http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf

    .. [2] Wikipedia entry on Neighborhood Components Analysis
           https://en.wikipedia.org/wiki/Neighbourhood_components_analysis

    Examples
    --------
    >>> from sklearn.neighbors import NeighborhoodComponentsAnalysis
    >>> from sklearn.neighbors import KNeighborsClassifier
    >>> from sklearn.datasets import load_iris
    >>> from sklearn.model_selection import train_test_split
    >>> X, y = load_iris(return_X_y=True)
    >>> X_train, X_test, y_train, y_test = train_test_split(X, y,
    ... stratify=y, test_size=0.7, random_state=42)
    >>> nca = NeighborhoodComponentsAnalysis(random_state=42)
    >>> nca.fit(X_train, y_train)
    NeighborhoodComponentsAnalysis(...)
    >>> knn = KNeighborsClassifier(n_neighbors=3)
    >>> knn.fit(X_train, y_train)
    KNeighborsClassifier(...)
>>> print(knn.score(X_test, y_test)) 0.933333... >>> knn.fit(nca.transform(X_train), y_train) KNeighborsClassifier(...) >>> print(knn.score(nca.transform(X_test), y_test)) 0.961904... Nleft)closed>ldapcaautorandomidentitybooleanrverbose random_state n_componentsinit warm_startmax_itertolcallbackr$r%_parameter_constraintsr F2gh㈵>)r(r)r*r+r,r$r%ct||_||_||_||_||_||_||_||_yNr&) selfr'r(r)r*r+r,r$r%s \/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/sklearn/neighbors/_nca.py__init__z'NeighborhoodComponentsAnalysis.__init__s>) $     (T)prefer_skip_nested_validationct|||d\}}t|tj|}|jE|j|j dkDr)t d|jd|j dd|jrkt|dr_|jj d|j dk7r6t d |j dd |jj dd |j}t|tjrt|}|j d|j dk7r,t d |j dd |j dd |j d|j dkDr,t d|j dd|j dd |jE|j|j dk7r)t d|jd|j ddt|j |_t%j$}|ddtj&f|tj&ddfk(}tj(|j+|||}|j,dkDr|j,dz nd}d|j.||dfd||j0t3dd|j4it7d||j8d}d|_t=di|} | j>jAd|j d|_ t%j$|z }|j,rg|jBjD} | jFs*tIdjK| | jLtNtQdjK| ||S)aoFit the model according to the given training data. Parameters ---------- X : array-like of shape (n_samples, n_features) The training samples. y : array-like of shape (n_samples,) The corresponding training labels. Returns ------- self : object Fitted estimator. r)ensure_min_samplesNrzDThe preferred dimensionality of the projected space `n_components` (z8) cannot be greater than the given data dimensionality (z)! 
components_zThe new inputs dimensionality (zT) does not match the input dimensionality of the previously learned transformation (z).zThe input dimensionality (zc) of the given linear transformation `init` must match the dimensionality of the given inputs `X` (rzThe output dimensionality (z]) of the given linear transformation `init` cannot be greater than its input dimensionality (zV) does not match the output dimensionality of the given linear transformation `init` (zL-BFGS-BgTmaxiterdisp)methodfunargsjacx0r+optionsr,z[{}] NCA did not converge: {}z[{}] Training took {:8.2f}s.))rrr fit_transformr'shape ValueErrorr)hasattrr8r( isinstancenpndarrayrrr% random_state_timenewaxisravel _initializer$_loss_grad_lbfgsr+dictr*r _callbackn_iter_rxreshape __class____name__successrformatmessager print) r1Xyr(t_trainsame_class_masktransformationr;optimizer_params opt_resultcls_names r2fitz"NeighborhoodComponentsAnalysis.fits$T1aA>1$Q' N ( ( +    (T->->-K3373D3D2EF##$771:,b2  OOm,  &&q)QWWQZ71!''!*>66:6F6F6L6LQ6O5PPRT  yy dBJJ 't$Dzz!} * 0A@??@wwqzl"N zz!}tzz!}, 1$**Q-A>>Bjjm_BP   ,1B1BdjjQRm1S 77;7H7H6IJ $zz!}oR 100A0AB))+ArzzM*a A .>>$"2"21a">?$(<rr)r'r%z Finding principal components... )endr)LinearDiscriminantAnalysis)r'z*Finding most discriminative components... Nzdone in {:5.2f}s)r)rFr8rGrHrIrDr'lenuniquemineyerJstandard_normalrKr r$rZsysstdoutflushrcdiscriminant_analysisrm scalings_rgrX) r1r[r\r(r_ n_samples n_featuresr' n_classes init_timerrmrs r2rNz*NeighborhoodComponentsAnalysis._initializets* ??wt];!--NTSbjj ) PM%&GG !Iz,,: Lv~ ! - 3z9q=#AA D!C I$>> D%Dz!!# aggaj!A65!!%!3!3!C!C& 3"D"2-' IIK 5=%1@R@RC||@bI ((*GGAJ%(__NU]R4,OC||JPRS ((*GGAqM%(]]__]l%CN<<,33DIIK)4KLMr4c~|j|j||j|xjdz c_y)zCalled after each iteration of the optimizer. Parameters ---------- transformation : ndarray of shape (n_components * n_features,) The solution computed by the optimizer in this iteration. Nr)r,rR)r1r_s r2rQz(NeighborhoodComponentsAnalysis._callbacks. 
== $ MM.$,, 7  r4c |jdk(r|xjdz c_|jrngd}d}|j|}|jj}t dj|t dj|||dt |ztj} |jd|jd}tj||j} t| d } tj| tjt!| } | |z} tj"| dd } tj"| }| | | zz }||jz}tj||j#d  d | jj|j|z}|jrrtj| z } d}t |j|jj|j|| t$j&j)||z||j+zfS)aCompute the loss and the loss gradient w.r.t. `transformation`. Parameters ---------- transformation : ndarray of shape (n_components * n_features,) The raveled linear transformation on which to compute loss and evaluate gradient. X : ndarray of shape (n_samples, n_features) The training samples. same_class_mask : ndarray of shape (n_samples, n_samples) A mask where `mask[i, j] == 1` if `X[i]` and `X[j]` belong to the same class, and `0` otherwise. Returns ------- loss : float The loss computed for the given transformation. gradient : ndarray of shape (n_components * n_features,) The new (flattened) gradient of the loss. rr) IterationzObjective ValuezTime(s)z{:>10} {:>20} {:>10}z[{}]z[{}] {} [{}] {}-r9T)squared)axiskeepdims)rrz[{}] {:>10} {:>20.6e} {:>10.2f})rRr$rXrUrVrZrnrKrTrDrHrfrgr fill_diagonalinfrsumrsrtrurM)r1r_r[r^sign header_fields header_fmtheaderrb t_funcall X_embeddedp_ij masked_p_ijploss weighted_p_ijweighted_p_ij_symgradient values_fmts r2rOz/NeighborhoodComponentsAnalysis._loss_grad_lbfgss2 <<1  LLA L|| K 3 ***M:>>22fmmH-.&-- &(C#f+4E IIK '//AGGAJ?VVA~//0 "*d; rvv&u~_, FF;Q 6vvay$dQh. )MOO; *]->->A->-F,FGz||''(9:>>qAA << i/I:J !!NN++T\\4  JJ   d{D8>>#3333r4cFt|}d|j_|S)NT)super__sklearn_tags__ target_tagsrequired)r1tagsrUs r2rz/NeighborhoodComponentsAnalysis.__sklearn_tags__s#w')$(! 
r4c4|jjdS)z&Number of transformed output features.r)r8rD)r1s r2_n_features_outz.NeighborhoodComponentsAnalysis._n_features_outs%%a((r4r0)g?)rV __module__ __qualname____doc__rrrrHrIrcallabler-rP__annotations__r3r rcrhrNrQrOrpropertyr __classcell__)rUs@r2rr"sTp Xq$v 6  C D JJ !kh4?@q$v67t$;'($D&) )*5x6xt-0AF H4T ))r4r))rrsrKnumbersrrwarningsrnumpyrHscipy.optimizerbaserr r r decompositionr exceptionsr metricsr preprocessingrutils._param_validationrr utils.extmathr utils.fixesrutils.multiclassr utils.randomrutils.validationrrrrrBr4r2rs_ "#  +((:#<;-JJt)#%5}t)r4