from numbers import Integral, Real
from time import time

import numpy as np
from scipy import linalg
from scipy.sparse import csr_matrix, issparse
from scipy.spatial.distance import pdist, squareform

from ..base import (
    BaseEstimator,
    ClassNamePrefixFeaturesOutMixin,
    TransformerMixin,
    _fit_context,
)
from ..decomposition import PCA
from ..metrics.pairwise import _VALID_METRICS, pairwise_distances
from ..neighbors import NearestNeighbors
from ..utils import check_random_state
from ..utils._openmp_helpers import _openmp_effective_n_threads
from ..utils._param_validation import Interval, StrOptions, validate_params
from ..utils.validation import _num_samples, check_non_negative, validate_data
from . import _barnes_hut_tsne, _utils

MACHINE_EPSILON = np.finfo(np.double).eps


def _joint_probabilities(distances, desired_perplexity, verbose):
    """Compute joint probabilities p_ij from distances.

    Parameters
    ----------
    distances : ndarray of shape (n_samples * (n_samples-1) / 2,)
        Distances of samples are stored as condensed matrices, i.e.
        we omit the diagonal and duplicate entries and store everything
        in a one-dimensional array.

    desired_perplexity : float
        Desired perplexity of the joint probability distributions.

    verbose : int
        Verbosity level.

    Returns
    -------
    P : ndarray of shape (n_samples * (n_samples-1) / 2,)
        Condensed joint probability matrix.
    """
    # Compute conditional probabilities such that they approximately match
    # the desired perplexity.
    distances = distances.astype(np.float32, copy=False)
    conditional_P = _utils._binary_search_perplexity(
        distances, desired_perplexity, verbose
    )
    P = conditional_P + conditional_P.T
    sum_P = np.maximum(np.sum(P), MACHINE_EPSILON)
    P = np.maximum(squareform(P) / sum_P, MACHINE_EPSILON)
    return P


def _joint_probabilities_nn(distances, desired_perplexity, verbose):
    """Compute joint probabilities p_ij from distances using just nearest
    neighbors.

    This method is approximately equal to _joint_probabilities. The latter
    is O(N^2), but limiting the joint probability to nearest neighbors
    improves this substantially to O(uN).

    Parameters
    ----------
    distances : sparse matrix of shape (n_samples, n_samples)
        Distances of samples to its n_neighbors nearest neighbors. All other
        distances are left to zero (and are not materialized in memory).
        Matrix should be of CSR format.

    desired_perplexity : float
        Desired perplexity of the joint probability distributions.

    verbose : int
        Verbosity level.

    Returns
    -------
    P : sparse matrix of shape (n_samples, n_samples)
        Condensed joint probability matrix with only nearest neighbors. Matrix
        will be of CSR format.
    """
    t0 = time()
    # Compute conditional probabilities such that they approximately match
    # the desired perplexity.
    distances.sort_indices()
    n_samples = distances.shape[0]
    distances_data = distances.data.reshape(n_samples, -1)
    distances_data = distances_data.astype(np.float32, copy=False)
    conditional_P = _utils._binary_search_perplexity(
        distances_data, desired_perplexity, verbose
    )
    assert np.all(np.isfinite(conditional_P)), "All probabilities should be finite"

    # Symmetrize the joint probability distribution using sparse operations.
    P = csr_matrix(
        (conditional_P.ravel(), distances.indices, distances.indptr),
        shape=(n_samples, n_samples),
    )
    P = P + P.T

    # Normalize the joint probability distribution.
    sum_P = np.maximum(P.sum(), MACHINE_EPSILON)
    P /= sum_P

    assert np.all(np.abs(P.data) <= 1.0)
    if verbose >= 2:
        duration = time() - t0
        print("[t-SNE] Computed conditional probabilities in {:.3f}s".format(duration))
    return P
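
# Illustrative sketch (editorial addition, not part of the upstream module):
# how the two affinity helpers above are typically driven. The helper name
# `_demo_joint_probabilities` and the toy data are made up for illustration;
# the real estimator wires this up inside TSNE._fit.
def _demo_joint_probabilities():
    """Toy comparison of the dense and sparse affinity paths (illustrative)."""
    rng = np.random.RandomState(0)
    X = rng.randn(20, 5)

    # Dense path: square matrix of squared Euclidean distances -> condensed P
    # of length n_samples * (n_samples - 1) / 2.
    distances = pairwise_distances(X, metric="euclidean", squared=True)
    P_dense = _joint_probabilities(distances, desired_perplexity=5.0, verbose=0)

    # Sparse path: CSR graph holding distances to the k nearest neighbors
    # (k ~ 3 * perplexity), squared to match the dense path -> sparse (n, n) P.
    knn = NearestNeighbors(n_neighbors=15, metric="euclidean").fit(X)
    distances_nn = knn.kneighbors_graph(mode="distance")
    distances_nn.data **= 2
    P_sparse = _joint_probabilities_nn(distances_nn, desired_perplexity=5.0, verbose=0)

    return P_dense.shape, P_sparse.shape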
def _kl_divergence(
    params,
    P,
    degrees_of_freedom,
    n_samples,
    n_components,
    skip_num_points=0,
    compute_error=True,
):
    """t-SNE objective function: gradient of the KL divergence
    of p_ijs and q_ijs and the absolute error.

    Parameters
    ----------
    params : ndarray of shape (n_params,)
        Unraveled embedding.

    P : ndarray of shape (n_samples * (n_samples-1) / 2,)
        Condensed joint probability matrix.

    degrees_of_freedom : int
        Degrees of freedom of the Student's-t distribution.

    n_samples : int
        Number of samples.

    n_components : int
        Dimension of the embedded space.

    skip_num_points : int, default=0
        This does not compute the gradient for points with indices below
        `skip_num_points`. This is useful when computing transforms of new
        data where you'd like to keep the old data fixed.

    compute_error : bool, default=True
        If False, the kl_divergence is not computed and returns NaN.

    Returns
    -------
    kl_divergence : float
        Kullback-Leibler divergence of p_ij and q_ij.

    grad : ndarray of shape (n_params,)
        Unraveled gradient of the Kullback-Leibler divergence with respect to
        the embedding.
    """
    X_embedded = params.reshape(n_samples, n_components)

    # Q is a heavy-tailed distribution: Student's t-distribution.
    dist = pdist(X_embedded, "sqeuclidean")
    dist /= degrees_of_freedom
    dist += 1.0
    dist **= (degrees_of_freedom + 1.0) / -2.0
    Q = np.maximum(dist / (2.0 * np.sum(dist)), MACHINE_EPSILON)

    # Objective: C (Kullback-Leibler divergence of P and Q).
    # np.dot(x, y) is faster than np.sum(x * y) because it calls BLAS.
    if compute_error:
        kl_divergence = 2.0 * np.dot(P, np.log(np.maximum(P, MACHINE_EPSILON) / Q))
    else:
        kl_divergence = np.nan

    # Gradient: dC/dY
    grad = np.ndarray((n_samples, n_components), dtype=params.dtype)
    PQd = squareform((P - Q) * dist)
    for i in range(skip_num_points, n_samples):
        grad[i] = np.dot(np.ravel(PQd[i], order="K"), X_embedded[i] - X_embedded)
    grad = grad.ravel()
    c = 2.0 * (degrees_of_freedom + 1.0) / degrees_of_freedom
    grad *= c

    return kl_divergence, grad


def _kl_divergence_bh(
    params,
    P,
    degrees_of_freedom,
    n_samples,
    n_components,
    angle=0.5,
    skip_num_points=0,
    verbose=False,
    compute_error=True,
    num_threads=1,
):
    """t-SNE objective function: KL divergence of p_ijs and q_ijs.

    Uses Barnes-Hut tree methods to calculate the gradient in O(N log N)
    instead of O(N^2).

    Parameters
    ----------
    params : ndarray of shape (n_params,)
        Unraveled embedding.

    P : sparse matrix of shape (n_samples, n_samples)
        Sparse approximate joint probability matrix, computed only for the
        k nearest neighbors and symmetrized. Matrix should be of CSR format.

    degrees_of_freedom : int
        Degrees of freedom of the Student's-t distribution.

    n_samples : int
        Number of samples.

    n_components : int
        Dimension of the embedded space.

    angle : float, default=0.5
        Trade-off between speed and accuracy for Barnes-Hut T-SNE: the angular
        size (theta in [3]) below which a distant node is summarized by its
        center of mass.

    skip_num_points : int, default=0
        This does not compute the gradient for points with indices below
        `skip_num_points`.

    verbose : int, default=False
        Verbosity level.

    compute_error : bool, default=True
        If False, the kl_divergence is not computed and returns NaN.

    num_threads : int, default=1
        Number of threads used to compute the gradient. This is set here to
        avoid calling _openmp_effective_n_threads for each gradient step.

    Returns
    -------
    kl_divergence : float
        Kullback-Leibler divergence of p_ij and q_ij.

    grad : ndarray of shape (n_params,)
        Unraveled gradient of the Kullback-Leibler divergence with respect to
        the embedding.
    """
    params = params.astype(np.float32, copy=False)
    X_embedded = params.reshape(n_samples, n_components)

    val_P = P.data.astype(np.float32, copy=False)
    neighbors = P.indices.astype(np.int64, copy=False)
    indptr = P.indptr.astype(np.int64, copy=False)

    grad = np.zeros(X_embedded.shape, dtype=np.float32)
    error = _barnes_hut_tsne.gradient(
        val_P,
        X_embedded,
        neighbors,
        indptr,
        grad,
        angle,
        n_components,
        verbose,
        dof=degrees_of_freedom,
        compute_error=compute_error,
        num_threads=num_threads,
    )
    c = 2.0 * (degrees_of_freedom + 1.0) / degrees_of_freedom
    grad = grad.ravel()
    grad *= c

    return error, grad


def _gradient_descent(
    objective,
    p0,
    it,
    max_iter,
    n_iter_check=1,
    n_iter_without_progress=300,
    momentum=0.8,
    learning_rate=200.0,
    min_gain=0.01,
    min_grad_norm=1e-7,
    verbose=0,
    args=None,
    kwargs=None,
):
    """Batch gradient descent with momentum and individual gains.

    Parameters
    ----------
    objective : callable
        Should return a tuple of cost and gradient for a given parameter
        vector when expanded to a two-dimensional array (embedding).
    p0 : array-like of shape (n_params,)
        Initial parameter vector.
    it : int
        Current number of iterations (this function will be called more than
        once during the optimization).
    max_iter : int
        Maximum number of gradient descent iterations.
    n_iter_check : int, default=1
        Number of iterations between two evaluations of the global error.
    n_iter_without_progress : int, default=300
        Maximum number of iterations without progress before we abort the
        optimization.
    momentum : float within (0.0, 1.0), default=0.8
        The momentum generates a weight for previous gradients that decays
        exponentially.
    learning_rate : float, default=200.0
        The learning rate for t-SNE.
    min_gain : float, default=0.01
        Minimum individual gain for each parameter.
    min_grad_norm : float, default=1e-7
        If the gradient norm is below this threshold, the optimization will
        be aborted.
    verbose : int, default=0
        Verbosity level.
    args : sequence, default=None
        Arguments to pass to the objective function.
    kwargs : dict, default=None
        Keyword arguments to pass to the objective function.

    Returns
    -------
    p : ndarray of shape (n_params,)
        Optimum parameters.
    error : float
        Optimum value of the objective.
    i : int
        Last iteration.
    """
    if args is None:
        args = []
    if kwargs is None:
        kwargs = {}

    p = p0.copy().ravel()
    update = np.zeros_like(p)
    gains = np.ones_like(p)
    error = np.finfo(float).max
    best_error = np.finfo(float).max
    best_iter = i = it

    tic = time()
    for i in range(it, max_iter):
        check_convergence = (i + 1) % n_iter_check == 0
        # Only compute the error when needed.
        kwargs["compute_error"] = check_convergence or i == max_iter - 1

        error, grad = objective(p, *args, **kwargs)

        # Per-parameter gains: increase when the update and gradient point in
        # the same direction, decrease otherwise.
        inc = update * grad < 0.0
        dec = np.invert(inc)
        gains[inc] += 0.2
        gains[dec] *= 0.8
        np.clip(gains, min_gain, np.inf, out=gains)
        grad *= gains
        update = momentum * update - learning_rate * grad
        p += update

        if check_convergence:
            toc = time()
            duration = toc - tic
            tic = toc
            grad_norm = linalg.norm(grad)

            if verbose >= 2:
                print(
                    "[t-SNE] Iteration %d: error = %.7f,"
                    " gradient norm = %.7f"
                    " (%s iterations in %0.3fs)"
                    % (i + 1, error, grad_norm, n_iter_check, duration)
                )

            if error < best_error:
                best_error = error
                best_iter = i
            elif i - best_iter > n_iter_without_progress:
                if verbose >= 2:
                    print(
                        "[t-SNE] Iteration %d: did not make any progress "
                        "during the last %d episodes. Finished."
                        % (i + 1, n_iter_without_progress)
                    )
                break
            if grad_norm <= min_grad_norm:
                if verbose >= 2:
                    print(
                        "[t-SNE] Iteration %d: gradient norm %f. Finished."
                        % (i + 1, grad_norm)
                    )
                break

    return p, error, i


@validate_params(
    {
        "X": ["array-like", "sparse matrix"],
        "X_embedded": ["array-like", "sparse matrix"],
        "n_neighbors": [Interval(Integral, 1, None, closed="left")],
        "metric": [StrOptions(set(_VALID_METRICS) | {"precomputed"}), callable],
    },
    prefer_skip_nested_validation=True,
)
def trustworthiness(X, X_embedded, *, n_neighbors=5, metric="euclidean"):
    r"""Indicate to what extent the local structure is retained.

    The trustworthiness is within [0, 1]. It is defined as

    .. math::

        T(k) = 1 - \frac{2}{nk (2n - 3k - 1)} \sum^n_{i=1}
            \sum_{j \in \mathcal{N}_{i}^{k}} \max(0, (r(i, j) - k))

    where for each sample i, :math:`\mathcal{N}_{i}^{k}` are its k nearest
    neighbors in the output space, and every sample j is its :math:`r(i, j)`-th
    nearest neighbor in the input space. In other words, any unexpected nearest
    neighbors in the output space are penalised in proportion to their rank in
    the input space.

    Parameters
    ----------
    X : {array-like, sparse matrix} of shape (n_samples, n_features) or \
            (n_samples, n_samples)
        If the metric is 'precomputed' X must be a square distance matrix.
        Otherwise it contains a sample per row.

    X_embedded : {array-like, sparse matrix} of shape (n_samples, n_components)
        Embedding of the training data in low-dimensional space.

    n_neighbors : int, default=5
        The number of neighbors that will be considered. Should be fewer than
        `n_samples / 2` to ensure the trustworthiness lies within [0, 1], as
        mentioned in [1]_. An error will be raised otherwise.

    metric : str or callable, default='euclidean'
        Which metric to use for computing pairwise distances between samples
        from the original input space. If metric is 'precomputed', X must be a
        matrix of pairwise distances or squared distances.

        .. versionadded:: 0.20

    Returns
    -------
    trustworthiness : float
        Trustworthiness of the low-dimensional embedding.

    References
    ----------
    [1] Jarkko Venna and Samuel Kaski. 2001. Neighborhood Preservation in
        Nonlinear Projection Methods: An Experimental Study. In Proceedings of
        the International Conference on Artificial Neural Networks (ICANN '01),
        485-491.

    [2] Laurens van der Maaten. Learning a Parametric Embedding by Preserving
        Local Structure. Proceedings of AISTATS 2009, PMLR 5:384-391.

    Examples
    --------
    >>> from sklearn.datasets import make_blobs
    >>> from sklearn.decomposition import PCA
    >>> from sklearn.manifold import trustworthiness
    >>> X, _ = make_blobs(n_samples=100, n_features=10, centers=3, random_state=42)
    >>> X_embedded = PCA(n_components=2).fit_transform(X)
    >>> print(f"{trustworthiness(X, X_embedded, n_neighbors=5):.2f}")
    0.92
    """
    n_samples = _num_samples(X)
    if n_neighbors >= n_samples / 2:
        raise ValueError(
            f"n_neighbors ({n_neighbors}) should be less than n_samples / 2"
            f" ({n_samples / 2})"
        )
    dist_X = pairwise_distances(X, metric=metric)
    if metric == "precomputed":
        dist_X = dist_X.copy()
    # Set the diagonal to np.inf to exclude the points themselves from their
    # own neighborhood.
    np.fill_diagonal(dist_X, np.inf)
    ind_X = np.argsort(dist_X, axis=1)
    # `ind_X[i]` holds the indices of the other samples sorted by distance to i.
    ind_X_embedded = (
        NearestNeighbors(n_neighbors=n_neighbors)
        .fit(X_embedded)
        .kneighbors(return_distance=False)
    )

    # Build an inverted index of neighbors in the input space: for sample i,
    # inverted_index[i][ind_X[i]] = np.arange(1, n_samples + 1).
    inverted_index = np.zeros((n_samples, n_samples), dtype=int)
    ordered_indices = np.arange(n_samples + 1)
    inverted_index[ordered_indices[:-1, np.newaxis], ind_X] = ordered_indices[1:]
    ranks = (
        inverted_index[ordered_indices[:-1, np.newaxis], ind_X_embedded] - n_neighbors
    )
    t = np.sum(ranks[ranks > 0])
    t = 1.0 - t * (
        2.0 / (n_samples * n_neighbors * (2.0 * n_samples - 3.0 * n_neighbors - 1.0))
    )
    return t
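
# Illustrative sketch (editorial addition, not part of the upstream module):
# driving the private optimizer above with the exact-method objective. The
# helper name `_demo_gradient_descent` and the toy data are made up; the real
# estimator wires this up in TSNE._tsne, with an early-exaggeration phase
# before the main optimization.
def _demo_gradient_descent():
    """Run a single optimization phase on toy data (illustrative)."""
    rng = np.random.RandomState(0)
    X = rng.randn(30, 4)

    # Input-space affinities for the exact method.
    distances = pairwise_distances(X, metric="euclidean", squared=True)
    P = _joint_probabilities(distances, desired_perplexity=10.0, verbose=0)

    # Random low-dimensional initialization, flattened for the optimizer.
    n_samples, n_components = X.shape[0], 2
    params = 1e-4 * rng.standard_normal((n_samples, n_components)).ravel()

    params, kl, it = _gradient_descent(
        _kl_divergence,
        params,
        it=0,
        max_iter=250,
        args=[P, max(n_components - 1, 1), n_samples, n_components],
        kwargs={"skip_num_points": 0},
    )
    return params.reshape(n_samples, n_components), kl, it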


class TSNE(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator):
    """T-distributed Stochastic Neighbor Embedding.

    t-SNE [1] is a tool to visualize high-dimensional data. It converts
    similarities between data points to joint probabilities and tries to
    minimize the Kullback-Leibler divergence between the joint probabilities
    of the low-dimensional embedding and the high-dimensional data. t-SNE has
    a cost function that is not convex, i.e. with different initializations we
    can get different results.

    Read more in the :ref:`User Guide <t_sne>`.

    Parameters
    ----------
    n_components : int, default=2
        Dimension of the embedded space.

    perplexity : float, default=30.0
        The perplexity is related to the number of nearest neighbors that is
        used in other manifold learning algorithms. Larger datasets usually
        require a larger perplexity. Consider selecting a value between 5 and
        50; different values can result in significantly different results.
        The perplexity must be less than the number of samples.

    early_exaggeration : float, default=12.0
        Controls how tight natural clusters in the original space are in the
        embedded space and how much space will be between them. The choice of
        this parameter is not very critical. If the cost function increases
        during initial optimization, the early exaggeration factor or the
        learning rate might be too high.

    learning_rate : float or "auto", default="auto"
        The learning rate for t-SNE is usually in the range [10.0, 1000.0]. If
        it is too high, the data may look like a 'ball' with any point
        approximately equidistant from its nearest neighbours; if it is too
        low, most points may look compressed in a dense cloud with few
        outliers. Note that many other t-SNE implementations (bhtsne, FIt-SNE,
        openTSNE, etc.) use a definition of learning_rate that is 4 times
        smaller than ours. The 'auto' option sets the learning_rate to
        `max(N / early_exaggeration / 4, 50)` where N is the sample size,
        following [4] and [5].

        .. versionchanged:: 1.2
           The default value changed to `"auto"`.

    max_iter : int, default=1000
        Maximum number of iterations for the optimization. Should be at
        least 250.

        .. versionchanged:: 1.5
            Parameter name changed from `n_iter` to `max_iter`.

    n_iter_without_progress : int, default=300
        Maximum number of iterations without progress before we abort the
        optimization, used after 250 initial iterations with early
        exaggeration. Note that progress is only checked every 50 iterations,
        so this value is rounded to the next multiple of 50.

        .. versionadded:: 0.17

    min_grad_norm : float, default=1e-7
        If the gradient norm is below this threshold, the optimization will be
        stopped.

    metric : str or callable, default='euclidean'
        The metric to use when calculating distance between instances in a
        feature array. If metric is a string, it must be one of the options
        allowed by scipy.spatial.distance.pdist, or a metric listed in
        pairwise.PAIRWISE_DISTANCE_FUNCTIONS. If metric is "precomputed", X is
        assumed to be a distance matrix. If metric is a callable, it is called
        on each pair of rows of X and should return the distance between them.
        The default "euclidean" is interpreted as squared euclidean distance.

    metric_params : dict, default=None
        Additional keyword arguments for the metric function.

        .. versionadded:: 1.1

    init : {"random", "pca"} or ndarray of shape (n_samples, n_components), \
            default="pca"
        Initialization of embedding. PCA initialization cannot be used with
        precomputed distances and is usually more globally stable than random
        initialization.

        .. versionchanged:: 1.2
           The default value changed to `"pca"`.

    verbose : int, default=0
        Verbosity level.

    random_state : int, RandomState instance or None, default=None
        Determines the random number generator. Pass an int for reproducible
        results across multiple function calls. Note that different
        initializations might result in different local minima of the cost
        function. See :term:`Glossary <random_state>`.

    method : {'barnes_hut', 'exact'}, default='barnes_hut'
        By default the gradient calculation algorithm uses Barnes-Hut
        approximation running in O(NlogN) time. method='exact' will run the
        slower, but exact, algorithm in O(N^2) time and should be used when
        nearest-neighbor errors need to be better than 3%. The exact method
        cannot scale to millions of examples.

        .. versionadded:: 0.17
           Approximate optimization *method* via the Barnes-Hut.

    angle : float, default=0.5
        Only used if method='barnes_hut'. This is the trade-off between speed
        and accuracy for Barnes-Hut T-SNE: 'angle' is the angular size
        (referred to as theta in [3]) of a distant node as measured from a
        point; below this size the node is used as a summary of all points it
        contains. The method is not very sensitive to values in the range
        0.2 - 0.8; smaller angles quickly increase computation time, larger
        angles quickly increase the error.

    n_jobs : int, default=None
        The number of parallel jobs to run for neighbors search. This
        parameter has no impact when ``metric="precomputed"`` or
        (``metric="euclidean"`` and ``method="exact"``). ``None`` means 1
        unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using
        all processors. See :term:`Glossary <n_jobs>` for more details.

        .. versionadded:: 0.22

    Attributes
    ----------
    embedding_ : array-like of shape (n_samples, n_components)
        Stores the embedding vectors.

    kl_divergence_ : float
        Kullback-Leibler divergence after optimization.

    n_features_in_ : int
        Number of features seen during :term:`fit`.

        .. versionadded:: 0.24
    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during :term:`fit`. Defined only when `X` has
        feature names that are all strings.

        .. versionadded:: 1.0

    learning_rate_ : float
        Effective learning rate.

        .. versionadded:: 1.2

    n_iter_ : int
        Number of iterations run.

    See Also
    --------
    sklearn.decomposition.PCA : Principal component analysis that is a linear
        dimensionality reduction method.
    sklearn.decomposition.KernelPCA : Non-linear dimensionality reduction using
        kernels and PCA.
    MDS : Manifold learning using multidimensional scaling.
    Isomap : Manifold learning based on Isometric Mapping.
    LocallyLinearEmbedding : Manifold learning using Locally Linear Embedding.
    SpectralEmbedding : Spectral embedding for non-linear dimensionality.

    Notes
    -----
    For an example of using :class:`~sklearn.manifold.TSNE` in combination with
    :class:`~sklearn.neighbors.KNeighborsTransformer` see
    :ref:`sphx_glr_auto_examples_neighbors_approximate_nearest_neighbors.py`.

    References
    ----------
    [1] van der Maaten, L.J.P.; Hinton, G.E. Visualizing High-Dimensional Data
        Using t-SNE. Journal of Machine Learning Research 9:2579-2605, 2008.

    [2] van der Maaten, L.J.P. t-Distributed Stochastic Neighbor Embedding
        https://lvdmaaten.github.io/tsne/

    [3] L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms.
        Journal of Machine Learning Research 15(Oct):3221-3245, 2014.
        https://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf

    [4] Belkina, A. C., Ciccolella, C. O., Anno, R., Halpert, R., Spidlen, J.,
        & Snyder-Cappione, J. E. (2019). Automated optimized parameters for
        T-distributed stochastic neighbor embedding improve visualization and
        analysis of large datasets. Nature Communications, 10(1), 1-12.

    [5] Kobak, D., & Berens, P. (2019). The art of using t-SNE for single-cell
        transcriptomics. Nature Communications, 10(1), 1-14.

    Examples
    --------
    >>> import numpy as np
    >>> from sklearn.manifold import TSNE
    >>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
    >>> X_embedded = TSNE(n_components=2, learning_rate='auto',
    ...                   init='random', perplexity=3).fit_transform(X)
    >>> X_embedded.shape
    (4, 2)
    """

    # Control the number of exploration iterations with early_exaggeration on.
    _EXPLORATION_MAX_ITER = 250

    # Control the number of iterations between progress checks.
    _N_ITER_CHECK = 50

    _parameter_constraints: dict = {
        "n_components": [Interval(Integral, 1, None, closed="left")],
        "perplexity": [Interval(Real, 0, None, closed="neither")],
        "early_exaggeration": [Interval(Real, 1, None, closed="left")],
        "learning_rate": [
            StrOptions({"auto"}),
            Interval(Real, 0, None, closed="neither"),
        ],
        "max_iter": [Interval(Integral, 250, None, closed="left")],
        "n_iter_without_progress": [Interval(Integral, -1, None, closed="left")],
        "min_grad_norm": [Interval(Real, 0, None, closed="left")],
        "metric": [StrOptions(set(_VALID_METRICS) | {"precomputed"}), callable],
        "metric_params": [dict, None],
        "init": [StrOptions({"pca", "random"}), np.ndarray],
        "verbose": ["verbose"],
        "random_state": ["random_state"],
        "method": [StrOptions({"barnes_hut", "exact"})],
        "angle": [Interval(Real, 0, 1, closed="both")],
        "n_jobs": [None, Integral],
    }

    def __init__(
        self,
        n_components=2,
        *,
        perplexity=30.0,
        early_exaggeration=12.0,
        learning_rate="auto",
        max_iter=1000,
        n_iter_without_progress=300,
        min_grad_norm=1e-7,
        metric="euclidean",
        metric_params=None,
        init="pca",
        verbose=0,
        random_state=None,
        method="barnes_hut",
        angle=0.5,
        n_jobs=None,
    ):
        self.n_components = n_components
        self.perplexity = perplexity
        self.early_exaggeration = early_exaggeration
        self.learning_rate = learning_rate
        self.max_iter = max_iter
        self.n_iter_without_progress = n_iter_without_progress
        self.min_grad_norm = min_grad_norm
        self.metric = metric
        self.metric_params = metric_params
        self.init = init
        self.verbose = verbose
        self.random_state = random_state
        self.method = method
        self.angle = angle
        self.n_jobs = n_jobs

    def _check_params_vs_input(self, X):
        if self.perplexity >= X.shape[0]:
            raise ValueError(
                f"perplexity ({self.perplexity}) must be less than "
                f"n_samples ({X.shape[0]})"
            )
    def _fit(self, X, skip_num_points=0):
        """Private function to fit the model using X as training data."""
        if isinstance(self.init, str) and self.init == "pca" and issparse(X):
            raise TypeError(
                "PCA initialization is currently not supported with the "
                'sparse input matrix. Use init="random" instead.'
            )

        if self.learning_rate == "auto":
            self.learning_rate_ = X.shape[0] / self.early_exaggeration / 4
            self.learning_rate_ = np.maximum(self.learning_rate_, 50)
        else:
            self.learning_rate_ = self.learning_rate

        if self.method == "barnes_hut":
            X = validate_data(
                self,
                X,
                accept_sparse=["csr"],
                ensure_min_samples=2,
                dtype=[np.float32, np.float64],
            )
        else:
            X = validate_data(
                self,
                X,
                accept_sparse=["csr", "csc", "coo"],
                dtype=[np.float32, np.float64],
            )

        if self.metric == "precomputed":
            if isinstance(self.init, str) and self.init == "pca":
                raise ValueError(
                    'The parameter init="pca" cannot be used with '
                    'metric="precomputed".'
                )
            if X.shape[0] != X.shape[1]:
                raise ValueError("X should be a square distance matrix")

            check_non_negative(
                X,
                "TSNE.fit(). With metric='precomputed', X "
                "should contain positive distances.",
            )

            if self.method == "exact" and issparse(X):
                raise TypeError(
                    'TSNE with method="exact" does not accept sparse '
                    'precomputed distance matrix. Use method="barnes_hut" '
                    "or provide the dense distance matrix."
                )

        if self.method == "barnes_hut" and self.n_components > 3:
            raise ValueError(
                "'n_components' should be inferior to 4 for the barnes_hut "
                "algorithm as it relies on quad-tree or oct-tree."
            )
        random_state = check_random_state(self.random_state)

        n_samples = X.shape[0]

        neighbors_nn = None
        if self.method == "exact":
            # Retrieve the distance matrix, either precomputed or computed here.
            if self.metric == "precomputed":
                distances = X
            else:
                if self.verbose:
                    print("[t-SNE] Computing pairwise distances...")

                if self.metric == "euclidean":
                    # euclidean_distances can return squared distances directly.
                    distances = pairwise_distances(X, metric=self.metric, squared=True)
                else:
                    metric_params_ = self.metric_params or {}
                    distances = pairwise_distances(
                        X, metric=self.metric, n_jobs=self.n_jobs, **metric_params_
                    )

            if np.any(distances < 0):
                raise ValueError(
                    "All distances should be positive, the metric given is not correct"
                )

            if self.metric != "euclidean":
                distances **= 2

            # Compute the joint probability distribution for the input space.
            P = _joint_probabilities(distances, self.perplexity, self.verbose)
            assert np.all(np.isfinite(P)), "All probabilities should be finite"
            assert np.all(P >= 0), "All probabilities should be non-negative"
            assert np.all(P <= 1), "All probabilities should be less than or equal to one"
        else:
            # Compute the number of nearest neighbors to find: van der Maaten
            # uses 3 * perplexity. For very small datasets use n - 1 neighbors.
            n_neighbors = min(n_samples - 1, int(3.0 * self.perplexity + 1))

            if self.verbose:
                print("[t-SNE] Computing {} nearest neighbors...".format(n_neighbors))

            # Find the nearest neighbors for every point.
            knn = NearestNeighbors(
                algorithm="auto",
                n_jobs=self.n_jobs,
                n_neighbors=n_neighbors,
                metric=self.metric,
                metric_params=self.metric_params,
            )
            t0 = time()
            knn.fit(X)
            duration = time() - t0
            if self.verbose:
                print(
                    "[t-SNE] Indexed {} samples in {:.3f}s...".format(
                        n_samples, duration
                    )
                )

            t0 = time()
            distances_nn = knn.kneighbors_graph(mode="distance")
            duration = time() - t0
            if self.verbose:
                print(
                    "[t-SNE] Computed neighbors for {} samples in {:.3f}s...".format(
                        n_samples, duration
                    )
                )

            # Free the memory used by the neighbors index.
            del knn

            # The neighbors graph holds Euclidean distances; square them to be
            # consistent with the 'exact' method.
            distances_nn.data **= 2

            # Compute the joint probability distribution for the input space.
            P = _joint_probabilities_nn(distances_nn, self.perplexity, self.verbose)

        if isinstance(self.init, np.ndarray):
            X_embedded = self.init
        elif self.init == "pca":
            pca = PCA(
                n_components=self.n_components,
                svd_solver="randomized",
                random_state=random_state,
            )
            # Always output a numpy array, no matter what is configured globally.
            pca.set_output(transform="default")
            X_embedded = pca.fit_transform(X).astype(np.float32, copy=False)
            # Rescale PCA so that PC1 has standard deviation 1e-4, the default
            # scale of the random initialization.
            X_embedded = X_embedded / np.std(X_embedded[:, 0]) * 1e-4
        elif self.init == "random":
            # Initialize with iid samples from Gaussians with standard
            # deviation 1e-4.
            X_embedded = 1e-4 * random_state.standard_normal(
                size=(n_samples, self.n_components)
            ).astype(np.float32)

        # Degrees of freedom of the Student's t-distribution. The suggestion
        # degrees_of_freedom = n_components - 1 comes from "Learning a
        # Parametric Embedding by Preserving Local Structure", van der Maaten, 2009.
        degrees_of_freedom = max(self.n_components - 1, 1)

        return self._tsne(
            P,
            degrees_of_freedom,
            n_samples,
            X_embedded=X_embedded,
            neighbors=neighbors_nn,
            skip_num_points=skip_num_points,
        )

    def _tsne(
        self,
        P,
        degrees_of_freedom,
        n_samples,
        X_embedded,
        neighbors=None,
        skip_num_points=0,
    ):
        """Runs t-SNE."""
        # t-SNE minimizes the Kullback-Leibler divergence between the Gaussian
        # affinities P and the Student's t affinities Q, using batch gradient
        # descent in two stages: early exaggeration with momentum 0.5, then the
        # final optimization with momentum 0.8.
        params = X_embedded.ravel()

        opt_args = {
            "it": 0,
            "n_iter_check": self._N_ITER_CHECK,
            "min_grad_norm": self.min_grad_norm,
            "learning_rate": self.learning_rate_,
            "verbose": self.verbose,
            "kwargs": dict(skip_num_points=skip_num_points),
            "args": [P, degrees_of_freedom, n_samples, self.n_components],
            "n_iter_without_progress": self._EXPLORATION_MAX_ITER,
            "max_iter": self._EXPLORATION_MAX_ITER,
            "momentum": 0.5,
        }
        if self.method == "barnes_hut":
            obj_func = _kl_divergence_bh
            opt_args["kwargs"]["angle"] = self.angle
            opt_args["kwargs"]["verbose"] = self.verbose
            # Get the number of threads for gradient computation here to avoid
            # recomputing it at each iteration.
            opt_args["kwargs"]["num_threads"] = _openmp_effective_n_threads()
        else:
            obj_func = _kl_divergence

        # Learning schedule (part 1): early exaggeration with lower momentum.
        P *= self.early_exaggeration
        params, kl_divergence, it = _gradient_descent(obj_func, params, **opt_args)
        if self.verbose:
            print(
                "[t-SNE] KL divergence after %d iterations with early exaggeration: %f"
                % (it + 1, kl_divergence)
            )

        # Learning schedule (part 2): disable early exaggeration and finish the
        # optimization with a higher momentum of 0.8.
        P /= self.early_exaggeration
        remaining = self.max_iter - self._EXPLORATION_MAX_ITER
        if it < self._EXPLORATION_MAX_ITER or remaining > 0:
            opt_args["max_iter"] = self.max_iter
            opt_args["it"] = it + 1
            opt_args["momentum"] = 0.8
            opt_args["n_iter_without_progress"] = self.n_iter_without_progress
            params, kl_divergence, it = _gradient_descent(obj_func, params, **opt_args)

        # Save the final number of iterations.
        self.n_iter_ = it

        if self.verbose:
            print(
                "[t-SNE] KL divergence after %d iterations: %f"
                % (it + 1, kl_divergence)
            )

        X_embedded = params.reshape(n_samples, self.n_components)
        self.kl_divergence_ = kl_divergence

        return X_embedded

    @_fit_context(
        # TSNE.metric is not validated yet
        prefer_skip_nested_validation=False
    )
    def fit_transform(self, X, y=None):
        """Fit X into an embedded space and return that transformed output.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape (n_samples, n_features) or \
                (n_samples, n_samples)
            If the metric is 'precomputed' X must be a square distance matrix.
            Otherwise it contains a sample per row. If the method is 'exact',
            X may be a sparse matrix of type 'csr', 'csc' or 'coo'. If the
            method is 'barnes_hut' and the metric is 'precomputed', X may be a
            precomputed sparse graph.

        y : None
            Ignored.

        Returns
        -------
        X_new : ndarray of shape (n_samples, n_components)
            Embedding of the training data in low-dimensional space.
        """
        self._check_params_vs_input(X)
        embedding = self._fit(X)
        self.embedding_ = embedding
        return self.embedding_

    @_fit_context(
        # TSNE.metric is not validated yet
        prefer_skip_nested_validation=False
    )
    def fit(self, X, y=None):
        """Fit X into an embedded space.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape (n_samples, n_features) or \
                (n_samples, n_samples)
            If the metric is 'precomputed' X must be a square distance matrix.
            Otherwise it contains a sample per row. If the method is 'exact',
            X may be a sparse matrix of type 'csr', 'csc' or 'coo'. If the
            method is 'barnes_hut' and the metric is 'precomputed', X may be a
            precomputed sparse graph.

        y : None
            Ignored.

        Returns
        -------
        self : object
            Fitted estimator.
        """
        self.fit_transform(X)
        return self

    @property
    def _n_features_out(self):
        """Number of transformed output features."""
        return self.embedding_.shape[1]

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.input_tags.pairwise = self.metric == "precomputed"
        return tags
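
# Minimal usage sketch (editorial addition, not part of the upstream module):
# fit the public estimator on random data and check the embedding quality with
# trustworthiness. It imports the public API so it also works when copied into
# a standalone script; inside the package it can be run with
# ``python -m sklearn.manifold._t_sne``.
if __name__ == "__main__":
    import numpy as np
    from sklearn.manifold import TSNE, trustworthiness

    rng = np.random.RandomState(0)
    X = rng.randn(100, 20)

    tsne = TSNE(
        n_components=2,
        perplexity=30.0,
        learning_rate="auto",
        init="pca",
        random_state=0,
    )
    X_embedded = tsne.fit_transform(X)

    print("embedding shape:", X_embedded.shape)  # (100, 2)
    print("final KL divergence:", tsne.kl_divergence_)
    print("trustworthiness:", trustworthiness(X, X_embedded, n_neighbors=5))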