`L i+f dZddlZddlZddlZddlmZddlmZmZddl Z ddl m Z ddl mZmZddlmZdd lmZdd lmZdd lmZdd lmZmZmZdd lmZmZmZm Z m!Z!m"Z"m#Z#ddl$m%Z%ddl&m'Z'ddl(m)Z)ddl*m+Z+m,Z,m-Z-m.Z.m/Z/m0Z0ddl1m2Z2ddl3m4Z4m5Z5ddl6m7Z7m8Z8ddl9m:Z:m;Z;ddlZ>ddl?m@Z@ddlAmBZBmCZCdZDdddddddd d!ZEd"ZFe0d#d$ggd%d#dgd&gd#dgd'd( dtdddd)d*ZGdud+ZHe0d#gd#dgd&ge-d,gd&gd-d( dtde jdd.d/ZJdvd0ZKd1ZLd2ZMgd3ZNe8e7d4kreNd5gz ZNe8e7d6kreNd7gz ZNe8e7d8kreNd9gz ZNd:gZOe0d#d$gd#d$ge.eddhge/ePeNje@jeSgeTdgd;d(ddZUe0d#d$gd#d$ge.eddhge/ePeNje@jeSgeTdgd;d(dd`. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_features) An array where each row is a sample and each column is a feature. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), default=None An array where each row is a sample and each column is a feature. If `None`, method uses `Y=X`. Y_norm_squared : array-like of shape (n_samples_Y,) or (n_samples_Y, 1) or (1, n_samples_Y), default=None Pre-computed dot-products of vectors in Y (e.g., ``(Y**2).sum(axis=1)``) May be ignored in some cases, see the note below. squared : bool, default=False Return squared Euclidean distances. X_norm_squared : array-like of shape (n_samples_X,) or (n_samples_X, 1) or (1, n_samples_X), default=None Pre-computed dot-products of vectors in X (e.g., ``(X**2).sum(axis=1)``) May be ignored in some cases, see the note below. Returns ------- distances : ndarray of shape (n_samples_X, n_samples_Y) Returns the distances between the row vectors of `X` and the row vectors of `Y`. See Also -------- paired_distances : Distances between pairs of elements of X and Y. Notes ----- To achieve a better accuracy, `X_norm_squared` and `Y_norm_squared` may be unused if they are passed as `np.float32`. Examples -------- >>> from sklearn.metrics.pairwise import euclidean_distances >>> X = [[0, 1], [1, 1]] >>> # distance between rows of X >>> euclidean_distances(X, X) array([[0., 1.], [1., 0.]]) >>> # get distance to origin >>> euclidean_distances(X, [[0, 0]]) array([[1. 
], [1.41421356]]) F)rCrr*r*z'Incompatible dimensions for X of shape z and X_norm_squared of shape .r*r[z'Incompatible dimensions for Y of shape z and Y_norm_squared of shape )rrHrrKreshapeTrL_euclidean_distances)r6r7rUrVrWrGrMoriginal_shapes r9euclidean_distancesrbsl !Q EB A &DAq!$^uE'--   AGGAJ= 0ZZ@N   Aqwwqz? 2+--N   AGGAJ? 29!''C++9*:!=  !$^uE'--   AGGAJ= 0ZZ@N   AGGAJ? 2+--N   Aqwwqz? 29!''C++9*:!=  1nng NNr;cdt||\}}}|,|j|jk7r|j|d}n0|j|jk7rt |ddddf}nd}||ur|dn |j } n^|,|j|jk7r|j|d} n0|j|jk7rt |ddddf} nd} |j|jk(s|j|jk(rt |||| } n%dt||j dz} | |z } | | z } |jd|| j } t||j| | | } ||urt| d|d |r| St||j| | } | S) a'Computational part of euclidean_distances Assumes inputs are already checked. If norms are passed as float32, they are unused. If arrays are passed as float32, norms needs to be recomputed on upcast chunks. TODO: use a float64 accumulator in row_norms to avoid the latter. NrZTrVr] dense_outputr)devicer3outF)rG add_value) rr3r4r^r"r__euclidean_distances_upcastr#r2rmaximumrsqrt) r6r7rWrUrVrGrMdevice_XXYY distancesxp_zeros r9r`r`s.a3NB7!n&:&:bjj&H ZZ 0 BJJ  q$ '4 0 AvZTRTT  %.*>*>"***LNG4B WW "1d+D!G4BBww"**2:: 502q"= ACCdCC R R jj7)//jBG) BJJ 7 I  Av A"F)"bggyiPI r;) numeric_only)r6r7rVmissing_valuesrD)rVrurDct|rdnd}t||d||\}}t||}||ur|n t||}d||<d||<t||d}||z} ||z} |t j | |j z}|t j || j z}t j|dd|||urt j|d d |z } ||ur| n|} t j | | j } tj|| dk(<t jd | | || z}||jd z}|st j|||S) a Calculate the euclidean distances in the presence of missing values. Compute the euclidean distance between each pair of samples in X and Y, where Y=X is assumed if Y=None. When calculating the distance between a pair of samples, this formulation ignores feature coordinates with a missing value in either sample and scales up the weight of the remaining coordinates: .. code-block:: text dist(x,y) = sqrt(weight * sq. distance from present coordinates) where: .. 
code-block:: text weight = Total # of coordinates / # of present coordinates For example, the distance between ``[3, na, na, 6]`` and ``[1, na, 4, 5]`` is: .. math:: \sqrt{\frac{4}{2}((3-1)^2 + (6-5)^2)} If all the coordinates are missing or if there are no common present coordinates then NaN is returned for that pair. Read more in the :ref:`User Guide `. .. versionadded:: 0.22 Parameters ---------- X : array-like of shape (n_samples_X, n_features) An array where each row is a sample and each column is a feature. Y : array-like of shape (n_samples_Y, n_features), default=None An array where each row is a sample and each column is a feature. If `None`, method uses `Y=X`. squared : bool, default=False Return squared Euclidean distances. missing_values : np.nan, float or int, default=np.nan Representation of missing value. copy : bool, default=True Make and use a deep copy of X and Y (if Y exists). Returns ------- distances : ndarray of shape (n_samples_X, n_samples_Y) Returns the distances between the row vectors of `X` and the row vectors of `Y`. See Also -------- paired_distances : Distances between pairs of elements of X and Y. References ---------- * John K. Dixon, "Pattern Recognition with Partly Missing Data", IEEE Transactions on Systems, Man, and Cybernetics, Volume: 9, Issue: 10, pp. 617 - 621, Oct. 1979. http://ieeexplore.ieee.org/abstract/document/4310090/ Examples -------- >>> from sklearn.metrics.pairwise import nan_euclidean_distances >>> nan = float("NaN") >>> X = [[0, 1], [1, nan]] >>> nan_euclidean_distances(X, X) # distance between rows of X array([[0. , 1.41421356], [1.41421356, 0. ]]) >>> # get distance to origin >>> nan_euclidean_distances(X, [[0, 0]]) array([[1. 
], [1.41421356]]) allow-nanTF)r@rBrDrrdNrir*) rrHrrbr0dotr_clip fill_diagonalnanrmrKrn)r6r7rVrurDrB missing_X missing_Yrrrprq present_X present_Y present_counts r9nan_euclidean_distancesrsgz(5^'D $  1E5FT DAq!^,I!V 1n)EIAiLAiL#Aq$7I QB QB IKK((I  244((IGGIq$I.Av C(I I!V )IFF9ikk2M$&FFImq !JJq-]3 I I   y) r;ct||\}}}|jd}|jd} |jd} |j|| f|j|} |t |r,|j t j|jz nd} t |r,|j t j|jz nd} t| |z| | zz| z| |z| z| zzdz d}| | z| z}| tj|dzd|zzzdz }tt|d}t||}t|| }t|D]\}}|j||ddf|}|t!|d dddf}n||}t| |}t|D]\}}||ur||kr| ||fj"}n^|j||ddf|}|t!|d dddf}n |dd|f}d t%||j"d z}||z }||z }|j||jd| ||f<| S)aEuclidean distances between X and Y. Assumes X and Y have float32 dtype. Assumes XX and YY have float64 dtype or are None. X and Y are upcast to float64 by chunks, which size is chosen to limit memory increase by approximately 10% (at least 10MiB). rr*r3rhN ir )rGrhTrdrerfFrD)rrKemptyr4rnnzr0prodmaxmathrnintrr enumerateastyper"r_r#)r6rpr7rq batch_sizerGrMro n_samples_X n_samples_Y n_featuresrr x_density y_densitymaxmemtmp x_batches xp_max_floatix_sliceX_chunkXX_chunk y_batchesjy_slicedY_chunkYY_chunks r9rlrlFs.a3NB7''!*K''!*KJ+{32::gVI08 AEEBGGAGG,, 08 AEEBGGAGG,, [(9{+BBjP{*Y6DF     9$ 2dTYYsAvF ':;;q@ Z!, K4I-GDL *O 7))AgqjM<8 : $74@H'{H Z8 #I. OJAwAv!a%gw./11))AgqjM<@:($?aHH!!W*~H'))$OOX X *,))Arzz)*NIgw& '# OO8 r;c||jd}|tj|jd|f}||fS)Nr*axisr)argminr0arangerK)diststartindicesvaluess r9_argmin_min_reducers>kkqk!G "))DJJqM*G3 4F F?r;c&|jdS)Nr*r)r)rrs r9_argmin_reducers ;;A; r;) euclideanl2l1 manhattan cityblock braycurtiscanberra chebyshev correlationcosinedicehammingjaccard mahalanobismatching minkowskirogerstanimoto russellrao seuclidean sokalsneath sqeuclideanyule wminkowski nan_euclidean haversinez1.17 sokalmichenerz1.11 kulsinskiz1.9rr)r6r7rmetric metric_kwargsr)rrrc |dk(rdnd}t|||\}}|dk(r||}}|i}tj|||r^|jddr |d k(rd }i}tj||d ||d d \}}|j }|j }||fSt d5tt||ft|d|\}}dddtj}tj}||fS#1swY7xYw)a Compute minimum distances between one point and a set of points. 
This function computes for each row in X, the index of the row of Y which is closest (according to the specified distance). The minimal distances are also returned. This is mostly equivalent to calling:: (pairwise_distances(X, Y=Y, metric=metric).argmin(axis=axis), pairwise_distances(X, Y=Y, metric=metric).min(axis=axis)) but uses much less memory, and is faster for large arrays. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_features) Array containing points. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features) Array containing points. axis : int, default=1 Axis along which the argmin and distances are to be computed. metric : str or callable, default='euclidean' Metric to use for distance computation. Any metric from scikit-learn or :mod:`scipy.spatial.distance` can be used. If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy's metrics, but is less efficient than passing the metric name as a string. Distance matrices are not supported. Valid values for metric are: - from scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan', 'nan_euclidean'] - from :mod:`scipy.spatial.distance`: ['braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'] See the documentation for :mod:`scipy.spatial.distance` for details on these metrics. .. note:: `'kulsinski'` is deprecated from SciPy 1.9 and will be removed in SciPy 1.11. .. note:: `'matching'` has been removed in SciPy 1.9 (use `'hamming'` instead). metric_kwargs : dict, default=None Keyword arguments to pass to specified metric function. 
Returns ------- argmin : ndarray Y[argmin[i], :] is the row in Y that is closest to X[i, :]. distances : ndarray The array of minimum distances. `distances[i]` is the distance between the i-th row in X and the argmin[i]-th row in Y. See Also -------- pairwise_distances : Distances between every pair of samples of X and Y. pairwise_distances_argmin : Same as `pairwise_distances_argmin_min` but only returns the argmins. Examples -------- >>> from sklearn.metrics.pairwise import pairwise_distances_argmin_min >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> argmin, distances = pairwise_distances_argmin_min(X, Y) >>> argmin array([0, 1]) >>> distances array([1., 1.]) rrwTrBrNrVFrrr*autor6r7krrstrategyreturn_distance assume_finite reduce_funcr) rHr+ is_usable_forgetcomputeflattenr zippairwise_distances_chunkedrr0 concatenate)r6r7rrrrBrrs r9pairwise_distances_argmin_minrs8J(.'@ d A9J KDAq qy!1 Q6*   Y .6[3H"FM!//'  !//#( F?$ / !+q&8KXOGV  ..)' F?  s )C==Dc |dk(rdnd}t|||\}}|dk(r||}}|i}tj|||rI|jddr |d k(rd }i}tj||d ||d d }|j }|St d5tjtt||ft|d|}ddd|S#1swYSxYw)a Compute minimum distances between one point and a set of points. This function computes for each row in X, the index of the row of Y which is closest (according to the specified distance). This is mostly equivalent to calling:: pairwise_distances(X, Y=Y, metric=metric).argmin(axis=axis) but uses much less memory, and is faster for large arrays. This function works with dense 2D arrays only. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_features) Array containing points. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features) Arrays containing points. axis : int, default=1 Axis along which the argmin and distances are to be computed. metric : str or callable, default="euclidean" Metric to use for distance computation. Any metric from scikit-learn or :mod:`scipy.spatial.distance` can be used. 
If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy's metrics, but is less efficient than passing the metric name as a string. Distance matrices are not supported. Valid values for metric are: - from scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan', 'nan_euclidean'] - from :mod:`scipy.spatial.distance`: ['braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'] See the documentation for :mod:`scipy.spatial.distance` for details on these metrics. .. note:: `'kulsinski'` is deprecated from SciPy 1.9 and will be removed in SciPy 1.11. .. note:: `'matching'` has been removed in SciPy 1.9 (use `'hamming'` instead). metric_kwargs : dict, default=None Keyword arguments to pass to specified metric function. Returns ------- argmin : numpy.ndarray Y[argmin[i], :] is the row in Y that is closest to X[i, :]. See Also -------- pairwise_distances : Distances between every pair of samples of X and Y. pairwise_distances_argmin_min : Same as `pairwise_distances_argmin` but also returns the distances. Examples -------- >>> from sklearn.metrics.pairwise import pairwise_distances_argmin >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> pairwise_distances_argmin(X, Y) array([0, 1]) rrwTrrNrVFrrr*rrrr) rHr+rrrrr r0rlistrr)r6r7rrrrBrs r9pairwise_distances_argminr^sx(.'@ d A9J KDAq qy!1 Q6*   Y .6[3H"FM//'! //#, N$ / nn/1*8KXG  N  Ns 1CCrPcPddlm}|jdj||S)a8Compute the Haversine distance between samples in X and Y. The Haversine (or great circle) distance is the angular distance between two points on the surface of a sphere. 
The first coordinate of each point is assumed to be the latitude, the second is the longitude, given in radians. The dimension of the data must be 2. .. math:: D(x, y) = 2\arcsin[\sqrt{\sin^2((x_{lat} - y_{lat}) / 2) + \cos(x_{lat})\cos(y_{lat})\ sin^2((x_{lon} - y_{lon}) / 2)}] Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, 2) A feature array. Y : {array-like, sparse matrix} of shape (n_samples_Y, 2), default=None An optional second feature array. If `None`, uses `Y=X`. Returns ------- distances : ndarray of shape (n_samples_X, n_samples_Y) The distance matrix. Notes ----- As the Earth is nearly spherical, the haversine formula provides a good approximation of the distance between two points of the Earth surface, with a less than 1% error on average. Examples -------- We want to calculate the distance between the Ezeiza Airport (Buenos Aires, Argentina) and the Charles de Gaulle Airport (Paris, France). >>> from sklearn.metrics.pairwise import haversine_distances >>> from math import radians >>> bsas = [-34.83333, -58.5166646] >>> paris = [49.0083899664, 2.53844117956] >>> bsas_in_radians = [radians(_) for _ in bsas] >>> paris_in_radians = [radians(_) for _ in paris] >>> result = haversine_distances([bsas_in_radians, paris_in_radians]) >>> result * 6371000/1000 # multiply by Earth radius to get kilometers array([[ 0. , 11099.54035582], [11099.54035582, 0. ]]) r )DistanceMetricr)metricsr get_metricpairwise)r6r7rs r9haversine_distancesrs&j)  $ $[ 1 : :1a @@r;c t||\}}t|s t|rt|d}t|d}|j|jt j |j d|j df}t|j|j|j|j|j|j||Stj||dS)aCompute the L1 distances between the vectors in X and Y. Read more in the :ref:`User Guide `. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_features) An array where each row is a sample and each column is a feature. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), default=None An array where each row is a sample and each column is a feature. 
If `None`, method uses `Y=X`. Returns ------- distances : ndarray of shape (n_samples_X, n_samples_Y) Pairwise L1 distances. Notes ----- When X and/or Y are CSR sparse matrices and they are not already in canonical format, this function modifies them in-place to make them canonical. Examples -------- >>> from sklearn.metrics.pairwise import manhattan_distances >>> manhattan_distances([[3]], [[3]]) array([[0.]]) >>> manhattan_distances([[3]], [[2]]) array([[1.]]) >>> manhattan_distances([[2]], [[3]]) array([[1.]]) >>> manhattan_distances([[1, 2], [3, 4]], [[1, 2], [0, 3]]) array([[0., 2.], [4., 4.]]) Frrr) rHrrsum_duplicatesr0zerosrKr-datarindptrr cdist)r6r7Ds r9manhattan_distancesr&s\ !A &DAq{hqk qu % qu %   HHaggaj!''!*- .!&&!))QXXqvvqyy!((TUV >>!Q ,,r;ct||\}}t||}|dz}|dz }|j|dd}||us|t|d|d|S)aCompute cosine distance between samples in X and Y. Cosine distance is defined as 1.0 minus the cosine similarity. Read more in the :ref:`User Guide `. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_features) Matrix `X`. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), default=None Matrix `Y`. Returns ------- distances : ndarray of shape (n_samples_X, n_samples_Y) Returns the cosine distance between samples in X and Y. See Also -------- cosine_similarity : Compute cosine similarity between samples in X and Y. scipy.spatial.distance.cosine : Dense matrices only. Examples -------- >>> from sklearn.metrics.pairwise import cosine_distances >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> cosine_distances(X, Y) array([[1. , 1. ], [0.422, 0.183]]) r[r*rxg@F)rk)rcosine_similarityrzr)r6r7rGrMSs r9cosine_distancesrbsgT !Q EB !QAGAFA 3AAv !Cu= Hr;c<t||\}}t||z S)aCompute the paired euclidean distances between X and Y. Read more in the :ref:`User Guide `. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples, n_features) Input array/matrix X. 
Y : {array-like, sparse matrix} of shape (n_samples, n_features) Input array/matrix Y. Returns ------- distances : ndarray of shape (n_samples,) Output array/matrix containing the calculated paired euclidean distances. Examples -------- >>> from sklearn.metrics.pairwise import paired_euclidean_distances >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> paired_euclidean_distances(X, Y) array([1., 1.]) )rQr"rPs r9paired_euclidean_distancesrs#> q! $DAq QU r;cBt||\}}||z }t|r\tj|j|_tj tj |jdStj|jdS)aCompute the paired L1 distances between X and Y. Distances are calculated between (X[0], Y[0]), (X[1], Y[1]), ..., (X[n_samples], Y[n_samples]). Read more in the :ref:`User Guide `. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples, n_features) An array-like where each row is a sample and each column is a feature. Y : {array-like, sparse matrix} of shape (n_samples, n_features) An array-like where each row is a sample and each column is a feature. Returns ------- distances : ndarray of shape (n_samples,) L1 paired distances between the row vectors of `X` and the row vectors of `Y`. Examples -------- >>> from sklearn.metrics.pairwise import paired_manhattan_distances >>> import numpy as np >>> X = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]]) >>> Y = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]]) >>> paired_manhattan_distances(X, Y) array([1., 2., 1.]) r*rr[)rQrr0absrsqueezearraysum)r6r7diffs r9paired_manhattan_distancesrsxF q! $DAq q5D~FF499% zz"((4888#3455vvd|R((r;cjt||\}}dtt|t|z dzS)a Compute the paired cosine distances between X and Y. Read more in the :ref:`User Guide `. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples, n_features) An array where each row is a sample and each column is a feature. Y : {array-like, sparse matrix} of shape (n_samples, n_features) An array where each row is a sample and each column is a feature. 
Returns ------- distances : ndarray of shape (n_samples,) Returns the distances between the row vectors of `X` and the row vectors of `Y`, where `distances[i]` is the distance between `X[i]` and `Y[i]`. Notes ----- The cosine distance is equivalent to the half the squared euclidean distance if each sample is normalized to unit norm. Examples -------- >>> from sklearn.metrics.pairwise import paired_cosine_distances >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> paired_cosine_distances(X, Y) array([0.5 , 0.184]) g?Trd)rQr"r rPs r9paired_cosine_distancesrs4L q! $DAq 9Q<)A,6E EEr;)rrrrrr)r6r7r)rc |tvrt|}|||St|rZt||\}}tjt |}t t |D]}|||||||<|Sy)a Compute the paired distances between X and Y. Compute the distances between (X[0], Y[0]), (X[1], Y[1]), etc... Read more in the :ref:`User Guide `. Parameters ---------- X : ndarray of shape (n_samples, n_features) Array 1 for distance computation. Y : ndarray of shape (n_samples, n_features) Array 2 for distance computation. metric : str or callable, default="euclidean" The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options specified in PAIRED_DISTANCES, including "euclidean", "manhattan", or "cosine". Alternatively, if metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays from `X` as input and return a value indicating the distance between them. **kwds : dict Unused parameters. Returns ------- distances : ndarray of shape (n_samples,) Returns the distances between the row vectors of `X` and the row vectors of `Y`. See Also -------- sklearn.metrics.pairwise_distances : Computes the distance between every pair of samples. 
Examples -------- >>> from sklearn.metrics.pairwise import paired_distances >>> X = [[0, 1], [1, 1]] >>> Y = [[0, 1], [2, 1]] >>> paired_distances(X, Y) array([0., 1.]) N)PAIRED_DISTANCEScallablerQr0rlenrange)r6r7rkwdsfuncrrrs r9paired_distancesrsr!!'Aqz & "1a(1HHSV$ s1v .A!!A$!-IaL . r;r6r7rgcPt||\}}t||j|S)a Compute the linear kernel between X and Y. Read more in the :ref:`User Guide `. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_features) A feature array. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), default=None An optional second feature array. If `None`, uses `Y=X`. dense_output : bool, default=True Whether to return dense output even when the input is sparse. If ``False``, the output is sparse if both input arrays are sparse. .. versionadded:: 0.20 Returns ------- kernel : ndarray of shape (n_samples_X, n_samples_Y) The Gram matrix of the linear kernel, i.e. `X @ Y.T`. Examples -------- >>> from sklearn.metrics.pairwise import linear_kernel >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> linear_kernel(X, Y) array([[0., 0.], [1., 2.]]) rf)rHr#r_rs r9 linear_kernelr ds(T !A &DAq 1acc ==r;left)closedneither)r6r7degreegammacoef0ct||\}}|d|jdz }t||jd}||z}||z }||z}|S)aP Compute the polynomial kernel between X and Y. .. code-block:: text K(X, Y) = (gamma + coef0) ^ degree Read more in the :ref:`User Guide `. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_features) A feature array. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), default=None An optional second feature array. If `None`, uses `Y=X`. degree : float, default=3 Kernel degree. gamma : float, default=None Coefficient of the vector inner product. If None, defaults to 1.0 / n_features. coef0 : float, default=1 Constant offset added to scaled inner product. Returns ------- kernel : ndarray of shape (n_samples_X, n_samples_Y) The polynomial kernel. 
Examples -------- >>> from sklearn.metrics.pairwise import polynomial_kernel >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> polynomial_kernel(X, Y, degree=2) array([[1. , 1. ], [1.77, 2.77]]) ?r*Trf)rHrKr#r_)r6r7rrrKs r9polynomial_kernelrs^n !A &DAq }aggaj 133T2AJAJA&LA Hr;)r6r7rrct||\}}t||\}}|d|jdz }t||jd}||z}||z }t ||j ||}|S)aCompute the sigmoid kernel between X and Y. .. code-block:: text K(X, Y) = tanh(gamma + coef0) Read more in the :ref:`User Guide `. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_features) A feature array. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), default=None An optional second feature array. If `None`, uses `Y=X`. gamma : float, default=None Coefficient of the vector inner product. If None, defaults to 1.0 / n_features. coef0 : float, default=1 Constant offset added to scaled inner product. Returns ------- kernel : ndarray of shape (n_samples_X, n_samples_Y) Sigmoid kernel between two arrays. Examples -------- >>> from sklearn.metrics.pairwise import sigmoid_kernel >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> sigmoid_kernel(X, Y) array([[0.76, 0.76], [0.87, 0.93]]) rr*Trfri)rrHrKr#r_rtanh)r6r7rrrGrMrs r9sigmoid_kernelrsyd !Q EB A &DAq }aggaj 133T2AJAJA!"bggqa8A Hr;)r6r7rct||\}}t||\}}|d|jdz }t||d}|| z}t ||j ||}|S)aCompute the rbf (gaussian) kernel between X and Y. .. code-block:: text K(x, y) = exp(-gamma ||x-y||^2) for each pair of rows x in X and y in Y. Read more in the :ref:`User Guide `. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_features) A feature array. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), default=None An optional second feature array. If `None`, uses `Y=X`. gamma : float, default=None If None, defaults to 1.0 / n_features. Returns ------- kernel : ndarray of shape (n_samples_X, n_samples_Y) The RBF kernel. 
Examples -------- >>> from sklearn.metrics.pairwise import rbf_kernel >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> rbf_kernel(X, Y) array([[0.71, 0.51], [0.51, 0.71]]) rr*Trdri)rrHrKrbrexpr6r7rrGrMrs r9 rbf_kernelrsn` !Q EB A &DAq }aggaj Aq$/A%KA!"bffaQ7A Hr;ct||\}}|d|jdz }| t||z}tj|||S)aCompute the laplacian kernel between X and Y. The laplacian kernel is defined as: .. code-block:: text K(x, y) = exp(-gamma ||x-y||_1) for each pair of rows x in X and y in Y. Read more in the :ref:`User Guide `. .. versionadded:: 0.17 Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_features) A feature array. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), default=None An optional second feature array. If `None`, uses `Y=X`. gamma : float, default=None If None, defaults to 1.0 / n_features. Otherwise it should be strictly positive. Returns ------- kernel : ndarray of shape (n_samples_X, n_samples_Y) The kernel matrix. Examples -------- >>> from sklearn.metrics.pairwise import laplacian_kernel >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> laplacian_kernel(X, Y) array([[0.71, 0.51], [0.51, 0.71]]) rr*)rHrKrr0r)r6r7rrs r9laplacian_kernelrOsRf !A &DAq }aggaj  $Q**AFF1aL Hr;ct||\}}t|d}||ur|}n t|d}t||j|}|S)axCompute cosine similarity between samples in X and Y. Cosine similarity, or the cosine kernel, computes similarity as the normalized dot product of X and Y: .. code-block:: text K(X, Y) = / (||X||*||Y||) On L2-normalized data, this function is equivalent to linear_kernel. Read more in the :ref:`User Guide `. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_features) Input data. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), default=None Input data. If ``None``, the output will be the pairwise similarities between all samples in ``X``. dense_output : bool, default=True Whether to return dense output even when the input is sparse. 
If ``False``, the output is sparse if both input arrays are sparse. .. versionadded:: 0.17 parameter ``dense_output`` for dense output. Returns ------- similarities : ndarray or sparse matrix of shape (n_samples_X, n_samples_Y) Returns the cosine similarity between samples in X and Y. Examples -------- >>> from sklearn.metrics.pairwise import cosine_similarity >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> cosine_similarity(X, Y) array([[0. , 0. ], [0.577, 0.816]]) Trrf)rHr r#r_)r6r7rg X_normalized Y_normalizedrs r9rrsPj !A &DAqQT*LAv#  .  lnn<PA Hr;ct||\}}}t||d\}}|j|dkr td||ur|j|dkr tdt |rLt j |jd|jdf|j}t||||St|||}|dddddf}|dddddf}||z d z } ||z} |j| dk(|jd|| | } |j| dk(|jd || | } |j| | z d S) aCompute the additive chi-squared kernel between observations in X and Y. The chi-squared kernel is computed between each pair of rows in X and Y. X and Y have to be non-negative. This kernel is most commonly applied to histograms. The chi-squared kernel is given by: .. code-block:: text k(x, y) = -Sum [(x - y)^2 / (x + y)] It can be interpreted as a weighted difference per entry. Read more in the :ref:`User Guide `. Parameters ---------- X : array-like of shape (n_samples_X, n_features) A feature array. Y : array-like of shape (n_samples_Y, n_features), default=None An optional second feature array. If `None`, uses `Y=X`. Returns ------- kernel : array-like of shape (n_samples_X, n_samples_Y) The kernel matrix. See Also -------- chi2_kernel : The exponentiated version of the kernel, which is usually preferable. sklearn.kernel_approximation.AdditiveChi2Sampler : A Fourier approximation to this kernel. Notes ----- As the negative of a distance, this kernel is only conditionally positive definite. References ---------- * Zhang, J. and Marszalek, M. and Lazebnik, S. and Schmid, C. 
Local features and kernels for classification of texture and object categories: A comprehensive study International Journal of Computer Vision 2007 https://hal.archives-ouvertes.fr/hal-00171412/document Examples -------- >>> from sklearn.metrics.pairwise import additive_chi2_kernel >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> additive_chi2_kernel(X, Y) array([[-1., -2.], [-2., -1.]]) F)r@rzX contains negative values.zY contains negative values.r3rFNr rr*r)rrHrJrLrr0rrKr3r,rwherer2r) r6r7rGrMroresultr3xbybnomdenoms r9additive_chi2_kernelr)sN~.a3NB7 AU ;DAq vva!e}677zbffQUm67721771:qwwqz2!''B!Q' -ar: q$z] tQz]bQRhhuz2::auW:#MsS!RZZwZ%OQVWvvcEkv**r;ct||\}}t||}||z}t|rtj||S|j |S)aCompute the exponential chi-squared kernel between X and Y. The chi-squared kernel is computed between each pair of rows in X and Y. X and Y have to be non-negative. This kernel is most commonly applied to histograms. The chi-squared kernel is given by: .. code-block:: text k(x, y) = exp(-gamma Sum [(x - y)^2 / (x + y)]) It can be interpreted as a weighted difference per entry. Read more in the :ref:`User Guide `. Parameters ---------- X : array-like of shape (n_samples_X, n_features) A feature array. Y : array-like of shape (n_samples_Y, n_features), default=None An optional second feature array. If `None`, uses `Y=X`. gamma : float, default=1 Scaling parameter of the chi2 kernel. Returns ------- kernel : ndarray of shape (n_samples_X, n_samples_Y) The kernel matrix. See Also -------- additive_chi2_kernel : The additive version of this kernel. sklearn.kernel_approximation.AdditiveChi2Sampler : A Fourier approximation to the additive version of this kernel. References ---------- * Zhang, J. and Marszalek, M. and Lazebnik, S. and Schmid, C. 
Local features and kernels for classification of texture and object categories: A comprehensive study International Journal of Computer Vision 2007 https://hal.archives-ouvertes.fr/hal-00171412/document Examples -------- >>> from sklearn.metrics.pairwise import chi2_kernel >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> chi2_kernel(X, Y) array([[0.368, 0.135], [0.135, 0.368]]) ri)rr)rr0rrs r9 chi2_kernelr+"sQ@ !Q EBQ"AJA2vvaQ 66!9r;) rrrrrrrr?rctS)aLValid metrics for pairwise_distances. This function simply returns the valid pairwise distance metrics. It exists to allow for a description of the mapping for each of the valid strings. The valid distance metrics, and the function they map to, are: =============== ======================================== metric Function =============== ======================================== 'cityblock' metrics.pairwise.manhattan_distances 'cosine' metrics.pairwise.cosine_distances 'euclidean' metrics.pairwise.euclidean_distances 'haversine' metrics.pairwise.haversine_distances 'l1' metrics.pairwise.manhattan_distances 'l2' metrics.pairwise.euclidean_distances 'manhattan' metrics.pairwise.manhattan_distances 'nan_euclidean' metrics.pairwise.nan_euclidean_distances =============== ======================================== Read more in the :ref:`User Guide `. Returns ------- distance_metrics : dict Returns valid metrics for pairwise_distances. )PAIRWISE_DISTANCE_FUNCTIONSr;r9distance_metricsr/zs : '&r;c"||i||dd|f<y)z/Write in-place to a slice of a distance matrix.Nr.) dist_func dist_matrixslice_argskwargss r9 _dist_wrapperr6s&77K6 r;c t\}t|dk(r fiSttt j j dj df|dtd|fdttt|Dusturt jdS)zZBreak the pairwise matrix in n_jobs even slices and compute them using multithreading.r*rF)r3order threading)backendn_jobsc 3>K|]}||fiywNr.).0sr6r7fdrrrets r9 z%_parallel_pairwise..s/1  4aAaD)D)1s) r:rr'r6r0rrKr&rr(rbr{)r6r7rr<rr3rArBs``` ` @@r9_parallel_pairwiserDs y %a+KAq%1$Aq!D!!  
B ((AGGAJ +5 DC0H[011 a2B62JK1 Q!)).s) O!:a  =wq*'=#= = Os)+z;reduce_func returned %r. Expected sequence(s) of length %d.rc3:K|]}t|k7ywr>r()r?rS chunk_sizes r9rCz$_check_chunk_size..s :Q<?j ( :sc32K|]}t|ywr>rUrRs r9rCz$_check_chunk_size..s=LO=szLreduce_func returned object of length %s. Expected same length as input: %d.)r/rPrJ TypeErrorrL)reducedrVis_tuple actual_sizes ` r9_check_chunk_sizer\s'5)H * Ow OO I"w J? @   :' ::=W==  1&{KNJG H  ;r;c 2|dk(r/d|vr+||urtj|dd}d|iStd|dk(r]d|vrY||urJtjj tj |j j }d|iStd iS) z:Precompute data-derived metric parameters if not provided.rVrr*)rddofzIThe 'V' parameter is required for the seuclidean metric when Y is passed.rVIzKThe 'VI' parameter is required for the mahalanobis metric when Y is passed.)r0varrLlinalginvcovr_)r6r7rrr^r`s r9_precompute_metric_paramsres #T/ 6qqq)A Qx $  4t#3 6rvvacc{+--B bz $  Ir;r?)r6r7rrr<working_memory)rrr<rfc+TKt|}|dk(rtd|f}n*||}tdt|z||} t|| }t ||fd|i|} |j d i| |D]} | j dk(r| j|k(r|} n|| } t| |f||d|} ||us|Dtj|dtur(d| j| j dt|dz<|.| jd}|| | j } t| || yw) aGenerate a distance matrix chunk by chunk with optional reduction. In cases where not all of a pairwise distance matrix needs to be stored at once, this is used to calculate pairwise distances in ``working_memory``-sized chunks. If ``reduce_func`` is given, it is run on each chunk and its return values are concatenated into lists, arrays or sparse matrices. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_samples_X) or (n_samples_X, n_features) Array of pairwise distances between samples, or a feature array. The shape the array should be (n_samples_X, n_samples_X) if metric='precomputed' and (n_samples_X, n_features) otherwise. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), default=None An optional second feature array. Only allowed if metric != "precomputed". reduce_func : callable, default=None The function which is applied on each chunk of the distance matrix, reducing it to needed values. 
        ``reduce_func(D_chunk, start)``
        is called repeatedly, where ``D_chunk`` is a contiguous vertical
        slice of the pairwise distance matrix, starting at row ``start``.
        It should return one of: None; an array, a list, or a sparse matrix
        of length ``D_chunk.shape[0]``; or a tuple of such objects.
        Returning None is useful for in-place operations, rather than
        reductions.

        If None, pairwise_distances_chunked returns a generator of vertical
        chunks of the distance matrix.

    metric : str or callable, default='euclidean'
        The metric to use when calculating distance between instances in a
        feature array. If metric is a string, it must be one of the options
        allowed by :func:`scipy.spatial.distance.pdist` for its metric
        parameter, or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.
        If metric is "precomputed", X is assumed to be a distance matrix.
        Alternatively, if metric is a callable function, it is called on
        each pair of instances (rows) and the resulting value recorded.
        The callable should take two arrays from X as input and return a
        value indicating the distance between them.

    n_jobs : int, default=None
        The number of jobs to use for the computation. This works by
        breaking down the pairwise matrix into n_jobs even slices and
        computing them in parallel.

        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
        for more details.

    working_memory : float, default=None
        The sought maximum memory for temporary distance matrix chunks.
        When None (default), the value of
        ``sklearn.get_config()['working_memory']`` is used.

    **kwds : optional keyword parameters
        Any further parameters are passed directly to the distance function.
        If using a :mod:`scipy.spatial.distance` metric, the parameters are
        still metric dependent. See the scipy docs for usage examples.

    Yields
    ------
    D_chunk : {ndarray, sparse matrix}
        A contiguous slice of distance matrix, optionally processed by
        ``reduce_func``.

Examples -------- Without reduce_func: >>> import numpy as np >>> from sklearn.metrics import pairwise_distances_chunked >>> X = np.random.RandomState(0).rand(5, 3) >>> D_chunk = next(pairwise_distances_chunked(X)) >>> D_chunk array([[0. , 0.295, 0.417, 0.197, 0.572], [0.295, 0. , 0.576, 0.419, 0.764], [0.417, 0.576, 0. , 0.449, 0.903], [0.197, 0.419, 0.449, 0. , 0.512], [0.572, 0.764, 0.903, 0.512, 0. ]]) Retrieve all neighbors and average distance within radius r: >>> r = .2 >>> def reduce_func(D_chunk, start): ... neigh = [np.flatnonzero(d < r) for d in D_chunk] ... avg_dist = (D_chunk * (D_chunk < r)).mean(axis=1) ... return neigh, avg_dist >>> gen = pairwise_distances_chunked(X, reduce_func=reduce_func) >>> neigh, avg_dist = next(gen) >>> neigh [array([0, 3]), array([1]), array([2]), array([0, 3]), array([4])] >>> avg_dist array([0.039, 0. , 0. , 0.039, 0. ]) Where r is defined per sample, we need to make use of ``start``: >>> r = [.2, .4, .4, .3, .1] >>> def reduce_func(D_chunk, start): ... neigh = [np.flatnonzero(d < r[i]) ... for i, d in enumerate(D_chunk, start)] ... return neigh >>> neigh = next(pairwise_distances_chunked(X, reduce_func=reduce_func)) >>> neigh [array([0, 3]), array([0, 1]), array([2]), array([0, 3]), array([4])] Force row-by-row generation by reducing ``working_memory``: >>> gen = pairwise_distances_chunked(X, reduce_func=reduce_func, ... working_memory=0) >>> next(gen) [array([0, 3])] >>> next(gen) [array([0, 1])] r?rN) row_bytes max_n_rowsrfr)rr<r*r.)r(slicerrreupdaterstoppairwise_distancesr-rrbflatrKr\)r6r7rrr<rfrrslices chunk_n_rowsparamsslrD_chunkrVs r9rrsRVq/K ;') 9A(,q/)") [,7'q! CF Cd CFDKK& 88q=RWW 3GeG$WaVvVQUV Fai%@%D%D D& &! 
=>GLL8\!_q%88 9  " q)J!'2884G gz 2 !sD&D(rw)r6r7rr<rArB)r<rArBc t||}|dk(r#t||d|\}}d}t|||S|tvr t|} n,t |rt t f||d|} n t|s t|r td|tvrtnd} | turG|jtk7s|2|jtk7rd |z} tj| tt||| | \}}t||fd |i|} |j d i| t#|d k(r/||ur+t%j&t%j(|fd |i|St t$j*fd |i|} t-||| |fi|S)a%Compute the distance matrix from a feature array X and optional Y. This function takes one or two feature arrays or a distance matrix, and returns a distance matrix. - If `X` is a feature array, of shape (n_samples_X, n_features), and: - `Y` is `None` and `metric` is not 'precomputed', the pairwise distances between `X` and itself are returned. - `Y` is a feature array of shape (n_samples_Y, n_features), the pairwise distances between `X` and `Y` is returned. - If `X` is a distance matrix, of shape (n_samples_X, n_samples_X), `metric` should be 'precomputed'. `Y` is thus ignored and `X` is returned as is. If the input is a collection of non-numeric data (e.g. a list of strings or a boolean array), a custom metric must be passed. This method provides a safe way to take a distance matrix as input, while preserving compatibility with many other algorithms that take a vector array. Valid values for metric are: - From scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan', 'nan_euclidean']. All metrics support sparse matrix inputs except 'nan_euclidean'. - From :mod:`scipy.spatial.distance`: ['braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']. These metrics do not support sparse matrix inputs. .. note:: `'kulsinski'` is deprecated from SciPy 1.9 and will be removed in SciPy 1.11. .. note:: `'matching'` has been removed in SciPy 1.9 (use `'hamming'` instead). 
    Note that in the case of 'cityblock', 'cosine' and 'euclidean' (which are
    valid :mod:`scipy.spatial.distance` metrics), the scikit-learn
    implementation will be used, which is faster and has support for sparse
    matrices (except for 'cityblock'). For a verbose description of the
    metrics from scikit-learn, see
    :func:`sklearn.metrics.pairwise.distance_metrics` function.

    Read more in the :ref:`User Guide <metrics>`.

    Parameters
    ----------
    X : {array-like, sparse matrix} of shape (n_samples_X, n_samples_X) or \
            (n_samples_X, n_features)
        Array of pairwise distances between samples, or a feature array.
        The shape of the array should be (n_samples_X, n_samples_X) if
        metric == "precomputed" and (n_samples_X, n_features) otherwise.

    Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), \
            default=None
        An optional second feature array. Only allowed if
        metric != "precomputed".

    metric : str or callable, default='euclidean'
        The metric to use when calculating distance between instances in a
        feature array. If metric is a string, it must be one of the options
        allowed by :func:`scipy.spatial.distance.pdist` for its metric
        parameter, or a metric listed in
        ``pairwise.PAIRWISE_DISTANCE_FUNCTIONS``.
        If metric is "precomputed", X is assumed to be a distance matrix.
        Alternatively, if metric is a callable function, it is called on
        each pair of instances (rows) and the resulting value recorded.
        The callable should take two arrays from X as input and return a
        value indicating the distance between them.

    n_jobs : int, default=None
        The number of jobs to use for the computation. This works by
        breaking down the pairwise matrix into n_jobs even slices and
        computing them using multithreading.

        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
        for more details.

        The "euclidean" and "cosine" metrics rely heavily on BLAS which is
        already multithreaded. So, increasing `n_jobs` would likely cause
        oversubscription and quickly degrade performance.

    force_all_finite : bool or 'allow-nan', default=True
        Whether to raise an error on np.inf, np.nan, pd.NA in array. Ignored
        for a metric listed in ``pairwise.PAIRWISE_DISTANCE_FUNCTIONS``. The
        possibilities are:

        - True: Force all values of array to be finite.
        - False: accepts np.inf, np.nan, pd.NA in array.
        - 'allow-nan': accepts only np.nan and pd.NA values in array. Values
          cannot be infinite.

        .. versionadded:: 0.22
           ``force_all_finite`` accepts the string ``'allow-nan'``.

        .. versionchanged:: 0.23
           Accepts `pd.NA` and converts it into `np.nan`.

        .. deprecated:: 1.6
           `force_all_finite` was renamed to `ensure_all_finite` and will be
           removed in 1.8.

    ensure_all_finite : bool or 'allow-nan', default=True
        Whether to raise an error on np.inf, np.nan, pd.NA in array. Ignored
        for a metric listed in ``pairwise.PAIRWISE_DISTANCE_FUNCTIONS``. The
        possibilities are:

        - True: Force all values of array to be finite.
        - False: accepts np.inf, np.nan, pd.NA in array.
        - 'allow-nan': accepts only np.nan and pd.NA values in array. Values
          cannot be infinite.

        .. versionadded:: 1.6
           `force_all_finite` was renamed to `ensure_all_finite`.

    **kwds : optional keyword parameters
        Any further parameters are passed directly to the distance function.
        If using a scipy.spatial.distance metric, the parameters are still
        metric dependent. See the scipy docs for usage examples.

    Returns
    -------
    D : ndarray of shape (n_samples_X, n_samples_X) or \
            (n_samples_X, n_samples_Y)
        A distance matrix D such that D_{i, j} is the distance between the
        ith and jth vectors of the given matrix X, if Y is None.
        If Y is not None, then D_{i, j} is the distance between the ith array
        from X and the jth array from Y.

    See Also
    --------
    pairwise_distances_chunked : Performs the same calculation as this
        function, but returns a generator of chunks of the distance matrix, in
        order to limit memory usage.
sklearn.metrics.pairwise.paired_distances : Computes the distances between corresponding elements of two arrays. Notes ----- If metric is a callable, no restrictions are placed on `X` and `Y` dimensions. Examples -------- >>> from sklearn.metrics.pairwise import pairwise_distances >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> pairwise_distances(X, Y, metric='sqeuclidean') array([[1., 2.], [2., 1.]]) r?T)r?rBzM`pairwise_distances`. Precomputed distance need to have non-negative values.)whom)rrBz6scipy distance metrics do not support sparse matrices.r<z+Data was converted to boolean for metric %s)r3rBrr*r.)r!rHr)r-rrrLrrXPAIRWISE_BOOLEAN_FUNCTIONSboolr3warningswarnr rerlrr squareformpdistrrD) r6r7rr<rArBrrMrvrr3msgrrs r9rnrnsd44DFWX $ qd6G 1  1  14( . .*62 &   /    A;(1+TU U"<<- D=aggo!-AGGtO?&HC MM#4 5$ q1B 1 +1aGG$G f F #q (Q!V&&x~~a'O'O$'OP Px~~=f== aD& 9D 99r;)rrrrrr) additive_chi2chi2linear polynomialpolyrbf laplaciansigmoidrctS)aValid metrics for pairwise_kernels. This function simply returns the valid pairwise distance metrics. It exists, however, to allow for a verbose description of the mapping for each of the valid strings. The valid distance metrics, and the function they map to, are: =============== ======================================== metric Function =============== ======================================== 'additive_chi2' sklearn.pairwise.additive_chi2_kernel 'chi2' sklearn.pairwise.chi2_kernel 'linear' sklearn.pairwise.linear_kernel 'poly' sklearn.pairwise.polynomial_kernel 'polynomial' sklearn.pairwise.polynomial_kernel 'rbf' sklearn.pairwise.rbf_kernel 'laplacian' sklearn.pairwise.laplacian_kernel 'sigmoid' sklearn.pairwise.sigmoid_kernel 'cosine' sklearn.pairwise.cosine_similarity =============== ======================================== Read more in the :ref:`User Guide `. Returns ------- kernel_metrics : dict Returns valid metrics for pairwise_kernels. 
)PAIRWISE_KERNEL_FUNCTIONSr.r;r9kernel_metricsr s : %$r;r.r)rrrr) r~rrrrrrrr)r6r7r filter_paramsr<)rr<c 8ddlm}|dk(rt||d\}}|St||r |j}nP|t vr+|r|D cic]} | t |vs| || }} t |}nt|rttfd|i|}t|||fi|Scc} w)aBCompute the kernel between arrays X and optional array Y. This function takes one or two feature arrays or a kernel matrix, and returns a kernel matrix. - If `X` is a feature array, of shape (n_samples_X, n_features), and: - `Y` is `None` and `metric` is not 'precomputed', the pairwise kernels between `X` and itself are returned. - `Y` is a feature array of shape (n_samples_Y, n_features), the pairwise kernels between `X` and `Y` is returned. - If `X` is a kernel matrix, of shape (n_samples_X, n_samples_X), `metric` should be 'precomputed'. `Y` is thus ignored and `X` is returned as is. This method provides a safe way to take a kernel matrix as input, while preserving compatibility with many other algorithms that take a vector array. Valid values for metric are: ['additive_chi2', 'chi2', 'linear', 'poly', 'polynomial', 'rbf', 'laplacian', 'sigmoid', 'cosine'] Read more in the :ref:`User Guide `. Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples_X, n_samples_X) or (n_samples_X, n_features) Array of pairwise kernels between samples, or a feature array. The shape of the array should be (n_samples_X, n_samples_X) if metric == "precomputed" and (n_samples_X, n_features) otherwise. Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features), default=None A second feature array only if X has shape (n_samples_X, n_features). metric : str or callable, default="linear" The metric to use when calculating kernel between instances in a feature array. If metric is a string, it must be one of the metrics in ``pairwise.PAIRWISE_KERNEL_FUNCTIONS``. If metric is "precomputed", X is assumed to be a kernel matrix. Alternatively, if metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. 
The callable should take two rows from X as input and return the corresponding kernel value as a single number. This means that callables from :mod:`sklearn.metrics.pairwise` are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead. filter_params : bool, default=False Whether to filter invalid parameters or not. n_jobs : int, default=None The number of jobs to use for the computation. This works by breaking down the pairwise matrix into n_jobs even slices and computing them using multithreading. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary ` for more details. **kwds : optional keyword parameters Any further parameters are passed directly to the kernel function. Returns ------- K : ndarray of shape (n_samples_X, n_samples_X) or (n_samples_X, n_samples_Y) A kernel matrix K such that K_{i, j} is the kernel between the ith and jth vectors of the given matrix X, if Y is None. If Y is not None, then K_{i, j} is the kernel between the ith array from X and the jth array from Y. Notes ----- If metric is a callable, no restrictions are placed on `X` and `Y` dimensions. Examples -------- >>> from sklearn.metrics.pairwise import pairwise_kernels >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> pairwise_kernels(X, Y, metric='linear') array([[0., 0.], [1., 2.]]) r )Kernelr?T)r?r) gaussian_process.kernelsrrHr/__call__r KERNEL_PARAMSrrrLrD) r6r7rrr<rGPKernelrMrrs r9pairwise_kernelsr sL> $Qt<1 FH % , , (,K1]65J0JAtAwJKDK(0 & )A&ADA aD& 9D 99 Ls BBr>)NNF)NNNN)NT)NNr*)NNr*)NN)Nr)T)Nr)Nr)w__doc__rFrry functoolsrnumbersrrnumpyr0joblibr scipy.sparserr scipy.spatialr r exceptionsr preprocessingr utilsrrrutils._array_apirrrrrrrutils._chunkingr utils._maskrutils._missingrutils._param_validationrrrrrr utils.deprecationr! 
utils.extmathr"r# utils.fixesr$r%utils.parallelr&r'utils.validationr(r)_pairwise_distances_reductionr+_pairwise_fastr,r-r:rHrQrbr`r|rrlrr_VALID_METRICS _NAN_METRICSsetunion valid_metricsrdictrrrrrrrrrrr r1rrrrrr)r+r-r/r6rDrLr\rerrnrwrr frozensetrrr.r;r9rsW E  "#-".%==/#*<68.?2@8 ! _D HO , 2'.;'. #'  iO!%uTiO iOX3l^D !;(d;< #'  |T| |~GT6]6**''N]6**{m#N]5))zl"N O ,O ,Aq6*+ s>*001F1F1F1HI J   #( KtF FRO ,O ,Aq6*+ s>*001F1F1F1HI J   #( -.kQU~ ~B  )0UV"&3A 3AlO , 2#' 2-2-jO , 2#' . . d  )0OP"& >  )0OP"&%) %)P  )0OP"&#F #FN&+ $ $++ ^^c"234h? #' &1::|O , 2"  #' #>#>LO , 2D!T&9: T1d6 2  2::  4tI>? #' 1  1 hO , 2 T1d6 2  2::  4tI>? #' /  / dO , 2 T1d6 2  2::  #' -  - `O , 2 T1d9 5 2::   #' -  - `O , 2"  #' 7 7 t . d34"&N+ N+b^D !4D;VBJJ=OP #' ==H%$$  $, '@8 6-` *.O , 2 $'}o33NCDhOT"#D!T&A4H #(  j  j jZO , 2c.1]OCDhOT"   } % :|n- . ([M)BF4LQ #'" O: !O:O:f]6**?"33]6**;-/]5)):,. * #  ! %B wi  2 389 gY G9%'7+,  O , 2 s45G H  $T" #' g:16tg: g:r;