L iNdddlmZmZddlmZmZddlmZmZm Z m Z m Z m Z m Z mZmZmZmZmZmZmZmZmZddlZddlmZmZddlmZdgZGd dZd Zy) )linalgspecial)check_random_state np_vecdot)asarray atleast_2dreshapezerosnewaxisexppisqrtravelpower atleast_1dsqueezesum transposeonescovN)gaussian_kernel_estimategaussian_kernel_estimate_log)multivariate_normal gaussian_kdeceZdZdZddZdZeZdZdZddddZ d Z dd Z d Z d Z e Zd e_ddZdZedZdZdZdZedZedZy)raRepresentation of a kernel-density estimate using Gaussian kernels. Kernel density estimation is a way to estimate the probability density function (PDF) of a random variable in a non-parametric way. `gaussian_kde` works for both uni-variate and multi-variate data. It includes automatic bandwidth determination. The estimation works best for a unimodal distribution; bimodal or multi-modal distributions tend to be oversmoothed. Parameters ---------- dataset : array_like Datapoints to estimate from. In case of univariate data this is a 1-D array, otherwise a 2-D array with shape (# of dims, # of data). bw_method : str, scalar or callable, optional The method used to calculate the bandwidth factor. This can be 'scott', 'silverman', a scalar constant or a callable. If a scalar, this will be used directly as `factor`. If a callable, it should take a `gaussian_kde` instance as only parameter and return a scalar. If None (default), 'scott' is used. See Notes for more details. weights : array_like, optional weights of datapoints. This must be the same shape as dataset. If None (default), the samples are assumed to be equally weighted Attributes ---------- dataset : ndarray The dataset with which `gaussian_kde` was initialized. d : int Number of dimensions. n : int Number of datapoints. neff : int Effective number of datapoints. .. versionadded:: 1.2.0 factor : float The bandwidth factor obtained from `covariance_factor`. covariance : ndarray The kernel covariance matrix; this is the data covariance matrix multiplied by the square of the bandwidth factor, e.g. 
``np.cov(dataset) * factor**2``. inv_cov : ndarray The inverse of `covariance`. Methods ------- evaluate __call__ integrate_gaussian integrate_box_1d integrate_box integrate_kde pdf logpdf resample set_bandwidth covariance_factor Notes ----- Bandwidth selection strongly influences the estimate obtained from the KDE (much more so than the actual shape of the kernel). Bandwidth selection can be done by a "rule of thumb", by cross-validation, by "plug-in methods" or by other means; see [3]_, [4]_ for reviews. `gaussian_kde` uses a rule of thumb, the default is Scott's Rule. Scott's Rule [1]_, implemented as `scotts_factor`, is:: n**(-1./(d+4)), with ``n`` the number of data points and ``d`` the number of dimensions. In the case of unequally weighted points, `scotts_factor` becomes:: neff**(-1./(d+4)), with ``neff`` the effective number of datapoints. Silverman's suggestion for *multivariate* data [2]_, implemented as `silverman_factor`, is:: (n * (d + 2) / 4.)**(-1. / (d + 4)). or in the case of unequally weighted points:: (neff * (d + 2) / 4.)**(-1. / (d + 4)). Note that this is not the same as "Silverman's rule of thumb" [6]_, which may be more robust in the univariate case; see documentation of the ``set_bandwidth`` method for implementing a custom bandwidth rule. Good general descriptions of kernel density estimation can be found in [1]_ and [2]_, the mathematics for this multi-dimensional implementation can be found in [1]_. With a set of weighted samples, the effective number of datapoints ``neff`` is defined by:: neff = sum(weights)^2 / sum(weights^2) as detailed in [5]_. `gaussian_kde` does not currently support data that lies in a lower-dimensional subspace of the space in which it is expressed. For such data, consider performing principal component analysis / dimensionality reduction and using `gaussian_kde` with the transformed data. References ---------- .. [1] D.W. 
Scott, "Multivariate Density Estimation: Theory, Practice, and Visualization", John Wiley & Sons, New York, Chicester, 1992. .. [2] B.W. Silverman, "Density Estimation for Statistics and Data Analysis", Vol. 26, Monographs on Statistics and Applied Probability, Chapman and Hall, London, 1986. .. [3] B.A. Turlach, "Bandwidth Selection in Kernel Density Estimation: A Review", CORE and Institut de Statistique, Vol. 19, pp. 1-33, 1993. .. [4] D.M. Bashtannyk and R.J. Hyndman, "Bandwidth selection for kernel conditional density estimation", Computational Statistics & Data Analysis, Vol. 36, pp. 279-298, 2001. .. [5] Gray P. G., 1969, Journal of the Royal Statistical Society. Series A (General), 132, 272 .. [6] Kernel density estimation. *Wikipedia.* https://en.wikipedia.org/wiki/Kernel_density_estimation Examples -------- Generate some random two-dimensional data: >>> import numpy as np >>> from scipy import stats >>> def measure(n): ... "Measurement model, return two coupled measurements." ... m1 = np.random.normal(size=n) ... m2 = np.random.normal(scale=0.5, size=n) ... return m1+m2, m1-m2 >>> m1, m2 = measure(2000) >>> xmin = m1.min() >>> xmax = m1.max() >>> ymin = m2.min() >>> ymax = m2.max() Perform a kernel density estimate on the data: >>> X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j] >>> positions = np.vstack([X.ravel(), Y.ravel()]) >>> values = np.vstack([m1, m2]) >>> kernel = stats.gaussian_kde(values) >>> Z = np.reshape(kernel(positions).T, X.shape) Plot the results: >>> import matplotlib.pyplot as plt >>> fig, ax = plt.subplots() >>> ax.imshow(np.rot90(Z), cmap=plt.cm.gist_earth_r, ... 
extent=[xmin, xmax, ymin, ymax]) >>> ax.plot(m1, m2, 'k.', markersize=2) >>> ax.set_xlim([xmin, xmax]) >>> ax.set_ylim([ymin, ymax]) >>> plt.show() Compare against manual KDE at a point: >>> point = [1, 2] >>> mean = values.T >>> cov = kernel.factor**2 * np.cov(values) >>> X = stats.multivariate_normal(cov=cov) >>> res = kernel.pdf(point) >>> ref = X.pdf(point - mean).sum() / len(mean) >>> np.allclose(res, ref) True Nc(tt||_|jjdkDs t d|jj \|_|_|t|jt|_ |xjt|jzc_ |jjdk7r t dt|j|jk7r t ddt!|j|jz |_|j |jkDr d}t | |j%|y#t&j($r}d}t'j(||d}~wwxYw)Nrz.`dataset` input should have multiple elements.z*`weights` input should be one-dimensional.z%`weights` input should be of length na1Number of dimensions is greater than number of samples. This results in a singular data covariance matrix, which cannot be treated using the algorithms implemented in `gaussian_kde`. Note that `gaussian_kde` interprets each *column* of `dataset` to be a point; consider transposing the input to `dataset`. bw_methodabThe data appears to lie in a lower-dimensional subspace of the space in which it is expressed. This has resulted in a singular data covariance matrix, which cannot be treated using the algorithms implemented in `gaussian_kde`. Consider performing principal component analysis / dimensionality reduction and using `gaussian_kde` with the transformed data.)rrdatasetsize ValueErrorshapednrastypefloat_weightsrweightsndimlenr_neff set_bandwidthr LinAlgError)selfr rr)msges V/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/scipy/stats/_kde.py__init__zgaussian_kde.__init__sB!''"23 ||  1$MN N++  &w/66u=DM MMS/ /M||  A% !MNN4==!TVV+ !HII9T]]DMMBBDJ 66DFF?-C S/ ! 1     3!! 1?C$$S)q 0 1sE!!F4F  Fctt|}|j\}}||jk7rL|dk(r*||jk(rt ||jdf}d}nd|d|j}t |t |j|\}}t||jj|jdddf|j|j|}|dddfS)aEvaluate the estimated pdf on a set of points. Parameters ---------- points : (# of dimensions, # of points)-array Alternatively, a (# of dimensions,) vector can be passed in and treated as a single point. 
Returns ------- values : (# of points,)-array The values at each point. Raises ------ ValueError : if the dimensionality of the input points is different than the dimensionality of the KDE. rpoints have dimension , dataset has dimension Nr) rrr#r$r r"_get_output_dtype covariancerr Tr)cho_cov)r/pointsr$mr0 output_dtypespecresults r2evaluatezgaussian_kde.evaluates(GFO,||1 ;Av!tvv+ $&&!5/s3004x9 o%.tG d)$/ LLNNDLLD1 HHdllL2ad|ctt|}t|}|j|jfk7rt d|j|j|j|jfk7rt d|j|ddt f}|j|z}tj|}|j|z }tj||}tjtj|d}tdt z|jddz |z}t#||ddz } t#t%| |j&d|z } | S)aW Multiply estimated density by a multivariate Gaussian and integrate over the whole space. Parameters ---------- mean : aray_like A 1-D array, specifying the mean of the Gaussian. cov : array_like A 2-D array, specifying the covariance matrix of the Gaussian. Returns ------- result : scalar The value of the integral. Raises ------ ValueError If the mean or covariance of the input Gaussian differs from the KDE's dimensionality. zmean does not have dimension z#covariance does not have dimension Nr@axis)rrrr#r$r"r r8r cho_factorr cho_solvenpproddiagonalrr rr r)) r/meanrsum_cov sum_cov_choldifftdiffsqrt_det norm_constenergiesr?s r2integrate_gaussianzgaussian_kde.integrate_gaussian s70'$-(o ::$&& "4< _- ^0LL$,,. rA)rngc||jjz ||jjz }}tj|||j||}t ||j dS)aFComputes the integral of a pdf over a rectangular interval. Parameters ---------- low_bounds : array_like A 1-D array containing the lower bounds of integration. high_bounds : array_like A 1-D array containing the upper bounds of integration. maxpts : int, optional The maximum number of points to use for integration. rng : `numpy.random.Generator`, optional Pseudorandom number generator state. When `rng` is None, a new generator is created using entropy from the operating system. Types other than `numpy.random.Generator` are passed to `numpy.random.default_rng` to instantiate a ``Generator``. Returns ------- value : scalar The result of the integral. 
) lower_limitrmaxptsr_rE)r r9rcdfr8rr))r/ low_bounds high_boundsrbr_rWrXvaluess r2 integrate_boxzgaussian_kde.integrate_boxws\./t||~~1MT$(( ctv B77rAc|j|jk7r td|j|jkr|}|}n|}|}|j|jz}t j |}d}t |jD]}|jdd|tf}|j|z } t j|| } t| | ddz } |tt| |jd|j|zz }tjtj|d} t!dt"z|j$ddz | z} || z}|S)a Computes the integral of the product of this kernel density estimate with another. Parameters ---------- other : gaussian_kde instance The other kde. Returns ------- value : scalar The result of the integral. Raises ------ ValueError If the KDEs have different dimensionality. z$KDEs are not the same dimensionalitygNrrErDrC)r$r"r%r8rrGranger r rHrr r)rIrJrKrr r#)r/othersmalllargerMrNr?irLrOrPrSrQrRs r2 integrate_kdezgaussian_kde.integrate_kdesL* 77dff CD D 77TVV EEEE""U%5%55((1 uww XA==Aw/D==4'D$$\48E u15;H iXI AFu}}UVGWW WF  X772;;|A781r67==#3c#9:XE * rAcF|t|j}t|}t|j t |j ft|j|}|j|j||j}|jdd|f}||zS)aARandomly sample a dataset from the estimated pdf. Parameters ---------- size : int, optional The number of samples to draw. If not provided, then the size is the same as the effective number of samples in the underlying dataset. seed : {None, int, `numpy.random.Generator`, `numpy.random.RandomState`}, optional If `seed` is None (or `np.random`), the `numpy.random.RandomState` singleton is used. If `seed` is an int, a new ``RandomState`` instance is used, seeded with `seed`. If `seed` is already a ``Generator`` or ``RandomState`` instance then that instance is used. Returns ------- resample : (self.d, `size`) ndarray The sampled dataset. N)r!)r!p) intneffrrrr r$r'r8choicer%r)r )r/r!seed random_statenormindicesmeanss r2resamplezgaussian_kde.resamples. <tyy>D)$/ 99 466)U #T__4:  %%dff44<<%H QZ(t|rAcNt|jd|jdzz S)zoCompute Scott's factor. Returns ------- s : float Scott's factor. rrsr$r/s r2 scotts_factorzgaussian_kde.scotts_factors!TYYTVVAX//rActt|j|jdzzdz d|jdzz S)z{Compute the Silverman factor. Returns ------- s : float The silverman factor. 
rDg@r|r}r~rs r2silverman_factorzgaussian_kde.silverman_factors3TYYs +C/dffQh@@rAzComputes the bandwidth factor `factor`. The default is `scotts_factor`. A subclass can overwrite this method to provide a different method, or set it through a call to `set_bandwidth`.cLndk(rj_nxdk(rj_natjr"t t sd_fd_n*tr_fd_n d}t|jy)aJCompute the bandwidth factor with given method. The new bandwidth calculated after a call to `set_bandwidth` is used for subsequent evaluations of the estimated density. Parameters ---------- bw_method : str, scalar or callable, optional The method used to calculate the bandwidth factor. This can be 'scott', 'silverman', a scalar constant or a callable. If a scalar, this will be used directly as `factor`. If a callable, it should take a `gaussian_kde` instance as only parameter and return a scalar. If None (default), nothing happens; the current `covariance_factor` method is kept. Notes ----- .. versionadded:: 0.11 Examples -------- >>> import numpy as np >>> import scipy.stats as stats >>> x1 = np.array([-7, -5, 1, 4, 5.]) >>> kde = stats.gaussian_kde(x1) >>> xs = np.linspace(-10, 10, num=50) >>> y1 = kde(xs) >>> kde.set_bandwidth(bw_method='silverman') >>> y2 = kde(xs) >>> kde.set_bandwidth(bw_method=kde.factor / 3.) >>> y3 = kde(xs) >>> import matplotlib.pyplot as plt >>> fig, ax = plt.subplots() >>> ax.plot(x1, np.full(x1.shape, 1 / (4. * x1.size)), 'bo', ... label='Data points (rescaled)') >>> ax.plot(xs, y1, label='Scott (default)') >>> ax.plot(xs, y2, label='Silverman') >>> ax.plot(xs, y3, label='Const (1/3 * Silverman)') >>> ax.legend() >>> plt.show() Nscott silvermanz use constantcSNrsr2z,gaussian_kde.set_bandwidth..9sYrAc&jSr) _bw_methodrsr2rz,gaussian_kde.set_bandwidth..<sT__T-BrAzC`bw_method` should be 'scott', 'silverman', a scalar or a callable.) rcovariance_factorrrIisscalar isinstancestrrcallabler"_compute_covariance)r/rr0s`` r2r-zgaussian_kde.set_bandwidthsX    ' !%)%7%7D " + %%)%:%:D " [[ #Jy#,F,DO%6D " i 'DO%BD "#CS/ !   
"rAc v|j|_t|dsWtt |j dd|j |_tj|jd|_ |j|jdzz|_ |j|jzjtj|_dtj tj"|jtj$dt&zzj)z|_y) zcComputes the covariance matrix for each Gaussian kernel using covariance_factor(). _data_cho_covrFrowvarbiasaweightsT)lowerrCN)rfactorhasattrrrr r)_data_covariancercholeskyrr8r&rIfloat64r:logdiagrr rlog_detrs r2rz gaussian_kde._compute_covarianceDs,,. t_-$.s4<<498< 0F%GD !"(1F1F7;"=D //$++q.@**T[[8@@L  *,''!B$-)8!9::=#%@ rAc|j|_tt|jdd|j |_tj|j |jdzz S)NrFrrC) rrrrr r)rrinvrs r2inv_covzgaussian_kde.inv_covVs],,. *3t||A05 ,N!Ozz$//04;;>AArAc$|j|S)z Evaluate the estimated pdf on a provided set of points. Notes ----- This is an alias for `gaussian_kde.evaluate`. See the ``evaluate`` docstring for more details. )r@)r/xs r2pdfzgaussian_kde.pdfbs}}QrAct|}|j\}}||jk7rL|dk(r*||jk(rt||jdf}d}nd|d|j}t |t |j |\}}t||jj|jdddf|j|j|}|dddfS)zT Evaluate the log of the estimated pdf on a provided set of points. rr5r6Nr) rr#r$r r"r7r8rr r9r)r:) r/rr;r$r<r0r=r>r?s r2logpdfzgaussian_kde.logpdfnsA||1 ;Av!tvv+ $&&!5/s3004x9 o%.tG d-d3 LLNNDLLD1 HHdllL2ad|rAcXtj|}tj|jtjs d}t |t |j}|j}|||dkz||dk<t tj|t |k7r d}t ||dk||k\z}tj|rd||d|d}t ||j|}|j}t||j|S)a)Return a marginal KDE distribution Parameters ---------- dimensions : int or 1-d array_like The dimensions of the multivariate distribution corresponding with the marginal variables, that is, the indices of the dimensions that are being retained. The other dimensions are marginalized out. Returns ------- marginal_kde : gaussian_kde An object representing the marginal distribution. Notes ----- .. versionadded:: 1.10.0 zaElements of `dimensions` must be integers - the indices of the marginal variables being retained.rz,All elements of `dimensions` must be unique.z Dimensions z# are invalid for a distribution in z dimensions.)rr))rIr issubdtypedtypeintegerr"r+r copyuniqueanyr)rr) r/ dimensionsdimsr0r% original_dims i_invalidr r)s r2marginalzgaussian_kde.marginals*}}Z(}}TZZ4?CS/ !   T$(^+TAX ryy 3t9 ,ACS/ !AX$!), 66)  y!9 :;,,-3ls r2r7r7s>>*f5Lxx %..H1}   Q   X    . 
;H:F rA) scipyrrscipy._lib._utilrrnumpyrrr r r r r rrrrrrrrrrI_statsrr _multivariater__all__rr7rrAr2rsN*":K.  a a HrA
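The bandwidth rules described in the Notes of the `gaussian_kde` docstring can be sketched in a few lines of plain Python. This is a minimal illustration written directly from the formulas quoted there, not the library's implementation. The standalone functions below are assumptions made for illustration: `effective_sample_size` is a hypothetical helper name, and `scotts_factor` and `silverman_factor` only mirror the names of the corresponding methods.

```python
def effective_sample_size(weights):
    """neff = sum(weights)^2 / sum(weights^2), as quoted in the Notes."""
    total = sum(weights)
    return total ** 2 / sum(w * w for w in weights)


def scotts_factor(neff, d):
    """Scott's Rule: neff**(-1 / (d + 4)) for d dimensions."""
    return neff ** (-1.0 / (d + 4))


def silverman_factor(neff, d):
    """Silverman's multivariate suggestion: (neff * (d + 2) / 4)**(-1 / (d + 4))."""
    return (neff * (d + 2) / 4.0) ** (-1.0 / (d + 4))


if __name__ == "__main__":
    # Equal weights: the effective sample size equals the number of points.
    print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))
    # Unequal weights shrink the effective sample size below the point count.
    print(effective_sample_size([3.0, 1.0]))
    # Bandwidth factors for 2-D data with 2000 equally weighted points.
    print(scotts_factor(2000, 2), silverman_factor(2000, 2))
```

With equal weights `neff` reduces to `n`, so the weighted and unweighted forms of both rules agree in that case. For `d = 2` the Silverman expression reduces to `neff**(-1/6)`, which is the same value Scott's Rule gives, so the two rules only differ in other dimensions.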