seaborn/_statistics.py

Statistical transformations for visualization.

This module is currently private, but is being written to eventually form part
of the public API.

The classes should behave roughly in the style of scikit-learn.

- All data-independent parameters should be passed to the class constructor.
- Each class should implement a default transformation that is exposed through
  __call__. These are currently written for vector arguments, but I think
  consuming a whole `plot_data` DataFrame and returning it with transformed
  variables would make more sense.
- Some classes have data-dependent preprocessing that should be cached and used
  multiple times (think defining histogram bins from all of the data and then
  counting observations within each bin once per data subset). These currently
  have unique names, but it would be good to have a common name. Not quite
  `fit`, but something similar.
- Alternatively, the transform interface could take some information about
  grouping variables and do a groupby internally.
- Some classes should define alternate transforms that might make the most sense
  with a different function. For example, KDE usually evaluates the distribution
  on a regular grid, but it would be useful for it to transform at the actual
  datapoints. Then again, this could be controlled by a parameter at the time of
  class instantiation.

Imports: numbers.Number, statistics.NormalDist, numpy as np, pandas as pd, and
scipy.stats.gaussian_kde. If SciPy is not installed, gaussian_kde is imported
from the bundled .external.kde module instead and _no_scipy is set to True.
Internal helpers: .algorithms.bootstrap and .utils._check_argument.
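The third bullet describes caching data-dependent preprocessing: define histogram bins once from all of the data, then count each subset against those bins. The sketch below illustrates that idea with plain NumPy. It is not the module's own code, and the helper name `count_subsets_with_shared_bins` is hypothetical.

```python
import numpy as np


def count_subsets_with_shared_bins(data, groups, bins=5):
    """Hypothetical helper: count each group against one shared set of bin edges."""
    data = np.asarray(data, dtype=float)
    groups = np.asarray(groups)
    # Data-dependent preprocessing, done once on all of the data (the "fit"-like step)
    edges = np.histogram_bin_edges(data, bins=bins)
    # Reuse the cached edges for every subset so the counts line up bin by bin
    counts = {}
    for group in np.unique(groups):
        counts[group], _ = np.histogram(data[groups == group], bins=edges)
    return edges, counts


edges, counts = count_subsets_with_shared_bins(range(10), ["a"] * 5 + ["b"] * 5)
```

Because both subsets share `edges`, `counts["a"]` and `counts["b"]` can be compared or stacked bin by bin. Computing edges per subset would not guarantee that.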
KDE
    Univariate and bivariate kernel density estimator.

KDE.__init__
    Initialize the estimator with its parameters.

    Parameters
    ----------
    bw_method : string, scalar, or callable, optional
        Method for determining the smoothing bandwidth to use; passed to
        :class:`scipy.stats.gaussian_kde`.
    bw_adjust : number, optional
        Factor that multiplicatively scales the value chosen using
        ``bw_method``. Increasing will make the curve smoother.
    gridsize : int, optional
        Number of points on each dimension of the evaluation grid.
    cut : number, optional
        Factor, multiplied by the smoothing bandwidth, that determines how
        far the evaluation grid extends past the extreme datapoints. When
        set to 0, truncate the curve at the data limits.
    clip : pair of numbers or None, or a pair of such pairs
        Do not evaluate the density outside of these limits.
    cumulative : bool, optional
        If True, estimate a cumulative distribution function. Requires scipy;
        without it, a RuntimeError ("Cumulative KDE evaluation requires
        scipy") is raised.

KDE.__call__
    Fit and evaluate on univariate or bivariate data.
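The `cut`, `clip`, `gridsize` and `bw_adjust` descriptions together determine where a univariate density is evaluated. The sketch below is a minimal NumPy version of that behavior, not the class's implementation. It assumes Scott's rule (SciPy's default `bw_method`) for the base bandwidth, and the function name `kde_on_grid` is hypothetical.

```python
import numpy as np


def kde_on_grid(x, bw_adjust=1.0, gridsize=200, cut=3.0, clip=(None, None)):
    """Hypothetical sketch of a univariate Gaussian KDE on a bounded grid."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Assumed base bandwidth: Scott's rule factor times the sample std, scaled by bw_adjust
    bw = x.std(ddof=1) * n ** (-1 / 5) * bw_adjust
    # The grid extends `cut` bandwidths past the data, but never past `clip`
    clip_lo = -np.inf if clip[0] is None else clip[0]
    clip_hi = np.inf if clip[1] is None else clip[1]
    lo = max(x.min() - cut * bw, clip_lo)
    hi = min(x.max() + cut * bw, clip_hi)
    grid = np.linspace(lo, hi, gridsize)
    # Average of Gaussian kernels centred on each observation
    z = (grid[:, None] - x[None, :]) / bw
    density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))
    return grid, density


grid, density = kde_on_grid([1.0, 2.0, 2.5, 4.0], cut=0)
```

With `cut=0` the grid starts and ends exactly at the data extremes, which truncates the curve as the docstring describes. `clip` then caps the grid independently of `cut`.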
Histogram
    Univariate and bivariate histogram estimator.

Histogram.__init__
    Initialize the estimator with its parameters.

    Parameters
    ----------
    stat : str
        Aggregate statistic to compute in each bin.

        - `count`: show the number of observations in each bin
        - `frequency`: show the number of observations divided by the bin width
        - `probability` or `proportion`: normalize such that bar heights sum to 1
        - `percent`: normalize such that bar heights sum to 100
        - `density`: normalize such that the total area of the histogram equals 1

    bins : str, number, vector, or a pair of such values
        Generic bin parameter that can be the name of a reference rule,
        the number of bins, or the breaks of the bins.
        Passed to :func:`numpy.histogram_bin_edges`.
    binwidth : number or pair of numbers
        Width of each bin, overrides ``bins`` but can be used with
        ``binrange``.
    binrange : pair of numbers or a pair of pairs
        Lowest and highest value for bin edges; can be used either
        with ``bins`` or ``binwidth``. Defaults to data extremes.
    discrete : bool or pair of bools
        If True, set ``binwidth`` and ``binrange`` such that bin
        edges cover integer values in the dataset.
    cumulative : bool
        If True, return the cumulative statistic.

Histogram._define_bin_edges
    Inner function that takes bin parameters as arguments.

Histogram.define_bin_params
    Given data, return numpy.histogram parameters to define bins.

Histogram._eval_bivariate
    Inner function for histogram of two variables.

Histogram._eval_univariate
    Inner function for histogram of one variable.

Histogram.__call__
    Count the occurrences in each bin, maybe normalize.
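The `stat` options and the `discrete` flag above can be illustrated for one variable with NumPy alone. This is a sketch under the stated documentation, not the class's code. The function name `histogram_stat` is hypothetical, and placing discrete edges half a unit either side of each integer is an assumption about how "bin edges cover integer values" is realized.

```python
import numpy as np


def histogram_stat(x, stat="count", bins="auto", discrete=False):
    """Hypothetical sketch of the documented ``stat`` normalizations (1-D only)."""
    x = np.asarray(x, dtype=float)
    if discrete:
        # Assumed integer-centred bins: edges at k - 0.5 for each integer k in range
        edges = np.arange(x.min() - 0.5, x.max() + 1.5)
    else:
        edges = np.histogram_bin_edges(x, bins=bins)
    counts, edges = np.histogram(x, bins=edges)
    widths = np.diff(edges)
    if stat == "count":
        heights = counts.astype(float)
    elif stat == "frequency":
        heights = counts / widths  # observations per unit of bin width
    elif stat in ("probability", "proportion"):
        heights = counts / counts.sum()  # bar heights sum to 1
    elif stat == "percent":
        heights = counts / counts.sum() * 100  # bar heights sum to 100
    elif stat == "density":
        heights = counts / (counts.sum() * widths)  # total area equals 1
    else:
        raise ValueError(f"Unknown stat: {stat!r}")
    return heights, edges


heights, edges = histogram_stat([1, 2, 2, 3, 3, 3], discrete=True)
```

Note that `density` and `frequency` divide by bin width, while `probability` and `percent` do not. The two families only agree when every bin has width 1.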
ECDF
    Univariate empirical cumulative distribution estimator.

ECDF.__init__
    Initialize the class with its parameters.

    Parameters
    ----------
    stat : {"proportion", "percent", "count"}
        Distribution statistic to compute.
    complementary : bool
        If True, use the complementary CDF (1 - CDF).

ECDF._eval_bivariate
    Inner function for ECDF of two variables. Not implemented: raises
    NotImplementedError("Bivariate ECDF is not implemented").

ECDF._eval_univariate
    Inner function for ECDF of one variable.

ECDF.__call__
    Return proportion or count of observations below each sorted datapoint.
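A weighted univariate ECDF with the three `stat` values and the `complementary` option can be sketched as follows. The name `ecdf` and the convention of prepending a point at negative infinity (so the step function starts at zero) are assumptions for illustration.

```python
import numpy as np


def ecdf(x, weights=None, stat="proportion", complementary=False):
    """Hypothetical sketch of a weighted univariate ECDF."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    y = np.cumsum(w)  # running count (or total weight) at each sorted datapoint
    if stat in ("proportion", "percent"):
        y = y / y[-1]
    if stat == "percent":
        y = y * 100
    # Prepend a point at -inf so the step function starts at zero
    x = np.r_[-np.inf, x]
    y = np.r_[0.0, y]
    if complementary:
        y = y.max() - y  # 1 - CDF (or total minus count)
    return y, x


y, x = ecdf([3, 1, 2])
```

Weights travel with their observations through the sort. For example, `ecdf([3, 1, 2], weights=[1, 1, 2], stat="count")` accumulates the weights in the order `[1, 2, 1]`.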
EstimateAggregator.__init__
    Data aggregator that produces an estimate and error bar interval.

    Parameters
    ----------
    estimator : callable or string
        Function (or method name) that maps a vector to a scalar.
    errorbar : string, (string, number) tuple, or callable
        Name of errorbar method (either "ci", "pi", "se", or "sd"), or a tuple
        with a method name and a level parameter, or a function that maps from a
        vector to a (min, max) interval, or None to hide errorbar. See the
        :doc:`errorbar tutorial ` for more information.
    boot_kws
        Additional keywords are passed to bootstrap when error_method is "ci".

EstimateAggregator.__call__
    Aggregate over `var` column of `data` with estimate and error interval.
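The non-bootstrap `errorbar` methods can be sketched directly. "sd" and "se" scale a spread measure by the `level` parameter around the estimate, and "pi" takes a central percentile interval of width `level`. This is an illustrative sketch, not the aggregator's code. The function name is hypothetical, and the sample statistics use `ddof=1` (matching pandas' `std`/`sem` defaults).

```python
import numpy as np


def errorbar_interval(vals, estimator=np.mean, method="sd", level=1):
    """Hypothetical sketch of the "sd", "se", and "pi" error bar methods."""
    vals = np.asarray(vals, dtype=float)
    estimate = estimator(vals)
    if method == "sd":
        # Standard deviation scaled by `level`, centred on the estimate
        half = vals.std(ddof=1) * level
        return estimate, (estimate - half, estimate + half)
    if method == "se":
        # Standard error of the mean scaled by `level`, centred on the estimate
        half = vals.std(ddof=1) / np.sqrt(len(vals)) * level
        return estimate, (estimate - half, estimate + half)
    if method == "pi":
        # Central percentile interval of width `level` (e.g. 95 -> 2.5th to 97.5th)
        edge = (100 - level) / 2
        lo, hi = np.nanpercentile(vals, [edge, 100 - edge])
        return estimate, (lo, hi)
    raise ValueError(f"Unsupported method: {method!r}")


estimate, (lo, hi) = errorbar_interval([1, 2, 3, 4, 5], method="pi", level=50)
```

The parametric intervals ("sd", "se") are symmetric around the estimate. The "pi" interval describes the data's spread and need not contain the estimate symmetrically.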
WeightedAggregator.__init__
    Data aggregator that produces a weighted estimate and error bar interval.

    Parameters
    ----------
    estimator : string
        Function (or method name) that maps a vector to a scalar. Currently
        supports only "mean"; any other value raises ValueError ("Weighted
        estimator must be 'mean', not ...").
    errorbar : string or (string, number) tuple
        Name of errorbar method or a tuple with a method name and a level
        parameter. Currently the only supported method is "ci"; requesting
        another method raises ValueError ("Error bar method must be 'ci',
        not ...").
    boot_kws
        Additional keywords are passed to bootstrap when error_method is "ci".

WeightedAggregator.__call__
    Aggregate over `var` column of `data` with estimate and error interval.
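A weighted mean with a percentile-bootstrap confidence interval, the only combination the weighted aggregator supports, can be sketched as below. This is a NumPy stand-in, not seaborn's `bootstrap` helper. The function name, `n_boot` and `seed` are hypothetical. The sketch assumes positive weights, since a resample whose weights are all zero would make `np.average` fail.

```python
import numpy as np


def weighted_mean_ci(vals, weights, level=95, n_boot=1000, seed=0):
    """Hypothetical sketch: weighted mean with a percentile bootstrap CI."""
    vals = np.asarray(vals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    estimate = np.average(vals, weights=weights)
    rng = np.random.default_rng(seed)
    n = len(vals)
    # Resample (value, weight) pairs together so each draw keeps its own weight
    idx = rng.integers(0, n, size=(n_boot, n))
    boots = np.array([np.average(vals[i], weights=weights[i]) for i in idx])
    # Central percentile interval of the bootstrap distribution
    edge = (100 - level) / 2
    lo, hi = np.nanpercentile(boots, [edge, 100 - edge])
    return estimate, lo, hi


estimate, lo, hi = weighted_mean_ci([1, 2, 3, 4], [1, 1, 1, 5])
```

Resampling indices rather than values keeps each observation paired with its weight. Resampling the two arrays independently would break that pairing.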
LetterValues.__init__
    Compute percentiles of a distribution using various tail stopping rules.

    Parameters
    ----------
    k_depth : "tukey", "proportion", "trustworthy", "full", or int
        Stopping rule for choosing the tail percentiles to show:

        - tukey: Show a similar number of outliers as in a conventional boxplot.
        - proportion: Show approximately `outlier_prop` outliers.
        - trustworthy: Use `trust_alpha` level for most extreme tail percentile.

        An integer sets the depth directly. Any other type raises TypeError
        ("The `k_depth` parameter must be either an integer or string (one of
        ['tukey', 'proportion', 'trustworthy', 'full']), not ...").
    outlier_prop : float
        Parameter for `k_depth="proportion"` setting the expected outlier rate.
    trust_alpha : float
        Parameter for `k_depth="trustworthy"` setting the confidence threshold.

    Notes
    -----
    Based on the proposal in this paper:
    https://vita.had.co.nz/papers/letter-value-plot.pdf

LetterValues.__call__
    Evaluate the letter values.

_percentile_interval
    Return a percentile interval from data of a given width.

_validate_errorbar_arg
    Check type and value of errorbar argument and assign default level.
    Arguments of the wrong form raise an error with the message "`errorbar`
    must be a callable, string, or (string, number) tuple".
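The `k_depth` rules choose how many letter values (boxes) to draw from the sample size `n`. Each extra level halves the tail fraction: fourths, then eighths, sixteenths, and so on. The sketch below encodes one reading of those rules. The exact formulas, and the defaults `outlier_prop=0.007` and `trust_alpha=0.05`, are assumptions for illustration, as are the function names.

```python
from statistics import NormalDist

import numpy as np


def letter_value_depth(n, rule="tukey", outlier_prop=0.007, trust_alpha=0.05):
    """Hypothetical sketch of k_depth stopping rules (number of letter values)."""
    log2n = int(np.log2(n))
    if rule == "full":
        k = log2n + 1
    elif rule == "tukey":
        k = log2n - 3  # leaves roughly 4-8 observations beyond the outermost box per tail
    elif rule == "proportion":
        k = log2n - int(np.log2(n * outlier_prop)) + 1
    elif rule == "trustworthy":
        z = NormalDist().inv_cdf(1 - trust_alpha / 2)
        k = int(np.log2(n / (2 * z ** 2))) + 1
    else:
        k = int(rule)  # an integer depth is used as given
    return max(k, 1)


def letter_values(x, k):
    """Percentiles for k letter-value boxes, deepest lower tail first."""
    lower = 100 * 0.5 ** np.arange(k + 1, 1, -1)  # e.g. ..., 12.5, 25
    upper = 100 - lower[::-1]  # e.g. 75, 87.5, ...
    percs = np.concatenate([lower, upper])
    return percs, np.percentile(x, percs)


percs, values = letter_values(np.arange(9.0), k=2)
```

With 1000 observations, the "tukey" rule gives 6 levels in this sketch, while "full" keeps splitting until the boxes reach the data extremes.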