L i!ddlmZddlmZddlmZmZddlZddlm Z ddl Z ddl m Z  ddl mZdZdd lmZdd lmZdd lmZeGd deZy#e$r ddlmZd ZY2wxYw)) annotations) dataclass)AnyCallableN)ndarray) DataFrame) gaussian_kdeFT)GroupBy)Scale)StatceZdZUdZdZded<dZded<dZd ed <dZd ed <d Z d ed<dZ ded<dZ ded<dZ ddZ ddZddZ ddZ d dZ d!dZy)"KDEaB Compute a univariate kernel density estimate. Parameters ---------- bw_adjust : float Factor that multiplicatively scales the value chosen using `bw_method`. Increasing will make the curve smoother. See Notes. bw_method : string, scalar, or callable Method for determining the smoothing bandwidth to use. Passed directly to :class:`scipy.stats.gaussian_kde`; see there for options. common_norm : bool or list of variables If `True`, normalize so that the areas of all curves sums to 1. If `False`, normalize each curve independently. If a list, defines variable(s) to group by and normalize within. common_grid : bool or list of variables If `True`, all curves will share the same evaluation grid. If `False`, each evaluation grid is independent. If a list, defines variable(s) to group by and share a grid within. gridsize : int or None Number of points in the evaluation grid. If None, the density is evaluated at the original datapoints. cut : float Factor, multiplied by the kernel bandwidth, that determines how far the evaluation grid extends past the extreme datapoints. When set to 0, the curve is truncated at the data limits. cumulative : bool If True, estimate a cumulative distribution function. Requires scipy. Notes ----- The *bandwidth*, or standard deviation of the smoothing kernel, is an important parameter. Much like histogram bin width, using the wrong bandwidth can produce a distorted representation. Over-smoothing can erase true features, while under-smoothing can create false ones. The default uses a rule-of-thumb that works best for distributions that are roughly bell-shaped. It is a good idea to check the default by varying `bw_adjust`. 
Because the smoothing is performed with a Gaussian kernel, the estimated density curve can extend to values that may not make sense. For example, the curve may be drawn over negative values when the data are naturally positive. The `cut` parameter can be used to control the evaluation range, but datasets that have many observations close to a natural boundary may be better served by a different method. Similar distortions may arise when a dataset is naturally discrete or "spiky" (containing many repeated observations of the same value). KDEs will always produce a smooth curve, which could be misleading. The units on the density axis are a common source of confusion. While kernel density estimation produces a probability distribution, the height of the curve at each point gives a density, not a probability. A probability can be obtained only by integrating the density across a range. The curve is normalized so that the integral over all possible values is 1, meaning that the scale of the density axis depends on the data values. If scipy is installed, its cython-accelerated implementation will be used. Examples -------- .. 
include:: ../docstrings/objects.KDE.rst float bw_adjustscottz-str | float | Callable[[gaussian_kde], float] bw_methodTzbool | list[str] common_norm common_gridz int | NonegridsizecutFbool cumulativec@|jrtr tdyy)Nz(Cumulative KDE evaluation requires scipy)r _no_scipy RuntimeError)selfs \/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/seaborn/_stats/density.py __post_init__zKDE.__post_init__^s ??yIJ J )?ct||}t|tsKt|trt d|Ds)|j j d|}t|d|j||dy)z'Do input checks on grouping parameters.c3<K|]}t|tyw)N) isinstancestr).0vs r z1KDE._check_var_list_or_boolean..hs/Rq 1c0B/Rs.z& must be a boolean or list of strings.r) stacklevelN) getattrr%rlistall __class____name__ TypeError_check_grouping_vars)rparam grouping_varsvalue param_names r _check_var_list_or_booleanzKDE._check_var_list_or_booleancsre$ ud #5$'C/RE/R,R NN334AeW=Jzl*PQR R !!%1!Er"cd|ji}d|vr|d|d<t||fi|}|j|j|jz|S)zFit and return a KDE object.rweightweights)rr set_bandwidthfactorr)rdataorientfit_kwskdes r _fitzKDE._fitns[$/"? t !%hGI 4<373 #**t~~56 r"c|j||jS|j||}tj|j j }||j||jzz }||j||jzz}tj|||jS)z2Define the grid that the KDE will be evaluated on.) 
rto_numpyrAnpsqrt covariancesqueezeminrmaxlinspace)rr=r>r@bwgridmingridmaxs r _get_supportzKDE._get_supportzs == <((* *iif% WWS^^++- .v,""$rDHH}4v,""$rDHH}4{{7GT]];;r"c tj|ddgt}t|dkr|S |j ||}|jr9|d}t j|Dcgc]}|j||c}}n||}|dj} tj||d| d|iS#t j j$r|cYSwxYwcc}w)zITransform single group by fitting a KDE and evaluating on a support grid.r9densitycolumnsdtyper) pdrrlenrArDlinalg LinAlgErrorrarrayintegrate_box_1dsum) rr=r>supportemptyr@s_0s_irPr9s r _fit_and_evaluatezKDE._fit_and_evaluates fh %B%P t9q=L ))D&)C ??!*Chh'R3 4 4S# >RSG'lGh##%||VWh 7STTyy$$ L  SsB=&C"=CCctjg|jdt}t |dkr|S |j ||}|Dcgc]}||jdkDs|}}|s|j|||St|}|j||j||S#t jj$r|cYSwxYwcc}w)z9Transform multiple groups by fitting KDEs and evaluating.rPrQrTr) rUrrRrrVrNrDrWrXnuniquer`r apply)rr=r>r4r]r\xgroupbys r _transformzKDE._transforms %?t||%?Y%?uM t9q=L ''f5G%2KqT!W__5F5JK K))$@ @-(}}T4#9#967KKyy$$ L LsB1C-C1CCcd|vr|jd}|j|dg}|Dcgc]}||jvst|}}|r|jdur|j |||}nh|jdur|}n0|j d||jDcgc] }||vs| }}t|j||j ||}|r|jdur$|j|dj}n|jdur|} n0|j d ||jDcgc] }||vs| } }|j|j| djjd | }|d xx|jd zcc<ddd|} |d || <|jdd gdScc}wcc}wcc}w)Nr9r)r9)subsetTFr) group_weightrri)onrPzweight / group_weightyrd)rdrk)axis)assigndropnaorderr&rrfr7r rcrr[joinrerenameevaldrop) rr=rer>scalesr(r4res grid_vars norm_varsr5s r __call__z KDE.__call__s 4 ;;a;(D{{68"4{5*.DAgmm1CQD D 0 0D 8//$ >C5() // }M(,(8(8O1A*?C5() // }M(,(8(8O1A>~NC I#((#:;;$V,^E xx>2x;;CEPPs#G G' G1G G  G N)r3r&r4rreturnNone)r=rr>r&ryr )r=rr>r&ryr)r=rr>r&r\rryr)r=rr>r&r4z list[str]ryr) r=rrer r>r&rtzdict[str, Scale]ryr)r0 __module__ __qualname____doc__r__annotations__rrrrrrr!r7rArNr`rfrxr"r rrs>~Iu?FI<F$(K!($(K!(HjCNJK F  <UU'*U5<U U*LL'*L;DL L$*<*<(/*<9<*rsi"! (I *&$ @<$@< @<1IsA A%$A%