`L i.ddlmZddlmZddlmZddlZddlm Z m Z m Z m Z m Z ddlmZddd d Zdd ZGd d eZdZGddeZdZdZdddZddZGddeZdZy))Counter)suppress) NamedTupleN)_isin _searchsorteddevice get_namespacexpx) is_scalar_nanFreturn_inverse return_countsc`|jtk(rt|||St|||S)aHelper function to find unique values with support for python objects. Uses pure python method for object dtype, and numpy method for all other dtypes. Parameters ---------- values : ndarray Values to check for unknowns. return_inverse : bool, default=False If True, also return the indices of the unique values. return_counts : bool, default=False If True, also return the number of times each unique item appears in values. Returns ------- unique : ndarray The sorted unique values. unique_inverse : ndarray The indices to reconstruct the original array from the unique array. Only provided if `return_inverse` is True. unique_counts : ndarray The number of times each of the unique values comes up in the original array. Only provided if `return_counts` is True. r )dtypeobject_unique_python _unique_np)valuesrrs [/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/sklearn/utils/_encode.py_uniquers:>||v >   ~] ct|\}}d\}}|r|r|j|\}}}}n?|r|j|\}}n(|r|j|\}}n|j |}|j rYt |drKt||j|}|d|dz}|r||||kD<|r|j||d||<|d|dz}|f} |r| |fz } |r| |fz } t| dk(r| dS| S)zHelper function to find unique values for numpy arrays that correctly accounts for nans. See `_unique` documentation for details.)NNxpNrr) r unique_allunique_inverse unique_counts unique_valuessizer rnansumlen) rrrr_inversecountsuniquesnan_idxrets rrr=s( & !EB OGV-&(mmF&;#GV ,,V4 **62""6*|| gbk2B7-GaK( )0GGg% &  ffVGH%56F7OMgk*F *C z yX]3q6++rc,eZdZUdZeed<eed<dZy) MissingValuesz'Data class for missing data informationr"nonecg}|jr|jd|jr|jtj|S)z3Convert tuple to a list where None is always first.N)r-appendr"np)selfoutputs rto_listzMissingValues.to_listjs6 99 MM$  88 MM"&& ! rN)__name__ __module__ __qualname____doc__bool__annotations__r3rrr,r,ds1 I Jrr,c|Dchc]}| t|s|}}|s|tddfSd|vr*t|dk(rtdd}ntdd}n tdd}||z }||fScc}w)a.Extract missing values from `values`. Parameters ---------- values: set Set of values to extract missing from. Returns ------- output: set Set with missing values extracted. missing_values: MissingValues Object with missing value information. NF)r"r-rT)r r,r$)rvaluemissing_values_setoutput_missing_valuesr2s r_extract_missingr?ts""U]mE6J }U;;; !! ! "a '$1e$$G !%2d$F ! -$U C( (F ( (('s A3A3c(eZdZdZfdZdZxZS)_nandictz!Dictionary with support for nans.c|t|||jD]\}}t|s||_yyN)super__init__itemsr nan_value)r1mappingkeyr< __class__s rrEz_nandict.__init__s; !!--/ JCS!!& rc^t|drt|r |jSt|)NrG)hasattrr rGKeyErrorr1rIs r __missing__z_nandict.__missing__' 4 %-*<>> !smr)r4r5r6r7rErO __classcell__rJs@rrArAs+rrAct||\}}tt|Dcic]\}}|| c}}}|j|Dcgc]}|| c}t |Scc}}wcc}w)z,Map values based on its position in uniques.)r )r rA enumerateasarrayr )rr(rr%ivaltablevs r_map_to_integerrZsb &' *EB 9W+=>Cc1f> ?E ::0AuQx0: HH?0s A%  A+c t|}t|\}}t|}|j|j t j ||j}|f}|r|t||fz }|r|t||fz }t|dk(r|dS|S#t$r1tdtd|DD}td|wxYw)Nrc34K|]}|jywrC)r6).0ts r z!_unique_python..sL!q~~Lsc32K|]}t|ywrC)type)r^rYs rr`z!_unique_python..s2Kq472KszPEncoders require their input argument must be uniformly strings or numbers. Got rr) setr?sortedextendr3r0arrayr TypeErrorrZ _get_countsr$)rrr uniques_setmissing_valuesr(typesr*s rrrs  &k &6{&C# ^%~--/0((7&,,7 *C 022 FG,..X]3q6++  Ls2KF2K/KLL '',g /   s A$B"":CT) check_unknownct||\}}|j|jds t||S|rt ||}|rt d|t|||S#t$r}t d|d}~wwxYw)aHelper function to encode values into [0, n_uniques - 1]. Uses pure python method for object dtype, and numpy method for all other dtypes. The numpy method has the limitation that the `uniques` need to be sorted. Importantly, this is not checked but assumed to already be the case. The calling method needs to ensure this for all non-object values. Parameters ---------- values : ndarray Values to encode. uniques : ndarray The unique values in `values`. If the dtype is not object, then `uniques` needs to be sorted. check_unknown : bool, default=True If True, check for values in `values` that are not in `unique` and raise an error. This is ignored for object dtype, and treated as True in this case. This parameter is useful for _BaseEncoder._transform() to avoid calling _check_unknown() twice. Returns ------- encoded : ndarray Encoded values numericz%y contains previously unseen labels: Nr)r isdtyperrZrM ValueError_check_unknownr)rr(rlrr%ediffs r_encoderts: &' *EB ::fllI . J"673 3 !&'2D #H!OPPWf44 JDQCHI I Js A%% B.A<<BcRt||\}}d}|j|jdst|}t |\}}t|t \|z }|j xr j } |j xr j } fd} |rR|s| s| r&|j|D cgc] } | |  c} }n&|jt||j}t|}| r|jd| r|jtj n|j|} tj | |d|}|r@|j"rt%|||}n&|jt||j}|j'|j)|rL|j)|}|j'|r*|j"r|r|j)|}d||<||}t|}|r||fS|Scc} w)a Helper function to check for unknowns in values to be encoded. Uses pure python method for object dtype, and numpy method for all other dtypes. Parameters ---------- values : array Values to check for unknowns. known_values : array Known values. Must be unique. return_mask : bool, default=False If True, return a mask of the same shape as `values` indicating the valid values. Returns ------- diff : list The unique values present in `values` and not in `know_values`. valid_mask : boolean array Additionally returned if ``return_mask=True``. Nrncj|vxs-jxr|duxsjxr t|SrC)r-r"r )r<missing_in_uniquesris ris_validz _check_unknown..is_validsA$E&++= E&**C}U/C rr\T) assume_uniquerr)r rorrcr?r"r-rfonesr$r8listr/r0r r setdiff1dr!ranyisnan)r known_values return_maskrr% valid_mask values_setmissing_in_valuesrs nan_in_diff none_in_diffrxr<r diff_is_nanis_nanrwris @@rrqrqs2 &, /EBJ ::fllI .[ (8(D% %,' *:;*G' 'K''++J4F4J4J0J (--M6H6M6M2M   {lXXF&K5x&KL WWS[W@ Dz  KK   KK ((0 }}]LQST yy"6<< WWS[W@  66"((<( )((4.Kvvk"99XXf-F)*Jv&[L)DzZ KC'Ls;H$c.eZdZdZfdZdZdZxZS) _NaNCounterz$Counter with support for nan values.cBt||j|yrC)rDrE_generate_items)r1rFrJs rrEz_NaNCounter.__init__Ms --e45rc#K|D]:}t|s|t|dsd|_|xjdz c_<yw)z>Generate items without nans. Stores the nan counts separately. nan_countrrN)r rLr)r1rFitems rrz_NaNCounter._generate_itemsPsD D & 4-!" NNa N  sAAc^t|drt|r |jSt|)Nr)rLr rrMrNs rrOz_NaNCounter.__missing__ZrPr)r4r5r6r7rErrOrQrRs@rrrJs.6 rrcp|jjdvrnt|}tjt |tj }t|D]%\}}tt5||||<ddd'|St|d\}}tj||d}tj|drtj|drd|d<tj|||} tj|tj }|| ||<|S#1swYxYw)zGet the count of each of the `uniques` in `values`. The counts will use the order passed in by `uniques`. For non-object dtypes, `uniques` is assumed to be sorted and `np.nan` is at the end. OUr\NT)r)ryr)rkindrr0zerosr$int64rTrrMrisinr~ searchsorted zeros_like) rr(counterr2rVrr r'uniques_in_valuesunique_valid_indicess rrhrh`s ||D f%#g,bhh7 ) *GAt(# *#DMq  * * * &vTBM6dK xx b!"rxx '< $"??='BS:TU ]]7"(( 3F &'; rs $',5&R$,N J  #)L t  I,4/3(5VQh',r