`L i66(ddlZddlZddlZddlZddlmZddlmZddl m Z m Z ddl m Z ddlmZmZmZddlmZdZej*j-d gd d Zej*j-d gd d Zej*j-d ej2ej4ej6gej*j-dej2ej4ej6gdZej*j-d ej2ej4ej6gdZdZdZdZ dZ!dZ"ej*j-dgdgdgejFgdgdgejFgdgdge$ejFgddejJd gge$ejFgdde&d!d gge$ejFgd"gd#ge$ejFgd$dejJdgge$ejFgd$de&d!dgge$ggd%&d'Z'ej*j-d gd ej*j-d(d)d*gej*j-d+dd,gd-Z(ej*j-d(d)d*gej*j-d.d/d0gd1d0gd/d0gggd2gd3gd2gfd4dgd5dgd6d7gd5dgggd8gd9gd:gfgd;Z)d<Z*ej*j-d+gd=ej*j-d>gd=d?Z+ej*j-d@dAdBgej*j-dd1d/gejFdCdDggdEZ,ej*j-d@dAdBgdFZ-ej*j-dGdHd0gdId0ggdHdIgd0ggej\fejFd1d/gdJd/ggd1dJgd/ggej^fejFdKd gdLd gge$dKdLgd ggej\fejFdKd gdLd ggdKdLgd ggej`fejFd1d/gejJd/ggd1ejJgd/ggej6fejFdKejJgdejJgge$dKdgejJggej\fejFdKe&d!gde&d!gge$dKdge&d!ggej\fggdM&dNZ1ej*j-d gd ej*j-dOejFdd7gge$jdejFddPgge$jdgdQgej\fejFd1d/ggdRjdejFd1dSggdRjdgdTgejffejFdd7gge$jdejFddPgge$jdejFgdQgej\fejFddgge$jdejFdd7gge$jdgdUge$fejFdd7gge$jdejFdejJgge$jdgdVge$fejFddgge$jdejFdejJgge$jdgdWge$fggdX&dYZ4dZZ5ej*j-d[e e gd\Z6d]Z7d^Z8ej*j-d_d,d`dagfdbgdcfgdddedfgfggdg&dhZ9diZ:ej*j-dgdgdgejFgdjgdkgejFgdgdge$ggdl&dmZ;ej*j-dOejFdd7gge$jdejFddPgge$jdgdQgej\fejFd1d/ggdRjdejFd1dSggdRjdgdTgejffejFdd7gge$jdejFddPgge$jdejFgdQgej\fggdn&doZej*j-dre&e?gdsZ@dtZAduZBdvZCdwZDdxZEdyZFej*j-d+dbd,gdzZGej*j-d{ejJde&d!gd|ZHej*j-d+dHdJggd}gd~ZIej*j-dd*d)gddg&ej*j-d+d,gdgd,dg&dZJej*j-d[e e gdZKej*j-ddd/iddiddid/dddSddgej*j-ddgdggdZLej*j-d+dbd,d7ggdZMej*j-d+dgdPggdZNej*j-dddJiddiddiddiddidJdddSddgdZOej*j-d+d,d7ggdZPej*j-d+dgdPggdZQdZRej*j-ddJd1dddSigdZSdZTdZUdZVdZWdZXej*j-ddd1dgdZYej*j-dd/dJdgdZZej*j-dgdej*j-dgddZ[dZ\ej*j-d{ejJdgdZ]dZ^ej*j-d gd ej*j-dddgdZ_ej*j-d gd dZ`ej*j-d gd dZaej*j-d gd dZbdZcdZdej*j-dejJdgdZeej*j-dddgej*j-dejJdgdZfej*j-dOejFdejJgge$jdejFdd7gge$jdejFddPejJge$gej\fejFdejJgge$jdejFdd7gge$jdejFddPejJge$gej\fejFdejJggej6jdejFdCggej6jdejFddDejJggej6fggd&dZgej*j-d[e e gdZhej*j-dejFdejJdCggjdejFdejJdggjdejFdDggfejFgd¢gjdejFgdâgjdejFejJggfejFdejJd7gge$jdejFdejJdggjdejFdPgge$fejFgdŢge$jdejFgdƢgjdejFejJgge$fgdDŽZiej*j-dedɄZjdʄZkej*j-dddLggejFddLggdͬejFddLggdάgej*j-ddKdLggejFdKdLggdͬejFdKdLggdάgdЄZldфZmd҄ZndӄZoej*j-dd*d)gdՄZpej*j-dejFdgdgge$dgejJgejJggejdgdgdgge$fejFejJgdgdgge$dgejJgejJggejdgejJgejJgge$fgd؄ZrdلZsdڄZtdۄZud܄Zvd݄Zwej*j-dddJiddiddiddiddidJdddSddgdބZxd߄ZydZzdZ{dZ|ej*j-dddidd/igdZ}ej*j-ddd1iddigdZ~dZdZej*j-d[e e gdZy)N)sparse)NotFittedError) OneHotEncoderOrdinalEncoder) is_scalar_nan)_convert_containerassert_allcloseassert_array_equal)CSR_CONTAINERSctjgdgdg}t}td}|j|}|j|}|jdk(sJ|jdk(sJt j |sJt j |rJt|jgdgdgt|j|y)NrrrF sparse_outputr)?rrr)rrrrr) nparrayr fit_transformshaperissparser toarray)X enc_sparse enc_denseX_trans_sparse X_trans_denses o/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/sklearn/preprocessing/tests/test_encoders.py!test_one_hot_encoder_sparse_denser$s )Y'(AJE2I--a0N++A.M   6 )) )   & (( ( ??> ** *}-- - #<>W"X~--/?handle_unknown)ignoreinfrequent_if_existwarnctjgdgdgdg}tjgdg}td}|j|t j t d5|j|dddt|}|j||j}t|j|jtjgd gt||y#1swYxYw) N)rrr)rrr)rrr)rrerrorr&Found unknown categoriesmatch)rrrrrrr) rrrfitpytestraises ValueError transformcopyr rr r&rX2oh X2_passeds r##test_one_hot_encoder_handle_unknownr;*s )Y 23A 9+ B g .BFF1I z)C D R n 5BFF1I I Y'') 567 B "s /DD ctjgdjd}tjddgjd}t|}|j ||j }t |j|jtjgdgdgt ||y)N)11111111223334444)r55555r>r-)rrrrrrrr) rrreshaperr1r6r r5rr7s r#+test_one_hot_encoder_handle_unknown_stringsrEBs 23;;GDA 7D/ " * *7 3B n 5BFF1I I Y'') &(<=> r9%r% output_dtype input_dtypectjddgg|j}tjddgddgg|}td|}t |j |j |t |j|j|j |td|d}t |j ||t |j|j||y)Nrrdtypeauto) categoriesrJF)rLrJr) rasarrayTrr rrr1r5)rGrFr X_expectedr9s r#test_one_hot_encoder_dtyperPUs QF8;/11AaVaV,LAJ & =Br''*224jArvvay**1-557D & E RBr''*J7rvvay**1-z:r%ctjd}|jddgddgd}tjgdgdg| }t | }t |j|j|t |j|j|j|t |d }t |j||t |j|j||y) NpandasabrrABrrrrrrrrrIF)rJr) r2 importorskip DataFramerrrr rrr1r5)rFpdX_dfrOr9s r#!test_one_hot_encoder_dtype_pandasr^ds   X &B <.name_combiners}tH~--r%)feature_name_combinerNoneNrIz x0_'None'x0_NonerSrza_'None'a_Nonecy)Nrrs r#wrong_combinerzItest_one_hot_encoder_custom_feature_name_combiner..wrong_combinersr%zMWhen `feature_name_combiner` is a callable, it should return a Python string.r/) rrrrrNr1rr r2r3 TypeError)rrrrrerr_msgs r#1test_one_hot_encoder_custom_feature_name_combinerrs. m r%rdefr7abcrr)rdrr)rrr)rTrVcat)rSrWrrI)rTrrrSrnan)Nrr)rSrr)NrN)mixednumericrz mixed-nanzmixed-float-nanz mixed-Nonezmixed-None-nanzmixed-None-float-nan)idsc\ttj|dddgf}t|ddgddggttj|ddddgf}t|gdgdgt dj |}t|j gdgdgy) Nrr)rrrrrrrrrKr)rrrrr)rrrrr)rrrr rrr)rXtrs r#test_one_hot_encoderrs0 #288A;q1#v#6 7CC1a&1a&)* "288A;q1a&y#9 :CC, 56 6 * 8 8 ;CCKKMO_#EFr%sparse_FTdropfirstc gdgdgdg}t||}|j|}tj|t}t |j ||ddgddgd dgg}t|d | }|j|}tj|}t |j |||gdgdgdg}t||d d gddggdg}|j|}tj|t}d|d<t |j ||ddgddgd dgg}t|ddgddgg|}|j|}tj|t}d|d<d|dddf<t |j ||tjgdgdg}tjd}tjt|5|j |dddy#1swYyxYw)Nrr)rrrrrrIrrrrrK)rrLrrr)6r8)rr&rL)rrrr)rrLr&rrrrrr)Shape of the passed X data is not correctr/) rrrrrr inverse_transformreescaper2r3r4)r&rrrrX_trexpmsgs r#test_one_hot_encoder_inversers 8A gD 9C   Q D ((1F #Cs,,T2C8 R1b'Ar7#A g&t LC   Q D ((1+Cs,,T2C8 |^^ <!)A =    #hhq'D 3006<Wq"g2w '!AR))    #hhq'D AqD 3006< 88Y * +D ))? @C z -$ d#$$$s )HH z X, X_transrrrrrrrrzr{r|rTrrrrr)rrrrr)rrrrrct|j|}d}|r t|d}tjt |5|j |dddy#1swYyxYw)zCheck that `inverse_transform` raise an error with unknown samples, no dropped feature, and `handle_unknow="error`. Non-regression test for: https://github.com/scikit-learn/scikit-learn/issues/14934 rzqSamples \[(\d )*\d\] can not be inverted when drop=None and handle_unknown='error' because they contain all zerosrr/N)rr1rr2r3r4r)rX_transrrrs r#?test_one_hot_encoder_inverse_transform_raise_error_with_unknownrAsf& g . 2 21 5C A $Wh7 z -' g&''' A""A+ctjddgddgddggt}tdd }|j |}t |j ||y) Nr`rrbrrrI if_binaryFrr)rrrrrr r)rohers r#&test_one_hot_encoder_inverse_if_binaryrasV 61+!}xmC   Q Ds,,T2A6r%)rrN reset_dropctjddgddgddggt}t|d}|j ||j |}|j }|j| t|j||t|j ||t|j |y) Nr`rrbrrrIFrr) rrrrr1r5rrr rr )rrrrrrs r#test_one_hot_encoder_drop_resetrhs 61+!}xms((+335s; ??1  $ $ &/ 99 9 ==+112:: >> > 1a&A I; /C 1C z - !s #E>>FEncoderc6tjdtjdgg}||}tjddggtj}t j td5|j|dddy#1swYyxYw)zTest encoder for specified categories that nan is at the end. Non-regression test for: https://github.com/scikit-learn/scikit-learn/issues/27088 rrrrIzNan should be the last elementr/N) rrrrrNr2r3r4r1rr rrs r#,test_encoder_nan_ending_specified_categoriesrst HHa^ $ %D T "C 1a&(**A z)I J  s 4BBctjddgddggtj}t gdgdg}tjgd gd g}t |j |j||jdjgdk(sJtj|jdjtjsJ|jd jgdk(sJtj|jd jtjsJy) NrSrTrrrIr)rrrr)rrrrrr)rrrrrrr) rrrrNrr rrrrrrJr rrrs r#7test_one_hot_encoder_specified_categories_mixed_columnsr$s 3*q!f%V466A OY#? @C ((24RS TCs((+335s; ??1  $ $ &/ 99 9 ==+112:: >> > ??1  $ $ &) 33 3 ==+112:: >> >r%ctjd}|jddgddgd}t|}t |gdgdgy) NrRrSrTrrrUrXrY)r2rZr[rr )r\r]rs r#test_one_hot_encoder_pandasr1sF   X &B <C   q !Fs}}&78FH% ###7Axx#sc3Z#s<=H!T+ [ >C   q !Fs}}&78FH%r%)rdrr)r#rr)rrrct}tjgdgdgd}t|j ||j dtd}t|j ||y)NrrrrrrrrIfloat64)rrrr rastypers r#test_ordinal_encoderr1ds^  C ((Iy) 9Cs((+SZZ -BC w 'Cs((+S1r%)rrzobject-string-catct|}tjdgdgg}t|j ||t |j dt |dk(sJ|jdjt |dk(sJ|jdj|k(sJt|}tjtd5|j|dddy#1swYyxYw)Nrrrrr.r/)rrrr rrrLrrrJr2r3r4r1)rr8r rrrs r#)test_ordinal_encoder_specified_categoriesr3us2 D )C ((SEC5> "Cs((+S1 q! "d47m 33 3 ??1  $ $ &$tAw- 77 7 ??1  # #y 00 0 D )C z)C D  s C88Dcgdgdg}t}|j|}tj|t}t |j ||tjgdgdg}tjd}tjt|5|j |dddy#1swYyxYw)NrrrI)rrrrrXrr/) rrrrrr rrrr2r3r4)rrrrrs r#test_ordinal_encoder_inverser5s (A  C   Q D ((1F #Cs,,T2C8 88\<0 1D ))? @C z -$ d#$$$s %CC ctdd}tjddgddgdd ggt }tjdd gd dgddggt }|j ||j |}tjd dgddgddggd }t |||j|}tjddgddgddggt }t ||y)Nuse_encoded_valuer& unknown_valuerSxrTyrrrIxyblarrrr)rrrrr1r5r r)rX_fitr X_trans_encr X_trans_invinv_exps r#+test_ordinal_encoder_handle_unknowns_stringrCs (;2 NC HHsCj3*sCj9 HEhhd eS\C:>fMGGGEN--(K ((QGb!Wq!f-W =C{C('' 4Khhd dC[3*=VLG{G,r%rJctdd}tjddgddgdd gg| }tjdd gd dgddgg| }|j||j |}tjddgddgd d ggd }t |||j |}tjddgddgddggt }t ||y)Nr7r9rrr rIrgrr)rrrr1r5r rr)rJrr?rr@rrArBs r#,test_ordinal_encoder_handle_unknowns_numericrJs (;4 PC HHq!fq!fq!f-U ;EhhB"a1a&1?GGGEN--(K ((QIay1a&1 AC{C('' 4KhhD D!9q!f5VDG{G,r%ctdtj}tjdgdgdgg}|j ||j dgdgdgg}t |dgdgtjggy)Nr7r9rrrr+r)rrrrr1r5r )rr?rs r#(test_ordinal_encoder_handle_unknowns_nanrLso (;266 RC HHqcA3_ %EGGENmmaS1#sO,Gw!qcBFF8 45r%ctdtjt}tjdgdgdgg}t j td5|j|dddy#1swYyxYw)Nr7)r&r:rJrrrz'dtype parameter should be a float dtyper/) rrrintrr2r3r4r1)rr?s r#8test_ordinal_encoder_handle_unknowns_nan_non_float_dtyperOsd *"&& C HHqcA3_ %E z)R S s A22A;ctjgdgtj}gd}t |}d}t j t|5|j|dddy#1swYyxYw)N)LowMediumHighrRrQrI)rQrRrSrz*Shape mismatch: if categories is an array,r/) rrrrNrr2r3r4r1)rr rrs r#+test_ordinal_encoder_raise_categories_shaperTs^ <=VLNNA $D D )C 6C z -  s A11A:c td}tjgdgdgd}tjddgd d ggd tjddgd d ggdtjd d gddggtjddgddggtjdd gd d ggdfD]}|j|t t dDcgc](}|j |j|jk(*c}sJt|j|j|ddgd d gg}|j|t t dDcgc]=}tj|j |jtj?c}sJt|j|j|dd gd d gg}|j|t t dDcgc]}|j |jdk( c}sJt|j|j|ycc}wcc}wcc}w)NrKr)rrrr)rrrrr/rIrrrr+rrSrTrrabcdr) rrrr1allrangerrJr r5rrinteger)rrris r#test_encoder_dtypesr^s  6 *C (((*>?y QC 1a&1a&!1 1a&1a&!3 3*sCj)* 4,t -. 1c(QH%X6  <  qJACOOA&,,7JKKK3==+335s; < Q!QAGGAJ USTXV cooa066 CV WW Ws}}Q'//137 SAs8AGGAJ eAhG"((H4G HH Hs}}Q'//137K W Hs-I &AI #I%ctjd}td}tjgdgdgd}|j dd gd d gd d gdd}|j |ttd Dcgc]}|j|jdk( c}sJt|j|j||j dd gddgddgd}gd}|j |ttd Dcgc]!}|j|j||k(#c}sJt|j|j|ycc}wcc}w)NrRrKr)rrrrrr)rrrrrrr/rIrrrr+rrrVrWCrrSrTrr)rrr/)r2rZrrrr[r1rZr[rrJr r5r)r\rrrr]expected_cat_types r#test_encoder_dtypes_pandasrcsG   X &B 6 *C (( ')GH C Aq6AaVddgddgddgg}t|ddddgddgg}|j|d dgg}tjddgg}d }t j t | 5|j|}d d d t|y #1swYxYw) z,Check handle_unknown='warn' works correctly.rSrrTrrFr)rrr&rLrqFound unknown categories in columns \[0\] during transform. These unknown categories will be encoded as all zerosr/N rr1rrr2warns UserWarningr5r )rrrX_testrOwarn_msgrs r#test_ohe_handle_unknown_warnrq%s qC8c1X&A  #JA'  C GGAJAhZFAq6(#J A  k 2(--'(GZ((( ,BB missing_valuecdddd|g}t|}gdgddddd|gg}|j|j}gdgd gd g}t|||j|usJt |j |jDcgc] \}}|| }}}|j|} tj|t } t|d rt|dd |dd t|d sJt|d sJt| dddd f| dddd ft| d dd f| d dd ft| d sJt| d sJyt||t| | ycc}}w)Nrrgrrr)rrgrrrS)rrgrrrS)rrrrr)rrrrrrrIrA)rArA) rrrr rrrr'rrrrr) rs cats_to_droprrtransrrr dropped_cats X_inv_transX_arrays r# test_one_hot_encoder_drop_manualrz?s2q"m4L \ *C Ar=) A   a ( ( *E O_ =Cuc" 88| ## #*-S__cmm)L%gG L''.Khhq'G\"%&<,l3B.?@\"-...\"-...71crc6?K3B3,?@ 72ss7+[SbS-ABWV_---[0111<67K0)s E;)rrrcrSct|}d}tjt|5|j gdgdgdgdddy#1swYyxYw)Nrz-`drop` should have length equal to the numberr/rr)rr;)rr2r3r4r1)rrrs r#test_invalid_drop_lengthr}dsK T "C=G z 1B @ABBBs AAdensityrdenserSrrTrct|}t||}gdgdg}|j||j|t|j|j|dk(rt|jdn=t ||j|jD]\}}}|t ||k(rJt|jtjsJ|jjtk(sJy)Nrr)rrrSrrr) rr1r rr'rrNrrndarrayrJr)r~rohe_baseohe_testrdrop_catdrop_idxcat_lists r#test_categoriesrls73H7>H  &A LLO LLOx++X-A-AB w8--q1,/ ($$h&:&:-  7 (HhCM*h6 66 7 h(("** 55 5    # #v -- -r%cZ|jjjsJy)N)__sklearn_tags__ input_tags categorical)rs r#"test_encoders_has_categorical_tagsrs" 9 % % ' 2 2 > >> >r%kwargsmax_categories min_frequency g(\?r)rrrgrLrKrSrTrrc.tjdgdzdgdzzdgdzzdgdzzgj}td|d d d |j |}t |j gd gdgdgdgdgd gg}tjddgddgddgddgddgg}|j|}t||dgdgdzzDcgc]}|g}}|j|} t || |j} t ddg| ycc}w)zpTest that different parameters for combine 'a', 'c', and 'd' into the infrequent category works as expected.rSrrTr#rrdrrr(F)rLr&rrSrrerrinfrequent_sklearnr+rx0_infrequent_sklearnNr rrrNrr1r infrequent_categories_r5r rr) rrLX_trainrror(rcol expected_invX_invrs r#test_ohe_infrequent_two_levelsrsIhh SEBJ.#;seaiGHIKKG  ,      c'l  s11O3DEecUSEC53% 0Fxx!Q!Q!Q!Q!Q@AHmmF#GHg&&)U.B-Ca-G%GHcSEHLH  ! !' *E|U+--/M 78-H Is Dctjdgdzdgdzzdgdzzdgdzzgj}td d d | j |}|j d |j d dk(sJtjdgdgg}|j|}td gdgg||j}tdg||j|}tdgdgg|y)z3Test two levels and dropping the frequent category.rSrrTr#rrdrrr(Frr&rrrrrrrN) rrrNrr1rr'r5r rr r)rrrrorr X_inverses r#,test_ohe_infrequent_two_levels_drop_frequentrshh SEBJ.#;seaiGHIKKG ,     c'l  ??1 cmmA. /3 66 6 XXusen %FmmF#GaS1#J(--/M/0-@%%g.I 456 Br%c(tjdgdzdgdzzdgdzzdgdzzgj}td d d | }d |dd}t j t |5|j|dddy#1swYyxYw)z_Test two levels and dropping any infrequent category removes the whole infrequent category.rSrrTr#rrdrrr(FrrUnable to drop category r( from feature 0 because it is infrequentr/NrrrNrr2r3r4r1rrrrs r#5test_ohe_infrequent_two_levels_drop_infrequent_errorsrs hh SEBJ.#;seaiGHIKKG ,   C %T!WK/W XC z -  -BBrHgQ?g{Gz?rGc tjdgdzdgdzzdgdzzdgdzzgj}tdd d d |j |}t |j ddggdgdgdgdgd gg}tjgd gdgdgdgdg}|j|}t||dgdgdgdgdgg}|j|}t |||j}t gd|y)zkTest that different parameters for combing 'a', and 'd' into the infrequent category works as expected.rSrrTr#rrdrrr(Fr&rrr.rrrr-r)rrrNrr) rrrror(rrrrs r# test_ohe_infrequent_three_levelsrs' hh SEBJ.#;seaiGHIKKG  ,E EK  c'ls11S#J<@ecUSEC53% 0FxxIy)YOPHmmF#GHg&      L  ! !' *E|U+--/M@-Pr%c$tjdgdzdgdzzdgdzzdgdzzgj}td d d| j |}tjdgdgdgg}t d d gd d gd d gg|j ||jdj |d}tjt|5|j dgdgg}dddt d d gd d ggy#1swYxYw)z5Test three levels and dropping the frequent category.rSrrTr#rrdrrr(Frrrr'r-r.r/rN) rrrNrr1r r5rr2rmrn)rrrrorrs r#.test_ohe_infrequent_three_levels_drop_frequentrshh SEBJ.#;seaiGHIKKG ,     c'l XXusecU+ ,FaVaVaV,cmmF.CDNN(N+//8 $C k -0--#/0aVaV$g.00s DDc(tjdgdzdgdzzdgdzzdgdzzgj}td d d| }d |d d}t j t |5|j|dddy#1swYyxYw)z7Test three levels and dropping the infrequent category.rSrrTr#rrdrrr(Frrrrr/Nrrs r#7test_ohe_infrequent_three_levels_drop_infrequent_errorsrshh SEBJ.#;seaiGHIKKG ,   C %T!WK/W XC z - rctjdgdzdgdzzdgdzzdgdzzgj}td d d j |}t |j ddggdgdgdgdgg}tjgd gd gdgd g}|j|}t||dgg}d}tjt|5|j|dddy#1swYyxYw)zmTest that different parameters for combining 'a', and 'd' into the infrequent category works as expected.rSrrTr#rrdrrr,F)r&rrr.rr-badz.Found unknown categories \['bad'\] in column 0r/N) rrrNrr1r rr5r r2r3r4)rrror(rrs r#(test_ohe_infrequent_handle_unknown_errorr'shh SEBJ.#;seaiGHIKKG eA  c'ls11S#J<@ecUSEC5 )FxxIy)DEHmmF#GHg&gYF ;C z - fs C44C=ctjdgdzdgdzzgtj}t dgdgddd |j |}dgd gd gd gdgg}tjd dgdd gdd gdd gdd gg}|j |}t||dddgg}dgd gg}|D]B}|j|j |tdgd gg|j |Dy)zG'a' is the only frequent category, all other categories are infrequent.rSrrrjrIrrrSrTFr(rLrr&rTrrrrrrrNr) rrrrNrr1r5r r)rrrror(rdropsrs r#5test_ohe_infrequent_two_levels_user_cats_one_frequentr?s" hh SEBJ./v>@@G  (),      c'l ecUSEC53% 0Fxx!Q!Q!Q!Q!Q@AHmmF#GHg&kC5 )EecU^F; D!%%g.!qc CMM&$9:;r%ctjdgdzdgdzzdgdzzdgdzzgt j}t gd gd d d j |}t |jgdgdgdgdgdgdgg}tjddgddgddgddgddgg}|j|}t||dgdgdzzDcgc]}|g}}|j|}t ||ycc}w)zFTest that the order of the categories provided by a user is respected.rSrrTr#rrdrrrIrFr(rrLrr&r)rrrSrrrrr+N rrrrNrr1r rr5r r)rrror(rrrrs r#(test_ohe_infrequent_two_levels_user_catsr[s+hh cURZ 3%"* ,uqy 89a  (),    c'l s11O3DEecUSEC53% 0Fxx!Q!Q!Q!Q!Q@AHmmF#GHg&'*U.B-Ca-G%GHcSEHLH  ! !' *E|U+Is C=ctjdgdzdgdzzdgdzzdgdzzgt j}t gd gd d d j |}t |jddggdgdgdgdgdgg}tjgdgdgdgdgdg}|j|}t||dgdgdgdgdgg}|j|}t ||y)zTest that the order of the categories provided by a user is respected. In this case 'c' is encoded as the first category and 'b' is encoded as the second one.rSrrTr#rrdrrrIrrrTrSFr(rrr-rr.rNr)rrror(rrrs r#*test_ohe_infrequent_three_levels_user_catsrvs hh cURZ 3%"* ,uqy 89a  (),    c'l s11S#J<@ecUSEC53% 0FxxIy)YOPHmmF#GHg&      L  ! !' *E|U+r%ctjgdgdf}tddd}|j|ddgddgg}|j |}t |gd gd gy ) zaTest infrequent categories where feature 0 has infrequent categories, and feature 1 does not. rrrrrrrrr rrrrrrrrrrrF)rrrrrrrrr)rrrrN)rc_rr1r5r )rrrors r#test_ohe_infrequent_mixedrsc )+FFGA q{% PCGGAJ!fq!f FmmF#GGlL9:r%c btjgdgdgdf}tddd}|j|j }t |j dd d gt |j d d d gt |j d d |j}t gd |gdgdgdgdgdgdgdgdgdg }t||gdgdg}|j|}gdgdg}t||j |j|}tjgdgdgt}t ||tdddj|}tjt d5|j|d d d gd gd!g}|j|}gd"gdg}t||j |j|}tjgd#gd$gt}t ||y #1swYxYw)%z?Test infrequent categories with feature matrix with 3 features.r) rrrrrrdrrr) rrrrrrrrrrKrr(rLrr&rrrrdN)x0_0x0_3rx1_0x1_5x1_infrequent_sklearnx2_0x2_1)rrrrrrrr)rrrrrrrr)rrrrrrrr)rrrrrrrr)rrrrrrrr)rrrrrrrr)rrrrrrrr)rrrrrrrr)rrr)r+rr)rrrrrrrr)rrrrrrrr)rrN)rrNrIr,r.r/)rrr)rrdr)rrrrrrrr)rrr)rrr)rrrrrr rrr r5rrrr1r2r3r4) rrrrr(ro X_test_transrrs r#'test_ohe_infrequent_multiple_categoriesrs #$# % A !s11!4d; --/M    !         HHg& #F==(L)*BCHHl2245  ! !, /E88 (*IJRXL|U+ !G  c!f z)C D f $F==(L(*BCHHl2245  ! !, /E88 8:VWL|U+!s H%%H.c tjd}|jgdgddddg}tdd d }|j |j }t |jd d dgt |jdgdgdgdgdgdgdgdgdgdgdg }t|||jddgddgdddg}gdgdg}|j|}t||j |j|}tjddgddggt}t |||jddgddgdddg}|j|j }gdgdg}t|||j|}tjddgddggt}t ||y)zHTest infrequent categories with a pandas dataframe with multiple dtypes.rR rSfrrrrSrrTrT rrrrdrdrgrrr)strrNrrNcolumnsrKrr(rrrSrTrrrrg)rrrrrr)rrrrrr)rrrrrr)rrrrrr)rrrrrrrrgrrIrrN)r2rZr[rrrr rr r5rrrr) r\rrrr(rorrrs r#.test_ohe_infrequent_multiple_categories_dtypesrs   X &B @1   A !PU\ WF"$67H==(LHl2245  ! !, /E88  4 5=Q7RSL|U+\\3*b!W=u~\ VF==(002L"$67HHl+  ! !, /E88 # $';Q&?@L|U+r%ri)rrctjdgdzdgdzzdgdzzdgdzzgj}tdd d d |}|j ||j dgg}t |d ggy ),All user provided categories are infrequent.rSrrTr#rrdrrr(FrrNr)rrrNrr1r5r rrrrs r#$test_ohe_infrequent_one_level_errorsrHshh SEBJ.#;seaiGHIKKG  ,E EK CGGGmmcUG$GGqcU#r%ctjdgdzgtj}t d gdgddd|j |}|j dgdgg}t|d gd ggy ) rrrrIrFr(rrSrNr)rrrrNrr1r5r rs r#5test_ohe_infrequent_user_cats_unknown_training_errorsrVshh {&133G  (),      c'l mmcUSEN+GGqcA3Z(r%zinput_dtype, category_dtype)OOOUUOUUSOSUSS array_type)rr dataframectjdgdgg|}tjddg|g}t|dj|}t dgdgdgdgg||}|j |}tjddgddgddgddgg}t ||t|j|} | j |}tjdgdgdgdgg}t||y ) a"Check that encoding work with object, unicode, and byte string dtypes. Non-regression test for: https://github.com/scikit-learn/scikit-learn/issues/15616 https://github.com/scikit-learn/scikit-learn/issues/15726 https://github.com/scikit-learn/scikit-learn/issues/19677 rTrSrIFrrrrN) rrrr1rr5r rr ) rGcategory_dtyperrrLrrorr(oes r#test_encoders_string_categoriesrgs 3%#{3A((C:^<=J :U C G G JC  use$j FmmF#Gxx!Q!Q!Q!Q89HGX& : . 2 21 5Bll6"Gxx!qcA3,-Hw)r%c4tjdgdggd}tjddgdg}t|d}tjd}t j t| 5|j|d d d y #1swYy xYw) zCheck that this mixture of predefined categories and X raises an error. Categories defined as bytes can not easily be compared to data that is a string. rTrSUrISFrzjIn column 0, the predefined categories have type 'bytes' which is incompatible with values of type 'str_'.r/N) rrrrrr2r3r4r1)rrLrrs r#$test_mixed_string_bytes_categoricalsrs 3%#s+A((C:S12J :U CC )) ' C z -  s 3BBctjdd|d|ggtj}t ddj |}|j }t|ddd |gy) NrSrTrIFr'rr&x0_arx0_)rrrrNrr1rr )rsrrnamess r#)test_ohe_missing_values_get_feature_namesrse 3]C?@OQQA eH E I I! LC  % % 'Euvv]O/DEFr%c (tjd}|jgdtjdddtj gt ddd g }tjgd gd gd gdg}t|}t||y)NrR)dogrNrrrr+rI)col1col2rrr)rrrrrrr)rrrrrrr)rrrrrrr)rrrrrrr) r2rZr[rrrfloatrr )r\dfexpected_df_transrs r#%test_ohe_missing_value_support_pandasrs   X &B /HHaArvv.e<   B ! ! ! !   #2 &CC*+r% pd_nan_typepd.NAznp.nanc tjd}|dk(r |jntj}|j d|j dd|ddgdi}tjgd gd gd gd gd g}td |}|j|}t||t|jdk(sJt|jdddgdtj|jddsJy)NrRrrrrSrTrrI)rrrr)rrrr)rrrrrFrrrrAr)r2rZNArrr[rrrrr lenrr isnan)rr&r\pd_missing_valuerrrdf_transs r#1test_ohe_missing_value_support_pandas_categoricalr s   X &B +w 6ruuBFF  BIIsC)93DJIW  B         eN KC  $H%x0 s 1 $$ $sq)#2.@ 88COOA&r* ++ +r%cddgddgddgg}tdd|}|j|}tjgd gd gd g}t ||d d gg}tjgd g}d}t j t|5|j|}dddt |||j|}t|tjddggty#1swYOxYw)zZCheck drop='first' and handle_unknown='ignore'/'infrequent_if_exist' during transform.rSrrTrrrFrrr&rr)rrrrrtFound unknown categories in columns \[0, 1\] during transform. These unknown categories will be encoded as all zerosr/NrI rrrrr r2rmrnr5rr rr&rrrrOrorprs r#/test_ohe_drop_first_handle_unknown_ignore_warnsrs qC8c1X&A  E. C"G    JGZ(AhZF9+&J  k 2(--'(GZ(  ! !* -Eubhhaz@A (( C//C8cddgddgddgg}tdd|}|j|}tjgd gd gd g}t ||d d gg}tjgdg}d}t j t|5|j|}dddt |||j|}t|tjddggty#1swYOxYw)zDCheck drop='if_binary' and handle_unknown='ignore' during transform.rSrrTrrrFr rrrXrr)rrrrr r/NrIr rs r#3test_ohe_drop_if_binary_handle_unknown_ignore_warnsrs qC8c1X&A  n C"G    JGZ(AhZF<.)J  k 2(--'(GZ(  ! !* -Eubhhd }FCD ((rc>ddgddgddgg}tdd|ddgddgg}|j|d dgg}tjddgg}d }t j t | 5|j|}d d d t|y #1swYxYw) znCheck drop='first' and handle_unknown='ignore'/'infrequent_if_exist' during fit with categories passed in.rSrrTrrrFrjrrkr/Nrl)r&rrrorOrprs r#'test_ohe_drop_first_explicit_categoriesr&s qC8c1X&A  %#JA'  C GGAJAhZFAq6(#J A  k 2(--'(GZ(((rrctjd}|jgdgddddg}td }|j d d }tj t | 5|j|d d d |j|tj t | 5|j|d d d y #1swYPxYw#1swYy xYw)zJRaise informative error message when pandas output and sparse_output=True.rRr)rrTrT)rSrTrSrTrTrr5zxPandas output does not support sparse data. Set sparse_output=False to output pandas dataframes or disable Pandas outputr/N) r2rZr[r set_outputr3r4rr1r5)r\rrrs r#'test_ohe_more_informative_error_messagerAs   X &B IO 88BNN1%b) ** *||BHHuse.C-DsecUST$$X.I ??f $$ $y!Q'#s4yQ'#s4 88IdO $$ $r%r!)zobject-None-missing-valuezobject-nan-missing_valueznumeric-missing-valuecvt|}tjdgtjgg}t |j |||j dj|k(sJt|}tjtd5|j|dddy#1swYyxYw)z.Test ordinal encoder for specified categories.rrrr.r/N) rrrrr rrrJr2r3r4r1)rr8r rrrs r#=test_ordinal_encoder_specified_categories_missing_passthroughr$sL 4 (B ((SEBFF8$ %Cr''*C0 >>!  " "i // / 4 (B z)C D r s B//B8c$tjgdtg}||}tjddggtj}t j t d5|j|dddy#1swYyxYw) zTest encoder for specified categories have duplicate values. Non-regression test for: https://github.com/scikit-learn/scikit-learn/issues/27088 )rSrTrSrIrrSrTz5the predefined categories contain duplicate elements.r/N)rrrrNr2r3r4r1rs r#+test_encoder_duplicate_specified_categoriesr&sq HH_F 3 4D T "C 3*V,..A Q   s +BBzX, expected_X_trans, X_testrr)rrr)rr!rr)rrSrT)r!rrctdd}|j|}t||t|j|dggy)z>Test the interaction between missing values and handle_unknownr7rAr9gN)rrr r5)rexpected_X_transrorrs r#/test_ordinal_encoder_handle_missing_and_unknownr)sC8 ':" MBq!GG-.BLL(D6(3r% csr_containerctjgdgdg}||}t}d}tjt |5|j |dddtjt |5|j|ddd|j|}||}tjt |5|j|dddy#1swYxYw#1swYdxYw#1swYyxYw)zCheck that we raise proper error with sparse input in OrdinalEncoder. Non-regression test for: https://github.com/scikit-learn/scikit-learn/issues/19878 r rz2Sparse data was passed, but dense data is requiredr/N) rrrr2r3rr1rr)r*rX_sparseencoderrrr!s r#test_ordinal_encoder_sparser.s )Y'(AQHGBG y 0 H y 0(h'(##A&G"7+N y 02!!.122(( 22s$ C)C5D)C25C>D cBtjgdddtjf}tgdgdd}|j |tgdgd}t j td 5|j |dddy#1swYyxYw) zCheck OrdinalEncoder.fit works with unseen category when `handle_unknown="use_encoded_value"`. Non-regression test for: https://github.com/scikit-learn/scikit-learn/issues/19872 )rrrrrrN)rArrr7rE)rLr&r:r,rr.r/)rrnewaxisrr1r2r3r4)rrs r#-test_ordinal_encoder_fit_with_unseen_categoryr1s #$Q ]3A <0CSW BFF1I J< HB z)C D q s :BBrAAOrroctdd}|j||j|}t|ddggy)zChecks that `OrdinalEncoder` transforms string dtypes. Non-regression test for: https://github.com/scikit-learn/scikit-learn/issues/19872 r7ir9rN)rr1r5r )rrorrs r#1test_ordinal_encoder_handle_unknown_string_dtypesr5&s;* (;2 NCGGGmmF#GGr1gY'r%c8tjgdjdd}tj |}t |j tj|dj|j|}t |dgdgdgdggy) zCheck that `OrdinalEncoder` accepts Python integers that are potentially larger than 64 bits. Non-regression test for: https://github.com/scikit-learn/scikit-learn/issues/20721 )l HP 1&l H]viel :?i}GalIRK2e6krArr)axisrrN) rrrDrr1r rsortrNr5)rr-rs r##test_ordinal_encoder_python_integerr9Bs   gb!n""1%Gw**BGGAA,>,@,@A"Gw!qcA3 45r%ctjd}gd}|jgdg|}tj |}|j }t ||y)z-Check feature names out is same as the input.rR)rTrrSrrN)r2rZr[rr1rr )r\rrrfeature_names_outs r#.test_ordinal_encoder_features_names_out_pandasr<VsX   X &B E i[% 0A    q !C113u/0r%c&tjdgdgtjggt}t dtjdj |}|j |}t|dgdgdggtjd gtjggt}|j |}t|tjgdgg|j|}|ddJtj|ddsJy ) zECheck interactions between encode_unknown and missing value encoding.rSrTrIr7r&r:rrrrN) rrrrrr1r5r rr)rrrror X_roundtrips r#0test_ordinal_encoder_unknown_missing_interactionrAbs 3%#)8A *ff   c!f  ll1oGGqcA3-.XXurvvh'v 6F<<'LLBFF8bT"23&&|4K q>!  $$ $ 88KN1% && &r% with_pandascttjddgddgdtjggt}d}|r0t j d}|j |d d g }|d z}n|d z}td}t jt|5|j|dddy#1swYyxYw)zXCheck OrdinalEncoder errors when encoded_missing_value is used by an known category.rSrrTrrrIzTencoded_missing_value \(1\) is already used to encode a known category in features: rRletterpetrz \['pet'\]z\[1\]rrr/N) rrrrr2rZr[rr3r4r1)rBr error_msgr\rs r#0test_ordinal_encoder_encoded_missing_value_errorrGs 3,e sBFFm>>RJ))$/I''-HH%%'3z7798;K;KL  s 'DD#ctjd}|jddgddgd}tj d}tj d}|j |}|j |}t |j|t|j|jy ) z+Check OrdinalEncoder works with set_output.rRrSrTrrrUrQrN) r2rZr[rrrr rRr rr)r\r] ord_default ord_pandasrUrVs r#test_ordinal_set_outputr[s   X &B <)B C6Ry"23r%c ztjd}|jgd}|jgdgd|j dgdzdgdzzd gzd gz| d gd  }t dj |}t|jdddgt|jdgdt|jdd d g|jgdgd|j dgd gzd gzdgz| d gd  }gdgdgdgdg}|j|}t||y)zHTest infrequent categories with a pandas DataFrame with multiple dtypes.rR)birdrrrgrrrr+rrrgryrI)rrNrrrurrSrTrrr)rSrTrr)rgrrdr)rrr)rrr)rrrrN) r2rZCategoricalDtyper[rrr1r rr5r )r\categorical_dtyperrnrorors r#:test_ordinal_encoder_infrequent_multiple_categories_dtypesr|s[   X &B++,KL @199! ugk)WI5@'% .  AA.2215Gw55a83*Ew55a8*Ew55a867:KL \\'!997)#vh.%8'% . F IyAN'GG^,r%ctjdgdzdgdzzdgdzzdgdzztjgzgt j}t d d d d j |}t|jgd gtjdgdgdgdgdgtjggt }dgdgdgdgd gdgg}|j|}t||y)zJCheck behavior of unknown_value and encoded_missing_value with infrequent.rSrrTr#rrdrrrIr7r)r&r:rrrrrrN) rrrrrNrr1r rr5r )rrnrorors r#.test_ordinal_encoder_infrequent_custom_mappingr~shh cURZ 3%"* ,uqy 8BFF8 CDFa *   c'l  w557HI XXusecUSEC5266(C6 RFcA3aS1#s3N'GG^,r%cdtjdgdzdgdzzdgdzzdgdzzgt j}t di|d d d j |}t d d j |}dgdgdgdgd gg}t |j||j|y)zMAll categories are considered frequent have same encoding as default encoder.rSrrTr#rrdrrrIr7rAr9rNrrrrrNrr1r r5)rradjusted_encoderdefault_encoderros r#!test_ordinal_encoder_all_frequentrshh cURZ 3%"* ,uqy 89a & !4B c'l%*" c'lecUSEC53% 0F""6*O,E,Ef,Mr%dc"tjdgdzdgdzzdgdzzdgdzzgt j}t di|d d d j |}dgdgdgdgd gg}t |j|dgdgdgdgd ggy)zAWhen all categories are infrequent, they are all encoded as zero.rSrrTr#rrdrrrIr7rAr9rrNrr)rrr-ros r##test_ordinal_encoder_all_infrequentrshh cURZ 3%"* ,uqy 89a  !4B c'l ecUSEC53% 0FG%%f-aS1#sRD/IJr%ctjtjgdzdgdzzdgdzzdgzdgzgtj}t d j |}tjdddtjggtj}|j|}t|d gd gd gtjggy)z5Check behavior when missing value appears frequently.r#rrdrrrgdeerrIrrurrrN rrrrrNrr1r5r rrnrors r#-test_ordinal_encoder_missing_appears_frequentr s  &&B%2 %! 3wi ?6( JK aA.2215G XXrvv67v F H HF'GGqcA3bffX67r%c tjtjgdgdzzdgdzzdgzdgzdgdzd gdzzgt j}t d j |}tjddgdd gtjd gdd gddggt }|j|}t|d dgd dgtjdgddgddggy)z7Check behavior when missing value appears infrequently.rrdrrrgrredrHgreenrIr+)rrrrNrrs r#/test_ordinal_encoder_missing_appears_infrequentr s  VVHw| #ugk 1WI = H GaK7)a- '    a 1-11!4G XX e  W  VVW  G  EN   F'GGq!fq!frvvqkAq6Aq6JKr%ctjdgdgdggt}|gdg}tjt 5|j |dddy#1swYyxYw)a!Check that we raise a `NotFittedError` by calling transform before fit with the encoders. One could expect that the passing the `categories` argument to the encoder would make it stateless. However, `fit` is making a couple of check, such as the position of `np.nan`. rVrWrarIr`rN)rrrr2r3rr5)rrr-s r#test_encoder_not_fittedr3 s] 3%#&f5A/!23G ~ &!r)rrenumpyrr2scipyrsklearn.exceptionsrsklearn.preprocessingrrsklearn.utils._missingrsklearn.utils._testingrr r sklearn.utils.fixesr r$mark parametrizer;rErfloat32r/rPr^rrrrrrrrrrrrrrrrr r\str_rrNrr r rrrr r+r1r3r5rCrNrJrLrOrTr^rcrhrqrzr}rrrrrrrrrrrrrrrrrrrrrr rrrrrrr"r$r&r)r.r1r5r9r<rArGrMrOrWr[rerhrlrqrsrwr|r~rrrrrrr%r#rs  -?0 /@.)+TU#V#.)+TU&V&$"((BJJ )KL2::rzz(JK ;LM ;"((BJJ )KL AM A92xJ$4 9  (+z*+#%67vF/C#78G/Cuu#=>fM"O4FC/C#67vF/Cut#<=VL  .G/.G)+TUUDM2$1,$23V,$^UDM2 b'Ar7QG $y)Y&GHS\E3<'3% F o ?  ' 3'*7!=>'CD CE? CE?#;<1vxrxxc ';<=>=E?#;<= "+r{ #uenrd%;RZZH Aq6Aq6" #q!fqc]BJJ? BHHsElS%L1 @3Z% ! JJ C<#u. /3*ug1FP Aq6BFFA;' (Arvv;*?C@C.3%#01" 1 ! ! $ $q1q1 Q Q<'C5!12/3/.3%#0 1 0 !a8?A:NO;;2,6!,H;$X,v>,BbA$N#OP $Q $a1$M#NO )P ) !#M'EF*G*6*2664.9G:G,.)+TU((;<,=V,<)+TU"BV"BJ)+TU!EV!EH)+TU)V)4( 02662,?"@""((;<02662,?%@=%>3-7993*V4663RVV,F;<  3-7993*V4663RVV,F;<  3- ;==3% 3553RVV,-.  %4 9!DE!D$]N$CD E ! BHHsBFFC() * , , BHHsBFFC() * , , BHHseW  BHHo& ' ) ) BHHo& ' ) ) BHHrvvhZ  BHHsBFFC() 8 : : BHHsBFFC() * , , BHHseWF + BHHo&f 5 7 7 BHHo& ' ) ) BHHrvvhZv . !24324.92:2,"  4+c*4+c* s 3*S)3*S) ( (6( 1'<u 672: BHHsecU^6 2S266(RVVH % BJJvv.f =   BHHrvvhu-V <S266(RVVH % BJJx"&&2& A  &''&'BM. M /" $!!H 1 ! ! $ $q1q1 4 46!4H4*--`-* 1 !( 1 # K K 8L8]N$CD E r%