"""KDDCUP 99 dataset.

A classic dataset for anomaly detection.

The dataset page is available from UCI Machine Learning Repository

https://archive.ics.uci.edu/ml/machine-learning-databases/kddcup99-mld/kddcup.data.gz

"""

import errno
import logging
import os
from gzip import GzipFile
from numbers import Integral, Real
from os.path import exists, join

import joblib
import numpy as np

from ..utils import Bunch, check_random_state
from ..utils import shuffle as shuffle_method
from ..utils._param_validation import Interval, StrOptions, validate_params
from . import get_data_home
from ._base import (
    RemoteFileMetadata,
    _convert_data_dataframe,
    _fetch_remote,
    load_descr,
)

# Archive holding the full dataset.
ARCHIVE = RemoteFileMetadata(
    filename="kddcup99_data",
    url="https://ndownloader.figshare.com/files/5976045",
    checksum="3b6c942aa0356c0ca35b7b595a26c89d343652c9db428893e7494f837b274292",
)

# Archive holding the 10 percent subset of the dataset.
ARCHIVE_10_PERCENT = RemoteFileMetadata(
    filename="kddcup99_10_data",
    url="https://ndownloader.figshare.com/files/5976042",
    checksum="8045aca0d84e70e622d1148d7df782496f6333bf6eb979a1b0837c42a9fd9561",
)

logger = logging.getLogger(__name__)


@validate_params(
    {
        "subset": [StrOptions({"SA", "SF", "http", "smtp"}), None],
        "data_home": [str, os.PathLike, None],
        "shuffle": ["boolean"],
        "random_state": ["random_state"],
        "percent10": ["boolean"],
        "download_if_missing": ["boolean"],
        "return_X_y": ["boolean"],
        "as_frame": ["boolean"],
        "n_retries": [Interval(Integral, 1, None, closed="left")],
        "delay": [Interval(Real, 0.0, None, closed="neither")],
    },
    prefer_skip_nested_validation=True,
)
def fetch_kddcup99(
    *,
    subset=None,
    data_home=None,
    shuffle=False,
    random_state=None,
    percent10=True,
    download_if_missing=True,
    return_X_y=False,
    as_frame=False,
    n_retries=3,
    delay=1.0,
):
    """Load the kddcup99 dataset (classification).

    Download it if necessary.

    =================   ====================================
    Classes                                               23
    Samples total                                    4898431
    Dimensionality                                        41
    Features            discrete (int) or continuous (float)
    =================   ====================================

    Read more in the :ref:`User Guide <kddcup99_dataset>`.

    .. versionadded:: 0.18

    Parameters
    ----------
    subset : {'SA', 'SF', 'http', 'smtp'}, default=None
        To return the corresponding classical subsets of kddcup 99.
        If None, return the entire kddcup 99 dataset.

    data_home : str or path-like, default=None
        Specify another download and cache folder for the datasets. By default
        all scikit-learn data is stored in '~/scikit_learn_data' subfolders.

        .. versionadded:: 0.19

    shuffle : bool, default=False
        Whether to shuffle the dataset.

    random_state : int, RandomState instance or None, default=None
        Determines random number generation for dataset shuffling and for
        selection of abnormal samples if `subset='SA'`. Pass an int for
        reproducible output across multiple function calls.
        See :term:`Glossary <random_state>`.

    percent10 : bool, default=True
        Whether to load only 10 percent of the data.

    download_if_missing : bool, default=True
        If False, raise an OSError if the data is not locally available
        instead of trying to download the data from the source site.

    return_X_y : bool, default=False
        If True, returns ``(data, target)`` instead of a Bunch object. See
        below for more information about the `data` and `target` object.

        .. versionadded:: 0.20

    as_frame : bool, default=False
        If `True`, returns a pandas DataFrame for the ``data`` and ``target``
        objects in the `Bunch` returned object; the `Bunch` return object will
        also have a ``frame`` member.

        .. versionadded:: 0.24

    n_retries : int, default=3
        Number of retries when HTTP errors are encountered.

        .. versionadded:: 1.5

    delay : float, default=1.0
        Number of seconds between retries.

        .. versionadded:: 1.5

    Returns
    -------
    data : :class:`~sklearn.utils.Bunch`
        Dictionary-like object, with the following attributes.

        data : {ndarray, dataframe} of shape (494021, 41)
            The data matrix to learn. If `as_frame=True`, `data` will be a
            pandas DataFrame.
        target : {ndarray, series} of shape (494021,)
            The classification target for each sample. If `as_frame=True`,
            `target` will be a pandas Series.
        frame : dataframe of shape (494021, 42)
            Only present when `as_frame=True`. Contains `data` and `target`.
        DESCR : str
            The full description of the dataset.
        feature_names : list
            The names of the dataset columns.
        target_names : list
            The names of the target columns.

    (data, target) : tuple if ``return_X_y`` is True
        A tuple of two ndarrays. The first contains a 2D array of shape
        (n_samples, n_features) with each row representing one sample and
        each column representing the features. The second ndarray of shape
        (n_samples,) contains the target samples.

        .. versionadded:: 0.20
    """
    data_home = get_data_home(data_home=data_home)
    kddcup99 = _fetch_brute_kddcup99(
        data_home=data_home,
        percent10=percent10,
        download_if_missing=download_if_missing,
        n_retries=n_retries,
        delay=delay,
    )

    data = kddcup99.data
    target = kddcup99.target
    feature_names = kddcup99.feature_names
    target_names = kddcup99.target_names

    if subset == "SA":
        s = target == b"normal."
        t = np.logical_not(s)
        normal_samples = data[s, :]
        normal_targets = target[s]
        abnormal_samples = data[t, :]
        abnormal_targets = target[t]

        n_samples_abnormal = abnormal_samples.shape[0]
        # Draw 3377 abnormal samples at random (with replacement).
        random_state = check_random_state(random_state)
        r = random_state.randint(0, n_samples_abnormal, 3377)
        abnormal_samples = abnormal_samples[r]
        abnormal_targets = abnormal_targets[r]

        data = np.r_[normal_samples, abnormal_samples]
        target = np.r_[normal_targets, abnormal_targets]

    if subset == "SF" or subset == "http" or subset == "smtp":
        # Keep only the samples whose logged_in attribute (column 11) is 1,
        # then drop that column.
        s = data[:, 11] == 1
        data = np.c_[data[s, :11], data[s, 12:]]
        feature_names = feature_names[:11] + feature_names[12:]
        target = target[s]

        # Log-transform duration, src_bytes and dst_bytes.
        data[:, 0] = np.log((data[:, 0] + 0.1).astype(float, copy=False))
        data[:, 4] = np.log((data[:, 4] + 0.1).astype(float, copy=False))
        data[:, 5] = np.log((data[:, 5] + 0.1).astype(float, copy=False))

        if subset == "http":
            s = data[:, 2] == b"http"
            data = data[s]
            target = target[s]
            data = np.c_[data[:, 0], data[:, 4], data[:, 5]]
            feature_names = [feature_names[0], feature_names[4], feature_names[5]]

        if subset == "smtp":
            s = data[:, 2] == b"smtp"
            data = data[s]
            target = target[s]
            data = np.c_[data[:, 0], data[:, 4], data[:, 5]]
            feature_names = [feature_names[0], feature_names[4], feature_names[5]]

        if subset == "SF":
            data = np.c_[data[:, 0], data[:, 2], data[:, 4], data[:, 5]]
            feature_names = [
                feature_names[0],
                feature_names[2],
                feature_names[4],
                feature_names[5],
            ]

    if shuffle:
        data, target = shuffle_method(data, target, random_state=random_state)

    fdescr = load_descr("kddcup99.rst")

    frame = None
    if as_frame:
        frame, data, target = _convert_data_dataframe(
            "fetch_kddcup99", data, target, feature_names, target_names
        )

    if return_X_y:
        return data, target

    return Bunch(
        data=data,
        target=target,
        frame=frame,
        target_names=target_names,
        feature_names=feature_names,
        DESCR=fdescr,
    )


def _fetch_brute_kddcup99(
    data_home=None, download_if_missing=True, percent10=True, n_retries=3, delay=1.0
):
    """Load the kddcup99 dataset, downloading it if necessary.

    Parameters
    ----------
    data_home : str, default=None
        Specify another download and cache folder for the datasets. By default
        all scikit-learn data is stored in '~/scikit_learn_data' subfolders.

    download_if_missing : bool, default=True
        If False, raise an OSError if the data is not locally available
        instead of trying to download the data from the source site.

    percent10 : bool, default=True
        Whether to load only 10 percent of the data.

    n_retries : int, default=3
        Number of retries when HTTP errors are encountered.

    delay : float, default=1.0
        Number of seconds between retries.

    Returns
    -------
    dataset : :class:`~sklearn.utils.Bunch`
        Dictionary-like object, with the following attributes.

        data : ndarray of shape (494021, 41)
            Each row corresponds to the 41 features in the dataset.
        target : ndarray of shape (494021,)
            Each value corresponds to an attack type or to the label
            'normal.'.
        feature_names : list
            The names of the dataset columns.
        target_names : list
            The names of the target columns.
        DESCR : str
            Description of the kddcup99 dataset.
    """
    data_home = get_data_home(data_home=data_home)
    dir_suffix = "-py3"

    if percent10:
        kddcup_dir = join(data_home, "kddcup99_10" + dir_suffix)
        archive = ARCHIVE_10_PERCENT
    else:
        kddcup_dir = join(data_home, "kddcup99" + dir_suffix)
        archive = ARCHIVE

    samples_path = join(kddcup_dir, "samples")
    targets_path = join(kddcup_dir, "targets")
    available = exists(samples_path)

    # Column names and dtypes, in file order; the last column is the label.
    dt = [
        ("duration", int),
        ("protocol_type", "S4"),
        ("service", "S11"),
        ("flag", "S6"),
        ("src_bytes", int),
        ("dst_bytes", int),
        ("land", int),
        ("wrong_fragment", int),
        ("urgent", int),
        ("hot", int),
        ("num_failed_logins", int),
        ("logged_in", int),
        ("num_compromised", int),
        ("root_shell", int),
        ("su_attempted", int),
        ("num_root", int),
        ("num_file_creations", int),
        ("num_shells", int),
        ("num_access_files", int),
        ("num_outbound_cmds", int),
        ("is_host_login", int),
        ("is_guest_login", int),
        ("count", int),
        ("srv_count", int),
        ("serror_rate", float),
        ("srv_serror_rate", float),
        ("rerror_rate", float),
        ("srv_rerror_rate", float),
        ("same_srv_rate", float),
        ("diff_srv_rate", float),
        ("srv_diff_host_rate", float),
        ("dst_host_count", int),
        ("dst_host_srv_count", int),
        ("dst_host_same_srv_rate", float),
        ("dst_host_diff_srv_rate", float),
        ("dst_host_same_src_port_rate", float),
        ("dst_host_srv_diff_host_rate", float),
        ("dst_host_serror_rate", float),
        ("dst_host_srv_serror_rate", float),
        ("dst_host_rerror_rate", float),
        ("dst_host_srv_rerror_rate", float),
        ("labels", "S16"),
    ]

    column_names = [c[0] for c in dt]
    target_names = column_names[-1]
    feature_names = column_names[:-1]

    if available:
        try:
            X = joblib.load(samples_path)
            y = joblib.load(targets_path)
        except Exception as e:
            raise OSError(
                "The cache for fetch_kddcup99 is invalid, please delete "
                f"{kddcup_dir} and run the fetch_kddcup99 again"
            ) from e

    elif download_if_missing:
        _mkdirp(kddcup_dir)
        logger.info("Downloading %s" % archive.url)
        _fetch_remote(archive, dirname=kddcup_dir, n_retries=n_retries, delay=delay)
        DT = np.dtype(dt)
        logger.debug("extracting archive")
        archive_path = join(kddcup_dir, archive.filename)
        file_ = GzipFile(filename=archive_path, mode="r")
        Xy = []
        for line in file_.readlines():
            line = line.decode()
            Xy.append(line.replace("\n", "").split(","))
        file_.close()
        logger.debug("extraction done")
        os.remove(archive_path)

        # Cast each of the 42 columns to its declared dtype.
        Xy = np.asarray(Xy, dtype=object)
        for j in range(42):
            Xy[:, j] = Xy[:, j].astype(DT[j])

        X = Xy[:, :-1]
        y = Xy[:, -1]

        joblib.dump(X, samples_path, compress=0)
        joblib.dump(y, targets_path, compress=0)
    else:
        raise OSError("Data not found and `download_if_missing` is False")

    return Bunch(
        data=X,
        target=y,
        feature_names=feature_names,
        target_names=[target_names],
    )


def _mkdirp(d):
    """Ensure directory d exists (like mkdir -p on Unix).

    No guarantee that the directory is writable.
    """
    try:
        os.makedirs(d)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise