"""
=============================
Species distribution dataset
=============================

This dataset represents the geographic distribution of species.
The dataset is provided by Phillips et al. (2006).

The two species are:

 - "Bradypus variegatus", the Brown-throated Sloth.

 - "Microryzomys minutus", also known as the Forest Small Rice Rat,
   a rodent that lives in Peru, Colombia, Ecuador, and Venezuela.

References
----------

"Maximum entropy modeling of species geographic distributions"
S. J. Phillips, R. P. Anderson, R. E. Schapire - Ecological Modelling,
190:231-259, 2006.
"""

import logging
from io import BytesIO
from numbers import Integral, Real
from os import PathLike, makedirs, remove
from os.path import exists

import joblib
import numpy as np

from ..utils import Bunch
from ..utils._param_validation import Interval, validate_params
from . import get_data_home
from ._base import RemoteFileMetadata, _fetch_remote, _pkl_filepath

SAMPLES = RemoteFileMetadata(
    filename="samples.zip",
    url="https://ndownloader.figshare.com/files/5976075",
    checksum="abb07ad284ac50d9e6d20f1c4211e0fd3c098f7f85955e89d321ee8efe37ac28",
)

COVERAGES = RemoteFileMetadata(
    filename="coverages.zip",
    url="https://ndownloader.figshare.com/files/5976078",
    checksum="4d862674d72e79d6cee77e63b98651ec7926043ba7d39dcb31329cf3f6073807",
)

DATA_ARCHIVE_NAME = "species_coverage.pkz"

logger = logging.getLogger(__name__)


def _load_coverage(F, header_length=6, dtype=np.int16):
    """Load a coverage file from an open file object.

    This will return a numpy array of the given dtype.
    """
    # Each header line is a "key value" pair; keys stay as bytes because
    # the file is opened in byte mode.
    header_lines = [F.readline() for _ in range(header_length)]

    def make_tuple(t):
        return t.split()[0], float(t.split()[1])

    header = dict([make_tuple(line) for line in header_lines])

    M = np.loadtxt(F, dtype=dtype)
    nodata = int(header[b"NODATA_value"])
    if nodata != -9999:
        # Normalize the file's missing-data marker to -9999.
        M[M == nodata] = -9999
    return M


def _load_csv(F):
    """Load csv file.

    Parameters
    ----------
    F : file object
        CSV file open in byte mode.

    Returns
    -------
    rec : np.ndarray
        record array representing the data
    """
    # The first line holds the comma-separated column names.
    names = F.readline().decode("ascii").strip().split(",")

    rec = np.loadtxt(F, skiprows=0, delimiter=",", dtype="S22,f4,f4")
    rec.dtype.names = names
    return rec


def construct_grids(batch):
    """Construct the map grid from the batch object

    Parameters
    ----------
    batch : Bunch object
        The object returned by :func:`fetch_species_distributions`

    Returns
    -------
    (xgrid, ygrid) : 1-D arrays
        The grid corresponding to the values in batch.coverages
    """
    # x, y coordinates for the corner cells
    xmin = batch.x_left_lower_corner + batch.grid_size
    xmax = xmin + (batch.Nx * batch.grid_size)
    ymin = batch.y_left_lower_corner + batch.grid_size
    ymax = ymin + (batch.Ny * batch.grid_size)

    # x and y coordinates of the grid cells
    xgrid = np.arange(xmin, xmax, batch.grid_size)
    ygrid = np.arange(ymin, ymax, batch.grid_size)

    return (xgrid, ygrid)


@validate_params(
    {
        "data_home": [str, PathLike, None],
        "download_if_missing": ["boolean"],
        "n_retries": [Interval(Integral, 1, None, closed="left")],
        "delay": [Interval(Real, 0.0, None, closed="neither")],
    },
    prefer_skip_nested_validation=True,
)
def fetch_species_distributions(
    *,
    data_home=None,
    download_if_missing=True,
    n_retries=3,
    delay=1.0,
):
    """Loader for species distribution dataset from Phillips et al. (2006).

    Read more in the User Guide.

    Parameters
    ----------
    data_home : str or path-like, default=None
        Specify another download and cache folder for the datasets. By default
        all scikit-learn data is stored in '~/scikit_learn_data' subfolders.

    download_if_missing : bool, default=True
        If False, raise an OSError if the data is not locally available
        instead of trying to download the data from the source site.

    n_retries : int, default=3
        Number of retries when HTTP errors are encountered.

        .. versionadded:: 1.5

    delay : float, default=1.0
        Number of seconds between retries.

        .. versionadded:: 1.5

    Returns
    -------
    data : :class:`~sklearn.utils.Bunch`
        Dictionary-like object, with the following attributes.

        coverages : array, shape = [14, 1592, 1212]
            These represent the 14 features measured
            at each point of the map grid.
            The latitude/longitude values for the grid are discussed below.
            Missing data is represented by the value -9999.
        train : record array, shape = (1624,)
            The training points for the data.  Each point has three fields:

            - train['species'] is the species name
            - train['dd long'] is the longitude, in degrees
            - train['dd lat'] is the latitude, in degrees
        test : record array, shape = (620,)
            The test points for the data.  Same format as the training data.
        Nx, Ny : integers
            The number of longitudes (x) and latitudes (y) in the grid
        x_left_lower_corner, y_left_lower_corner : floats
            The (x,y) position of the lower-left corner, in degrees
        grid_size : float
            The spacing between points of the grid, in degrees

    Notes
    -----

    This dataset represents the geographic distribution of species.
    The dataset is provided by Phillips et al. (2006).

    The two species are:

    - "Bradypus variegatus", the Brown-throated Sloth.

    - "Microryzomys minutus", also known as the Forest Small Rice Rat,
      a rodent that lives in Peru, Colombia, Ecuador, and Venezuela.

    References
    ----------

    * "Maximum entropy modeling of species geographic distributions"
      S. J. Phillips, R. P. Anderson, R. E. Schapire - Ecological Modelling,
      190:231-259, 2006.

    Examples
    --------
    >>> from sklearn.datasets import fetch_species_distributions
    >>> species = fetch_species_distributions()
    >>> species.train[:5]
    array([(b'microryzomys_minutus', -64.7   , -17.85  ),
           (b'microryzomys_minutus', -67.8333, -16.3333),
           (b'microryzomys_minutus', -67.8833, -16.3   ),
           (b'microryzomys_minutus', -67.8   , -16.2667),
           (b'microryzomys_minutus', -67.9833, -15.9   )],
          dtype=[('species', 'S22'), ('dd long', '<f4'), ('dd lat', '<f4')])
    """
    data_home = get_data_home(data_home)
    if not exists(data_home):
        makedirs(data_home)

    # Grid parameters for the data files. They are stored in the cached
    # archive alongside the downloaded data.
    extra_params = dict(
        x_left_lower_corner=-94.8,
        Nx=1212,
        y_left_lower_corner=-56.05,
        Ny=1592,
        grid_size=0.05,
    )
    dtype = np.int16

    archive_path = _pkl_filepath(data_home, DATA_ARCHIVE_NAME)

    if not exists(archive_path):
        if not download_if_missing:
            raise OSError("Data not found and `download_if_missing` is False")

        logger.info(f"Downloading species data from {SAMPLES.url} to {data_home}")
        samples_path = _fetch_remote(
            SAMPLES, dirname=data_home, n_retries=n_retries, delay=delay
        )
        # samples.zip can be read as an npz archive
        with np.load(samples_path) as X:
            for f in X.files:
                fhandle = BytesIO(X[f])
                if "train" in f:
                    train = _load_csv(fhandle)
                if "test" in f:
                    test = _load_csv(fhandle)
        remove(samples_path)

        logger.info(
            f"Downloading coverage data from {COVERAGES.url} to {data_home}"
        )
        coverages_path = _fetch_remote(
            COVERAGES, dirname=data_home, n_retries=n_retries, delay=delay
        )
        # coverages.zip can be read as an npz archive
        with np.load(coverages_path) as X:
            coverages = []
            for f in X.files:
                fhandle = BytesIO(X[f])
                logger.debug(" - converting {}".format(f))
                coverages.append(_load_coverage(fhandle))
            coverages = np.asarray(coverages, dtype=dtype)
        remove(coverages_path)

        bunch = Bunch(coverages=coverages, test=test, train=train, **extra_params)
        joblib.dump(bunch, archive_path, compress=9)
    else:
        bunch = joblib.load(archive_path)

    return bunch