""" parquet compat """
from __future__ import annotations

import io
import json
import os
from typing import (
    TYPE_CHECKING,
    Any,
    Literal,
)
import warnings
from warnings import (
    catch_warnings,
    filterwarnings,
)

from pandas._config import using_pyarrow_string_dtype
from pandas._config.config import _get_option

from pandas._libs import lib
from pandas.compat._optional import import_optional_dependency
from pandas.errors import AbstractMethodError
from pandas.util._decorators import doc
from pandas.util._exceptions import find_stack_level
from pandas.util._validators import check_dtype_backend

import pandas as pd
from pandas import (
    DataFrame,
    get_option,
)
from pandas.core.shared_docs import _shared_docs
from pandas.io._util import arrow_string_types_mapper
from pandas.io.common import (
    IOHandles,
    get_handle,
    is_fsspec_url,
    is_url,
    stringify_path,
)

if TYPE_CHECKING:
    from pandas._typing import (
        DtypeBackend,
        FilePath,
        ReadBuffer,
        StorageOptions,
        WriteBuffer,
    )


def get_engine(engine: str) -> BaseImpl:
    """return our implementation"""
    if engine == "auto":
        engine = get_option("io.parquet.engine")

    if engine == "auto":
        # try engines in this order
        engine_classes = [PyArrowImpl, FastParquetImpl]

        error_msgs = ""
        for engine_class in engine_classes:
            try:
                return engine_class()
            except ImportError as err:
                error_msgs += "\n - " + str(err)

        raise ImportError(
            "Unable to find a usable engine; "
            "tried using: 'pyarrow', 'fastparquet'.\n"
            "A suitable version of "
            "pyarrow or fastparquet is required for parquet "
            "support.\n"
            "Trying to import the above resulted in these errors:"
            f"{error_msgs}"
        )

    if engine == "pyarrow":
        return PyArrowImpl()
    elif engine == "fastparquet":
        return FastParquetImpl()

    raise ValueError("engine must be one of 'pyarrow', 'fastparquet'")


def _get_path_or_handle(
    path: FilePath | ReadBuffer[bytes] | WriteBuffer[bytes],
    fs: Any,
    storage_options: StorageOptions | None = None,
    mode: str = "rb",
    is_dir: bool = False,
) -> tuple[
    FilePath | ReadBuffer[bytes] | WriteBuffer[bytes], IOHandles[bytes] | None, Any
]:
    """File handling for PyArrow."""
    path_or_handle = stringify_path(path)
    if fs is not None:
        pa_fs = import_optional_dependency("pyarrow.fs", errors="ignore")
        fsspec = import_optional_dependency("fsspec", errors="ignore")
        if pa_fs is not None and isinstance(fs, pa_fs.FileSystem):
            if storage_options:
                raise NotImplementedError(
                    "storage_options not supported with a pyarrow FileSystem."
                )
        elif fsspec is not None and isinstance(fs, fsspec.spec.AbstractFileSystem):
            pass
        else:
            raise ValueError(
                f"filesystem must be a pyarrow or fsspec FileSystem, "
                f"not a {type(fs).__name__}"
            )
    if is_fsspec_url(path_or_handle) and fs is None:
        if storage_options is None:
            pa = import_optional_dependency("pyarrow")
            pa_fs = import_optional_dependency("pyarrow.fs")

            try:
                fs, path_or_handle = pa_fs.FileSystem.from_uri(path)
            except (TypeError, pa.ArrowInvalid):
                pass
        if fs is None:
            fsspec = import_optional_dependency("fsspec")
            fs, path_or_handle = fsspec.core.url_to_fs(
                path_or_handle, **(storage_options or {})
            )
    elif storage_options and (not is_url(path_or_handle) or mode != "rb"):
        # can't write to a remote url
        # without making use of fsspec at the moment
        raise ValueError("storage_options passed with buffer, or non-supported URL")

    handles = None
    if (
        not fs
        and not is_dir
        and isinstance(path_or_handle, str)
        and not os.path.isdir(path_or_handle)
    ):
        # use get_handle only when we are very certain that it is not a directory
        # fsspec resources can also point to directories
        # this branch is used for example when reading from non-fsspec URLs
        handles = get_handle(
            path_or_handle, mode, is_text=False, storage_options=storage_options
        )
        fs = None
        path_or_handle = handles.handle
    return path_or_handle, handles, fs


class BaseImpl:
    @staticmethod
    def validate_dataframe(df: DataFrame) -> None:
        if not isinstance(df, DataFrame):
            raise ValueError("to_parquet only supports IO with DataFrames")

    def write(self, df: DataFrame, path, compression, **kwargs):
        raise AbstractMethodError(self)

    def read(self, path, columns=None, **kwargs) -> DataFrame:
        raise AbstractMethodError(self)


class PyArrowImpl(BaseImpl):
    def __init__(self) -> None:
        import_optional_dependency(
            "pyarrow", extra="pyarrow is required for parquet support."
        )
        import pyarrow.parquet

        # import utils to register the pyarrow extension types
        import pandas.core.arrays.arrow.extension_types  # noqa: F401

        self.api = pyarrow

    def write(
        self,
        df: DataFrame,
        path: FilePath | WriteBuffer[bytes],
        compression: str | None = "snappy",
        index: bool | None = None,
        storage_options: StorageOptions | None = None,
        partition_cols: list[str] | None = None,
        filesystem=None,
        **kwargs,
    ) -> None:
        self.validate_dataframe(df)

        from_pandas_kwargs: dict[str, Any] = {"schema": kwargs.pop("schema", None)}
        if index is not None:
            from_pandas_kwargs["preserve_index"] = index

        table = self.api.Table.from_pandas(df, **from_pandas_kwargs)

        if df.attrs:
            df_metadata = {"PANDAS_ATTRS": json.dumps(df.attrs)}
            existing_metadata = table.schema.metadata
            merged_metadata = {**existing_metadata, **df_metadata}
            table = table.replace_schema_metadata(merged_metadata)

        path_or_handle, handles, filesystem = _get_path_or_handle(
            path,
            filesystem,
            storage_options=storage_options,
            mode="wb",
            is_dir=partition_cols is not None,
        )
        if (
            isinstance(path_or_handle, io.BufferedWriter)
            and hasattr(path_or_handle, "name")
            and isinstance(path_or_handle.name, (str, bytes))
        ):
            if isinstance(path_or_handle.name, bytes):
                path_or_handle = path_or_handle.name.decode()
            else:
                path_or_handle = path_or_handle.name

        try:
            if partition_cols is not None:
                # writes to multiple files under the given path
                self.api.parquet.write_to_dataset(
                    table,
                    path_or_handle,
                    compression=compression,
                    partition_cols=partition_cols,
                    filesystem=filesystem,
                    **kwargs,
                )
            else:
                # write to single output file
                self.api.parquet.write_table(
                    table,
                    path_or_handle,
                    compression=compression,
                    filesystem=filesystem,
                    **kwargs,
                )
        finally:
            if handles is not None:
                handles.close()

    def read(
        self,
        path,
        columns=None,
        filters=None,
        use_nullable_dtypes: bool = False,
        dtype_backend: DtypeBackend | lib.NoDefault = lib.no_default,
        storage_options: StorageOptions | None = None,
        filesystem=None,
        **kwargs,
    ) -> DataFrame:
        kwargs["use_pandas_metadata"] = True

        to_pandas_kwargs = {}
        if dtype_backend == "numpy_nullable":
            from pandas.io._util import _arrow_dtype_mapping

            mapping = _arrow_dtype_mapping()
            to_pandas_kwargs["types_mapper"] = mapping.get
        elif dtype_backend == "pyarrow":
            to_pandas_kwargs["types_mapper"] = pd.ArrowDtype
        elif using_pyarrow_string_dtype():
            to_pandas_kwargs["types_mapper"] = arrow_string_types_mapper()

        manager = _get_option("mode.data_manager", silent=True)
        if manager == "array":
            to_pandas_kwargs["split_blocks"] = True

        path_or_handle, handles, filesystem = _get_path_or_handle(
            path,
            filesystem,
            storage_options=storage_options,
            mode="rb",
        )
        try:
            pa_table = self.api.parquet.read_table(
                path_or_handle,
                columns=columns,
                filesystem=filesystem,
                filters=filters,
                **kwargs,
            )
            with catch_warnings():
                filterwarnings(
                    "ignore",
                    "make_block is deprecated",
                    DeprecationWarning,
                )
                result = pa_table.to_pandas(**to_pandas_kwargs)

            if manager == "array":
                result = result._as_manager("array", copy=False)

            if pa_table.schema.metadata:
                if b"PANDAS_ATTRS" in pa_table.schema.metadata:
                    df_metadata = pa_table.schema.metadata[b"PANDAS_ATTRS"]
                    result.attrs = json.loads(df_metadata)
            return result
        finally:
            if handles is not None:
                handles.close()
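

# ---------------------------------------------------------------------------
# Illustrative sketch (not part of pandas): demonstrates the dispatch
# implemented by ``get_engine`` above. ``_example_engine_dispatch`` is a
# hypothetical helper added for exposition only; pandas never calls it.
def _example_engine_dispatch() -> None:
    # An explicit engine name skips the fallback chain and raises
    # ImportError if that engine is not installed.
    impl = get_engine("pyarrow")
    assert isinstance(impl, PyArrowImpl)

    # "auto" consults the ``io.parquet.engine`` option, then tries
    # PyArrowImpl before FastParquetImpl, raising ImportError only when
    # neither backend can be imported.
    impl = get_engine("auto")
    assert isinstance(impl, (PyArrowImpl, FastParquetImpl))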


class FastParquetImpl(BaseImpl):
    def __init__(self) -> None:
        # since pandas is a dependency of fastparquet
        # we need to import on first use
        fastparquet = import_optional_dependency(
            "fastparquet", extra="fastparquet is required for parquet support."
        )
        self.api = fastparquet

    def write(
        self,
        df: DataFrame,
        path,
        compression: Literal["snappy", "gzip", "brotli"] | None = "snappy",
        index=None,
        partition_cols=None,
        storage_options: StorageOptions | None = None,
        filesystem=None,
        **kwargs,
    ) -> None:
        self.validate_dataframe(df)

        if "partition_on" in kwargs and partition_cols is not None:
            raise ValueError(
                "Cannot use both partition_on and "
                "partition_cols. Use partition_cols for partitioning data"
            )
        if "partition_on" in kwargs:
            partition_cols = kwargs.pop("partition_on")

        if partition_cols is not None:
            kwargs["file_scheme"] = "hive"

        if filesystem is not None:
            raise NotImplementedError(
                "filesystem is not implemented for the fastparquet engine."
            )

        # cannot use get_handle as write() does not accept file buffers
        path = stringify_path(path)
        if is_fsspec_url(path):
            fsspec = import_optional_dependency("fsspec")

            # if filesystem is provided by fsspec, file must be opened in 'wb' mode.
            kwargs["open_with"] = lambda path, _: fsspec.open(
                path, "wb", **(storage_options or {})
            ).open()
        elif storage_options:
            raise ValueError(
                "storage_options passed with file object or non-fsspec file path"
            )

        with catch_warnings(record=True):
            self.api.write(
                path,
                df,
                compression=compression,
                write_index=index,
                partition_on=partition_cols,
                **kwargs,
            )

    def read(
        self,
        path,
        columns=None,
        filters=None,
        storage_options: StorageOptions | None = None,
        filesystem=None,
        **kwargs,
    ) -> DataFrame:
        parquet_kwargs: dict[str, Any] = {}
        use_nullable_dtypes = kwargs.pop("use_nullable_dtypes", False)
        dtype_backend = kwargs.pop("dtype_backend", lib.no_default)
        # We are disabling nullable dtypes for fastparquet pending discussion
        parquet_kwargs["pandas_nulls"] = False
        if use_nullable_dtypes:
            raise ValueError(
                "The 'use_nullable_dtypes' argument is not supported for the "
                "fastparquet engine"
            )
        if dtype_backend is not lib.no_default:
            raise ValueError(
                "The 'dtype_backend' argument is not supported for the "
                "fastparquet engine"
            )
        if filesystem is not None:
            raise NotImplementedError(
                "filesystem is not implemented for the fastparquet engine."
            )
        path = stringify_path(path)
        handles = None
        if is_fsspec_url(path):
            fsspec = import_optional_dependency("fsspec")

            parquet_kwargs["fs"] = fsspec.open(
                path, "rb", **(storage_options or {})
            ).fs
        elif isinstance(path, str) and not os.path.isdir(path):
            # use get_handle only when we are very certain that it is not a directory
            # fsspec resources can also point to directories
            # this branch is used for example when reading from non-fsspec URLs
            handles = get_handle(
                path, "rb", is_text=False, storage_options=storage_options
            )
            path = handles.handle

        try:
            parquet_file = self.api.ParquetFile(path, **parquet_kwargs)
            return parquet_file.to_pandas(columns=columns, filters=filters, **kwargs)
        finally:
            if handles is not None:
                handles.close()
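

# ---------------------------------------------------------------------------
# Illustrative sketch (not part of pandas): ``FastParquetImpl.write`` above
# maps ``partition_cols`` onto fastparquet's ``file_scheme="hive"`` /
# ``partition_on`` machinery. The hypothetical helper below shows the roughly
# equivalent direct fastparquet call, assuming fastparquet is installed,
# ``path`` names a directory, and ``df`` has a "year" column.
def _example_fastparquet_partitioned_write(df: DataFrame, path: str) -> None:
    import fastparquet

    # Roughly what FastParquetImpl.write(df, path, partition_cols=["year"])
    # does under the hood (minus pandas' fsspec and warning handling).
    fastparquet.write(path, df, file_scheme="hive", partition_on=["year"])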


@doc(storage_options=_shared_docs["storage_options"])
def to_parquet(
    df: DataFrame,
    path: FilePath | WriteBuffer[bytes] | None = None,
    engine: str = "auto",
    compression: str | None = "snappy",
    index: bool | None = None,
    storage_options: StorageOptions | None = None,
    partition_cols: list[str] | None = None,
    filesystem: Any = None,
    **kwargs,
) -> bytes | None:
    """
    Write a DataFrame to the parquet format.

    Parameters
    ----------
    df : DataFrame
    path : str, path object, file-like object, or None, default None
        String, path object (implementing ``os.PathLike[str]``), or file-like
        object implementing a binary ``write()`` function. If None, the result is
        returned as bytes. If a string, it will be used as Root Directory path
        when writing a partitioned dataset. The engine fastparquet does not
        accept file-like objects.
    engine : {{'auto', 'pyarrow', 'fastparquet'}}, default 'auto'
        Parquet library to use. If 'auto', then the option
        ``io.parquet.engine`` is used. The default ``io.parquet.engine``
        behavior is to try 'pyarrow', falling back to 'fastparquet' if
        'pyarrow' is unavailable.

        When using the ``'pyarrow'`` engine and no storage options are provided
        and a filesystem is implemented by both ``pyarrow.fs`` and ``fsspec``
        (e.g. "s3://"), then the ``pyarrow.fs`` filesystem is attempted first.
        Use the filesystem keyword with an instantiated fsspec filesystem
        if you wish to use its implementation.
    compression : {{'snappy', 'gzip', 'brotli', 'lz4', 'zstd', None}},
        default 'snappy'. Name of the compression to use.
        Use ``None`` for no compression.
    index : bool, default None
        If ``True``, include the dataframe's index(es) in the file output. If
        ``False``, they will not be written to the file.
        If ``None``, similar to ``True`` the dataframe's index(es)
        will be saved. However, instead of being saved as values,
        the RangeIndex will be stored as a range in the metadata so it
        doesn't require much space and is faster. Other indexes will
        be included as columns in the file output.
    partition_cols : str or list, optional, default None
        Column names by which to partition the dataset.
        Columns are partitioned in the order they are given.
        Must be None if path is not a string.
    {storage_options}

    filesystem : fsspec or pyarrow filesystem, default None
        Filesystem object to use when writing the parquet file. Only implemented
        for ``engine="pyarrow"``.

        .. versionadded:: 2.1.0

    kwargs
        Additional keyword arguments passed to the engine

    Returns
    -------
    bytes if no path argument is provided else None
    """
    if isinstance(partition_cols, str):
        partition_cols = [partition_cols]
    impl = get_engine(engine)

    path_or_buf: FilePath | WriteBuffer[bytes] = io.BytesIO() if path is None else path

    impl.write(
        df,
        path_or_buf,
        compression=compression,
        index=index,
        partition_cols=partition_cols,
        storage_options=storage_options,
        filesystem=filesystem,
        **kwargs,
    )

    if path is None:
        assert isinstance(path_or_buf, io.BytesIO)
        return path_or_buf.getvalue()
    else:
        return None
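

# ---------------------------------------------------------------------------
# Illustrative sketch (not part of pandas): writing a partitioned dataset
# through the function above. The "dataset" path and "year" column are
# hypothetical; with ``partition_cols`` the path must be a directory, and a
# hive-style ``year=<value>/`` subdirectory is created per partition value.
def _example_partitioned_write() -> None:
    df = DataFrame({"year": [2022, 2022, 2023], "value": [1, 2, 3]})
    # Requires pyarrow (or fastparquet) to be installed.
    to_parquet(df, "dataset", partition_cols=["year"])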


@doc(storage_options=_shared_docs["storage_options"])
def read_parquet(
    path: FilePath | ReadBuffer[bytes],
    engine: str = "auto",
    columns: list[str] | None = None,
    storage_options: StorageOptions | None = None,
    use_nullable_dtypes: bool | lib.NoDefault = lib.no_default,
    dtype_backend: DtypeBackend | lib.NoDefault = lib.no_default,
    filesystem: Any = None,
    filters: list[tuple] | list[list[tuple]] | None = None,
    **kwargs,
) -> DataFrame:
    """
    Load a parquet object from the file path, returning a DataFrame.

    Parameters
    ----------
    path : str, path object or file-like object
        String, path object (implementing ``os.PathLike[str]``), or file-like
        object implementing a binary ``read()`` function.
        The string could be a URL. Valid URL schemes include http, ftp, s3,
        gs, and file. For file URLs, a host is expected. A local file could be:
        ``file://localhost/path/to/table.parquet``.
        A file URL can also be a path to a directory that contains multiple
        partitioned parquet files. Both pyarrow and fastparquet support
        paths to directories as well as file URLs. A directory path could be:
        ``file://localhost/path/to/tables`` or ``s3://bucket/partition_dir``.
    engine : {{'auto', 'pyarrow', 'fastparquet'}}, default 'auto'
        Parquet library to use. If 'auto', then the option
        ``io.parquet.engine`` is used. The default ``io.parquet.engine``
        behavior is to try 'pyarrow', falling back to 'fastparquet' if
        'pyarrow' is unavailable.

        When using the ``'pyarrow'`` engine and no storage options are provided
        and a filesystem is implemented by both ``pyarrow.fs`` and ``fsspec``
        (e.g. "s3://"), then the ``pyarrow.fs`` filesystem is attempted first.
        Use the filesystem keyword with an instantiated fsspec filesystem
        if you wish to use its implementation.
    columns : list, default=None
        If not None, only these columns will be read from the file.
    {storage_options}

        .. versionadded:: 1.3.0

    use_nullable_dtypes : bool, default False
        If True, use dtypes that use ``pd.NA`` as missing value indicator
        for the resulting DataFrame. (only applicable for the ``pyarrow``
        engine)
        As new dtypes are added that support ``pd.NA`` in the future, the
        output with this option will change to use those dtypes.
        Note: this is an experimental option, and behaviour (e.g. additional
        support dtypes) may change without notice.

        .. deprecated:: 2.0

    dtype_backend : {{'numpy_nullable', 'pyarrow'}}, default 'numpy_nullable'
        Back-end data type applied to the resultant :class:`DataFrame`
        (still experimental). Behaviour is as follows:

        * ``"numpy_nullable"``: returns nullable-dtype-backed :class:`DataFrame`
          (default).
        * ``"pyarrow"``: returns pyarrow-backed nullable :class:`ArrowDtype`
          DataFrame.

        .. versionadded:: 2.0

    filesystem : fsspec or pyarrow filesystem, default None
        Filesystem object to use when reading the parquet file. Only implemented
        for ``engine="pyarrow"``.

        .. versionadded:: 2.1.0

    filters : List[Tuple] or List[List[Tuple]], default None
        To filter out data.
        Filter syntax: [[(column, op, val), ...],...]
        where op is [==, =, >, >=, <, <=, !=, in, not in]
        The innermost tuples are transposed into a set of filters applied
        through an `AND` operation.
        The outer list combines these sets of filters through an `OR`
        operation.
        A single list of tuples can also be used, meaning that no `OR`
        operation between set of filters is to be conducted.

        Using this argument will NOT result in row-wise filtering of the final
        partitions unless ``engine="pyarrow"`` is also specified. For
        other engines, filtering is only performed at the partition level, that is,
        to prevent the loading of some row-groups and/or files.

        .. versionadded:: 2.1.0

    **kwargs
        Any additional kwargs are passed to the engine.

    Returns
    -------
    DataFrame

    See Also
    --------
    DataFrame.to_parquet : Create a parquet object that serializes a DataFrame.

    Examples
    --------
    >>> original_df = pd.DataFrame(
    ...     {{"foo": range(5), "bar": range(5, 10)}}
    ... )
    >>> original_df
       foo  bar
    0    0    5
    1    1    6
    2    2    7
    3    3    8
    4    4    9
    >>> df_parquet_bytes = original_df.to_parquet()
    >>> from io import BytesIO
    >>> restored_df = pd.read_parquet(BytesIO(df_parquet_bytes))
    >>> restored_df
       foo  bar
    0    0    5
    1    1    6
    2    2    7
    3    3    8
    4    4    9
    >>> restored_df.equals(original_df)
    True
    >>> restored_bar = pd.read_parquet(BytesIO(df_parquet_bytes), columns=["bar"])
    >>> restored_bar
       bar
    0    5
    1    6
    2    7
    3    8
    4    9
    >>> restored_bar.equals(original_df[['bar']])
    True

    The function uses `kwargs` that are passed directly to the engine.
    In the following example, we use the `filters` argument of the pyarrow
    engine to filter the rows of the DataFrame.

    Since `pyarrow` is the default engine, we can omit the `engine` argument.
    Note that the `filters` argument is implemented by the `pyarrow` engine,
    which can benefit from multithreading and also potentially be more
    economical in terms of memory.

    >>> sel = [("foo", ">", 2)]
    >>> restored_part = pd.read_parquet(BytesIO(df_parquet_bytes), filters=sel)
    >>> restored_part
       foo  bar
    0    3    8
    1    4    9
    """
    impl = get_engine(engine)

    if use_nullable_dtypes is not lib.no_default:
        msg = (
            "The argument 'use_nullable_dtypes' is deprecated and will be removed "
            "in a future version."
        )
        if use_nullable_dtypes is True:
            msg += (
                "Use dtype_backend='numpy_nullable' instead of use_nullable_dtype=True."
            )
        warnings.warn(msg, FutureWarning, stacklevel=find_stack_level())
    else:
        use_nullable_dtypes = False
    check_dtype_backend(dtype_backend)

    return impl.read(
        path,
        columns=columns,
        filters=filters,
        storage_options=storage_options,
        use_nullable_dtypes=use_nullable_dtypes,
        dtype_backend=dtype_backend,
        filesystem=filesystem,
        **kwargs,
    )
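

# ---------------------------------------------------------------------------
# Illustrative sketch (not part of pandas): a buffer round trip through the
# two public functions above, mirroring the doctest in ``read_parquet``.
# Assumes pyarrow (or fastparquet) is installed; guarded so it never runs on
# import.
if __name__ == "__main__":
    frame = DataFrame({"foo": range(5), "bar": range(5, 10)})
    # ``to_parquet`` returns the serialized bytes when ``path`` is None.
    buf = io.BytesIO(to_parquet(frame))
    # A single list of tuples is an AND-ed filter set (see the docstring).
    print(read_parquet(buf, filters=[("foo", ">", 2)]))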