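from __future__ import annotations

from typing import (
    Any,
    Tuple,
)

from pyarrow.interchange.column import (
    DtypeKind,
    ColumnBuffers,
    ColumnNullType,
)

import pyarrow as pa
import re

import pyarrow.compute as pc
from pyarrow.interchange.column import Dtype


# Placeholder aliases for the interchange protocol objects.
DataFrameObject = Any
ColumnObject = Any
BufferObject = Any


# Maps (dtype kind, bit width) to the corresponding PyArrow data type.
_PYARROW_DTYPES: dict[DtypeKind, dict[int, Any]] = {
    DtypeKind.INT: {8: pa.int8(),
                    16: pa.int16(),
                    32: pa.int32(),
                    64: pa.int64()},
    DtypeKind.UINT: {8: pa.uint8(),
                     16: pa.uint16(),
                     32: pa.uint32(),
                     64: pa.uint64()},
    DtypeKind.FLOAT: {16: pa.float16(),
                      32: pa.float32(),
                      64: pa.float64()},
    DtypeKind.BOOL: {1: pa.bool_(),
                     8: pa.uint8()},
    DtypeKind.STRING: {8: pa.string()},
}


def from_dataframe(df: DataFrameObject, allow_copy=True) -> pa.Table:
    """
    Build a ``pa.Table`` from any DataFrame supporting the interchange protocol.

    Parameters
    ----------
    df : DataFrameObject
        Object supporting the interchange protocol, i.e. `__dataframe__`
        method.
    allow_copy : bool, default: True
        Whether to allow copying the memory to perform the conversion
        (if false then zero-copy approach is requested).

    Returns
    -------
    pa.Table

    Examples
    --------
    >>> import pyarrow
    >>> from pyarrow.interchange import from_dataframe

    Convert a pandas dataframe to a pyarrow table:

    >>> import pandas as pd
    >>> df = pd.DataFrame({
    ...         "n_attendees": [100, 10, 1],
    ...         "country": ["Italy", "Spain", "Slovenia"],
    ...     })
    >>> df
       n_attendees   country
    0          100     Italy
    1           10     Spain
    2            1  Slovenia
    >>> from_dataframe(df)
    pyarrow.Table
    n_attendees: int64
    country: large_string
    ----
    n_attendees: [[100,10,1]]
    country: [["Italy","Spain","Slovenia"]]
    """
    # PyArrow objects need no conversion through the protocol.
    if isinstance(df, pa.Table):
        return df
    elif isinstance(df, pa.RecordBatch):
        return pa.Table.from_batches([df])

    if not hasattr(df, "__dataframe__"):
        raise ValueError("`df` does not support __dataframe__")

    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
                           allow_copy=allow_copy)


def _from_dataframe(df: DataFrameObject, allow_copy=True):
    """
    Build a ``pa.Table`` from the DataFrame interchange object.

    Parameters
    ----------
    df : DataFrameObject
        Object supporting the interchange protocol, i.e. `__dataframe__`
        method.
    allow_copy : bool, default: True
        Whether to allow copying the memory to perform the conversion
        (if false then zero-copy approach is requested).

    Returns
    -------
    pa.Table
    """
    batches = []
    for chunk in df.get_chunks():
        batch = protocol_df_chunk_to_pyarrow(chunk, allow_copy)
        batches.append(batch)

    # A producer may yield no chunks; convert the object itself in that case.
    if not batches:
        batch = protocol_df_chunk_to_pyarrow(df)
        batches.append(batch)

    return pa.Table.from_batches(batches)


def protocol_df_chunk_to_pyarrow(
    df: DataFrameObject,
    allow_copy: bool = True
) -> pa.RecordBatch:
    """
    Convert interchange protocol chunk to ``pa.RecordBatch``.

    Parameters
    ----------
    df : DataFrameObject
        Object supporting the interchange protocol, i.e. `__dataframe__`
        method.
    allow_copy : bool, default: True
        Whether to allow copying the memory to perform the conversion
        (if false then zero-copy approach is requested).

    Returns
    -------
    pa.RecordBatch
    """
    # Collect one pa.Array per column, keyed by column name.
    columns: dict[str, pa.Array] = {}
    for name in df.column_names():
        if not isinstance(name, str):
            raise ValueError(f"Column {name} is not a string")
        if name in columns:
            raise ValueError(f"Column {name} is not unique")
        col = df.get_column_by_name(name)
        dtype = col.dtype[0]
        if dtype in (
            DtypeKind.INT,
            DtypeKind.UINT,
            DtypeKind.FLOAT,
            DtypeKind.STRING,
            DtypeKind.DATETIME,
        ):
            columns[name] = column_to_array(col, allow_copy)
        elif dtype == DtypeKind.BOOL:
            columns[name] = bool_column_to_array(col, allow_copy)
        elif dtype == DtypeKind.CATEGORICAL:
            columns[name] = categorical_column_to_dictionary(col, allow_copy)
        else:
            raise NotImplementedError(f"Data type {dtype} not handled yet")

    return pa.RecordBatch.from_pydict(columns)


def column_to_array(
    col: ColumnObject,
    allow_copy: bool = True,
) -> pa.Array:
    """
    Convert a column holding one of the primitive dtypes to a PyArrow array.
    A primitive type is one of: int, uint, float, bool (1 bit).

    Parameters
    ----------
    col : ColumnObject
    allow_copy : bool, default: True
        Whether to allow copying the memory to perform the conversion
        (if false then zero-copy approach is requested).

    Returns
    -------
    pa.Array
    """
    buffers = col.get_buffers()
    data_type = col.dtype
    data = buffers_to_array(buffers, data_type,
                            col.size(),
                            col.describe_null,
                            col.offset,
                            allow_copy)
    return data


def bool_column_to_array(
    col: ColumnObject,
    allow_copy: bool = True,
) -> pa.Array:
    """
    Convert a column holding boolean dtype to a PyArrow array.

    Parameters
    ----------
    col : ColumnObject
    allow_copy : bool, default: True
        Whether to allow copying the memory to perform the conversion
        (if false then zero-copy approach is requested).

    Returns
    -------
    pa.Array
    """
    buffers = col.get_buffers()
    size = buffers["data"][1][1]

    # Byte-packed booleans must be copied into a bit-packed array.
    if size == 8 and not allow_copy:
        raise RuntimeError(
            "Boolean column will be casted from uint8 and a copy "
            "is required which is forbidden by allow_copy=False"
        )

    data_type = col.dtype
    data = buffers_to_array(buffers, data_type,
                            col.size(),
                            col.describe_null,
                            col.offset)
    if size == 8:
        data = pc.cast(data, pa.bool_())

    return data


def categorical_column_to_dictionary(
    col: ColumnObject,
    allow_copy: bool = True,
) -> pa.DictionaryArray:
    """
    Convert a column holding categorical data to a pa.DictionaryArray.

    Parameters
    ----------
    col : ColumnObject
    allow_copy : bool, default: True
        Whether to allow copying the memory to perform the conversion
        (if false then zero-copy approach is requested).

    Returns
    -------
    pa.DictionaryArray
    """
    if not allow_copy:
        raise RuntimeError(
            "Categorical column will be casted from uint8 and a copy "
            "is required which is forbidden by allow_copy=False"
        )

    categorical = col.describe_categorical

    if not categorical["is_dictionary"]:
        raise NotImplementedError(
            "Non-dictionary categoricals not supported yet")

    # Convert the categories column first, then the integer indices.
    cat_column = categorical["categories"]
    dictionary = column_to_array(cat_column)

    buffers = col.get_buffers()
    _, data_type = buffers["data"]
    indices = buffers_to_array(buffers, data_type,
                               col.size(),
                               col.describe_null,
                               col.offset)

    dict_array = pa.DictionaryArray.from_arrays(indices, dictionary)
    return dict_array


def parse_datetime_format_str(format_str):
    """Parse datetime `format_str` to interpret the `data`."""

    # Expected form: 'ts{unit}:{tz}'
    timestamp_meta = re.match(r"ts([smun]):(.*)", format_str)
    if timestamp_meta:
        unit, tz = timestamp_meta.group(1), timestamp_meta.group(2)
        if unit != "s":
            # The format string carries only the first letter of the unit:
            # 'm' -> 'ms', 'u' -> 'us', 'n' -> 'ns'
            unit += "s"

        return unit, tz

    raise NotImplementedError(f"DateTime kind is not supported: {format_str}")


def map_date_type(data_type):
    """Map column date type to pyarrow date type. """
    kind, bit_width, f_string, _ = data_type

    if kind == DtypeKind.DATETIME:
        unit, tz = parse_datetime_format_str(f_string)
        return pa.timestamp(unit, tz=tz)
    else:
        pa_dtype = _PYARROW_DTYPES.get(kind, {}).get(bit_width, None)

        # Raise if the dtype has no PyArrow counterpart in the mapping.
        if pa_dtype:
            return pa_dtype
        else:
            raise NotImplementedError(
                f"Conversion for {data_type} is not yet supported.")


def buffers_to_array(
    buffers: ColumnBuffers,
    data_type: Tuple[DtypeKind, int, str, str],
    length: int,
    describe_null: ColumnNullType,
    offset: int = 0,
    allow_copy: bool = True,
) -> pa.Array:
    """
    Build a PyArrow array from the passed buffers.

    Parameters
    ----------
    buffers : ColumnBuffers
        Dictionary containing tuples of underlying buffers and
        their associated dtype.
    data_type : Tuple[DtypeKind, int, str, str],
        Dtype description of the column as a tuple ``(kind, bit-width, format string,
        endianness)``.
    length : int
        The number of values in the array.
    describe_null: ColumnNullType
        Null representation the column dtype uses,
        as a tuple ``(kind, value)``
    offset : int, default: 0
        Number of elements to offset from the start of the buffer.
    allow_copy : bool, default: True
        Whether to allow copying the memory to perform the conversion
        (if false then zero-copy approach is requested).

    Returns
    -------
    pa.Array

    Notes
    -----
    The returned array doesn't own the memory. The caller of this function
    is responsible for keeping the memory owner object alive as long as
    the returned PyArrow array is being used.
    """
    data_buff, _ = buffers["data"]
    try:
        validity_buff, validity_dtype = buffers["validity"]
    except TypeError:
        validity_buff = None
    try:
        offset_buff, offset_dtype = buffers["offsets"]
    except TypeError:
        offset_buff = None

    # Wrap the data buffer without copying it.
    data_pa_buffer = pa.foreign_buffer(data_buff.ptr, data_buff.bufsize,
                                       base=data_buff)

    # Build the validity buffer from a mask or from NaN/sentinel values.
    if validity_buff:
        validity_pa_buff = validity_buffer_from_mask(validity_buff,
                                                     validity_dtype,
                                                     describe_null,
                                                     length,
                                                     offset,
                                                     allow_copy)
    else:
        validity_pa_buff = validity_buffer_nan_sentinel(data_pa_buffer,
                                                        data_type,
                                                        describe_null,
                                                        length,
                                                        offset,
                                                        allow_copy)

    data_dtype = map_date_type(data_type)

    if offset_buff:
        _, offset_bit_width, _, _ = offset_dtype
        # An offsets buffer is present, so this is a string column.
        offset_pa_buffer = pa.foreign_buffer(offset_buff.ptr,
                                             offset_buff.bufsize,
                                             base=offset_buff)

        if data_type[2] == 'U':
            string_type = pa.large_string()
        else:
            if offset_bit_width == 64:
                string_type = pa.large_string()
            else:
                string_type = pa.string()
        array = pa.Array.from_buffers(
            string_type,
            length,
            [validity_pa_buff, offset_pa_buffer, data_pa_buffer],
            offset=offset,
        )
    else:
        array = pa.Array.from_buffers(
            data_dtype,
            length,
            [validity_pa_buff, data_pa_buffer],
            offset=offset,
        )

    return array


def validity_buffer_from_mask(
    validity_buff: BufferObject,
    validity_dtype: Dtype,
    describe_null: ColumnNullType,
    length: int,
    offset: int = 0,
    allow_copy: bool = True,
) -> pa.Buffer:
    """
    Build a PyArrow buffer from the passed mask buffer.

    Parameters
    ----------
    validity_buff : BufferObject
        Tuple of underlying validity buffer and associated dtype.
    validity_dtype : Dtype
        Dtype description as a tuple ``(kind, bit-width, format string,
        endianness)``.
    describe_null : ColumnNullType
        Null representation the column dtype uses,
        as a tuple ``(kind, value)``
    length : int
        The number of values in the array.
    offset : int, default: 0
        Number of elements to offset from the start of the buffer.
    allow_copy : bool, default: True
        Whether to allow copying the memory to perform the conversion
        (if false then zero-copy approach is requested).

    Returns
    -------
    pa.Buffer
    """
    null_kind, sentinel_val = describe_null
    validity_kind, _, _, _ = validity_dtype
    assert validity_kind == DtypeKind.BOOL

    if null_kind == ColumnNullType.NON_NULLABLE:
        # No missing values are declared, so no validity buffer is needed.
        return None

    elif null_kind == ColumnNullType.USE_BYTEMASK or (
        null_kind == ColumnNullType.USE_BITMASK and sentinel_val == 1
    ):
        buff = pa.foreign_buffer(validity_buff.ptr,
                                 validity_buff.bufsize,
                                 base=validity_buff)

        if null_kind == ColumnNullType.USE_BYTEMASK:
            if not allow_copy:
                raise RuntimeError(
                    "To create a bitmask a copy of the data is "
                    "required which is forbidden by allow_copy=False"
                )
            mask = pa.Array.from_buffers(pa.int8(), length,
                                         [None, buff],
                                         offset=offset)
            mask_bool = pc.cast(mask, pa.bool_())
        else:
            mask_bool = pa.Array.from_buffers(pa.bool_(), length,
                                              [None, buff],
                                              offset=offset)

        # A sentinel of 1 marks nulls; Arrow expects 1 to mean valid.
        if sentinel_val == 1:
            mask_bool = pc.invert(mask_bool)

        return mask_bool.buffers()[1]

    elif null_kind == ColumnNullType.USE_BITMASK and sentinel_val == 0:
        return pa.foreign_buffer(validity_buff.ptr,
                                 validity_buff.bufsize,
                                 base=validity_buff)
    else:
        raise NotImplementedError(
            f"{describe_null} null representation is not yet supported.")


def validity_buffer_nan_sentinel(
    data_pa_buffer: BufferObject,
    data_type: Dtype,
    describe_null: ColumnNullType,
    length: int,
    offset: int = 0,
    allow_copy: bool = True,
) -> pa.Buffer:
    """
    Build a PyArrow buffer from NaN or sentinel values.

    Parameters
    ----------
    data_pa_buffer : pa.Buffer
        PyArrow buffer for the column data.
    data_type : Dtype
        Dtype description as a tuple ``(kind, bit-width, format string,
        endianness)``.
    describe_null : ColumnNullType
        Null representation the column dtype uses,
        as a tuple ``(kind, value)``
    length : int
        The number of values in the array.
    offset : int, default: 0
        Number of elements to offset from the start of the buffer.
    allow_copy : bool, default: True
        Whether to allow copying the memory to perform the conversion
        (if false then zero-copy approach is requested).

    Returns
    -------
    pa.Buffer
    """
    kind, bit_width, _, _ = data_type
    data_dtype = map_date_type(data_type)
    null_kind, sentinel_val = describe_null

    # Nulls encoded as float NaN.
    if null_kind == ColumnNullType.USE_NAN:
        if not allow_copy:
            raise RuntimeError(
                "To create a bitmask a copy of the data is "
                "required which is forbidden by allow_copy=False"
            )

        if kind == DtypeKind.FLOAT and bit_width == 16:
            raise NotImplementedError(
                f"{data_type} with {null_kind} is not yet supported.")
        else:
            pyarrow_data = pa.Array.from_buffers(
                data_dtype,
                length,
                [None, data_pa_buffer],
                offset=offset,
            )
            mask = pc.is_nan(pyarrow_data)
            mask = pc.invert(mask)
            return mask.buffers()[1]

    # Nulls encoded as a sentinel value.
    elif null_kind == ColumnNullType.USE_SENTINEL:
        if not allow_copy:
            raise RuntimeError(
                "To create a bitmask a copy of the data is "
                "required which is forbidden by allow_copy=False"
            )

        # Datetime data is compared against the sentinel as int64.
        if kind == DtypeKind.DATETIME:
            sentinel_dtype = pa.int64()
        else:
            sentinel_dtype = data_dtype
        pyarrow_data = pa.Array.from_buffers(sentinel_dtype,
                                             length,
                                             [None, data_pa_buffer],
                                             offset=offset)
        sentinel_arr = pc.equal(pyarrow_data, sentinel_val)
        mask_bool = pc.invert(sentinel_arr)
        return mask_bool.buffers()[1]

    elif null_kind == ColumnNullType.NON_NULLABLE:
        pass
    else:
        raise NotImplementedError(
            f"{describe_null} null representation is not yet supported.")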