K iddlZddlmZddlZddlZddlmZddlmZddl Z ddl Z ddl m Z ddl mZddlmZmZmZ ddlmZddlmZmZ ddlZddlmZdd lmZdd lm Z  ddl!Z"e jFj(Z$d Z%d Z&e jFj2d Z'e jFj2dZ(e jFj2dZ)e jFj2dZ*e jFj2dZ+dZ,dZ-dZ.e jFj^dZ0e jFj2dZ1e jFj2dZ2dZ3e jFj2dZ4dZ5e jFjmdde jnge jFjmdddZ8dZ9d Z:d!Z;d"Zd%Z?d&Z@e jFj2d'ZAe jFj2d(ZBe jFj2d)ZCd*ZDd+ZEe jFj2d,ZFd-ZGe jFj2e jFje jFjd.e jFjd/d0ZJe jFjmd1d2d3d4d5ge jFjmd6d7d8gd9ZKd:ZLd;ZMd<ZNd=ZOd>ZPd?ZQe jFjd@ZSy#e$rdZYwxYw#e$rdxZZYwxYw#e$rdZ"YwxYw)AN) OrderedDict)copytree)Decimal)fs)util)_check_roundtrip_roundtrip_table_test_dataframe) _read_table _write_table)dataframe_with_lists)alltypes_samplec:tjdgdi}tjtd5t ||dz ddddtjtd5t ||dz d dddy#1swY?xYw#1swYyxYw) Naz"Unsupported Parquet format versionmatchztest_version.parquetz2.2versionz%Unsupported Parquet data page version)data_page_version)patablepytestraises ValueErrorr )tempdirrs f/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/pyarrow/tests/parquet/test_basic.pytest_parquet_invalid_versionr!;s HHc9% &E z)M NMUG&<Ysj  B HH E :EUa$&7 y !;#(*E ;;;;s A..A7c g}tjjtd}|j tj j |gdzt\}}tjj|}|j tj j |gdzdD]}dD]}|D]}t|d||!y)Nr4sizer)z1.0z2.0)TF2.6)rruse_dictionary) r RecordBatchr7rappendr( from_batchesr r)tablesbatchr9_rrCrs r test_chunked_table_writerJfsF NN & &B'? @E MM"((''! 45 "EB NN & &r *E MM"((''! 45+3+ 3N 3 5&7#13 3 33r/cHtd}tjj|}t |ddidt |dz }t |d5}t||d dddtj|d }|j|sJy#1swY4xYw) Nr4r@ memory_mapTrBread_table_kwargsrtmp_filewbr)rL rrr(r7rstropenr pq read_pandasequalsrr9rfilenamef table_reads r test_memory_mapr[ys b !B HH  $EU|T.B"$7Z'(H h .UAu-.T:J   U ## #.. BB!cHtd}tjj|}t |ddidt |dz }t |d5}t||d dddtj|d }|j|sJy#1swY4xYw) Nr4r@ buffer_sizeirBrMrOrPri)r^rQrWs r test_enable_buffered_streamr_s b !B HH  $EU}d.C"$7Z'(H h .UAu-.d;J   U ## #..r\c>tjjtjdggdg}d}||z }|j rJt |t ||j sJtt |}|j|sJy)N*intsz foo # bar) rr(r)r'existsr rRr rV)rrrXpathrZs r test_special_chars_filenameres HH "((B4.!1F8 .MockParquetDatasetctd)NMockParquetDataset) ImportError)selfargskwargss r __init__zDtest_read_table_without_dataset..MockParquetDataset.__init__s23 3r/N)__name__ __module__ __qualname__rvrjr/r rqros 4r/rq test.parquetrrz#pyarrow.parquet.core.ParquetDataset)newzthe 'filters' keywordr)integer=r)filterszthe 'partitioning' keywordweekcolor) partitioningzthe 'schema' argumentschema) unittestrmrrr patchrrrrTrhrOSError)rrmrqrdrresults r test_read_table_without_datasetrsT44 ^ #D HHc9% &E 9?Q R  ]]:-D E ? MM$)<(= > ? ]]:-I J @ MM$fg-> ? @ ]]:-D E 5 MM$u|| 4 5]]7 # # MM' " #t$   ? ? @ @ 5 5 # #  slF*E*#F&E6#F#"F!F&F<$F*E3 /F6E? ;FF FF FF#cttjttdgdg}t |dy)Ni@r#r$r)row_group_size)rrlistranger)r+s r (test_file_with_over_int16_max_row_groupsrs, $uU|$%dV4AQq)r/cZtd}tjj|}tjj |j Dcgc]}|j dddc}|jj}|jjdjtjk(sJ|jjdjtjtjk(sJt|dycc}w) Nr4r@rr$null null_listrBr)rrr(r7r) itercolumnschunkrr%fieldtyperlist_r)r9rcols r test_empty_table_roundtriprs b !B HH  $E HH %*%6%6%89c1bq 9ll   ! "E <<  f % * *bggi 77 7 <<  k * / /288BGGI3F FF F u :sD(ctj}tjj |d}t |y)NFr2)pd DataFramerr(r7r)r9emptys r test_empty_table_no_columnsrs. B HH E :EUr/c Httjtjtj}gtddgg}|Dcgc]:}tj |tj |j<}}|Dcgc]6}tjj|tj|8}}tjj|tj|}t|ycc}wcc}w)N)int32 list_stringr)Grr)rrrrstringr'structflattenrDr)rr(rFr)colsdatarH my_arrays my_batchestbls r 1test_write_nested_zero_length_array_chunk_failurers hhjHHRYY[) D 1&9< =D#$%biio6>>@$I$ )*..,,U299T?,K*J* ((   BIIdO tjdgdi}||z }tj|t |t j |5tj||}dddj|sJ|j|jrJt j |5tj|||dddtj|}|j|sJy#1swYxYw#1swY>xYw)Nrrr) rrrT write_tablerRr change_cwdrhrVunlinkrc)rrrrrdrs r test_relative_pathsr&s HHc9% &E T>DNN5#d)$  !<t ;< ==  KKM{{}   !; udz:; ]]4 F ==  <<;;sD=DDDctjt5tjddddy#1swYyxYw)Nzi-am-not-existing.parquet)rrFileNotFoundErrorrTrhrjr/r test_read_non_existing_filer?s1 ( )3 12333s 9AcGddtj}tjtd5t j |ddddy#1swYyxYw)NceZdZdZdZy)3test_file_error_python_exception..BogusFilectdNzorglubZeroDivisionErrorrsrts r readz8test_file_error_python_exception..BogusFile.readG #I. .r/ctdrrrs r seekz8test_file_error_python_exception..BogusFile.seekJrr/N)rwrxryrrrjr/r BogusFilerFs  / /r/rrrr/)ioBytesIOrrrrTrh)rs r test_file_error_python_exceptionrEsH/BJJ/ ( :& in%&&&s AA#ctjdgdi}tj|t |dz t t |dz d5}tj |}dddj|sJt t |dz d5}tj tj|}ddd|j|sJy#1swYyxYw#1swY)xYw)Nrrrrb) rrrTrrRrSrhrV PythonFile)rrrYrs r test_parquet_read_from_bufferrRs HHc9% &ENN5#g678 c'N*+T 2"aq!" ==   c'N*+T 21ar}}Q/01 ==   ""11sC")C."C+.C7c tjtttt d}tjttt t d}tjddgdz}||g}tjj|ddg}t||ddd t||ddgdg t||dddgddg tjj||||ggd }t||dd gdd g tjj|gdg}tjtd5t||dddddy#1swYyxYw)Nr1TF2rbr$gzip)expected compressionrCuse_byte_stream_splitrrcdrr)rrCrtmpBYTE_STREAM_SPLIT only supportsr)rrrC) rr'rmapfloatrintr(r)rrrIOError) arr_floatarr_intarr_bool data_floatr mixed_tables r test_byte_stream_splitr`sbc%s456IhhtCU3Z012Gxxu *+HY'J HH C: >EUU$)G UU%(E,/52 UU%(#J,/:7 ((&& 9gw'O-A'CK[;%(#J,/:7 HH (E7 ;E w&G H/d(- ////s E33E<c Ltjtttt dtj dd}tjtttt dtj dd}tjtttt dtj dd}tjdd gd z}|||g}tjj|gd  }t||d d dtjj|d}tj||d d dtj|}|j j#d} |j j#d} | j$dk(sJ| j$dk(sJt||d d ddddtjj||||ggd } t| | d dy)Nr1rr TFrrrrr$r)rrrCstore_decimal_as_integerrz)rrCrrrINT32INT64DELTA_BINARY_PACKEDrr)rrrCrcolumn_encodingr)rrCr)rr'rrrr decimal128r(r)rosrdjoinrTrrircolumn physical_type) rarr_decimal_1_9arr_decimal_10_18arr_decimal_gt18r data_decimalrpqtestfile_path pqtestfilepqcol_decimal_1_9pqcol_decimal_10_18rs r test_store_decimal_as_integerr shhtCs$<=$&MM!Q$79Oc'5:&>!?&(mmB&:<xxS%*%= >%']]2q%9;xxu *+H#%68HIL HH _ EEU#!'$).2 4ggll7N;ONN5/%"',02 0J"))003$++2215  * *g 55 5  , , 77 7U#!'$).233& ((&& +-=xH"'$K[)$).24r/c &tjtttt d}tjttt t d}tjt dDcgc] }t|c}tj}tjt dDcgc]}t|jdc}tjd}tjgddz}tjj|||||ggd}t||dd d d d d  t||dd t||dd d d d t||dd d dd t||dd d ddd  t||dddi tjtd5t||dd d d d dddtjt d5t||dd d d d dddtjt"d5t||dd dddtjt"d5t||dddi dddtjt"5t||dgdd i dddtjt"5t||dd idddtjt"5t||ddgdd d ddddtjt"5t||dddd d ddddtjt$5t||dd dddycc}wcc}w#1swYxYw#1swYxYw#1swYhxYw#1swY@xYw#1swYxYw#1swYxYw#1swYxYw#1swYxYw#1swYyxYw) Nr1rr4)FTFF)rrrrer$FBYTE_STREAM_SPLITPLAINr)rrCrrrDELTA_LENGTH_BYTE_ARRAYDELTA_BYTE_ARRAYrRLErr)rrrz)DELTA_BINARY_PACKED encoder only supportsz+'RLE_DICTIONARY' is already used by defaultRLE_DICTIONARYz/Unsupported column encoding: 'MADE_UP_ENCODING'rMADE_UP_ENCODINGr)rr)rrCrrT)rr'rrrrrrRbinaryzfillr(r)rrrrrrr=)rrrarr_binarr_flbarrs r test_column_encodingrsc%s456IhhtCU3Z012Ghhc 31A3"))+FGxx#(:.aQb .RYYr]DHxx3b89H((&& GWh9'')K [;u+>+>+2+>&@A[;$)%,. [;$)+2+@+2&45[;$)+2+@+D&FG[;$)+2+@+=+=&?@[;$)&)5\3 w> @E{(-/6/6/B*D EE wH J9{(-/D/6/6*8 99 zJ L;{(-)9 ;; zN PD{(-*-/A)B DD z "9{),*-w 99 z "9{*-w 99 z "9{(-03u/4/B/6*8 99 z "9{(-/3/4/B/6*8 99 y !/{(-)- ///o4.`EE99;;DD9999 9999//sxN!NN"N/N<9O ,O O#O/ O;?P"N,/N9<O OO #O,/O8;PPc Dtjtttt d}||g}tj j|ddg}t||ddt||ddt||dd d  t||dd d d t||ddt||ddgd}tj}|D]<\}}tjttf5t||||ddd>y#1swYIxYw)Nrrr$rr)rrcompression_levelrsnappyr)rrrrlz4r))r)ri)rgi)lzo)rr)rr'rrrrr(r)rrrrrrrr )r*rrinvalid_combinationsbufcodeclevels r test_compression_levelr(<s" ((4Ct-. /C :D HH c3Z 8EUU'(* UU'(*UU'-H!=?UU-.Q'79 UU'(*UU'(*8 **,C.2 ]]J0 1 2 +0 2 2 22 2 2s ;DD ctjgd}d}tjj|g|g}t |ddi}d}|j dj |k(sJy)N)rrrrr!z prohib; , {}flavorspark)write_table_kwargs prohib______r)rr'r(r)r rr)a0rrr expected_names r test_sanitized_spark_field_namesr0isb / "B D HH "v .E e78K LF"M ==  M 11 1r/c>td}tjj|}t j }t ||dd|jdt|d}|jdt|d }|j|sJy) Ni'r@SNAPPYrB)rrrT) use_threadsF) rrr(r7rrr rr rV)r9rr%table1table2s r test_multithreaded_readr6tsy e $B HH  $E **,C5AHHQK $ /FHHQK % 0F ==  r/ctjtjdggd}tj j |j}tj}t||d|jdt|}|j|sJtjt 5t||ddddy#1swYyxYw)Nr!)ABCD)columns) chunk_sizer)rrrrrr(r7 reset_indexrrr rr rVrrr)rrr%rs r test_min_chunksizer@s <<10D ED HH !1!1!3 4E **,C+HHQK  F ==   z "/UCA.///s CC&cftjtdttddt j ddj dt j ddd gd tjtdtjd d tjd ddtjd ddd }tjj|}|dz } t||d|jrJy#tj$rY(wxYw)Nabcrr!ru1@@float64rTFT20130101periodsz US/Eastern)rKtzns)rKfreq) rrrrrrYghirOr5r)rrrrrrastype Categorical date_rangerr(r7r ArrowExceptionrc)rr9pdfrXs r (test_write_error_deletes_incomplete_filerWs DK q!-IIaO2248IIc3i@/NN4;7MM*a@MM*a-9;MM*adK M NB ((  r "C#H  S(E2         s9DD0/D0cd} tj|y#t$r}||jdvsJYd}~yd}~wwxYw)Nznonexistent-file.parquetr)rTrh Exceptionrt)rrdrs r test_read_non_existent_filerZs> %D! d !qvvay   !s A;Actj5tjdtj|dz dddy#1swYyxYw)Nerror)actionzv0.7.1.parquet)warningscatch_warnings simplefilterrTrh)datadirs r test_read_table_doesnt_warnrbsC  "2W- g 001222s /A  Acztjjtjddggdg}t j }t j||d|jdt j|}tj|j|jy)NrBdefsome_colrrr) rr(r)r'rrrTrrrhrrr)rrY roundtrips r test_zlib_compression_bugrhs HH "((E5>":!;j\ JE ANN5!0FF1I a I)--/1BCr/ct|dz }tjtjt fd5t |d5} dddtj|dddtjtjt fd5t |d5}|jddddtj|dddy#1swYxYw#1swYxYw#1swY?xYw#1swYyxYw)Nrzzsize is 0 bytesrrPzsize is 4 bytessffff) rRrrr ArrowInvalidrrSrTrhwrite)rrdrYs r test_parquet_file_too_smallrls w' (D 1. 0 $     d  1. 0 $   GGG   d      sG C*C C* D&C68DC' #C**C36C? ;DD zignore:RangeIndex:FutureWarningz.ignore:tostring:DeprecationWarning:fastparquetc tjd}tjt dt t ddt jdddgd tjd d tjgd d}tj|}t|dz }tj||d|j|}|j!}t#j$||t|dz }|j'||tj(|}|dj+t,|d<t#j$|j!|y)N fastparquetrBrr!rErFrGrrHrIrrJ)rrr)rrrrrrYzcross_compat_arrow.parquetrfz cross_compat_fastparquet.parquetrY)r importorskiprrrrrrrTrSrrrRrTrrirrrrkrUrRobject) rfpr9r file_arrowfp_filedf_fpfile_fastparquettable_fps r $test_fastparquet_cross_compatibilityrws(   ] +B eeAqk"395$z150  B HHRLEW;;r|sBHHaY^ $r/cRtjddgdzjSrzrr'dictionary_encoderjr/r r|r|s BHHaY^ $ 6 6 8r/c6tjddgdzSNr4r{rjr/r r|r|sBHHb$Z"_ %r/cRtjddgdzjSrr~rjr/r r|r|s BHHb$Z"_ % 7 7 9r/read_dictionaryFTctjjd|i}tj}t j ||d|jd|rdgnd}t j|d|}|jD]E}|j\}|jd}|j|jdzk(rEJy) NrT)rCrF)r3rr)rr( from_pydictrrrTrrrhr<chunksbuffers to_pybytesrA)rxr orig_tablebiorrrr%s r test_buffer_contentsrs%%umo&>?J **,CNN:s48HHQK!0ugdO MM#5*9 ;E}}4**mmoa ~~388e#33334r/ctjtjtdgdg}|dz }t j ||dt j |}|j|sJy)Nr!rbr$zarrow-10480.pyarrow.gzGZIPrf)rrr'rrTrrhrV)rrrdrs r "test_parquet_compression_roundtriprsa HHbhhuQx()& :E - -DNN5$F3 ]]4 F ==  r/ctjjtjgdgdg}|dz }d}t j ||j 5}t|D]}|j| dddt j|}|jj|k(sJt|D]$}|j|j|r$Jy#1swYlxYw)Nrrr#zempty_row_groups.parquetr)rr(r)r'rT ParquetWriterrrrrimetadatanum_row_groupsread_row_grouprV)rrrd num_groupswriterrQreaders r test_empty_row_groupsr+s HH "((2G"? & '1 ,, ,+A.M-- $OK  $$ $ 1134H B<8B< '' '!)"x|HRL(2,!?2  23,}/A/AA##H-MM"5=BDM J && & BHHc<%89 99 9 w&A BP MM-$ OPPPs E##E,)Tr collectionsrrr^shutilrdecimalrrpyarrowrr pyarrow.testsrpyarrow.tests.parquet.commonrr r pyarrow.parquetparquetrTr r rrpandasrpandas.testingtestingrpyarrow.tests.pandas_examplesr rnumpyrmark pytestmarkr!r.r:r>rJr[r_rerkrslowrrrrrr parametrizeLocalFileSystemrrrrrr rr(r0r6r@rWrZrbrhrlrnfilterwarningsrwrrrrrrrdatasetrrjr/r rs$ #  ;; F B< [[ .= ; ;33$ $ $ $ $$2**   8''$ A  "   (!@A B *3 &  #/L34l}/@*2Z2!!" / / !!4!2 DD  =>LM!4N?!4H$8%9 + *UDM:4; 4(  6$ &67%t1P1Pa BNB  Bs5M M+$M;M('M(+ M87M8;NN