''' Classes for read / write of matlab (TM) 5 files

The matfile specification last found here:

https://www.mathworks.com/access/helpdesk/help/pdf_doc/matlab/matfile_format.pdf

(as of December 5 2008)

=================================
 Note on functions and mat files
=================================

The document above does not give any hints as to the storage of matlab
function handles, or anonymous function handles. I had, therefore, to
guess the format of matlab arrays of ``mxFUNCTION_CLASS`` and
``mxOPAQUE_CLASS`` by looking at example mat files.

``mxFUNCTION_CLASS`` stores all types of matlab functions. It seems to
contain a struct matrix with a set pattern of fields. For anonymous
functions, a sub-field of one of these fields seems to contain the
well-named ``mxOPAQUE_CLASS``. This seems to contain:

* array flags as for any matlab matrix
* 3 int8 strings
* a matrix

It seems that whenever the mat file contains a ``mxOPAQUE_CLASS``
instance, there is also an un-named matrix (name == '') at the end of
the mat file. I'll call this the ``__function_workspace__`` matrix.

When I saved two anonymous functions in a mat file, or appended another
anonymous function to the mat file, there was still only one
``__function_workspace__`` un-named matrix at the end, but larger than
that for a mat file with a single anonymous function, suggesting that
the workspaces for the two functions had been merged.

The ``__function_workspace__`` matrix appears to be of double class
(``mxDOUBLE_CLASS``), but stored as uint8, the memory for which is in
the format of a mini .mat file, without the first 124 bytes of the file
header (the description and the subsystem_offset), but with the version
U2 bytes, and the S2 endian test bytes. There follow 4 zero bytes,
presumably for 8 byte padding, and then a series of ``miMATRIX``
entries, as in a standard mat file. The ``miMATRIX`` entries appear to
be a series of un-named (name == '') matrices, and may also contain
arrays of this same mini-mat format.

I guess that:

* saving an anonymous function back to a mat file will need the
  associated ``__function_workspace__`` matrix saved as well for the
  anonymous function to work correctly.
* appending to a mat file that has a ``__function_workspace__`` would
  involve first pulling off this workspace, appending, checking whether
  there were any more anonymous functions appended, and then somehow
  merging the relevant workspaces, and saving at the end of the mat
  file.

The mat files I was playing with are in ``tests/data``:

* sqr.mat
* parabola.mat
* some_functions.mat

See ``tests/test_mio.py:test_mio_funcs.py`` for the debugging
script I was working with.

Small fragments of current code adapted from matfile.py by Heiko
Henkelmann; parts of the code for simplify_cells=True adapted from
http://blog.nephics.com/2019/08/28/better-loadmat-for-scipy/.
'''

import math
import os
import time
import sys
import zlib

from io import BytesIO

import warnings

import numpy as np

import scipy.sparse

from ._byteordercodes import native_code, swapped_code

from ._miobase import (MatFileReader, docfiller, matdims, read_dtype,
                       arr_to_chars, arr_dtype_number, MatWriteError,
                       MatReadError, MatReadWarning, MatWriteWarning)

# Reader object for matlab 5 format variables
from ._mio5_utils import VarReader5

# Constants and helper objects
from ._mio5_params import (MatlabObject, MatlabFunction, MDTYPES, NP_TO_MTYPES,
                           NP_TO_MXTYPES, miCOMPRESSED, miMATRIX, miINT8,
                           miUTF8, miUINT32, mxCELL_CLASS, mxSTRUCT_CLASS,
                           mxOBJECT_CLASS, mxCHAR_CLASS, mxSPARSE_CLASS,
                           mxDOUBLE_CLASS, mclass_info, mat_struct)

from ._streams import ZlibInputStream


def _has_struct(elem):
    """Determine if elem is an array and if first array item is a struct."""
    return (isinstance(elem, np.ndarray) and (elem.size > 0) and (elem.ndim > 0)
            and isinstance(elem[0], mat_struct))


def _inspect_cell_array(ndarray):
    """Construct lists from cell arrays (loaded as numpy ndarrays), recursing
    into items if they contain mat_struct objects."""
    elem_list = []
    for sub_elem in ndarray:
        if isinstance(sub_elem, mat_struct):
            elem_list.append(_matstruct_to_dict(sub_elem))
        elif _has_struct(sub_elem):
            elem_list.append(_inspect_cell_array(sub_elem))
        else:
            elem_list.append(sub_elem)
    return elem_list


def _matstruct_to_dict(matobj):
    """Construct nested dicts from mat_struct objects."""
    d = {}
    for f in matobj._fieldnames:
        elem = matobj.__dict__[f]
        if isinstance(elem, mat_struct):
            d[f] = _matstruct_to_dict(elem)
        elif _has_struct(elem):
            d[f] = _inspect_cell_array(elem)
        else:
            d[f] = elem
    return d


def _simplify_cells(d):
    """Convert mat objects in dict to nested dicts."""
    for key in d:
        if isinstance(d[key], mat_struct):
            d[key] = _matstruct_to_dict(d[key])
        elif _has_struct(d[key]):
            d[key] = _inspect_cell_array(d[key])
    return d
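

# Illustrative sketch only (hypothetical names, not part of the scipy API):
# the helpers above turn loaded ``mat_struct`` objects into nested dicts and
# recurse into cell arrays that hold structs. Here the hypothetical
# ``_ExampleStruct`` class stands in for ``mat_struct``, and plain lists stand
# in for object ndarrays, so the recursion can be followed without NumPy.
# Unlike the real helpers, which only recurse into arrays whose first item is
# a struct, this sketch recurses into every list.
class _ExampleStruct:
    def __init__(self, **fields):
        # Record field order the same way mat_struct keeps ``_fieldnames``.
        self._fieldnames = list(fields)
        self.__dict__.update(fields)


def _example_struct_to_dict(obj):
    # Structs become dicts, lists are rebuilt item by item, and anything
    # else is returned unchanged.
    if isinstance(obj, _ExampleStruct):
        return {name: _example_struct_to_dict(obj.__dict__[name])
                for name in obj._fieldnames}
    if isinstance(obj, list):
        return [_example_struct_to_dict(item) for item in obj]
    return obj

# For example, a struct holding a nested struct and a "cell array"
# [_ExampleStruct(c=3), 4] converts to {'inner': {...}, 'cells': [{'c': 3}, 4]}.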
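

class MatFile5Reader(MatFileReader):
    ''' Reader for Mat 5 mat files
    Adds the following attribute to base class

    uint16_codec - char codec to use for uint16 char arrays
        (defaults to system default codec)

    Uses variable reader that has the following standard interface (see
    abstract class in ``miobase``)::

       __init__(self, file_reader)
       read_header(self)
       array_from_header(self)

    and added interface::

       set_stream(self, stream)
       read_full_tag(self)

    '''
    @docfiller
    def __init__(self,
                 mat_stream,
                 byte_order=None,
                 mat_dtype=False,
                 squeeze_me=False,
                 chars_as_strings=True,
                 matlab_compatible=False,
                 struct_as_record=True,
                 verify_compressed_data_integrity=True,
                 uint16_codec=None,
                 simplify_cells=False):
        '''Initializer for matlab 5 file format reader

    %(matstream_arg)s
    %(load_args)s
    %(struct_arg)s
    uint16_codec : {None, string}
        Set codec to use for uint16 char arrays (e.g., 'utf-8').
        Use system default codec if None
        '''
        super().__init__(
            mat_stream,
            byte_order,
            mat_dtype,
            squeeze_me,
            chars_as_strings,
            matlab_compatible,
            struct_as_record,
            verify_compressed_data_integrity,
            simplify_cells)
        # Set uint16 codec
        if not uint16_codec:
            uint16_codec = sys.getdefaultencoding()
        self.uint16_codec = uint16_codec
        # placeholders for readers - see initialize_read method
        self._file_reader = None
        self._matrix_reader = None

    def guess_byte_order(self):
        ''' Guess byte order.
        Sets stream pointer to 0'''
        # The 2-byte endian indicator sits at offset 126 of the file header
        self.mat_stream.seek(126)
        mi = self.mat_stream.read(2)
        self.mat_stream.seek(0)
        return mi == b'IM' and '<' or '>'

    def read_file_header(self):
        ''' Read in mat 5 file header '''
        hdict = {}
        hdr_dtype = MDTYPES[self.byte_order]['dtypes']['file_header']
        hdr = read_dtype(self.mat_stream, hdr_dtype)
        hdict['__header__'] = hdr['description'].item().strip(b' \t\n\000')
        v_major = hdr['version'] >> 8
        v_minor = hdr['version'] & 0xFF
        hdict['__version__'] = '%d.%d' % (v_major, v_minor)
        return hdict

    def initialize_read(self):
        ''' Run when beginning read of variables

        Sets up readers from parameters in `self`
        '''
        # reader for top level stream. We need this extra top-level
        # reader because we use the matrix_reader object to contain
        # compressed matrices (so they have their own stream)
        self._file_reader = VarReader5(self)
        # reader for matrix streams
        self._matrix_reader = VarReader5(self)

    def read_var_header(self):
        ''' Read header, return header, next position

        Header has to define at least .name and .is_global

        Parameters
        ----------
        None

        Returns
        -------
        header : object
           object that can be passed to self.read_var_array, and that
           has attributes .name and .is_global
        next_position : int
           position in stream of next variable
        '''
        mdtype, byte_count = self._file_reader.read_full_tag()
        if not byte_count > 0:
            raise ValueError("Did not read any bytes")
        next_pos = self.mat_stream.tell() + byte_count
        if mdtype == miCOMPRESSED:
            # Make new stream from compressed data
            stream = ZlibInputStream(self.mat_stream, byte_count)
            self._matrix_reader.set_stream(stream)
            check_stream_limit = self.verify_compressed_data_integrity
            mdtype, byte_count = self._matrix_reader.read_full_tag()
        else:
            check_stream_limit = False
            self._matrix_reader.set_stream(self.mat_stream)
        if not mdtype == miMATRIX:
            raise TypeError(f'Expecting miMATRIX type here, got {mdtype}')
        header = self._matrix_reader.read_header(check_stream_limit)
        return header, next_pos

    def read_var_array(self, header, process=True):
        ''' Read array, given `header`

        Parameters
        ----------
        header : header object
           object with fields defining variable header
        process : {True, False} bool, optional
           If True, apply recursive post-processing during loading of
           array.

        Returns
        -------
        arr : array
           array with post-processing applied or not according to
           `process`.
        '''
        return self._matrix_reader.array_from_header(header, process)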

    def get_variables(self, variable_names=None):
        ''' get variables from stream as dictionary

        variable_names   - optional list of variable names to get

        If variable_names is None, then get all variables in file
        '''
        if isinstance(variable_names, str):
            variable_names = [variable_names]
        elif variable_names is not None:
            variable_names = list(variable_names)

        self.mat_stream.seek(0)
        # Here we pass all the parameters in self to the reading objects
        self.initialize_read()
        mdict = self.read_file_header()
        mdict['__globals__'] = []
        while not self.end_of_stream():
            hdr, next_position = self.read_var_header()
            name = 'None' if hdr.name is None else hdr.name.decode('latin1')
            if name in mdict:
                msg = (
                    f'Duplicate variable name "{name}" in stream'
                    " - replacing previous with new\nConsider "
                    "scipy.io.matlab.varmats_from_mat to split "
                    "file into single variable files"
                )
                warnings.warn(msg, MatReadWarning, stacklevel=2)
            if name == '':
                # can only be a matlab 7 function workspace
                name = '__function_workspace__'
                # We want to keep this raw because mat_dtype processing
                # will break the format (uint8 as mxDOUBLE_CLASS)
                process = False
            else:
                process = True
            if variable_names is not None and name not in variable_names:
                self.mat_stream.seek(next_position)
                continue
            try:
                res = self.read_var_array(hdr, process)
            except MatReadError as err:
                warnings.warn(
                    f'Unreadable variable "{name}", because "{err}"',
                    Warning, stacklevel=2)
                res = f"Read error: {err}"
            self.mat_stream.seek(next_position)
            mdict[name] = res
            if hdr.is_global:
                mdict['__globals__'].append(name)
            if variable_names is not None:
                variable_names.remove(name)
                if len(variable_names) == 0:
                    break
        if self.simplify_cells:
            return _simplify_cells(mdict)
        else:
            return mdict

    def list_variables(self):
        ''' list variables from stream '''
        self.mat_stream.seek(0)
        # Here we pass all the parameters in self to the reading objects
        self.initialize_read()
        self.read_file_header()
        vars = []
        while not self.end_of_stream():
            hdr, next_position = self.read_var_header()
            name = 'None' if hdr.name is None else hdr.name.decode('latin1')
            if name == '':
                # can only be a matlab 7 function workspace
                name = '__function_workspace__'

            shape = self._matrix_reader.shape_from_header(hdr)
            if hdr.is_logical:
                info = 'logical'
            else:
                info = mclass_info.get(hdr.mclass, 'unknown')
            vars.append((name, shape, info))

            self.mat_stream.seek(next_position)
        return vars
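

# Illustrative sketch only (hypothetical helper, not part of the scipy API):
# a standard-library picture of what ``guess_byte_order`` and
# ``read_file_header`` above do with the 128-byte MAT 5 file header. It
# assumes the layout described in the module docstring: 116 bytes of
# description text plus an 8-byte subsystem offset (124 bytes), then the
# 2-byte version and the 2-byte 'IM'/'MI' endian test.
def _example_parse_mat5_header(raw):
    import struct

    if len(raw) < 128:
        raise ValueError('a MAT 5 file header is 128 bytes long')
    # b'IM' at offset 126 means the file was written little-endian.
    byte_order = '<' if raw[126:128] == b'IM' else '>'
    (version,) = struct.unpack(byte_order + 'H', raw[124:126])
    return {
        'byte_order': byte_order,
        '__header__': raw[:116].strip(b' \t\n\x00'),
        '__version__': '%d.%d' % (version >> 8, version & 0xFF),
    }

# A header whose version field holds 0x0100 reports '__version__' as '1.0',
# whichever byte order it was written in.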


def varmats_from_mat(file_obj):
    """ Pull variables out of mat 5 file as a sequence of mat file objects

    This can be useful with a difficult mat file, containing unreadable
    variables. This routine pulls the variables out in raw form and puts them,
    unread, back into a file stream for saving or reading. Another use is the
    pathological case where there is more than one variable of the same name in
    the file; this routine returns the duplicates, whereas the standard reader
    will overwrite duplicates in the returned dictionary.

    The file pointer in `file_obj` will be undefined. File pointers for the
    returned file-like objects are set at 0.

    Parameters
    ----------
    file_obj : file-like
        file object containing mat file

    Returns
    -------
    named_mats : list
        list contains tuples of (name, BytesIO) where BytesIO is a file-like
        object containing mat file contents as for a single variable. The
        BytesIO contains the bytes of the original file header followed by a
        single variable. If ``var_file_obj`` is one of these BytesIO
        instances, it can be saved as a mat file with something like
        ``open('test.mat', 'wb').write(var_file_obj.read())``

    Examples
    --------
    >>> import scipy.io
    >>> import numpy as np
    >>> from io import BytesIO
    >>> from scipy.io.matlab._mio5 import varmats_from_mat
    >>> mat_fileobj = BytesIO()
    >>> scipy.io.savemat(mat_fileobj, {'b': np.arange(10), 'a': 'a string'})
    >>> varmats = varmats_from_mat(mat_fileobj)
    >>> sorted([name for name, str_obj in varmats])
    ['a', 'b']
    """
    rdr = MatFile5Reader(file_obj)
    file_obj.seek(0)
    # Raw read of top-level file header
    hdr_len = MDTYPES[native_code]['dtypes']['file_header'].itemsize
    raw_hdr = file_obj.read(hdr_len)
    # Initialize variable reading
    file_obj.seek(0)
    rdr.initialize_read()
    rdr.read_file_header()
    next_position = file_obj.tell()
    named_mats = []
    while not rdr.end_of_stream():
        start_position = next_position
        hdr, next_position = rdr.read_var_header()
        name = 'None' if hdr.name is None else hdr.name.decode('latin1')
        # Read raw variable string
        file_obj.seek(start_position)
        byte_count = next_position - start_position
        var_str = file_obj.read(byte_count)
        # write to BytesIO object
        out_obj = BytesIO()
        out_obj.write(raw_hdr)
        out_obj.write(var_str)
        out_obj.seek(0)
        named_mats.append((name, out_obj))
    return named_mats


class EmptyStructMarker:
    """ Class to indicate presence of empty matlab struct on output """
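

# Illustrative sketch only (hypothetical helper, not part of the scipy API):
# the raw splitting done by ``varmats_from_mat`` above, shown with the
# standard library alone. It assumes a file whose top-level elements all carry
# a full 8-byte tag (uint32 data type, uint32 byte count) in the given byte
# order. Each element is sliced out unread and prefixed with the original
# 128-byte file header, giving a stand-alone single-variable MAT file. Unlike
# ``varmats_from_mat``, it returns the element data type instead of the name.
def _example_split_mat5(raw, byte_order='<'):
    import struct

    header = raw[:128]
    pieces = []
    pos = 128
    while pos + 8 <= len(raw):
        mdtype, byte_count = struct.unpack(byte_order + 'II', raw[pos:pos + 8])
        end = pos + 8 + byte_count
        pieces.append((mdtype, header + raw[pos:end]))
        pos = end
    return pieces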
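

def to_writeable(source):
    ''' Convert input object ``source`` to something we can write

    Parameters
    ----------
    source : object

    Returns
    -------
    arr : None or ndarray or EmptyStructMarker
        If `source` cannot be converted to something we can write to a matfile,
        return None.  If `source` is equivalent to an empty dictionary, return
        ``EmptyStructMarker``.  Otherwise return `source` converted to an
        ndarray with contents for writing to matfile.
    '''
    if isinstance(source, np.ndarray):
        return source
    if source is None:
        return None
    if hasattr(source, "__array__"):
        return np.asarray(source)
    # Objects that implement mappings
    is_mapping = (hasattr(source, 'keys') and hasattr(source, 'values') and
                  hasattr(source, 'items'))
    # Objects that don't implement mappings, but do have dicts
    if isinstance(source, np.generic):
        # NumPy scalars are never mappings (PyPy issue workaround)
        pass
    elif not is_mapping and hasattr(source, '__dict__'):
        source = {key: value for key, value in source.__dict__.items()
                  if not key.startswith('_')}
        is_mapping = True
    if is_mapping:
        dtype = []
        values = []
        for field, value in source.items():
            if (isinstance(field, str) and
                    field[0] not in '_0123456789'):
                dtype.append((str(field), object))
                values.append(value)
            else:
                msg = (f"Starting field name with a underscore "
                       f"or a digit ({field}) is ignored")
                warnings.warn(msg, MatWriteWarning, stacklevel=2)
        if dtype:
            return np.array([tuple(values)], dtype)
        else:
            return EmptyStructMarker
    # Next try and convert to an array
    try:
        narr = np.asanyarray(source)
    except ValueError:
        narr = np.asanyarray(source, dtype=object)
    if narr.dtype.type in (object, np.object_) and \
       narr.shape == () and narr == source:
        # No interesting conversion possible
        return None
    return narr


# Native byte ordered dtypes for convenience for writers
NDT_FILE_HDR = MDTYPES[native_code]['dtypes']['file_header']
NDT_TAG_FULL = MDTYPES[native_code]['dtypes']['tag_full']
NDT_TAG_SMALL = MDTYPES[native_code]['dtypes']['tag_smalldata']
NDT_ARRAY_FLAGS = MDTYPES[native_code]['dtypes']['array_flags']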
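

# Illustrative sketch only (hypothetical helper, not part of the scipy API):
# the field-name rule ``to_writeable`` above applies when it turns a mapping,
# or an object's public ``__dict__``, into a one-element struct record. Keys
# that are not strings, or that start with an underscore or a digit, are
# dropped. ``to_writeable`` also emits a ``MatWriteWarning`` for every mapping
# key it skips; this sketch just reports them.
def _example_struct_fields(source):
    if not (hasattr(source, 'keys') and hasattr(source, 'values')
            and hasattr(source, 'items')):
        # Plain objects: only public attributes are considered.
        source = {k: v for k, v in vars(source).items()
                  if not k.startswith('_')}
    kept, ignored = [], []
    for field in source:
        if isinstance(field, str) and field and field[0] not in '_0123456789':
            kept.append(field)
        else:
            ignored.append(field)
    return kept, ignored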
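

class VarWriter5:
    ''' Generic matlab matrix writing class '''
    mat_tag = np.zeros((), NDT_TAG_FULL)
    mat_tag['mdtype'] = miMATRIX

    def __init__(self, file_writer):
        self.file_stream = file_writer.file_stream
        self.unicode_strings = file_writer.unicode_strings
        self.long_field_names = file_writer.long_field_names
        self.oned_as = file_writer.oned_as
        # These are used for top level writes, and unset after
        self._var_name = None
        self._var_is_global = False

    def write_bytes(self, arr):
        self.file_stream.write(arr.tobytes(order='F'))

    def write_string(self, s):
        self.file_stream.write(s)

    def write_element(self, arr, mdtype=None):
        ''' write tag and data '''
        if mdtype is None:
            mdtype = NP_TO_MTYPES[arr.dtype.str[1:]]
        # Array needs to be in native byte order
        if arr.dtype.byteorder == swapped_code:
            arr = arr.byteswap().view(arr.dtype.newbyteorder())
        byte_count = arr.size * arr.itemsize
        if byte_count <= 4:
            self.write_smalldata_element(arr, mdtype, byte_count)
        else:
            self.write_regular_element(arr, mdtype, byte_count)

    def write_smalldata_element(self, arr, mdtype, byte_count):
        # write tag with embedded data
        tag = np.zeros((), NDT_TAG_SMALL)
        tag['byte_count_mdtype'] = (byte_count << 16) + mdtype
        # if arr.tobytes is < 4, the element will be zero-padded as needed.
        tag['data'] = arr.tobytes(order='F')
        self.write_bytes(tag)

    def write_regular_element(self, arr, mdtype, byte_count):
        # write tag, data
        tag = np.zeros((), NDT_TAG_FULL)
        tag['mdtype'] = mdtype
        tag['byte_count'] = byte_count
        self.write_bytes(tag)
        self.write_bytes(arr)
        # pad to next 64-bit boundary
        bc_mod_8 = byte_count % 8
        if bc_mod_8:
            self.file_stream.write(b'\x00' * (8 - bc_mod_8))

    def write_header(self,
                     shape,
                     mclass,
                     is_complex=False,
                     is_logical=False,
                     nzmax=0):
        ''' Write header for given data options
        shape : sequence
           array shape
        mclass      - mat5 matrix class
        is_complex  - True if matrix is complex
        is_logical  - True if matrix is logical
        nzmax       - max non zero elements for sparse arrays

        We get the name and the global flag from the object, and reset
        them to defaults after we've used them
        '''
        # get name and is_global from one-shot object store
        name = self._var_name
        is_global = self._var_is_global
        # initialize the top-level matrix tag, store position
        self._mat_tag_pos = self.file_stream.tell()
        self.write_bytes(self.mat_tag)
        # write array flags (complex, global, logical, class, nzmax)
        af = np.zeros((), NDT_ARRAY_FLAGS)
        af['data_type'] = miUINT32
        af['byte_count'] = 8
        flags = is_complex << 3 | is_global << 2 | is_logical << 1
        af['flags_class'] = mclass | flags << 8
        af['nzmax'] = nzmax
        self.write_bytes(af)
        # shape
        self.write_element(np.array(shape, dtype='i4'))
        # write name
        name = np.asarray(name)
        if name == '':  # empty string zero-terminated
            self.write_smalldata_element(name, miINT8, 0)
        else:
            self.write_element(name, miINT8)
        # reset the one-shot store to defaults
        self._var_name = ''
        self._var_is_global = False

    def update_matrix_tag(self, start_pos):
        curr_pos = self.file_stream.tell()
        self.file_stream.seek(start_pos)
        byte_count = curr_pos - start_pos - 8
        if byte_count >= 2**32:
            raise MatWriteError("Matrix too large to save with Matlab "
                                "5 format")
        self.mat_tag['byte_count'] = byte_count
        self.write_bytes(self.mat_tag)
        self.file_stream.seek(curr_pos)

    def write_top(self, arr, name, is_global):
        """ Write variable at top level of mat file

        Parameters
        ----------
        arr : array_like
            array-like object to create writer for
        name : str, optional
            name as it will appear in matlab workspace
            default is empty string
        is_global : {False, True}, optional
            whether variable will be global on load into matlab
        """
        # these are set before the top-level header write, and unset at
        # the end of the same write, because they do not apply for lower levels
        self._var_is_global = is_global
        self._var_name = name
        # write the header and data
        self.write(arr)

    def write(self, arr):
        ''' Write `arr` to stream at top and sub levels

        Parameters
        ----------
        arr : array_like
            array-like object to create writer for
        '''
        # store position, so we can update the matrix tag
        mat_tag_pos = self.file_stream.tell()
        # First check if these are sparse
        if scipy.sparse.issparse(arr):
            self.write_sparse(arr)
            self.update_matrix_tag(mat_tag_pos)
            return
        # Try to convert things that aren't arrays
        narr = to_writeable(arr)
        if narr is None:
            raise TypeError(f'Could not convert {arr} (type {type(arr)}) '
                            'to array')
        if isinstance(narr, MatlabObject):
            self.write_object(narr)
        elif isinstance(narr, MatlabFunction):
            raise MatWriteError('Cannot write matlab functions')
        elif narr is EmptyStructMarker:  # empty struct array
            self.write_empty_struct()
        elif narr.dtype.fields:  # struct array
            self.write_struct(narr)
        elif narr.dtype.hasobject:  # cell array
            self.write_cells(narr)
        elif narr.dtype.kind in ('U', 'S'):
            if self.unicode_strings:
                codec = 'UTF8'
            else:
                codec = 'ascii'
            self.write_char(narr, codec)
        else:
            self.write_numeric(narr)
        self.update_matrix_tag(mat_tag_pos)

    def write_numeric(self, arr):
        imagf = arr.dtype.kind == 'c'
        logif = arr.dtype.kind == 'b'
        try:
            mclass = NP_TO_MXTYPES[arr.dtype.str[1:]]
        except KeyError:
            # No matching matlab type, probably complex256 / float128 / float96
            # Cast data to complex128 / float64.
            if imagf:
                arr = arr.astype('c128')
            elif logif:
                arr = arr.astype('i1')  # Should only contain 0/1
            else:
                arr = arr.astype('f8')
            mclass = mxDOUBLE_CLASS
        self.write_header(matdims(arr, self.oned_as),
                          mclass,
                          is_complex=imagf,
                          is_logical=logif)
        if imagf:
            self.write_element(arr.real)
            self.write_element(arr.imag)
        else:
            self.write_element(arr)

    def write_char(self, arr, codec='ascii'):
        ''' Write string array `arr` with given `codec`
        '''
        if arr.size == 0 or np.all(arr == ''):
            # Empty string array, or array of empty strings. Matlab stores
            # strings as char arrays, so both are written as an empty char
            # array; otherwise the zero padding would load as spaces.
            shape = (0,) * np.max([arr.ndim, 2])
            self.write_header(shape, mxCHAR_CLASS)
            self.write_smalldata_element(arr, miUTF8, 0)
            return
        # non-empty string: convert to char array
        arr = arr_to_chars(arr)
        # Write the shape directly, because recoding the characters may
        # change the length of the resulting byte stream
        shape = arr.shape
        self.write_header(shape, mxCHAR_CLASS)
        if arr.dtype.kind == 'U' and arr.size:
            # Make one long string from all the characters, transposed so
            # that the bytes come out in Fortran order.
            n_chars = math.prod(shape)
            st_arr = np.ndarray(shape=(),
                                dtype=arr_dtype_number(arr, n_chars),
                                buffer=arr.T.copy())  # Fortran order
            # Recode with codec to give byte string
            st = st_arr.item().encode(codec)
            # Reconstruct as 1-D byte array
            arr = np.ndarray(shape=(len(st),),
                             dtype='S1',
                             buffer=st)
        self.write_element(arr, mdtype=miUTF8)

    def write_sparse(self, arr):
        ''' Sparse matrices are 2D
        '''
        A = arr.tocsc()  # convert to sparse CSC format
        A.sort_indices()  # MATLAB expects sorted row indices
        is_complex = (A.dtype.kind == 'c')
        is_logical = (A.dtype.kind == 'b')
        nz = A.nnz
        self.write_header(matdims(arr, self.oned_as),
                          mxSPARSE_CLASS,
                          is_complex=is_complex,
                          is_logical=is_logical,
                          # matlab won't load file with 0 nzmax
                          nzmax=1 if nz == 0 else nz)
        self.write_element(A.indices.astype('i4'))
        self.write_element(A.indptr.astype('i4'))
        self.write_element(A.data.real)
        if is_complex:
            self.write_element(A.data.imag)

    def write_cells(self, arr):
        self.write_header(matdims(arr, self.oned_as),
                          mxCELL_CLASS)
        # loop over data, column major
        A = np.atleast_2d(arr).flatten('F')
        for el in A:
            self.write(el)

    def write_empty_struct(self):
        self.write_header((1, 1), mxSTRUCT_CLASS)
        # max field name length set to 1 in an example matlab struct
        self.write_element(np.array(1, dtype=np.int32))
        # Field names element is empty
        self.write_element(np.array([], dtype=np.int8))

    def write_struct(self, arr):
        self.write_header(matdims(arr, self.oned_as),
                          mxSTRUCT_CLASS)
        self._write_items(arr)

    def _write_items(self, arr):
        # write fieldnames
        fieldnames = [f[0] for f in arr.dtype.descr]
        length = max([len(fieldname) for fieldname in fieldnames]) + 1
        max_length = (self.long_field_names and 64) or 32
        if length > max_length:
            raise ValueError(f"Field names are restricted to {max_length - 1} "
                             "characters")
        self.write_element(np.array([length], dtype='i4'))
        self.write_element(
            np.array(fieldnames, dtype=f'S{length}'),
            mdtype=miINT8)
        A = np.atleast_2d(arr).flatten('F')
        for el in A:
            for f in fieldnames:
                self.write(el[f])

    def write_object(self, arr):
        '''Same as writing structs, except different mx class, and extra
        classname element after header
        '''
        self.write_header(matdims(arr, self.oned_as),
                          mxOBJECT_CLASS)
        self.write_element(np.array(arr.classname, dtype='S'),
                           mdtype=miINT8)
        self._write_items(arr)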
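

# Illustrative sketch only (hypothetical helper, not part of the scipy API):
# how ``write_element`` in ``VarWriter5`` above lays out one data element,
# assuming a little-endian file. Payloads of at most 4 bytes use the "small
# data" form: ``(byte_count << 16) + mdtype`` and the zero-padded payload
# share one 8-byte tag. Larger payloads get a full 8-byte tag
# (mdtype, byte_count) followed by the data, zero-padded to a 64-bit boundary.
def _example_pack_element(mdtype, payload):
    import struct

    byte_count = len(payload)
    if byte_count <= 4:
        return (struct.pack('<I', (byte_count << 16) + mdtype)
                + payload.ljust(4, b'\x00'))
    padding = b'\x00' * (-byte_count % 8)
    return struct.pack('<II', mdtype, byte_count) + payload + padding

# For example, 2 bytes of miINT8 (data type 1) fit in a single 8-byte tag,
# while 9 bytes need 8 (tag) + 9 (data) + 7 (padding) = 24 bytes.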


class MatFile5Writer:
    ''' Class for writing mat5 files '''

    @docfiller
    def __init__(self, file_stream,
                 do_compression=False,
                 unicode_strings=False,
                 global_vars=None,
                 long_field_names=False,
                 oned_as='row'):
        ''' Initialize writer for matlab 5 format files

        Parameters
        ----------
        %(do_compression)s
        %(unicode_strings)s
        global_vars : None or sequence of strings, optional
            Names of variables to be marked as global for matlab
        %(long_fields)s
        %(oned_as)s
        '''
        self.file_stream = file_stream
        self.do_compression = do_compression
        self.unicode_strings = unicode_strings
        if global_vars:
            self.global_vars = global_vars
        else:
            self.global_vars = []
        self.long_field_names = long_field_names
        self.oned_as = oned_as
        self._matrix_writer = None

    def write_file_header(self):
        # write header
        hdr = np.zeros((), NDT_FILE_HDR)
        hdr['description'] = (f'MATLAB 5.0 MAT-file Platform: {os.name}, '
                              f'Created on: {time.asctime()}')
        hdr['version'] = 0x0100
        hdr['endian_test'] = np.ndarray(shape=(),
                                        dtype='S2',
                                        buffer=np.uint16(0x4d49))
        self.file_stream.write(hdr.tobytes())

    def put_variables(self, mdict, write_header=None):
        ''' Write variables in `mdict` to stream

        Parameters
        ----------
        mdict : mapping
           mapping whose ``items`` method returns (name, contents) pairs,
           where ``name`` is the name that will appear in the matlab
           workspace on file load, and ``contents`` is something writeable
           to a matlab file, such as a NumPy array.
        write_header : {None, True, False}, optional
           If True, then write the matlab file header before writing the
           variables. If None (the default) then write the file header
           if we are at position 0 in the stream. By setting False
           here, and setting the stream position to the end of the file,
           you can append variables to a matlab file.
        '''
        # write header if requested, or None and start of file
        if write_header is None:
            write_header = self.file_stream.tell() == 0
        if write_header:
            self.write_file_header()
        self._matrix_writer = VarWriter5(self)
        for name, var in mdict.items():
            if name[0] == '_':
                msg = (f"Starting field name with a "
                       f"underscore ({name}) is ignored")
                warnings.warn(msg, MatWriteWarning, stacklevel=2)
                continue
            is_global = name in self.global_vars
            if self.do_compression:
                stream = BytesIO()
                self._matrix_writer.file_stream = stream
                self._matrix_writer.write_top(var, name.encode('latin1'), is_global)
                out_str = zlib.compress(stream.getvalue())
                tag = np.empty((), NDT_TAG_FULL)
                tag['mdtype'] = miCOMPRESSED
                tag['byte_count'] = len(out_str)
                self.file_stream.write(tag.tobytes())
                self.file_stream.write(out_str)
            else:  # not compressing
                self._matrix_writer.write_top(var, name.encode('latin1'), is_global)
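

# Illustrative sketch only (hypothetical helpers, not part of the scipy API):
# with ``do_compression=True``, ``put_variables`` above writes each variable
# into an in-memory stream, compresses those bytes with ``zlib.compress`` and
# stores the result as one ``miCOMPRESSED`` element (data type 15 in the MAT 5
# format) behind a full 8-byte tag, with no trailing padding. This sketch
# assumes a little-endian file and only shows wrapping and unwrapping one
# already-serialized element.
def _example_wrap_compressed(matrix_element):
    import struct
    import zlib

    out_str = zlib.compress(matrix_element)
    return struct.pack('<II', 15, len(out_str)) + out_str


def _example_unwrap_compressed(element):
    import struct
    import zlib

    mdtype, byte_count = struct.unpack('<II', element[:8])
    if mdtype != 15:
        raise ValueError('not an miCOMPRESSED element')
    return zlib.decompress(element[8:8 + byte_count])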