L iddlZddlZddlZddlmZddlmZddlmZm Z m Z m Z ddl Z ddlmZddlmZmZddlmZdd lmZmZmZdd lmZmZdd lmZmZmZm Z m!Z!m"Z"m#Z#m$Z$m%Z%m&Z&m'Z'm(Z(m)Z)dd l*m+Z+dd l,m-Z-ddl.m/Z/m0Z0m1Z1m2Z2m3Z3m4Z4m5Z5m6Z6m7Z7e&rddl8Z8e(rddl9m:Z;e)jxe=Z>dZ?e!de?e-dGddeZ@e"e@je@_Ae@jj8e@jjjddde@j_Byy)N)deepcopy)partial)AnyCallableOptionalUnion)custom_object_save) BatchFeature get_size_dict)BaseImageProcessorFast)ChannelDimensionSizeDictvalidate_kwargs)Unpack VideosKwargs) IMAGE_PROCESSOR_NAMEPROCESSOR_NAMEVIDEO_PROCESSOR_NAME TensorTypeadd_start_docstrings copy_func download_urlis_offline_mode is_remote_urlis_torch_availableis_torchcodec_availableis_torchvision_v2_availablelogging) cached_file)requires) VideoInput VideoMetadatagroup_videos_by_shapeis_valid_video load_videomake_batched_metadatamake_batched_videosreorder_videosto_channel_dimension_format) functionala Args: do_resize (`bool`, *optional*, defaults to `self.do_resize`): Whether to resize the video's (height, width) dimensions to the specified `size`. Can be overridden by the `do_resize` parameter in the `preprocess` method. size (`dict`, *optional*, defaults to `self.size`): Size of the output video after resizing. Can be overridden by the `size` parameter in the `preprocess` method. size_divisor (`int`, *optional*, defaults to `self.size_divisor`): The size by which to make sure both the height and width can be divided. default_to_square (`bool`, *optional*, defaults to `self.default_to_square`): Whether to default to a square video when resizing, if size is an int. resample (`PILImageResampling`, *optional*, defaults to `self.resample`): Resampling filter to use if resizing the video. Only has an effect if `do_resize` is set to `True`. Can be overridden by the `resample` parameter in the `preprocess` method. do_center_crop (`bool`, *optional*, defaults to `self.do_center_crop`): Whether to center crop the video to the specified `crop_size`. Can be overridden by `do_center_crop` in the `preprocess` method. crop_size (`dict[str, int]` *optional*, defaults to `self.crop_size`): Size of the output video after applying `center_crop`. Can be overridden by `crop_size` in the `preprocess` method. do_rescale (`bool`, *optional*, defaults to `self.do_rescale`): Whether to rescale the video by the specified scale `rescale_factor`. Can be overridden by the `do_rescale` parameter in the `preprocess` method. rescale_factor (`int` or `float`, *optional*, defaults to `self.rescale_factor`): Scale factor to use if rescaling the video. Only has an effect if `do_rescale` is set to `True`. Can be overridden by the `rescale_factor` parameter in the `preprocess` method. do_normalize (`bool`, *optional*, defaults to `self.do_normalize`): Whether to normalize the video. Can be overridden by the `do_normalize` parameter in the `preprocess` method. Can be overridden by the `do_normalize` parameter in the `preprocess` method. image_mean (`float` or `list[float]`, *optional*, defaults to `self.image_mean`): Mean to use if normalizing the video. This is a float or list of floats the length of the number of channels in the video. Can be overridden by the `image_mean` parameter in the `preprocess` method. Can be overridden by the `image_mean` parameter in the `preprocess` method. image_std (`float` or `list[float]`, *optional*, defaults to `self.image_std`): Standard deviation to use if normalizing the video. This is a float or list of floats the length of the number of channels in the video. Can be overridden by the `image_std` parameter in the `preprocess` method. Can be overridden by the `image_std` parameter in the `preprocess` method. do_convert_rgb (`bool`, *optional*, defaults to `self.image_std`): Whether to convert the video to RGB. video_metadata (`VideoMetadata`, *optional*): Metadata of the video containing information about total duration, fps and total number of frames. do_sample_frames (`int`, *optional*, defaults to `self.do_sample_frames`): Whether to sample frames from the video before processing or to process the whole video. num_frames (`int`, *optional*, defaults to `self.num_frames`): Maximum number of frames to sample when `do_sample_frames=True`. fps (`int` or `float`, *optional*, defaults to `self.fps`): Target frames to sample per second when `do_sample_frames=True`. return_tensors (`str` or `TensorType`, *optional*): Returns stacked tensors if set to `pt, otherwise returns a list of tensors. data_format (`ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST`): The channel dimension format for the output video. Can be one of: - `"channels_first"` or `ChannelDimension.FIRST`: video in (num_channels, height, width) format. - `"channels_last"` or `ChannelDimension.LAST`: video in (height, width, num_channels) format. - Unset: Use the channel dimension format of the input video. input_data_format (`ChannelDimension` or `str`, *optional*): The channel dimension format for the input video. If unset, the channel dimension format is inferred from the input video. Can be one of: - `"channels_first"` or `ChannelDimension.FIRST`: video in (num_channels, height, width) format. - `"channels_last"` or `ChannelDimension.LAST`: video in (height, width, num_channels) format. - `"none"` or `ChannelDimension.NONE`: video in (height, width) format. device (`torch.device`, *optional*): The device to process the videos on. If unset, the device is inferred from the input videos. return_metadata (`bool`, *optional*): Whether to return video metadata or not. z!Constructs a base VideoProcessor.)vision torchvision)backendsceZdZdZdZdZdZdZdZdZ dZ dZ dZ dZ dZdZdZdZdZdZdZdZeZdgZdeeddffd Zdefd Zd d defd Z d?d ede e!de e"e!e#ffdZ$ d?dede"ee%fde e&de e'de(d f dZ) d?dede e"e*e+fde e*de(d fdZ,e-e.dedeedefdZ/ d@de(d de&de&de0de dde&d e0d!e&d"e#d#e&d$e e"e#e(e#fd%e e"e#e(e#fd&e e"e*e1fdefd'Z2e3 dAd(e"e*e4jjfd)e e"e*e4jjfd*e&d+e&d,e e"e*e&fd-e*f d.Z6dBd/e"e*e4jjfd0e&fd1Z7e3d(e"e*e4jjfde8e%e*e9fe%e*e9fffd2Z:e3d3e%e*e9ffd4Z;de%e*e9ffd5Zd9Z?e3d:e"e*e4jjffd;Z@e3dCd<ZAd@d=e"e*e(e*e(e(e*ffd>ZBxZCS)DBaseVideoProcessorNTgp?Fpixel_values_videoskwargsreturnc t||jdd|_|j D]\}} t ||||jd|j}|'t||jd|jnd|_ |jd|j}| t|d nd|_ t|jjj!|_|j"D]E}|j%|t ||||%t ||t't)||dGy#t $r%}tjd|d|d||d}~wwxYw) Nprocessor_classz Can't set z with value z for sizedefault_to_square)r6r7 crop_size) param_name)super__init__pop_processor_classitemssetattrAttributeErrorloggererrorr6r r7r8list valid_kwargs__annotations__keysmodel_valid_processing_keysgetrgetattr)selfr2keyvalueerrr6r8 __class__s i/mnt/ssd/data/python-lab/Trading/venv/lib/python3.12/site-packages/transformers/video_processing_utils.pyr;zBaseVideoProcessor.__init__sj  & + 2x+X>*1'%%eGn5 ''1  2 $F-N~%% q *&)T*#'"3"3F";KKV LE!7 LRST$$o ~%%*.):):6Uf):)g&~%%!Ms3D% D ' D% D%input_data_formatdevicecg}|D]~}t|tjr>t|tj |}t j|j}||j|}|j||S)z: Prepare the input videos for processing. ) rvnpndarrayr*rFIRSTrh from_numpy contiguoustoru)rJrSrrprocessed_videosrVs rO_prepare_input_videosz(BaseVideoProcessor._prepare_input_videosFs +E%,3E;K;Q;QSde((/::<!(  # #E * + rUc t|jt|jjjdgz|jjD]}|j |t ||d!|jd}|jd}|jd}|jd}|rt|jfi|nd}|j||||\}}|j|||}|jd i|}|jd i||jd |jd } |jd d |i|} | r|| d<| S) Nreturn_tensors)captured_kwargsvalid_processor_keysrrmrrl)rlrmrn)rSrr data_formatreturn_metadatarS)rrFrCrDrE setdefaultrIr<rrkrr_further_process_kwargs_validate_preprocess_kwargs _preprocess) rJrSr2 kwarg_namerrmrrlrnrpreprocessed_videoss rOrRzBaseVideoProcessor.preprocess]s "KKM!%d&7&7&G&G&L&L&N!OScRd!d ++;; KJ   j'$ D*I J K#JJ':;!::&89H%$45EUGD$6$6A&A[_!%!?!? )-/ "@"  ++6M^gm+n---77(((262  =! **%67.d..GfGG 4B  0 1""rUdo_convert_rgb do_resizer6 interpolationzF.InterpolationModedo_center_cropr8 do_rescalerescale_factor do_normalize image_mean image_stdrc t|\}}i}|jD]3\}}|r|j|}|r|j|||}|||<5t ||}t|\}}i}|jD]4\}}|r|j ||}|j ||| | | | }|||<6t ||}| rtj|dn|}td|i| S)N)r6rrrpr1)data tensor_type) r$r>r`resizer) center_croprescale_and_normalizerhrxr )rJrSrrr6rrr8rrrrrrr2grouped_videosgrouped_videos_indexresized_videos_groupedr]stacked_videosresized_videosprocessed_videos_groupedrs rOrzBaseVideoProcessor._preprocesss1$0EV/L,,!#%3%9%9%; ; !E>!%!4!4^!D!%^$Vc!d,: "5 )  ; ((>@TU0E^/T,,#% %3%9%9%; = !E>!%!1!1.)!L!77 NL*V_N/= $U + =**BDXYCQ5;;'7Q?Wg"79I!JXfggrUpretrained_model_name_or_path cache_dirforce_downloadlocal_files_onlytokenrevisionc ||d<||d<||d<||d<|jdd}|)tjdt| t d|}|||d <|j |fi|\} }|j | fi|S) a Instantiate a type of [`~video_processing_utils.VideoProcessorBase`] from an video processor. Args: pretrained_model_name_or_path (`str` or `os.PathLike`): This can be either: - a string, the *model id* of a pretrained video hosted inside a model repo on huggingface.co. - a path to a *directory* containing a video processor file saved using the [`~video_processing_utils.VideoProcessorBase.save_pretrained`] method, e.g., `./my_model_directory/`. - a path or url to a saved video processor JSON *file*, e.g., `./my_model_directory/video_preprocessor_config.json`. cache_dir (`str` or `os.PathLike`, *optional*): Path to a directory in which a downloaded pretrained model video processor should be cached if the standard cache should not be used. force_download (`bool`, *optional*, defaults to `False`): Whether or not to force to (re-)download the video processor files and override the cached versions if they exist. resume_download: Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers. proxies (`dict[str, str]`, *optional*): A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request. token (`str` or `bool`, *optional*): The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use the token generated when running `hf auth login` (stored in `~/.huggingface`). revision (`str`, *optional*, defaults to `"main"`): The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git. To test a pull request you made on the Hub, you can pass `revision="refs/pr/"`. return_unused_kwargs (`bool`, *optional*, defaults to `False`): If `False`, then this function returns just the final video processor object. If `True`, then this functions returns a `Tuple(video_processor, unused_kwargs)` where *unused_kwargs* is a dictionary consisting of the key/value pairs whose keys are not video processor attributes: i.e., the part of `kwargs` which has not been used to update `video_processor` and is otherwise ignored. subfolder (`str`, *optional*, defaults to `""`): In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here. kwargs (`dict[str, Any]`, *optional*): The values in kwargs of any keys which are video processor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are *not* video processor attributes is controlled by the `return_unused_kwargs` keyword parameter. Returns: A video processor of type [`~video_processing_utils.ImagVideoProcessorBase`]. Examples: ```python # We can't instantiate directly the base class *VideoProcessorBase* so let's show the examples on a # derived class: *LlavaOnevisionVideoProcessor* video_processor = LlavaOnevisionVideoProcessor.from_pretrained( "llava-hf/llava-onevision-qwen2-0.5b-ov-hf" ) # Download video_processing_config from huggingface.co and cache. video_processor = LlavaOnevisionVideoProcessor.from_pretrained( "./test/saved_model/" ) # E.g. video processor (or model) was saved using *save_pretrained('./test/saved_model/')* video_processor = LlavaOnevisionVideoProcessor.from_pretrained("./test/saved_model/video_preprocessor_config.json") video_processor = LlavaOnevisionVideoProcessor.from_pretrained( "llava-hf/llava-onevision-qwen2-0.5b-ov-hf", do_normalize=False, foo=False ) assert video_processor.do_normalize is False video_processor, unused_kwargs = LlavaOnevisionVideoProcessor.from_pretrained( "llava-hf/llava-onevision-qwen2-0.5b-ov-hf", do_normalize=False, foo=False, return_unused_kwargs=True ) assert video_processor.do_normalize is False assert unused_kwargs == {"foo": False} ```rrrruse_auth_tokenNrThe `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.V`token` and `use_auth_token` are both specified. Please set only the argument `token`.r)r<warningswarn FutureWarningreget_video_processor_dict from_dict) clsrrrrrrr2rvideo_processor_dicts rOfrom_pretrainedz"BaseVideoProcessor.from_pretrainedst({#1 %5!"%z$4d;  % MME   l#E  #F7O'Cs'C'CDa'lek'l$fs}}1>. ) ?>2BBe!fg g NT2 #ZZ(8$?NjjN,@,@,Mb,QRG'd'':6:G#99.I     ' t^D A')ggll>CW&X# 56 /0K/LMN   ' ' -jj) ( ,,,rUc ||jdd}|jdd}|jdd}|jdd}|jdd}|jdd}|jd d} |jd d} |jd d } |jd d} |jdd} |)tjdt| t d|}d| d}| | |d<t r| st jdd} t|}tjj|}tjj|r|}d}n]t|r|}t|}nDt} tt t"fDcgc]}t%||||||| ||| | d x} | }}|d} t+|dd5}|j-}dddt/j0}|j3d|}|rt jd"|||fSt jd"d#|||fScc}w#t&$rt($rt'd|d|dtdwxYw#1swYxYw#t.j4$rt'd |d!wxYw)$a From a `pretrained_model_name_or_path`, resolve to a dictionary of parameters, to be used for instantiating a video processor of type [`~video_processing_utils.VideoProcessorBase`] using `from_dict`. Parameters: pretrained_model_name_or_path (`str` or `os.PathLike`): The identifier of the pre-trained checkpoint from which we want the dictionary of parameters. subfolder (`str`, *optional*, defaults to `""`): In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here. Returns: `tuple[Dict, Dict]`: The dictionary(ies) that will be used to instantiate the video processor object. rNrFresume_downloadproxiesrrrr subfolder_from_pipeline _from_autorrvideo processor) file_typefrom_auto_classusing_pipelinez+Offline mode: forcing local_files_only=TrueT) filenamerrrrrr user_agentrr%_raise_exceptions_for_missing_entriesrz Can't load video processor for 'z'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'z2' is the correct path to a directory containing a z filerutf-8encodingvideo_processorz"It looks like the config file at 'z' is not a valid JSON file.zloading configuration file z from cache at )r<rrrrerrArstrrrisdirrrrrrrr OSError ExceptionopenreadjsonloadsrHJSONDecodeError)rrr2rrrrrrrrr from_pipelinerris_localresolved_video_processor_filevideo_processor_filer resolved_fileresolved_video_processor_filesreadertextrs rOrz+BaseVideoProcessor.get_video_processor_dictes;$JJ{D1 $4e< **%6=**Y- 7D)$4d;!::&8%@::j$/JJ{B/  #3T:  **\59  % MME   l#E#4Y  $+8J' (  %5 KKE F# (+,I(J%77==!>? 77>>7 8,I )H 8 9#@ ,89V,W )#7 $  &:;OQ_$`2 )49%-&/+9$+,;-="''1%-&/BG *   ! "2.2*1Oq0Q- 3S7K %v{{} %#'::d#3 #7#;#;hasattrr?rurAr)rrr2rr to_removerKrLs rOrzBaseVideoProcessor.from_dicts " 488:%zz*@%H V *> >+1::f+=  ( & [4H%H06 ;0G  -5 45  ,,. &JC,e4  % & "C JJsD ! "  &&789 "F* *" "rUct|j}|jdd|jdd|jj|d<|S)z Serializes this instance to a Python dictionary. Returns: `dict[str, Any]`: Dictionary of all the attributes that make up this video processor instance. rGN_valid_kwargs_namesvideo_processor_type)r__dict__r<rN__name__)rJoutputs rOto_dictzBaseVideoProcessor.to_dictsJ$--( 0$7 ($/)-)@)@%& rUc|j}|jD]3\}}t|tjs!|j ||<5|j dd}|||d<tj|dddzS)z Serializes this instance to a JSON string. Returns: `str`: String containing all the attributes that make up this feature_extractor instance in JSON format. r=Nr5T)indent sort_keys ) rr>rvrrtolistr<rdumps)rJ dictionaryrKrLr=s rOto_json_stringz!BaseVideoProcessor.to_json_strings\\^ $**, 1JC%,"',,. 3 1 &>>* This API is experimental and may have some slight breaking changes in the next releases. Args: auto_class (`str` or `type`, *optional*, defaults to `"AutoVideoProcessor "`): The auto class to register this new video processor with. rNz is not a valid auto class.) rvrrtransformers.models.automodelsautorrer)r auto_class auto_modules rOregister_for_auto_classz*BaseVideoProcessor.register_for_auto_classLsC *c*#,,J66{J/ |+FGH H$rUvideo_url_or_urlsc d}tstjdd}t|tr0t t |Dcgc]}|j ||c}St|||Scc}w)z Convert a single or a list of urls into the corresponding `np.array` objects. If a single url is passed, the return value will be a single object. If a list is passed a list of objects is returned. torchcodecz`torchcodec` is not installed and cannot be used to decode the video by default. Falling back to `torchvision`. Note that `torchvision` decoding is deprecated and will be removed in future versions. r-rr)backendrn)rrrrvrCrsrzr&)rJr#rnr&xs rOrzzBaseVideoProcessor.fetch_videosfsx&( MMI $G ' .ars\]d//EV/Wstu u/Tef ftsA2 )NNrQ)NFFNmain)F)AutoVideoProcessor)Dr __module__ __qualname__rresamplerrr6 size_divisorr7r8rrrrrrrmrcrbrlrrrDmodel_input_namesrr;r rTr"r`r#rrgrfloatrkdictboolrrCrrrrrBASE_VIDEO_PROCESSOR_DOCSTRINGrRrrr classmethodrPathLikerrtuplerrrrrrrrr"rz __classcell__)rNs@rOr0r0s KHJI DLIINJNLN CJNOL./G !5G$G>1L1 8%)+/ 33SM3eCJ' ( 3r,004 &&&&mT12&&#4. && $H- && n  &&VEI $   $E#/?*?$@A   n   .&&#&#&&#  &#&#l<@,h^$,h,h ,h  ,h  56 ,h,h,h,h,h,hU5$u+#567,hE%e"456,h!sJ!78,h !,h\8<$!&,0o=',S"++-='>o=E#r{{"234o= o=  o= c4i() o=o=o=b;-eC4D.E;-TX;-zs,,1#r{{2B,Cs, tCH~tCH~- .s,s,j*#T#s(^*#*#X c3h GG* 05bkk1A+B 0D+uS"++-='>++$%%2geCcDcO4S.TgrUr0rr)zvideo processor file)object object_class object_files)Drrrrr functoolsrtypingrrrrnumpyrdynamic_module_utilsr image_processing_utilsr r image_processing_utils_fastr image_utilsrrrprocessing_utilsrrutilsrrrrrrrrrrrrr utils.hubr utils.import_utilsr! video_utilsr"r#r$r%r&r'r(r)r*rhtorchvision.transforms.v2r+r[ get_loggerrrAr2r0r__doc__formatrrUrOrJs:  114@ 3#(    9   H %A" H'" ,-b g/b g.  b gJ"++=+I+I!J!!))5-?-K-K-S-S-Z-Z /CRh.[.""*6rU