import warnings
from io import BytesIO
from typing import Any, Optional, Union, overload

import requests

from ..utils import add_end_docstrings, is_av_available, is_torch_available, logging, requires_backends
from .base import Pipeline, build_pipeline_init_args


if is_av_available():
    import av
    import numpy as np


if is_torch_available():
    from ..models.auto.modeling_auto import MODEL_FOR_VIDEO_CLASSIFICATION_MAPPING_NAMES

logger = logging.get_logger(__name__)


@add_end_docstrings(build_pipeline_init_args(has_image_processor=True))
class VideoClassificationPipeline(Pipeline):
    """
    Video classification pipeline using any `AutoModelForVideoClassification`. This pipeline predicts the class of a
    video.

    This video classification pipeline can currently be loaded from [`pipeline`] using the following task identifier:
    `"video-classification"`.

    See the list of available models on
    [huggingface.co/models](https://huggingface.co/models?filter=video-classification).
    """

    _load_processor = False
    _load_image_processor = True
    _load_feature_extractor = False
    _load_tokenizer = False

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        requires_backends(self, "av")
        self.check_model_type(MODEL_FOR_VIDEO_CLASSIFICATION_MAPPING_NAMES)

    def _sanitize_parameters(self, top_k=None, num_frames=None, frame_sampling_rate=None, function_to_apply=None):
        preprocess_params = {}
        if frame_sampling_rate is not None:
            preprocess_params["frame_sampling_rate"] = frame_sampling_rate
        if num_frames is not None:
            preprocess_params["num_frames"] = num_frames

        postprocess_params = {}
        if top_k is not None:
            postprocess_params["top_k"] = top_k
        if function_to_apply is not None:
            if function_to_apply not in ["softmax", "sigmoid", "none"]:
                raise ValueError(
                    f"Invalid value for `function_to_apply`: {function_to_apply}. "
                    "Valid options are ['softmax', 'sigmoid', 'none']"
                )
            postprocess_params["function_to_apply"] = function_to_apply
        else:
            postprocess_params["function_to_apply"] = "softmax"
        return preprocess_params, {}, postprocess_params

    @overload
    def __call__(self, inputs: str, **kwargs: Any) -> list[dict[str, Any]]: ...

    @overload
    def __call__(self, inputs: list[str], **kwargs: Any) -> list[list[dict[str, Any]]]: ...

    def __call__(self, inputs: Optional[Union[str, list[str]]] = None, **kwargs):
        """
        Assign labels to the video(s) passed as inputs.

        Args:
            inputs (`str`, `list[str]`):
                The pipeline handles three types of videos:

                - A string containing a http link pointing to a video
                - A string containing a local path to a video

                The pipeline accepts either a single video or a batch of videos, which must then be passed as a
                string. Videos in a batch must all be in the same format: all as http links or all as local paths.
            top_k (`int`, *optional*, defaults to 5):
                The number of top labels that will be returned by the pipeline. If the provided number is higher than
                the number of labels available in the model configuration, it will default to the number of labels.
            num_frames (`int`, *optional*, defaults to `self.model.config.num_frames`):
                The number of frames sampled from the video to run the classification on. If not provided, will
                default to the number of frames specified in the model configuration.
            frame_sampling_rate (`int`, *optional*, defaults to 1):
                The sampling rate used to select frames from the video. If not provided, will default to 1, i.e. every
                frame will be used.
            function_to_apply(`str`, *optional*, defaults to "softmax"):
                The function to apply to the model output. By default, the pipeline will apply the softmax function to
                the output of the model. Valid options: ["softmax", "sigmoid", "none"]. Note that passing Python's
                built-in `None` will default to "softmax", so you need to pass the string "none" to disable any
                post-processing.

        Return:
            A list of dictionaries or a list of list of dictionaries containing result. If the input is a single
            video, will return a list of `top_k` dictionaries, if the input is a list of several videos, will return
            a list of list of `top_k` dictionaries corresponding to the videos.

            The dictionaries contain the following keys:

            - **label** (`str`) -- The label identified by the model.
            - **score** (`int`) -- The score attributed by the model for that label.
        """
        if "videos" in kwargs:
            warnings.warn(
                "The `videos` argument has been renamed to `inputs`. In version 5 of Transformers, `videos` will no"
                " longer be accepted",
                FutureWarning,
            )
            inputs = kwargs.pop("videos")
        if inputs is None:
            raise ValueError("Cannot call the video-classification pipeline without an inputs argument!")
        return super().__call__(inputs, **kwargs)

    def preprocess(self, video, num_frames=None, frame_sampling_rate=1):
        if num_frames is None:
            num_frames = self.model.config.num_frames

        if video.startswith("http://") or video.startswith("https://"):
            video = BytesIO(requests.get(video).content)

        container = av.open(video)

        start_idx = 0
        end_idx = num_frames * frame_sampling_rate - 1
        indices = np.linspace(start_idx, end_idx, num=num_frames).astype(np.int64)

        video = read_video_pyav(container, indices)
        video = list(video)

        model_inputs = self.image_processor(video, return_tensors=self.framework)
        if self.framework == "pt":
            model_inputs = model_inputs.to(self.torch_dtype)
        return model_inputs

    def _forward(self, model_inputs):
        model_outputs = self.model(**model_inputs)
        return model_outputs

    def postprocess(self, model_outputs, top_k=5, function_to_apply="softmax"):
        if top_k > self.model.config.num_labels:
            top_k = self.model.config.num_labels

        if self.framework == "pt":
            if function_to_apply == "softmax":
                probs = model_outputs.logits[0].softmax(-1)
            elif function_to_apply == "sigmoid":
                probs = model_outputs.logits[0].sigmoid()
            else:
                probs = model_outputs.logits[0]
            scores, ids = probs.topk(top_k)
        else:
            raise ValueError(f"Unsupported framework: {self.framework}")

        scores = scores.tolist()
        ids = ids.tolist()
        return [{"score": score, "label": self.model.config.id2label[_id]} for score, _id in zip(scores, ids)]


def read_video_pyav(container, indices):
    # Decode the container from the start and keep only the frames whose
    # position appears in `indices`, stacking them into a single ndarray.
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            frames.append(frame)
    return np.stack([x.to_ndarray(format="rgb24") for x in frames])
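

# ---------------------------------------------------------------------------
# Usage sketch (illustration only, not part of the upstream module): a minimal
# example of invoking this pipeline through `transformers.pipeline`. The
# checkpoint is left to the task's default, and "video.mp4" is a hypothetical
# local path — substitute any real video file or http(s) URL.
if __name__ == "__main__":
    from transformers import pipeline

    # Builds a VideoClassificationPipeline with the default checkpoint for
    # the "video-classification" task.
    video_classifier = pipeline(task="video-classification")

    # Sample every 4th frame instead of every frame, and keep the 3
    # highest-scoring labels. Returns a list of dicts shaped like
    # [{"score": float, "label": str}, ...].
    results = video_classifier("video.mp4", top_k=3, frame_sampling_rate=4)
    print(results)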