# transformers/pipelines/audio_classification.py
import subprocess
from typing import Any, Union

import numpy as np
import requests

from ..utils import (
    add_end_docstrings,
    is_torch_available,
    is_torchaudio_available,
    is_torchcodec_available,
    logging,
)
from .base import Pipeline, build_pipeline_init_args


if is_torch_available():
    from ..models.auto.modeling_auto import MODEL_FOR_AUDIO_CLASSIFICATION_MAPPING_NAMES

logger = logging.get_logger(__name__)


def ffmpeg_read(bpayload: bytes, sampling_rate: int) -> np.ndarray:
    """
    Helper function to read an audio file through ffmpeg.
    """
    ar = f"{sampling_rate}"
    ac = "1"
    format_for_conversion = "f32le"
    ffmpeg_command = [
        "ffmpeg",
        "-i",
        "pipe:0",
        "-ac",
        ac,
        "-ar",
        ar,
        "-f",
        format_for_conversion,
        "-hide_banner",
        "-loglevel",
        "quiet",
        "pipe:1",
    ]

    try:
        ffmpeg_process = subprocess.Popen(ffmpeg_command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    except FileNotFoundError:
        raise ValueError("ffmpeg was not found but is required to load audio files from filename")
    output_stream = ffmpeg_process.communicate(bpayload)
    out_bytes = output_stream[0]

    audio = np.frombuffer(out_bytes, np.float32)
    if audio.shape[0] == 0:
        raise ValueError("Malformed soundfile")
    return audio


@add_end_docstrings(build_pipeline_init_args(has_feature_extractor=True))
class AudioClassificationPipeline(Pipeline):
    """
    Audio classification pipeline using any `AutoModelForAudioClassification`. This pipeline predicts the class of a
    raw waveform or an audio file. In case of an audio file, ffmpeg should be installed to support multiple audio
    formats.

    Example:

    ```python
    >>> from transformers import pipeline

    >>> classifier = pipeline(model="superb/wav2vec2-base-superb-ks")
    >>> classifier("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac")
    [{'score': 0.997, 'label': '_unknown_'}, {'score': 0.002, 'label': 'left'}, {'score': 0.0, 'label': 'yes'}, {'score': 0.0, 'label': 'down'}, {'score': 0.0, 'label': 'stop'}]
    ```

    Learn more about the basics of using a pipeline in the [pipeline tutorial](../pipeline_tutorial)

    This pipeline can currently be loaded from [`pipeline`] using the following task identifier:
    `"audio-classification"`.

    See the list of available models on
    [huggingface.co/models](https://huggingface.co/models?filter=audio-classification).
    """

    _load_processor = False
    _load_image_processor = False
    _load_feature_extractor = True
    _load_tokenizer = False

    def __init__(self, *args, **kwargs):
        # An explicit `top_k=None` means "return every label" and is kept as-is;
        # only fall back to the default of 5 when the argument is absent.
        if "top_k" not in kwargs:
            kwargs["top_k"] = 5

        super().__init__(*args, **kwargs)

        if self.framework != "pt":
            raise ValueError(f"The {self.__class__} is only available in PyTorch.")

        self.check_model_type(MODEL_FOR_AUDIO_CLASSIFICATION_MAPPING_NAMES)

    def __call__(self, inputs: Union[np.ndarray, bytes, str, dict], **kwargs: Any) -> list[dict[str, Any]]:
        """
        Classify the sequence(s) given as inputs. See the [`AutomaticSpeechRecognitionPipeline`] documentation for
        more information.

        Args:
            inputs (`np.ndarray` or `bytes` or `str` or `dict`):
                The input is either:
                    - `str` that is the filename of the audio file, the file will be read at the correct sampling
                      rate to get the waveform using *ffmpeg*. This requires *ffmpeg* to be installed on the system.
                    - `bytes` that are supposed to be the content of an audio file and are interpreted by *ffmpeg*
                      in the same way.
                    - (`np.ndarray` of shape (n,) of type `np.float32` or `np.float64`) raw audio at the correct
                      sampling rate (no further check will be done).
                    - `dict` form can be used to pass raw audio sampled at an arbitrary `sampling_rate` and let this
                      pipeline do the resampling. The dict must be in the format `{"sampling_rate": int, "raw":
                      np.array}` or `{"sampling_rate": int, "array": np.array}`, where the key `"raw"` or `"array"`
                      is used to denote the raw audio waveform.
            top_k (`int`, *optional*, defaults to None):
                The number of top labels that will be returned by the pipeline. If the provided number is `None` or
                higher than the number of labels available in the model configuration, it will default to the number
                of labels.
            function_to_apply (`str`, *optional*, defaults to "softmax"):
                The function to apply to the model output. By default, the pipeline will apply the softmax function
                to the output of the model. Valid options: ["softmax", "sigmoid", "none"]. Note that passing Python's
                built-in `None` will default to "softmax", so you need to pass the string "none" to disable any
                post-processing.

        Return:
            A list of `dict` with the following keys:

            - **label** (`str`) -- The label predicted.
            - **score** (`float`) -- The corresponding probability.
        """
        return super().__call__(inputs, **kwargs)

    def _sanitize_parameters(self, top_k=None, function_to_apply=None, **kwargs):
        postprocess_params = {}
        if top_k is None:
            # `None` means "return scores for every label the model knows".
            postprocess_params["top_k"] = self.model.config.num_labels
        else:
            # Cap `top_k` at the number of labels the model was trained with.
            if top_k > self.model.config.num_labels:
                top_k = self.model.config.num_labels
            postprocess_params["top_k"] = top_k

        if function_to_apply is not None:
            if function_to_apply not in ["softmax", "sigmoid", "none"]:
                raise ValueError(
                    f"Invalid value for `function_to_apply`: {function_to_apply}. "
                    "Valid options are ['softmax', 'sigmoid', 'none']"
                )
            postprocess_params["function_to_apply"] = function_to_apply
        else:
            postprocess_params["function_to_apply"] = "softmax"

        return {}, {}, postprocess_params

    def preprocess(self, inputs):
        if isinstance(inputs, str):
            if inputs.startswith("http://") or inputs.startswith("https://"):
                # We need to actually check for a real protocol, otherwise it's impossible to use a local file like
                # http_huggingface_co.png
                inputs = requests.get(inputs).content
            else:
                with open(inputs, "rb") as f:
                    inputs = f.read()

        if isinstance(inputs, bytes):
            inputs = ffmpeg_read(inputs, self.feature_extractor.sampling_rate)

        if is_torch_available():
            import torch

            if isinstance(inputs, torch.Tensor):
                inputs = inputs.cpu().numpy()

        if is_torchcodec_available():
            import torchcodec

            if isinstance(inputs, torchcodec.decoders.AudioDecoder):
                _audio_samples = inputs.get_all_samples()
                _array = _audio_samples.data
                if _array.ndim == 2 and _array.shape[0] == 1:
                    # torchcodec decodes to (num_channels, num_samples); drop the channel axis for mono audio so
                    # the single-channel check below sees a 1-D waveform
                    _array = _array.squeeze(0)
                inputs = {"array": _array, "sampling_rate": _audio_samples.sample_rate}

        if isinstance(inputs, dict):
            inputs = inputs.copy()  # So we don't mutate the original dictionary outside the pipeline
            # Accepting `"array"` which is the key defined in `datasets` for better integration
            if not ("sampling_rate" in inputs and ("raw" in inputs or "array" in inputs)):
                raise ValueError(
                    "When passing a dictionary to AudioClassificationPipeline, the dict needs to contain a "
                    '"raw" key containing the numpy array or torch tensor representing the audio and a '
                    '"sampling_rate" key, containing the sampling_rate associated with that array'
                )

            _inputs = inputs.pop("raw", None)
            if _inputs is None:
                # Remove path which will not be used from `datasets`.
                inputs.pop("path", None)
                _inputs = inputs.pop("array", None)
            in_sampling_rate = inputs.pop("sampling_rate")
            inputs = _inputs

            if in_sampling_rate != self.feature_extractor.sampling_rate:
                import torch

                if is_torchaudio_available():
                    from torchaudio import functional as F
                else:
                    raise ImportError(
                        "torchaudio is required to resample audio samples in AudioClassificationPipeline. "
                        "The torchaudio package can be installed through: `pip install torchaudio`."
                    )

                inputs = F.resample(
                    torch.from_numpy(inputs) if isinstance(inputs, np.ndarray) else inputs,
                    in_sampling_rate,
                    self.feature_extractor.sampling_rate,
                ).numpy()

        if is_torch_available():
            import torch

            if isinstance(inputs, torch.Tensor):
                # Tensors passed through the dict form (e.g. from torchcodec) skip the resampling branch when the
                # rates already match; normalize them to numpy for the checks below.
                inputs = inputs.cpu().numpy()

        if not isinstance(inputs, np.ndarray):
            raise TypeError("We expect a numpy ndarray as input")
        if len(inputs.shape) != 1:
            raise ValueError("We expect a single channel audio input for AudioClassificationPipeline")

        processed = self.feature_extractor(
            inputs, sampling_rate=self.feature_extractor.sampling_rate, return_tensors="pt"
        )
        if self.torch_dtype is not None:
            processed = processed.to(dtype=self.torch_dtype)
        return processed
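

# ---------------------------------------------------------------------------
# Minimal usage sketch, not part of the module above. It assumes the
# "superb/wav2vec2-base-superb-ks" checkpoint from the class docstring (whose
# feature extractor expects 16 kHz audio) and fabricates one second of silence,
# so the scores are meaningless; the point is to exercise the
# `{"sampling_rate": ..., "raw": ...}` dict input documented in `__call__` and
# the resampling branch of `preprocess` (which requires torchaudio):
#
#     import numpy as np
#     from transformers import pipeline
#
#     classifier = pipeline("audio-classification", model="superb/wav2vec2-base-superb-ks")
#
#     waveform = np.zeros(8000, dtype=np.float32)  # 1 s of silence at 8 kHz
#     # 8 kHz differs from the extractor's 16 kHz, so preprocess() resamples.
#     preds = classifier({"sampling_rate": 8000, "raw": waveform}, top_k=3)
#     # preds -> [{"score": ..., "label": ...}, ...] sorted by descending score
#
#     # For multi-label checkpoints, independent per-label probabilities can be
#     # requested instead of a softmax over all labels:
#     waveform_16k = np.zeros(16000, dtype=np.float32)
#     preds = classifier(waveform_16k, function_to_apply="sigmoid")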