# NOTE: this module header is reconstructed from a compiled (.pyc) dump of
# transformers/utils/auto_docstring.py; the bare-import targets, relative import paths
# and some module-level constant names are not recoverable from the dump and are assumed.
import inspect
import os
import re
import textwrap
from pathlib import Path
from typing import Optional, Union, get_args

from .doc import (
    MODELS_TO_PIPELINE,
    PIPELINE_TASKS_TO_SAMPLE_DOCSTRINGS,
    PT_SAMPLE_DOCSTRINGS,
    _prepare_output_docstrings,
)
from .generic import ModelOutput


PATH_TO_TRANSFORMERS = Path("src").resolve() / "transformers"

# Model-definition files scanned when resolving the placeholders used in auto docstrings.
AUTODOC_FILES = [  # assumed name
    "configuration_*.py",
    "modeling_*.py",
    "tokenization_*.py",
    "processing_*.py",
    "image_processing_*_fast.py",
    "image_processing_*.py",
    "feature_extractor_*.py",
]

# Maps a docstring placeholder to the auto module and the mapping holding the class names to substitute.
PLACEHOLDER_TO_AUTO_MODULE = {
    "image_processor_class": ("image_processing_auto", "IMAGE_PROCESSOR_MAPPING_NAMES"),
    "video_processor_class": ("video_processing_auto", "VIDEO_PROCESSOR_MAPPING_NAMES"),
    "feature_extractor_class": ("feature_extraction_auto", "FEATURE_EXTRACTOR_MAPPING_NAMES"),
    "processor_class": ("processing_auto", "PROCESSOR_MAPPING_NAMES"),
    "config_class": ("configuration_auto", "CONFIG_MAPPING_NAMES"),
}

# Methods/classes whose `**kwargs` are unrolled into explicit arguments when building docstrings.
UNROLL_KWARGS_METHODS = {"preprocess"}
UNROLL_KWARGS_CLASSES = {"ImageProcessorFast"}

# Model types whose config class cannot be derived mechanically from the model type.
HARDCODED_CONFIG_FOR_MODELS = {
    "openai": "OpenAIGPTConfig",
    "x-clip": "XCLIPConfig",
    "kosmos2": "Kosmos2Config",
    "kosmos2-5": "Kosmos2_5Config",
    "donut": "DonutSwinConfig",
    "esmfold": "EsmConfig",
}

# Matches markdown links pointing to the Hugging Face Hub, e.g. `[name](https://huggingface.co/...)`.
_re_hub_link = re.compile(r"\[(.+?)\]\((https://huggingface\.co/.+?)\)")  # assumed name


class ImageProcessorArgs:
    images = {
        "description": """
        Image to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If
        passing in images with pixel values between 0 and 1, set `do_rescale=False`.
        """,
        "shape": None,
    }
    videos = {
        "description": """
        Video to preprocess. Expects a single or batch of videos with pixel values ranging from 0 to 255. If
        passing in videos with pixel values between 0 and 1, set `do_rescale=False`.
        """,
        "shape": None,
    }
    do_resize = {
        "description": "Whether to resize the image.",
        "shape": None,
    }
    size = {
        "description": "Describes the maximum input dimensions to the model.",
        "shape": None,
    }
    default_to_square = {
        "description": "Whether to default to a square image when resizing, if size is an int.",
        "shape": None,
    }
    resample = {
        "description": """
        Resampling filter to use if resizing the image. This can be one of the enum `PILImageResampling`. Only
        has an effect if `do_resize` is set to `True`.
        """,
        "shape": None,
    }
    do_center_crop = {
        "description": "Whether to center crop the image.",
        "shape": None,
    }
    crop_size = {
        "description": "Size of the output image after applying `center_crop`.",
        "shape": None,
    }
    do_pad = {
        "description": """
        Whether to pad the image. Padding is done either to the largest size in the batch or to a fixed square
        size per image. The exact padding strategy depends on the model.
        """,
        "shape": None,
    }
    pad_size = {
        "description": """
        The size in `{"height": int, "width": int}` to pad the images to. Must be larger than any image size
        provided for preprocessing. If `pad_size` is not provided, images will be padded to the largest height
        and width in the batch. Applied only when `do_pad=True`.
        """,
        "shape": None,
    }
    do_rescale = {
        "description": "Whether to rescale the image.",
        "shape": None,
    }
    rescale_factor = {
        "description": "Rescale factor to rescale the image by if `do_rescale` is set to `True`.",
        "shape": None,
    }
    do_normalize = {
        "description": "Whether to normalize the image.",
        "shape": None,
    }
    image_mean = {
        "description": "Image mean to use for normalization. Only has an effect if `do_normalize` is set to `True`.",
        "shape": None,
    }
    image_std = {
        "description": "Image standard deviation to use for normalization. Only has an effect if `do_normalize` is set to `True`.",
        "shape": None,
    }
    do_convert_rgb = {
        "description": "Whether to convert the image to RGB.",
        "shape": None,
    }
    return_tensors = {
        "description": "Returns stacked tensors if set to `pt`, otherwise returns a list of tensors.",
        "shape": None,
    }
    data_format = {
        "description": "Only `ChannelDimension.FIRST` is supported. Added for compatibility with slow processors.",
        "shape": None,
    }
    input_data_format = {
        "description": """
        The channel dimension format for the input image. If unset, the channel dimension format is inferred
        from the input image. Can be one of:

        - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
        - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
        - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.
        """,
        "shape": None,
    }
    device = {
        "description": "The device to process the images on. If unset, the device is inferred from the input images.",
        "shape": None,
    }
    disable_grouping = {
        "description": """
        Whether to disable grouping of images by size to process them individually and not in batches. If None,
        will be set to True if the images are on CPU, and False otherwise. This choice is based on empirical
        observations, as detailed here: https://github.com/huggingface/transformers/pull/38157
        """,
        "shape": None,
    }


class ModelArgs:
    labels = {
        "description": """
        Labels for computing the masked language modeling loss. Indices should either be in
        `[0, ..., config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100`
        are ignored (masked), the loss is only computed for the tokens with labels in
        `[0, ..., config.vocab_size]`.
        """,
        "shape": "of shape `(batch_size, sequence_length)`",
    }
    num_logits_to_keep = {
        "description": """
        Calculate logits for the last `num_logits_to_keep` tokens. If `0`, calculate logits for all `input_ids`
        (special case). Only last token logits are needed for generation, and calculating them only for that
        token can save memory, which becomes pretty significant for long sequences or large vocabulary size.
        """,
        "shape": None,
    }
    input_ids = {
        "description": """
        Indices of input sequence tokens in the vocabulary. Padding will be ignored by default.

        Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
        [`PreTrainedTokenizer.__call__`] for details.

        [What are input IDs?](../glossary#input-ids)
        """,
        "shape": "of shape `(batch_size, sequence_length)`",
    }
    input_values = {
        "description": """
        Float values of input raw speech waveform. Values can be obtained by loading a `.flac` or `.wav` audio
        file into an array of type `list[float]`, a `numpy.ndarray` or a `torch.Tensor`, *e.g.* via the
        torchcodec library (`pip install torchcodec`) or the soundfile library (`pip install soundfile`). To
        prepare the array into `input_values`, the [`AutoProcessor`] should be used for padding and conversion
        into a tensor of type `torch.FloatTensor`. See [`{processor_class}.__call__`] for details.
        """,
        "shape": "of shape `(batch_size, sequence_length)`",
    }
    attention_mask = {
        "description": """
        Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

        - 1 for tokens that are **not masked**,
        - 0 for tokens that are **masked**.

        [What are attention masks?](../glossary#attention-mask)
        """,
        "shape": "of shape `(batch_size, sequence_length)`",
    }
    head_mask = {
        "description": """
        Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:

        - 1 indicates the head is **not masked**,
        - 0 indicates the head is **masked**.
        """,
        "shape": "of shape `(num_heads,)` or `(num_layers, num_heads)`",
    }
    cross_attn_head_mask = {
        "description": """
        Mask to nullify selected heads of the cross-attention modules. Mask values selected in `[0, 1]`:

        - 1 indicates the head is **not masked**,
        - 0 indicates the head is **masked**.
        """,
        "shape": "of shape `(num_layers, num_heads)`",
    }
    decoder_attention_mask = {
        "description": """
        Mask to avoid performing attention on certain token indices. By default, a causal mask will be used, to
        make sure the model can only look at previous inputs in order to predict the future.
        """,
        "shape": "of shape `(batch_size, target_sequence_length)`",
    }
    decoder_head_mask = {
        "description": """
        Mask to nullify selected heads of the attention modules in the decoder. Mask values selected in `[0, 1]`:

        - 1 indicates the head is **not masked**,
        - 0 indicates the head is **masked**.
        """,
        "shape": "of shape `(decoder_layers, decoder_attention_heads)`",
    }
    encoder_hidden_states = {
        "description": """
        Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention
        if the model is configured as a decoder.
        """,
        "shape": "of shape `(batch_size, sequence_length, hidden_size)`",
    }
    encoder_attention_mask = {
        "description": """
        Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used
        in the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:

        - 1 for tokens that are **not masked**,
        - 0 for tokens that are **masked**.
        """,
        "shape": "of shape `(batch_size, sequence_length)`",
    }
    token_type_ids = {
        "description": """
        Segment token indices to indicate first and second portions of the inputs. Indices are selected in
        `[0, 1]`:

        - 0 corresponds to a *sentence A* token,
        - 1 corresponds to a *sentence B* token.

        [What are token type IDs?](../glossary#token-type-ids)
        """,
        "shape": "of shape `(batch_size, sequence_length)`",
    }
    position_ids = {
        "description": """
        Indices of positions of each input sequence tokens in the position embeddings. Selected in the range
        `[0, config.n_positions - 1]`.

        [What are position IDs?](../glossary#position-ids)
        """,
        "shape": "of shape `(batch_size, sequence_length)`",
    }
    past_key_values = {
        "description": """
        Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
        blocks) that can be used to speed up sequential decoding. This typically consists in the
        `past_key_values` returned by the model at a previous stage of decoding, when `use_cache=True` or
        `config.use_cache=True`.

        Only [`~cache_utils.Cache`] instance is allowed as input, see our
        [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). If no `past_key_values` are
        passed, [`~cache_utils.DynamicCache`] will be initialized by default.

        The model will output the same cache format that is fed as input.

        If `past_key_values` are used, the user is expected to input only unprocessed `input_ids` (those that
        don't have their past key value states given to this model) of shape `(batch_size, unprocessed_length)`
        instead of all `input_ids` of shape `(batch_size, sequence_length)`.
        """,
        "shape": None,
    }
    inputs_embeds = {
        "description": """
        Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
        This is useful if you want more control over how to convert `input_ids` indices into associated vectors
        than the model's internal embedding lookup matrix.
        """,
        "shape": "of shape `(batch_size, sequence_length, hidden_size)`",
    }
    decoder_input_ids = {
        "description": """
        Indices of decoder input sequence tokens in the vocabulary.

        Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
        [`PreTrainedTokenizer.__call__`] for details.

        [What are decoder input IDs?](../glossary#decoder-input-ids)
        """,
        "shape": "of shape `(batch_size, target_sequence_length)`",
    }
    decoder_inputs_embeds = {
        "description": """
        Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
        representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to
        be input (see `past_key_values`). This is useful if you want more control over how to convert
        `decoder_input_ids` indices into associated vectors than the model's internal embedding lookup matrix.

        If `decoder_input_ids` and `decoder_inputs_embeds` are both unset, `decoder_inputs_embeds` takes the
        value of `inputs_embeds`.
        """,