# Features and requirements

Unitary's models are multimodal, meaning they can analyse text, images, and videos both in isolation and in combination. This includes audio and speech, as well as text identified by optical character recognition (OCR). For example, if you send a video that includes audio, on-screen text, and a caption, Unitary will analyse all of these elements together to give you a contextual analysis of that video.

Unitary API response times range from sub-second to 24 hours. Please let us know if you'd like to negotiate a different response time.

### Endpoint limitations

#### All endpoints

<table><thead><tr><th width="129">Modality</th><th width="128">Max file size</th><th width="199">Max length processed</th><th>Formats supported</th></tr></thead><tbody><tr><td><strong>Video</strong></td><td>200MB</td><td>3 min</td><td><code>.mp4</code>, <code>.mpeg</code>, <code>.webm</code>, <code>.mov</code>, <code>.mkv</code>, <code>.gif</code>, <code>.m4v</code></td></tr><tr><td><strong>Image</strong></td><td>10MB</td><td>-</td><td><code>.png</code>, <code>.jpeg</code></td></tr><tr><td><strong>Text</strong></td><td>256KB</td><td>-</td><td>-</td></tr></tbody></table>
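If you'd like to reject oversized or unsupported files before they reach the API, the limits in the table above can be checked client-side. The following is a minimal sketch; the `LIMITS` structure and the `check_media` helper are illustrative names, not part of the Unitary API.

```python
import os

# Limits taken from the table above; this structure is illustrative only.
LIMITS = {
    "video": {"max_bytes": 200 * 1024 * 1024,
              "formats": {".mp4", ".mpeg", ".webm", ".mov", ".mkv", ".gif", ".m4v"}},
    "image": {"max_bytes": 10 * 1024 * 1024, "formats": {".png", ".jpeg"}},
    "text":  {"max_bytes": 256 * 1024, "formats": None},  # any text format
}

def check_media(path: str, modality: str) -> list[str]:
    """Return a list of problems that would cause the file to exceed the limits."""
    limit = LIMITS[modality]
    problems = []
    if os.path.getsize(path) > limit["max_bytes"]:
        problems.append(f"file exceeds {limit['max_bytes']} bytes")
    ext = os.path.splitext(path)[1].lower()
    if limit["formats"] is not None and ext not in limit["formats"]:
        problems.append(f"unsupported format {ext!r}")
    return problems
```

Note that the 3-minute maximum processed length for videos is not checked here, since reading a video's duration requires a media library.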

#### Virtual Moderator

* **Payload:** The maximum payload body size is currently 200KiB in total, with a 100KiB limit per attachment.

### Recommendations on videos and images

* For maximum efficiency, the recommended resolution is between 336 and 1200 pixels per side.
* To increase accuracy, please send any text submitted by your users alongside the video or image as a `caption` in the same API request. Unitary's models will analyse everything together.
* Media files can be sent either as multipart form uploads or, preferably, by including a `url` field in the request from which the media file can be downloaded. Pre-signed object-storage URLs are supported.
* If you require sub-second latency, ensure the server hosting the resource URL supports partial content delivery via the `Range` HTTP header.

### Add-on Features

Please let us know if you'd like to start using any of the following add-on features:

* **Include the Optical Character Recognition (OCR) transcription in the API response.** OCR captures any text that appears in an image or video, such as translated captions, words printed on a T-shirt, or handwritten content. Unitary's models always check images and videos for such text and feed it into the analysis. The resulting OCR transcript can be shared in the classification results.
* **Include speech transcriptions in the API response.** Speech transcriptions are a literal transcription of the speech present in a video, which Unitary's API feeds into its models. These transcriptions can be shared in the classification results. This feature is currently available in English and Spanish only, with more languages potentially available in the future.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.unitary.ai/api-references/features-and-requirements.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
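Since the question travels in a query parameter, it must be URL-encoded. The sketch below builds such a query URL using the documented `ask` parameter; the `ask_docs_url` helper name is illustrative.

```python
from urllib.parse import urlencode

DOC_URL = "https://docs.unitary.ai/api-references/features-and-requirements.md"

def ask_docs_url(question: str) -> str:
    """Build a query URL for this page; urlencode handles escaping."""
    return f"{DOC_URL}?{urlencode({'ask': question})}"

# The returned URL can be fetched with any HTTP client via a GET request.
url = ask_docs_url("What is the maximum video length processed?")
```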
