ℹ️Features and requirements

Unitary's models are multimodal, meaning they can analyse text, images, and videos both in isolation and in combination. This includes audio (speech) and any on-screen text that can be identified by optical character recognition (OCR). For example, if you send a video that includes audio, text within the frames, and a caption, Unitary will analyse all of these elements together to give you a contextual analysis of that video.

Unitary API response times vary between sub-second and 24 hours. Please let us know if you want to negotiate a different response time.

Endpoint limitations

All endpoints

| Modality | Max file size | Max length processed | Formats supported |
| --- | --- | --- | --- |
| Video | 200 MB | 3 min | .mp4, .mpeg, .webm, .mov, .mkv, .gif, .m4v |
| Image | 10 MB | - | .png, .jpeg |
| Text | 256 KB | - | - |
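
The limits in the table can be checked on the client before a file is sent. The following is a minimal sketch, not part of Unitary's API: the names `LIMITS` and `check_limits` are hypothetical, and it assumes decimal units (1 MB = 1,000,000 bytes), which the table does not specify. It does not check video duration.

```python
from pathlib import PurePath

# Assumption: decimal units. The table does not say whether MB means MB or MiB.
KB = 1_000
MB = 1_000_000

# Limits copied from the "All endpoints" table.
LIMITS = {
    "video": {
        "max_bytes": 200 * MB,
        "extensions": {".mp4", ".mpeg", ".webm", ".mov", ".mkv", ".gif", ".m4v"},
    },
    "image": {"max_bytes": 10 * MB, "extensions": {".png", ".jpeg"}},
    "text": {"max_bytes": 256 * KB, "extensions": None},  # no format restriction listed
}


def check_limits(modality, size_bytes, filename=""):
    """Return a list of problems; an empty list means the input fits the table."""
    limits = LIMITS[modality]
    problems = []
    if size_bytes > limits["max_bytes"]:
        problems.append(
            f"{modality} is {size_bytes} bytes; the limit is {limits['max_bytes']}"
        )
    allowed = limits["extensions"]
    if allowed is not None:
        extension = PurePath(filename).suffix.lower()
        if extension not in allowed:
            problems.append(f"{extension or '(no extension)'} is not a listed {modality} format")
    return problems
```

For example, `check_limits("image", 5 * MB, "photo.png")` returns an empty list, while a 250 MB `.avi` file checked as a video produces two problems.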

Virtual Moderator

  • Payload: The payload body is currently limited to 200 KiB in total, with a limit of 100 KiB per attachment.
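
A size check along these lines can catch oversized Virtual Moderator payloads before they are sent. This is only a sketch: `check_payload` is a hypothetical helper, and it assumes that attachments count towards the 200 KiB total, which the limit above implies but does not spell out.

```python
KIB = 1024
MAX_PAYLOAD_BYTES = 200 * KIB      # total payload body
MAX_ATTACHMENT_BYTES = 100 * KIB   # each attachment


def check_payload(body_bytes, attachment_bytes):
    """Return a list of problems for a payload body and its attachments.

    body_bytes: size of the whole payload body, attachments included (assumption).
    attachment_bytes: sizes of the individual attachments, in bytes.
    """
    problems = []
    if body_bytes > MAX_PAYLOAD_BYTES:
        problems.append(f"payload body is {body_bytes} bytes; the limit is {MAX_PAYLOAD_BYTES}")
    for index, size in enumerate(attachment_bytes):
        if size > MAX_ATTACHMENT_BYTES:
            problems.append(
                f"attachment {index} is {size} bytes; the limit is {MAX_ATTACHMENT_BYTES}"
            )
    return problems
```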

Recommendations on videos and images

  • For maximum efficiency, the recommended resolution is between 336 and 1200 pixels per side.

  • To increase accuracy, please send any text submitted by your users alongside the video or image as a caption in the same API request. Unitary's models will analyse everything together.

  • Media files can be sent either as multi-part form uploads or, preferably, by including a url field in the request that points to where the media file can be downloaded. Pre-signed object-storage URLs are supported.

  • If you require sub-second latency, please make sure the server hosting the resource URL supports partial content delivery using the “Range” HTTP header.
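
The recommendations above can be sketched as three small helpers. All function names are hypothetical. The `url` field comes from the recommendation above, but the `caption` field name is an assumption, so please check the endpoint reference for the actual request schema. The resize helper only scales down so the longer side is at most 1200 pixels; it does not upscale small images or guarantee the shorter side stays above 336.

```python
import urllib.request

MAX_SIDE = 1200  # upper end of the recommended 336-1200 pixel range


def target_size(width, height):
    """Scale down, keeping the aspect ratio, so the longer side is at most MAX_SIDE."""
    longest = max(width, height)
    if longest <= MAX_SIDE:
        return width, height
    return (
        max(1, round(width * MAX_SIDE / longest)),
        max(1, round(height * MAX_SIDE / longest)),
    )


def build_request_fields(media_url, caption):
    """Combine the media URL and the user's text in one request body.

    Assumption: "caption" is an illustrative field name, not a confirmed one.
    """
    return {"url": media_url, "caption": caption}


def is_partial_response(status, content_range):
    """True if a ranged GET was answered with 206 and a Content-Range header."""
    return status == 206 and content_range is not None


def supports_range(media_url, timeout=10.0):
    """Ask for the first byte only and report whether the server honoured the range."""
    request = urllib.request.Request(media_url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_partial_response(response.status, response.headers.get("Content-Range"))
```

A server that ignores the Range header typically answers with status 200 and the full file, in which case `supports_range` returns `False`.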

Add-on Features

Please let us know if you'd like to start using any of the following add-on features:

  • Include the optical character recognition (OCR) transcript in the API response. OCR content is any text that appears in the image or video. Examples include translation captions, words displayed on a T-shirt, or handwritten content. Unitary's models always check images and videos for OCR content and use it as an input to classification. With this add-on, the OCR transcript can also be shared in the classification results.

  • Include speech transcriptions in the API response. A speech transcription is a literal transcript of the speech present in a video. Unitary's API feeds this speech into Unitary's models. With this add-on, the speech transcription can also be shared in the classification results. This is currently available only in English and Spanish, with more languages potentially available in the future.
