SummariV

Video

Drop MP4 or click

up to ~500 MB

— —

Run the pipeline to index this video

Index

—

Audio segs

—

Visual caps

—

Duration

✓ Ready

Status

Process

Ask

Summary

Run the pipeline

Whisper transcribes the audio into timestamped segments, BLIP captions sampled frames, and FAISS then builds semantic vector indices over both streams.

Upload video

Send MP4 to backend API

Whisper transcription

Speech → timestamped text segments

BLIP captioning

Visual description per sampled frame

FAISS index

Audio + visual vector indices
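
The four steps above end with two searchable indices, one per stream. A minimal sketch of that last stage follows, assuming the transcript segments and frame captions are already available as timestamped text. It is not SummariV's implementation: the bag-of-words `embed` function stands in for a real sentence encoder, the brute-force `StreamIndex` stands in for a FAISS index, and every name and sample string here is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entry:
    stream: str   # "audio" (transcript segment) or "visual" (frame caption)
    start: float  # timestamp in seconds
    text: str


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class StreamIndex:
    """Brute-force similarity index; a stand-in for a FAISS index."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.vectors = [embed(e.text) for e in self.entries]

    def search(self, query: str, k: int = 3):
        q = embed(query)
        scored = [(cosine(q, v), e) for v, e in zip(self.vectors, self.entries)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def search_both(audio_index, visual_index, query, k=3):
    """Query each stream separately, then merge the hits by score."""
    hits = audio_index.search(query, k) + visual_index.search(query, k)
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:k]


# Hypothetical pipeline output: transcript segments and frame captions.
audio_index = StreamIndex([
    Entry("audio", 0.0, "welcome to the cooking show"),
    Entry("audio", 12.5, "today we bake sourdough bread"),
])
visual_index = StreamIndex([
    Entry("visual", 10.0, "a person kneading dough on a table"),
    Entry("visual", 30.0, "a loaf of bread in an oven"),
])

hits = search_both(audio_index, visual_index, "bread in the oven")
for score, entry in hits:
    print(f"{score:.2f}  [{entry.stream} @ {entry.start:.1f}s]  {entry.text}")
```

Keeping one index per stream lets each hit carry its source and timestamp, so an answer can point back to either a spoken segment or a captioned frame. Merging by raw score assumes the two streams produce comparable similarity values, which may not hold with real encoders.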

Ask a question. SummariV searches both the audio and visual streams.

Video Summary

Process a video first, then open this tab to generate a summary.