Drop MP4 or click
up to ~500 MB

Run the pipeline to index this video

Audio segments
Visual captions
Duration
✓ Ready
Status
Process
Ask
Summary
Run the pipeline
Whisper transcribes audio into timestamped segments, BLIP captions sampled frames, then FAISS builds a semantic index across both streams.
Upload video: send the MP4 to the backend API
Whisper transcription: speech → timestamped text segments
BLIP captioning: visual description per sampled frame
FAISS index: audio + visual vector indices
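
The records these stages produce could be laid out roughly as follows. This is a minimal sketch, not SummariV's actual code: the `Segment` record, the `sample_times` helper, and the 2-second sampling interval are assumptions made for illustration, and the Whisper and BLIP outputs are hard-coded so the example runs without any models.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One indexed unit of the video (hypothetical record layout)."""
    source: str   # "audio" (transcript) or "visual" (frame caption)
    start: float  # seconds from the start of the video
    end: float
    text: str


def sample_times(duration: float, every: float = 2.0) -> list[float]:
    """Timestamps at which frames would be captioned (assumed fixed interval)."""
    times = []
    t = 0.0
    while t < duration:
        times.append(t)
        t += every
    return times


# Hard-coded stand-in for a transcription result: timestamped text segments.
audio_segments = [
    Segment("audio", 0.0, 3.5, "welcome to the demo"),
    Segment("audio", 3.5, 6.0, "here is the first chart"),
]

# Hard-coded stand-in for frame captions: one caption per sampled frame.
visual_segments = [
    Segment("visual", t, t, f"caption for the frame at {t:.0f}s")
    for t in sample_times(6.0)
]

# The two streams stay separate so each can get its own vector index.
print(len(audio_segments), "audio segments,", len(visual_segments), "visual captions")
```

Keeping the two lists apart mirrors the "audio + visual vector indices" step: each stream would be embedded and indexed on its own, and the timestamps let a hit be traced back to a moment in the video.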

Ask a question — SummariV searches both audio and visual streams.
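
Searching both streams can be pictured as querying each index separately and merging the hits by score. The sketch below is only an illustration under stated assumptions: it uses a brute-force cosine similarity in plain Python as a stand-in for FAISS, the three-dimensional vectors are made up rather than produced by an embedding model, and `top_k` and `search_both` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(index, query, k):
    """Brute-force nearest neighbours over a list of (vector, label) pairs."""
    scored = [(cosine(vec, query), label) for vec, label in index]
    return sorted(scored, reverse=True)[:k]


def search_both(audio_index, visual_index, query, k=2):
    """Query each stream on its own, then keep the best k hits overall."""
    hits = [(score, "audio", label) for score, label in top_k(audio_index, query, k)]
    hits += [(score, "visual", label) for score, label in top_k(visual_index, query, k)]
    return sorted(hits, reverse=True)[:k]


# Made-up embeddings paired with timestamped text.
audio_index = [
    ([1.0, 0.0, 0.0], "00:05 speaker introduces the topic"),
    ([0.0, 1.0, 0.0], "00:42 speaker describes the results"),
]
visual_index = [
    ([0.9, 0.1, 0.0], "00:06 a title slide on screen"),
    ([0.0, 0.0, 1.0], "01:10 a bar chart"),
]

results = search_both(audio_index, visual_index, query=[1.0, 0.0, 0.0])
for score, stream, label in results:
    print(f"{score:.3f}  {stream:6s}  {label}")
```

Merging on raw similarity assumes the audio and visual embeddings live in a comparable space; if they do not, the scores from each stream would need normalizing before they are ranked together.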

Video Summary

Process a video first, then open this tab to generate a summary.