SyncFlow

What Is a VTT File? WebVTT Subtitles Explained

VTT files — also known as WebVTT — are the standard subtitle format for HTML5 video players on the web. This guide explains the structure, timestamp format, styling capabilities, and how VTT compares to SRT.

📖 7 min read 🎯 WebVTT format guide 📅 Updated June 2026

What Is a VTT File?

A VTT file is a text-based subtitle format that contains timed caption cues synchronized with video playback. VTT stands for Web Video Text Tracks, and the format is defined by the W3C WebVTT specification. Each cue in a VTT file includes a start timestamp, an end timestamp, and the text to display during that interval.

VTT files are the native subtitle format for HTML5 <video> elements. Modern browsers can load VTT tracks directly without plugins or third-party players, making them the standard choice for web-based video content on platforms like YouTube, Vimeo, and custom HTML5 video players.

Beyond simple timed text, a VTT file can serve as any of several track kinds on the HTML5 <track> element: subtitles, captions, descriptions, chapters, or metadata. The format also allows text styling through CSS and positioning cues anywhere on the screen. Each file holds a single track, so multiple languages are provided as separate VTT files, one per track.

A VTT file uses the file extension .vtt and the MIME type text/vtt. When served from a web server, the correct MIME type ensures browsers recognize and process the file as a timed text track. SyncFlow supports loading, editing, and exporting VTT files alongside SRT files.

What Does WebVTT Stand For?

WebVTT stands for Web Video Text Tracks. It is an open standard maintained by the World Wide Web Consortium (W3C) — the same organization that defines HTML, CSS, and other core web technologies. The format was designed specifically for the web platform, which is why every major browser supports it natively.

The W3C WebVTT specification defines:

  • The file format and required header (WEBVTT)
  • Timestamp syntax using HH:MM:SS.mmm format with a dot separator
  • Cue payload rendering rules including text, styling, and positioning
  • How cues are used for different track kinds: subtitles, captions, descriptions, chapters, metadata
  • Comment syntax for non-displayed annotations

Because WebVTT is a web standard, browser-based subtitle tools like SyncFlow can read and write VTT files directly using native browser APIs, without server-side conversion or plugins.

VTT vs SRT: Key Differences

SRT (SubRip) and VTT (WebVTT) are both timestamped subtitle formats, but they differ in several important ways. Understanding these differences helps you choose the right format for your use case.

Feature | SRT (SubRip) | VTT (WebVTT)
File extension | .srt | .vtt
Standard | De facto standard (no formal spec) | W3C WebVTT specification
Timestamp format | HH:MM:SS,mmm (comma separator) | HH:MM:SS.mmm (dot separator)
Cue numbering | Required (1, 2, 3...) | Optional
File header | None required | Must start with WEBVTT
Text styling | Basic HTML-like tags (<b>, <i>, <u>) | CSS styling, cue positioning, and built-in tags
Cue positioning | Not supported | Line, position, align, and size settings
Track kinds | Subtitles only | Subtitles, captions, descriptions, chapters, metadata
Comments | Not supported | NOTE comments
Browser native support | Not natively supported | Native HTML5 <track> element support

The most practical difference is that VTT files work natively in HTML5 video players without plugins, while SRT files require JavaScript parsing or conversion. SyncFlow supports both formats and can convert between them during export.
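
The separator difference in the table above is the core of an SRT-to-VTT conversion, and it can be sketched in a few lines of Python. This is a minimal illustration, not SyncFlow's actual converter: the helper name srt_to_vtt is hypothetical, and it only adds the WEBVTT header and rewrites timing lines. SRT cue numbers are left in place, where they act as VTT cue identifiers.

```python
import re

# Matches an SRT timing line such as "00:01:15,000 --> 00:01:18,500".
SRT_TIMING = re.compile(
    r"^(\d{2,}:\d{2}:\d{2}),(\d{3})\s+-->\s+(\d{2,}:\d{2}:\d{2}),(\d{3})\s*$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT text (hypothetical helper, timing lines only)."""
    # Drop a leading byte-order mark and normalize Windows line endings.
    lines = srt_text.lstrip("\ufeff").replace("\r\n", "\n").split("\n")
    out = ["WEBVTT", ""]
    for line in lines:
        match = SRT_TIMING.match(line)
        if match:
            start, start_ms, end, end_ms = match.groups()
            # Swap the comma separator for a dot on both timestamps.
            line = f"{start}.{start_ms} --> {end}.{end_ms}"
        out.append(line)
    return "\n".join(out).rstrip("\n") + "\n"
```

Only lines that match the timing pattern are rewritten, so commas inside the subtitle text are not affected.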

Structure of a VTT File

A VTT file follows a simple structure. Every file must start with the WEBVTT header line, followed by a blank line, then optional blocks such as STYLE or NOTE, and then one or more cues. Each cue is separated from the next by a blank line.

Here is what a basic VTT file looks like:

WEBVTT

00:01:15.000 --> 00:01:18.500
Welcome to the tutorial.

00:02:30.000 --> 00:02:35.000
In this section we cover
the main features.

00:05:00.000 --> 00:05:05.500
Let us look at an example.

Each cue block consists of:

  1. Optional cue identifier — a number or custom ID on its own line (e.g., 1 or scene-3)
  2. Timestamp line — start time, arrow (-->), and end time. The arrow must be surrounded by spaces.
  3. Cue payload — the subtitle text. Can span multiple lines. Supports styling tags and CSS classes.
  4. Blank line — separates this cue from the next one.

The optional identifier is useful for anchoring specific cues — SyncFlow's Cues Registry displays these identifiers alongside cue indices, making it easier to reference specific subtitles during drift calibration.
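
The block structure described above can be read with a short Python sketch. It is a simplified illustration under stated assumptions, not a spec-complete parser and not SyncFlow's implementation: the Cue class and parse_cues function are hypothetical names, blocks are assumed to be separated by completely empty lines, and timestamps are kept as raw text.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    identifier: str  # empty string when the cue has no identifier line
    timing: str      # raw timing line, including any cue settings
    text: str        # cue payload, with line breaks preserved


def parse_cues(vtt_text: str) -> list[Cue]:
    """Split WebVTT text into cues (simplified, hypothetical helper)."""
    text = vtt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    blocks = [block for block in text.split("\n\n") if block.strip()]
    if not blocks or not blocks[0].startswith("WEBVTT"):
        raise ValueError("not a WebVTT file: missing WEBVTT header")
    cues = []
    for block in blocks[1:]:
        lines = block.strip("\n").split("\n")
        keyword = (lines[0].split() or [""])[0]
        # Comment, style, and region blocks carry no cue data.
        if keyword in {"NOTE", "STYLE", "REGION"}:
            continue
        # The timing line comes first, or second after an identifier.
        if "-->" in lines[0]:
            identifier, timing, payload = "", lines[0], lines[1:]
        elif len(lines) > 1 and "-->" in lines[1]:
            identifier, timing, payload = lines[0], lines[1], lines[2:]
        else:
            continue  # not a cue block; ignored in this sketch
        cues.append(Cue(identifier, timing, "\n".join(payload)))
    return cues
```

A real parser would also validate the timestamps and handle the edge cases the W3C specification describes; this version only shows how the header, identifier, timing line, payload, and blank-line separator fit together.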

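Comments can be added using NOTE followed by the comment text, either on the same line or on the lines below. The whole comment block, up to the next blank line, is ignored by the parser and never displayed. A comment cannot contain the arrow sequence -->. Comments are useful for leaving editing notes or marking sections of a large VTT file.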

NOTE
This section needs review after
the next video edit.

00:10:00.000 --> 00:10:05.000
This cue is marked for review.

VTT Timestamp Format

VTT timestamps use the format HH:MM:SS.mmm where:

  • HH — hours (two or more digits, zero-padded)
  • MM — minutes (two digits, 00-59)
  • SS — seconds (two digits, 00-59)
  • mmm — milliseconds (three digits, 000-999)

The key difference from SRT is the separator: VTT uses a dot (.) between seconds and milliseconds, while SRT uses a comma (,). Both represent the same time resolution — three digits of milliseconds — but the different separators mean they are not interchangeable without conversion.

In VTT the hours component is optional. While 00:01:30.000 is the full form, the specification allows omitting hours when the timestamp is under one hour: 01:30.000. However, most tools, including SyncFlow, write the full form with hours for consistency.
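
The timestamp rules above, including the optional hours component, can be expressed as a small Python sketch. The function names parse_timestamp and format_timestamp are hypothetical, and working in whole milliseconds is an assumption made here for simplicity.

```python
import re

# Hours are optional; when present they have two or more digits.
VTT_TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$")


def parse_timestamp(value: str) -> int:
    """Return a WebVTT timestamp as whole milliseconds."""
    match = VTT_TIMESTAMP.match(value.strip())
    if match is None:
        raise ValueError(f"invalid WebVTT timestamp: {value!r}")
    hours, minutes, seconds, millis = match.groups()
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def format_timestamp(total_ms: int, separator: str = ".") -> str:
    """Format milliseconds as HH:MM:SS.mmm; pass separator="," for SRT."""
    if total_ms < 0:
        raise ValueError("timestamps cannot be negative")
    seconds, millis = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{separator}{millis:03d}"
```

Both 00:01:30.000 and 01:30.000 parse to 90000 milliseconds, and formatting always writes the full form with hours.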

Cue settings such as align or position can follow the end timestamp on the same line, separated by spaces (for example, align:start). These settings are not part of the timestamp itself and are covered under Cue Positioning below. SyncFlow reads and preserves all standard VTT timestamp formats.

Styling and Positioning

One of the key advantages of VTT over SRT is its support for text styling and on-screen positioning. VTT cues can be styled using built-in tags and CSS classes defined in a STYLE block.

Built-in Styling Tags

Inside a cue's text, you can use these tags to style specific words or phrases:

  • <b>text</b> — bold text
  • <i>text</i> — italic text
  • <u>text</u> — underlined text
  • <c.classname>text</c> — apply a CSS class
  • <v speaker>text</v> — voice label (displayed differently by some players)

CSS Styling with ::cue

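VTT files can include a STYLE block after the header and before the first cue. The block defines CSS rules using the ::cue pseudo-element, which gives control over properties such as color, background, font, and text shadow. Only a limited set of CSS properties applies to ::cue, and browser support for STYLE blocks inside the file varies, so the same rules are often placed in the page's stylesheet instead.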

WEBVTT

STYLE
::cue {
  background-color: rgba(0, 0, 0, 0.8);
  color: #ffffff;
  font-family: Arial, sans-serif;
  font-size: 18px;
  text-shadow: 2px 2px 4px #000000;
}
::cue(.highlight) {
  color: #ffd700;
}

Cue Positioning

VTT cues can be positioned anywhere on the screen using cue settings appended to the timestamp line. These settings are placed after the end timestamp, separated by spaces:

  • vertical:rl or vertical:lr — vertical text (right-to-left or left-to-right)
  • line:10% — vertical position as percentage or line number
  • position:20% — horizontal position as percentage
  • align:start, align:center, or align:end — text alignment within the cue box (left and right are also valid values)
  • size:60% — width of the cue box as percentage

These positioning features make VTT the preferred format for multi-speaker content, on-screen labels, and any scenario where subtitles need to appear at specific screen locations rather than centered at the bottom.
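
To show how the settings listed above sit on the timing line, here is a short Python sketch that separates the two timestamps from the name:value settings. The function name split_timing_line is hypothetical, and the sketch does not validate setting names or values.

```python
def split_timing_line(line: str) -> tuple[str, str, dict[str, str]]:
    """Split a cue timing line into (start, end, settings)."""
    start, arrow, rest = line.partition("-->")
    if not arrow:
        raise ValueError(f"not a timing line: {line!r}")
    parts = rest.split()
    if not parts:
        raise ValueError(f"missing end timestamp: {line!r}")
    end, raw_settings = parts[0], parts[1:]
    settings = {}
    for item in raw_settings:
        name, colon, value = item.partition(":")
        # Settings are name:value pairs; anything malformed is skipped.
        if colon and name and value:
            settings[name] = value
    return start.strip(), end, settings
```

For example, the line 00:00:05.000 --> 00:00:08.000 line:10% align:start yields the two timestamps plus a dictionary with the keys line and align.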

Browser Support

WebVTT is supported natively in all modern browsers. No plugins, extensions, or third-party libraries are required to display VTT subtitles in an HTML5 video player.

Browser | Native VTT Support
Chrome | ✅ Full support since version 23
Firefox | ✅ Full support since version 31
Safari | ✅ Full support since version 6
Edge | ✅ Full support since version 79 (Chromium-based)
Opera | ✅ Full support since version 12.1

Browser support includes the <track> element, JavaScript text track control (adding, removing, and selecting tracks dynamically), and native rendering of VTT cues. How completely styling and positioning settings are honored can differ between browsers, so it is worth testing advanced cues in the browsers you target. This native support is why SyncFlow and other web-based subtitle tools prefer VTT for browser-based video playback.

Using VTT with HTML5 Video

To use a VTT file with an HTML5 video player, you add a <track> element inside the <video> element. The <track> element points to the VTT file and specifies the kind of track, language, and a user-visible label.

<video controls width="720">
  <source src="video.mp4" type="video/mp4">
  <track kind="subtitles" src="subtitles.vtt"
         srclang="en" label="English">
  <track kind="subtitles" src="subtitles.es.vtt"
         srclang="es" label="Spanish">
</video>

The kind attribute can be subtitles, captions, descriptions, chapters, or metadata. Multiple <track> elements can be added for different languages or track types. The browser provides a built-in track selector UI that lets users choose which track to display.

JavaScript can also access VTT tracks through the TextTrack API (for example, video.textTracks or the track property of an HTMLTrackElement), allowing dynamic track switching, cue inspection, and custom rendering. SyncFlow uses this API internally to load and display VTT cues during synchronization.

For the VTT file to work correctly when served from a web server, the server must return the correct MIME type: text/vtt. Most modern web servers associate the .vtt extension with this MIME type automatically.
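
If you serve files from your own Python code, the mapping can be registered explicitly instead of relying on system defaults. The sketch below uses the standard mimetypes module. The helper name content_type_for and the fallback to application/octet-stream are assumptions made for illustration.

```python
import mimetypes

# Register the mapping explicitly; system defaults may or may not know .vtt.
mimetypes.add_type("text/vtt", ".vtt")


def content_type_for(filename: str) -> str:
    """Return the Content-Type a server should send for the given file name."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"
```

Other servers have their own configuration for this mapping, so check the documentation for the one you use.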

⚡ SyncFlow uses VTT natively

SyncFlow loads VTT files directly in your browser using the HTML5 Track API. Offset adjustment, drift calibration, inline editing, and export all work locally. You can load a VTT file, sync it with your video, and export as VTT or SRT. Sync your subtitles now →

When to Use VTT vs SRT

Choosing between VTT and SRT depends on where your subtitles will be used and what features you need.

Use VTT when:

  • Embedding subtitles in a web page — the HTML5 <track> element only supports VTT natively. SRT files require JavaScript parsing to display in a browser.
  • Need on-screen positioning — if your content requires subtitles at specific screen locations (e.g., speaker labels on the left and right), VTT's positioning settings are essential.
  • CSS styling is required — the ::cue pseudo-element gives you full control over subtitle appearance through CSS.
  • Multiple cue types — if you need chapters, descriptions, or metadata tracks alongside subtitles, VTT supports all of them in the standard format.

Use SRT when:

  • Maximum player compatibility — SRT is supported by virtually every media player (VLC, MPC-HC, Plex, Kodi) and is the most portable subtitle format.
  • Simple subtitles only — if all your subtitles need to do is display centered text at the bottom of the screen, SRT is simpler and universally supported.
  • Sharing with others — SRT files are more likely to work out of the box in any player environment without adjustments.

SyncFlow removes the need to choose permanently. You can load either format, and export the result in whichever format you need. If you receive an SRT file but need VTT for your web player, SyncFlow converts it during export.

How to Edit VTT Files

VTT files are plain text, which means they can be edited with any text editor. However, for syncing VTT subtitles with a video — adjusting timestamps, correcting offset or drift, or fixing individual cue timing — a dedicated subtitle tool is more efficient.

Manual Editing (Text Editor)

For small changes, you can open a .vtt file in any text editor (Notepad, VS Code, Sublime Text). Each timestamp line shows the start and end time. Changing a timestamp repositions that cue in the video. This approach works for one-off fixes but is impractical for systematic offset or drift problems affecting many cues.

Using SyncFlow for VTT Files

SyncFlow provides a visual interface for editing VTT files alongside your video. Load the VTT file and video, then use the same tools available for SRT files:

  • Global offset slider — shift all timestamps forward or backward when every cue is consistently early or late.
  • Tap-to-sync — pause at any spoken word, click SYNC TO THIS MOMENT, and SyncFlow calculates the exact offset automatically.
  • Linear drift calibration — set two anchor points for progressive desync caused by framerate mismatches.
  • Inline cue editing — click any cue in the Cues Registry to edit its start time, end time, or text directly.
  • Waveform preview — colored cue markers on the waveform make it easy to spot misaligned cues.
  • Format conversion — export the result as VTT or SRT regardless of the input format.
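
The arithmetic behind a global offset and a two-anchor linear drift correction can be sketched in Python. This illustrates the general technique only and is not SyncFlow's actual implementation. The function names are hypothetical, and cue times are assumed to be (start, end) pairs in milliseconds.

```python
Cue = tuple[int, int]  # (start_ms, end_ms)


def apply_offset(cues: list[Cue], offset_ms: int) -> list[Cue]:
    """Shift every cue by offset_ms, clamping times at zero."""
    return [(max(0, s + offset_ms), max(0, e + offset_ms)) for s, e in cues]


def apply_linear_drift(
    cues: list[Cue], anchor_a: tuple[int, int], anchor_b: tuple[int, int]
) -> list[Cue]:
    """Rescale cue times so both anchors land on their target times.

    Each anchor is a (subtitle_time_ms, video_time_ms) pair.
    """
    (sub_a, vid_a), (sub_b, vid_b) = anchor_a, anchor_b
    if sub_a == sub_b:
        raise ValueError("anchors must use two different subtitle times")
    # Slope of the straight line through the two anchor points.
    scale = (vid_b - vid_a) / (sub_b - sub_a)

    def correct(t: int) -> int:
        return max(0, round(vid_a + (t - sub_a) * scale))

    return [(correct(s), correct(e)) for s, e in cues]
```

A constant offset moves every cue by the same amount. The drift correction stretches or compresses the timeline, which suits cases where the error grows steadily from the start of the video to the end.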

All editing happens locally in your browser. No uploads, no account, no watermark. For detailed instructions on fixing timing problems, see the complete subtitle sync guide.

How SyncFlow Supports VTT Files

SyncFlow treats VTT files as first-class citizens alongside SRT. Every feature available for SRT files works identically with VTT files:

  • Load VTT directly — drag and drop your .vtt file into SyncFlow alongside your video.
  • All sync tools available — global offset, tap-to-sync, linear drift calibration, inline editing, and nudge controls all work with VTT timestamps.
  • Waveform preview — VTT cues are displayed as colored markers on the waveform for visual alignment.
  • Preserve styling — SyncFlow preserves VTT styling tags, positioning settings, and CSS STYLE blocks during editing and export. Only timestamp values and cue text are modified.
  • Export as VTT or SRT — choose your output format during export. Timestamp separators (dot for VTT, comma for SRT) are converted automatically.
  • Save/load projects — your work-in-progress is saved as a .syncflow file, preserving all edits regardless of the original subtitle format.

SyncFlow reads VTT timestamps using the standard HH:MM:SS.mmm format, applies your corrections, and writes the output with the correct format for your chosen export type. No data is lost during the round trip — styling, positioning, and cue identifiers are preserved.

For a complete walkthrough of the sync tools available for both VTT and SRT files, see the fix out-of-sync subtitles guide or the drift correction guide.

Edit and Sync VTT Files

Load your VTT file and video into SyncFlow. Adjust offset, fix drift, edit cue text, and export as VTT or SRT. All in your browser, no account needed.

Frequently Asked Questions

What is a VTT file?

A VTT file is a WebVTT (Web Video Text Tracks) subtitle file used to display timed text captions during video playback. It is the standard subtitle format for HTML5 video players and modern web browsers. VTT files support text positioning and styling. Each file holds one track, and a video can load several VTT tracks at once, for example one per language.

What is the difference between SRT and VTT files?

SRT (SubRip) uses a comma as the millisecond separator (HH:MM:SS,mmm) and supports basic bold/italic/underline formatting. VTT (WebVTT) uses a dot separator (HH:MM:SS.mmm), adds CSS-based styling and text positioning, can be used for multiple track kinds (subtitles, captions, descriptions, chapters, metadata), and allows comments. Both formats store timed text cues with start and end timestamps.

Can I edit a VTT file?

Yes. VTT files are plain text, so you can edit them with any text editor, or use a subtitle tool like SyncFlow that supports VTT format for sync adjustments. SyncFlow can load VTT files, correct offset and drift, edit individual cue timestamps and text, and export the result as VTT or SRT.

Can VTT files be converted to SRT?

Yes. SyncFlow can load VTT files and export them as SRT, or load SRT files and export as VTT. The main difference during conversion is the timestamp separator — VTT uses a dot (.) and SRT uses a comma (,) for milliseconds. SyncFlow handles this conversion automatically during export.

What programs open VTT files?

VTT files open in any text editor, web browser (through HTML5 video players), and media players that support WebVTT subtitles. For editing and synchronization, SyncFlow provides a dedicated tool that can load, adjust, and export VTT files alongside your video.