Captions, better known as subtitles, are more than just translated text. YouTube’s captions option is a really useful tool that adds a lot of functionality to a video: it improves the user experience, helps Google index the video, makes it accessible to people with disabilities, and opens up new markets and information to the world.

YouTube transcript button

Handy transcript
Everything starts with a transcript: a text version of the words spoken on screen. On YouTube you can add a timecode-based transcript to your video. The lines of text show up underneath the video and as subtitles. Viewers like them because reading is faster than watching or listening, so it is easy for them to navigate the video: read, click, see. Watch this example video I edited.
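A timecode-based transcript is usually a plain-text caption file; SubRip (.srt) is one of the formats YouTube accepts. A minimal sketch (the timings and wording here are made up for illustration):

```
1
00:00:00,000 --> 00:00:03,500
Everything starts with a transcript.

2
00:00:03,500 --> 00:00:07,000
Each numbered cue pairs a time range with a line of text.
```

Each cue has a sequence number, a start and end time in hours:minutes:seconds,milliseconds, and the text to display during that range. YouTube reads the timecodes to sync the text with the video.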

Translation and subtitles
Translate the timecode-based transcript and you have a subtitle track for a YouTube video. You can translate your video into as many languages as you like. Viewers select the language they prefer with the red CC button.

YouTube also has an automated ‘Translate captions’ function, but it is still in beta and far from perfect. Even worse is the ‘transcribe audio’ function, which automatically converts the audio track into text captions. It needs big improvements before I would recommend it.

If you are into SEO and VSEO, you will love captions. To quote YouTube: “Captions can be searched for, so accurate captions will help people find your videos”. Every spoken word becomes an indexed keyword and is thus findable through Google! That can boost your YouTube views and your website’s PageRank. I have not checked other search engines.

Here is an interview I edited and subtitled for Bridge Outsourcing. To see the transcript, you need to view the video on YouTube.

Can’t see, can’t hear, can read
Captions are often the only way for people with disabilities (blind and/or deaf) to understand a video. You do them a great favor by adding captions to your videos. For many governments it is even mandatory to make information (such as video) accessible to people with disabilities.

Metadata and the future
Captions are metadata: they make video understandable to machines like search engines. But it can go much further. How about tagging, like Facebook? Or face detection, like iPhoto? Now you know who’s who. Add GPS data and you know where they are. Add augmented reality like Layar and dig into extra layers of information, like tweets and social media data. Add 3D mapping and you could view a 3D-rendered version. Then make avatars of the people on screen. Add some artificial intelligence to those avatars and you can interact with them. Play games. Did I lose you there?