I figure that Google must be close to having some technology that can actually distinguish the voices in a film or video from the music track. I would guess they could use their speech-to-text capabilities to listen to video content and save either the entire transcript or a set of keywords chosen by some secret-sauce formula.

I know that Google understands the importance of video; after all, they bought YouTube. They also have at least one speech-to-text experiment that we know of, GOOG-411, and they have the resources of Emperor Palpatine. If Amazon can scan the lion’s share of their books, couldn’t Google “watch” and digitally imprint all of YouTube’s videos? They have the processing power.

The trick is isolating the vocal tracks. Most would argue that this is impossible because once all the tracks are mixed down into a single file, such as a .wav, the per-track information is left behind. But there are people who claim that they can get (most of) the music track to go away. If anyone can do it, Google can, right? And if they can do that, they can run speech recognition on what is left and output the results to good ol’ XML.
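
For what it's worth, here is a minimal sketch of one trick of this kind, a mid/side split on a stereo mix. It is not a claim about what Google does. The big assumption, labeled in the code too, is an idealized mix where the voice is identical in both channels and the music sits in the two channels with opposite polarity. Real recordings are rarely that tidy, which is presumably why people only claim to remove "most of" the music. The function name `mid_side` and the toy signals are made up for illustration.

```python
import math

def mid_side(left, right):
    """Split stereo samples into a mid part (what the channels share)
    and a side part (what differs between them)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

# Toy signals (hypothetical): one cycle of a sine as the "voice",
# a quieter cosine as the "music".
n = 8
voice = [math.sin(2 * math.pi * i / n) for i in range(n)]
music = [0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

# ASSUMPTION: idealized mix. Voice is identical in both channels,
# music appears with opposite polarity in left and right.
left = [v + m for v, m in zip(voice, music)]
right = [v - m for v, m in zip(voice, music)]

mid, side = mid_side(left, right)

# Under that assumption the mid part is the voice and the side part is the music.
voice_error = max(abs(a - b) for a, b in zip(mid, voice))
music_error = max(abs(a - b) for a, b in zip(side, music))
print(voice_error < 1e-9, music_error < 1e-9)
```

If the music were instead spread evenly across both channels, the same split would leave it sitting right on top of the voice in the mid part, so this only helps to the extent that the mix cooperates.
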
What do you think? Could they be close? Are they working on it?