The Implications of an Intelligent TV Ecosystem
[itvt] is pleased to present the following essay by Martin Focazio, Senior Director of Strategy at EPAM Empathy Lab.
Televisions Have Poor Manners
Today, televisions have poor manners. They don't listen, they don't shut up, and they constantly show you things you're not really interested in seeing. They show violent content to your kids when you're not around and they don't even have the courtesy to pause the program when you leave the room. This blind, deaf and dumb machine that dominates our homes is about to change radically.
The television of tomorrow will be an ever-observant and civilized part of the home with unmatched access to your tastes and preferences. Like a good butler, it will unobtrusively respond to your wishes (verbalized or not), it will react appropriately to who is in the room and it will offer an appropriate selection of content for your enjoyment. In short, the television of tomorrow will know you and the people you live with, will know what you like, and will seem intelligent--perhaps nearly sentient. When the TV of tomorrow knows what you did last night, it will do a better job of bringing you the shows you want to see tonight.
Like so many things about the unevenly distributed future, components of the television of tomorrow are already here--just not in one place and not working well together. Yet.
Intel is releasing a new streaming media box soon. It's a replacement for your set-top box, and it's got a camera in it. Intel will use that to present personalized options and targeted advertising. The upcoming Xbox has a slew of ways to monitor your presence and activity. In the mobile world, Samsung's new S4 mobile thing (we dare not call it a phone, a tablet or phablet) has eye tracking that pauses and scrolls based on whether and how you're looking at the screen. Netflix and Amazon are both recommendation engines, optimizing and personalizing their catalogs (although both do a terrible job of dealing with multiple users, resulting in endless recommendations for penguin cartoons that persist when Dad takes control of the remote away from the kids).
Perhaps the most important thing to consider in the vision of a television with manners is that it requires a complex level of interoperability between hardware capabilities, device software, server-side systems and--most critically--program and usage metadata. In many ways, we're at a stage roughly akin to the days of dial-up Internet--a time when hardware, connectivity, and server-side capabilities were not fully standardized. Anyone who remembers having to decide between a PPP connection and a SLIP connection, or who bought a modem because it had V.32bis or V.90, should have a sense of familiarity with the issue. Today, smart televisions are purchased by people who clearly have no idea what a smart television is for--only 50% of them are connected to the Internet, and this is because the overall experience is still about as good as the Internet was in dial-up days.
Obviously, here at EPAM Empathy Lab we're very interested in how the TV of tomorrow will be created, and we've been giving some deep thought to how the needed human-machine interaction models will be conceived, designed and deployed. We know that it will need to be comfortable and familiar (not confusing and creepy) and we know that it may very well involve a re-evaluation of the entire technology stack involved in digital distribution.
Go Go Gadget?
My great grandmother, an immigrant from a rural village in Hungary, used to watch professional wrestling on a clunky black-and-white television that she had precariously balanced on a bookshelf in her tiny apartment, and she would yell at it almost continually throughout the broadcast, "Hit him! Hit him! That wasn't even trying!" It's not clear if she actually expected the "wrestlers" to respond to her exhortations, but I think she suspected that they could hear her and were just ignoring her.
Indeed, people do talk to machines--even though the machines can't hear them. Of course, that's changing now. Voice recognition is still comically bad, but it is getting better. Given the path companies like Google and Apple are on with voice recognition, and given recent advances in machine vision in fields such as game consoles, manufacturing automation and self-driving cars, it isn't a big leap to conclude that, within the next few years, machines will be able to confidently assess what we're saying as well as understand what we're doing. But there's a problem looming. Perhaps you're familiar with this sight:
What a mess! There's no polite way to put this: all TV remotes suck. So the danger is that, when televisions start to gain the smarts to understand what we're saying, we may very well be stuck learning some limited vocabulary or changing--slightly--how we speak to the TV, so that it does what we mean, not what we say. This is not unprecedented for early-stage devices--and it's not always an impediment to success. The late Palm Computing had handwriting recognition--sort of--via "Graffiti," which was very close to (but not the same as) natural handwriting.
The danger is that the way we speak and gesture to our more-intelligent televisions will be fragmented by device and service. Do you say "Go Go TV" to get the device's attention, or do you say "Hey TV," or do you say "Onegai TV On" for Sony televisions and "Television Ju Se Yo" for Samsung TVs? Comcast recently announced that it will deploy new voice-enabled remote controls that will power X2, a customizable, personalized version of its cloud-based interactive program guide. If both the set-top box and the media service are listening, what happens when they have a shared, but functionally different, vocabulary? What's more, the television itself will need to be able to respond intelligently as well--does it beep? Blink a light? Vibrate your phone? Speak? This isn't trivial stuff--and creating an intuitive and interoperable interaction vocabulary may very well be one of the most important things that allow the TV of tomorrow to succeed in the marketplace. This goes way beyond visual interface design and into the deepest reaches of human-machine interaction--an area that, to date, has had very few good practitioners.
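To make the fragmentation problem concrete, here is a minimal sketch in Python of what a shared interaction vocabulary might look like at its very simplest: several device-specific wake phrases resolved to one common intent. Everything in it is an assumption made for illustration. The phrases are the hypothetical examples from the paragraph above, not real product commands, and the names `WAKE_PHRASES`, `normalize` and `to_intent` are invented here.

```python
from typing import Dict, Optional

# Hypothetical wake phrases (the essay's own examples), all mapped to a
# single shared intent name. Real devices do not necessarily use these.
WAKE_PHRASES: Dict[str, str] = {
    "go go tv": "WAKE",
    "hey tv": "WAKE",
    "onegai tv on": "WAKE",
    "television ju se yo": "WAKE",
}


def normalize(utterance: str) -> str:
    """Lowercase, drop basic punctuation and collapse whitespace."""
    words = [w.strip(".,!?") for w in utterance.lower().split()]
    return " ".join(w for w in words if w)


def to_intent(utterance: str,
              vocabulary: Dict[str, str] = WAKE_PHRASES) -> Optional[str]:
    """Return the shared intent for a phrase, or None if it is unknown."""
    return vocabulary.get(normalize(utterance))
```

Under these assumptions, `to_intent("Hey, TV!")` and `to_intent("Onegai TV On")` resolve to the same intent, while an unrecognized phrase returns `None`. The hard part, of course, is not the lookup but getting manufacturers and services to agree on the table.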
Who Can See You in Your Underwear?
Apparently, nobody watches the "reality" TV series, "Say Yes to the Dress." At least nobody we can find in the office, at our friends' houses or at home. Yet somehow Nielsen reports an audience in the millions. Perhaps it's just that nobody admits that they watch the show. And that's interesting for the TV of tomorrow--because it will know what you watched, it will know if you were in the room while the show was playing, and it will know that you binge-watched all of Season 3 one rainy Saturday afternoon. This kind of information accumulates and becomes actionable in new and interesting ways, especially now that some of the legal barriers to using this kind of data have been removed.
As video consumption moves out of the living room and into our hands, there is a new level of intimacy that is both physical and informational. We are used to Amazon having an uncanny ability to know that you're going to want to pair certain items you're purchasing with items you didn't even know you wanted: until Amazon's intimate relationship with your shopping habits reminded you that you needed that new pair of boots to go with the pup tent you're buying, you didn't know you wanted a pair of boots. So it goes with media consumption too; and while Netflix has a good sense of what you'll like--based on what you've viewed in the past as compared to others--we can only expect that the systems used to suggest media will get smarter and more accurate.
But we also expect that we might see a new model of preference mining. Today, Amazon has the most intimate view of my online buying of goods, and to a much lesser degree, of my media consumption habits. Netflix likely has the best view of my media preferences, although Apple knows quite a bit too. While I flit from service to service, each of them gets a glimpse of my overall media preferences, but none has the whole.
And that's where the TV of tomorrow might have a really important role: as a neutral aggregator of preferences, coupled with an agent-based system--so that as my TV encounters new media sources, it's able to discuss with the provider (machine-to-machine and in milliseconds) how I'd like my catalog organized, what to include, what to leave out, and how to prioritize the content. This could even extend to advertising, where the sponsor queries my television and bids for my attention--literally--if I seem to be a fit for what they are selling. A television with intimate knowledge of my overall preferences in media and commerce could invert the "pay-TV" model via direct incentives to view or otherwise interact with an advertising piece. All from a bit more information than we have now with a device that knows what you like to watch while in your underwear.
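The agent idea described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a description of any existing system: the classes `Title`, `Bid` and `PreferenceAgent`, the genre weights and the cents-based offer are all hypothetical. The sketch only shows the shape of the exchange, in which the TV holds the preferences, decides how a provider's catalog is filtered and ordered, and accepts an advertising bid only when the sponsor's target fits the viewer and the offer clears a minimum.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Title:
    name: str
    genres: Set[str]
    rating: str  # e.g. "G", "PG", "R"


@dataclass
class Bid:
    sponsor: str
    target_genres: Set[str]
    offer_cents: int  # hypothetical incentive paid to the viewer


@dataclass
class PreferenceAgent:
    """Hypothetical agent that keeps a viewer's preferences on the TV."""
    liked_genres: Dict[str, float]  # genre -> weight, higher is better
    blocked_ratings: Set[str] = field(default_factory=set)
    min_offer_cents: int = 0

    def score(self, genres: Set[str]) -> float:
        """Sum the weights of every genre the viewer likes."""
        return sum(self.liked_genres.get(g, 0.0) for g in genres)

    def organize_catalog(self, catalog: List[Title]) -> List[Title]:
        """Leave out blocked or irrelevant titles, best matches first."""
        allowed = [
            t for t in catalog
            if t.rating not in self.blocked_ratings
            and self.score(t.genres) > 0
        ]
        return sorted(allowed, key=lambda t: self.score(t.genres),
                      reverse=True)

    def pick_bid(self, bids: List[Bid]) -> Optional[Bid]:
        """Accept the highest offer among sponsors that fit the viewer."""
        eligible = [
            b for b in bids
            if b.offer_cents >= self.min_offer_cents
            and self.score(b.target_genres) > 0
        ]
        return max(eligible, key=lambda b: b.offer_cents, default=None)
```

The design choice worth noticing is where the data lives: the provider and the sponsor each see only the outcome (an ordered catalog, an accepted or declined bid), while the full preference profile stays with the agent on the television.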
Whatever form it takes, the TV of tomorrow will represent a major shift in how technology and humans interact--and all of the elements to make this happen are in-market today. We advocate a holistic approach that incorporates not just clever hardware and software design, but also a semantic model and interoperability system that specifically ensures that the TV of tomorrow is welcome in our homes.