IBM’s Watson is Now Data Mining TED Talks to Extract New Forms of Knowledge

Who really benefited from the California Gold Rush of 1849? Was it the miners, only some of whom were successful, or the merchants who sold them their equipment? Historians have differed as to the relative degree, but they largely believe it was the merchants.

Today, it seems we have something of a modern analog to this in our very digital world: The gold rush of 2015 is populated by data miners, and IBM is providing them with access to its innovative Watson technology so that these contemporary prospectors can discover new forms of knowledge.

So then, what happens when Watson is deployed to sift through the thousands of incredibly original and inspiring videos of online TED Talks? Can the results be such that TED can really talk and, when processed by Watson, yield genuine knowledge with meaning and context?

Last week, the extraordinary results of this were on display at the four-day World of Watson exposition here in New York. A fascinating report on it entitled How IBM Watson Can Mine Knowledge from TED Talks by Jeffrey Coveyduc, Director, IBM Watson, and Emily McManus, Editor, TED.com was posted on the TED Blog on May 5, 2015. This was the same day that the newfangled Watson + TED system was introduced at the event. The story also includes a captivating video of a prior 2014 TED Talk by Dario Gil of IBM entitled Cognitive Systems and the Future of Expertise that came to play a critical role in launching this undertaking.

Let’s have a look and see what we can learn from the initial results. I will sum up and annotate this report, and then ask a few additional questions.

One of the key objectives of this new system is to enable users to query it in natural language. An example given in the article is “Will new innovations give me a longer life?”. Thus, users can ask questions about ideas expressed among the full database of TED talks and, for the results, view video excerpts where such ideas have been explored. Watson’s results are further accompanied by a “timeline” of related concepts contained in a particular video clip permitting users to “tunnel sideways” if they wish and explore other topics that are “contextually related”.
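To make the idea of querying clips and then “tunneling sideways” a bit more concrete, here is a deliberately tiny sketch in Python. It is not how Watson works and does not use any IBM or TED API; it simply ranks made-up transcript snippets by word overlap with the question and then finds other clips that share a concept tag. The talk names, snippets, concept tags and function names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical toy data: these talks, snippets and concept tags are invented.
@dataclass
class Clip:
    talk: str
    start_sec: int
    text: str
    concepts: frozenset

STOPWORDS = {"a", "the", "me", "will", "new", "give", "of", "to", "and", "in"}

def tokens(text):
    """Lowercase the words, strip simple punctuation, drop stopwords."""
    words = (w.strip(".,?!\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}

def search(question, clips, top_n=2):
    """Rank clips by how many question words appear in their text."""
    q = tokens(question)
    scored = [(len(q & tokens(c.text)), c) for c in clips]
    ranked = sorted((s for s in scored if s[0] > 0),
                    key=lambda s: s[0], reverse=True)
    return [c for _, c in ranked[:top_n]]

def related(clip, clips):
    """'Tunnel sideways': other clips sharing at least one concept tag."""
    return [c for c in clips if c is not clip and clip.concepts & c.concepts]

CLIPS = [
    Clip("Talk A", 95, "Innovations in medicine may extend human life.",
         frozenset({"medicine", "longevity"})),
    Clip("Talk B", 40, "Cities are redesigning streets for cyclists.",
         frozenset({"cities", "transport"})),
    Clip("Talk C", 210, "A longer life depends on diet, sleep and new innovations.",
         frozenset({"longevity", "health"})),
]

results = search("Will new innovations give me a longer life?", CLIPS)
for clip in results:
    print(clip.talk, clip.start_sec, [c.talk for c in related(clip, CLIPS)])
```

A real system would presumably work from meaning rather than literal word matches, which is exactly the gap that the article says Watson is meant to close.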

The rest of the article is a dialog between the project’s leaders, Jeffrey Coveyduc from IBM and TED.com editor Emily McManus, that took place at World of Watson. They discussed how this new idea was transformed into a “prototype” of a fresh new means to extract “insights” from within “unstructured video”.

Ms. McManus began by recounting how she had attended Mr. Gil’s TED Talk about cognitive computing. Her admiration of his presentation led her to wonder whether Watson could be applied to TED Talks’ full content so that users would be able to pose their own questions to it in natural language. She asked Mr. Gil if this might be possible.

Mr. Coveyduc said that Mr. Gil then approached him to discuss the proposed project. They agreed that the appeal lay not just in the content per se but in TED’s compelling mission of spreading ideas. Because one of Watson’s key objectives is to “extract knowledge” that’s meaningful to the user, it appeared to be “a great match”.

Ms. McManus mentioned that TED Talks maintains an application programming interface (API) to assist developers in accessing their nearly 2,000 videos and transcripts. She agreed to provide access to TED’s voluminous content to IBM. The company assembled its multidisciplinary project team in about eight weeks.

They began with no preconceptions as to where their efforts would lead. Mr. Coveyduc said they “needed the freedom to be creative”. They drew from a wide range of Watson’s existing technical services. In early iterations of their work they found that “ideas began to group themselves”. In turn, this led them to “new insights” within TED’s vast content base.

Ms. McManus recently received a call from Mr. Gil asking her to stop by his office in New York. He demoed the new system, which had completely indexed the TED content. Moreover, he showed how it could display, according to her, “a universe of concepts extracted” from the content’s core. Next, using the all-important natural language capabilities to pose questions, he demonstrated how the results arrived in the form of numerous short clips which, taken all together, compiled “a nuanced and complex answer to a big question”, as she described it.

Mr. Coveyduc believes this new system simplifies how users can inspect and inquire about “diverse expertise and viewpoints” expressed in video. He cited other potential areas of exploration such as broadcast journalism and online courses (also known as MOOCs)*. Furthermore, the larger concept underlying this project is that Watson can distill the major “ideas and concepts” of each TED Talk and thus give users the knowledge they are seeking.

Going beyond Watson + TED’s accomplishments, he believes that video search remains quite challenging but this project demonstrates it can indeed be done. As a result, he thinks that mining such deep and wide knowledge within massive video libraries may turn into “a shared source of creativity and innovation”.

My questions are as follows:

  • What if Watson was similarly applied to the vast troves of video classes used by professionals to maintain their ongoing license certifications in, among others, law, medicine and accounting? Would new forms of potentially applicable and actionable knowledge emerge that would benefit these professionals as well as the consumers of their services? Rather than restricting Watson to processing the video classes of each profession separately, what might be the results of instead processing them together in various combinations and permutations?
  • What if Watson was configured to process the video repositories of today’s popular MOOC providers such as Coursera or edX? The same could be done for universities around the world that are putting their classes online. Their missions are more or less the same: enabling remote learning across the web in a multitude of subjects. The results could possibly hold new revelations about subjects that no one can presently discern.

Editor’s Note – reprinted with the permission of the author – first published on his blog – The Subway Fold.

* Reference per MOOCs – See the September 18, 2015 Subway Fold post entitled A Real Class Act: Massive Open Online Courses (MOOCs) are Changing the Learning Process for the full details and some supporting links.
