Very interesting, thanks for posting! The TED dataset is also quite
interesting for Wikidata, because we are missing the generic concepts
behind many Wikipedia articles. Most people complain that Wikipedia tends
to dive into in-depth information without giving adequate coverage in an
overview article. Many overview articles have grown beyond normal viewing
capacity on a mobile phone and probably should be split into 2nd and 3rd
tier wikipages giving explanations about branches of the subject. To see
what I mean, try viewing the English Wikipedia article for "Insurance" on
your phone.
The TED talks touch on many such missing subject items, and it would be
nice to crowdsource their creation. Your project could possibly be
a way to direct contributors to quick explanations and/or uses of such
concepts. The fact that many TED talks are transcribed into so many
different languages means we may be able to harness these translations for
use in Wikidata labels. At least that is what I hope. Without labels,
nothing is findable on Wikidata, and that is why we are still so slow at
interlinking linkable items.
If your initiative takes off, it may be interesting to apply it to our own
set of film media on Commons, but very little of that has been linked to
Wikidata yet.
Post by Pine W
https://blog.wikimedia.org/2016/04/22/ted-wikimedia-collaboration/
Great news! I also didn't know that Wikidata has unique identifiers for
so many TED talks.
FYI, my group worked 18 months ago on a prototype we called HyperTED.
You can read about it at
http://linkedup-project.eu/2014/10/14/vici-shortlist-hyperted/. There is
also a presentation at
http://www.slideshare.net/JosLuisRedondoGarca/hyperted-40494120. And you
can play directly with the HyperTED prototype at
http://linkedtv.eurecom.fr/HyperTED/
In a nutshell, we used the TED talk metadata (subtitles divided into
paragraphs) in order to provide chapters to TED talks. We have annotated
them automatically using named entity recognition and disambiguation tools
and topic detection algorithms. Hence, entities are disambiguated to
DBpedia (but these could also be Wikidata entities). Finally, we have
developed an algorithm that detects hot spots in TED talks (read the
scientific paper at
http://www.eurecom.fr/~troncy/Publications/Redondo_Troncy-iswc14.pdf).
Ultimately, as soon as you watch chapters of TED talks, we recommend
other chapters of other TED talks that may be related (because of
common entities and topics). Instead of being a traditional recommender
system that suggests other TED talks, we perform recommendation at the
fragment level.
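As a rough illustration of what recommendation at the fragment level
can look like, here is a minimal Python sketch. It is not the HyperTED
implementation: the Chapter class, the Jaccard overlap scoring, the
entity_weight parameter and the example identifiers are all assumptions
made for this example. It only shows the general idea of ranking chapters
from other talks by the entities and topics they share with the chapter
being watched.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chapter:
    """Hypothetical chapter record: one fragment of one talk."""
    talk_id: str
    index: int
    entities: frozenset = field(default_factory=frozenset)  # e.g. DBpedia IRIs
    topics: frozenset = field(default_factory=frozenset)


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_chapters(current, candidates, top_k=3, entity_weight=0.5):
    """Rank chapters of *other* talks by shared entities and topics.

    The weighted Jaccard score is an assumed stand-in for whatever
    similarity measure a real system would use.
    """
    scored = []
    for other in candidates:
        if other.talk_id == current.talk_id:
            continue  # only recommend fragments from other talks
        score = (entity_weight * jaccard(current.entities, other.entities)
                 + (1 - entity_weight) * jaccard(current.topics, other.topics))
        if score > 0:
            scored.append((score, other))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chapter for _, chapter in scored[:top_k]]
```

Calling related_chapters(current, all_chapters) returns at most top_k
chapters, best match first, and skips chapters that share nothing with
the current one.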
We are eager to receive any feedback. Be gentle with the demo; we are
aware of some bugs and limitations.
Best regards.
Raphaël
--
Raphaël Troncy
EURECOM, Campus SophiaTech
Data Science Department
450 route des Chappes, 06410 Biot, France.
Tel: +33 (0)4 - 9300 8242
Fax: +33 (0)4 - 9000 8200
Web: http://www.eurecom.fr/~troncy/
_______________________________________________
Wikidata mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata