Machine Learning in Yahoo Knowledge Graph

The Yahoo Knowledge (YK) graph crawls, reconciles, and blends information (around 10B fact triples) about 200M entities from 30 semi-structured source graphs (crawlable sites such as Wikipedia, IMDB, and LonelyPlanet, as well as licensed feeds) into a merged graph of 75M entities and 5B facts distributed across 140 entity types and 300 attributes. From classifying the entity types of source entities, to reconciling entities across sources (e.g. Brad Pitt from Wikipedia vs. Brad Pitt from IMDB), to blending conflicting and complementary facts for each entity from different sources, the YK graph encapsulates production-scale machine learning solutions: multi-label classification (e.g. the predicted entity types for Arnold Schwarzenegger could be Actor, Politician, BusinessPerson, etc.); large-scale, high-precision binary classifiers that, together with an array of distributed hashing techniques, help scale a potential billion edge comparisons (de-duplicating entities across sources requires high-precision classifiers, for which we develop active learning and precision-clamped training strategies); and, lastly, hubs-and-authorities-based fact blending from competing sources. To support product initiatives such as surfacing knowledge-augmented results on web and sponsored searches, we build a variety of "knowledge discovery" services: (1) question answering based on knowledge triples and reading-comprehension-style question answering, both utilizing our blended/merged knowledge graph, and (2) related-entity recommendations that connect a given entity to other entities beyond direct ontological relations, to generate browsing interest in other Yahoo sites and properties. In contrast to broad cross-domain knowledge, we also delve into deep domain-specific information extraction from news text and videos to power unique experiences for brands like Yahoo! Sports.
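As a rough illustration of what hubs-and-authorities-style fact blending can look like, the sketch below treats sources as hubs and claimed facts as authorities, then alternates between scoring facts by the trust of the sources asserting them and scoring sources by the belief in the facts they assert. This is a generic textbook-style iteration, not the YK graph's actual implementation; the function name `blend_facts`, the claim tuple layout, the source names, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def blend_facts(claims, iterations=20):
    """Pick one value per (entity, attribute) from competing sources.

    claims: iterable of (source, entity, attribute, value) tuples.
    Returns (blended, trust), where blended maps (entity, attribute) to the
    highest-belief value and trust maps each source to a score in [0, 1].
    Hypothetical helper for illustration only.
    """
    facts_by_source = defaultdict(set)
    sources_by_fact = defaultdict(set)
    for source, entity, attribute, value in claims:
        fact = (entity, attribute, value)
        facts_by_source[source].add(fact)
        sources_by_fact[fact].add(source)
    if not sources_by_fact:
        return {}, {}

    trust = {source: 1.0 for source in facts_by_source}
    belief = {}
    for _ in range(iterations):
        # Authority step: a fact is believed as much as its sources are trusted.
        belief = {f: sum(trust[s] for s in srcs) for f, srcs in sources_by_fact.items()}
        top = max(belief.values())
        belief = {f: b / top for f, b in belief.items()}
        # Hub step: a source is trusted as much as its facts are believed.
        trust = {s: sum(belief[f] for f in fs) for s, fs in facts_by_source.items()}
        top = max(trust.values())
        trust = {s: t / top for s, t in trust.items()}

    best = {}
    for (entity, attribute, value), score in belief.items():
        key = (entity, attribute)
        if key not in best or score > best[key][1]:
            best[key] = (value, score)
    return {key: value for key, (value, _) in best.items()}, trust


# Toy claims (made up for the example): two sources agree, one disagrees.
toy_claims = [
    ("source_a", "e1", "birth_year", 1963),
    ("source_a", "e1", "birth_place", "Shawnee"),
    ("source_b", "e1", "birth_year", 1963),
    ("source_b", "e1", "birth_place", "Shawnee"),
    ("source_c", "e1", "birth_year", 1964),
]
blended, trust = blend_facts(toy_claims)
```

In this toy run the value backed by the two mutually reinforcing sources wins, and the dissenting source ends up with a lower trust score. A production system would presumably also need per-attribute handling (for example, attributes that legitimately hold multiple values), which this sketch ignores.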
Specifically, for US sports (NBA/NFL/NHL/MLB/Soccer), our text information extraction sits at the crossroads of fact finding in articles, fine-grained entity typing, and topical extractive summarization of temporal topics such as trades, contracts, injuries, and performances, connecting players and potential teams to provide 360-degree browsing of daily fantasy news and sports rumors. Through our video deep-linking capabilities, we link moments in highlight videos to points in time of a game so that we can power within-video search and browse experiences. For example, a query like "LeBron James's dunks from yesterday" would seek to the exact moments in a highlight video where LeBron dunked, and "Lakers' top scorers tonight" would find the stats of the top Lakers scorers and then seek to the exact moments of their plays in highlight videos.
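To make the within-video search idea concrete, here is a minimal sketch of the lookup side only, assuming the hard part (aligning highlight-video moments with game events) has already produced a table of deep links. The `GameEvent` and `DeepLink` records, the `seek_points` function, and the toy data are hypothetical names and values chosen for illustration; the abstract does not describe the actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameEvent:
    """One play-by-play event (hypothetical schema)."""
    event_id: str
    player: str
    kind: str  # e.g. "dunk", "three_pointer"


@dataclass(frozen=True)
class DeepLink:
    """Links a game event to a moment in a highlight video (hypothetical schema)."""
    video_id: str
    offset_seconds: float
    event_id: str


def seek_points(events, links, player, kind):
    """Return sorted (video_id, offset_seconds) pairs for a player's events of one kind."""
    wanted = {e.event_id for e in events if e.player == player and e.kind == kind}
    return sorted((link.video_id, link.offset_seconds)
                  for link in links if link.event_id in wanted)


# Toy data (made up): two dunks by Player A and one unrelated play.
events = [
    GameEvent("e1", "Player A", "dunk"),
    GameEvent("e2", "Player B", "three_pointer"),
    GameEvent("e3", "Player A", "dunk"),
]
links = [
    DeepLink("v1", 42.0, "e3"),
    DeepLink("v1", 12.5, "e1"),
    DeepLink("v1", 30.0, "e2"),
]
dunk_moments = seek_points(events, links, "Player A", "dunk")
```

A query such as the dunk example above would first be parsed into a player and an event type, after which a lookup like this returns the offsets a video player could seek to.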
May 6th 2020, 1:20pm EST
deep linking, knowledge graph, information extraction, binary classification, recommender system, news, entity linking, natural language processing, machine learning

Topojoy Biswas


Verizon Media

Topojoy currently leads information extraction on text and videos in Yahoo Knowledge Graph, which powers search and information organization in Yahoo! products like Finance, Sports, and entity search and browse. He has worked on Yahoo Knowledge Graph (YK) for 4 years on various aspects of creating knowledge graphs, such as reconciling source graphs, classifying semi-structured pages into the right ontology types, and ranking related entities beyond the obvious neighbours, to name a few. Before Yahoo Knowledge Graph, he worked for Yahoo Shopping on attribute extraction and classification of shopping feeds into large product taxonomies.

Lightning talks from the Main Stage along with speaker Q&A sessions

May 4-7, 2020


Knowledge Graphs form an organized and curated set of facts that provide support for models to help understand the world. This conference gathers technology leaders, researchers, academics, vendors, and, most importantly, practitioners who know the discipline. For KGC 2020, attendees can participate from anywhere in the world, from the comfort of their homes. We will stream the content, provide access to our speakers, support chat and networking, and give access to all of the content live and on demand after the event.