Three Generations of Knowledge Graphs
SEO friend, I invite you to watch this talk from SEMANTiCS 2023, held on September 21st. On that day, Xing Luna Dong presented the talk "Three Generations of Knowledge Graphs". I am summarizing it here, with the exact points in the video, for a very simple reason:
This talk helped me understand the process of creating and using knowledge graphs, initially at Google, in work that Luna Dong led, and how we got to where we are today. What you see and read on the web about how knowledge graphs address the problems and challenges of Artificial Intelligence is not by chance; it was built! And this talk will help you understand that too.
Key points of the presentation:
Luna Dong's professional journey and research philosophy
Luna Dong describes her career as a "salmon path," passing through companies such as Google, AT&T Labs, Amazon, and Meta. She has built three main knowledge graphs: a personal knowledge graph, the Google Knowledge Graph, and the Amazon Product Knowledge Graph [ 01:19:21 ]. Her research philosophy balances incremental, impact-driven goals ("roofshot goals") with the invention of cutting-edge technologies that avoid stagnation ("moonshot goals") [ 20:44 ].
The Three Generations of Knowledge Graphs
First Generation (Entity-Based)
Characterized by manually defined ontologies and entities with clear boundaries [ 25:41 ]. The goal was to mimic the human view of the world [ 26:23 ]. Challenges included data heterogeneity, resolved with data integration techniques [ 29:24 ].
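To make the entity-based model concrete, here is a minimal sketch (my own illustration, not code from the talk) of the subject-predicate-object triples such a graph reduces to:

```python
# Minimal illustration (not from the talk): an entity-based knowledge graph
# reduces to (subject, predicate, object) triples over clearly bounded entities.
triples = {
    ("Tom Cruise", "acted_in", "Top Gun"),
    ("Tom Cruise", "born_in", "Syracuse"),
    ("Top Gun", "release_year", "1986"),
}

def facts_about(entity, kg):
    """Return all (predicate, object) pairs whose subject is the given entity."""
    return [(p, o) for (s, p, o) in kg if s == entity]

print(facts_about("Tom Cruise", triples))
# e.g. [('acted_in', 'Top Gun'), ('born_in', 'Syracuse')] (set order varies)
```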
Second Generation (Text-Rich)
It emerged to address "long-tail" use cases, such as product search, where the data is predominantly textual and noisy [ 43:57 ]. This generation is characterized by complex ontologies and textual attribute values with vague boundaries [ 40:26 ]. Amazon's "AutoKnow" project automated the collection of product knowledge from product names and descriptions [ 44:05 ].
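As a rough intuition for what that automation looks like, here is a toy sketch (my own, with a hypothetical flavor vocabulary; AutoKnow actually uses trained extraction models) of pulling an attribute value out of a noisy product title:

```python
# Toy sketch of product-attribute extraction from a title. My illustration:
# AutoKnow uses learned sequence-tagging models, not a fixed word list.
KNOWN_FLAVORS = {"chocolate", "vanilla", "strawberry", "mango"}  # hypothetical

def extract_flavor(title: str):
    """Return the first known flavor word in a product title, if any."""
    for word in title.lower().split():
        if word in KNOWN_FLAVORS:
            return word
    return None

print(extract_flavor("Ben & Jerry's Chocolate Fudge Brownie Ice Cream, 16 oz"))
# 'chocolate'
```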
Third Generation (Dual Neural, with Large Language Models)
It addresses the question of whether Large Language Models (LLMs) will replace knowledge graphs [ 01:06:15 ]. One study showed that the accuracy of LLMs in answering factual questions is low, with a high rate of "hallucination," especially for "long-tail" facts [ 01:03:24 ]. The conclusion is that knowledge graphs will continue to coexist with LLMs, in both symbolic and neural forms [ 01:06:41 ].
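One way to picture this coexistence (my own sketch, with a hypothetical `ask_llm` stand-in, not an architecture from the talk) is to answer from the symbolic graph when it has the fact and fall back to the neural model, clearly flagged, when it does not:

```python
# Illustrative sketch of symbolic/neural coexistence (my example): trust the
# knowledge graph when it holds the fact; otherwise fall back to the LLM.
KG = {("Top Gun", "release_year"): "1986"}  # tiny symbolic store

def ask_llm(question: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return "1985"  # LLMs often hallucinate long-tail facts

def answer(entity: str, attribute: str) -> str:
    fact = KG.get((entity, attribute))
    if fact is not None:
        return fact  # grounded, high-precision answer
    return ask_llm(f"What is the {attribute} of {entity}?") + " (unverified)"

print(answer("Top Gun", "release_year"))     # '1986', from the graph
print(answer("Iron Eagle", "release_year"))  # '1985 (unverified)', from the LLM
```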
Current Job at Meta
Dong mentioned the development of intelligent assistants for Virtual Reality (VR) devices and smart glasses, which require multimodal inputs, context sensitivity, and personalization [ 01:08:43 ].
Recipe for Innovation in Practice
The presentation outlines a five-step model for turning "crazy" ideas into practice in industry: Feasibility, Quality, Repeatability, Scalability, and Ubiquity [ 01:22:05 ].
Focus on the Google Knowledge Graph
During her time at Google, she was involved in creating the Google Knowledge Graph that enriches search results [ 01:19:48 ]. The central idea was to build a graph of entities and relationships that mirrored how humans see the real world [ 01:26:23 ].
Key Aspects of Working at Google
Initial Data Sources
One of the primary sources for constructing the graph was Wikipedia, especially its "infoboxes," which provided structured data about entity attributes and relationships that was easy to extract [ 01:27:07 ].
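As a flavor of why infoboxes are "easy to extract" (my own toy sketch parsing a simplified wikitext fragment, not Google's pipeline), consider:

```python
import re

# Toy parse of a simplified infobox fragment into triples (my illustration,
# not Google's pipeline; real wikitext is considerably messier).
infobox = """{{Infobox film
| name     = Top Gun
| director = Tony Scott
| released = 1986
}}"""

fields = dict(re.findall(r"\|\s*(\w+)\s*=\s*(.+)", infobox))
subject = fields.pop("name")
triples = [(subject, attr, value) for attr, value in fields.items()]
print(triples)
# [('Top Gun', 'director', 'Tony Scott'), ('Top Gun', 'released', '1986')]
```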
Data Integration Challenges
When the graph expanded to other data sources, such as IMDb and Wikidata, significant heterogeneity challenges arose: data on the same entity or attribute could differ from one source to another [ 01:28:44 ]. To address this, data integration techniques were applied (see the sketch after this list):
Entity Linkage
To unify different mentions of the same real-world entity into a single node in the graph [ 01:29:57 ].
Schema Alignment
To standardize the different attribute names [ 01:30:07 ].
Data Fusion
To consolidate the different values for the same attribute [ 01:30:16 ].
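A toy end-to-end sketch of the three steps above (my own illustration with made-up records; production systems use learned matchers and source-trust models rather than hand-written rules):

```python
from datetime import datetime

# Toy sketch of entity linkage, schema alignment, and data fusion on two
# made-up records (my illustration; real systems use learned matchers).
source_a = {"name": "J.K. Rowling", "birthdate": "1965-07-31"}
source_b = {"author": "Joanne Rowling", "dob": "31 July 1965"}

# 1. Entity linkage: recognize that both records describe the same entity
#    (a hand-written alias table here; a trained matcher in practice).
ALIASES = {"Joanne Rowling": "J.K. Rowling"}
def canonical(name):
    return ALIASES.get(name, name)

# 2. Schema alignment: map source-specific attribute names onto one schema.
SCHEMA_MAP = {"name": "name", "author": "name", "birthdate": "born", "dob": "born"}
def align(record):
    return {SCHEMA_MAP[k]: v for k, v in record.items()}

# 3. Data fusion: consolidate conflicting values for the same attribute
#    (simple date normalization here; voting and source trust in practice).
def normalize_date(value):
    for fmt in ("%Y-%m-%d", "%d %B %Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            pass
    return value

fused = {}
for rec in (align(source_a), align(source_b)):
    entity = canonical(rec["name"])
    fused.setdefault(entity, {})["born"] = normalize_date(rec["born"])

print(fused)  # {'J.K. Rowling': {'born': '1965-07-31'}}
```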
Knowledge Vault Project
To scale knowledge extraction across the web, Google developed the Knowledge Vault [ 01:36:24 ].
Comprehensive Extraction
The project extracted information from web text, tables, and HTML annotations (such as schema.org), using 16 different extractors [ 01:36:26 ].
Results and Limitations
The Knowledge Vault processed 2.5 billion web pages and generated 3.2 billion "knowledge triples." However, only about 10% of those triples were considered high-confidence [ 01:37:06 ]. Even this high-confidence subset was only about 70% accurate, far from the 99% accuracy the Google Knowledge Graph requires [ 01:37:35 ]. Furthermore, even if accuracy could be increased, coverage of "long-tail" (less popular) entities would remain insufficient to support a good product experience [ 01:37:56 ].
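To illustrate the precision/coverage trade-off at play here (my own sketch with made-up confidence scores, not Knowledge Vault code): each extracted triple carries a fused confidence score, and raising the acceptance threshold improves accuracy while discarding most triples, especially long-tail ones:

```python
# Made-up confidence scores illustrating the precision/coverage trade-off;
# this is not Knowledge Vault code.
extracted = [
    (("Top Gun", "release_year", "1986"), 0.94),
    (("Top Gun", "director", "Tony Scott"), 0.88),
    (("Top Gun", "director", "Tom Cruise"), 0.31),   # extraction error
    (("Iron Eagle", "release_year", "1986"), 0.42),  # long-tail, weak signal
]

def keep_high_confidence(scored_triples, threshold=0.7):
    """Keep only triples whose fused confidence clears the threshold."""
    return [t for t, score in scored_triples if score >= threshold]

kept = keep_high_confidence(extracted)
print(f"kept {len(kept)} of {len(extracted)} triples")  # kept 2 of 4
# A higher threshold buys precision at the cost of long-tail coverage.
```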
Although the knowledge extracted by Knowledge Vault was never fully integrated into the main Google Knowledge Graph because of these accuracy issues, the technologies developed in the project have been applied in other areas at Google and at other companies to collect "long-tail" knowledge [ 01:38:16 ].
In summary, Xing Luna Dong's talk highlights the continued importance of knowledge graphs, even with the advance of Large Language Models, and the need for hybrid approaches to handle the complexity and scale of real-world data.
Luna Dong's work at Google was instrumental in building the foundations of the Google Knowledge Graph, tackling the challenges of integrating data from diverse sources and scaling knowledge extraction with the ambitious Knowledge Vault project.


