All You Need to Know to Build a Product Knowledge Graph (KDD 2021 Tutorial)


Knowledge graphs have been pivotal in supporting downstream applications such as search, recommendation, and question answering. These applications have recently become fundamental components of online retail and e-Commerce platforms, so knowledge graphs have naturally become key enabling technologies there. Building a high-coverage knowledge graph for products is more challenging than building a generic knowledge graph. The highly specific and complex domain, the sparsity of training data, and the constantly evolving taxonomies can all constrain the resulting knowledge graph. Moreover, the process of building a product knowledge graph must be scalable and generalizable to accommodate the dynamic and constantly growing number of products and product types.

In this tutorial we present best practices and ML innovations from industry for building a scalable product knowledge graph. Contributions in this domain draw on the general literature in areas including information extraction and data mining, tailored to the characteristics of e-Commerce platforms. We first discuss efforts to enrich product taxonomies automatically. Product taxonomies capture the relationships and hierarchy among product types and their corresponding attributes. We then cover information extraction techniques used to obtain machine-actionable knowledge from unstructured, natural-language product profiles. We cover both text-based and multimodal extraction techniques, the latter using product images as additional signals. Industry applications require particularly high precision, ideally with minimal loss of recall. To that end, we also cover several contributions that apply data cleaning and quality control techniques to the extracted product knowledge.

The approaches used to build product knowledge graphs can also be useful in other domains. We take a holistic view of the covered contributions, addressing their task formulations, modeling strategies, and architectures. We highlight the corresponding assumptions and use cases, and shed light on how the techniques apply and generalize to other domains.