Prof. Bizer gives keynote at WEBIST conference comparing GPT-4 and BERT for Web Data Integration

Prof. Christian Bizer has given a keynote talk comparing the utility of GPT-4 and BERT for Web Data Integration at the 19th International Conference on Web Information Systems and Technologies (WEBIST) in Rome.

Title of the Talk: 

GPT4 versus BERT: Which Foundation Model is more Suitable for Web Data Integration?


The Web contains vast amounts of structured data in the form of HTML tables, annotations, and datasets accessible via data repositories. The automated integration of data from large numbers of Web data sources is a long-standing research challenge, as the integration requires dealing with several tricky tasks such as schema matching, entity matching, and data indexing for retrieval. Most state-of-the-art methods for these tasks rely on variants of the BERT transformer model fine-tuned using significant amounts of task-specific training data. In the talk, Christian Bizer will critically review BERT-based data integration methods and question their robustness concerning out-of-distribution entities. He will compare the performance of BERT-based methods with the results of GPT-4-based data integration methods and will argue that GPT-4-based methods are more training-data-efficient and more robust concerning unseen entities.

Slides of the Presentation

Conference Website