1. Entity Matching using Domain Adaptation
- N. Kirielle, P. Christen, and T. Ranbaduge, “TransER: Homogeneous Transfer Learning for Entity Resolution,” in Proceedings of the 25th International Conference on Extending Database Technology, 2022, pp. 118–130
- M. Trabelsi, J. Heflin, and J. Cao, “DAME: Domain Adaptation for Matching Entities,” in Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, New York, NY, USA, Feb. 2022, pp. 1016–1024
- J. Tu et al., “Domain Adaptation for Deep Entity Resolution,” in Proceedings of the 2022 International Conference on Management of Data, New York, NY, USA, Jun. 2022, pp. 443–457
- More references and benchmarks: Papers with Code: Entity Resolution
2. Experimental Topic: Evaluating ChatGPT on the Task of Entity Matching
- P. Wang et al.: PromptEM: Prompt-tuning for low-resource generalized entity matching. Proceedings of the VLDB Endowment. Volume 16, Issue 2, pp 369–378. November 2022.
- Avanika Narayan et al.: Can Foundation Models Wrangle Your Data? arXiv:2205.09911 [cs.LG] (2022)
- A. Venkatesh et al., “On Evaluating and Comparing Open Domain Dialog Systems.” arXiv:1801.03625 [cs], Dec. 2018.
- A. Srivastava et al., “Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models”. arXiv:2206.04615 [cs], June 2022.
- Q. Dong et al., “A Survey for In-context Learning”. arXiv:2301.00234 [cs], Dec. 2022.
- N. Barlaug and J. A. Gulla, “Neural Networks for Entity Matching: A Survey,” ACM Transactions on Knowledge Discovery from Data, vol. 15, no. 3, pp. 52:1–52:37, Apr. 2021.
- A. Primpeli and C. Bizer, “Profiling Entity Matching Benchmark Tasks,” in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, New York, NY, USA, Oct. 2020, pp. 3101–3108.
3. Deep Learning for Blocking
- S. Thirumuruganathan, H. Li, N. Tang, M. Ouzzani, Y. Govind, D. Paulsen, G. Fung, and A. Doan. 2021. Deep learning for blocking in entity matching: a design space exploration. Proceedings of the VLDB Endowment 14, 11 (July 2021), 2459–2472.
- W. Zhang, H. Wei, B. Sisman, L. Dong, C. Faloutsos, and D. Page. 2020. AutoBlock: A Hands-off Blocking Framework for Entity Matching. In Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM ’20), Association for Computing Machinery, New York, NY, USA, 744–752.
- R. Wang, Y. Li, and J. Wang, “Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation.” arXiv:2207.04122 [cs], Jul. 08, 2022.
4. Deep Learning for Table Search
- G. Fan, J. Wang, Y. Li, D. Zhang, and R. Miller. 2023. Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning. arXiv.
- A. Bogatu, A. A. Fernandes, N. W. Paton, and A. Konstantinou. 2020. Dataset Discovery in Data Lakes. In IEEE 36th International Conference on Data Engineering (ICDE), 709–720.
- A. D. Sarma, L. Fang, N. Gupta, A. Y. Halevy, H. Lee, F. Wu, R. Xin, and C. Yu. 2012. Finding Related Tables. In SIGMOD.
5. Representation Learning for Missing Value Imputation
- Richard Wu, Aoqian Zhang, Ihab Ilyas, and Theodoros Rekatsinas. 2020. Attention-based Learning for Missing Data Imputation in HoloClean. Proceedings of Machine Learning and Systems 2 (March 2020), 307–325.
- Avanika Narayan et al.: Can Foundation Models Wrangle Your Data? arXiv:2205.09911 [cs.LG] (2022)
- Xiang Deng, Huan Sun, Alyssa Lees, You Wu, and Cong Yu. 2022. TURL: Table Understanding through Representation Learning. SIGMOD Rec. 51, 1 (June 2022), 33–40.
- J. Yoon, J. Jordon, and M. van der Schaar. 2018. GAIN: Missing Data Imputation using Generative Adversarial Nets. In Proceedings of the 35th International Conference on Machine Learning, PMLR, 5689–5698.
- Ihab F. Ilyas and Theodoros Rekatsinas. 2022. Machine Learning and Data Cleaning: Which Serves the Other? J. Data and Information Quality 14, 3 (September 2022), 1–11.
6. Experimental Topic: Evaluating ChatGPT on the Task of Missing Value Imputation for Knowledge Graph Completion
- A. Venkatesh et al., “On Evaluating and Comparing Open Domain Dialog Systems.” arXiv:1801.03625 [cs], Dec. 2018.
- A. Srivastava et al., “Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models”. arXiv:2206.04615 [cs], June 2022.
- Q. Dong et al., “A Survey for In-context Learning”. arXiv:2301.00234 [cs], Dec. 2022.
- Richard Wu, Aoqian Zhang, Ihab Ilyas, and Theodoros Rekatsinas. 2020. Attention-based Learning for Missing Data Imputation in HoloClean. Proceedings of Machine Learning and Systems 2 (March 2020), 307–325.
- Avanika Narayan et al.: Can Foundation Models Wrangle Your Data? arXiv:2205.09911 [cs.LG] (2022)
- https://paperswithcode.com/task/knowledge-graph-completion
7. Schema Matching using Deep Learning
- Zhang, Jing, et al. “SMAT: An attention-based deep learning solution to the automation of schema matching.” European Conference on Advances in Databases and Information Systems. Springer, Cham, 2021.
- Shraga, Roee, Avigdor Gal, and Haggai Roitman. “ADnEV: Cross-domain schema matching using deep similarity matrix adjustment and evaluation.” Proceedings of the VLDB Endowment 13.9 (2020): 1401–1415.
- Koutras, Christos, et al. “REMA: Graph Embeddings-based Relational Schema Matching.” EDBT/ICDT Workshops. 2020.
- Rahm, E., Bernstein, P. A survey of approaches to automatic schema matching. The VLDB Journal 10 (2001), 334–350.
2. Experimental Topic: Evaluating ChatGPT on the Task of Schema Matching/Table Annotation
- Avanika Narayan et al.: Can Foundation Models Wrangle Your Data? arXiv:2205.09911 [cs.LG] (2022)
- A. Srivastava et al., “Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models”. arXiv:2206.04615 [cs], June 2022.
- Q. Dong et al., “A Survey for In-context Learning”. arXiv:2301.00234 [cs], Dec. 2022.
- Korini K, Peeters R, Bizer C., “SOTAB: The WDC Schema.org Table Annotation Benchmark”. Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab), CEUR-WS.org. 2022.
- https://paperswithcode.com/task/table-annotation
8. Cell Entity Annotation in Tabular Data
- X. Deng, H. Sun, A. Lees, Y. Wu, and C. Yu, “TURL: table understanding through representation learning,” Proc. VLDB Endow., vol. 14, no. 3, Nov. 2020, pp. 307–319
- Huynh, V.P., Liu, J., Chabot, Y., Labbé, T., Monnin, P. and Troncy, R., DAGOBAH: Enhanced Scoring Algorithms for Scalable Annotations of Tabular Data. In SemTab@ISWC, Nov. 2020, pp. 27–39.
- Chen, S., Karaoglu, A., Negreanu, C., Ma, T., Yao, J.G., Williams, J., Jiang, F., Gordon, A. and Lin, C.Y. LinkingPark: An automatic semantic table interpretation system. Journal of Web Semantics, 74, 2022, p.100733.
- More references and benchmarks: Papers with Code: Cell Entity Annotation
9. Deep Tabular Learning for Domain-Specific Prediction Tasks
- Yoon, Jinsung, et al. “VIME: Extending the success of self- and semi-supervised learning to tabular domain.” Advances in Neural Information Processing Systems 33 (2020).
- Somepalli, Gowthami, et al. “SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training.” arXiv preprint arXiv:2106.01342 (2021).
- Gharibshah, Zhabiz, and Xingquan Zhu. “Local Contrastive Feature Learning for Tabular Data.” Proceedings of the 31st ACM International Conference on Information & Knowledge Management (2022).
- Stefan Hegselmann, et al. “TabLLM: Few-shot Classification of Tabular Data with Large Language Models” arXiv:2210.10723 [cs.CL] (2022).
- Borisov, Vadim, Tobias Leemann, et al. “Deep neural networks and tabular data: A survey.” IEEE Transactions on Neural Networks and Learning Systems (2022).
10. Information Extraction for E-Commerce Product Data
- Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, and Jiawei Han. 2022. OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision. In Proceedings of the ACM Web Conference 2022, ACM, Virtual Event, Lyon France, 3153–3161.
- Huimin Xu, Wenting Wang, Xin Mao, Xinyu Jiang, and Man Lan. 2019. Scaling up Open Tagging from Tens to Thousands: Comprehension Empowered Attribute Value Extraction from Product Title. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 5214–5223.
- Qifan Wang, et al. 2020: Learning to Extract Attribute Value from Product via Question Answering: A Multi-task Approach. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
- Gilad Fuchs and Yoni Acriche. 2022. Product Titles-to-Attributes As a Text-to-Text Task. In Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5), Association for Computational Linguistics, Dublin, Ireland, 91–98.
11. Experimental Topic: Evaluating GPT-3 on the Task of Product Information Extraction
- A. Venkatesh et al., “On Evaluating and Comparing Open Domain Dialog Systems.” arXiv:1801.03625 [cs], Dec. 2018.
- A. Srivastava et al., “Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models”. arXiv:2206.04615 [cs], June 2022.
- Q. Dong et al., “A Survey for In-context Learning”. arXiv:2301.00234 [cs], Dec. 2022.
- Li Yang et al.: MAVE: A Product Dataset for Multi-source Attribute Value Extraction. WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 2022.
- P. Petrovski, et al.: The WDC Gold Standards for Product Feature Extraction and Product Matching. In E-Commerce and Web Technologies: 17th International Conference, EC-Web 2016.
- OpenAI Playground Example: https://beta.openai.com/playground/p/default-parse-data
- Aleph Alpha Playground Example: https://app.aleph-alpha.com/jumpstart/text-to-table
12. Experimental Topic: Combining WebAPIs and Large Language Models for Question Answering via In-Context Learning
- Omar Khattab, et al.: Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP. arXiv:2212.14024 [cs.CL], Dec. 2022.
- Q. Dong et al., “A Survey for In-context Learning”. arXiv:2301.00234 [cs], Dec. 2022.
- Christopher Potts: Stanford online seminar – GPT-3 & Beyond. Starting from minute 28:13, Jan 2023.
- Example Task: Ask ChatGPT or GPT-3 questions about restaurants or hotels in Mannheim using TripAdvisor data and in-context learning.