1. Entity Matching using Deep Learning
- S. Mudgal et al., “Deep Learning for Entity Matching: A Design Space Exploration,” in Proceedings of the 2018 International Conference on Management of Data, New York, NY, USA, 2018, pp. 19–34.
- N. Barlaug and J. A. Gulla, “Neural Networks for Entity Matching: A Survey,” ACM Transactions on Knowledge Discovery from Data, vol. 15, no. 3, p. 52:1–52:37, Apr. 2021.
- Y. Li, J. Li, Y. Suhara, A. Doan, and W.-C. Tan, “Deep entity matching with pre-trained language models,” Proceedings of the VLDB Endowment, vol. 14, no. 1, pp. 50–60, Sep. 2020.
- More references and benchmarks: Papers with Code: Entity Resolution
2. Entity Matching using Contrastive Learning
- M. Almagro, D. Jiménez, D. Ortego, E. Almazán, and E. Martínez, “Block-SCL: Blocking Matters for Supervised Contrastive Learning in Product Matching.” arXiv:2207.02008 [cs], Jul. 05, 2022.
- R. Wang, Y. Li, and J. Wang, “Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation.” arXiv:2207.04122 [cs], Jul. 08, 2022.
- R. Peeters and C. Bizer, “Supervised Contrastive Learning for Product Matching,” in Companion Proceedings of the Web Conference 2022, Lyon, France, April 2022.
3. Entity Matching using Domain Adaptation
- N. Kirielle, P. Christen, and T. Ranbaduge, “TransER: Homogeneous Transfer Learning for Entity Resolution,” in Proceedings of the 25th International Conference on Extending Database Technology, 2022, pp. 118–130.
- M. Trabelsi, J. Heflin, and J. Cao, “DAME: Domain Adaptation for Matching Entities,” in Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, New York, NY, USA, Feb. 2022, pp. 1016–1024.
- J. Tu et al., “Domain Adaptation for Deep Entity Resolution,” in Proceedings of the 2022 International Conference on Management of Data, New York, NY, USA, Jun. 2022, pp. 443–457.
4. Active Learning for Entity Matching
- J. Huang, W. Hu, Z. Bao, Q. Chen, and Y. Qu, “Deep entity matching with adversarial active learning,” The VLDB Journal, Apr. 2022.
- A. Jain, S. Sarawagi, and P. Sen, “Deep indexed active learning for matching heterogeneous entity representations,” Proceedings of the VLDB Endowment, vol. 15, no. 1, pp. 31–45, Sep. 2021.
- A. Bogatu, N. W. Paton, M. Douthwaite, S. Davie, and A. Freitas, “Cost–effective Variational Active Entity Resolution,” in 2021 IEEE 37th International Conference on Data Engineering, Apr. 2021, pp. 1272–1283.
- More references and benchmark results: Papers with Code: MusicBrainz20K
5. Deep Learning for Blocking
- S. Thirumuruganathan, H. Li, N. Tang, M. Ouzzani, Y. Govind, D. Paulsen, G. Fung, and A. Doan. 2021. Deep learning for blocking in entity matching: a design space exploration. Proceedings of the VLDB Endowment 14, 11 (July 2021), 2459–2472.
- W. Zhang, H. Wei, B. Sisman, L. Dong, C. Faloutsos, and D. Page. 2020. AutoBlock: A Hands-off Blocking Framework for Entity Matching. In Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM ’20), Association for Computing Machinery, New York, NY, USA, 744–752.
- R. Wang, Y. Li, and J. Wang, “Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation.” arXiv:2207.04122 [cs], Jul. 08, 2022.
6. Column Type Annotation in Tabular Data
- Suhara, Y., Li, J., Li, Y., Zhang, D., Demiralp, Ç., Chen, C. and Tan, W.C. “Annotating columns with pre-trained language models.” In Proceedings of the 2022 International Conference on Management of Data, 2022, pp. 1493–1503
- Zhang, D., Suhara, Y., Li, J., Hulsebos, M., Demiralp, Ç. and Tan, W.C. Sato: Contextual semantic type detection in tables. arXiv preprint arXiv:1911.06311. 2019
- X. Deng, H. Sun, A. Lees, Y. Wu, and C. Yu, “TURL: table understanding through representation learning,” Proc. VLDB Endow., vol. 14, no. 3, Nov. 2020, pp. 307–319
- More references and benchmarks: Papers with Code: Column Type Annotation
7. Column Pair Annotation in Tabular Data
- Suhara, Y., Li, J., Li, Y., Zhang, D., Demiralp, Ç., Chen, C. and Tan, W.C. “Annotating columns with pre-trained language models.” In Proceedings of the 2022 International Conference on Management of Data, 2022, pp. 1493–1503
- D. Wang, P. Shiralkar, C. Lockard, B. Huang, X. L. Dong, and M. Jiang, “TCN: Table Convolutional Network for Web Table Interpretation,” in Proceedings of the Web Conference 2021, New York, NY, USA, Apr. 2021, pp. 4020–4032
- Chen, S., Karaoglu, A., Negreanu, C., Ma, T., Yao, J.G., Williams, J., Jiang, F., Gordon, A. and Lin, C.Y. LinkingPark: An automatic semantic table interpretation system. Journal of Web Semantics, 74, 2022, p.100733.
- More references and benchmarks: Papers with Code: Columns Property Annotation
8. Cell Entity Annotation in Tabular Data
- X. Deng, H. Sun, A. Lees, Y. Wu, and C. Yu, “TURL: table understanding through representation learning,” Proc. VLDB Endow., vol. 14, no. 3, Nov. 2020, pp. 307–319
- Huynh, V.P., Liu, J., Chabot, Y., Labbé, T., Monnin, P. and Troncy, R., DAGOBAH: Enhanced Scoring Algorithms for Scalable Annotations of Tabular Data. In SemTab@ISWC, Nov. 2020, pp. 27–39.
- Chen, S., Karaoglu, A., Negreanu, C., Ma, T., Yao, J.G., Williams, J., Jiang, F., Gordon, A. and Lin, C.Y. LinkingPark: An automatic semantic table interpretation system. Journal of Web Semantics, 74, 2022, p.100733.
- More references and benchmarks: Papers with Code: Cell Entity Annotation
9. Representation Learning for Tabular Data
- D. Wang, P. Shiralkar, C. Lockard, B. Huang, X. L. Dong, and M. Jiang, “TCN: Table Convolutional Network for Web Table Interpretation,” in Proceedings of the Web Conference 2021, New York, NY, USA, Apr. 2021, pp. 4020–4032
- H. Iida, D. Thai, V. Manjunatha, and M. Iyyer, “TABBIE: Pretrained Representations of Tabular Data,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, Jun. 2021, pp. 3446–3456
- Z. Wang et al., “TUTA: Tree-based Transformers for Generally Structured Table Pre-training,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, New York, NY, USA, Aug. 2021, pp. 1780–1790
- X. Deng, H. Sun, A. Lees, Y. Wu, and C. Yu, “TURL: table understanding through representation learning,” Proc. VLDB Endow., vol. 14, no. 3, Nov. 2020, pp. 307–319
10. Representation Learning for Data Cleansing/Missing Value Imputation
- Richard Wu, Aoqian Zhang, Ihab Ilyas, and Theodoros Rekatsinas. 2020. Attention-based Learning for Missing Data Imputation in HoloClean. Proceedings of Machine Learning and Systems 2, (March 2020), 307–325.
- Xiang Deng, Huan Sun, Alyssa Lees, You Wu, and Cong Yu. 2022. TURL: Table Understanding through Representation Learning. SIGMOD Rec. 51, 1 (June 2022), 33–40.
- Ihab F. Ilyas and Theodoros Rekatsinas. 2022. Machine Learning and Data Cleaning: Which Serves the Other? J. Data and Information Quality 14, 3 (September 2022), 1–11.
11. Information Extraction for E-Commerce Product Data
- Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, and Jiawei Han. 2022. OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision. In Proceedings of the ACM Web Conference 2022, ACM, Virtual Event, Lyon France, 3153–3161.
- Huimin Xu, Wenting Wang, Xin Mao, Xinyu Jiang, and Man Lan. 2019. Scaling up Open Tagging from Tens to Thousands: Comprehension Empowered Attribute Value Extraction from Product Title. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 5214–5223.
- Gilad Fuchs and Yoni Acriche. 2022. Product Titles-to-Attributes As a Text-to-Text Task. In Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5), Association for Computational Linguistics, Dublin, Ireland, 91–98.