ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction
To facilitate features such as faceted product search and product comparison, e-commerce platforms require accurately structured product data, including precise attribute/value pairs.
In the few-shot scenario, we investigate (i) the provision of example attribute values, (ii) the selection of in-context demonstrations, (iii) shuffled ensembling to mitigate position bias, and (iv) fine-tuning the LLM. We evaluate the prompt templates in combination with hosted LLMs, such as GPT-3.5 and GPT-4, and open-source LLMs that can be run locally. We compare the performance of the LLMs to the PLM-based methods SU-OpenTag, AVEQA, and MAVEQA.
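The shuffled-ensembling idea in (iii) can be illustrated with a minimal sketch: the same extraction request is issued several times with the target attributes in a different order each time, and the per-attribute answers are majority-voted. The `extract` callable, attribute names, and vote counts below are hypothetical stand-ins, not the paper's actual implementation.

```python
import random
from collections import Counter

def shuffled_ensemble(extract, attributes, product_text, n_runs=5, seed=0):
    """Run the extraction prompt n_runs times with shuffled attribute order
    and majority-vote the value extracted for each attribute.

    `extract` is a hypothetical callable:
        (product_text, ordered_attributes) -> {attribute: value}
    """
    rng = random.Random(seed)
    votes = {a: Counter() for a in attributes}
    for _ in range(n_runs):
        order = attributes[:]
        rng.shuffle(order)  # vary attribute position to counter position bias
        answers = extract(product_text, order)
        for attr, value in answers.items():
            votes[attr][value] += 1
    # keep the most frequent value per attribute; skip attributes with no answer
    return {a: c.most_common(1)[0][0] for a, c in votes.items() if c}
```

In practice `extract` would wrap an LLM call that renders the prompt template with the attributes in the given order; the voting step then smooths out answers that change when an attribute appears early versus late in the prompt.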
The highest average F1-score of 86% was achieved by GPT-4 using an ensemble of shuffled prompts that integrated a comprehensive target schema containing attribute descriptions and example values with demonstrations. Llama-3-70B performs only 3% worse than GPT-4, making it a competitive open-source alternative. Given the same training data, this prompt/model combination outperforms the PLM-based baselines.