Abstract
The categorization of retail products is essential for the business decision-making process. It is common practice to classify products based on their quantitative and qualitative characteristics. In this paper, we use a purely data-driven approach: our clustering of products is based exclusively on customer behaviour. We propose a method for clustering retail products using market basket data. Our model is formulated as an optimization problem, which is solved by a genetic algorithm. We demonstrate on simulated data how the method behaves in different settings. An application to real data from a Czech drugstore company shows that the method yields results similar to the classification by experts. The number of clusters is a parameter of the algorithm. We demonstrate that if more clusters are allowed than there are original categories, the method yields additional information about the structure of the product categorization.
5. Conclusion
We introduced a new method for product categorization based solely on market basket data. The method uses a genetic algorithm to divide products into a given number of clusters. We tested the method on synthetic and real data. The method performs well on synthetic data even if its assumptions are violated to some extent. We verified the method using real market basket data from the drugstore retail market. We found that the method accurately identified the categories that did not significantly violate the assumptions. When the assumption that customers buy at most one product from each category was violated, the products from that category were spread across several clusters instead of being assigned to a single cluster. It is worth noting that the original categories were chosen subjectively. Using only market basket data, our method identified several hidden subcategories that may be widely used in marketing and, more generally, in decision-making processes. We found that a common feature of customer behaviour in the Czech drugstore market is that there are not enough receipts with a larger number of different products, which leads to a violation of the ideal behaviour (IB) and of the method’s assumptions. We suppose that with more data the method would give even more accurate results. Simulations using synthetic data strongly support this hypothesis.
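To make the idea concrete, the following is a minimal sketch, not the paper's actual model. This excerpt does not give the objective function or the genetic operators, so the sketch assumes a cost that counts how often a receipt contains more than one product from the same cluster (a violation of the "at most one product from each category" assumption) and a very simple mutation-plus-elitism search. The names `violations` and `evolve`, the parameter values, and the toy receipts are all hypothetical.

```python
import random
from collections import Counter


def violations(receipts, assignment):
    """Count surplus same-cluster products over all receipts (assumed cost).

    A receipt with m products from one cluster contributes m - 1, so a
    receipt with at most one product per cluster contributes 0.
    """
    total = 0
    for receipt in receipts:
        counts = Counter(assignment[p] for p in receipt)
        total += sum(c - 1 for c in counts.values())
    return total


def evolve(receipts, n_products, n_clusters,
           pop_size=30, generations=200, seed=0):
    """Tiny genetic search: keep the better half, mutate it (illustrative)."""
    rng = random.Random(seed)
    # Each individual maps product index -> cluster label.
    population = [
        [rng.randrange(n_clusters) for _ in range(n_products)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=lambda a: violations(receipts, a))
        survivors = population[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            # Mutation: reassign one randomly chosen product.
            child[rng.randrange(n_products)] = rng.randrange(n_clusters)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda a: violations(receipts, a))


# Toy data: products 0 and 1 play the role of one category, 2 and 3 another.
# Every receipt holds one product from each of the two categories.
receipts = [{0, 2}, {1, 3}, {0, 3}, {1, 2}]
best = evolve(receipts, n_products=4, n_clusters=2)
print(best, violations(receipts, best))
```

In this toy setting the only zero-cost partitions put products 0 and 1 in one cluster and products 2 and 3 in the other, which mirrors how co-purchase data alone can recover a categorization. A practical implementation would likely add crossover and a more careful selection scheme.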