Product categorization refers to methods that enable the (usually automated) assignment of products to one or more pre-specified classes or categories.
Product categorization is most often used in e-commerce settings, e.g. to make it easier for customers to search online stores. It can also be used for other purposes, e.g. classifying entire websites or businesses based on the categories of products that they sell.
What are the possible ways of categorizing products?
There is no unique way of categorizing products. In fact, many e-commerce retailers use their own definitions, which are also called product taxonomies.
Taxonomies differ from each other both in the categories they use and in the number of Tiers, i.e. the depth of their categorization.
Here is an example of how a tube used in an aquarium is classified in the Google taxonomy:
Animals & Pet Supplies > Pet Supplies > Fish Supplies > Aquarium & Pond Tubing
In this case, there are 4 Tiers available for the categorization of this product, from the most general one, Animals & Pet Supplies, to the most detailed one, Aquarium & Pond Tubing.
In the Facebook taxonomy, the same product is classified a little differently, using only 2 Tiers:
pet supplies > pet feeding & watering supplies
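To make the idea of Tiers concrete, here is a minimal sketch of how such category paths could be split into their levels. The function name `parse_category_path` and the use of " > " as the separator are our own assumptions for illustration; they are not part of either taxonomy's official tooling.

```python
from typing import List


def parse_category_path(path: str, separator: str = ">") -> List[str]:
    """Split a taxonomy path into its Tiers, from most general to most specific."""
    return [tier.strip() for tier in path.split(separator) if tier.strip()]


# The two example paths from the text above.
google_path = "Animals & Pet Supplies > Pet Supplies > Fish Supplies > Aquarium & Pond Tubing"
facebook_path = "pet supplies > pet feeding & watering supplies"

google_tiers = parse_category_path(google_path)
facebook_tiers = parse_category_path(facebook_path)

print(len(google_tiers), google_tiers[-1])    # 4 Aquarium & Pond Tubing
print(len(facebook_tiers), facebook_tiers[0])  # 2 pet supplies
```

Storing a category as a list of Tiers rather than a single string makes it easy to compare products at a coarser level, e.g. by looking only at the first one or two entries.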
Why is product categorization important?
One important benefit is that it allows users to find the products they want more quickly, which improves the user experience. This leads to higher satisfaction and a higher probability that the user will return to the website and purchase again. Finding products quickly also means a better conversion rate for the online store.
The user experience also improves when an online store implements filtering based on product categories, which offers an alternative to search queries for finding products.
Beyond these on-site benefits, a significant advantage of product categorization is its impact on search engine rankings. Category keywords make webpages more relevant to search engine ranking algorithms and may thus lead to higher rankings and more visits from search engines.
When an online store implements a separate set of webpages that list products grouped by category, these subpages can be indexed by search engines and give customers additional ways to find the website.
How is product categorization done in practice?
Due to the sheer volume of products available in online stores (e.g. Amazon sells over 12 million products), the process of product categorization is usually automated using machine learning or deep learning models.
Machine learning (ML) models are usually developed by training them on a data set with labelled categories, as this task belongs to supervised machine learning.
The ML models themselves belong to the class of text classification models, and many different ones can be used: support vector machines, Naive Bayes, logistic regression, random forests, decision trees, recurrent neural networks, convolutional neural networks and others.
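As an illustration of supervised text classification, here is a minimal sketch of a multinomial Naive Bayes classifier written with only the Python standard library. The training titles and the two category labels are made up for this example, and the `train` and `predict` helpers are hypothetical names. A production system would typically rely on an established ML library and far more data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled training data: (product title, category).
TRAINING_DATA = [
    ("aquarium air pump tubing", "Fish Supplies"),
    ("fish tank filter hose", "Fish Supplies"),
    ("aquarium pond tubing clear", "Fish Supplies"),
    ("dog chew toy rope", "Dog Supplies"),
    ("dog leash nylon", "Dog Supplies"),
    ("puppy training collar", "Dog Supplies"),
]


def train(examples):
    """Count words per category and examples per category."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for title, category in examples:
        words = title.lower().split()
        word_counts[category].update(words)
        category_counts[category] += 1
        vocabulary.update(words)
    return word_counts, category_counts, vocabulary


def predict(title, model):
    """Return the category with the highest log-probability for the title."""
    word_counts, category_counts, vocabulary = model
    total_examples = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n_examples in category_counts.items():
        # Log prior: how common the category is in the training data.
        score = math.log(n_examples / total_examples)
        total_words = sum(word_counts[category].values())
        for word in title.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


model = train(TRAINING_DATA)
print(predict("clear tubing for aquarium", model))  # Fish Supplies
print(predict("nylon dog collar", model))           # Dog Supplies
```

The labelled examples are what make this a supervised task: the model only learns which words go with which category because each training title comes with its category attached.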
Feature engineering and text vectorization
An important part of the ML pipeline is pre-processing the product names and converting them to a numerical format before they are sent to the ML model. Converting text to a numerical format that machine learning models can work with is known as text vectorization, and it is one part of the broader task of feature engineering.
We can use many features of products for categorization. For example, Shopify stores allow us to use the product title, product image, product description and product tags.
In our ML model, the input is primarily based on text.
In this case, the vectorization of text can be accomplished with many different natural language processing (NLP) techniques, including TF-IDF, Word2Vec, GloVe and BERT.
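Of these techniques, TF-IDF is the simplest to show in a few lines. The sketch below uses the plain textbook formula (term frequency times the logarithm of the inverse document frequency) on three made-up product titles. The `tf_idf` helper is a hypothetical name, and ML libraries often use smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical product titles, already lower-cased.
titles = [
    "aquarium pond tubing",
    "aquarium air pump",
    "dog chew toy",
]


def tf_idf(documents):
    """Return one {word: weight} dictionary per document."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times inverse document frequency.
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


vectors = tf_idf(titles)
for title, vector in zip(titles, vectors):
    print(title, {word: round(weight, 3) for word, weight in vector.items()})
```

In this toy data set, "aquarium" appears in two of the three titles and therefore receives a lower weight than "pond" or "tubing", which appear in only one. This is the point of TF-IDF: words that are common across products carry less information about the category.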
Before performing the featurization, one also performs several text pre-processing steps, such as:
- Removing stop words (e.g. “and”, “we”) and special characters
- Carrying out tokenization, which is the practice of splitting a string into an array of individual words, or so-called tokens.
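The pre-processing steps listed above can be sketched as follows. The stop-word list is a small illustrative sample rather than a complete one, the `preprocess` helper is a hypothetical name, and the regular expression assumes English product titles made of ASCII characters.

```python
import re

# Small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"and", "we", "the", "a", "for", "of", "in", "with"}


def preprocess(title):
    """Lower-case, strip special characters, tokenize, and drop stop words."""
    text = title.lower()
    # Replace anything that is not a letter, digit or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Tokenization: split the string into individual words on whitespace.
    tokens = text.split()
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("Aquarium & Pond Tubing, 25ft (Clear) for Fish Tanks"))
# ['aquarium', 'pond', 'tubing', '25ft', 'clear', 'fish', 'tanks']
```

The resulting list of tokens is what would then be passed on to the vectorization step.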
Product classification database
One of our services is a product classification database, which is an offline database of 1+ million online shops classified in several Tiers, according to E-Commerce Taxonomy.
Our product classification database supports many use cases: internal applications, SaaS platforms, consulting and market reports/research. Please contact us for more information on our offline database.