structured data to make ML [Machine Learning] more precise and accurate in eCommerce.
We analyzed the top 30 fashion websites, looking for data issues on the most common product we could find: T-shirts.
100% of the sites were missing one or more references to basic T-shirt attributes.
80.7% of the sites had wrong items in the search results (cataloging errors).
76.9% of the sites had products that didn’t appear when we applied the search filters.
As these figures show, eCommerce data issues are common even among the top 30 fashion websites.
Because each site's data is unique, companies are forced to develop custom-built ML models for every single site. These models have shorter life cycles and can't be scaled.
Products structured data
Shifting the focus from the company's data to the product's data will remove most of the data singularity from ML models, cutting costs and allowing scalability.
A T-shirt will always be a T-shirt no matter who sells it. It will always have the same physical properties, independent of the seller's unique data.
For us, the focus needs to be first on the product and then on the company's data.
We want to build product blueprints with all the product’s options, properties, and variables.
This product structured data will be a knowledge base and starting point for data scientists.
Product structured data will remove a large part of the data analysis process, reduce analysis errors, and substantially reduce costs.
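To make the idea of a product blueprint more concrete, here is a minimal sketch of how one might be represented in code. It is an illustration under our own assumptions, not the actual schema: the class names (`Attribute`, `ProductBlueprint`), the attribute names, and the allowed values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One property of a product, with the values it is allowed to take."""
    name: str
    allowed_values: list[str]
    required: bool = True


@dataclass
class ProductBlueprint:
    """Seller-independent description of a product type."""
    product_type: str
    attributes: list[Attribute] = field(default_factory=list)

    def missing_attributes(self, listing: dict[str, str]) -> list[str]:
        """Return required attribute names that a seller's listing omits."""
        return [a.name for a in self.attributes
                if a.required and a.name not in listing]

    def invalid_values(self, listing: dict[str, str]) -> dict[str, str]:
        """Return listed attributes whose value is outside the allowed set."""
        allowed = {a.name: a.allowed_values for a in self.attributes}
        return {name: value for name, value in listing.items()
                if name in allowed and value not in allowed[name]}


# Hypothetical T-shirt blueprint; names and values are illustrative only.
TSHIRT = ProductBlueprint(
    product_type="t-shirt",
    attributes=[
        Attribute("neckline", ["crew-neck", "v-neck", "scoop"]),
        Attribute("sleeve_length", ["short", "long", "sleeveless"]),
        Attribute("material", ["cotton", "polyester", "linen"]),
        Attribute("fit", ["slim", "regular", "oversized"], required=False),
    ],
)

# A seller listing checked against the blueprint.
listing = {"neckline": "crew-neck", "material": "silk"}
print(TSHIRT.missing_attributes(listing))  # ['sleeve_length']
print(TSHIRT.invalid_values(listing))      # {'material': 'silk'}
```

Because the blueprint describes the product rather than any one seller's catalog, the same checks could in principle be run against every site's listings.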
By having access to a structured data schema, it is simple to generate product descriptions and answers to questions.
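As a rough illustration of the point above, the sketch below renders a description and answers an attribute question from a structured record using plain templates. The function names (`describe`, `answer`) and the attribute keys are hypothetical, and a production system would likely use something richer than string templates.

```python
def describe(attrs: dict[str, str]) -> str:
    """Render a one-sentence T-shirt description from structured attributes."""
    # Adjective-like attributes go before the noun, in a fixed order.
    parts = [attrs[key] for key in ("fit", "material", "neckline") if key in attrs]
    sentence = " ".join(parts + ["T-shirt"])
    if "sleeve_length" in attrs:
        sentence += f" with {attrs['sleeve_length']} sleeves"
    return sentence[0].upper() + sentence[1:] + "."


def answer(attrs: dict[str, str], attribute: str) -> str:
    """Answer a question about a single attribute of the product."""
    label = attribute.replace("_", " ")
    if attribute not in attrs:
        return f"The {label} is not specified."
    return f"The {label} is {attrs[attribute]}."


product = {
    "fit": "slim",
    "material": "cotton",
    "neckline": "crew-neck",
    "sleeve_length": "short",
}
print(describe(product))           # Slim cotton crew-neck T-shirt with short sleeves.
print(answer(product, "material")) # The material is cotton.
```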
In the voice space, users tend to ignore the limitations of the search filters. With a structured data schema, it is simpler to understand the user's request, allowing more correct answers than the typical “I don’t know that one” response.
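One simple way to picture this: if the schema lists the values each attribute can take, a free-form request can be mapped onto those attributes without relying on the site's filter UI. The sketch below uses plain keyword matching against a hypothetical vocabulary (`VOCABULARY` and `parse_request` are our own illustrative names); a real assistant would need proper language understanding on top of it.

```python
import re

# Hypothetical attribute vocabulary, as it might be derived from a schema.
VOCABULARY = {
    "color": ["red", "blue", "black", "white"],
    "neckline": ["crew-neck", "v-neck"],
    "sleeve_length": ["short", "long", "sleeveless"],
    "material": ["cotton", "linen", "polyester"],
}


def parse_request(utterance: str) -> dict[str, str]:
    """Map a free-form request to attribute/value pairs found in the vocabulary."""
    text = utterance.lower()
    found: dict[str, str] = {}
    for attribute, values in VOCABULARY.items():
        for value in values:
            # Whole-word match so that "long" does not match inside "belong".
            if re.search(rf"\b{re.escape(value)}\b", text):
                found[attribute] = value
                break  # keep the first matching value per attribute
    return found


print(parse_request("Do you have a red cotton v-neck with long sleeves?"))
```

The resulting dictionary can then be used as a structured query, even when the user asks for a combination the site's filters do not expose.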
By using the blueprints, it is faster and simpler to build or reuse an agent. Maintenance work will be lower because there is no need for custom development.
We select and curate the best source data to build the structured data.
We use proprietary NLP models to process the curated information.
We use a mix of GPTs, vectorization, and clustering to generate structured data schemas.
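The models in this pipeline are proprietary, so the following is only a toy illustration of the vectorization and clustering step, not the actual method. It assumes a bag-of-words vector and a greedy, single-pass grouping by cosine similarity, both swapped in for simplicity; the function names and the threshold are hypothetical.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: token -> count."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(phrases: list[str], threshold: float = 0.4) -> list[list[str]]:
    """Greedy clustering: a phrase joins the first cluster whose seed is similar enough."""
    clusters: list[tuple[Counter, list[str]]] = []
    for phrase in phrases:
        vec = vectorize(phrase)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(phrase)
                break
        else:
            # No existing cluster matched, so this phrase seeds a new one.
            clusters.append((vec, [phrase]))
    return [members for _, members in clusters]


phrases = ["crew neck", "v neck", "short sleeve", "long sleeve", "round neck"]
for group in cluster(phrases):
    print(group)
# ['crew neck', 'v neck', 'round neck']
# ['short sleeve', 'long sleeve']
```

Each resulting group is a candidate attribute (here, neckline and sleeve length) whose members are its possible values, which is the kind of raw material a structured data schema could be built from.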
openWorld.domains is focused on building structured data and knowledge bases for the development of better ML models.
Our work can be used widely across the eCommerce ecosystem. Structured data schemas enable everything from better product recommendations to the development of self-learning chatbots and digital assistants.