Daltix Tech Challenges
Daltix helps companies in retail and FMCG answer questions about what’s going on in their markets by delivering real-time market insights. We want to help these large & sometimes slow organizations transform themselves quickly so that they stay relevant in a world currently dominated by big players such as Amazon.
Currently our insights are largely fueled by data scraped from webshops, which is often combined with our customers’ own data. However, we are not a web scraping company, so we’re not stopping there. We’re actively looking for partners to enrich our datasets, as clean data leads to much better insights.
Daltix is actively working on technical challenges in the following areas.
Distributed Systems Engineering
Daltix has a distributed web-crawling engine that collects data from more than 50 websites on a daily basis. At this point more than 600GB of data is downloaded & processed every day, and we plan on expanding to much more.
This distributed system is built on top of Amazon Web Services and uses serverless architectures where possible, with Python as the main programming language.
As Daltix scales from 50+ websites to 200+ websites (which it scrapes multiple times per day!), it has to invest in orchestration technologies such as Kubernetes, as well as logging & monitoring solutions, to keep an overview at scale.
Big Data Engineering
One of our main challenges is making huge datasets easily analyzable & available for different use cases, as well as building tools to monitor & guard the quality of the data.
Furthermore, our data engineering team integrates various data sources to build a complete & high-quality dataset, which it makes accessible to the other teams within the company, such as analytics and data science but also sales and marketing (yes, it’s that easy to get insights from our data).
Daltix uses big data technologies such as Spark, Airflow, Amazon Athena (Presto), Elasticsearch & Snowflake to cope with the large amounts of data it has to process.
Analytics, Machine Learning & AI
Daltix is actively investing in building up a data science team that solves challenges which currently require a lot of manual labor. Some examples are:
- Identifying similar & comparable products across different retailers.
- Automatic categorization (classification) of products in a uniform product tree.
- Automatically interpreting the contents of web pages & helping with structuring that data.
For this we are using technologies such as PyTorch, fast.ai & spaCy.
Daltix also analyses data to help big retailers stay competitive in terms of pricing, promotions and assortment. Thanks to our analytics team we’re able to give our customers the insights they need in the format they prefer. This can range from the traditional Excel file to an interactive Tableau dashboard. Our analytics team also uses Python (pandas, PySpark), as it is currently the best-suited language (sorry R) for data analysis & science.
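To give a feel for the first challenge in the list above (identifying comparable products across retailers), here is a minimal sketch in Python. It is only an illustration, not how Daltix actually does it: it compares product names by token overlap (Jaccard similarity), where a real solution would more likely use learned text embeddings. The function names, the threshold and the product names are all made up for this example.

```python
import re
from typing import List, Optional, Set


def tokenize(name: str) -> Set[str]:
    """Lowercase a product name and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of tokens the two names have in common (0.0 to 1.0)."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def best_match(query: str, candidates: List[str], threshold: float = 0.5) -> Optional[str]:
    """Return the most similar candidate, or None if nothing is similar enough.

    The threshold is an arbitrary value chosen for this sketch.
    """
    best_name, best_score = None, 0.0
    for candidate in candidates:
        score = jaccard(query, candidate)
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name if best_score >= threshold else None


# Hypothetical product names from two different retailers.
other_retailer = [
    "Ben & Jerry's Cookie Dough Ice Cream 465ml",
    "Magnum Almond 4 pack",
]
match = best_match("Ben & Jerry's Cookie Dough 465ml", other_retailer)
```

Token overlap breaks down quickly on abbreviations, translations and differing pack sizes, which is exactly why this is a machine learning problem rather than a string-matching one.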
Full-stack Web Development
There’s no use in gathering & processing big volumes of data if they can’t be accessed. Our full-stack development team is tasked with building web applications that allow customers to access & work with our data.
We’re building a product which allows free-text search for products across all the retailers we collect data from (handy if you’re looking for the cheapest Ben & Jerry’s) as well as a recommendation system to find similar products (handy if you’re on a budget). This allows our customers to compare private label products with their A-brand counterparts, just like consumers do.
Technology-wise, we’re relying on AWS & Serverless to build & deploy APIs using Python, while working with Angular for our front-ends.
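As a toy illustration of the "cheapest Ben & Jerry’s" use case, the sketch below does a naive free-text search over a handful of offers and sorts the hits by price. It is a simplification written for this page, not our production search: the `Offer` class, the `search` function, the retailer names and the prices are all hypothetical, and a real system would rely on a proper search engine rather than substring matching.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Offer:
    retailer: str
    name: str
    price: float


def search(offers: List[Offer], query: str) -> List[Offer]:
    """Return offers whose name contains every query term, cheapest first."""
    terms = query.lower().split()
    hits = [o for o in offers if all(t in o.name.lower() for t in terms)]
    return sorted(hits, key=lambda o: o.price)


# Made-up offers; retailer names and prices are placeholders.
offers = [
    Offer("Retailer A", "Ben & Jerry's Cookie Dough 465ml", 6.49),
    Offer("Retailer B", "Ben & Jerry's Cookie Dough 465ml", 5.99),
    Offer("Retailer B", "Private Label Cookie Dough Ice Cream 500ml", 2.79),
]

cheapest = search(offers, "ben jerry")[0]
```

Searching for "cookie dough" instead would also return the private label product, which is the kind of side-by-side comparison described above.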