Data Science for Dinner at the University of Cambridge

The UK has a young but fast-growing data science community, as evidenced by the growing number of regular data science meet-ups and by companies offering training programmes such as ASI or S2DS. As a data science team we are keen to learn from fellow data scientists, as well as to showcase our work and the impact it has had on the business. Giving talks at universities is one way in which we connect with the community, while improving our chances of growing our team with the UK’s most promising new data scientists. Recently, we combined a team off-site in Cambridge with a workshop run in partnership with the Cambridge University Data Science Society. You can find the slides we used for the event in this post, along with some lessons we took away as a team.

Start early

Building a data science team capable of creating step-changes across business KPIs takes time. For one, building the right environment for data science is no mean feat. Without a solid data pipeline and platform infrastructure in place, data scientists often find themselves spending the bulk of their time retrieving and cleaning data before they even get to develop and test data products. We believe the environment we have built at Gousto allows us to be truly Agile as a team and build impactful data products in much less time than our competitors. The time saved in deploying the first version of any given product can be reinvested in iterating on and generating compounding value from existing data products.

A second reason building an impactful data science team takes time has to do with within-team learning and buy-in from the wider business. The deep impact of a data product largely rests on making existing optimisation, statistics and machine learning techniques fit for purpose within the organisation. At Gousto, this means every new team member spends much of their first months on the team learning about our operations, the characteristics of our product offering and customer base, and the parts of the platform our data products interact with. Meanwhile, we actively prove the value of data science to the business by presenting at the monthly company ‘tech demo’ and by testing our products in a staging environment or through controlled experiments tied directly to KPIs.
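As a rough illustration of what a controlled experiment tied directly to a KPI can look like, here is a minimal sketch of a two-proportion z-test on a conversion-style metric. The function name and the numbers are hypothetical, for illustration only; they are not our actual experimentation tooling.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Z-test for the difference between two conversion rates.

    Returns the z statistic and a two-sided p-value.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via math.erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: control converts 480/10000, variant 540/10000.
z, p = two_proportion_z_test(480, 10_000, 540, 10_000)
```

The point is less the statistics than the discipline: every test reads out in the same units as the KPI it is meant to move, so the business can judge the product on its own terms.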

As we grow our team, we look for signals that applicants will step beyond the comfort zone of the team and actively maintain a dialogue on data science with the other teams at Gousto.

Work back from a deployment plan

At Gousto, engineering teams take responsibility for running what they build. This is no different for the data science team, and it forces us to start each project with a concept of a ‘live’ data product generating value autonomously (whether that means optimising processes, generating predictions or producing reports) and to work back from there.

With a rough deployment plan in hand, we can prioritise projects according to expected impact. Ultimately this also means we are in a good position to build products with compounding value. Every new data product that goes live not only optimises its direct objective, but achieves second order performance effects through interaction with other data products. In the past year, we have seen such compound value come out of related products we deployed across forecasting, warehouse optimisation and inventory control. As we roll out our work on personalisation, we expect to see further compounding.

Having various data products ‘in production’ means we have to set aside time to act as stewards of our projects. This stewardship includes explaining to the rest of the business what we have built, responding to production errors and keeping track of, and working on, engineering improvements. At the same time, each live data product gives us a set of unambiguous performance indicators to beat in its next iteration. This allows us to challenge each other to reimagine and redesign our products frequently.

Interrogate your source code

In recent years, many powerful tools have been made available to the global data science community as open source software. Open source software has been tremendously valuable to us as a team: to give a few examples, we have integrated Facebook’s Prophet tool into our forecasting product and now run many of our products using Airbnb’s Airflow. Although most tools do a great job practically out of the box, the nature of our business is such that we occasionally need to dive deep and ‘interrogate’ what goes on inside the source code. Such interrogation may, for instance, lead us to conclude that we need to fork the source code and perform precision surgery on an algorithm; we encountered exactly this with an open source library we use in our recommender system. Source code surgery comes with obvious risks to performance and stability. Done right, however, you stand to make your data product significantly stronger, and you may even have a shot at giving back to the open source community with your insights.
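When the change you need is small and the library exposes the right seams, you can sometimes avoid a full fork by overriding just the offending method. The class below is a hypothetical stand-in for a third-party recommender, not our actual code or any real library; the sketch only shows the shape of the technique.

```python
# Hypothetical stand-in for a third-party recommender class; a real
# library's class and method names will differ.
class NearestNeighbourRecommender:
    def similarity(self, a, b):
        # Library default: cosine similarity between two vectors.
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den

class PatchedRecommender(NearestNeighbourRecommender):
    """'Precision surgery' without a fork: override only the one method
    whose behaviour needs to change, leaving the rest of the library intact."""

    def similarity(self, a, b):
        # Shrink similarity towards zero when items share few signals, a
        # common tweak when raw cosine over-ranks sparse overlaps.
        overlap = sum(1 for x, y in zip(a, b) if x and y)
        damping = overlap / (overlap + 5)  # 5 is an arbitrary prior count
        return damping * super().similarity(a, b)
```

If the behaviour you need to change is buried in private functions with no seam to hook into, subclassing will not reach it, and a fork, with the risks noted above, becomes the pragmatic option.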

For those interested, here are the slides we used during the workshop:

Marc Jansen
Data Scientist