Developing data science products will in most cases require extensive testing and tuning models using historic data. At Gousto this is the case for a number of optimisation and machine learning algorithms powering our operations and marketing efforts. In turn, the majority of these cases will involve large volume data queried from a database. In order to maintain agility in developing your data-driven product the last thing you want is to wait for queries to complete while testing iterations to your code.
To prevent re-running large queries we can construct a simple Python method to cross-reference a hashed SQL query against stored query results saved under hash-filenames. Keep reading to find out how!