I'm working on a C++ data analysis project. My workflow goes like this:

1. Analyze the data and build models
2. Optimize the code for latency, to deploy to production
3. Go to 1

Step 1 involves many machine learning parameters, which I use to test very minor variations of algorithms. In step 2, I remove the unused parts of the code (the non-optimal parameters), optimize the code for latency (changing maps to arrays, for example), and deploy it. These modifications are made directly on the step 1 code. No separate branch is maintained.

When new data arrives and step 1 has to be repeated, I have lost the ability to test minor variations of an algorithm. One way to solve this is to maintain two branches: an experimental branch, which keeps all the parameters for the minor variations of the algorithm, and a branch with the latency-optimized code. The problem is that any small change in the experimental branch has to be repeated by hand in the latency-optimized branch, because the two branches cannot be merged. The differences between them are too large (files even exist in one branch but not the other) for a direct merge to work.

Is there any other way to solve this?

EDIT1: Another example of step 2

For the sake of illustration, let's say step 1 leads me to a predictor y = f(x) = floor(x^3 + 3*x^2 + 5*x), where floor(z) is the largest integer <= z, and x is in (0, 100).

The basic way to make prediction (in deployment) is to evaluate f(x).

But observe that f takes only integer values and is non-decreasing in x. So another way to predict is to store the ranges of x that map to y = 1, 2, …, and do a binary search on them.

This will lead to maintaining a vector with entries,

{(lbx_1, y_1), (lbx_2, y_2), …}, where y_i = i, lbx_i gives the least value of x such that f(x) >= y_i.

In this case, for any input, a simple binary search will fetch the prediction much faster.
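
To make the idea concrete, here is a minimal sketch of the two paths for the toy predictor above. The names `g`, `predict_direct`, `build_table` and `predict_table` are made up for this illustration, the thresholds are found by plain bisection, and the domain is passed in as `x_max` so the table can be kept small. It is only an assumption about how such a table might be built, not my actual production code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <iterator>
#include <utility>
#include <vector>

// Polynomial inside the floor; increasing for x > 0.
double g(double x) { return x * x * x + 3.0 * x * x + 5.0 * x; }

// General evaluator (experimental path): f(x) = floor(g(x)).
int predict_direct(double x) { return static_cast<int>(std::floor(g(x))); }

// Build {(lbx_i, y_i)} with y_i = i, where lbx_i is the least x in
// [0, x_max] such that g(x) >= y_i. Hypothetical helper using bisection.
std::vector<std::pair<double, int>> build_table(double x_max) {
    assert(x_max > 0.0);
    std::vector<std::pair<double, int>> table;
    const int y_max = predict_direct(x_max);
    for (int y = 1; y <= y_max; ++y) {
        double lo = 0.0, hi = x_max;  // invariant: g(lo) < y <= g(hi)
        for (int iter = 0; iter < 100; ++iter) {
            const double mid = 0.5 * (lo + hi);
            if (g(mid) >= y) hi = mid; else lo = mid;
        }
        table.emplace_back(hi, y);
    }
    return table;
}

// Latency-optimized path: binary search for the last entry with lbx <= x.
int predict_table(const std::vector<std::pair<double, int>>& table, double x) {
    auto it = std::upper_bound(
        table.begin(), table.end(), x,
        [](double value, const std::pair<double, int>& entry) {
            return value < entry.first;
        });
    // No threshold at or below x means f(x) = 0.
    return it == table.begin() ? 0 : std::prev(it)->second;
}
```

With `const auto table = build_table(10.0);`, both `predict_table(table, 2.5)` and `predict_direct(2.5)` give 46. The table is built once from the general evaluator, and each prediction afterwards costs one `std::upper_bound` call.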

(In practice, f(x) is much more complicated than the above mentioned function).

The latency optimized branch will have this map for prediction.

The experimental branch will evaluate the function.

But I also need the general evaluator for experimentation, in case my predictor turns out to be, say, y = f(x) = (x - 1)^2.