Help design a db for multiple experiments of time series features


My use case is something like the following:

  1. There are n medical cases today.
  2. Every day 0 to 10 new cases arrive.
  3. Each case is a time series of varying length of raw-data measurements; assume a bunch of ECG signals.
  4. Raw data comes from k sources, where 1 < k < 5.
  5. Each source has its own sampling rate (between 1 and 100 samples per second) and can have NA values.
  6. The data sources are not necessarily exactly synchronized.
  7. Each data source k has c_k columns.
  8. A case is an .h5 file with ~180,000 to ~540,000 samples at the highest sampling rate (100 Hz); see the reading sketch after this list.
  9. Data is mostly sampled into matrices, and the columns are those matrices flattened into their respective indices.
  10. The code base has to be in Python.
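To make item 8 concrete, reading one case looks roughly like this. The file name and the per-source dataset names are placeholders, and each dataset is assumed to be a 2D array of shape (n_samples_k, c_k):

```python
import h5py

# Sketch of reading one case. "case_0001.h5" and "source_1" are
# placeholder names; each dataset is assumed to hold a 2D float array
# of shape (n_samples_k, c_k) at that source's own sampling rate.
with h5py.File("case_0001.h5", "r") as f:
    for name, dset in f.items():
        print(name, dset.shape, dset.dtype)
    source_1 = f["source_1"][:]  # load the full array into memory
```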

For example, with n=5, k=2, c_1=1, c_2=2: source 1's sampling rate puts a value in every row, while source 2's puts a value only in every second row. Column k_j holds column j of source k. There is no “time” column. (Should there be?)

Sample | 1_1 | 2_1  | 2_2
1      | 1.0 | 10.0 | 100.0
2      | 2.0 |      |
3      | 3.0 | 30.0 | 300.0
4      | 4.0 |      |
5      | 5.0 | 50.0 | 500.0
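To make the layout concrete, here is a pandas sketch that rebuilds this table, with NaN where source 2 has no sample, plus a derived time column (from the sample index and the 100 Hz master rate) in case one should exist:

```python
import numpy as np
import pandas as pd

# Rebuild the example: source 1 has a value in every row, source 2 only
# in every second row; NaN marks the gaps.
df = pd.DataFrame(
    {
        "1_1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "2_1": [10.0, np.nan, 30.0, np.nan, 50.0],
        "2_2": [100.0, np.nan, 300.0, np.nan, 500.0],
    },
    index=pd.RangeIndex(1, 6, name="Sample"),
)

# A "time" column, if one existed, could be derived from the sample index
# and the highest sampling rate (100 Hz -> 0.01 s per row).
df["t_seconds"] = (df.index - 1) / 100.0
print(df)
```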

I would like to be able to query in an SQL-like manner over any intersection of cases, samples, and data sources, with maximum flexibility, for research purposes.
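For illustration, this is the kind of query I have in mind, assuming (hypothetically) the data were consolidated into one long-format table with a row per (case, sample, source, channel) value; DuckDB can run SQL directly against a pandas DataFrame in scope:

```python
import duckdb
import pandas as pd

# Hypothetical long-format schema, purely to illustrate the queries I want.
long_df = pd.DataFrame({
    "case_id": [1, 1, 1, 2, 2],
    "sample":  [1, 2, 3, 1, 2],
    "source":  [1, 1, 2, 1, 2],
    "channel": [1, 1, 2, 1, 1],
    "value":   [1.0, 2.0, 300.0, 9.0, 80.0],
})

# DuckDB resolves the DataFrame by its Python variable name.
result = duckdb.sql("""
    SELECT case_id, sample, value
    FROM long_df
    WHERE source = 2 AND sample BETWEEN 1 AND 1000
""").df()
print(result)
```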

This is not for production. Neither querying nor adding data happens often, and there are no scale-related constraints.

What would be a good design for this?

I assume some information may be missing, so please ask and I will add whatever is needed.