I’m trying to think of a scalable solution for my current system.
The current system is:

- 1 processing machine
- 60-100 GB files come from 2-3 microscopes every 30 minutes
- That data is transferred to a (local) network mount of the processing machine
- The processing machine runs and contains the ETL (Airflow)
Right now it works well.
I am concerned that in the future, as demand and load (file sizes, processing times, etc.) increase, we may face bottlenecks. I was thinking of using a cluster of machines (via cloud computing, or by buying a couple more machines), but our network is not the fastest, maybe transferring around 100-200 Mbps. I worry that with distributed computing the transfer speed would nullify the benefit of multiple machines.
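To put rough numbers on that worry, here is a quick back-of-envelope calculation using the figures from my setup (60-100 GB files, 100-200 Mbps link); the helper function is just illustrative:

```python
def transfer_minutes(file_gb: float, link_mbps: float) -> float:
    """Minutes to move a file of file_gb gigabytes over a link_mbps link."""
    bits = file_gb * 8e9               # GB -> bits (decimal GB)
    return bits / (link_mbps * 1e6) / 60

for gb in (60, 100):
    for mbps in (100, 200):
        print(f"{gb} GB @ {mbps} Mbps ~ {transfer_minutes(gb, mbps):.0f} min")
# 100 GB @ 200 Mbps ~ 67 min, 100 GB @ 100 Mbps ~ 133 min
```

So even in the best case, moving one large file takes longer than the 30-minute arrival interval, which is exactly why I suspect the network would eat the gains of a remote cluster.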
I’m considering an idea where a group of machines sit in a queue: if the machine at the top of the queue is not busy, the microscope transfers the initial file to that machine, and the rest of the process (2-3) runs as normal. I’m just wondering if this is a sane approach, or if there is anything I can improve on.
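In case it helps clarify what I mean, here is a minimal sketch of the dispatch logic I have in mind (the `Worker`/`pick_worker` names and the round-robin choice are just for illustration, not an existing system):

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Worker:
    name: str
    busy: bool = False

def pick_worker(pool: "deque[Worker]") -> Optional[Worker]:
    """Return the first idle worker in the queue, or None if all are busy."""
    for _ in range(len(pool)):
        w = pool[0]
        pool.rotate(-1)        # move inspected worker to the back (round-robin)
        if not w.busy:
            w.busy = True      # claim it before the microscope starts transferring
            return w
    return None                # everyone busy: file waits on the microscope side

pool = deque([Worker("proc-1"), Worker("proc-2"), Worker("proc-3")])
w = pick_worker(pool)
print(w.name if w else "all busy")   # first idle machine gets the file
```

The important property is that each file is transferred once, directly to the machine that will process it, so the slow network is only crossed at ingest rather than again between cluster nodes.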