Sorry if this is the wrong category; I came here from Stack Overflow.
I'm developing a web application in Angular (frontend) and Scala (backend) for a big data team. Because they work with large files for export/import, I'm building a module that mimics Microsoft Excel.
Here is the flow for importing files:

1. the client sends the file to api1
2. api1 saves the file in a temp folder and responds to the client that the import has begun
3. meanwhile, api1 calls api2 (a service) to process the file: map rows and columns into a list of objects, create a table in the database, and bulk-insert the rows (2,500 rows per query). For import this is fine; it's a background process and the client doesn't need to wait for it, because results are visible from the first second (the first bulk insert is really fast). We're talking about spreadsheets with hundreds of thousands of rows, maybe millions.
4. after processing, the file is deleted from the temp folder
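For reference, the bulk-insert step can be sketched like this (a simplified sketch of the idea, not my actual service code; `Row` and `insertBatch` are illustrative placeholders):

```scala
// Sketch of the bulk-insert step: stream the parsed rows and insert
// them in batches of 2,500 so results are visible from the first second.
final case class Row(values: Vector[String])

def insertBatch(batch: Seq[Row]): Unit = {
  // In the real service this would run one multi-row INSERT statement.
  println(s"inserted ${batch.size} rows")
}

def importRows(rows: Iterator[Row], batchSize: Int = 2500): Int = {
  var inserted = 0
  // Iterator.grouped keeps memory flat even for millions of rows.
  rows.grouped(batchSize).foreach { batch =>
    insertBatch(batch)
    inserted += batch.size
  }
  inserted
}
```

Streaming with an `Iterator` (rather than loading all rows into a `List`) is what keeps memory usage flat for files with millions of rows.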
Now, the problem is the export: how should I think about exporting a db table like this? If I don't save it anywhere, I need to fetch all the data from the table and build a temp file (an Excel worksheet), which takes some time (maybe 5, 10, 15 minutes), and I can't keep a client–server connection open that long. Even if I use sockets instead of HTTP requests, the client still has to wait all that time for the file to be generated, which is annoying for them.
One solution is to keep the temp file on the server/cloud, but the underlying table will probably be (or can be) altered by users, so the file would need to be regenerated before downloading. My question is: how can I map db tables to Excel files so I can hand them to users instantly when they want to download a table?
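One pattern I'm considering is to make the export asynchronous too, mirroring the import: the client requests an export, the server returns a job id immediately, the client polls (or gets a push) until the file is ready, then downloads it. A minimal sketch with plain Scala Futures (the registry and all names here are made up for illustration, not a real API):

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{ExecutionContext, Future}

// Sketch of an asynchronous export job: start it, return an id at once,
// let the client poll the status and download once it's Done.
sealed trait JobStatus
case object Running extends JobStatus
final case class Done(filePath: String) extends JobStatus

object ExportJobs {
  private val jobs = new ConcurrentHashMap[String, JobStatus]()

  // What api1 would do on an export request: respond immediately with the id.
  // `buildWorksheet` stands in for the long 5-15 minute file generation.
  def startExport(table: String, buildWorksheet: String => String)
                 (implicit ec: ExecutionContext): String = {
    val id = UUID.randomUUID().toString
    jobs.put(id, Running)
    Future {
      val path = buildWorksheet(table)
      jobs.put(id, Done(path))
    }
    id
  }

  // What a status endpoint would return for polling clients.
  def status(id: String): Option[JobStatus] = Option(jobs.get(id))
}
```

This doesn't solve the staleness problem by itself, but it decouples the client from the long-running generation the same way the import already does; a cached file could then be invalidated whenever the table is written to.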
For api2 I'll use Apache Spark to read the files and write them into the database. Either way, this will remain a background process, decoupled from the user request.