I’m building a data visualization that displays COVID information for the United States, at the city, state, and county level.
The ultimate source of truth are three CSVs published by the New York Times on Github in this repo:
The CSVs are updated once per day with new data from the previous day.
The front-end involves selecting a state, county, and type of statistic (number of deaths, number of cases, etc.). Three line charts are then displayed, showing the rate of change over time – at the national, state, and county level.
Right now, the app is purely front-end. It downloads the set of three CSVs (which are quite large), then does a series of calculations on the data, and when the Promise completes, the visualization is finally displayed in the browser. It takes good 5-10 seconds to complete on a good internet connection – hardly sustainable in production, and also requires the user to download the entirety of the data, even though they might be only looking for a few combinations of states / counties.
Is there a solution that could speed this up, without requiring a back-end? Or is a formal database / backend structure needed?
Here is my general idea of what the back-end solution (I would use a Node.js / Express REST API setup) would entail, but looking for suggestions:
Deploy a Node.js script that downloads the CSVs once per day and puts the data in a database. I could either download the entirety of the CSVs and rewrite the entire database, or download just the new data and add it to the database.
Do some additional calculations on the data (for example, calculate change from the previous day) and then send those to the database. These additional calculations could also be done client-side (this is how it is working currently in my front-end solution)
When the user loads the page, have the front-end query for a list of states and counties from the back-end, so front-end can load.
When the user selects a state / county combination, send just that information to the back-end via a REST API. Have the back-end query the database and return just the requested information to the front-end.
a. Obviously, a no-backend solution would be preferred, but I can’t
think of a way where I can query these CSVs with just the
user-supplied information without downloading them in their entirety
b. From a database perspective, it is a big lift / cost to delete all
the data and rewrite it entirely? Or would it be more cost-efficient
(assuming this is a cloud-based solution) to only add the new data?
(assuming the old data does not change, which is an assumption)
c. I’ve been looking at GraphQL as an alternative to REST, but I’m
not sure it will solve the problem of having to download the CSVs in
their entirety and “store” them somewhere. There are several
open-source APIs online already that provide a more convenient way to
query the data:
But these all seem to be pulling from the CSV, and they take a long time. Is this because they are accessing the data from a CSV instead of a database which I’m assuming has much faster access?