No, a database is not always involved.
Let’s come up with a random service that happens to be moderately popular. A fairly standard set of components includes:
- a user-facing client (webpage or app)
- an API for programmatic access
- a backend to serve clients (API and user-facing)
- a database for the backend to store data
- backups of the database
- data exports to third parties for analytics or even reselling
Every single component can be the source of a data leak.
Here are some examples of how (this is by no means comprehensive):
User-facing client / programmatic API:
Let’s say there’s an endpoint on the server to return a logged-in user’s private profile (not normally publicly visible). The endpoint is just /getprofil?userid=12345 (12345 is my user ID).
Because the developers forgot their coffee that day, they figured they didn’t need to perform any checks on it: the user ID is generated by them, so there’s no reason for anyone out there to know what it is. So they just return the profile for user 12345 no matter who is asking.
Because they also forgot their coffee on the day they were writing their schema, they made those user IDs monotonically increasing, as opposed to random.
The attacker can now start dumping the private profile of every single user in the system by hitting /getprofil?userid=1 and increasing the ID by one until the server starts saying the user does not exist.
The same applies to a programmatic API that doesn’t perform the correct authentication/authorization checks.
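To make the missing check concrete, here is a minimal sketch in plain Python with no web framework. Everything in it is an assumption for illustration: the in-memory `PROFILES` table, the function names, and the idea that the backend already knows the logged-in user's ID from the session. It shows the handler from the story, a version that performs the authorization check, the attacker's counting loop, and random (non-guessable) user IDs.

```python
import secrets

# Hypothetical in-memory stand-in for the backend's user table.
PROFILES = {
    1: {"name": "alice", "email": "alice@example.com"},
    2: {"name": "bob", "email": "bob@example.com"},
}


def get_profile_vulnerable(requested_id):
    """What the story describes: return whatever profile is asked for."""
    return PROFILES.get(requested_id)


def get_profile_checked(session_user_id, requested_id):
    """Return a profile only when the logged-in user asks for their own."""
    if session_user_id is None:
        raise PermissionError("not logged in")
    if session_user_id != requested_id:
        raise PermissionError("not allowed to read another user's profile")
    return PROFILES.get(requested_id)


def dump_all_profiles(fetch):
    """The attacker's loop: start at 1 and count up until a user is missing."""
    dumped, user_id = [], 1
    while (profile := fetch(user_id)) is not None:
        dumped.append(profile)
        user_id += 1
    return dumped


def new_user_id():
    """Random identifier, so IDs cannot be found by counting."""
    return secrets.token_urlsafe(16)
```

Calling `dump_all_profiles(get_profile_vulnerable)` walks the whole table, while the same loop against `get_profile_checked` as user 1 raises `PermissionError` as soon as it asks for user 2. Random IDs only make guessing harder; the check on who is asking is what actually closes the hole.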
Database:
The under-caffeinated developers have some good friends: the sleep-deprived database admins. Because security is hard, they just deployed their database without secure communication (plaintext as opposed to TLS), and used either simple passwords for the admin users (because complex ones are hard to remember) or, even worse, the default username and password.
This database is also accessible to the world because some backends outside that datacenter need to access it. Any attacker can now connect to the database and try the default username and password. Once in, they can perform any query they like on the database.
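The three mistakes in this scenario (no TLS, default credentials, reachable from anywhere) are the kind of thing a small pre-deployment check can catch. The sketch below is only an illustration: the configuration keys and the list of default credentials are made up for this example and do not correspond to any particular database product.

```python
# Hypothetical examples of default credentials; not taken from any real product.
DEFAULT_CREDENTIALS = {
    ("admin", "admin"),
    ("root", ""),
    ("dbadmin", "password"),
}


def audit_db_config(cfg):
    """Return a list of findings for an (invented) database settings dict."""
    findings = []
    if not cfg.get("require_tls", False):
        findings.append("connections are not forced to use TLS")
    if (cfg.get("admin_user"), cfg.get("admin_password")) in DEFAULT_CREDENTIALS:
        findings.append("admin account uses a default username and password")
    if cfg.get("listen_address") in ("0.0.0.0", "::"):
        findings.append("database listens on all network interfaces")
    return findings


# The deployment from the story.
story_config = {
    "require_tls": False,
    "admin_user": "admin",
    "admin_password": "admin",
    "listen_address": "0.0.0.0",
}

for finding in audit_db_config(story_config):
    print("WARNING:", finding)
```

None of these findings is about a bug in the database software; they are all deployment choices.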
Backups:
The database admins know problems sometimes occur, so they set up regular backups of the entire database. Because they found a neat script that does it for them, they save those backups to an S3 bucket.
But alas, they had a really late night and figure they’ll worry about access permissions on the S3 bucket tomorrow. It’ll just stay open for now; it’s not like anyone is going to find that bucket.
Alas, the attacker has found the publicly-readable bucket with a convenient dump of the entire database. All they need to do is a quick copy of everything.
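One way a bucket ends up readable by everyone is a bucket policy that allows access to the wildcard principal `"*"`. As a rough sketch, the function below scans a policy document (the JSON format S3 bucket policies use) for such statements. It is deliberately simplified: it does not evaluate `Condition` blocks and knows nothing about ACLs or account-level settings, and the bucket name `example-backups` is made up.

```python
import json


def _is_wildcard(principal):
    """True if the principal is "*" or {"AWS": "*"} (possibly inside a list)."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        values = aws if isinstance(aws, list) else [aws]
        return "*" in values
    return False


def public_statements(policy_json):
    """Return the Allow statements that grant access to everyone."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    return [
        s for s in statements
        if s.get("Effect") == "Allow" and _is_wildcard(s.get("Principal"))
    ]


# A policy like the one the tired admins left in place (hypothetical bucket).
open_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-backups/*",
    }],
})

if public_statements(open_policy):
    print("bucket policy allows anonymous access")
```

A check like this is no replacement for the provider's own public-access controls, but it shows how little it takes to expose a full database dump: one statement.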
Data exports:
Analytics is hard, so it would be a lot better to export our entire database to a third party that has really shiny dashboards and graphs. We’ll just set up a pipeline to save all DB modifications straight to that third party.
Oops, that third party is just as red-eyed as our developers and database admins: they forgot proper security in their database or backups. The attacker can now access our data due to someone else’s carelessness.
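The damage here is bounded by what was exported in the first place. A possible mitigation, sketched below under made-up assumptions (the record layout, the `EXPORT_FIELDS` allow-list, and the function names are all hypothetical), is to send only the fields the dashboards need and to replace the real user ID with a keyed hash. Note that this is pseudonymization, not anonymization: whoever holds the key can still link rows back to users.

```python
import hashlib
import hmac

# Hypothetical allow-list: only what the analytics dashboards actually need.
EXPORT_FIELDS = ("country", "plan", "signup_year")


def pseudonymize(user_id, key):
    """Replace a user ID with a keyed hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, str(user_id).encode(), hashlib.sha256).hexdigest()


def to_export_row(record, key):
    """Build the row sent to the third party from a full database record."""
    row = {field: record[field] for field in EXPORT_FIELDS}
    row["user_ref"] = pseudonymize(record["user_id"], key)
    return row


# Example record; the key would normally come from a secret store.
record = {
    "user_id": 12345,
    "email": "someone@example.com",
    "password_hash": "not-for-export",
    "country": "CH",
    "plan": "free",
    "signup_year": 2019,
}
row = to_export_row(record, key=b"export-key-kept-out-of-the-export")
print(sorted(row))
```

If the third party leaks this export, the attacker gets plan and country statistics rather than e-mail addresses and password hashes.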
These are just some naive scenarios covering some of the possible components, yet they happen constantly. You’ll notice that none of them involve breaking through strong levels of security or require insider knowledge. They are all attacks that can be carried out by low-skilled attackers.
Not all of them involve a database (e.g. the API attacks don’t care how the data is stored), and most of them never have to deal with the database software itself. When they do, it’s a question of database configuration, not of flaws in the software.