Suppose I have this table, which logs every page load on my site:
CREATE TABLE "example.com page loads" ( id bigserial, "URL" text NOT NULL, "IP address" inet NOT NULL, "user agent" text, "timestamp" timestamptz NOT NULL DEFAULT now(), PRIMARY KEY (id) )
If the same person loads 100 pages, or many people with the exact same user-agent string load 10,000 pages, the same long "user agent" string is stored redundantly 100 or 10,000 times in my table, massively bloating it.
This has always been a big problem for me, first with plain-text web server logs and later when doing exactly what I describe here (logging to a PostgreSQL table).
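To put a rough number on the redundancy, a query along these lines could be run against the table above. This is only a sketch: it counts the raw bytes of the text values and ignores per-row overhead and any compression PostgreSQL may apply, so treat the result as an estimate.

```sql
-- Rough estimate of how much space the repeated user-agent strings take.
-- octet_length() counts the bytes of the text value itself, not the on-disk size.
SELECT count(*)                                        AS total_rows,
       count(DISTINCT "user agent")                    AS distinct_user_agents,
       pg_size_pretty(sum(octet_length("user agent"))) AS bytes_in_all_copies
FROM "example.com page loads";
```

If total_rows is far larger than distinct_user_agents, most of those bytes are duplicates.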
One very obvious and immediate thought that comes to mind is: "Why can't PostgreSQL store each user agent internally only once and reference it automatically, without exposing this internal optimization to me?"
That is, without me having to create a separate table like this:
CREATE TABLE "example.com unique user agents" ( id bigserial, "user agent" text, PRIMARY KEY (id), UNIQUE ("user agent") )
… and then having to run expensive and annoying manual queries to determine whether the user agent already exists in the unique user agents table, and then using a column named "unique user agent ID" in the "page loads" table that references this table, instead of a nice, simple text column.
I'm sure you understand exactly what I mean. It is so obvious that I'm 99% sure this must have been solved years ago and I simply never noticed.
There is probably a simple keyword that does just that, such as (this is just my guess):
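To make the manual approach concrete, here is a minimal sketch of what I would rather avoid. It assumes PostgreSQL 9.5 or later (for ON CONFLICT), replaces the original "page loads" definition with one that references the lookup table, and uses a made-up URL, IP address, and user-agent string purely for illustration.

```sql
-- Lookup table: each distinct user-agent string is stored exactly once.
CREATE TABLE "example.com unique user agents" (
    id bigserial,
    "user agent" text NOT NULL,
    PRIMARY KEY (id),
    UNIQUE ("user agent")
);

-- Page loads now hold a small integer reference instead of the long string.
CREATE TABLE "example.com page loads" (
    id bigserial,
    "URL" text NOT NULL,
    "IP address" inet NOT NULL,
    "unique user agent ID" bigint REFERENCES "example.com unique user agents" (id),
    "timestamp" timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (id)
);

-- Step 1: insert the string only if it is not already present.
INSERT INTO "example.com unique user agents" ("user agent")
VALUES ('Mozilla/5.0 (illustrative example)')
ON CONFLICT ("user agent") DO NOTHING;

-- Step 2: log the page load, looking up the ID by the string.
INSERT INTO "example.com page loads" ("URL", "IP address", "unique user agent ID")
SELECT '/index.html', '203.0.113.7'::inet, ua.id
FROM "example.com unique user agents" AS ua
WHERE ua."user agent" = 'Mozilla/5.0 (illustrative example)';
```

Every page load needs both statements, and every report that wants the user-agent text needs a join back to the lookup table. That is the bookkeeping I am hoping PostgreSQL can do for me.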
CREATE TABLE "example.com page loads" ( id bigserial, "URL" text NOT NULL, "IP address" inet NOT NULL, "user agent" text OPTIMIZE_UNIQUELY_INTERNALLY, "timestamp" timestamptz NOT NULL DEFAULT now(), PRIMARY KEY (id) )
It would be nice if there were such an "OPTIMIZE_UNIQUELY_INTERNALLY" flag that I could simply apply to a column whenever I want this done "under the hood" without having to think about it!
If there is, it would save me a lot of storage space and headaches.
I do not think this is the same thing as an index. If I create an index on the "user agent" column, PG will still not store each unique value only once, right? It would only create an additional "lookup table" for faster queries?