Suppose you have a data lake with sensitive data. Because the available tools are immature, dynamic data masking is unavailable.
The standard answer is a separation of concerns between developers and support engineers.
But how do you protect sensitive data from the support engineers themselves?
Support engineers need to debug issues, run jobs, etc. If the ETL logic depends on sensitive data (e.g. joining by IP address, or filtering by medical status), this inherently means the support engineer has to have access to that sensitive data.
Encryption (besides being a lot of burden) does not seem to solve the issue: deterministic encryption isn’t secure, and moreover the support engineer has to have at least indirect access to the decryption keys in order to run the jobs.
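
To illustrate why I consider deterministic schemes insecure here, below is a minimal sketch. It uses a keyed HMAC as a stand-in for deterministic encryption (it is not real encryption, but it has the same equality-preserving property that makes joins and filters work). The key, the column values, and the function name `deterministic_token` are all made up for the example.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key, for illustration only.
SECRET_KEY = b"example-key-not-for-production"


def deterministic_token(value: str) -> str:
    """Map a value to a token; the same input always yields the same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Made-up low-cardinality sensitive column.
statuses = ["healthy"] * 7 + ["diabetic"] * 2 + ["hiv_positive"] * 1
tokens = [deterministic_token(s) for s in statuses]

# Equality is preserved, so joins and filters on the token still work...
token_counts = Counter(tokens)

# ...but the frequency distribution is preserved as well.
plain_freqs = sorted(Counter(statuses).values(), reverse=True)
token_freqs = sorted(token_counts.values(), reverse=True)

# Someone who knows the rough real-world distribution can guess, without
# the key, that the most frequent token stands for "healthy".
most_common_token = token_counts.most_common(1)[0][0]

print(plain_freqs == token_freqs)
print(most_common_token == deterministic_token("healthy"))
```

Anyone who can query the tokenized column sees the same value frequencies as in the plaintext, so for low-cardinality fields such as a medical status the values can often be guessed without ever touching the key.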
Auditing queries may not solve the issue either, because query results can be downloaded.