microservices – Where to place an in-memory cache to handle repetitive bursts of database queries from several downstream sources, all within a span of a few milliseconds

I’m working on a Java service that runs on Google Cloud Platform and uses a MySQL database via Cloud SQL. The database stores simple relationships between users, the accounts they belong to, and groupings of accounts. Being an “accounts” service, it naturally has many downstreams. Downstream service A may, for example, hit several other upstream services B, C, and D, which in turn might call other services E and F; but because so much is tied to accounts (checking permissions, getting user preferences, sending emails), every service from A to F ends up hitting my service with identical, repetitive calls. In other words, a single call to some endpoint might result in 10 queries for a user’s accounts, even though that information obviously doesn’t change over a few milliseconds.

So where is it appropriate to place a cache?

  1. Should downstream service owners be responsible for implementing a cache? I don’t think so, because they shouldn’t have to know the details of my service’s data, such as what can be cached and for how long.

  2. Should I put an in-memory cache in my service, such as a Guava LoadingCache, in front of my DAO (see the sketch after this list)? But does this really provide anything over MySQL’s own caching? (Admittedly I don’t know much about how databases cache, but I’m sure that they do.)

  3. Should I put an in-memory cache in the Java client? We use gRPC, so we have generated clients that services A, B, C, D, E, and F already use. Putting a cache in the client means a caller can skip the outgoing call, but only if that client instance has made the same call before and the data has a long enough TTL to be useful, e.g. an account’s group is permanent. So, yeah, that doesn’t help at all with the “bursts,” not to mention that the caches would live in instances across different zones. (I haven’t customized a generated gRPC client yet, but I assume there’s a way.)
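
For option 2, here is roughly what I have in mind: a minimal sketch using Guava’s LoadingCache in front of the DAO (the AccountDao interface, its method name, and the 2-second TTL are illustrative placeholders, not our real code):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.List;
import java.util.concurrent.TimeUnit;

public class CachedAccountDao {

    // Hypothetical DAO interface; the real one queries Cloud SQL.
    public interface AccountDao {
        List<String> findAccountIdsForUser(String userId);
    }

    private final LoadingCache<String, List<String>> accountsByUser;

    public CachedAccountDao(AccountDao dao) {
        this.accountsByUser = CacheBuilder.newBuilder()
                .maximumSize(10_000)
                // TTL just long enough to absorb a burst of identical calls,
                // short enough that staleness is a non-issue.
                .expireAfterWrite(2, TimeUnit.SECONDS)
                .build(new CacheLoader<String, List<String>>() {
                    @Override
                    public List<String> load(String userId) {
                        return dao.findAccountIdsForUser(userId);
                    }
                });
    }

    public List<String> getAccountsForUser(String userId) {
        // Identical lookups within the TTL are served from memory;
        // only the first one per key hits MySQL.
        return accountsByUser.getUnchecked(userId);
    }
}

My thinking is that a TTL of a second or two would be enough to collapse a burst of identical calls without my having to reason about real cache invalidation.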

I’m leaning toward #2, but my understanding of databases is weak, and I don’t know how to collect the data I need to justify the effort. I feel like what I need to know is: how often do “bursts” of identical queries occur, how does MySQL process these bursts (especially given its caching), and what is the bottom-line effect on downstream performance, if any at all?

I feel that experience may answer this question better than my gathering those metrics myself.

Asking myself, “Why do I want to do this, given no evidence of any bottleneck?” Well, (1) it just seems wrong that there are so many duplicate queries, (2) they add a lot of noise to our logs, and (3) I don’t want to wait until we scale to find out that it’s a deep issue.

sql server – Cannot reclaim Index Unused Memory in In-Memory OLTP

Steps to reproduce the problem

Create a database with a memory-optimized filegroup and container.
Create a schema-only in-memory table with a nonclustered primary key.
Simulate insert and delete activity.
My result is that index unused memory stays high and won’t go down.

 USE master
 go
 DROP DATABASE IF EXISTS MemoryOptimizedTest
 CREATE DATABASE MemoryOptimizedTest
 GO
 USE MemoryOptimizedTest
 GO
 ALTER DATABASE MemoryOptimizedTest 
 ADD FILEGROUP imoltp_mod CONTAINS MEMORY_OPTIMIZED_DATA
 GO 
 
 ALTER DATABASE MemoryOptimizedTest ADD FILE (name='imoltp_mod1', filename='c:\imoltp_mod1') TO FILEGROUP imoltp_mod
 GO
 
 
 DROP TABLE IF EXISTS dbo.MyCache
 CREATE TABLE dbo.MyCache
 (
    PK int NOT NULL, 
    SecondInt int NOT NULL,
    ThirdInt int NOT NULL,
     CONSTRAINT PK_MyCache PRIMARY KEY NONCLUSTERED (PK)
 ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
 
 go

/* Generate activity and monitor table size */
USE MemoryOptimizedTest
go


SELECT
    object_id,
    OBJECT_SCHEMA_NAME(object_id) + '.' + OBJECT_NAME(object_id) AS Table_Name,
    memory_allocated_for_table_kb,
    memory_used_by_table_kb,
    memory_allocated_for_indexes_kb,
    memory_used_by_indexes_kb
FROM sys.dm_db_xtp_table_memory_stats
WHERE OBJECT_ID = OBJECT_ID('dbo.MyCache')

;WITH
  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
  L1   AS(SELECT 1 AS c FROM L0 CROSS JOIN L0 AS B),
  L2   AS(SELECT 1 AS c FROM L1 CROSS JOIN L1 AS B),
  L3   AS(SELECT 1 AS c FROM L2 CROSS JOIN L2 AS B),
  L4   AS(SELECT 1 AS c FROM L3 CROSS JOIN L3 AS B),
  L5   AS(SELECT 1 AS c FROM L4 CROSS JOIN L4 AS B),
  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
, tally AS (SELECT TOP (10000) n FROM Nums ORDER BY n)
INSERT INTO dbo.MyCache (PK, SecondInt, ThirdInt)
SELECT 
    n
    , n+1
    , n+2
FROM tally 

WAITFOR DELAY '00:00:02'
DELETE FROM dbo.MyCache

GO 50

When I run it on my local machine, Microsoft SQL Server 2017 (RTM-GDR) (KB4505224) – 14.0.2027.2 (X64) with 16 GB max memory and 1.5 GB available memory, memory_allocated_for_indexes_kb fluctuates normally.

When I run it on our DEV environment, Microsoft SQL Server 2019 (RTM-CU7) (KB4570012) – 15.0.4063.15 (X64) with 2 TB max memory and 220 GB available memory, memory_allocated_for_indexes_kb only grows. I’ve simulated activity against the table for a few hours and now have index used memory = 0.24 MB and index unused memory = 385 MB, and it won’t go down.

The garbage collector did run, according to the PerfMon counter Sweep expired rows removed/sec in the XTP Garbage Collection object.

I read somewhere that the garbage collector doesn’t free up space until it comes under memory pressure, but it seems weird that it would hold on to so much unused memory.

In-memory pepper – Information Security Stack Exchange

First: you are definitely correct that it can’t be stored alongside your code; no secret should be.

It obviously needs to be held in memory by the server, since it needs to be used.
However, it can’t live in memory only. As long as a single password hashed with the pepper remains, the pepper must stay available. If it’s only held in memory, you’ll have a different pepper on every server and after every restart.

The real question becomes: where to persist it.

At this point, this is the same problem as storing any sensitive cryptographic material (e.g. a private key). You have a lot of options, some of which are:

  • store the pepper in a cloud KMS (e.g. Google/AWS KMS, Azure Key Vault, etc.). Either keep it there directly, or encrypt the pepper with the KMS and store the encrypted pepper somewhere less sensitive (but still with restricted permissions); the encryption key itself never leaves the KMS.
  • the same as above, but with an HSM.
  • roll your own secrets storage system. Tools like Keywhiz and Vault can be very powerful, but they are not simple systems to deploy and manage.
  • deploy the secret to the server at setup or startup time. Not ideal, but still better than keeping it in version control, and there is still the question of where it is really stored.

The general idea behind all of these is that to get access to the pepper, you should have to demonstrate control of the server. This does nothing to stop attackers with sufficient privileges on the server, but that’s not what peppers are for: if they have that level of access, they can simply replace your binary with one that logs the plaintext passwords.
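
To make the “held in memory, persisted elsewhere” part concrete, here is a minimal Java sketch. It assumes the pepper is fetched once at startup from whatever store you chose and then lives only in memory, applied as an HMAC key over the password before the normal password hash; the PASSWORD_PEPPER environment variable is just a placeholder for that lookup:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Pepper {

    private final SecretKeySpec key;

    public Pepper() {
        // Fetched once at startup from wherever the pepper is persisted
        // (KMS, HSM, secrets manager, ...); the env var is a stand-in.
        byte[] raw = Base64.getDecoder().decode(System.getenv("PASSWORD_PEPPER"));
        this.key = new SecretKeySpec(raw, "HmacSHA256");
    }

    // Apply the pepper as an HMAC over the password; the result is what
    // gets fed into the normal password hash (bcrypt, Argon2, ...).
    public byte[] apply(String password) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            return mac.doFinal(password.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }
}

Whichever of the options above you pick only changes where the constructor gets those bytes from.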

in memory database – oracle XE 18c inmemory settings

I am trying to do this in Oracle XE 18c.
I can’t alter the inmemory_size parameter, but I can alter sga_target.
Can anyone help me?

SQL> alter system set inmemory_size=200M;
alter system set inmemory_size=200M
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-02095: specified initialization parameter cannot be modified

SQL> alter system set sga_target=1024M;

System altered.

testing – What is the correct way to use an in-memory SQLite database for unit tests?

I have found vague hints on the internet that we can speed up unit tests with an in-memory database by using the following line in our phpunit.xml file:

<env name="SIMPLETEST_DB" value="sqlite://localhost/:memory:"/>

However, this does not work for me. In the method BrowserTestBase::testLoad, I run into the error Undefined index: value without any further explanation. With a MySQL database, everything works fine (albeit slowly).

What is the correct way to use an in-memory SQLite database for unit tests? Do I have to install something? Do I have to put something into the settings file? Is this feature documented somewhere?

javascript – In-memory database class

I built this basic in-memory database that supports some basic operations, like SET, GET, DELETE, COUNT, etc., as well as transactions. Some of the constraints are as follows: GET, SET, DELETE, and COUNT should have a runtime of O(log n) or better (where n is the number of items in the database), and memory usage shouldn’t double for every transaction.

One improvement I was thinking of applying is to use a data structure like Map to store the data; shouldn’t that provide a runtime of O(1) for all basic operations?

I appreciate any advice on how to improve this code.

export class Database {

    constructor() {
        this.database = {valuesCount: {}, names: {}};
        this.transactions = [];
    }

    /**
     * Set the name of the database entry
     *
     * @param {string} name The name to set
     * @param {string} value The value to set the name to
     */
    set(name, value) {
        if(this.database.names[name] === undefined) {
            this.updateValueCount(value);
            this.database.names[name] = value;
        } else if(!!this.database.names[name]) {
            if(this.database.names[name] !== value) {
                this.updateValueCountForExistingName(name, value);
                this.database.names[name] = value;
            }
        }
    }

    /**
     * Update the value count for a new name
     *
     * @param {string} value The value count to update
     */
    updateValueCount(value){
        this.setCountForValue(value);
    }

    /**
     * Update the value count for an existing name
     *
     * @param {string} name The name of the value count to update
     * @param {string} value The value count to update
     */
    updateValueCountForExistingName(name, value){
        this.deleteValuePropertyForName(name);
        this.setCountForValue(value);
    }

    /**
     * Sets the count of a particular value
     *
     * @param {string} value The value to set the count for
     */
    setCountForValue(value) {
        if(!!this.database.valuesCount[value]) {
            this.database.valuesCount[value]++;
        } else {
            this.database.valuesCount[value] = 1;
        }
    }

    /**
     * Get the name of the database entry
     *
     * @param {string} name The name to get
     */
    get(name) {
        console.log(!!this.database.names[name] ? this.database.names[name] : null);
    }

    /**
     * Delete entry from database
     *
     * @param {string} name The name to delete
     */
    deleteFromDatabase(name) {
        if(!!this.database.names[name]) {
            this.deleteValuePropertyForName(name);
        }
    }

    /**
     * Counts the number of occurrences val is found in the database
     *
     * @param {string} value The value to count
     */
    count(value) {
        if(!!this.database.valuesCount[value]) {
            console.log(this.database.valuesCount[value]);
        } else {
            console.log(0);
        }
    }

    /**
     * Begins a transaction
     */
    beginTransaction() {
        if(this.transactions.length === 0) {
            this.transactions.push(this.database);
        }
        let shallowCopy = {valuesCount: {...this.database.valuesCount}, names: {...this.database.names}};
        this.transactions.push(shallowCopy);
        this.database = this.transactions[this.transactions.length-1];
    }

    /**
     * Rollback a transaction
     */
    rollback() {
        if(this.transactions.length > 1) {
            this.transactions.pop();
            this.database = this.transactions[this.transactions.length-1];
        } else {
            console.log('TRANSACTION NOT FOUND');
        }
    }

    /**
     * Commit a transaction
     */
    commit() {
        this.database = this.transactions[this.transactions.length-1];
        this.transactions = [];
    }

    /**
     * Delete value property for a particular name
     *
     * @param {string} name The value to delete
     */
    deleteValuePropertyForName(name) {
        this.database.valuesCount[this.database.names[name]]--;
        if(this.database.valuesCount[this.database.names[name]] === 0) {
            delete this.database.valuesCount[this.database.names[name]];
        }

        delete this.database.names[name];
    }

    /**
     * Handle User Input for Various Database Commands
     *
     * @param {string} input User command line input
     * @returns {boolean}
     */
    handleInput(input) {
        const inputRaw = input.split(' ');
        const action = inputRaw[0];
        const arg1 = inputRaw[1];
        const arg2 = inputRaw[2];
        let name = '';
        let value = '';

        switch(action) {
            case 'SET':
                if(!!arg1 && !!arg2) {
                    name = arg1;
                    value = arg2;
                    this.set(name, value);
                } else {
                    console.log('Invalid Input: the SET command must include a name and a value.');
                }
                break;
            case 'GET':
                name = inputRaw[1];

                if(!!name) {
                    this.get(name);
                } else {
                    console.log('Invalid Input: the GET command must include a name.');
                }
                break;
            case 'DELETE':
                name = inputRaw[1];

                if(!!name) {
                    this.deleteFromDatabase(name);
                } else {
                    console.log('Invalid Input: the DELETE command requires a name.');
                }
                break;
            case 'COUNT':
                value = inputRaw[1];

                if(!!value) {
                    this.count(value);
                } else {
                    console.log('Invalid Input: the COUNT command requires a value to count.');
                }
                break;
            case 'BEGIN':
                this.beginTransaction();
                break;
            case 'ROLLBACK':
                this.rollback();
                break;
            case 'COMMIT':
                this.commit();
                break;
            case 'END':
                return true;
            default:
                console.log('Function is not valid.');
        }
    }
}

caching – How do I decide an initial in-memory cache size given my DB size and expected load throughput?

(Purely for learning purposes)

Say the DB contains 1 billion rows with 200 bytes per row = 200 GB of data.

The traffic at peak is 1000 requests/s, with each request asking for one DB row.

What cache size would I begin with to ease off the load on the DB? I realize that this is determined best empirically and can be tuned as time goes on.

Caches are usually not very large given memory constraints (unless you go for a distributed cache like Redis), so say the in-memory cache can’t take more than about 200 MB, which is far less than 1% of the DB size and seems too small. The cache might just spend all its time 100% full with a 95% miss rate, evicting entries and caching new ones under a simple LRU scheme.

Perhaps there’s no point bothering to cache anything in-memory here at all. In that case, how would you go about choosing an initial cache size for a Redis cache?

Caching or in-memory table in Azure for performance

I am building an Angular web application that retrieves part of its data from an Azure SQL Database table via APIs developed in Azure Functions (with Azure API Management as the API gateway). The data in the table (30k records) does not change for at least 24 hours. The web app needs to display this data in a grid (table structure) with pagination, and users can apply filter conditions to retrieve and show a subset of the data in the grid (again with pagination). They can also sort the data on a column in the grid. The web app will be accessed by a few hundred users on their iPads/tablets over 3G. Keeping latency in mind, I am considering one of these two options for optimum performance of the web app:

1) Cache all the records from the DB table in Azure Redis Cache, refreshing the cache every 24 hours, so that the application fetches the data to populate the grid from the cache and avoids the expensive SQL DB disk I/O. However, I am not sure how filtering on a field value or a range of values would work against the data in Redis. I have read about using the hash data type for storing multi-valued objects and sorted sets for storing sorted data, but I am particularly unsure about filtering on a range of numeric values (similar to a BETWEEN clause in SQL); a rough sketch of what I imagine is below, after option 2. Also, is it at all advisable to use Redis in this way for my use case?

2) Use In-Memory OLTP (a memory-optimized table for this particular DB table) in Azure SQL DB for faster data retrieval. This would let me handle the filtering and sorting requests from the web app with plain SQL queries. However, I am not sure whether it’s appropriate to use memory-optimized tables just to improve read performance (from what I have read, Microsoft suggests them for insert-heavy transactional workloads).
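
For reference, this is roughly what I imagine option 1 would look like with sorted sets: an untested sketch using the Jedis Java client (the key name, scores, and record ids are made up), where the numeric column I want to filter on becomes the sorted-set score so that a BETWEEN-style filter maps onto ZRANGEBYSCORE:

import redis.clients.jedis.Jedis;

public class RecordRangeExample {

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("my-redis-host", 6379)) {

            // Index each record id under the numeric column used for range
            // filters: the column value becomes the sorted-set score.
            jedis.zadd("records:by-amount", 150.0, "record:1");
            jedis.zadd("records:by-amount", 320.0, "record:2");
            jedis.zadd("records:by-amount", 980.0, "record:3");

            // "WHERE amount BETWEEN 100 AND 500" then maps onto ZRANGEBYSCORE.
            for (String recordId : jedis.zrangeByScore("records:by-amount", 100, 500)) {
                // The full record could live in a hash keyed by the record id.
                System.out.println(recordId + " -> " + jedis.hgetAll(recordId));
            }
        }
    }
}

Whether that is sensible for 30k records is exactly what I am unsure about, since it would mean maintaining one sorted set per filterable column.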

Any comments or suggestions on the two options above, or any other way to achieve this performance optimization?

Unit test – are in-memory databases a form of integration test?

An in-memory database can be useful for both unit tests and integration tests, but it depends on what you're trying to do.

Unit tests test a single component. Ideally, this component is tested in isolation from other components, but that is not essential; using other, already-tested components in a unit test is fine for practical reasons. Isolated tests usually run faster and point more directly at the cause of a failure. However, the setup can be difficult unless the software was designed for testability, for example by minimizing dependencies between components and connecting them via small, easily mocked interfaces.

Integration tests check the interactions between components. Components that are irrelevant to the test can still be mocked, but here too there is a trade-off between speed and fault localization on one side and convenience on the other. Some tests are so difficult to set up that they are not worth the effort!

An in-memory database is useful for both unit tests and integration tests when you don't want to mock out a full data access layer, or when you need a real database because of an ORM. In those cases, an in-memory database is easier to set up, faster, and gives you easy isolation between tests by creating a fresh database for each test.
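
As a rough illustration of that "fresh database per test" point, here is a sketch assuming plain JDBC against an in-memory H2 database with JUnit 5 (the table, class, and test names are invented):

import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import static org.junit.jupiter.api.Assertions.assertEquals;

class UserRepositoryTest {

    private Connection connection;

    @BeforeEach
    void setUp() throws Exception {
        // Each test gets its own in-memory database; it disappears when the
        // connection is closed, which gives isolation between tests for free.
        connection = DriverManager.getConnection("jdbc:h2:mem:test");
        try (Statement st = connection.createStatement()) {
            st.execute("CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100))");
        }
    }

    @AfterEach
    void tearDown() throws Exception {
        connection.close();
    }

    @Test
    void insertsAndReadsBackAUser() throws Exception {
        try (Statement st = connection.createStatement()) {
            st.execute("INSERT INTO users VALUES (1, 'alice')");
            try (ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM users")) {
                rs.next();
                assertEquals(1, rs.getInt(1));
            }
        }
    }
}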

However, this does not test the integration between your software and the actual database. The differences can be significant, due to SQL dialects and so on, so a test plan should also include tests against the real database software, e.g. in a staging environment.

In practice, it is not particularly important to draw a sharp line between unit tests and integration tests. It is more important that you have a good automated test suite, and possibly tools and instructions for reproducible manual tests. Personally, I mainly write integration tests because they make it easier to demonstrate the value of the software under test and because they give better confidence in how the software works as a whole. I've found BDD-style integration tests particularly helpful, although they still need to be complemented by fine-grained TDD-style unit tests.

Why does the PHP community always rely on file-based logging instead of combining it with in-memory logging?

This is a first thought I had about logging. Obviously something is missing from my picture, because I can't be the first person to think of this.

PHP runs on a per-request basis, which means each request has to open its own connections and so on. That is a given, but when you look at how logging is done, it is a little annoying: every time an error occurs, a file is opened, appended to, and closed; or, as far as I know, you can keep a file stream open, but it is still written to the file system every time something is logged.

So… why don't we just collect all the errors in memory and, when the request is complete (every framework under the sun has hooks that run right at the end of a request), write them to the file in one go?

Of course, this has the drawback that if the request or the system crashes critically for some reason, the buffered entries never get written. But if that is not a concern, why not log to memory first and then to the file?