java – Should all third party methods that access outside resources (like other databases) be wrapped up?

From the perspective of unit testing, the code under test should obviously not be accessing outside resources, so the third-party methods need to be mocked. However, it seems like this is poor practice, because third-party methods can change and become static/final, which makes mocking difficult in Mockito. So in that sense, is it best practice to always wrap up third-party methods?
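For illustration, the usual form of "wrapping up" is a thin adapter that you own: the adapter is the only code that calls the third-party API, and unit tests mock the adapter's interface rather than the vendor class. A minimal sketch of that pattern (the `ThirdPartyDbClient` class and all other names are hypothetical stand-ins, not a real library):

```java
import org.mockito.Mockito;

// Stand-in for a vendor class we do not control (hypothetical).
class ThirdPartyDbClient {
    String fetchUserName(String id) { /* talks to an external database */ return "real-user"; }
}

// Interface we own; this is what unit tests mock.
interface UserStore {
    String findUserName(String id);
}

// Thin adapter: the only place that calls the third-party API directly.
class ThirdPartyUserStore implements UserStore {
    private final ThirdPartyDbClient client;
    ThirdPartyUserStore(ThirdPartyDbClient client) { this.client = client; }
    @Override public String findUserName(String id) { return client.fetchUserName(id); }
}

class Demo {
    public static void main(String[] args) {
        // In a unit test, the wrapper interface is trivial to mock,
        // even if the vendor class later becomes final or static.
        UserStore store = Mockito.mock(UserStore.class);
        Mockito.when(store.findUserName("42")).thenReturn("Alice");
        System.out.println(store.findUserName("42")); // Alice
    }
}
```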

I suppose this question may also apply to other programming languages and testing frameworks.

Databases and B-Trees: What are Keys and how are they related

I am confused about the description and definition of "key" occurring as terminology for databases and B-trees.

In the first case, dealing with the theory of databases, a key is defined as a choice of a minimal subset $K \subseteq A := \{A_1, A_2, \ldots, A_n\}$ of a fixed set of attributes $A_i$ parameterizing a data table (consisting of relations; a single relation is the abstraction of a row in the table), where each attribute is literally a column specifying a property of the objects (= relations).

A key is characterized by the property that no two different relations (a relation is a row) have exactly the same values for all attributes belonging to the key. Moreover, a key is a minimal subset with this property, i.e. there does not exist a proper, smaller subset of attributes contained in the key that also has the property described in the last sentence. Clearly keys are not unique, so the set of keys is a certain subset of the power set of the set of attributes $\{A_1, \ldots, A_n\}$. See also here: https://en.wikipedia.org/wiki/Unique_key
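Stated compactly (with $r$ denoting the set of rows of the table and $t[K]$ the restriction of a row $t$ to the attributes in $K$; these symbols are mine, not from the question): $K \subseteq A$ is a key for $r$ iff $\forall t_1, t_2 \in r:\; t_1 \neq t_2 \Rightarrow t_1[K] \neq t_2[K]$, and no proper subset $K' \subsetneq K$ has this uniqueness property.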

On the other hand, the key concept occurs as well for B-trees: https://en.wikipedia.org/wiki/B-tree#Definition

Here keys are a priori numbers or integers, and different nodes of the B-tree contain different totally ordered subsets of keys, where the total order on the space of keys is inherited from the order "$\ge$" on the integers $\mathbb{Z}$. In particular, the set of keys is a totally ordered subset of the integers.

Question: How are the two concepts of 'key' related to each other? My first idea was that, if we consider keys in the sense of the first definition (as elements of the power set of attributes), we can simply enumerate all the keys arbitrarily (that is, associate to each key a number; formally, specify an injection $f: \mathcal{K} \to \mathbb{Z}, K \mapsto f(K)$, where $\mathcal{K} \subseteq \Omega(A)$ is the set of keys inside the power set of $A$) and then treat them as numbers when working with B-trees. Is this exactly the correct connection, or is there another, deeper one and my approach is wrong?
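For what it's worth, here is a small Java sketch of the point that an ordered index only needs a total order on key values, not literally integer keys: a key value built from several attributes can be compared lexicographically. `java.util.TreeMap` merely stands in for the ordered, B-tree-like structure, and the attribute names are invented:

```java
import java.util.Comparator;
import java.util.TreeMap;

public class CompositeKeyDemo {
    // A key value assembled from two attributes of a row (names are made up).
    record PersonKey(String lastName, String firstName) {}

    public static void main(String[] args) {
        // Lexicographic order on the attribute values gives the total order
        // that an ordered index needs; no mapping to integers is required,
        // only a consistent comparison function.
        Comparator<PersonKey> order = Comparator
                .comparing(PersonKey::lastName)
                .thenComparing(PersonKey::firstName);

        TreeMap<PersonKey, String> index = new TreeMap<>(order);
        index.put(new PersonKey("Smith", "Anna"), "row #17");
        index.put(new PersonKey("Jones", "Bob"), "row #3");

        System.out.println(index.firstKey()); // PersonKey[lastName=Jones, firstName=Bob]
    }
}
```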

postgresql – how to maximize usage of number of databases on a postgres server while limited by connections

We’re building a system where ideally we’d give a database per client in a multi-tenant setup.

However, although a single Postgres install supports 4,294,950,911 databases on a server, the suggested max_connections is around 100 or so.

That means if we used every active connection up to some recommended amount (using 100 as an example), we could only access about 0.000002% of the available databases at any one time.

I would never need to use 4 billion databases, but it would certainly be great to use more than 100.

Looking up connection info in pg_stat_activity, it seems that each connection is tied to a database name, so I'm guessing you cannot create a connection pool on the same server that spans different database names. I was hoping there would be a solution that allowed a connection pool across databases on the same server, but I'm finding that's not the case – please enlighten me if there is something in this realm of solutions.

So I believe my options, at each end of the spectrum, are:

  1. shard tenants across schemas, and use connection pooling on the same database

  2. multiplex and/or context-switch connections from the API server, using an LRU cache of pg pools that manage connections, and closing the pools that the LRU cache evicts (a rough sketch of this follows below).
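A rough sketch of option #2 in Java, using HikariCP (the per-tenant pool size, the capacity of 50 cached pools, and the host name are assumptions): an LRU cache of small per-database pools that closes whichever pool it evicts, so the total number of server connections stays bounded.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

public class TenantPools {
    private static final int MAX_POOLS = 50;          // assumption: tune to fit max_connections
    private final LinkedHashMap<String, HikariDataSource> pools =
            new LinkedHashMap<>(16, 0.75f, true) {    // access-order = LRU
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, HikariDataSource> eldest) {
                    if (size() > MAX_POOLS) {
                        eldest.getValue().close();    // release that tenant's connections
                        return true;
                    }
                    return false;
                }
            };

    public synchronized Connection connectionFor(String dbName) throws SQLException {
        HikariDataSource ds = pools.get(dbName);      // get() marks the pool as recently used
        if (ds == null) {
            HikariConfig cfg = new HikariConfig();
            cfg.setJdbcUrl("jdbc:postgresql://db-host:5432/" + dbName); // hypothetical host
            cfg.setMaximumPoolSize(2);                // a couple of connections per tenant
            ds = new HikariDataSource(cfg);
            pools.put(dbName, ds);                    // put() may evict (and close) the eldest pool
        }
        return ds.getConnection();
    }
}
```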

Question

Assuming I wanted to host 10,000 databases per database server, I'd look to get a solution like #2 working to access different databases. How expensive is it really to open/close a connection if that happens quite often?

And if it’s not feasible, is the number of databases allowed in Postgres just a theoretical computer-science number, of which we’ll only ever use a fraction of a fraction of a percent? And in reality, would we at best use a number of databases that correlates closely with the number of connections?

I’m also guessing there may be a hybrid solution that allows us to use many databases but minimizes the connections per server, with a fan-out using logical replication. But that sounds like it could get more expensive.

databases – Storage of Large Amounts of Game Data (Player stats, matches)

I’m curious as to what the best way would be to store my data. Currently, I’ve set up a matchmaking system and player stats. When a match ends, my server submits the player data to my Ruby on Rails API, saving the data to my PostgreSQL database and making at least ~10 queries at the end of a match. However, some people mentioned using MongoDB instead. Why would I want to use Mongo over Postgres? Postgres works fine, and it’s what I’ve worked with at my current job to store millions of records. Does anyone have any opinions on this, or on what would statistically be the best option?
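As a side note on the write path described above, the ~10 end-of-match statements can typically be collapsed into a single batched insert in Postgres. A rough JDBC sketch (the table and column names are invented, and the real application would go through Rails/ActiveRecord rather than raw JDBC):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch: write all per-player rows for a finished match in one batched
// statement and one transaction instead of ~10 separate queries.
public class MatchWriter {
    public static void saveMatch(long matchId, long[] playerIds, int[] scores) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/game", "game", "secret")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO match_player_stats (match_id, player_id, score) VALUES (?, ?, ?)")) {
                for (int i = 0; i < playerIds.length; i++) {
                    ps.setLong(1, matchId);
                    ps.setLong(2, playerIds[i]);
                    ps.setInt(3, scores[i]);
                    ps.addBatch();                 // queue the row
                }
                ps.executeBatch();                 // single round trip for the whole batch
            }
            conn.commit();                          // one transaction per match
        }
    }
}
```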

attack prevention – Two-person rule on MySQL databases for “manual fixes”

In order to “harden” our compliance, we wanted to enforce a two-person rule on the MySQL production database for “manual fixes”. Such “manual fixes” frequently arise due to:

  • Bugs in the application (we are a fast-moving company :D)
  • Various customer requests that do not have an application feature implemented yet, such as GDPR update requests, special discounts, etc.

We wanted a process that does not require the two persons to be physically side by side. One person is on-call, rather junior, and is responsible for translating customer service requests into SQL. They might need a GUI (such as MySQL Workbench) to navigate the complex data model and figure out the exact SQL script to produce. The SQL script should feature SELECTs showing the data before and after the change in a non-committed transaction (e.g., AUTOCOMMIT OFF and no COMMIT at the end).
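As a sketch of what the non-committed dry run could look like when driven from code rather than typed into Workbench (plain JDBC; the table, the example fix, and rolling back after capturing the before/after output are my assumptions about the workflow):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Dry run of a manual fix: show the data before and after the change inside
// a transaction that is rolled back, so the reviewer can inspect the effect
// before anyone commits. Table/column names and the fix are made up.
public class ManualFixDryRun {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://prod-db:3306/app", "oncall", "secret")) {
            conn.setAutoCommit(false);                       // AUTOCOMMIT OFF
            try (Statement st = conn.createStatement()) {
                print(st.executeQuery("SELECT id, email FROM customers WHERE id = 42")); // before
                int changed = st.executeUpdate(
                        "UPDATE customers SET email = 'new@example.com' WHERE id = 42");
                System.out.println(changed + " row(s) would change");
                print(st.executeQuery("SELECT id, email FROM customers WHERE id = 42")); // after
            } finally {
                conn.rollback();                             // no COMMIT: reviewer approves first
            }
        }
    }

    private static void print(ResultSet rs) throws Exception {
        while (rs.next()) {
            System.out.println(rs.getLong("id") + " | " + rs.getString("email"));
        }
    }
}
```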

The second person is not on-call, rather senior, and fairly familiar with the application’s data model. They should be able to look at the SQL script and the output of the non-committed transaction, and approve or reject it via a mobile app during the evening.

We cannot be the first to have this or a similar requirement.

Does anyone know good documentation or tooling to implement such a process?


Backup and upgrade MongoDB databases

I have a web application using MongoDB (version 2.6.12), hosted on an Ubuntu 16.04 server at DigitalOcean.

I like to use Robo 3T to connect to the remote database and do simple queries.

Sometimes I back up the database with mongodump, and I mostly rely on the weekly automatic server snapshot provided by DigitalOcean.

Now I need to run queries containing operators like $lookup, and I’m told that MongoDB version 2.6.12 does not support that. So I need to seriously back up my database and move to a more recent MongoDB. I still want to keep using Robo 3T to run queries against the production database (by preference) or a backup database (if it is updated very often, e.g., every day).
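For context, $lookup is an aggregation stage introduced in MongoDB 3.2, which is why 2.6.12 rejects it. With the MongoDB Java driver a $lookup looks roughly like this (a sketch; the collection and field names are made up):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Aggregates;
import org.bson.Document;

import java.util.List;

// Minimal $lookup example (requires MongoDB >= 3.2): join each order with
// the matching customer documents.
public class LookupDemo {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("app");
            for (Document doc : db.getCollection("orders").aggregate(List.of(
                    Aggregates.lookup("customers", "customerId", "_id", "customer")))) {
                System.out.println(doc.toJson());
            }
        }
    }
}
```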

I have several questions:

1) I would prefer to have a more regular automatic backup (e.g., every day) of the database. Which approach is recommended? (Additionally, it seems that Atlas does NOT support Cloud Provider Snapshots for clusters served on DigitalOcean?)

2) If I buy a new server hosted on Azure and install MongoDB 4.2, and copy the whole database by mongodump to the new server, will it work?

2016 – Search Topology – too many databases and one corrupt

Our organization was having an issue with our Search Service App in our SharePoint 2016 on-prem environment. Once we got it working properly, we noticed that there were 3 sets of topology component databases rather than our expected two. Then one of our CrawlDBs became suspect. Any recommendations on how to move forward? It would make sense to me to transfer the current topology component databases to the orphan ones and then remove the corrupt one, but I don’t know if this is best practice. Much appreciated.