domain driven design – When bounded contexts and “microservices” collide. A distributed systems dilemma in diagram form

You can’t please everyone. Some people want a lot of context and background on sites like this. Others do not. If you don’t want the background, skip the first three paragraphs.

I am a software architect with about 25 years' experience, starting with Amiga Basic, then C, then C++, VB6, Delphi, C#, SQL (Server), and more C#. Over the last 15 years my focus has been on back ends – databases, data models, and systems integration (not UI development, and certainly not modern web development with giant JavaScript libraries).

I currently work in a reasonably large “enterprise”. By “enterprise” I mean “not a software development company”. By “reasonably large” I mean our software ecosystem includes such things as an ERP (vendor code), a CRM system (vendor), an HR system (vendor), a few other vendor systems, a data warehouse, a BI stack, and a rapidly growing number of internally developed applications.

The number of internally developed applications is growing rapidly because the business wants to add functionality that is specific to us – providing market advantages, or simply letting us move faster than the large vendors behind our monolithic systems. I expect this story will be familiar to many, although probably less so to people working for pure software development companies, who don't have to deal with integrating with large vendor systems. If you fall into the latter category, please keep this in mind.

Enough general background.

I have been deeply studying just about everything to do with microservices, event-driven architectures, ESBs, message brokers, and other integration elements at a hectic rate over the last several months, as well as rereading Evans's “DDD”, Vernon's “Implementing DDD”, Hohpe and Woolf's “Enterprise Integration Patterns”, and other famous books. And I have noticed a problem.

There are several different “primary sources” or “patterns” of advice on this topic. They all make good points. And they all contradict each other somewhere. I believe I can make the similarities and differences obvious with some simple diagrams.

Of course, the big question is “what do you want to achieve?”. Well, let's settle on what everyone seems to agree on for distributed systems: given CAP, we are very interested in availability (A) and partition tolerance (P), not so much strong consistency (C). Eventual consistency is accepted, but we don't want one system to bring down all the rest, and we do want to partition the systems – for example into bounded contexts per Eric Evans' DDD.

So, I want you to picture what at first seems to be a fairly “ideal” architecture according to many high profile sources, by which I mean it hits all the right notes. We have an order entry (point of sale) system. It’s a bounded context. We’re not “trying too hard to be microservicey” and creating nanoservices, and we’re also not a distributed monolith. It’s effectively agnostic about the existence of any other system in the enterprise. It’s as decoupled as it can possibly be. It has no hard temporal, logical, or availability dependencies on any other system. It looks something like this:

Decoupled orders bounded context application

One day the business comes along and says “I want order entry (or quoting) functionality in the CRM system”.

Shit.

Orders plz

Now I think I can describe everything else purely with a set of images illustrating the various approaches I've seen advocated across countless books, blogs, articles, lectures, and videos, making the distinctions between them clear. I have never seen the options laid out quite this way, and I think doing so demonstrates that, as an industry, we don't seem to have any “logically sound” solution which meets all of our principles of software architecture – except maybe the last. I would like to hear people's opinions on what they see.

Personally, I think option 6 is the most – and perhaps only – sane choice (there is a sketch of what I mean after the list below). In a couple of places I mention that shared library/schema definitions are “probably not a real objection”. I say this because the business rules are the business rules. There's only one set of business rules for the bounded context of orders. If the business rules change, everyone using those rules has to change. This isn't a devops issue.

Option 1 - UI Integration

Option 2 - API Orchestration

Option 3 - Shared libraries and persistence

Option 4 - Shared libraries only

Option 5 - Shared data only

Option 6 - Shared immutable data only
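
To make option 6 concrete, here is a minimal sketch of what I mean (my own illustration with made-up names, not taken from any of the sources above): the orders context publishes immutable order facts, and consumers such as the CRM keep their own copies and never write back.

    // A minimal sketch of option 6 (illustrative names; assumes some feed/broker exists).
    // The orders context publishes immutable order facts; consumers such as the CRM keep
    // their own copy. Nothing is ever updated in place, so there is nothing to coordinate
    // and no temporal or availability coupling between the systems.
    using System;
    using System.Collections.Generic;

    // An immutable fact: once published, it is never updated or deleted.
    public sealed record OrderPlaced(Guid OrderId, string CustomerId, decimal Total, DateTime PlacedUtc);

    public interface IOrderFeed
    {
        void Publish(OrderPlaced fact);              // orders context: append-only
        void Subscribe(Action<OrderPlaced> onFact);  // CRM and others: read-only consumers
    }

    // A trivial in-memory stand-in for the feed, just to make the shape concrete.
    public sealed class InMemoryOrderFeed : IOrderFeed
    {
        private readonly List<OrderPlaced> log = new();            // append-only log
        private readonly List<Action<OrderPlaced>> readers = new();

        public void Publish(OrderPlaced fact)
        {
            log.Add(fact);                      // never updated, never deleted
            foreach (var r in readers) r(fact);
        }

        public void Subscribe(Action<OrderPlaced> onFact) => readers.Add(onFact);
    }

Because the facts are immutable, consumers can cache them indefinitely without coordination, which is exactly what removes the hard temporal and availability dependencies between the contexts.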

reference request – What are some advanced background topics I’ll need for distributed systems and networks research?

I am a new graduate student in Computer Science who would like to be able to read and understand modern distributed systems research papers. My current background is at the level of undergraduate and introductory graduate courses in:

  • Networks (TCP/IP stack and applications)
  • Distributed Systems (a graduate-level course covering time (logical/vector clocks), 2PC and 3PC, multicast and membership, leader election, consistency, consensus and quorums (Paxos), DHTs and overlays, and some modern applications such as ZooKeeper)
  • Undergraduate Algorithms, Discrete Mathematics and Theory of Computation (basic DFA/NFA and intro to Turing Machines with no rigorous mathematics)

However, I find this background insufficient for reading modern research in networks and distributed systems. In particular, I am not familiar with modern protocols such as QUIC, nor with the formal methods mentioned in the papers, which I believe include some form of model checking and the like. For many of the distributed systems topics I listed above, I also lack the background to verify and prove the correctness of the protocols, or even to follow the proofs that are given.

Any suggestions on a reading list that can prepare me to be in a position to understand modern research in this area would be very helpful.

Creating a top-hat distributed random number generator

[Image from the original post: the Fortran snippet described below.]

I have this Fortran code which generates a flat distribution: it produces a single random number centered on 0.
The function GRNDM (the Geant4 random number generator) produces equally distributed random numbers between the values of 0 and 1. RDUMMY is the name of the vector filled with the random number, and the argument “1” states the length of the vector: i.e. GRNDM here will produce a single random number between 0 and 1. The second line then produces random numbers in the interval (μ − σ/2, μ + σ/2).

I was wondering if there was a way of changing it to produce random numbers with a top hat distribution?
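
If I understand correctly, a uniform deviate that is shifted and scaled is already a top-hat (flat) distribution on the target interval, so if GRNDM is uniform on (0, 1), the second line as described should already give a top hat on (μ − σ/2, μ + σ/2). Here is a minimal sketch of that transform (in C# rather than Fortran, with GRNDM's output modelled by Random.NextDouble(); all names are illustrative):

    using System;

    class TopHat
    {
        static readonly Random Rng = new Random();

        // Uniform ("top hat") deviate on (mu - width/2, mu + width/2).
        static double Sample(double mu, double width)
        {
            double u = Rng.NextDouble();     // uniform on [0, 1), like GRNDM's output
            return mu + width * (u - 0.5);   // shift to mu, scale to the desired width
        }

        static void Main()
        {
            for (int i = 0; i < 5; i++)
                Console.WriteLine(Sample(0.0, 1.0)); // flat on (-0.5, 0.5)
        }
    }

To widen or re-center the hat, change the width and mu arguments; a histogram of many samples should be flat across the interval and zero outside it.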

availability groups – SQL Server 2019 Always On using Distributed Network Name

We’re running a couple of SQL Servers in Azure that are set up with an Always On availability group and Windows Failover Clustering. The servers run Windows Server 2019 and SQL Server 2019. When we set up the cluster, it was configured to use a Distributed Network Name instead of a static cluster IP address. Thanks to this, we shouldn’t need an internal load balancer, according to these notes: https://github.com/MicrosoftDocs/azure-docs/issues/34648.

I’m struggling to understand exactly how this works though. Based on what I read, it seems like our connection strings will point to the DNS name of the cluster (let’s call it AgCluster). If I look in DNS, there is an A record for AgCluster pointing to sql1 and another pointing to sql2. When I use AgCluster in my connection string it seems to always connect me to the primary server, even if I have ApplicationIntent=ReadOnly set. When I query @@SERVERNAME I always get the same server.

So with the Distributed Network Name setup, what should I use in my connection strings to make sure read/write queries go to the primary and read-only queries go to a secondary? Any guides on setting this up in general would be helpful. Thanks!
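
From what I've read (and I may be wrong), read-only routing only happens when you connect to an availability group listener that has a read-only routing list configured, not to the cluster's own DNN name. Here is a sketch of what I think the connection strings should look like, with a hypothetical listener name AgListener and database MyDb:

    // Minimal sketch (hypothetical names). Point clients at the AG listener's DNS name,
    // not the cluster name; MultiSubnetFailover makes the client try all A records in
    // parallel. ApplicationIntent=ReadOnly only routes to a secondary if read-only
    // routing (READ_ONLY_ROUTING_URL plus routing lists) is configured on the AG.
    using System;
    using Microsoft.Data.SqlClient;

    class Probe
    {
        const string ReadWrite =
            "Server=tcp:AgListener,1433;Database=MyDb;Integrated Security=True;" +
            "MultiSubnetFailover=True;TrustServerCertificate=True;";

        const string ReadOnly = ReadWrite + "ApplicationIntent=ReadOnly;";

        static void Main()
        {
            foreach (var cs in new[] { ReadWrite, ReadOnly })
            {
                using var conn = new SqlConnection(cs);
                conn.Open();
                using var cmd = new SqlCommand("SELECT @@SERVERNAME;", conn);
                Console.WriteLine(cmd.ExecuteScalar()); // second line should name a secondary
            }
        }
    }

If @@SERVERNAME still always returns the primary with ApplicationIntent=ReadOnly, the routing list on the AG would be the first thing to verify.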

azure sql database – How can I log which queries are in a distributed transaction on an MS SQL server?

I am looking at migrating a database from a self-hosted cluster to Microsoft Azure SQL. I am aware that there are a few distributed transactions involved, which aren’t supported on Azure SQL.

Is there a way that I can log all distributed transactions and their queries, so that I can inspect the client application and remove the requirement for distributed transactions?
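
For illustration, here is the kind of polling logger I have in mind (a sketch: the connection string is a placeholder, and I believe sys.dm_tran_session_transactions.is_local = 0 marks distributed transactions). Polling will only catch statements that are in flight while a distributed transaction is open:

    using System;
    using System.Threading;
    using Microsoft.Data.SqlClient;

    class DtcLogger
    {
        // is_local = 0 flags a distributed (MSDTC-enlisted) transaction.
        const string Query = @"
            SELECT st.session_id, at.name, t.text
            FROM sys.dm_tran_session_transactions AS st
            JOIN sys.dm_tran_active_transactions AS at
                ON at.transaction_id = st.transaction_id
            LEFT JOIN sys.dm_exec_requests AS r
                ON r.session_id = st.session_id
            OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
            WHERE st.is_local = 0;";

        static void Main()
        {
            using var conn = new SqlConnection(
                "Server=MyServer;Database=master;Integrated Security=True;TrustServerCertificate=True;");
            conn.Open();
            while (true)  // poll once a second, log any query inside a distributed transaction
            {
                using (var cmd = new SqlCommand(Query, conn))
                using (var rdr = cmd.ExecuteReader())
                    while (rdr.Read())
                        Console.WriteLine($"{DateTime.UtcNow:o} spid={rdr["session_id"]} sql={rdr["text"]}");
                Thread.Sleep(1000);
            }
        }
    }

Tracing on the client side would be the complementary option if the transactions originate from application code I control.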

vmware workstation – Migrating from Virtual Switch to Distributed fails randomly

I’m migrating from Virtual Switch to Distributed Switch in vCenter.

Windows Server (DNS) – 192.168.10.2

vCenter – 192.168.10.5

I have 2 clusters with the following ESXi hosts.

Compute Cluster

  • compute1.v.lab – vmk0 (192.168.30.10), vmk1 (192.168.30.11)

  • compute2.v.lab – vmk0 (192.168.30.20), vmk1 (192.168.30.21)

Infrastructure Cluster

  • infrastructure1.v.lab – vmk0 (192.168.20.10), vmk1 (192.168.20.11)

  • infrastructure2.v.lab – vmk0 (192.168.20.20), vmk1 (192.168.20.21)

  • infrastructure3.v.lab – vmk0 (192.168.20.30), vmk1 (192.168.20.31)

I’m following this guide for migrating via the GUI: https://www.youtube.com/watch?v=eDJ3OfXTkLs. However, for some reason some ESXi hosts migrated successfully while others failed. Both sets of ESXi hosts have a similar configuration; in my case the compute cluster hosts migrated successfully, while the infrastructure cluster hosts failed.

I later tried adding 2 vmnics to the switch and then migrating vmnic0 and vmk0 at the same time. Surprisingly, that migrated successfully, but only for some hosts (it seems very random), because when I tried the same approach on another host it failed.

My question is whether there is a standard way to migrate hosts from a vSwitch to a DVSwitch, because I could not find any. I have been going through article after article on this for 3 straight weeks without success, and I'm starting to feel exhausted.

Both vmk0 and vmk1 can ping vCenter, and vCenter can ping both interfaces of the ESXi host as well.

[root@infrastructure1:~] vmkping -I vmk0 192.168.10.5
PING 192.168.10.5 (192.168.10.5): 56 data bytes    
64 bytes from 192.168.10.5: icmp_seq=0 ttl=63 time=0.992 ms    
64 bytes from 192.168.10.5: icmp_seq=1 ttl=63 time=0.724 ms    
64 bytes from 192.168.10.5: icmp_seq=2 ttl=63 time=0.720 ms

--- 192.168.10.5 ping statistics ---    
3 packets transmitted, 3 packets received, 0% packet loss    
round-trip min/avg/max = 0.720/0.812/0.992 ms   

[root@infrastructure1:~] vmkping -I vmk1 192.168.10.5
PING 192.168.10.5 (192.168.10.5): 56 data bytes   
64 bytes from 192.168.10.5: icmp_seq=0 ttl=63 time=0.731 ms    
64 bytes from 192.168.10.5: icmp_seq=1 ttl=63 time=0.895 ms    
64 bytes from 192.168.10.5: icmp_seq=2 ttl=63 time=1.497 ms     

--- 192.168.10.5 ping statistics ---    
3 packets transmitted, 3 packets received, 0% packet loss    
round-trip min/avg/max = 0.731/1.041/1.497 ms     

PS C:\Users\Administrator> ssh root@192.168.10.5
Command> ping 192.168.20.10    
PING 192.168.20.10 (192.168.20.10) 56(84) bytes of data.    
64 bytes from 192.168.20.10: icmp_seq=1 ttl=63 time=1.06 ms    
64 bytes from 192.168.20.10: icmp_seq=2 ttl=63 time=1.19 ms    
64 bytes from 192.168.20.10: icmp_seq=3 ttl=63 time=0.686 ms    
64 bytes from 192.168.20.10: icmp_seq=4 ttl=63 time=0.833 ms    
^C    
--- 192.168.20.10 ping statistics ---    
4 packets transmitted, 4 received, 0% packet loss, time 9ms    
rtt min/avg/max/mdev = 0.686/0.942/1.194/0.196 ms     

Command> ping 192.168.20.11    
PING 192.168.20.11 (192.168.20.11) 56(84) bytes of data.    
64 bytes from 192.168.20.11: icmp_seq=1 ttl=63 time=0.798 ms    
64 bytes from 192.168.20.11: icmp_seq=2 ttl=63 time=1.11 ms    
64 bytes from 192.168.20.11: icmp_seq=3 ttl=63 time=1.13 ms    
64 bytes from 192.168.20.11: icmp_seq=4 ttl=63 time=0.689 ms    
^C   
--- 192.168.20.11 ping statistics ---    
4 packets transmitted, 4 received, 0% packet loss, time 33ms    
rtt min/avg/max/mdev = 0.689/0.931/1.129/0.195 ms

An error occurred while communicating with the remote host

vmk adapter on infrastructure1 host

vmk adapter on infrastructure2 host

vmk adapter on infrastructure3 host

Ping response from interface (ping & vmkping)

relational database – Alternatives to distributed transactions in .NET?

Lately, I’ve been working on a project which basically is a huge rewrite in .NET Core F# + Event Sourcing + PostgreSQL of an old sub-ledger legacy app written in .NET 4.6 C# + SQL Server.

Since the whole rework cannot happen overnight and the legacy process needs to run until every single piece is tested and replaced, we opted for distributed transactions via the TransactionScope class. It usually works, but the tradeoff is that you need to clean up orphaned transactions if there is a crash (and that can basically happen whenever you’re updating a service). The chances are not high, but it can still happen, and already has.

Long story short, we need to keep a certain consistency between what is written in the legacy system (i.e. SQL Server) and what is written in the new system (i.e. PostgreSQL) until everything is done. It’s a critical system, so we can’t really mess it up.

So I’m wondering: is there really an alternative when it comes to writing some data into both databases (albeit in different formats)?

We want the guarantee that the transaction has worked out (or not) for both DBs (I put the emphasis on both, because it should be all or nothing). What we absolutely want to avoid is a piece of data written into one system and not the other.

I’ve heard about the saga pattern, but I’m not sure how it can be applied in this context, given that we can’t change the legacy system much.
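
The closest thing I've found so far is the transactional outbox pattern. A minimal sketch of how it might look in our case (table and column names are made up): the legacy write and an outbox insert share one local SQL Server transaction, so no MSDTC is needed, and a separate relay then drains the outbox into PostgreSQL. You trade atomic dual writes for at-least-once delivery and eventual consistency between the two stores:

    using Microsoft.Data.SqlClient;

    class OutboxWriter
    {
        // Hypothetical schema: dbo.Outbox (Id uniqueidentifier, Payload nvarchar(max), CreatedUtc datetime2).
        public static void WriteWithOutbox(string connStr, string legacySql, string eventJson)
        {
            using var conn = new SqlConnection(connStr);
            conn.Open();
            using var tx = conn.BeginTransaction();

            using (var legacy = new SqlCommand(legacySql, conn, tx))
                legacy.ExecuteNonQuery();                    // the existing legacy write

            using (var outbox = new SqlCommand(
                "INSERT INTO dbo.Outbox (Id, Payload, CreatedUtc) " +
                "VALUES (NEWID(), @payload, SYSUTCDATETIME());", conn, tx))
            {
                outbox.Parameters.AddWithValue("@payload", eventJson);
                outbox.ExecuteNonQuery();                    // same local transaction: no MSDTC
            }

            tx.Commit();                                     // both rows commit, or neither does
        }
    }

The relay would have to be idempotent (e.g. upsert by Id on the PostgreSQL side), because after a crash it may redeliver rows it had already copied.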

Algorithms – Count nodes with more than 3 neighbors in distributed computing that are initiated from all locations

I have an algorithm that visits all nodes of a graph in a distributed computation. The initiator is just one site. I want to change the algorithm so that, at the end of the execution, the initiator knows which nodes have more than 3 neighbors.
So the initiator is a site, and the expected result is the list of nodes with more than 3 neighbors.
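
Here is a sketch of what I'm trying to achieve (my original algorithm isn't shown; this assumes a wave/echo-style traversal, simulated sequentially in C# with made-up names). Each node adds itself to a report if it has more than 3 neighbors, merges the reports echoed back by its children, and passes the merged list toward the initiator:

    using System;
    using System.Collections.Generic;

    class EchoDegreeCount
    {
        // g[v] = v's neighbors in the communication graph (each site knows its own list).
        // The recursion models messages sent to children; the returned list models the
        // echo (convergecast) back to the parent.
        static List<int> Echo(int node, Dictionary<int, List<int>> g, HashSet<int> visited)
        {
            visited.Add(node);
            var report = new List<int>();
            if (g[node].Count > 3)
                report.Add(node);                         // local test at each site
            foreach (var n in g[node])
                if (!visited.Contains(n))
                    report.AddRange(Echo(n, g, visited)); // merge the children's echoes
            return report;                                // echoed to the parent
        }

        static void Main()
        {
            var g = new Dictionary<int, List<int>>
            {
                [0] = new() { 1, 2, 3, 4 },               // node 0 has 4 neighbors
                [1] = new() { 0 },
                [2] = new() { 0 },
                [3] = new() { 0 },
                [4] = new() { 0 },
            };
            var result = Echo(0, g, new HashSet<int>());  // initiator = site 0
            Console.WriteLine(string.Join(", ", result)); // prints: 0
        }
    }

In a real distributed run, the visited set is replaced by each node remembering its parent in the wave, and the merging happens as the echo messages arrive.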

ios – Continuous delivery to public libraries distributed through package managers

We created an iOS/macOS library that is used by several iOS and Mac apps from a very large company.

The library is distributed through CocoaPods and Carthage, the package managers for iOS and macOS libraries.

We have set up pipelines that build on every commit. The test suite of unit tests, UI tests and integration tests is executed on every PR created.

However, we are not sure how to do continuous deployment. We cannot publish on every PR merge, as this would mean:

  1. Too many versions of the library on CocoaPods.
  2. If code changes are required for an upgrade, they would be spread across many different versions.
  3. Our library is not important enough for the engineers on the app side to update it regularly.

If you have encountered similar problems, please share which standard practices you used to solve them.