architecture – Containers or Serverless? What Affects Your Architectural Decision?

I am curious how folks in the community reason about “containers” vs. “serverless” and what the different takes are.

I recently came across the Dapr project: https://dapr.io/ which I think is a conceptually very interesting approach to system architecture that allows building a portable application that can be plugged into AWS, Azure, GCP, etc.

There is a reference eShop project on github: https://github.com/dotnet-architecture/eShopOnDapr

I have worked with Azure Functions, AWS Lambda, and AWS AppSync but not very heavily with containerized workloads.

Looking at the eShop reference project, I cannot get over how complex the application is compared to what it would take to build the same functionality, with the same general application-level characteristics (decoupled, event-driven, service-oriented), using Lambda, AppSync, or Functions. This got me wondering where teams find this level of complexity worthwhile.

As I understand it, there are a handful of benefits gained from the containerized approach:

  1. Long-running application logic. Most serverless platforms have short timeout quotas, so they are not suited to long-running processes.
  2. Memory- or CPU-intensive workloads. Serverless gives you less control over resources, so if a workload needs finer control over memory or CPU, containers are a better alternative.
  3. Portability. Containerized applications are more portable between platforms. On-premises, AWS, Azure, or Google: you can move your containerized workloads easily between them. It is possible with serverless as well, IMO, by moving most of your business logic into application libraries (e.g. a layer in Lambda) and using the serverless interface primarily as a bridge to I/O (see the sketch after this list).
  4. (3a) Flexibility. Extending portability, a solution like Dapr would allow the underlying platform components to be replaced. Developing natively for Azure Functions requires using Functions-specific bindings to event sources such as Service Bus or Event Grid, whereas Dapr would allow the application to plug in AWS SNS as an event source on AWS and Service Bus on Azure. In some mission-critical scenarios, this may even be THE deciding factor, as it allows a multi-cloud deployment (though you then have the different challenge of synchronizing data between the clouds) that could theoretically stay up even if one of the major providers goes down.
  5. Legacy apps. Some legacy apps may be easier to “lift-and-shift” into containers, whereas moving to Functions or Lambda is a rewrite; for example, applications written as daemons. Some legacy apps would not fit the serverless paradigm at all and would require mapping to another platform-specific service.
  6. Persistent workloads. Serverless benefits bursty workloads, since there is no cost when the system is idle. For example, enterprise line-of-business systems tend to have predictable spikes at the start of the workday and in the early afternoon; at night, when there are no users, there is no compute cost. On the other hand, systems that see continuous throughput may be more cost-effective with containers.
  7. Ops can be simplified in some cases. By deploying the application components and their dependencies (e.g. Redis) as containers, they can be easier to manage than a complex IaC configuration built with ARM templates, CloudFormation templates, etc.
  8. Programming language. Functions platforms have limited programming-language support, so if you are using a less commonly supported language, you would benefit from a container runtime.
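
To make point 3 concrete, here is a minimal Java sketch of what I mean by keeping the business logic in a plain library and using the serverless entry point only as an I/O bridge (hypothetical names; assumes the aws-lambda-java-core dependency):

import java.util.Map;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

// Platform-agnostic business logic: no cloud SDK types, reusable anywhere.
class OrderPricingService {
    double priceOrder(int quantity, double unitPrice) {
        double subtotal = quantity * unitPrice;
        return quantity >= 100 ? subtotal * 0.9 : subtotal; // hypothetical bulk discount
    }
}

// Thin Lambda adapter: only translates the platform's I/O into a library call.
public class PriceOrderHandler implements RequestHandler<Map<String, Object>, Double> {
    private final OrderPricingService service = new OrderPricingService();

    @Override
    public Double handleRequest(Map<String, Object> input, Context context) {
        int quantity = ((Number) input.get("quantity")).intValue();
        double unitPrice = ((Number) input.get("unitPrice")).doubleValue();
        return service.priceOrder(quantity, unitPrice);
    }
}

The same OrderPricingService could then be wrapped by an Azure Functions entry point, or hosted in a container, without touching the business logic.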

I would love to hear from folks who are working with containerized workloads and who have worked with serverless to get some different angles and other perspectives I’m missing.

It seems to me that, for most web application APIs, the application code and deployment model are considerably more concise and easier to reason about with serverless than with containerized workloads, if one is willing to accept the tradeoffs (e.g. less platform portability).

java – Architectural design for sending large amount of analytics data from production servers to s3 without impacting request performance

Let's say we have a server getting up to 1000 requests per second and serving them at a p99 of 20ms (there is a strong business case for not increasing this latency). The server's GC parameters have been carefully tuned for this performance, and the current latency is already bottlenecked by GC. We want to log structured data related to requests and responses, ideally 100% of it without dropping anything, to S3 in, for example, gzipped JSON Lines format (analytics will be done on this data, and each file should ideally be 100MB-500MB in size). Analytics does not have to be real-time; a few hours of delay, for example, is fine. Also, disk I/O utilization already approaches 100%, so writing this data to disk at any point is likely not an option. All code is in Java.

Solution 1:
Use the threads receiving and serving requests as producers and have them enqueue each request/response into blocking buffer(s), with error/edge-case handling for the buffer being full, exceptions, etc. This way the producer threads don't get blocked no matter what. Then have a consumer thread pool consume from these buffer(s) in a batched way, compress the data, and send it to S3. The upside is that it is a simple(ish) solution. The main downside is that all of this is done in the same JVM, which might increase the allocation rate and degrade performance for the main requests. I suspect the main source of new object creation might be the serialization to string (is this true?). Putting objects into a fixed-size queue, or draining them (using the drainTo method on BlockingQueue) into an existing collection, should not allocate anything new, I think.
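
A rough sketch of what I have in mind for Solution 1 (made-up names, the actual S3 upload omitted, and assuming the producer side hands over already-serialized JSON lines):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.zip.GZIPOutputStream;

public class RequestLogPipeline {

    // Bounded queue so memory stays fixed; producers never block on it.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(100_000);

    // Called from the request-serving threads. offer() never blocks:
    // if the queue is full, the record is dropped (and counted) instead of stalling the request.
    public void record(String jsonLine) {
        if (!queue.offer(jsonLine)) {
            // increment a "dropped records" metric here
        }
    }

    // Runs on a dedicated consumer thread (or a small pool).
    public void drainLoop() throws IOException, InterruptedException {
        List<String> batch = new ArrayList<>(10_000);
        while (!Thread.currentThread().isInterrupted()) {
            batch.add(queue.take());        // block until at least one record is available
            queue.drainTo(batch, 9_999);    // then grab whatever else is already there
            uploadGzipped(batch);
            batch.clear();
        }
    }

    private void uploadGzipped(List<String> lines) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            for (String line : lines) {
                gz.write(line.getBytes(StandardCharsets.UTF_8));
                gz.write('\n');
            }
        }
        byte[] payload = buf.toByteArray();
        // Hand "payload" to the S3 upload path here (AWS SDK call omitted); in practice
        // batches would be appended to one object until it reaches the target 100MB-500MB.
    }
}

Using offer() on a bounded queue means a full queue costs the producer a failed enqueue (and a dropped record to count) rather than a blocked request thread.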

Solution 2:
Set up a separate service running on the same host (so a separate JVM with its own tuned GC if necessary) that exposes endpoints like localhost:8080/request, for example. Producers send data to these endpoints, and all consumer logic lives in this service (mostly the same as before). The downside is that this might be more complex. Also, sending data, even to localhost, might block the producer thread (whose main job is to serve requests) and decrease throughput per host.
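
For Solution 2, the producer side could at least avoid blocking on the HTTP call by using the JDK's built-in async HTTP client, roughly like this (hypothetical endpoint; serialization and a localhost syscall still happen on or near the request path, so it is not free):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class LocalCollectorClient {

    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofMillis(50))
            .build();

    // Fire-and-forget: sendAsync() returns immediately, so the request-serving
    // thread is not blocked waiting on the local collector service.
    public void send(String jsonLine) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/request"))
                .timeout(Duration.ofMillis(100))
                .POST(HttpRequest.BodyPublishers.ofString(jsonLine))
                .build();
        client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
              .exceptionally(ex -> null); // swallow failures; logging must never hurt requests
    }
}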

For Solution 1 or 2, are there any Java-compatible libraries (producer/consumer libraries or high-performance TCP-based messaging libraries) that might be appropriate to use instead of rolling my own?

I know these questions can be answered by benchmarking and building a PoC, but I am looking for some direction in case someone has suggestions or maybe a third way I haven't thought of.

architectural patterns – Why not just use stream processing for everything?

Apache Kafka has become a standard for event-driven architectures, specifically stream processing. This is usually contrasted with batch processing, whether that's traditional ETL or something like ML training.

Many architectures show hybrid implementations that support both batch and streaming. But wouldn't it make sense to just adopt a technology like Kafka for all data ingestion needs? Everything could be streamed in, but that doesn't mean everything has to be processed in real time. Kafka can hold onto data for as long as needed and is really more of a distributed database than just a message queue.

Why not simply use Kafka as a “central nervous system” to the entire architecture, with all data sources publishing to Kafka, and all consuming applications subscribing to Kafka topics? Any batch processing can just be a separate service that grabs data from Kafka when needed.
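
For example, a batch job could be nothing more than a plain consumer that runs on a schedule, reads whatever has accumulated on a topic since its last committed offset, and exits. A rough Java sketch using the standard kafka-clients consumer (topic, group id, and processing are made up):

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class NightlyBatchJob {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "nightly-batch-job");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                if (records.isEmpty()) {
                    break; // caught up (roughly); stop until the next scheduled run
                }
                for (ConsumerRecord<String, String> record : records) {
                    process(record.value()); // batch-style processing of each event
                }
                consumer.commitSync(); // remember how far this job has read
            }
        }
    }

    private static void process(String value) {
        // whatever the batch job does with each record
    }
}

The consumer group's committed offsets effectively replace the high-water mark a batch ETL job would otherwise have to track itself.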

Does anyone do this, or is streaming always added on as a second part of a hybrid architecture?

design patterns – Is there a website to search for architectural approaches to solving common problems?

Suppose I want to build a graphics editor app.

Is there a place I can go (google isn't helping) to find out what different people have done, how they've been able to extend their designs, etc.?

Sort of like a search engine where you search for some feature/system capability and the results are solutions and different approaches one can take?

And maybe those are broken down by technology used:

  • web, desktop, mobile, etc.

architectural patterns – Is there alternative to applying events synchronously in command handler in CQRS?

I have a workflow with complex command handlers encapsulated inside an aggregate. These handlers emit some events, and further logic based on the result of those events can emit more events. Sample pseudocode of the architecture:

public class ExampleAggregate : Aggregate
{
    public void Execute(CommandA c)
    {
        ChangeState(c.params); // applies events

        // this relies on synchronous application of events inside ChangeState,
        // real code checks more state, in multiple entities inside aggregate
        if (this.state == Y)
        {
            ChangeOtherState(c.otherParams); // applies events
        }
    }

    private State state;

    private void ChangeState(Params params) // can be called from multiple command handlers
    {
        if (params == ... && this.state == X)
        {
            Apply(new StateChanged { state = Y });
        }
        if (params == ... && this.state == Y)
        {
            Apply(new StateChanged { state = Z });
        }
        (...)
    }

    private void ChangeOtherState(...) => /* do some logic and apply some events*/;

    private void On(StateChanged e) => this.state = e.state;
}

(...)

// later, after command is processed, events are pulled and passed further
var aggregate = database.LoadAggregate(command.aggregateId);
aggregate.Execute(command); // synchronous
var events = aggregate.PullEvents();
// in-memory aggregate can be discarded now
database.Publish(events);


This example is simple, but I have more complex cases where applying events and running logic are intertwined in loops: for example, changing multiple entities inside the aggregate, after which logic (based on those changed entities and the events already applied) emits more events.

Is this a valid and correct approach? And is there any alternative or smarter pattern that could be used?

java – Architectural problem for class combination (cartesian product) for save format strategy

Hello to everyone and thank you for any suggestion.

I have a family of subclasses of Track (Track is abstract).
Every concrete Track has a different number and different types of fields (attributes) that extend the basic abstract Track fields.

Then I have a family of file types that implement some methods (saveRow(), etc.).
Every file type is presumed to have a different kind of row formatting (imagine CSV tabs, headers, etc.).

ex:

  • SimpleTrack: double lat, double lon, Calendar dateTime.
  • DetectionTrack: (as SimpleTrack) + boolean detection.
  • ..
  • CsvFile
  • TxtFile

When I create a new (X) track and a (Y) file, they are independent by nature, but the row format is a Cartesian product of track types and file types.

EDIT
(to be clearer): how can I have many concrete tracks on one hand and many file types on the other, and create well-formatted rows (different for every file) from tracks which have different data (columns, headers, ...)?
e.g.:

  • XtrackRow(double a, double b, Calendar date) -> CSV file (tab delimited with headers)
  • XtrackRow(double a, double b, Calendar date) -> TXT file (formatted columns and title)
  • YtrackRow(double a, String b, int c, double e) -> CSV file ..
  • YtrackRow …. -> docx file (with another kind of table or tabulation)
    ..

I see two kinds of solutions:

  1. Tracks send correctly formatted rows to the file: every track has to know which kind of format to apply to its rows in order to hand them off for saving to any specific type of file.
  2. Tracks send raw data to any kind of file, which is responsible for formatting it: in this case, the file class must know which kind of data it has to format (every track has different contents, columns, headers, ...). Moreover, every track class has to send a different number and different types of parameters.

The second solution seems to fit the Single Responsibility Principle better, but I only have a vague idea of how to implement it (rough sketch below).
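
Roughly, the vague idea I have for solution 2 looks something like this (made-up names; every track exposes its data as an ordered column-name/value map, and each file type formats that generically):

import java.util.Calendar;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Tracks only expose raw, ordered data; they know nothing about file formats.
abstract class Track {
    abstract LinkedHashMap<String, Object> rowData(); // column name -> value, in column order
}

class SimpleTrack extends Track {
    double lat;
    double lon;
    Calendar dateTime;

    @Override
    LinkedHashMap<String, Object> rowData() {
        LinkedHashMap<String, Object> row = new LinkedHashMap<>();
        row.put("lat", lat);
        row.put("lon", lon);
        row.put("dateTime", dateTime.getTime());
        return row;
    }
}

// File types format any track generically; they know nothing about concrete tracks.
interface TrackFile {
    void saveRow(Track track);
}

class CsvFile implements TrackFile {
    @Override
    public void saveRow(Track track) {
        Map<String, Object> row = track.rowData();
        String header = String.join("\t", row.keySet());
        String line = row.values().stream()
                .map(String::valueOf)
                .collect(Collectors.joining("\t"));
        // write the header (once) and the line to the underlying .csv file
    }
}

But I am not sure this is right: it loses the static types of the fields, and I do not see how a file type could apply column-specific formatting (e.g. date formats) without again knowing about the concrete tracks.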

I tried to use Bridge Pattern to solve this problem (using first solution):

abstract class Track{
  ...
  FileInterface file;

  Track(FileInterface fileType){
    this.file = fileType;
  }

  abstract String formatConcreteTrackRow();
  
  void sendRow(){
    String rowToSave = formatConcreteTrackRow();
    file.saveRow(rowToSave);
  }
}

This way the problem is still not solved, because every concrete track has to implement a set of methods returning a correctly formatted row string: one for every file type.
If I use a Strategy Pattern:

class SimpleTrack extends Track{
  ...
  RowFormatStrategy rowStrategy;

  @Override
  String formatConcreteTrackRow(){
    return this.rowStrategy.getRowString("args");
  }
}

but in this case every concrete track requires a different strategy interface, because every concrete track has a different number and different types of arguments to handle.
If I do not use the Strategy Pattern and instead define a set of methods (formatCsvRow(args), formatTxtRow(args), ...), I need a switch(fileType) statement to choose which method to use, which breaks SOLID principles. 🙁

Moreover:
how can I enforce that every new concrete track provides the right row-format method for every existing file row template?
And, at the same time, how can I enforce that every new file class introduces its new templates and the corresponding methods in every existing concrete track?

To be honest, it’s also quite limiting to force formatConcreteTrackRow to return a String, but that goes beyond the main problem.

I’m not attached to maintaining this kind of class structure; it is only the best solution I found while trying to follow SOLID principles. If you can show me a better solution, my intent is to study and understand how SOLID principles are applied to this kind of problem.

(I looked around for similar questions, but I’m not even able to define the specific problem itself..)
Thank you very much.

Architectural decisions of page builders: why does everything need to happen on the post page?

I understand the question relates to page builders, which are third-party plugins, but the reason why they all make this specific architectural choice seems weird to me and has deep implications.

I’ve been going through the guts of the most popular page builders, and one thing they have in common is that everything they do (the page editing, values, etc.) happens on the post page itself. They all have some mechanism to unhook scripts from the “edit post” page, such as:

add_filter( 'show_admin_bar', '__return_false' );

which disables the admin bar. Anyway, the logic is clear-cut:

  1. Remove all the basic hooks provided by WordPress so that you get a “blank page” while still inheriting a lot of data.
  2. Hook your stuff to make the “edit post” page something else entirely.

However, I fail to see why they all take the same approach. Why not just create a custom page such as builder.php?id=(postId), and once you’re done, insert that content into the body (more precisely the_content) of the post matching the postId?

architectural patterns – Too many conversions between layers

I’m developing a back-end application and I have run into the problem of too many conversions between layers: for a single type, the same data is represented by a separate class in each layer, with converters in between.

So every time I update the model, I need to update six classes and several converters.
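
Concretely, it is the kind of chain where the same concept exists once per layer, plus converters between them (made-up names, just to illustrate the shape of the problem):

// One conceptual "User", duplicated per layer (hypothetical example).
record UserDto(String id, String name) {}      // REST / API layer
record User(String id, String name) {}         // domain layer
record UserEntity(String id, String name) {}   // persistence layer

class UserConverters {
    static User toDomain(UserDto dto)     { return new User(dto.id(), dto.name()); }
    static UserEntity toEntity(User user) { return new UserEntity(user.id(), user.name()); }
    // ...plus the reverse directions, and the same again for every other type
}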

How should I tackle this complexity, or should I not bother?

EDIT: On second thought, after some more development, as the amount of layer-specific logic grows and the models become more stable, this problem might largely go away.

architectural patterns – Clean Architecture – Controllers and Presenters

I am having a hard time trying to wrap my head around the relationship between Controllers and Presenters in Uncle Bob’s Clean Architecture.

In most of his videos, he says very little about controllers and a fair amount about presenters. The material available online isn’t helping me much either, because there are too many “my own” versions of Clean Architecture, which may or may not be correct.

What I am failing to understand is that most of the diagrams don’t show a connection between the controller and the presenter; whether it is a “uses” arrow or an “inherits” arrow, it simply isn’t there. There is, however, a single example where they are connected:

[diagram 1]

This is the only image from which I can infer an inheritance relation between controllers and presenters, and it makes sense to me. However, the User sits on the outer layer and interacts only with the Views layer. So, in this sense, a submitted form would reach the controller through the presenter (due to inheritance/implementation): is this assumption right? I couldn’t find any confirmation of that; sorry if I am being a potato.

Diagrams like the one below made me think that there is no direct relation between them and that the connection is instead made through the interactors, which I am miserably failing to understand. If that’s the case, how would the controllers even be triggered, given that the interactors have no knowledge of what lies outside the boundaries, and the users can only interact with the views?

[diagram 2]

Thanks in advance, and again, sorry if it’s a dupe; I’ve read a lot and am still failing to wrap my head around this. So, my questions:

  1. Is my first assumption right? Do presenters implement/extend controllers?
  2. If not, can someone explain the data flow to me with some examples?

Interesting examples I would like to understand are:

  • User sends form data. Simple as that. Starting with a click on an HTML button (which sits in the view), passing through all the layers of the architecture, and going back to the view with a “Success/Error” message shown in some alert dialog.
  • User gets notified of a change. Instead of implementing a periodic “get” in the presenters, how would the data layer notify the available views (web/smartwatch/app/desktop) when a change happens there? Think of something dangerous, like a sudden peak in the sensor data of a bridge pillar.

Thanks.