Parallelism – Spring Batch: no transaction is in progress when using SimpleTaskExecutor

I have a simple step that consists of a synchronized StaxEventItemReader, an ItemProcessor that takes a while to finish, and a JpaItemWriter. If I run the step without the task executor, there is no problem. However, if I run the step with multiple threads, the following exception is thrown: javax.persistence.TransactionRequiredException: no transaction is in progress.

There was already a similar question, but its author deleted it.

Parallelism problem in a job-based system

Setup

Suppose I have a database table called Events. It records when a user becomes active or inactive:

EventId|Timestamp     |User|Status  |
-------|--------------|----|--------|
1      |11/03/20 04:34|A   |ACTIVE  |
2      |11/03/20 05:11|A   |INACTIVE|
3      |11/03/20 05:15|A   |ACTIVE  |
4      |11/03/20 05:44|A   |ACTIVE  |
5      |11/03/20 06:15|A   |INACTIVE|

And another table called StatusTransition that keeps track of when the Status changes for a specific user. It links two entries from the Events table (basically a "Begin" event and a corresponding "End" event). The goal is to track how long a user has been active or inactive.

Id|StartEventId|EndEventId|
--|------------|----------|
1 |1           |2         |
2 |2           |3         |
3 |3           |5         |

The process whenever the application receives an event is:

  1. Write the record in the Events Table. Call that Record A.
  2. Get the latest record in the Events table for this user (the one written just before Record A; assume there is always at least one). Call this retrieved record Record B.
  3. If Record B has a different Status value than Record A, write an entry in the StatusTransition table.
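
To make the setup concrete, here is a minimal in-memory sketch of the three steps (the table stand-ins and names are hypothetical, not from the real system). One subtlety: reproducing the sample StatusTransition rows, which link 3 → 5 across the repeated ACTIVE event 4, requires tracking the event that *began* the current status run, not just the immediately preceding event.

```python
import threading

# In-memory stand-ins for the Events and StatusTransition tables
# (hypothetical structures; the real system writes to a database).
events = {}            # event_id -> (timestamp, user, status)
transitions = []       # (start_event_id, end_event_id)
run_start = {}         # user -> event id that began the current status run
last_status = {}       # user -> status of the user's latest event
lock = threading.Lock()

def handle_event(event_id, timestamp, user, status):
    """Steps 1-3 from the question, made atomic with a lock so two
    workers cannot interleave on the same user."""
    with lock:
        events[event_id] = (timestamp, user, status)        # step 1
        prev = last_status.get(user)                        # step 2
        if prev is None:
            run_start[user] = event_id
        elif prev != status:                                # step 3
            transitions.append((run_start[user], event_id))
            run_start[user] = event_id
        last_status[user] = status
```

Replaying the five sample events through handle_event reproduces the three StatusTransition rows from the table above.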

Problem

The system that performs the above steps is to be designed as a job-based distributed system, in which multiple worker applications are fed from a job queue.

Because multiple worker applications may handle different events from the same user, there is a problem if the events are not written in the correct order.

Example:

  • Some external application publishes two events to the job queue
  • Worker A receives an event with timestamp 3/11/20, 4:54 AM
  • Worker B receives an event with timestamp 3/11/20, 4:52 AM
  • Worker A writes a StatusTransition record linking its event to an older past event, although it should be linked to the event that Worker B received (but has not yet written)

Question

How do I design this so that it still behaves correctly when workers process events out of order? (e.g. separating the logic that writes StatusTransition records into another job, locking the Events table while writing to StatusTransition, etc.)
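
One idea I am considering (names are hypothetical): partition the job queue by user, so that all events of one user always go to the same worker and are therefore processed in arrival order, with no table locking needed.

```python
import queue

# Hypothetical sketch: a fixed pool of workers, each with its own queue.
# Routing by a hash of the user key guarantees per-user ordering.
NUM_WORKERS = 4
worker_queues = [queue.Queue() for _ in range(NUM_WORKERS)]

def route(event):
    """Route an event dict to a worker queue based on its user key;
    returns the chosen worker index."""
    idx = hash(event["user"]) % NUM_WORKERS
    worker_queues[idx].put(event)
    return idx
```

This is the same idea as partitioning by message key in systems like Kafka: ordering is only guaranteed within a partition, but that is exactly the guarantee needed here, per user.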

Parallelism – dealing with Future#get and InterruptedException (Sonar Java rule S2142)

There is a snippet in legacy code that I inherited that looks similar to this:

List<Callable<String>> callableStrings = Arrays.asList("one", "two", "three").stream()
        .map(s -> (Callable<String>) () -> s)
        .collect(Collectors.toList());

List<String> results =
        someExecutorService.invokeAll(callableStrings).stream()
                .map(
                        future -> {
                            try {
                                return future.get();
                            } catch (ExecutionException | InterruptedException e) {
                                LOG.info("Unexpected Exception encountered while fetching result", e);
                                return null;
                            }
                        })
                .filter(Objects::nonNull)
                .collect(Collectors.toList());

Instead of Callable<String> I'm really dealing with heavier calculations. Sonar flags the catch (ExecutionException | InterruptedException e) as a problem, suggesting that the InterruptedException should be rethrown or the interrupt status restored: https://rules.sonarsource.com/java/tag/multi-threading/RSPEC-2142

The intent of this code seems to be to filter out any heavy calculations that had a problem and return only those that succeeded. For context, this code runs as part of handling an HTTP request in my application. If the InterruptedException is allowed to propagate all the way to the top, the client will receive an error response instead of the list of successful results.

There is a lot of advice out there on dealing with the InterruptedException from Future#get. Most of what appears to be clear and solid advice deals with the situation where you have implemented the Runnable yourself and call Future#get from within that class.

This answer, https://stackoverflow.com/a/4268994, suggests that when I see an InterruptedException in the code above, it is the thread processing the HTTP request that has been interrupted, not the one the Future is running on. Is that correct?

My questions are:

  • Should I address the Sonar warning, and if so, what bad things can happen if the code stays as it is?
  • How should the code be revised to improve it?

Parallelism – unconditionally fair, weakly fair and strongly fair scheduling

I am trying to understand the difference between weakly fair and strongly fair scheduling.

For example, which scheduling policy would guarantee that a process that is delayed the first time it reaches a wait statement can eventually continue?

Weakly fair: the conditional atomic action is eventually executed if its condition becomes true and remains true at least until the action is carried out.

Strongly fair: if the condition is true infinitely often (even if it does not remain true), a process waiting on it will eventually see it true and be able to continue.

I believe that strongly fair scheduling is necessary, because the process delayed at the wait statement might otherwise never proceed, but I am not sure.

Could someone provide a good explanation, preferably without resorting to temporal logic?

Parallelism – bakery algorithm: how does it work with single-writer safe registers?

I am trying to understand the bakery algorithm, and I read that it uses single-writer safe registers. But I cannot see why the bakery algorithm is correct with safe registers, because a safe register may return any value in its range when a read overlaps a write.

choosing(i) = true;
number(i) = max(number(0), number(1), …, number(n - 1)) + 1;
choosing(i) = false;
for (j = 0; j < n; j++) {
    while (choosing(j));
    while ((number(j) != 0) && ((number(j), j) < (number(i), i)));
}
critical section
number(i) = 0;
remainder section

It is possible that the comparison (number(j), j) < (number(i), i) runs while thread j is still updating number(j). If number(j) is a safe register and the read returns a value less than number(i) (a safe register can return any value in its range when a read overlaps a write), then we will incorrectly keep waiting in the while loop.

Is my observation correct, or can you help me understand why using safe registers does not break the algorithm?

dg.differential-geometry – parallelizability of 3-manifolds

The answer of Robert Bryant here (https://mathoverflow.net/a/149496/85500) states that any orientable 3-manifold is parallelizable.

So far I had the impression that only closed (compact and without boundary) orientable 3-manifolds were necessarily parallelizable.

I have also heard somewhere (unfortunately I do not remember the source) that most non-compact 3-manifolds are parallelizable, but I have no idea whether "most" was meant informally or whether the statement can be interpreted in a technical sense.

In any case, my knowledge of differential / algebraic topology is nowhere near as good as my knowledge of local differential geometry, so I do not know the proofs and would probably find them difficult to follow anyway.


The main motivation for this question is to understand how restrictive the requirement is that spacetimes be parallelizable in general relativity. If we want a well-posed initial value problem, the spacetime must be of the form $\mathbb{R} \times \Sigma$, where $\Sigma$ is a 3-manifold, so the spacetime is parallelizable if $\Sigma$ is parallelizable.

It is also known that a spacetime admits spin structures if and only if it is parallelizable. So if you want to include fermions in the spacetime, you want a parallelizable spacetime.


Question: What is the correct statement about the parallelizability of orientable 3-manifolds, especially with regard to non-compact ones?

Parallelism – time complexity of Parallel.ForEach

Pseudocode and comments to explain:

// first select companies to process from db

foreach (company) {
    // select the employees
    foreach (employee of company) {
        // select items they can access
        foreach (item) {
            // do some calculation and save
        }
    }
}

I think this has O(n^3) time complexity. Please correct me if I am wrong; Big O always gives me a headache. My question is: if we introduce parallel processing at the first level, what does the complexity become? And what if we introduce a second Parallel.ForEach() for the second loop as well?
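
To convince myself, I sketched the loops in Python (sizes are hypothetical; ThreadPoolExecutor stands in for Parallel.ForEach). With c companies, e employees per company and i items per employee, the nested loops cost O(c · e · i), which is O(n^3) only if all three counts grow with the same n.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical sizes: c companies, e employees each, i items each.
c, e, i = 4, 5, 6

def process_company(company_id):
    # The two inner loops: e * i units of work for one company.
    return sum(1 for _ in product(range(e), range(i)))

# Parallelising the outer loop spreads companies across the workers, but
# the total number of (company, employee, item) calculations is still
# c * e * i. Big-O complexity is unchanged; only the wall-clock time
# shrinks, by at most the number of workers.
with ThreadPoolExecutor(max_workers=2) as pool:
    total_work = sum(pool.map(process_company, range(c)))
```

The same reasoning applies to a second parallel level over employees: the work stays c · e · i, the ideal wall-clock time becomes roughly c · e · i divided by the total worker count.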

SQL Server – parallelism is not active in one specific database

I have a few databases with identical schemas, let's call them:

DB_A

DB_B

I have a query that takes 11 seconds to execute and returns 11,000 records in DB_A. However, if exactly the same query is executed against DB_B, it takes 40 seconds and returns 7,000 records.

The schema is identical and so is the query, but when I run it against DB_B it executes with a degree of parallelism of 1, while on DB_A it uses 16.

I tried setting the cost threshold for parallelism to 0 to force a parallel plan, but I got the same result.

Why is that? How can the same query that is executed in cloned databases behave so differently?

Any ideas are welcome.

I am using SQL Server 2017 Standard.

Parallelism – How is this simulation parallelized in Python?

I have a Dungeons and Dragons 5th Edition combat simulation. Players, enemies, etc. are represented by Python code. I have a simulation class, and the key part is here:

def run(self, n=10000):
    """
    Run the Encounter n times, then print the results
    """
    self.reset_stats()
    for _ in range(n):
        self._encounter.run()
        self.collect_stats()
        self._encounter.reset()
    self.calculate_aggregate_stats(n)
    self.print_aggregate_stats()

Each Simulation has its own Encounter instance. When you run the simulation, it runs the encounter multiple times. After every run of the encounter, it collects stats (a dictionary stored in an instance variable of Simulation) and then resets the encounter, because the encounter and the objects it contains are modified over the course of a run. After all runs of the encounter are done, it calculates the aggregate statistics (basically the average of all statistics in self._stats, which is why n has to be passed in).

I want to parallelize this code so that the n passes through the loop are distributed among multiple threads / processes. If necessary to avoid race conditions, I can make a copy of the encounter per worker and change self._stats from an instance variable to a local variable (and adapt calculate_aggregate_stats accordingly).

How do I do that in parallel? I have experience with MPI, OpenMP, and Pthreads in C and C++, but I do not understand how parallel processing works in Python. I suspect I should use multiprocessing instead of multithreading because of the GIL, but how do I do multiprocessing?
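
Based on my reading so far, something like this multiprocessing.Pool sketch is what I have in mind (run_one and the "damage" stat are hypothetical stand-ins for the real encounter logic, since the actual Encounter class is not shown here):

```python
import multiprocessing as mp
import random

def run_one(seed):
    """Stand-in for one encounter run: in the real code this would build
    its own Encounter copy, run it, and return the stats dictionary
    that collect_stats produces."""
    rng = random.Random(seed)
    return {"damage": rng.randint(1, 20)}

def run_parallel(n, workers=4):
    # Each worker process gets its own copy of all objects, so the
    # encounter state is never shared and no locks are needed.
    with mp.Pool(processes=workers) as pool:
        all_stats = pool.map(run_one, range(n))
    # Aggregate as calculate_aggregate_stats would: average each stat.
    return sum(s["damage"] for s in all_stats) / n

if __name__ == "__main__":
    print(run_parallel(10000))
```

My understanding is that pool.map handles distributing the n runs and collecting the per-run stats dictionaries back into the parent process, which then does the aggregation sequentially. Is this the right pattern?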

(If this question is a better fit for another platform (e.g. Stack Overflow), please let me know; I just assumed it is a design question that should be asked here.)

What should we learn / use about Java parallelism in 2019?

I have been learning Java for more than a year. However, I have only heard about concurrency and have never seen it used in practice. Therefore, I have no idea which approach is best / most commonly used. Examples:

  • Thread classes
  • Parallel stream
  • Future / CompletableFuture

I thought it would be "cool" to use a parallel stream in an API call, but after seeing this: https://dzone.com/articles/think-twice-using-java-8 I am not sure anymore what to learn and use in Java for parallelism.