Java – How to Use parallelStream with Collect Method in Java 11

Tags: java, java-11, java-stream

I'm trying to understand why there is a difference when I change y.addAll(x) to x.addAll(y) in the code snippet below:

List<Integer> result = List.of(1, 2)
    .parallelStream() 
    .collect(
        ArrayList::new,
        (x, y) -> x.add(y),
        (x, y) -> y.addAll(x)
    );
System.out.println(result);

I know that when I use parallelStream(), more than one thread runs at a time.

collect() takes three parameters, and I understand the first two. For the third, I know that x and y are the partial results of substreams and that both are ArrayLists, but I don't understand why the results differ between the two cases. I expected them to be the same.

  • (x, y) -> y.addAll(x) // output: [1]

  • (x, y) -> x.addAll(y) // output: [1, 2]

Best Answer

Why one is correct and the other isn't

From the Javadocs of Stream#collect (specifically the last parameter, emphasis mine):

combiner - an associative, non-interfering, stateless function that accepts two partial result containers and merges them, which must be compatible with the accumulator function. The combiner function must fold the elements from the second result container into the first result container.

Similarly, a.addAll(b) adds all elements from b to a but not the other way round. It takes information from the parameter and modifies the receiver.

So, the contract of that method specifies that you have to merge the second argument of the lambda into the first.

If you use (x, y) -> x.addAll(y), all elements of y are added to x, which adheres to the contract. With (x, y) -> y.addAll(x), however, you add the elements of x to the second container instead, so the elements of y never reach x and are missing from the result.
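To see the two combiners side by side, here is a minimal sketch that runs the pipeline from the question with each of them. The class name CombinerContractDemo and the helper collectWith are made up for this illustration; everything else is the standard Stream API. The question reports [1] for the second call; what the broken combiner yields depends on how the stream happens to be split, so treat that output as an observation rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class CombinerContractDemo {

    // Hypothetical helper: runs the pipeline from the question with the given combiner.
    static List<Integer> collectWith(BiConsumer<ArrayList<Integer>, ArrayList<Integer>> combiner) {
        return List.of(1, 2)
            .parallelStream()
            .collect(ArrayList::new, ArrayList::add, combiner);
    }

    public static void main(String[] args) {
        // Folds the second container into the first, as the contract requires.
        List<Integer> correct = collectWith((x, y) -> x.addAll(y));

        // Folds the first container into the second, which violates the contract.
        List<Integer> broken = collectWith((x, y) -> y.addAll(x));

        System.out.println("x.addAll(y): " + correct);
        System.out.println("y.addAll(x): " + broken);
    }
}
```

Because ArrayList::addAll has exactly the shape "merge the argument into the receiver", passing ArrayList::addAll as the combiner is a common way to avoid mixing up the two arguments.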

What happens

The contract exists because a parallel stream splits its input into chunks that are processed by different threads. Once the chunks are processed, the partial results have to be merged, which is the job of the combiner (the last lambda expression, the one you asked about). The combiner must merge two partial result containers; the first argument is then used for further processing, while the second is discarded.

Let's say we have the numbers 1 and 2 as in your example, and assume one thread processes a chunk containing 1 while the other thread processes a chunk containing 2. When collecting, each thread starts by creating a new ArrayList, following the ArrayList::new in your code. Each thread then adds the elements of its chunk to its own list, which results in two lists with one element each (1 for the first thread and 2 for the other). When both threads are finished, the combiner is called to merge the results. With x.addAll(y), it adds the second list to the first, which is then returned, yielding the correct result. With y.addAll(x), however, it adds the elements of the first list to the second list. Java assumes you want the first list (as that's the one you are supposed to modify), so collect returns the first list, which doesn't contain the elements processed by the second thread.
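The walkthrough above can be reproduced without any threads. The following sketch is a simplified model, not the actual stream implementation: it builds the two one-element lists by hand, applies a combiner, and keeps only the first list, just as described. The names CombinerSimulation and mergeTwoChunks are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class CombinerSimulation {

    // Hypothetical helper that mimics the merge step for two chunks:
    // apply the combiner, then keep only the first container.
    static List<Integer> mergeTwoChunks(BiConsumer<List<Integer>, List<Integer>> combiner) {
        List<Integer> first = new ArrayList<>();   // container of the thread that saw 1
        first.add(1);
        List<Integer> second = new ArrayList<>();  // container of the thread that saw 2
        second.add(2);

        combiner.accept(first, second);
        return first; // the second container is discarded
    }

    public static void main(String[] args) {
        System.out.println(mergeTwoChunks((x, y) -> x.addAll(y))); // prints [1, 2]
        System.out.println(mergeTwoChunks((x, y) -> y.addAll(x))); // prints [1]
    }
}
```

In the second call the list that ends up holding both elements is the one that gets thrown away, which is why 2 disappears from the result.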