Java 8 Parallel Stream – Concurrent Grouping Guide

java, java-8, java-stream

Suppose I have a class like this:

class Person {
  String name;
  String uid;
  String phone;
}

I am trying to group by all the fields of the class. How do I use parallel streams in Java 8 to convert a

List<Person> into Map<String, Set<Person>>

where the key of the map is the value of each field in the class? The following Java 8 example groups by a single field; how can I do it for all fields of a class into a single Map?

ConcurrentMap<Person.Sex, List<Person>> byGender =
roster
    .parallelStream()
    .collect(
        Collectors.groupingByConcurrent(Person::getGender));

Best Answer

You can do that with the static factory method Collector.of:

Map<String, Set<Person>> groupBy = persons.parallelStream()
    .collect(Collector.of(
        ConcurrentHashMap::new,             // supplier: a fresh map for each parallel subtask
        ( map, person ) -> {                // accumulator: index the person under every field value
            map.computeIfAbsent(person.name, k -> new HashSet<>()).add(person);
            map.computeIfAbsent(person.uid, k -> new HashSet<>()).add(person);
            map.computeIfAbsent(person.phone, k -> new HashSet<>()).add(person);
        },
        ( a, b ) -> {                       // combiner: merge the partial map b into a
            b.forEach(( key, set ) -> a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set));
            return a;
        }
    ));
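A self-contained, runnable version of the Collector.of approach might look like the sketch below. The Person constructor, the toString override, the groupByAllFields method name and the sample data are assumptions added for illustration; the explicit type witness on Collector.of only makes the lambda parameter types obvious and does not change behavior.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collector;

public class GroupByAllFields {
    // Stand-in for the question's Person; the constructor and toString are added for this sketch.
    static final class Person {
        final String name;
        final String uid;
        final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Indexes each person under the value of every one of its fields.
    static Map<String, Set<Person>> groupByAllFields(List<Person> persons) {
        return persons.parallelStream().collect(
            Collector.<Person, Map<String, Set<Person>>>of(
                ConcurrentHashMap::new, // supplier: one fresh map per parallel subtask
                (map, person) -> {      // accumulator: add the person under each field value
                    map.computeIfAbsent(person.name, k -> new HashSet<>()).add(person);
                    map.computeIfAbsent(person.uid, k -> new HashSet<>()).add(person);
                    map.computeIfAbsent(person.phone, k -> new HashSet<>()).add(person);
                },
                (a, b) -> {             // combiner: merge partial map b into a
                    b.forEach((key, set) -> a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set));
                    return a;
                }));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
            new Person("alice", "u1", "555-0100"),
            new Person("bob", "u2", "555-0100"));
        Map<String, Set<Person>> byField = groupByAllFields(persons);
        System.out.println(byField.get("555-0100").size()); // 2 (both share the phone number)
        System.out.println(byField.get("u1"));              // [alice]
    }
}
```

One caveat: ConcurrentHashMap rejects null keys, so a Person with a null field would make computeIfAbsent throw a NullPointerException here, whereas a HashMap-based container accepts a null key.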

As Holger suggested in the comments, the following approach may be preferable to the one above:

Map<String, Set<Person>> groupBy = persons.parallelStream()
     .collect(HashMap::new, (m, p) -> {
         m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p);
         m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p);
         m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p);
     }, (a, b) -> b.forEach((key, set) ->
         a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));

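It uses the three-argument overload of collect (supplier, accumulator, combiner), which behaves identically to my suggested statement above. A plain HashMap is enough here because, with a non-concurrent collection like this, each subtask of the parallel stream fills its own container and the partial results are merged by the combiner.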
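A different technique, not taken from the answer above, is to flatten each person into one (field value, person) pair per field and then reuse the groupingByConcurrent collector from the question, with Collectors.mapping(..., toSet()) as the downstream collector. This is a sketch under the same assumptions as before (a hypothetical Person stand-in with a constructor and made-up sample data); AbstractMap.SimpleImmutableEntry is used only as a convenient Java 8 pair type.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupByAllFieldsFlatMap {
    // Hypothetical Person stand-in (constructor added for the sketch).
    static final class Person {
        final String name;
        final String uid;
        final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }
    }

    static ConcurrentMap<String, Set<Person>> groupByAllFields(List<Person> persons) {
        return persons.parallelStream()
            // Turn each person into one (fieldValue, person) pair per field.
            .flatMap(p -> Stream.<Map.Entry<String, Person>>of(
                new SimpleImmutableEntry<>(p.name, p),
                new SimpleImmutableEntry<>(p.uid, p),
                new SimpleImmutableEntry<>(p.phone, p)))
            // Group the pairs by field value, keeping only the persons, in one concurrent map.
            .collect(Collectors.groupingByConcurrent(
                Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toSet())));
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Set<Person>> byField = groupByAllFields(Arrays.asList(
            new Person("alice", "u1", "555-0100"),
            new Person("bob", "u2", "555-0100")));
        System.out.println(byField.get("555-0100").size()); // 2
    }
}
```

Because groupingByConcurrent is a concurrent, unordered collector, a parallel stream accumulates into a single shared ConcurrentMap instead of merging per-thread maps, which is closer to the question's original example. It also throws a NullPointerException if a field value (the key) is null.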

Related Solutions

Java Streams GroupingBy – How to Group Same Elements in Java 8

To get a Map<String, List<String>>, you just need to tell the groupingBy collector that you want to group the values by identity, i.e. with the function x -> x.

Map<String, List<String>> occurrences = 
     streamOfWords.collect(groupingBy(str -> str));
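A minimal runnable version of the identity grouping, assuming the words come from Stream.of(...) rather than from the asker's actual source; the output makes the duplication visible:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

import static java.util.stream.Collectors.groupingBy;

public class GroupByIdentity {
    // Groups each word under itself: the identity function str -> str is the classifier.
    static Map<String, List<String>> occurrences(Stream<String> streamOfWords) {
        return streamOfWords.collect(groupingBy(str -> str));
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = occurrences(Stream.of("hello", "world", "hello"));
        System.out.println(groups.get("hello")); // [hello, hello]
        System.out.println(groups.get("world")); // [world]
    }
}
```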

However, this is a bit useless, because the same information appears twice. You should look into a Map<String, Long>, where the value indicates the number of occurrences of the String in the Stream.

Map<String, Long> occurrences = 
     streamOfWords.collect(groupingBy(str -> str, counting()));

Basically, instead of having groupingBy return the values as a List, you use the downstream collector counting() to count the number of times each value appears.
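The counting variant as a runnable sketch, with the same assumed sample input:

```java
import java.util.Map;
import java.util.stream.Stream;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

public class CountWords {
    // The downstream counting() replaces the default toList(): each key maps to a Long count.
    static Map<String, Long> occurrences(Stream<String> streamOfWords) {
        return streamOfWords.collect(groupingBy(str -> str, counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = occurrences(Stream.of("hello", "world", "hello"));
        System.out.println(counts.get("hello")); // 2
        System.out.println(counts.get("world")); // 1
    }
}
```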

Your sorting requirement implies that you should have a Map<Long, List<String>> (what if different Strings appear the same number of times?). Since the default toMap collector returns a HashMap, which has no notion of ordering, you could store the elements in a TreeMap instead.
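A sketch of the TreeMap idea, assuming the goal is to list words by how often they occur: count with groupingBy and counting(), then regroup the entries by their count into a TreeMap<Long, List<String>>, so that words with the same count share a list. The method name wordsByCount and the sample words are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

public class WordsByCount {
    static TreeMap<Long, List<String>> wordsByCount(Stream<String> words) {
        // Step 1: word -> number of occurrences.
        Map<String, Long> occurrences = words.collect(groupingBy(str -> str, counting()));
        // Step 2: invert to count -> words, using a TreeMap so the counts are sorted.
        return occurrences.entrySet().stream()
            .collect(groupingBy(Map.Entry::getValue, TreeMap::new,
                                mapping(Map.Entry::getKey, toList())));
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> byCount =
            wordsByCount(Stream.of("hello", "world", "hello", "hey"));
        // descendingMap() is a view that iterates from the highest count to the lowest.
        System.out.println(byCount.descendingMap());
    }
}
```

TreeMap keeps the counts in ascending order; descendingMap() gives a most-frequent-first view without copying. The order of words inside each list follows the iteration order of the intermediate HashMap, so it is not guaranteed.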

I've tried to summarize a bit what I've said in the comments.

You seem to have trouble understanding how str -> str can tell whether "hello" and "world" are different.

First of all, str -> str is a function, that is, for an input x it yields a value f(x). For example, f(x) = x + 2 is a function that returns x + 2 for any value x.

Here we are using the identity function, that is, f(x) = x. When you collect the elements from the pipeline into the Map, this function is called first to obtain the key from the value. So in your example, you have 3 elements, for which the identity function yields:

f("hello") = "hello"
f("world") = "world"

So far so good.

Now, when collect() is called, the function is applied to every value in the stream, and the result becomes that value's key in the Map. If the key already exists, the value (i.e. the one the function was just applied to) is added to the List already mapped to that key; otherwise a new List is created for it. That's why you get a Map<String, List<String>> at the end.
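To make the merging step concrete, here is a hand-written loop that produces the same kind of Map<K, List<T>> as groupingBy does for a sequential stream. It is a conceptual sketch, not the JDK's actual implementation, and groupManually is a hypothetical name. (One difference: groupingBy rejects null keys, while this HashMap-based loop would accept one.)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ManualGrouping {
    // Hypothetical helper: what groupingBy(f) conceptually does for a sequential stream.
    static <T, K> Map<K, List<T>> groupManually(List<T> values, Function<? super T, ? extends K> f) {
        Map<K, List<T>> result = new HashMap<>();
        for (T value : values) {
            K key = f.apply(value); // the key is whatever the function returns
            // First time this key is seen: create a new List; afterwards: append to it.
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world", "hello");
        Map<String, List<String>> manual = groupManually(words, str -> str);
        Map<String, List<String>> viaCollector =
            words.stream().collect(Collectors.groupingBy(str -> str));
        System.out.println(manual.get("hello"));         // [hello, hello]
        System.out.println(manual.equals(viaCollector)); // true
    }
}
```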

Let's take another example. Now the stream contains the values "hello", "world" and "hey" and the function that we want to apply to group the elements is str -> str.substring(0, 2), that is, the function that takes the first two characters of the String.

Similarly, we have:

f("hello") = "he"
f("world") = "wo"
f("hey") = "he"

Here you see that both "hello" and "hey" yield the same key when the function is applied, so they are grouped in the same List when collected, and the final result is:

"he" -> ["hello", "hey"]
"wo" -> ["world"]

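The prefix example, runnable, assuming every word has at least two characters (substring(0, 2) throws a StringIndexOutOfBoundsException otherwise):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

import static java.util.stream.Collectors.groupingBy;

public class GroupByPrefix {
    // Classifier: the first two characters of each word.
    static Map<String, List<String>> byPrefix(Stream<String> words) {
        return words.collect(groupingBy(str -> str.substring(0, 2)));
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = byPrefix(Stream.of("hello", "world", "hey"));
        System.out.println(groups.get("he")); // [hello, hey]
        System.out.println(groups.get("wo")); // [world]
    }
}
```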
To draw an analogy with mathematics, you could take any non-injective function, such as x². For x = -2 and x = 2 we have f(x) = 4, so if we grouped integers by this function, -2 and 2 would end up in the same "bag".
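The same idea with x² as the classifier, grouping a few sample integers (chosen for illustration) by their square:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

import static java.util.stream.Collectors.groupingBy;

public class GroupBySquare {
    // Classifier x -> x * x is not injective: x and -x always share a key.
    static Map<Integer, List<Integer>> bySquare(Stream<Integer> numbers) {
        return numbers.collect(groupingBy(x -> x * x));
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> bags = bySquare(Stream.of(-2, -1, 0, 1, 2));
        System.out.println(bags.get(4)); // [-2, 2]
        System.out.println(bags.get(1)); // [-1, 1]
        System.out.println(bags.get(0)); // [0]
    }
}
```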

Looking at the source code won't help you understand what's going on at first; it's useful if you want to know how it's implemented under the hood. Try first to think of the concept at a higher level of abstraction, and then things may become clearer.

Hope it helps! :)

Java 8 Streams – Collecting from Parallel Stream in Java 8

The Collection object used to receive the data being collected does not need to be concurrent. You can give it a simple ArrayList.

That is because, with a non-concurrent collector, the values from a parallel stream are not actually collected into a single Collection object. Each thread collects its own data into a separate container, and then all sub-results are merged into a single final Collection object.

This is all well-documented in the Collector javadoc, and the Collector is the parameter you're giving to the collect() method:

<R,A> R collect(Collector<? super T,A,R> collector)

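A minimal sketch of this, using the three-argument collect: a parallel stream is collected into a plain, non-thread-safe ArrayList. The squares method and its workload are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.stream.IntStream;

public class ParallelIntoArrayList {
    // Collects the squares of 0..n-1 from a parallel stream into a plain ArrayList.
    static ArrayList<Integer> squares(int n) {
        return IntStream.range(0, n)
            .parallel()
            .map(i -> i * i)
            .boxed()
            .collect(ArrayList::new,     // supplier: each subtask gets its own ArrayList
                     ArrayList::add,     // accumulator: add one element to that list
                     ArrayList::addAll); // combiner: append one partial list to another
    }

    public static void main(String[] args) {
        ArrayList<Integer> result = squares(1_000);
        System.out.println(result.size());  // 1000
        System.out.println(result.get(10)); // 100
    }
}
```

Because the stream is ordered and collect combines the partial lists in encounter order, the result is [0, 1, 4, 9, ...] even though the work ran in parallel.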