To get a Map<String, List<String>>, you just need to tell the groupingBy collector that you want to group the values by identity, i.e. with the function x -> x.
Map<String, List<String>> occurrences =
streamOfWords.collect(groupingBy(str -> str));
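A small self-contained version of the snippet above, assuming a hypothetical input of "hello", "world", "hello" (the question's exact data isn't shown here). Function.identity() could be used in place of str -> str with the same result.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupByIdentity {

    // Groups each word under itself: the key is the word, the value is a List
    // containing every occurrence of that word in the stream.
    static Map<String, List<String>> group(Stream<String> words) {
        return words.collect(Collectors.groupingBy(str -> str));
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        Map<String, List<String>> occurrences =
                group(Stream.of("hello", "world", "hello"));
        // The Map's iteration order is unspecified, so look the keys up one by one.
        System.out.println(occurrences.get("hello")); // [hello, hello]
        System.out.println(occurrences.get("world")); // [world]
    }
}
```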
However, this is a bit useless: as you can see, you have the same information twice. You should look into a Map<String, Long>, where the value indicates the number of occurrences of the String in the Stream.
Map<String, Long> occurrences =
streamOfWords.collect(groupingBy(str -> str, counting()));
Basically, instead of having a groupingBy that returns the values as a List, you use the downstream collector counting() to tell it that you want to count the number of times each value appears.
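A runnable sketch of the counting variant, again with hypothetical input words. The only change from the identity grouping is the second argument to groupingBy.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CountOccurrences {

    // groupingBy with the downstream collector counting(): instead of a List of
    // equal strings, each key is mapped to how many times it appeared.
    static Map<String, Long> count(Stream<String> words) {
        return words.collect(Collectors.groupingBy(str -> str, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        Map<String, Long> occurrences = count(Stream.of("hello", "world", "hello"));
        System.out.println(occurrences.get("hello")); // 2
        System.out.println(occurrences.get("world")); // 1
    }
}
```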
Your sort requirement suggests that you should have a Map<Long, List<String>> (what if different Strings appear the same number of times?). The default toMap collector makes no guarantee about the type of Map it returns (in practice it is a HashMap), so it has no notion of ordering, but you could store the elements in a TreeMap instead.
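One way to get there, sketched under the assumption that you first count the words and then invert the result. This uses the groupingBy overload that takes a map factory (TreeMap::new) rather than toMap; toMap also has an overload that accepts a map supplier. The sample words are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SortByOccurrences {

    static TreeMap<Long, List<String>> byCount(Stream<String> words) {
        // Step 1: count each distinct word.
        Map<String, Long> counts =
                words.collect(Collectors.groupingBy(str -> str, Collectors.counting()));

        // Step 2: invert the map. Several words may share a count, so the value is
        // a List; the TreeMap factory keeps the keys (the counts) sorted.
        return counts.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> sorted =
                byCount(Stream.of("hello", "world", "hello", "hey"));
        // Keys are iterated in ascending order of occurrence count.
        System.out.println(sorted.keySet()); // [1, 2]
        System.out.println(sorted.get(2L));  // [hello]
        // The order inside a bucket follows the Map from step 1, so it is unspecified.
        System.out.println(sorted.get(1L));
    }
}
```

Use sorted.descendingMap() if you want the most frequent words first.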
I've tried to summarize what I said in the comments.
You seem to have trouble understanding how str -> str can tell that "hello" and "world" are different.
First of all, str -> str is a function, that is, for an input x it yields a value f(x). For example, f(x) = x + 2 is a function that returns x + 2 for any value x.
Here we are using the identity function, that is, f(x) = x. When you collect the elements from the pipeline into the Map, this function is first called on each value to obtain its key. So in your example, the identity function yields:
f("hello") = "hello"
f("world") = "world"
So far so good.
Now, when collect() is called, the function is applied to every value in the stream, and the result becomes that value's key in the Map. If the key doesn't exist yet, a new List containing just that value is mapped to it. If the key already exists, the value is appended to the List already mapped to that key (i.e. the List holding the earlier values that produced the same key). That's why you get a Map<String, List<String>> at the end.
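To make those steps concrete, here is a hand-rolled, sequential loop that behaves like groupingBy(classifier) with its default downstream. It is only an illustration of the idea, not how the JDK implements the collector (the real one is built from a supplier, accumulator and combiner and also supports parallel streams). The helper name groupBy is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ManualGrouping {

    // Hypothetical helper: compute the key for each value, then append the value
    // to that key's List, creating the List the first time the key is seen.
    static <T, K> Map<K, List<T>> groupBy(List<T> values,
                                          Function<? super T, ? extends K> classifier) {
        Map<K, List<T>> result = new HashMap<>();
        for (T value : values) {
            K key = classifier.apply(value);
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> m =
                groupBy(List.of("hello", "world", "hello"), str -> str);
        System.out.println(m.get("hello")); // [hello, hello]
        System.out.println(m.get("world")); // [world]
    }
}
```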
Let's take another example. Now the stream contains the values "hello", "world" and "hey", and the function that we want to apply to group the elements is str -> str.substring(0, 2), that is, the function that takes the first two characters of the String.
Similarly, we have:
f("hello") = "he"
f("world") = "wo"
f("hey") = "he"
Here you see that both "hello" and "hey" yield the same key when the function is applied, and hence they will be grouped in the same List when collecting them, so that the final result is:
"he" -> ["hello", "hey"]
"wo" -> ["world"]
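The same example as a runnable sketch. It assumes every word has at least two characters, since substring(0, 2) throws a StringIndexOutOfBoundsException for shorter strings.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupByPrefix {

    // The classifier keeps only the first two characters, so words that share
    // a two-letter prefix end up under the same key.
    static Map<String, List<String>> byPrefix(Stream<String> words) {
        return words.collect(Collectors.groupingBy(str -> str.substring(0, 2)));
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = byPrefix(Stream.of("hello", "world", "hey"));
        System.out.println(groups.get("he")); // [hello, hey]
        System.out.println(groups.get("wo")); // [world]
    }
}
```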
To draw an analogy with mathematics, you could have taken any non-injective function, such as f(x) = x^2. For x = -2 and x = 2 we have f(x) = 4, so if we grouped integers by this function, -2 and 2 would end up in the same "bag".
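The math analogy translates directly into a stream, sketched here over the integers -2 to 2 (an arbitrary range chosen for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GroupBySquare {

    static Map<Integer, List<Integer>> bySquare(int from, int toInclusive) {
        return IntStream.rangeClosed(from, toInclusive)
                .boxed()
                // x -> x * x is not injective: x and -x produce the same key.
                .collect(Collectors.groupingBy(x -> x * x));
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> bags = bySquare(-2, 2);
        System.out.println(bags.get(4)); // [-2, 2]
        System.out.println(bags.get(1)); // [-1, 1]
        System.out.println(bags.get(0)); // [0]
    }
}
```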
Looking at the source code won't help you understand what's going on at first. It's useful if you want to know how it's implemented under the hood, but first try to think of the concept at a higher level of abstraction, and then things may become clearer.
Hope it helps! :)
The Collection object used to receive the data being collected does not need to be concurrent. You can give it a simple ArrayList.
That is because the values from a parallel stream are not actually collected into a single Collection object. Each thread collects its own data into its own container, and then all sub-results are merged into a single final Collection object.
This is all well documented in the Collector javadoc, and a Collector is the parameter you're giving to the collect() method:
<R,A> R collect(Collector<? super T,A,R> collector)
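A minimal sketch of the point above, using the three-argument collect overload with a plain, non-thread-safe ArrayList on a parallel stream. The range of 10,000 integers is arbitrary. Each sub-task gets a fresh list from the supplier, so no list is ever touched by two threads at once, and the partial lists are merged afterwards.

```java
import java.util.ArrayList;
import java.util.stream.IntStream;

public class ParallelCollect {

    static ArrayList<Integer> collectParallel(int n) {
        return IntStream.range(0, n)
                .boxed()
                .parallel()
                .collect(
                        ArrayList::new,      // a fresh ArrayList per sub-task
                        ArrayList::add,      // elements go only into that task's own list
                        ArrayList::addAll);  // partial lists are merged pairwise
    }

    public static void main(String[] args) {
        ArrayList<Integer> result = collectParallel(10_000);
        // Every element arrives exactly once, and for an ordered stream the merged
        // list keeps the encounter order.
        System.out.println(result.size());     // 10000
        System.out.println(result.get(0));     // 0
        System.out.println(result.get(9_999)); // 9999
    }
}
```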
You can do that by using the static factory method of from Collector:
As Holger suggested in the comments, the following approach can be preferred over the one above:
It uses the overloaded collect method, which behaves identically to the statement I suggested above.
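The code for the two approaches did not survive here, so the following is only an illustration of the two forms being compared, assuming the goal is to collect strings into an ArrayList (an assumption, not the original question's requirement). Both produce the same list.

```java
import java.util.ArrayList;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class CollectorOfDemo {

    // Form 1: build a Collector explicitly with the static factory Collector.of
    // (supplier, accumulator, combiner) and pass it to collect(Collector).
    static ArrayList<String> withCollectorOf(Stream<String> s) {
        return s.collect(Collector.<String, ArrayList<String>>of(
                ArrayList::new,
                ArrayList::add,
                (left, right) -> { left.addAll(right); return left; }));
    }

    // Form 2: the overloaded collect(supplier, accumulator, combiner), which does
    // the same job without creating a Collector object yourself.
    static ArrayList<String> withOverloadedCollect(Stream<String> s) {
        return s.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        System.out.println(withCollectorOf(Stream.of("a", "b", "c")));       // [a, b, c]
        System.out.println(withOverloadedCollect(Stream.of("a", "b", "c"))); // [a, b, c]
    }
}
```

Note that the combiner differs slightly: Collector.of expects a BinaryOperator that returns the merged container, while the overloaded collect takes a BiConsumer, which is why ArrayList::addAll fits there directly.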