I have a stream of words and I would like to sort them according to the occurrence of same elements (=words).
e.g.: {hello, world, hello}
to
Map<String, List<String>>
hello, {hello, hello}
world, {world}
What I have so far:
Map<Object, List<String>> list = streamofWords.collect(Collectors.groupingBy(???));
Problem 1: The stream seems to lose the information that it is processing Strings, so the compiler forces me to change the type to Map<Object, List<String>>.
Problem 2: I don't know what to put inside the parentheses to group by equal elements. I know that I can process single elements within the lambda expression, but I have no idea how to reach "outside" each element to check for equality.
Thank You
Best Answer
To get a Map<String, List<String>>, you just need to tell the groupingBy collector that you want to group the values by identity, i.e. with the function x -> x.
However, this is a bit useless: as you can see, you have the same information twice. You should look into a Map<String, Long>, where the value indicates the number of occurrences of the String in the Stream. Basically, instead of having a groupingBy that returns the values as a List, you use the downstream collector counting() to say that you want to count the number of times each value appears.
Your sort requirement implies that you should end up with a Map<Long, List<String>> (what if different Strings appear the same number of times?). As the default toMap collector returns a HashMap, which has no notion of ordering, you could store the elements in a TreeMap instead.
I've tried to summarize a bit what I've said in the comments.
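As a rough illustration of the three maps described above, here is a minimal sketch. The class and method names (GroupWords, groupByIdentity, countWords, wordsByCount) are made up for this example; only the Collectors calls come from the JDK.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative class; all names here are assumptions, not part of the question.
class GroupWords {
    // Group equal words together: the key is the word itself.
    static Map<String, List<String>> groupByIdentity(Stream<String> words) {
        return words.collect(Collectors.groupingBy(Function.identity()));
    }

    // Count how often each word occurs, using counting() as the downstream collector.
    static Map<String, Long> countWords(Stream<String> words) {
        return words.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Invert the counts into a TreeMap, so iteration goes from the lowest count to the highest.
    static TreeMap<Long, List<String>> wordsByCount(Stream<String> words) {
        return countWords(words).entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(groupByIdentity(Stream.of("hello", "world", "hello")).get("hello")); // [hello, hello]
        System.out.println(countWords(Stream.of("hello", "world", "hello")).get("hello"));      // 2
        System.out.println(wordsByCount(Stream.of("hello", "world", "hello")));                 // {1=[world], 2=[hello]}
    }
}
```

Note that a stream can only be consumed once, which is why each call in main gets a fresh Stream.of(...). If you want the most frequent words first, descendingMap() on the TreeMap is one option.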
You seem to have trouble with how str -> str can tell whether "hello" and "world" are different.
First of all, str -> str is a function: for an input x it yields a value f(x). For example, f(x) = x + 2 is a function that, for any value x, returns x + 2.
Here we are using the identity function, that is, f(x) = x. When you collect the elements from the pipeline into the Map, this function is called first to obtain the key from each value. So in your example, you have 3 elements, for which the identity function yields the element itself as its key: "hello" -> "hello", "world" -> "world" and "hello" -> "hello".
So far so good.
Now when collect() is called, the function is applied to every value in the stream and the result is evaluated (it will be the key in the Map). If the key already exists, we take the currently mapped value and merge into a List the value we wanted to put (i.e. the value to which the function was just applied) with this previously mapped value. That's why you get a Map<String, List<String>> at the end.
Let's take another example. Now the stream contains the values "hello", "world" and "hey", and the function we want to apply to group the elements is str -> str.substring(0, 2), that is, the function that takes the first two characters of the String.
Similarly, we have "hello" -> "he", "world" -> "wo" and "hey" -> "he".
Here you see that both "hello" and "hey" yield the same key when the function is applied, and hence they will be grouped in the same List when collecting them, so that the final result is "he" -> ["hello", "hey"] and "wo" -> ["world"].
To draw an analogy with mathematics, you could have taken any non-injective function (one that maps different inputs to the same output), such as x². For x = -2 and x = 2 we have f(x) = 4. So if we grouped integers by this function, -2 and 2 would be in the same "bag".
Looking at the source code won't help you understand what's going on at first. It's useful if you want to know how it's implemented under the hood. But first try to think of the concept at a higher level of abstraction, and then maybe things will become clearer.
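The mechanism can be approximated by hand. The sketch below is a simplified stand-in for what groupingBy does, not the JDK's actual implementation; groupByHand and the class name are hypothetical. It is compared against the real collector for the substring and square examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative class; groupByHand is a made-up helper that mimics groupingBy(classifier).
class GroupingByHand {
    static <T, K> Map<K, List<T>> groupByHand(List<T> values, Function<? super T, ? extends K> classifier) {
        Map<K, List<T>> result = new HashMap<>();
        for (T value : values) {
            // Compute the key for this element.
            K key = classifier.apply(value);
            // Reuse the list already mapped to this key, or create one, then add the element.
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world", "hey");

        // Assumes every word has at least two characters; substring(0, 2) throws otherwise.
        Map<String, List<String>> byHand = groupByHand(words, str -> str.substring(0, 2));
        Map<String, List<String>> byCollector =
                words.stream().collect(Collectors.groupingBy(str -> str.substring(0, 2)));

        System.out.println(byHand.equals(byCollector)); // true
        System.out.println(byCollector.get("he"));      // [hello, hey]
        System.out.println(byCollector.get("wo"));      // [world]

        // The x * x analogy: -2 and 2 share the key 4.
        Map<Integer, List<Integer>> bySquare =
                Stream.of(-2, 2, 3).collect(Collectors.groupingBy(x -> x * x));
        System.out.println(bySquare.get(4)); // [-2, 2]
    }
}
```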
Hope it helps! :)