For instance, we could go through an entire writer's work and see what words got used most often--or not at all. (Fun trivia for nerds: Lovecraft uses the word "squamous" only once, which is funny because parodies of Lovecraft love that word.)
Which is a long intro to explain why I like writing word-frequency counters in new programming languages. So, to count words in Elixir, you could use the module below.
Two things to note as you read it: (1) the program is written with two helper functions, in classic modular fashion (and these functions are defined with defp, which makes them private functions, only callable by functions within the module); (2) Elixir uses pipes (|>) as a way of handling and handing off data. And I love pipes.
defmodule Words do
  @doc """
  Count the number of words in the sentence.
  Words are compared case-insensitively.
  """
  @spec count(String.t()) :: map()
  def count(sentence) do
    sentence
    |> prep()
    |> count_words()
  end

  defp prep(sentence) do
    sentence
    |> String.replace(~r/([^\w-]|_)+/u, " ")
    |> String.downcase()
    |> String.split()
  end

  defp count_words(words) do
    Enum.reduce(words, Map.new(),
      fn(word, map) ->
        Map.update(map, word, 1, &(&1 + 1))
      end)
  end
end
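Commentary: @doc attaches documentation to the function that follows it, and the triple quotes (""") delimit a heredoc, which is a multi-line string. Once the module is compiled and loaded into IEx, typing "h Words.count" there gets that info back.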
Check out prep, a pretty straightforward way to prep a sentence for counting. Line by line, inside the function:
(1) it takes the sentence;
(2) runs it through a regex replacer that swaps every run of characters that aren't word characters or hyphens (plus any underscores) for a single space;
(3) then runs that cleaned-up string through the downcase function;
(4) then runs that newly downcased string through a split function, which works like all split functions seem to work, taking a string and returning a list of strings (here, splitting on whitespace).
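To make those steps (and the regex) concrete, here is a minimal sketch that runs each stage of prep by hand and keeps the intermediate values. The sample sentence and the variable names are made up for illustration; only the three String calls come from the module.

```elixir
# A made-up sample sentence (not from the original program).
sentence = "Squamous, SQUAMOUS co-op_thing!"

# The regex, piece by piece:
#   [^\w-]  any character that is NOT a word character or a hyphen
#   |_      ...or an underscore (which \w would otherwise keep)
#   (...)+  one or more of those in a row, replaced by a single space
#   u       Unicode mode, so \w also covers non-ASCII letters
replaced = String.replace(sentence, ~r/([^\w-]|_)+/u, " ")
IO.inspect(replaced)
# => "Squamous SQUAMOUS co-op thing "

downcased = String.downcase(replaced)
IO.inspect(downcased)
# => "squamous squamous co-op thing "

# String.split/1 splits on whitespace and drops the trailing empty piece.
words = String.split(downcased)
IO.inspect(words)
# => ["squamous", "squamous", "co-op", "thing"]
```

Note that the hyphen in "co-op" survives, while the underscore is treated as a separator.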
Now, if I weren't piping, I would have to pass the argument explicitly, like
String.split(sentence)
But when piping, the first parameter is assumed to be whatever is piped in. Now, without piping, I could write this sequence of functions pretty easily, and it would look like this:
String.split(String.downcase(String.replace(sentence, ~r/([^\w-]|_)+/u, " ")))
Which I can read, but which is a little less intuitive, because you have to read it backwards, with every left-side function taking as parameter the output of the right-side function. Yuck.
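As a quick sanity check, this sketch runs the piped and the nested forms side by side on a made-up sentence (the variable names are only for illustration) and confirms they produce the same list:

```elixir
sentence = "One fish, two fish"

# Piped: read top to bottom, each result becomes the next call's first argument.
piped =
  sentence
  |> String.replace(~r/([^\w-]|_)+/u, " ")
  |> String.downcase()
  |> String.split()

# Nested: the same calls, read from the inside out.
nested = String.split(String.downcase(String.replace(sentence, ~r/([^\w-]|_)+/u, " ")))

IO.inspect(piped)
# => ["one", "fish", "two", "fish"]
IO.inspect(piped == nested)
# => true
```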
Then we get to the heart of the word counter program, the count_words function. This function is doing something interesting--and wasn't my first version of this.
My first version:
defp count_words([], acc), do: acc
defp count_words([head | tail], acc) do
  quantity = Map.get(acc, head, 0)
  acc = Map.put(acc, head, quantity + 1)
  count_words(tail, acc)
end
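Since this version takes an accumulator as its second argument, something has to kick it off with an empty map. Here is a minimal runnable sketch of that idea; the module name RecursiveCount and the one-argument wrapper are hypothetical, added only so the recursion can be called on its own.

```elixir
defmodule RecursiveCount do
  # Hypothetical public wrapper: starts the recursion with an empty map.
  def count_words(words), do: count_words(words, %{})

  # Base case: no words left, so the accumulator is the answer.
  defp count_words([], acc), do: acc

  # Recursive case: bump the count for the first word, then recurse on the rest.
  defp count_words([head | tail], acc) do
    quantity = Map.get(acc, head, 0)
    count_words(tail, Map.put(acc, head, quantity + 1))
  end
end

IO.inspect(RecursiveCount.count_words(["a", "b", "a"]))
# => %{"a" => 2, "b" => 1}
```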
So let's look again at the second (or third) version, which swaps the hand-written recursion for Enum.reduce:
defp count_words(words) do
  Enum.reduce(words, Map.new(),
    fn(word, map) ->
      Map.update(map, word, 1, &(&1 + 1))
    end)
end
So, the heart of this is still a Map function. Here, we call Map.update with the map to be updated (map); the key to be updated (word); the initial value to be used if the key is not found (1); and a function that tells how to transform the value if the key is found ("&(&1 + 1)").
We could rewrite that to make it clearer for new Elixir users, like:
Map.update(map, word, 1, fn(x) -> x + 1 end)
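Both branches of Map.update are easy to see on a tiny map. In this sketch the map contents and variable names are made up for illustration:

```elixir
counts = %{"fish" => 1}

# Key already present: the function runs on the existing value.
seen_again = Map.update(counts, "fish", 1, &(&1 + 1))
IO.inspect(seen_again)
# => %{"fish" => 2}

# Key missing: the default (1) is stored and the function is not called.
first_time = Map.update(counts, "cat", 1, &(&1 + 1))
IO.inspect(first_time)
# => %{"cat" => 1, "fish" => 1}

# The capture and the explicit fn are interchangeable.
IO.inspect(seen_again == Map.update(counts, "fish", 1, fn(x) -> x + 1 end))
# => true
```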
But the real magic is Enum.reduce, which does all the work of going through a list until it's empty and resolving all the data in that list into a single structure or value. For instance, a classic use of Enum.reduce would be to sum all the numbers of a list:
Enum.reduce([1, 2, 3], 0, fn(x, acc) -> (x + acc) end)
So we have the list to be reduced ([1, 2, 3]); the initial value to use as the accumulator (0); and a function that tells reduce how to resolve all the elements of the list into a single value ("fn(x, acc) -> (x + acc) end").
(P.S. That's the long way to write the function, which I did to make the action clear; i.e., the function takes two parameters, the current element of the list (x) and the accumulator (which starts at 0), and adds them together. The really short way to say that would be &(&1 + &2). Awesome.)
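Here are the long and short forms next to each other, as a small sketch (the variable names are just for illustration):

```elixir
# Long form: x is the current element, acc is the running total.
long_form = Enum.reduce([1, 2, 3], 0, fn(x, acc) -> x + acc end)

# Short form: &1 is the element, &2 is the accumulator.
short_form = Enum.reduce([1, 2, 3], 0, &(&1 + &2))

IO.inspect(long_form)
# => 6
IO.inspect(long_form == short_form)
# => true
```

The accumulator goes 0, then 1, then 3, then 6, and that last value is what reduce returns.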
So the Enum.reduce in this function takes the list of words; starts with an empty map as the accumulator; and the function it uses to fold each word into the map is ... the Map.update call, which starts a word's count at 1 the first time it shows up and adds one to it every time after that.
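One way to watch the map grow is to swap Enum.reduce for Enum.scan, which takes the same arguments but returns every intermediate accumulator instead of only the last one. This is just a sketch for peeking inside the loop, not something the word counter needs; the word list and variable names are made up.

```elixir
words = ["one", "fish", "two", "fish"]

# The same function that count_words hands to Enum.reduce.
bump = fn(word, map) -> Map.update(map, word, 1, &(&1 + 1)) end

# Enum.scan keeps the accumulator after each word.
steps = Enum.scan(words, %{}, bump)
IO.inspect(steps)
# => [
#      %{"one" => 1},
#      %{"fish" => 1, "one" => 1},
#      %{"fish" => 1, "one" => 1, "two" => 1},
#      %{"fish" => 2, "one" => 1, "two" => 1}
#    ]

# Enum.reduce returns only the final map, i.e. the last step above.
final = Enum.reduce(words, %{}, bump)
IO.inspect(final == List.last(steps))
# => true
```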