- On Sorting In Ruby
- 0 – Sorting A Hash
- 1 – When and Why to Use sort_by()
- 2 – Common Mistakes When Sorting With Blocks
When I read the rdoc on sort_by, I understood the general idea that sort_by is more efficient in some situations. The specifics on why were still over my head, so I wasn’t planning to get into specifics during my recent talk on sorting. Yet just a few hours before my talk Jim Weirich was still trying to cajole me into using big words like “Schwartzian Transformation” in my talk because, he teased, “using big words makes you sound important :)”
The good thing is that this gave me a chance to talk it out with him, and actually understand it for real. It was too late for me to add that into my talk a few hours before I was to give it, but I do want to talk about it here now that I understand.
sort_by() is good if the values you’re sorting on require some kind of complex calculation or operation to get their value.
Let’s say you have an aquarium, and you save the dates of when each fish is born in a database. Later, you want to sort the list of fish by age. But you must calculate the age based on the birth date. So the Fish class has an age method:
class Fish ... def age (Date.today - birthday).to_i end end
So when you sort like this:
>> fishes = Fish.find(:all) >> fishes.sort do |a, b| >> a.age <=> b.age >> end
It will calculate age over and over as it sorts. And if you’ve studied sorting algorithms, you know that the items in the list are compared with other list items repeatedly until it can be determined where the items go in the ordered list. So using this way of sorting, the age will be calculated a lot!
When you use sort_by() instead:
>> fishes.sort_by do |a| >> a.age >> end
It does 3 things:
1. It will first go through each item in fishes, calculate age, and put those values into a temporary array keyed by the value. Let’s say we have 3 fish, one 300 days old, one 365 days old, and one 225 days old. The temporary array looks like this
[[300, #<Fish:A>][365, #<Fish:B>][225, #<Fish:C>]
2. The complex calculation is now done, once for each fish. It sorts this temporary array by the first item in each sub array. Meaning, it sorts by the numbers 300, 365 and 225, without recalculating them.
[[225, #<Fish:C>],[300, #<Fish:A>][365, #<Fish:B>]]
3. Lastly, it goes back through the array, grabbing the 2nd array elements (the actual Fish objects) and putting them in order into a flattened 1-dimensional array
[#<Fish:C>, #<Fish:A>, #<Fish:B>]
So, that is how you end up with a sorted array without recalculating values more than you need to. And that is why sort_by() can be more efficient.