Ruby Sorting [1] – When and Why to use sort_by()

When I read the rdoc on sort_by, I understood the general idea that sort_by is more efficient in some situations. The specifics on why were still over my head, so I wasn’t planning to get into specifics during my recent talk on sorting. Yet just a few hours before my talk Jim Weirich was still trying to cajole me into using big words like “Schwartzian Transformation” in my talk because, he teased, “using big words makes you sound important :)”

The good thing is that this gave me a chance to talk it out with him, and actually understand it for real. It was too late for me to add that into my talk a few hours before I was to give it, but I do want to talk about it here now that I understand.

sort_by() is good if the values you’re sorting on require some kind of complex calculation or operation to get their value.

Let’s say you have an aquarium, and you save the dates of when each fish is born in a database. Later, you want to sort the list of fish by age. But you must calculate the age based on the birth date. So the Fish class has an age method:

class Fish
  def age
    ( - birthday).to_i

So when you sort like this:

>> fishes = Fish.find(:all)

>> fishes.sort do |a, b|
>>   a.age <=> b.age
>> end

It will calculate age over and over as it sorts. And if you’ve studied sorting algorithms, you know that the items in the list are compared with other list items repeatedly until it can be determined where the items go in the ordered list. So using this way of sorting, the age will be calculated a lot!

When you use sort_by() instead:

>> fishes.sort_by do |a|
>>   a.age
>> end

It does 3 things:

1. It will first go through each item in fishes, calculate age, and put those values into a temporary array keyed by the value. Let’s say we have 3 fish, one 300 days old, one 365 days old, and one 225 days old. The temporary array looks like this

[[300, #<Fish:A>][365, #<Fish:B>][225, #<Fish:C>]

2. The complex calculation is now done, once for each fish. It sorts this temporary array by the first item in each sub array. Meaning, it sorts by the numbers 300, 365 and 225, without recalculating them.

[[225, #<Fish:C>],[300, #<Fish:A>][365, #<Fish:B>]]

3. Lastly, it goes back through the array, grabbing the 2nd array elements (the actual Fish objects) and putting them in order into a flattened 1-dimensional array

[#<Fish:C>, #<Fish:A>, #<Fish:B>]

So, that is how you end up with a sorted array without recalculating values more than you need to. And that is why sort_by() can be more efficient.

Posted in code. Tags: , . 1 Comment »

One Response to “Ruby Sorting [1] – When and Why to use sort_by()”

  1. Gayle Says:

    I came across something else in relation to sort_by, when sorting by multiple fields. Syntax is a little different. You use an array of fields by which you want to sort.

    >> fishes.sort_by do |a|
    >> [a.age, a.species, a.type]
    >> end

    It’s nice because there’s none of that awkward storing of the result (-1, 0, 1) and then remembering to only do your subsequent comparisons if the result is still 0

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: