Ruby Sorting [2] – Common Mistakes When Sorting With Blocks


This sorting technique is one I’ve had a chance to use at work more lately. But what keeps tripping me up is when you use the block to sorting primarily by one field, with a secondary sort on another field. Let’s say Fish has species and type, and we have these fish in our database:

species type
Platy Sunset
Platy Calico
Molly Dalmation
Platy Rainbow
Guppy Fancy Tail
Platy Mickey Mouse

When I sort first by species, then by type, I keep accidentally doing the following, which gives the wrong sorted results:

>> fishes = Fish.find(:all) 

>> fishes.sort do |a,b|
?>   a.species <=> b.species
>>   a.type <=> b.type
>> end

Doing that, I end up with a list like:

species type
Platy Calico
Molly Dalmation
Guppy Fancy Tail
Platy Mickey Mouse
Platy Rainbow
Platy Sunset

It ignores my first sort on species, and ends up sorting only by type!

Why? What’s wrong with that?  Well, the <=> comparator function (also known informally as the “spaceship operator”) returns either -1, 0 or 1, depending on whether the first value is less than, equal to, or greater than the other.  The block will return the last statement evaluated.  So what happens is we compare species, then we then compare type, and it is always the result of the type comparison, -1, 0 or 1, is returned from the block.  The problem is, if species is not equal, then we want to stop there and return -1 or 1 accordingly and not evaluate type at all.

A simple way to do this is to add “if result==0″ to the end of the type comparison, and only evaluate type if species was equal.

>> fishes = Fish.find(:all) 

>> fishes.sort do |a,b|
?>   result = a.species <=> b.species
>>   result = a.type <=> b.type if result == 0 
>>   result
>> end

This way, it will perform the first search by species, then only continue to perform the secondary search if the result of the first search was zero, that is they were equal values. And so I end up with a list like:

species type
Guppy Fancy Tail
Molly Dalmation
Platy Calico
Platy Mickey Mouse
Platy Rainbow
Platy Sunset
Posted in code. Tags: , . 2 Comments »

Ruby Sorting [1] – When and Why to use sort_by()


When I read the rdoc on sort_by, I understood the general idea that sort_by is more efficient in some situations. The specifics on why were still over my head, so I wasn’t planning to get into specifics during my recent talk on sorting. Yet just a few hours before my talk Jim Weirich was still trying to cajole me into using big words like “Schwartzian Transformation” in my talk because, he teased, “using big words makes you sound important :)

The good thing is that this gave me a chance to talk it out with him, and actually understand it for real. It was too late for me to add that into my talk a few hours before I was to give it, but I do want to talk about it here now that I understand.

sort_by() is good if the values you’re sorting on require some kind of complex calculation or operation to get their value.

Let’s say you have an aquarium, and you save the dates of when each fish is born in a database. Later, you want to sort the list of fish by age. But you must calculate the age based on the birth date. So the Fish class has an age method:

class Fish
...
  def age
    (Date.today - birthday).to_i
  end
end

So when you sort like this:

>> fishes = Fish.find(:all)

>> fishes.sort do |a, b|
>>   a.age <=> b.age
>> end

It will calculate age over and over as it sorts. And if you’ve studied sorting algorithms, you know that the items in the list are compared with other list items repeatedly until it can be determined where the items go in the ordered list. So using this way of sorting, the age will be calculated a lot!

When you use sort_by() instead:

>> fishes.sort_by do |a|
>>   a.age
>> end

It does 3 things:

1. It will first go through each item in fishes, calculate age, and put those values into a temporary array keyed by the value. Let’s say we have 3 fish, one 300 days old, one 365 days old, and one 225 days old. The temporary array looks like this

[[300, #<Fish:A>][365, #<Fish:B>][225, #<Fish:C>]

2. The complex calculation is now done, once for each fish. It sorts this temporary array by the first item in each sub array. Meaning, it sorts by the numbers 300, 365 and 225, without recalculating them.

[[225, #<Fish:C>],[300, #<Fish:A>][365, #<Fish:B>]]

3. Lastly, it goes back through the array, grabbing the 2nd array elements (the actual Fish objects) and putting them in order into a flattened 1-dimensional array

[#<Fish:C>, #<Fish:A>, #<Fish:B>]

So, that is how you end up with a sorted array without recalculating values more than you need to. And that is why sort_by() can be more efficient.

Posted in code. Tags: , . 1 Comment »

Ruby Sorting [0] – Sorting a Hash


It figures that one of the questions someone asked, following my talk on sorting last month, was something I specifically chose not to cover for the sake of time, and to be able to cover more valuable topics. So, when someone asked whether you can sort a Hash and how, I knew A) that you can, and B) I knew I had barely glanced at it in the rdoc on hash sorting, but didn’t remember how it worked. So I had to simply suggest they go read about it themselves. But I was also curious to go read about it more myself.  So I took my own advice, and here’s what I came up with:

I define a hash:

irb(main):001:0> typical_fish_colors =
irb(main):002:0*    {"clownfish" => ["orange","white","black"],
irb(main):003:1*     "goldfish" => ["orange"],
irb(main):004:1*     "angelfish" => ["black","white"]
irb(main):005:1>    }

Default sorting on a hash – sorts the keys.

Pretty simple I guess. What you may not expect, depending on what you know of hashes, is that you don’t get a Hash object returned from calling .sort, you get an Array.  A nested, three-dimensional Array. This is because arrays are ordered, hashes are not.

irb(main):006:0> typical_fish_colors.sort
=> [["angelfish", ["black", "white"]], ["clownfish", ["orange", "white", "black"]],
   ["goldfish", ["orange"]]]

What you can’t do when sorting a hash is use sort! (sort-bang):

irb(main):007:0> typical_fish_colors.sort!
NoMethodError: undefined method `sort!' for #<Hash:0x2d17ea8>
 from (irb):7

What else you can’t do when sorting a hash by default is use symbols for keys:

irb(main):008:0> typical_fish_colors =
irb(main):009:0*     {:clownfish => ["orange","white","black"],
irb(main):010:1*      :goldfish => ["orange"],
irb(main):011:1*      :angelfish => ["black","white"]
irb(main):012:1>     }

irb(main):013:0> typical_fish_colors.sort
NoMethodError: undefined method `<=>' for :clownfish:Symbol
 from (irb):13:in `<=>'
 from (irb):13:in `sort'
 from (irb):13
Posted in code. Tags: , . Leave a Comment »
Follow

Get every new post delivered to your Inbox.