Quantcast
Channel: December 2010 – End of Line
Viewing all articles
Browse latest Browse all 10

Getting to Know the Ruby Standard Library – Abbrev

$
0
0

This article has been republished on Monkey and Crow.

We’re going to take a look at another little piece of ruby’s standard library, this time it is Abbrev, a tiny library that generates abbreviations for a set of words. We will expand ever so slightly on the one-liner from my last post to show an example of Abbrev in action:

require 'abbrev'
commands = Dir[*ENV['PATH'].split(':').map{|p| p+"/*"}].select{|f| File.executable? f}.map{|f| File.basename f}.uniq.abbrev
commands['ls']   #=> 'ls'
commands['spli'] #=> 'split'
commands['spl']  #=> nil

This will match anything on your path, or any substring that will match only one longer string. Combine this with Shellwords, and you have the pieces for an auto completing console. It could also be used for matching rake tasks, tests, or giving suggestions for mistyped methods.

How it Works

So that is what Abbrev does, but how does it work? If you open up the library (qw abbrev if you have Qwandry), you will see that it is pretty small, there’s just one method, and then a helper that extends array.

Starting from the beginning, we see that it takes an array of words and an optional pattern. It stores the abbreviations in table, and tracks of occurrences of each abbreviation in seen using the counting idiom I mentioned in Hash Tricks.

def abbrev(words, pattern = nil)
  table = {}
  seen = Hash.new(0)
  ...

The pattern can be a RegularExpression or a String:

  if pattern.is_a?(String)
    pattern = /^#{Regexp.quote(pattern)}/	# regard as a prefix
  end

If it’s a String, it is converted to a RegularExpression. Notice that Regexp.quote(pattern) is used so that any characters that have special meanings as RegularExpressions will get escaped. If this pattern is present, it is used to ignore any abbreviations that don’t match it. Next we see how the abbreviations are generated for each word:

  ...
  words.each do |word|
    next if (abbrev = word).empty?
    while (len = abbrev.rindex(/[\w\W]\z/)) > 0
      abbrev = word[0,len]

      next if pattern && pattern !~ abbrev

    	case seen[abbrev] += 1
    	when 1
    	  table[abbrev] = word
    	when 2
    	  table.delete(abbrev)
    	else
    	  break
    	end
    end
  end
  ...

The first part of this sets the current word to abbrev, but skips the word if it is blank. The next part of the loop is a little more confusing, what does abbrev.rindex(/[\w\W]\z/) do? It gives you the index of the last character in the String, as far as I can tell in ruby 1.9 this is equivalent to String#length - 1. So the inner while loop is going to use abbrev = word[0,len] to chop off a character each time until the String is empty. The hash seen is incremented by 1 for this substring. If this is the first time the word has been seen, then the word is recorded. If this is the second time the word has been seen, the word is removed because it is not unique. If the word has been seen more than twice, then not only has this word been seen, but we know that all the substrings of this word have been seen and removed, so the loop exits.

  words.each do |word|
    next if pattern && pattern !~ word

    table[word] = word
  end

  table
end

Finally Abbrev loops through the original words and inserts them. This means that if the array contained “look” and “lookout” they both get added as matches for themselves even though “look” is a substring of “lookout”.

So there you have it, ruby’s Abbrev library explained, go forth and shorten words.


Viewing all articles
Browse latest Browse all 10

Trending Articles