Know Ruby: String Accessor
I’ve decided to travel deep into the land of Ruby[1] so that I may better know its secrets. I’ll be scouring it for the interesting, the useful and the inane. Re-examining parts that I thought I knew. Exploring forgotten methods and learning whatever I can. Rather than go it alone, I hope you’ll join me.
We’ll start our journey with the deceptively simple String accessor.
Surely you’ve used the [] method, but are you aware of all that it can do?
Its plethora of signatures makes it the Swiss Army Knife of String methods.
Let’s delve in.
Index
The first signature needs no introduction.
Given an Integer, it’ll return the character located at that index.
Of course, it hasn’t always been that way.
Prior to 1.9.1 it would return the ASCII code.
> 'Aaron'[0]
# 1.9.1+: "A"
# 1.8.7: 65
I for one am glad those dark days are behind us.
There is, however, a vestigial remnant from this past.
Within the current syntax lies the ? character literal.
Before 1.9.1, ? followed by a character would return the ASCII value for that character.
> ?A
# 65
No more late nights memorizing the ASCII table.
The ? was particularly useful when comparing values returned from [].
> 'Aaron'[0] == ?A
# true
From 1.9.1 onward, ? became the equivalent of a single-character String.
> ?A
# "A"
Because of this, existing equality checks worked seamlessly through the transition. Its value these days is… questionable. It does save a character when code golfing. So, I guess it’s not entirely useless.
Continuing on, we can throw a negative Integer at [] to read from the back.
> 'Aaron'[-1]
# "n"
Stepping beyond the bounds will net us a nil for our efforts.
> 'Aaron'[5]
# nil
> 'Aaron'[-6]
# nil
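If you ever miss the old numeric behaviour, here is a minimal sketch of how to get it back on a modern Ruby. The code_at helper is my own invention for illustration, not part of Ruby; it leans on String#ord, which returns the code of the first character.

```ruby
# Hypothetical helper (not part of Ruby): mimic the pre-1.9 behaviour of
# String#[] by returning the character's code instead of the character.
def code_at(string, index)
  char = string[index]          # a one-character String, or nil out of bounds
  char.nil? ? nil : char.ord    # keep nil as nil, otherwise convert to a code
end

code_at('Aaron', 0)   # => 65
code_at('Aaron', -1)  # => 110
code_at('Aaron', 5)   # => nil
```

The explicit nil check keeps the out-of-bounds case from raising a NoMethodError.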
Start, Length
Looking for a group of characters?
Simply provide a starting position and the number in your party.
Note that I said position this time and not index.
Unlike before, we’re not locating a character.
We’re locating the space next to a character.
For example, 'Aaron' has the following 6 positions: '(0)A(1)a(2)r(3)o(4)n(5)'.
Starting at position 0 and requesting 2 characters gets us 'Aa'.
> 'Aaron'[0, 2]
# "Aa"
If we ask for nothing, we’ll get nothing.
> 'Aaron'[0, 0]
# ""
Get greedy and the method will give us what it can.
> 'Aaron'[0, 10]
# "Aaron"
Negative starting positions work backwards from the end.
In the negative direction our positions are '(-5)A(-4)a(-3)r(-2)o(-1)n'.
> 'Aaron'[-2, 2]
# "on"
Negative lengths well… there’s no such thing.
> 'Aaron'[2, -1]
# nil
Once again, anything beyond the bounds yields a nil.
> 'Aaron'[6, 0]
# nil
> 'Aaron'[-6, 0]
# nil
What if we take the last position and ask for a character?
> 'Aaron'[5, 1]
# ""
As long as our starting position is valid and our length isn’t negative, we’re guaranteed a String.
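The position model can be turned into a quick sanity check. This is only a sketch: slice_returns_string? is a hypothetical helper of mine that predicts whether the start, length form gives back a String or nil, and the loop compares that prediction with what Ruby actually does.

```ruby
# Hypothetical helper: predicts whether string[start, length] yields a
# String (true) or nil (false), using the position model described above.
def slice_returns_string?(string, start, length)
  return false if length < 0
  # Valid positions run from -length up to and including +length.
  start >= -string.length && start <= string.length
end

word = 'Aaron'
[[0, 2], [5, 1], [6, 0], [-5, 1], [-6, 0], [2, -1]].each do |start, length|
  predicted = slice_returns_string?(word, start, length)
  actual    = !word[start, length].nil?
  puts "#{word.inspect}[#{start}, #{length}] predicted=#{predicted} actual=#{actual}"
end
```

Every line printed should show the prediction agreeing with the real result.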
Range
The trickster of the bunch.
Passing a Range might seem straightforward enough.
Begin and end with the character indexes you’re looking for.
> 'Aaron'[0..2]
# "Aar"
But what we’ve just seen is a lie.
The beginning and end of the Range are positional.
In “Aaron”, the highest index is 4 but the highest position is 5.
If it’s an index, starting with 5 should return a nil.
> 'Aaron'[5..5]
# ""
We have to go to 6 to get nil.
> 'Aaron'[6..6]
# nil
There’s one more thing to know.
The end isn’t really the end.
The Range always steals one more character.
> 'Aaron'[0..0]
# "A"
Before you decide to write off Range entirely, there are three easy rules to conquer the madness:
- Valid beginning positions guarantee a String. (Remember, the positions are '(0)A(1)a(2)r(3)o(4)n(5)' and '(-5)A(-4)a(-3)r(-2)o(-1)n'.)
- Invalid beginning positions guarantee a nil.
- Valid beginning positions before the final one, paired with an equal or later ending position, return a non-empty String. (Remember, it’s positionally later, not numerically higher. Starting at the final position only ever gets us "", as 'Aaron'[5..5] showed.)
Even with these rules we should avoid Range unless we have a very compelling case for it.
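To see the rules in action, here is a small sketch that sweeps a one-position Range across every beginning position and one step past each end. The range_slice_kind helper is hypothetical, written just for this walk-through. Only beginnings from -5 to 5 should produce a String, and 5..5 should be the lone empty one.

```ruby
# Hypothetical helper: classify what string[range] gives back.
def range_slice_kind(string, range)
  result = string[range]
  return :nil if result.nil?
  result.empty? ? :empty : :non_empty
end

word = 'Aaron'
(-6..6).each do |position|
  kind = range_slice_kind(word, position..position)
  puts "#{position}..#{position} => #{kind}"
end
```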
String
Oh good, an easy one.
Passing a String either finds it or doesn’t.
Found.
> 'Aaron'['ron']
# "ron"
Not Found.
> 'Aaron'['z']
# nil
Easy.
Regexp, [Capture]
Let’s start by ignoring the optional capture argument.
Given a regular expression, [] returns the match or nil.
> 'Aaron'[/[a-z]+/]
# "aron"
> 'Aaron'[/z/]
# nil
It looks a lot like the string matching we saw a moment ago.
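The resemblance only goes so far. A String argument is matched literally, while a Regexp is interpreted as a pattern. A tiny sketch of the difference (the WORD constant is just for illustration):

```ruby
WORD = 'Aaron'

WORD['a.']   # => nil, there is no literal "a." in the word
WORD[/a./]   # => "ar", the dot matches any character
```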
Let’s explore the optional capture argument.
> 'Aaron'[/([a-z]+)([a-z])/, 0]
# "aron"
Using 0 returns the entire match.
It’s the same thing we get with no capture argument.
Not the most useful, but it might be handy if the capture group is determined dynamically.
Everything after 0 returns an individual capture.
> 'Aaron'[/A([a-z]+)([a-z])/, 1]
# "aro"
> 'Aaron'[/A([a-z]+)([a-z])/, 2]
# "n"
Everything before 0 returns an individual capture starting from the back.
> 'Aaron'[/A([a-z]+)([a-z])/, -1]
# "n"
> 'Aaron'[/A([a-z]+)([a-z])/, -2]
# "aro"
As of 1.9.2 we can also do named captures.
> 'Aaron'[/A(?<middle>[a-z]+)(?<last>[a-z])/, 'middle']
# "aro"
If we ask for a capture that doesn’t exist we’ll get nil.
> 'Aaron'[/A([a-z]+)([a-z])/, 3]
# nil
> 'Aaron'[/A([a-z]+)([a-z])/, -3]
# nil
Unless it’s a named capture.
> 'Aaron'[/A(?<middle>[a-z]+)(?<last>[a-z])/, 'does_not_exist']
# IndexError: undefined group name reference: does_not_exist
It feels inconsistent, but it’s not the accessor’s fault.
Regexp just works that way.
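If the inconsistency bothers you, one possible workaround is to check the pattern’s group names before asking. This is a sketch, not a recommendation: named_capture is a hypothetical helper of mine, built on Regexp#names, which lists the names of the groups in a pattern.

```ruby
# Hypothetical helper: like string[regexp, name], but returns nil instead of
# raising IndexError when the pattern has no group with that name.
def named_capture(string, regexp, name)
  return nil unless regexp.names.include?(name.to_s)
  string[regexp, name.to_s]
end

pattern = /A(?<middle>[a-z]+)(?<last>[a-z])/
named_capture('Aaron', pattern, 'middle')          # => "aro"
named_capture('Aaron', pattern, 'does_not_exist')  # => nil
```

That makes a missing named group behave like a missing numbered one.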
If we only want one part of a String, [] can be a concise and fast way to get it done.
Benchmark:
require 'benchmark'

N = 100_000                  # iterations per report
R = /(brown)/                # pattern with a single capture group
S = 'The quick brown fox.'   # subject string

Benchmark.bmbm do |bm|
  bm.report('match') { N.times { R.match(S).captures.first } }
  bm.report('=~')    { N.times { S =~ R; $1 } }
  bm.report('[]')    { N.times { S[R, 1] } }
end
Results:
            user     system      total        real
match   0.110000   0.000000   0.110000 (  0.116550)
=~      0.050000   0.000000   0.050000 (  0.047954)
[]      0.040000   0.000000   0.040000 (  0.041006)
We shouldn’t be surprised that =~ and [] are faster.
Using match generates a MatchData object with lots of other information.
That takes time.
We made it.
Hopefully you’ve learned something about the String accessor.
I certainly picked up a few nuggets of knowledge along the way.
I thought Range was treated like a series of indexes.
Imagine trying to hunt down a bug and overlooking a line because you expect 'Aaron'[5..5] to be falsy (i.e. return nil rather than "").
That’s the danger of mental models that don’t match reality.
Come back for more as we continue to get to Know Ruby.
[1] Version 2.1.1