Working in the mobile web, it's hard to keep up with the constant glut of devices on the market and their rapidly evolving capabilities. Of course, progressive enhancement is key; the markup you use in a mobile site should be minimal and look good without any CSS, JavaScript, or images, which is the worst-case. But we don't want to stop there; we want our sites to look as good as they can on as many mobile devices as possible.
WURFL is here to help with that. A crowd-sourced database of mobile devices, updated regularly, WURFL gives you detailed information on the capabilities of over 10,000 mobile devices indexed by user-agent string and grouped by manufacturer. It's relatively straightforward to set up a cronjob for a rake task that downloads an updated WURFL file and stores it in a table in your database; using an ActiveRecord model, you can then query the data provided by WURFL to get all the details for any device which has a user-agent string in the database.
But device makers throw a wrench in these efforts: user-agent strings are frequently updated or modified from their canonical form without rhyme or reason.
Fortunately, we have the Jaro-Winkler distance algorithm, which gives us a quick-and-dirty matching algorithm for strings that are mostly the same but may vary in arbitrary ways. Using this algorithm, we can run indexed queries for exact matches, then pull in some subset (perhaps matching the first 20 characters of the mystery UA string), iterate over these devices, and find the best match to return. Failing to find any implementation of the Jaro-Winkler algorithm in Ruby, I had to cobble together my own; I hope you find it useful.
module JaroWinkler def self.distance(str1, str2) str1.strip! str2.strip! if str1 == str2 return 1 end # str2 should be the longer string if str1.length > str2.length tmp = str2 str2 = str1 str1 = tmp end lmax = str2.length # arrays to keep track of positions of matches found1 = Array.new(str1.length, false) found2 = Array.new(str2.length, false) midpoint = ((str1.length / 2) - 1).to_i common = 0 for i in 0..str1.length first = 0 last = 0 if midpoint >= i first = 1 last = i + midpoint else first = i - midpoint last = i + midpoint end last = lmax if last > lmax for j in first..last if str2[j] == str1[i] and found2[j] == false common += 1 found1[i] = true found2[j] = true break end end end last_match = 1 tr = 0 for i in 0..found1.length if found1[i] for j in (last_match..found2.length) if found2[j] last_match = j + 1 tr += 0.5 if str1[i] != str2[j] end end end end onethird = 1.0/3 if common > 0 return [(onethird * common / str1.length) + (onethird * common / str2.length) + (onethird * (common - tr) / common), 1].min else return 0 end end end
This will return a value such that 0 <= value <= 1. Matches near the beginning of the strings are weighted more heavily than matches near the end, making this a good algorithm to use with user-agent strings which are typically long and have minor variations toward the end.
That said, this isn't necessarily the optimal method to detect user-agent strings, and since it can easily give false positives in cases where a desktop and a mobile version of the same browser exist, it should be used in conjunction with a whitelist of known desktop browsers (which, though substantial, is much shorter than the list of all mobile browsers). So I offer this code with the disclaimer that Jaro-Winkler may not be the most effective algorithm for user-agent string matching.
By the way, I'm just learning Ruby, so if I'm doing anything atrocious here please let me know.
4 comments:
You might also want to look into the algorithms implemented in the amatch ruby library and see if any of them suit your dataset a little better:
http://amatch.rubyforge.org/doc/classes/Amatch.html
Thanks marzagao, I hadn't seen that library. Will give it a go.
Post a Comment