We wanted to be able to get some analytics for the various facilities on RidingResource, and that required some thinking. While Google Analytics is certainly great, and we use it heavily, there are some things that it can’t capture that are valuable data to both us and our customers.
Since RidingResource is essentially a search engine, we realized that there was value in knowing how often a facility’s listing “came up,” either by being viewed directly on its detail page or by appearing on the search results page. Since we also built an API for a partner, which we’ll announce publicly once it goes live, we thought it would be valuable to track API “hits” as well.
Creating the table to store the analytics data was relatively simple. We just created an Analytic model and connected it to the Contact model – Contact is the model that stores the basic information about facilities listed on RidingResource. We use Single Table Inheritance (STI) for the different types of facilities listed, but that’s for another posting.
We realized that there were not a lot of fields necessary for the analytics table. Since the analytics were connected to a contact, we needed to store the contact ID. Since we identified three different types of analytics, we store an integer for the type field, which we may make into an actual model later.
Lastly, we decided it would be valuable to store the parameters that caused the listing to be displayed. It’s entirely possible that there may be more valuable data that we could search on later, so knowing the params of the “hit” could be valuable.
    class Analytic < ActiveRecord::Base
      belongs_to :contact
    end

    class CreateAnalytics < ActiveRecord::Migration
      def self.up
        create_table :analytics do |t|
          t.integer :contact_id
          t.string :parameters
          t.integer :analytic_type
          t.timestamps
        end
      end

      def self.down
        drop_table :analytics
      end
    end
Ruby on Rails is kind enough to automatically store the params for us as serialized YAML. This way, when we want to actually process and dissect them later, we can simply do the following to get the params hash back the way we need it:
    @the_analytics = Analytic.find(:all, :conditions => :some_conditions)
    @the_params = YAML::load(@the_analytics[some_specific_one].parameters)
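To make the round trip concrete, here is a minimal sketch in plain Ruby, outside of Rails. The `search_params` hash and its keys are made up for illustration, and a plain Hash stands in for the real Rails params object, which may serialize with extra type information.

```ruby
require "yaml"

# Hypothetical params hash; the keys and values are invented for this example.
search_params = { "city" => "Atlanta", "discipline" => "dressage", "page" => "2" }

# Serialize to a YAML string, roughly what ends up in the parameters column.
stored = YAML.dump(search_params)

# Later on, turn the stored string back into a Hash.
restored = YAML.load(stored)

puts restored["discipline"]  # prints dressage
```

Since `parameters` is a plain string column, a very large params hash could be truncated by the database, so it may be worth checking the column limit.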
One thing that needed to be carefully considered was storing “hits” on the search results pages. Because the database is currently a little bit nasty, we end up finding all entries in the DB that match a subset of criteria, and then filtering out the rest that don’t match the remaining criteria. This is actually faster than all the weird joins that end up occurring. After that, there is still the matter of pagination. It’s entirely possible that a facility could be pulled from the DB several times without actually being displayed, so we couldn’t just assume that pulling from the DB in the results page was a hit.
What I realized was that Mislav’s will_paginate does something nice for us – it ends up lopping off all the records outside the pagination range, and leaves us with the few for the current page. This enabled us to simply iterate over the paginated records and store the hits.
    @contacts = filter_results(@contact)
    @contacts = @contacts.paginate(:page => params[:page], :per_page => 8)

    @contacts.each do |contact|
      contact.analytics << Analytic.new(
        { :contact_id => contact.id, :parameters => params, :analytic_type => 1 }
      )
    end
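The idea of only counting what actually lands on the current page can be sketched without Rails or will_paginate. Everything here is a stand-in: `Facility`, `SearchHit`, `page_of`, and `record_hits` are hypothetical names, and `Array#slice` plays the role that `paginate` plays in the real code.

```ruby
# Stand-ins for the real models; both names are hypothetical.
Facility = Struct.new(:id, :name)
SearchHit = Struct.new(:contact_id, :parameters, :analytic_type)

SEARCH_RESULT = 1 # the analytic_type used for search-result hits above

# Return only the records that fall on the requested page (1-based).
def page_of(records, page:, per_page:)
  records.slice((page - 1) * per_page, per_page) || []
end

# Build one hit per facility that is actually displayed on the page.
def record_hits(displayed, params)
  displayed.map { |facility| SearchHit.new(facility.id, params, SEARCH_RESULT) }
end

all_matches = (1..20).map { |i| Facility.new(i, "Facility #{i}") }
displayed = page_of(all_matches, page: 2, per_page: 8)
hits = record_hits(displayed, { "page" => "2" })

puts hits.map(&:contact_id).inspect # prints [9, 10, 11, 12, 13, 14, 15, 16]
```

Twenty facilities matched the search, but only the eight shown on page two get a hit recorded.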
Storing the hits for the detail page and for the API was trivial. There’s only one record to grab on the detail page, so obviously someone is looking at it – hit. Since we don’t know what the people at the other end of the API are actually doing with the data, all we can do is record that a record was provided to the API.
So now that we’re storing the analytics, how the heck do we display them? That’s where Open Flash Chart comes into play. Unfortunately, this turned out to initially be a nightmare for many reasons.
When we first started building RidingResource, I was certainly a Rails noob. Not that I am anything other than a noob at this point, but at least I am a little more polished since those early days of not knowing how to do anything. Because I was busy fighting everything at that point, I decided to save myself some headache for the administration area and use Active Scaffold.
Active Scaffold certainly is a nice plugin. It does have a tendency to throw wrenches into the works on occasion because it does some strange things using the Prototype JavaScript library. My first crack at graphing data was to take a look at Flot because it seemed simple and could do the basic things we wanted. The Flot plugin I found (Flotilla) wanted to use jQuery via JRails (don’t think JRuby), which interfered with the Prototype implementation of Active Scaffold. Since the initial graphing was for our admin area, this was out.
After some pondering and question asking in the #rubyonrails channel on Freenode (you can find me there as thoraxe), a few other suggestions came up. Scruffy and Gruff were suggested, but these both used Scalable Vector Graphics. While Firefox supports these today, I was informed that IE does not without a plugin. Our customer base is mostly going to be IE people, and probably not the most tech-savvy. In case these analytics became customer-facing, I did not want to have to worry about teaching non-tech-savvy people how to install browser plugins for IE. Scruffy and Gruff = out.
Next came the Flash implementations, of which there were two notable ones. The first I will mention is Ziya, although we did not ultimately choose it. Ziya charts certainly are sexy, but I decided that the implementation looked difficult and that the charts were a little bit of overkill for what we needed.
Enter Open Flash Chart, our savior. Well, in the end the savior. It was a hell of a headache getting it to work.
There are several implementations of Open Flash Chart in Ruby on Rails. I have to say that, to a certain extent, all of them are a little sucky. Don’t get me wrong – it’s unfair for me to complain about free code that could make me money! But there is something to say about the cleanliness and simplicity of Technoweenie’s code when compared to some of these plugins.
Open Flash Chart is smart. It is a flash file that you basically feed JSON data to generate charts. That makes it simple. Unless you are using the JSON gem already. Which we are because it is used by dancroak’s twitter_search plugin. Which makes things insane. Remember how Active Scaffold was interfering with Flot? Well, here we were again. Something I was already using interfering with something I wanted to do.
To make a long story short, after much headache surrounding various Open Flash Chart plugins that used Rails’ built-in JSONification, one of the plugins that is mentioned on the OFC webpage happened to use the JSON gem itself. Perfect!
Korin’s Open Flash Chart 2 plugin did the trick. I won’t go into the implementation of everything in its entirety, but I will share the following bits which you may or may not find useful.
Korin’s examples use two controller actions to generate the graph. The first action creates the @graph object, which basically just stores a string of code that the swfobject JavaScript library uses to create the proper HTML to display the openflashchart.swf. The second action actually generates the JSON that gets fed into the SWF.
One of the other plugins I had found that did not work for me (because of the previously discussed JSON gem issues) was Pullmonkey’s Open Flash Chart plugin. Pullmonkey did something neat using the respond_to method of MimeResponds in ActionController.
    def show
      # find the contact requested
      @contact = Contact.find_by_url(params[:id])

      respond_to do |wants|
        @all_results  = Analytic.find(:all, :conditions => { :contact_id => @contact.id })
        @data_results = Analytic.find(:all, :conditions => { :analytic_type => 1, :contact_id => @contact.id })
        @data_details = Analytic.find(:all, :conditions => { :analytic_type => 2, :contact_id => @contact.id })
        @data_api     = Analytic.find(:all, :conditions => { :analytic_type => 3, :contact_id => @contact.id })

        wants.html do
          # set up the graph on the request
          @graph_results = ofc2(650, 300, url_for(:action => :show, :format => :json, :graphtype => :results), "")
          @graph_details = ofc2(650, 300, url_for(:action => :show, :format => :json, :graphtype => :details), "")
          @graph_api     = ofc2(650, 300, url_for(:action => :show, :format => :json, :graphtype => :api), "")
        end

        wants.json do
          # provide the JSON back to the flash
          # call the function to generate the graph based on the graph type that is supplied via params
          render :text => results_graph.render
        end
      end
    end
This implementation is a little more elegant, but I’m actually finding that, because both the HTML and JSON generation are happening inside the same controller action, some of my finds are being performed multiple times. This is because the records for these finds are needed by both parts of the action, but the action gets called each time any part of the respond_to gets called. I may end up de-elegantizing this and splitting it back into multiple actions if it becomes a performance issue.
In determining what to do with the analytics data, we decided that initially it made sense to simply graph hits by day. Since we cheated in our creation of the analytics table and used the built-in timestamps, we already had a created_at field which contained the DateTime of the hit. If you look at that last sentence carefully, you realize that DateTime is not Date. So how do you lop off the time part? That still leaves the trouble of calculating how many hits occurred on each day. This is where the elegance of Ruby really shines.
    line_values = []
    x_labels_text = []

    instance_variable_get("@data_#{params[:graphtype]}").group_by { |a|
      Date.ordinal(a.created_at.year, a.created_at.yday)
    }.each do |day, results|
      # this will group all of the analytic hits together by day instead of date/time. it then iterates over
      # these results in a block, where day holds the value of the day of the results, and results is an array
      # containing the individual results from that day.
      x_labels_text << day.to_s     # put the value of the day into the x axis label
      line_values << results.length # how many hits occurred on this day
    end
group_by, ordinal, and iterating over blocks totally saved our butts here. What the above code enabled us to do was to group the entire array of analytics data by the day (after munching the time off using ordinal), and then iterate over the resulting groups. The block lets us both store the day in the array of labels for the x-axis and determine how many hits occurred on that day by using the length of the array of data in the group. Brilliant!
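For anyone who wants to try the grouping without a Rails app, here is a small self-contained sketch. `LoggedHit` is a hypothetical stand-in for an Analytic record, the dates are invented, and `to_date` is used in place of `Date.ordinal` as another way of dropping the time portion.

```ruby
require "date"

# Hypothetical stand-in for an Analytic row; only created_at matters here.
LoggedHit = Struct.new(:created_at)

# Invented sample data: two hits on one day, one on the next.
hits = [
  LoggedHit.new(DateTime.new(2009, 3, 1, 9, 15)),
  LoggedHit.new(DateTime.new(2009, 3, 1, 17, 40)),
  LoggedHit.new(DateTime.new(2009, 3, 2, 8, 5)),
]

# Group by calendar day; to_date drops the time portion of each timestamp.
by_day = hits.group_by { |hit| hit.created_at.to_date }

# Build the x-axis labels and the matching hit counts in date order.
days = by_day.keys.sort
x_labels_text = days.map(&:to_s)
line_values = days.map { |day| by_day[day].length }

puts x_labels_text.inspect # prints ["2009-03-01", "2009-03-02"]
puts line_values.inspect   # prints [2, 1]
```

Sorting the keys is an extra step that the original code does not need if the records already come back in date order.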
As you can see, what started out as a relatively simple idea (“Let’s graph some analytics about the facilities on RidingResource!”) ended up being a non-trivial coding exercise that took us almost a solid day of man-hours. In the end, though, we were left with something simple that does the job and leaves us a lot of room for growth and power.
The only real pain point right now has to do with the actual analytics data. If a day goes by without any hits, nothing gets stored in the database. Since we are grouping by the records that we actually pull out of the database, any dates in the middle with no hits will not be represented. So we are left with the problem of how to determine what dates are “in the middle” that have no hits. It’s not an issue right now, but it may become one in the future. I’m sure we’ll be able to figure it out.
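One possible way to handle those gaps, sketched here as an idea rather than something we have actually built: walk every date between the first and last hit and default the missing ones to zero. The `fill_missing_days` helper and the sample counts are hypothetical.

```ruby
require "date"

# Hypothetical per-day counts, as the grouping step might produce them.
counts_by_day = {
  Date.new(2009, 3, 1) => 2,
  Date.new(2009, 3, 4) => 5,
}

# Walk every date from the first hit to the last, defaulting to zero
# for any day that has no entry in the counts hash.
def fill_missing_days(counts)
  return {} if counts.empty?

  first, last = counts.keys.minmax
  (first..last).to_h { |day| [day, counts.fetch(day, 0)] }
end

filled = fill_missing_days(counts_by_day)
filled.each { |day, count| puts "#{day}: #{count}" }
```

This only fills days between the first and last recorded hits. Padding out to a fixed reporting window would mean passing the start and end dates in explicitly.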