Surfacing Google Analytics stats in DSpace

In the recent survey asking the DSpace community for their top 3 feature requests for DSpace 1.6, the number one most requested feature was statistics. As you’ll know from previous posts, I’m a big fan of Google Analytics.

For the uninitiated, you insert a small bit of JavaScript in your web pages, and Google provide a very rich and powerful analytics service for viewing your site statistics.

Recently Google announced the launch of an analytics API that allows you to remotely query and download the statistics it holds about your site.

I like playing with APIs, so thought I’d write a solution that downloads item splashscreen view statistics from Google Analytics and displays them on the item page:

[Screenshot: Google Analytics hit counter shown on a JSPUI item page]

The solution is quite simple. It requires the addition of one Java class to DSpace. This class should be run daily to download the statistics. The same class is used by the user interface to display the statistics. If you want to implement this solution, follow the instructions below:

<dependency>
<groupId>com.google.gdata</groupId>
<artifactId>gdata-core</artifactId>
<version>1.0</version>
</dependency>

<dependency>
<groupId>com.google.gdata</groupId>
<artifactId>gdata-analytics</artifactId>
<version>1.0</version>
</dependency>

<dependency>
<groupId>com.google.collect</groupId>
<artifactId>google-collect</artifactId>
<version>1.0</version>
</dependency>

<%
    // See if we can display a counter
    String path = "/handle/" + item.getHandle();
    String count = GoogleAnalyticsHitCounter.getPageCount(path);
    if ((count != null) && (!"".equals(count)))
    {
%>
        <table align="center" class="miscTable">
            <tr>
                <td class="oddRowEvenCol" align="center">
                    This item has been viewed <strong><%= count %></strong> times
                </td>
            </tr>
        </table>
<%
    }
%>
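
The JSP fragment above only renders the table when `getPageCount` returns a non-empty string. As a minimal sketch of that lookup, assuming the counts live in a `java.util.Properties` file keyed by page path: the class name `CountLookup` and its method layout are hypothetical and not part of DSpace. It loads through an `InputStream`, which also compiles on Java 1.5 (the `Properties.load(Reader)` overload only arrived in Java 1.6), and it always closes the stream.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper for illustration; not part of DSpace.
final class CountLookup {

    /** Page path -> view count, as stored in the properties file. */
    private final Properties counts = new Properties();

    /** Read counts from a stream in java.util.Properties format, always closing it. */
    void load(InputStream in) throws IOException {
        try {
            counts.load(in);
        } finally {
            in.close();
        }
    }

    /** Convenience overload for a file on disk. */
    void load(File file) throws IOException {
        load(new FileInputStream(file));
    }

    /** Return the stored count, or an empty string if the page is null or unknown. */
    String getPageCount(String page) {
        if (page == null) {
            return "";
        }
        return counts.getProperty(page, "");
    }
}
```

Returning an empty string rather than null keeps the JSP check simple: an item with no recorded views just shows no counter.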

As with the DSpace video player solution I wrote about earlier this week, the code is not perfect and needs a bit more work to make it solid, but it is a good start if you want to use this type of solution. Enjoy!

package org.dspace.app.googleanalytics;

import java.io.IOException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Properties;
import java.util.Calendar;
import java.util.Date;
import java.text.SimpleDateFormat;

import com.google.gdata.client.analytics.AnalyticsService;
import com.google.gdata.data.analytics.DataEntry;
import com.google.gdata.data.analytics.DataFeed;
import com.google.gdata.data.analytics.Metric;
import com.google.gdata.util.AuthenticationException;
import com.google.gdata.util.ServiceException;
import org.dspace.core.ConfigurationManager;
import org.apache.log4j.Logger;

public class GoogleAnalyticsHitCounter {

    /** log4j category */
    private static Logger log = Logger.getLogger(GoogleAnalyticsHitCounter.class);

    /** Hit counter */
    private static Properties counts;

    /** When was the counter last loaded? */
    private static Date lastloaded;

    /** The filename of the counter file */
    private static String filename;

    /**
     * Initialise the system
     */
    public static void init()
    {
        // Pretend the last load was a day ago, so that the first
        // call to loadCounter() below actually reads the file
        Calendar yesterday = Calendar.getInstance();
        yesterday.add(Calendar.DATE, -1);
        lastloaded = yesterday.getTime();
        filename = ConfigurationManager.getProperty("dspace.dir") +
                   System.getProperty("file.separator") +
                   ConfigurationManager.getProperty("googleanalytics.filename");
        counts = new Properties();
        loadCounter();
    }

    /**
     * Get the count for a particular page (e.g. /handle/123/456)
     *
     * @param page The page path
     * @return The count. Empty String if unknown
     */
    public static String getPageCount(String page)
    {
        // Check we're initialised
        if (lastloaded == null)
        {
            init();
        }

        // Reload the hits (only re-reads the file if it is over an hour old)
        loadCounter();

        // Get the value
        if (page == null)
        {
            page = "";
        }
        String count = counts.getProperty(page);

        // Return the value
        if (count != null)
        {
            return count;
        }
        return "";
    }

    /**
     * (Re)load the counter. It is reloaded every hour.
     */
    private static void loadCounter()
    {
        // Do we need to load it?
        Calendar hourago = Calendar.getInstance();
        hourago.add(Calendar.HOUR, -1);
        if (lastloaded.before(hourago.getTime()))
        {
            try
            {
                counts.load(new FileReader(filename));
                lastloaded = Calendar.getInstance().getTime();
            }
            catch (Exception e)
            {
                log.warn("Unable to load google hit counter from " + filename);
            }
        }
    }

    /**
     * Command line method to collect the statistics from Google Analytics.
     *
     * @param args No arguments used
     */
    public static void main(String args[])
    {
        // Set up the variables
        String username = ConfigurationManager.getProperty("googleanalytics.username");
        String password = ConfigurationManager.getProperty("googleanalytics.password");
        String siteid = ConfigurationManager.getProperty("googleanalytics.siteid");
        String startdate = ConfigurationManager.getProperty("googleanalytics.startdate");
        String handle = ConfigurationManager.getProperty("handle.prefix");
        String root = ConfigurationManager.getProperty("dspace.url");
        String filename = ConfigurationManager.getProperty("dspace.dir") +
                          System.getProperty("file.separator") +
                          ConfigurationManager.getProperty("googleanalytics.filename");

        // Get the local path
        String path = "";
        try
        {
            URL localURL = new URL(root);
            path = localURL.getPath();
            if (path.endsWith("/"))
            {
                path = path.substring(0, path.length() - 1);
            }
        }
        catch (MalformedURLException e)
        {
            System.err.println("Invalid dspace.url URL (" + root + ")");
            return;
        }

        AnalyticsService as = new AnalyticsService("gaExportAPI_acctSample_v1.0");
        String baseUrl = "https://www.google.com/analytics/feeds/";

        // Login to Google
        try {
            as.setUserCredentials(username, password);
        } catch (AuthenticationException e) {
            System.err.println("Authentication failed : " + e.getMessage());
            return;
        }

        // The results
        Properties counts = new Properties();

        // Keep requesting pages of results from Google until a blank page is found
        // pages of 1,000 results at a time
        URL queryUrl;
        int i = 1;
        boolean found = true;
        int total = 0;

        // Get stats up until yesterday
        Calendar yesterday = Calendar.getInstance();
        yesterday.add(Calendar.DATE, -1);
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        String enddate = format.format(yesterday.getTime());

        while (found)
        {
            found = false;
            try {
                String q = baseUrl +
                           "data?start-index=" + i +
                           "&ids=ga:" + siteid +
                           "&start-date=" + startdate +
                           "&end-date=" + enddate +
                           "&metrics=ga:pageviews" +
                           "&dimensions=ga:pagePath" +
                           "&filters=ga:pagePath%3D~" + path + "/handle/" + handle + "/[0-9]%2B$";
                queryUrl = new URL(q);
            } catch (MalformedURLException e) {
                System.err.println("Malformed URL: " + baseUrl);
                return;
            }

            // Send our request to the Analytics API and wait for the results to come back
            DataFeed dataFeed;
            try {
                dataFeed = as.getFeed(queryUrl, DataFeed.class);
            } catch (IOException e) {
                System.err.println("Network error trying to retrieve feed: " + e.getMessage());
                return;
            } catch (ServiceException e) {
                System.err.println("Analytics API responded with an error message: " + e.getMessage());
                return;
            }

            for (DataEntry entry : dataFeed.getEntries()) {
                String id = entry.getId().substring(70);
                id = id.substring(0, id.indexOf('&'));
                for (Metric metric : entry.getMetrics()) {
                    counts.put(id, metric.getValue());
                    total = total + Integer.parseInt(metric.getValue());
                }
                found = true;
            }

            i = i + 1000;
        }

        // Save the properties file
        counts.put("total", "" + total);
        try
        {
            counts.store(new FileOutputStream(filename), null);
            System.out.println("Saved " + total + " total hits in " + filename);
        }
        catch (IOException e)
        {
            System.err.println("Error saving results to file: " + filename);
            return;
        }
    }
}
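
One fragile spot in the class above is `entry.getId().substring(70)`, which depends on the exact length of everything that precedes the page path in the entry ID. A slightly more defensive sketch is shown here. It assumes, based only on how the original code slices the string, that the ID contains the page path as a `ga:pagePath=` parameter terminated by `&`; the class name `PagePathExtractor` and the example ID are hypothetical.

```java
// Hypothetical helper for illustration; the entry id layout is an assumption.
final class PagePathExtractor {

    /** Parameter name assumed to precede the page path inside an entry id. */
    private static final String KEY = "ga:pagePath=";

    /**
     * Return the page path embedded in an entry id,
     * or null if the id is null or has no ga:pagePath parameter.
     */
    static String extract(String entryId) {
        if (entryId == null) {
            return null;
        }
        int start = entryId.indexOf(KEY);
        if (start < 0) {
            return null;
        }
        start += KEY.length();

        // The path runs up to the next '&', or to the end of the id
        int end = entryId.indexOf('&', start);
        return (end < 0) ? entryId.substring(start) : entryId.substring(start, end);
    }
}
```

With something like this in place, the loop could skip any entry for which `extract` returns null instead of failing with a `StringIndexOutOfBoundsException`.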

Posted on May 29, 2009 at 4:03 am by Stuart

16 Responses

  1. Written by Bram Luyten
    on May 29, 2009 at 9:58 am

    Great work Stuart!
    In your experience, does the API respond quickly or slowly?

    Would be interested to compare whether it becomes slower for big numbers.

  2. Written by stuart
    on May 29, 2009 at 6:56 pm

    Hi Bram, the stats are downloaded in pages of 1,000 at a time, and this is done ‘offline’ by a daily cron job, so the speed of response from the API isn’t really a problem. (At the moment, it seems to take about a second or so per 1,000 results.)

  3. Written by Hardik Mishra
    on September 18, 2009 at 6:14 pm

    Hello Sir,

    I am running DSpace locally.

    I have successfully implemented the code.

    But I am getting zero hits.

    I have a Google Analytics account in my profile.

    I do have the Website URL: http://localhost:8080/dspace

  4. Written by Stuart
    on September 24, 2009 at 6:57 am

    Have you set ‘googleanalytics.siteid = 123456789’ appropriately to your site id?

  5. Written by Hardik Mishra
    on September 24, 2009 at 5:55 pm

    Hello Sir,

    I have set googleanalytics.siteid = UA-10719208-1
    and I also have profile id = 21619613

    Which one do I need to set?

    Q2: Does the code work locally?

  6. Written by Stuart
    on September 28, 2009 at 9:24 am

    For putting statistics on your site you use the site ID. Unfortunately I don’t think Google Analytics works when running your web site as http://localhost/

  7. Written by Urban Andersson
    on October 27, 2009 at 10:16 pm

    This is very interesting!

    Although it fails to compile on my server – I get the error message “[INFO] Compilation failure/…../GoogleAnalyticsHitCounter.java:[96,6] load(java.io.InputStream) in java.util.Properties cannot be applied to (java.io.FileReader)”

    It refers to the static loadCounter() method, but I am not quite sure what to look for here…
    Java version is jdk1.5.0_15

  8. Written by Stuart
    on October 28, 2009 at 9:52 am

    Hi Urban,

    I think loading the contents of a Properties file using a FileReader was only introduced in Java 1.6.

    Try changing the line:

    counts.load(new FileReader(filename));

    to

    counts.load(new FileInputStream(new File(filename)));

    You’ll also need extra import lines at the top of the file:

    import java.io.File;
    import java.io.FileInputStream;

    Thanks,

    Stuart

  9. Written by Urban Andersson
    on October 28, 2009 at 10:55 am

    Works like a dream. Many thanks!

  10. Written by alessandra bianchi
    on October 30, 2009 at 11:04 pm

    Hi,
    [I think this is the right place to ask...],
    where can I have a look at public DSpace GA stats?

  11. Written by Stuart
    on October 31, 2009 at 9:01 am

    Hi Alessandra,

    I don’t know if there are any public DSpace instances running this code.

    Thanks,

    Stuart

  12. Written by Gary
    on December 17, 2009 at 5:13 pm

    When I get the updated analyticscounts.properties, do I need to restart the JSPUI service in Tomcat to get the updated count?

  13. Written by Stuart
    on December 17, 2009 at 7:50 pm

    Hi Gary,

    I wrote the code a while ago, and haven’t looked at it for a while. Looking at it again, I think it should reload the data every hour, although it was never fully tested (was more of a proof of concept) so it might not work fully.

    Thanks,

    Stuart

  14. Written by Paulo Jobim
    on February 6, 2010 at 12:35 pm

    Dear Stuart,
    I have adapted this code to use with the XMLUI interface.
    I actually store the hit counts in a dc field so I can browse items by hit count. Recently the code stopped working, giving me an error: Authentication failed : Captcha required
    The code still works on an instance of DSpace on my MacBook, but not on the Institute’s server.
    Any hints on how to solve this?
    Paulo

  15. Written by Stuart
    on February 6, 2010 at 12:44 pm

    Hi Paulo,

    I’ve not seen this error myself before. http://code.google.com/apis/gdata/docs/auth/clientlogin.html#Examples looks useful. I think it may be that the API is really intended to present data to a user, rather than to a system that uses it, so a user would be able to re-authenticate and solve the Captcha.

    Another Google help page also suggests:

    If a user supplies an incorrect username or password, or a similar error occurs, the AuthenticationException is thrown. If your application uses ClientLogin to authorize, and a program requests a token too frequently, the user is presented with a captcha challenge response. (links to the URL above).

    I hope that helps,

    Stuart

  16. Written by Michael Guthrie
    on February 10, 2010 at 1:06 am

    Hi Paulo and others looking at incorporating the Google Analytics API into DSpace. Starting from some of the code on this page, we at OpenRepository.com have been able to get a pretty nice result for the statistics of our repositories. For some background on what we did and how we did it, click through to here: http://openrepository.com/products/enhanced-statistics. From there you can click through to the demo repository and see some examples of the API in action. Hope this gives you all some impetus and hope as to what can be achieved.
    Bests,
    Michael
