Posts Tagged ‘analytics’

Google Analytics is not a statistics package!

Tuesday, July 29th, 2008

As everyone knows, I’m a big fan of using Google Analytics with repositories to see what your visitors are doing: what they are looking at, which links they are following, where they are coming from, how many people are visiting the site, and so on.

However, from time to time I come across views regarding some of the data that is not captured by Google Analytics. Such data includes visits from users who do not allow JavaScript or cookies, and visitors who click directly on ‘files’ (e.g. PDF files). In this second case, the data isn’t tracked because there is no web page shown from which to run the Google Analytics tracking code. In an attempt to help collect some of this information I have used a script by Patrick H. Lauke which triggers when a user clicks to download a file from a metadata jump-off page. It registers the click with Google Analytics and the download is recorded. But as I said, it doesn’t record direct hits to a file that did not first go via the repository.

Is this a problem? Personally I don’t think so:

  • At least some of the data is now being recorded, which is better than none. It might not be numerically accurate, but hopefully it is still representative of user behaviour.
  • Remember that Google Analytics is an analytics package, not a statistics package. It does not claim to record every click, but is more intended to help with analysing and improving the user experience (e.g. “Do I get more file downloads if I place the list of files above the metadata or below it” or “Do users that land on a browse page download more files than those that arrive directly on an item page”).
  • If you want raw download figures, use a proper statistics system that works from web server logs (e.g. IRStats or a common web stats system such as AWStats). Most likely you’ll want to use both.
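As a rough illustration of the log-based approach, the sketch below counts successful PDF downloads from web server log lines. It assumes the Apache common/combined log format; the function name, sample hosts and paths are invented for illustration, and it is not a description of how IRStats or AWStats work internally.

```python
import re
from collections import Counter

# Matches the request and status fields of an Apache common/combined log line.
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_file_downloads(log_lines, extensions=(".pdf",)):
    """Count successful GET requests per file path for the given extensions."""
    downloads = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if match["method"] != "GET" or match["status"] != "200":
            continue  # only count completed downloads
        path = match["path"].split("?", 1)[0]  # ignore any query string
        if path.lower().endswith(extensions):
            downloads[path] += 1
    return downloads


# Hypothetical sample lines: hosts, paths and timestamps are made up.
sample_log = [
    '192.0.2.1 - - [29/Jul/2008:10:00:00 +0000] "GET /handle/123/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [29/Jul/2008:10:00:20 +0000] "GET /bitstream/123/1/paper.pdf HTTP/1.1" 200 204800 "http://repo.example.org/handle/123/1" "Mozilla/5.0"',
    '192.0.2.7 - - [29/Jul/2008:10:05:00 +0000] "GET /bitstream/123/1/paper.pdf HTTP/1.1" 200 204800 "-" "Mozilla/5.0"',
    '192.0.2.9 - - [29/Jul/2008:10:06:00 +0000] "GET /bitstream/123/9/missing.pdf HTTP/1.1" 404 312 "-" "Mozilla/5.0"',
]

download_counts = count_file_downloads(sample_log)
print(download_counts)
```

The third sample line is a direct hit on the file with no referring page, which is exactly the kind of download that click tracking on the jump-off page cannot see but a log-based count can.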

Tracking repository searches from the inside

Friday, June 6th, 2008

One of the many great features of Google Analytics is that it can show the search terms that visitors to your site have used in search engines. This is a great tool for finding out what brings users to your repository.

Seven months ago Google launched a new feature in Google Analytics that also allows you to track the search terms used by visitors within your repository. It’s very easy to set up: all you need to do is enable the feature and set the query parameter used by your repository. Follow these steps from the help pages:

  1. Log in to your Google Analytics account.
  2. Click ‘Edit’ under Website Profiles for the profile you would like to enable Site Search for.
  3. Click ‘Edit’ from the ‘Main Website Profile Information’ section of the Profile Settings page.
  4. Select the ‘Do Track Site Search’ radio button in the Site Search section of the Edit Profile Information page.
  5. Enter your ‘Query Parameter’ in the field provided. Please enter only the word or words that designate an internal query parameter such as “term,search,query”. Sometimes the word is just a letter, such as “s” or “q”. You may provide up to five parameters, separated by a comma.
  6. Select whether or not you want Google Analytics to strip out the query parameter from your URL. Please note that this will only strip out the parameters you provided, and not any other parameters in the same URL. This has the same functionality as excluding URL Query Parameters in your Main Profile - if you strip the query parameters from your Site Search Profile, you don’t have to exclude them again from your Main Profile.

[Image: Google Analytics Site Search]

For DSpace you need to set the query parameter to query and with EPrints set it to simple.
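To make the idea of a query parameter concrete, here is a minimal Python sketch of what reading and stripping it from a search URL amounts to. The function names, host names and URL paths are hypothetical, and this only illustrates the concept; it is not Google Analytics code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def extract_search_term(url, query_parameters=("query",)):
    """Return the value of the first configured parameter in the URL, or None.

    An empty string means a search was submitted without a search term.
    """
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if name in query_parameters:
            return value
    return None


def strip_query_parameters(url, query_parameters=("query",)):
    """Return the URL without the configured parameters; other parameters are kept."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in query_parameters
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical search URLs for a DSpace-style and an EPrints-style repository.
dspace_url = "http://repo.example.org/simple-search?query=sailing+robot&start=10"
eprints_url = "http://eprints.example.org/search?simple=robot"

dspace_term = extract_search_term(dspace_url, ("query",))
eprints_term = extract_search_term(eprints_url, ("simple",))
dspace_stripped = strip_query_parameters(dspace_url, ("query",))
eprints_stripped = strip_query_parameters(eprints_url, ("simple",))

print(dspace_term, "|", dspace_stripped)
print(eprints_term, "|", eprints_stripped)
```

Only the configured parameter is removed when stripping, so the `start` parameter in the first URL survives, matching the note in step 6.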

To view the results, follow the links shown in the image (Content -> Site Search) and explore the results.

Here are some interesting statistics from our repository as an example of the extra stats it can provide:

  • 89% of visits did not make use of a site search, whilst the remaining 11% did.
  • 39% of search users left the system having performed the search without going any further (e.g. looking at one of the items found by the search).
  • 22% of searches resulted in search refinements being undertaken by the searcher.
  • 50% of searches were performed from the repository homepage, the remainder from item, collection and community pages.
  • Following a search, the average visitor stayed on the site for a further 1 minute and 30 seconds.
  • 8% of searches were performed without the visitor having entered a search term.

Repository bounce rates

Monday, May 26th, 2008

I’ve often wondered about what people do when they visit a repository, and whether what they are doing while visiting the repository could be considered ‘good’ in terms of the usefulness and general aims of the repository. Let me explain… I’m a big fan of Google Analytics, and one of the things it lets you see is what people do once they get to your repository. For each page it can show where they came from, how long each user stayed there, and whether they ‘bounced’ straight off to another web site afterwards (that is, Google Analytics on your repository did not encounter another view from that user in their browsing session), or whether they stayed within your repository to hopefully view more items.

The help file for Google Analytics describes the bounce rate as:

Bounce Rate: Bounce rate is the percentage of single-page visits (i.e. visits in which the person left your site from the entrance page). Bounce rate is a measure of visit quality and a high bounce rate generally indicates that site entrance (landing) pages aren’t relevant to your visitors. You can minimize Bounce Rates by tailoring landing pages to each keyword and ad that you run. Landing pages should provide the information and services that were promised in the ad copy.

If you consider an e-commerce website such as Amazon, then this description, and the aim of reducing the bounce rate, must hold true. If your visitor searched for an item in a search engine, came to your website, viewed the item, and then ‘bounced’ away, you have lost the sale and the visitor took their business elsewhere. That is ‘bad’.

However, what is the purpose of a repository?

If you take the view that a repository (of the open access persuasion) is there to provide access to resources, then a bounce may not be so bad after all. Imagine the following scenario:

“I’m a researcher in the field of building robotic sailing boats. I’ve read an article that cites a paper by the title of ‘An Autonomous Sailing Robot for Ocean Observation’. So I duly perform a search using Google Scholar and see that a paper by that title is the top result. I visit the link and find myself in a repository which holds that paper. I download the paper, and go on my way, happy to have found what I wanted.”

Within Google Analytics we would see several different aspects of this visit:

  1. We’d see the visit to the metadata jump-off page.
  2. We’d see that the visitor came from Google Scholar.
  3. We’d see the search term that was used by the user within Google Scholar.
  4. We’d see that the visitor stayed on the metadata jump-off page for say 20 seconds.
  5. Then… nothing. In other words, it would be registered as a bounce.

So in traditional analytics terms this looks like a bad visit. However, was it? Clearly not. The visitor got what they wanted, and the repository has done its job. Why did Google Analytics not register the fact that the visitor read the PDF version of the paper though?

Unlike website log file analysis software (e.g. AWStats), Google Analytics can’t see every single interaction between the user and the web server. It can only see pages which include a small bit of JavaScript that sends the details of the visit to Google. So in the case of the repository, the metadata jump-off page contains the code so Google Analytics knows about the visit, but the PDF cannot contain the code. Google Analytics therefore doesn’t know about the successful download of the PDF. Maybe one day Google will address this issue in some way? It would be great if they could.
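The effect on the bounce rate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and visit paths are hypothetical, a file download is assumed to be any request ending in .pdf, and a bounce is taken to be a visit with exactly one tracked pageview, following the help file definition quoted earlier.

```python
def tracked_pageviews(requests, untagged_extensions=(".pdf",)):
    """Keep only the requests that page tagging can see (assumed: anything that is not a file)."""
    return [path for path in requests if not path.lower().endswith(untagged_extensions)]


def is_bounce(requests):
    """A visit is a bounce when exactly one tracked pageview was recorded."""
    return len(tracked_pageviews(requests)) == 1


def bounce_rate(visits):
    """Percentage of visible visits that are bounces.

    Visits with no tracked pageview at all (e.g. a direct hit on a PDF)
    never reach the analytics package, so they are left out entirely.
    """
    visible = [visit for visit in visits if tracked_pageviews(visit)]
    if not visible:
        return 0.0
    bounces = sum(1 for visit in visible if is_bounce(visit))
    return 100.0 * bounces / len(visible)


# Hypothetical visits, each listed as the requests the web server received.
researcher_visit = ["/handle/123/1", "/bitstream/123/1/paper.pdf"]
visits = [
    researcher_visit,                                           # jump-off page, then PDF
    ["/", "/simple-search", "/handle/123/7"],                   # browsing visit
    ["/bitstream/123/4/thesis.pdf"],                            # direct PDF hit, invisible
    ["/browse", "/handle/123/2", "/bitstream/123/2/data.pdf"],  # two pages, then PDF
]

rate = bounce_rate(visits)
print(is_bounce(researcher_visit), round(rate, 1))
```

The researcher’s visit counts as a bounce even though the web server delivered the paper, and the direct PDF hit does not appear in the figures at all.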

The repository has served its purpose, and the visitor got what they were after, but is it also the job of the repository to hold the user and to attract them to other related items in the repository? There are many ways this could be done, a subject for another day, but these will no doubt include elements of Web 2.0, social networking and item suggestion. This issue does, though, highlight one of the origins and ongoing features of Google Analytics: that of supporting e-commerce sites, particularly those that make use of its AdWords scheme.

But for me, for now, I think I’m reasonably happy with a bounce!