Archive for May, 2008

Remotely killing RDC users

Tuesday, May 27th, 2008

Part of my job involves logging in to Windows servers for one reason or another. Unless the machine has died, it is usually easiest and quickest to do this from my desk. However, other people often forget to log off from the same server and take up all the connections, leaving none free to log in with. Usually this requires a physical trip to the machine to chuck them off. But no longer!

This blog post gives a great way of doing it locally, by bringing up a command prompt for the remote server, and then killing anyone who is logged in, but disconnected. For my own reference, here are the things to do:

  1. Download PsTools from Microsoft
  2. Run: psexec \\x.x.x.x -u user -p password cmd (e.g. psexec \\x.x.x.x -u DOMAIN\user -p ******* cmd)
  3. Type ‘qwinsta’ on the remote server to list the logged in users
  4. Type ‘logoff 2 /v’ (replace the ‘2’ with the relevant session number)
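
If there are a lot of sessions to wade through, steps 3 and 4 could be scripted. The following is only a rough Python sketch, not something from the original post: it assumes the qwinsta output has a header row containing a USERNAME column, followed by the session ID and state, and that disconnected sessions are shown with the state "Disc". The function names are made up for illustration, and a username made up entirely of digits would confuse it.

```python
def disconnected_sessions(qwinsta_output):
    """Return (session_id, username) pairs for disconnected user sessions.

    Assumes the column layout described above: a header line containing
    USERNAME, then one line per session with username, ID and state.
    """
    lines = [ln for ln in qwinsta_output.splitlines() if ln.strip()]
    if not lines or "USERNAME" not in lines[0]:
        return []
    # Everything left of the USERNAME column is the session name, which
    # is blank for disconnected sessions, so slice it away.
    user_col = lines[0].index("USERNAME")
    sessions = []
    for line in lines[1:]:
        fields = line[user_col:].split()
        # Rows with no username start with the numeric ID: skip them,
        # as they are system or listener sessions rather than people.
        if len(fields) < 3 or fields[0].isdigit():
            continue
        username, session_id, state = fields[0], fields[1], fields[2]
        if state == "Disc" and session_id.isdigit():
            sessions.append((int(session_id), username))
    return sessions


def logoff_commands(qwinsta_output):
    """Build one 'logoff <id> /v' command per disconnected session."""
    return ["logoff %d /v" % sid
            for sid, _user in disconnected_sessions(qwinsta_output)]
```

The idea would be to paste the text printed by qwinsta into logoff_commands() and then check the resulting commands by eye before running any of them on the remote prompt.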

No more wasted time walking to the machine room!

Shibboleth, SWORD, and DSpace 1.5

Tuesday, May 27th, 2008

It was nice to see the announcement recently from the MAMS (Meta Access Management System) project at Macquarie University in Australia that they have implemented Shibboleth authentication for DSpace 1.5. It makes use of the stackable authentication system, and is therefore very nicely integrated with the DSpace architecture.

I’ve been playing with Shibboleth a bit recently, trying to get Blackboard on Windows to work with it. Apparently it works on Unix, but no one knows about Windows. To cut a long story short, I got it almost working. Shibboleth will get our local identity provider to perform the authentication, but unfortunately the IIS Shibboleth ISAPI filter sets the username in the HTTP_REMOTE_USER header, rather than the REMOTE_USER header. Blackboard isn’t configurable enough to look in either. So an enhancement request has been submitted!

Anyway, I was thinking it would be nice to convert our DSpace instance to use Shibboleth authentication rather than LDAP. The great power of Shibboleth will be when all our systems use it, and we only have to log in to the IdP once. But I hit a snag in my thoughts…

I’ve just told a professor in our university that he could use our SWORD interface to remotely deposit some data that he is creating on a geographically remote server. This means he can periodically automatically archive the data in order to abide by the terms and conditions of the AHRC grant that is funding his work. How would SWORD work with Shibboleth? SWORD works with HTTP basic, and it would be hard to delegate this in the background to Shibboleth. So does that mean I can’t use Shibboleth?

But then I remembered the modular structure of DSpace. Each module (such as the user interface, the OAI-PMH interface, the SWORD interface) is deployed to the application server separately. Each module can therefore have its own configuration. Normally they would share a single configuration, but I could use different configurations for each one. The normal user interface can use Shibboleth for authentication, and the SWORD interface can keep using LDAP. What might be even better would be to use the local in-built password system in DSpace for SWORD, so that the professor doesn’t have to embed his university username and password in a script.

So to conclude - we can use Shibboleth with DSpace whilst still having a working SWORD interface. Nice!

Repository bounce rates

Monday, May 26th, 2008

I’ve often wondered about what people do when they visit a repository, and whether what they are doing while visiting the repository could be considered ‘good’ in terms of the usefulness and general aims of the repository. Let me explain… I’m a big fan of Google Analytics, and one of the things it lets you see is what people do once they get to your repository. For each page it can show where they came from, how long each user stayed there, and whether they ‘bounced’ straight off to another web site afterwards (that is, Google Analytics on your repository did not encounter another view from that user in their browsing session), or whether they stayed within your repository to hopefully view more items.

The help file for Google Analytics describes the bounce rate as:

Bounce Rate: Bounce rate is the percentage of single-page visits (i.e. visits in which the person left your site from the entrance page). Bounce rate is a measure of visit quality and a high bounce rate generally indicates that site entrance (landing) pages aren’t relevant to your visitors. You can minimize Bounce Rates by tailoring landing pages to each keyword and ad that you run. Landing pages should provide the information and services that were promised in the ad copy.
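
To make that definition concrete, here is a minimal Python sketch of the arithmetic. It is my own illustration rather than anything Google Analytics provides: it assumes you already have the number of pages viewed in each visit, and the function name is hypothetical.

```python
def bounce_rate(pages_per_visit):
    """Percentage of visits that viewed exactly one page.

    pages_per_visit is a list with one entry per visit, giving the
    number of pages seen in that visit (an assumed input format).
    """
    if not pages_per_visit:
        return 0.0
    bounces = sum(1 for pages in pages_per_visit if pages == 1)
    return 100.0 * bounces / len(pages_per_visit)
```

So four visits with page counts of 1, 3, 1 and 2 contain two single-page visits, giving a bounce rate of 50%.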

If you consider an e-commerce website such as Amazon, then this description, and the aim of reducing the bounce rate, must hold true. If your visitor searched for an item in a search engine, came to your website, viewed the item, and then ‘bounced’ away, you have lost the sale and the visitor took their business elsewhere. That is ‘bad’.

However, what is the purpose of a repository?

If you take the view that a repository (of the open access persuasion) is there to provide access to resources, then a bounce may not be so bad after all. Imagine the following scenario:

“I’m a researcher in the field of building robotic sailing boats. I’ve read an article that cites a paper by the title of ‘An Autonomous Sailing Robot for Ocean Observation’. So I duly perform a search using Google Scholar and see that a paper by that title is the top result. I visit the link and find myself in a repository which holds that paper. I download the paper, and go on my way, happy to have found what I wanted.”

Within Google Analytics we would see several different aspects of this visit:

  1. We’d see the visit to the metadata jump-off page.
  2. We’d see that the visitor came from Google Scholar.
  3. We’d see the search term that was used by the user within Google Scholar.
  4. We’d see that the visitor stayed on the metadata jump-off page for, say, 20 seconds.
  5. Then… nothing. In other words, it would be registered as a bounce.

So in traditional analytics terms this looks like a bad visit. However, was it? Clearly not. The visitor got what they wanted, and the repository has done its job. Why did Google Analytics not register the fact that the visitor read the PDF version of the paper though?

Unlike website log file analysis software (e.g. AWStats), Google Analytics can’t see every single interaction between the user and the web server. It can only see pages which include a small bit of JavaScript that sends the details of the visit to Google. So in the case of the repository, the metadata jump-off page contains the code so Google Analytics knows about the visit, but the PDF cannot contain the code. Google Analytics therefore doesn’t know about the successful download of the PDF. Maybe one day Google will address this issue in some way? It would be great if they could.
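
By way of contrast, the web server log does record the PDF request, so a few lines of scripting can count the downloads that the JavaScript approach misses. This Python sketch assumes log lines in the common Apache access log style (request in double quotes followed by the status code). The function name and the example path in the note below are made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it, e.g.
# "GET /some/path HTTP/1.1" 200 1234   (assumed log layout)
REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3}) ')


def pdf_downloads(log_lines):
    """Count successful (status 200) requests for .pdf files, per path."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue
        path, status = match.group(1), match.group(2)
        path = path.split("?")[0]  # ignore any query string
        if status == "200" and path.lower().endswith(".pdf"):
            counts[path] += 1
    return counts
```

A log line requesting something like /bitstream/123/456/1/paper.pdf with a 200 status would be counted once against that path, whereas the request for the metadata jump-off page would be ignored.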

The repository has served its purpose, and the visitor got what they were after, but is it also the job of the repository to hold the user and to attract them to other related items in the repository? There are many ways this could be done, a subject for another day, but these will no doubt include elements of Web 2.0, social networking and item suggestion. This issue does, though, highlight one of the origins and ongoing features of Google Analytics - that of supporting e-commerce sites, particularly those that make use of its AdWords scheme.

But for me, for now, I think I’m reasonably happy with a bounce!