Is the Repository Developer a dying breed?

Is the Repository Developer a dying breed, and should we care?

Cast your mind back, perhaps seven or eight years.  It was the heyday of repository development.  Projects such as DSpace and EPrints were taking off, and institutions around the world were watching the area closely and with excitement to see where this glorious new world would take us.

But back in those days repositories were similar to the early motor car – you needed a lot of money, several years, and your own mechanic/driver (developer) to make it work.  Luckily, back then these resources were often available, money was perhaps a little easier to come by, and there were many funding opportunities from the likes of JISC to help out too.

As the repository developer worked with the repository software for a few years, they became intimately familiar with it – they knew how it worked, how it was structured, what it could and couldn’t do, and how to structure data within the repository – and often became key players in the development of the open source platforms by taking on roles such as DSpace Committership.  Life was good, and I was lucky to be part of this, riding the waves of e-theses, JISC projects (Repository Bridge, ROAD, RSP, Deposit Plait, SWORD), and the start of local institutional open access advocacy movements.

However… life moves on.  The early repository developers have taken different career paths, and now find themselves in different situations.

  • Some have left the domain as project funding slowed down, repositories could be implemented without a dedicated developer, and new areas of interest arose.
  • Some have progressed in their careers, and chosen to take a non-technical route up the tree.
  • Some have taken a commercial route, choosing to take their skills into the commercial sector and providing development services back to repository-using institutions.
  • Others have specialised as repository developers, but often find their emphasis has to be on compliance or marketing issues such as statistics, research assessment, or branding, rather than continuing to develop and apply core repository functionality.

It is rare to find a role these days where a developer can specialise in repositories, spending the majority of their time in that area.

I believe there is still a need for repository developers, as they bring many benefits such as:

  • An understanding of the technology that helps them to know when repository technology can and should be applied, and when it should not.  Repositories often do not appear to be suitable choices for some of our system requirements because we are used to confining them to electronic theses and journal papers, but they have great potential in new areas.
  • They know the underlying technology and data structures used by repositories, and how these can be mapped onto new domains.  This can save institutions time and money, as they can re-use their existing repository infrastructure and expertise rather than investing in new systems.
  • Conversely, they know the weaknesses of repositories, and where current or future functionality will not be suitable.
  • They provide technical credibility.  Repositories are often run by libraries, in environments where IT departments may hold varying views of the library’s technical development competence.
  • They make the running of the current repository or repositories smoother, and can help manipulate the data they contain (import / export / update / delete) in ways that are not supported by the native interfaces.
  • Repositories are starting to become integration targets for enterprise systems such as CRIS systems.  Having a repository developer around can make these integrations easier.

I think we’re seeing a downward spiral in the availability of repository developers.  From time to time you see job adverts seeking experienced repository developers, and unfortunately they seem to be becoming a rare breed – a breed which I think we should protect, recognise, foster, and grow.  I’m lucky to have worked in several large institutions lately where there have been small, effective, embedded, and valued repository development teams.  However these types of teams are starting to become fewer and harder to find.

Opportunities for repository developers to network, learn, and share, are decreasing.  There are exciting events such as Open Repositories with their Developers Challenge, and the Repository Fringe, but these are only annual events, and do not provide the opportunity for repository developers to show their skills and the potential of repository technologies to anyone outside of the repository community.

How will this affect the repository community?  I think that there will be increasing problems in the repository world if repository developers become a very rare breed:

  • The large open source platforms that we have come to rely upon (installations of platforms such as DSpace, EPrints, and Fedora number in the thousands) will find it harder to continue to develop and keep pace with current requirements.  The large amount of development effort that has gone into these systems over the past decade could be wasted, and we’ll fail to see some of the benefits that only come to fruition after this length of maturity.
  • There will be fewer exemplars of good practice of repository use to inspire and drive forward the innovative use of repositories.
  • Those who administer repositories will lose their local allies who are able to provide the tools and integrations to make repositories a local success.
  • The potential for repositories to be involved in new hot topics such as Research Data Management, the resurgence of interest in open access publishing, or the need for better digital preservation may be missed.
  • It will be even harder to recruit experienced and passionate repository developers, and without well-established teams of these, new developers thrust into the arena will find it harder to grow their skills and knowledge.

What should we, and can we, do about it?  I think that we need to value the role of the repository developer, and to recognise that although a dedicated developer is no longer required in order to run a repository, the absence of development skills may prevent a good repository service from becoming a great one.  Just as we value multi-talented systems librarians, we should value the repository developer as a multi-skilled employee who allows us to correctly apply and integrate repository technologies.

Looking around at commercial companies that offer repository development services (for example Cottage Labs and atmire) we see the sort of innovative thinking that has so much potential in this area, and when I talk to staff involved with these companies there seems no shortage of people wanting their skills.  And this is good, and shows that there is a demand.  But equally I feel we need to keep growing these skills within institutions, and not let the local Repository Developer become a dying breed.

Repositories are in their teenage years: we nurtured them through birth, messy childhoods, and promising early years, and now we’re starting to get a glimpse of how they can become powerful embedded tools.  But without the continued availability of skilled parents to shepherd their development, they may never reach their full adult potential.

[This blog post was written on the way to a DevCSI event for managers of developers, where we shall be looking at how we can show the positive impact of having local development teams within universities, and from my perspective and passion, their particular value to libraries.]

Thoughts on the Elevator

The JISC have been running an experimental funding system known as the JISC Elevator.  The introduction on the site’s homepage describes the concept well:

JISC elevator is a new way to find and fund innovative ways to use technology to improve universities and colleges. Anyone employed in UK higher or further education can submit an idea. If your idea proves popular then JISC will consider it for funding. The elevator is for small, practical projects with up to £10,000 available for successful ideas. So if you have a brainwave, why not pitch it on the elevator?

A small team of us from the University of Edinburgh Digital Library submitted a proposal: The Open Access Index #oaindex.  The video submission is shown below…

I’ve previously blogged about the experience of creating this submission.  This post however contains a few observations about the Elevator concept, and the proposals that have been submitted.

First off – I’m a big fan of this system for a number of reasons:

  • It gave us an avenue to submit this type of proposal for a small amount of funding (only a few thousand pounds)
  • It provided us with a public platform and forum to socialise and discuss the idea
  • It adds a more open peer review stage to the process
  • It could encourage proposals from first-time bidders (although the public nature of it might put some people off?)

It will be interesting to see if or how the concept evolves over time.  Last week I got to chat about this with Andy McGregor, the JISC Programme Manager in charge of it, and Owen Stephens.  A few ideas that arose include:

  • Restrict the number of votes that any one person can place – make voters think harder about which ideas are most worthy of funding as there is only a limited number of projects that can be funded
  • Perhaps allocate each voter a set of votes or mock money or shares – they decide how they invest them across the proposals (all to one great idea, or spread across a few)
  • Be more transparent about the funding each project has requested and who has voted

I’ve been following the different ideas as they’ve been submitted, and a few trends have surprised me.

The first relates to the funding band that each proposal falls into.  There are three funding bands: up to £2,500, up to £5,000, and up to £10,000.  A total of £30,000 is being made available to fund some of the submissions.  I can’t tell which band the proposals that have already received enough votes fall into, but for those that are still collecting votes, the breakdown is as follows:

I was surprised at the number that requested the full £10,000.  Of course, it could be that those in the ‘unknown’ category (those which have already received the number of votes they require) are all in the lower bands, therefore required fewer votes, and are now fully voted for.  When pitching an idea, I always consider the amount of money available, and therefore the likelihood of receiving a given share of that money.  In this case, due to the very limited funding, we chose to submit a proposal in the lowest band to (hopefully) increase our chances.

The second aspect of the submissions that struck me was the domain to which the submission relates. I’ve split these up into three very broad (and arguably very bad) categories: Learning (students / learning enhancement), IT (systems, development) and Library (materials, metrics).

The ‘Learning’ category received by far the most submissions, with IT and Library lagging far behind.  Indeed our #oaindex proposal seems to be the only one in the library domain.  Why is this?  Perhaps the amounts available are much lower than those the IT and Library domains are used to bidding for?  Perhaps there are fewer funding opportunities in the learning domain?  Are those in the learning domain better at seizing these new and innovative funding opportunities than those in the library or IT domains?  Discuss…!

When we created our video, we ensured that we mentioned who we were and which institution we worked for.  However it didn’t cross our minds to include any sort of branding in our submission.  I only thought of this when watching some of the others.

We were not alone: only a few submissions included any branding.  Did we miss an opportunity here, or is the brand somewhat irrelevant to the format of an elevator pitch: should voters be influenced more by the idea than by the host institution?

Our submission took the format of ideas being drawn on a whiteboard, with a voice-over in the background.  I’ll openly admit that this was because none of us really wanted to stand in front of a camera for 3 minutes.  Given how much we laughed during the simple voice recording, I think doing this in front of the camera would have taken even longer.  Sorry – we didn’t keep the out-takes!!!

Voting for the proposals ends in a few days, and I’m looking forward to seeing which get funded, which don’t get enough votes, and whether or not the concept continues.  But the scheme certainly gets my vote for the periodic allocation of small amounts of funding for great ideas!

[ The data that I’ve collected on the proposals can be seen at: https://docs.google.com/spreadsheet/ccc?key=0AgXAkDGxqBWYdHR1c0l6Uzk5aDFfQzlaM0ZtV04wcVE I’d be happy to receive updates or corrections.]

A tale of two bids

This is a tale of two bids; two recent JISC bids to be precise.  One was submitted via the ‘traditional’ route, and one via the experimental ‘Elevator’ route.  This blog post is a brief reflection on these, and a comparison of the experience, in particular the effort involved.

First, I should provide a brief explanation of the two routes:

  • The ‘traditional’ route: Traditionally JISC requires bid proposals to be submitted as text documents, usually in the range of 6 to 12 pages.  These include cover sheets, budgets, benefits and risks, a bit about the people involved, and of course an explanation of the problem that will be investigated.  On top of that, there are letters of support and FOI checklists.  As part of the recent JISC Digital Infrastructure call, we submitted a couple of bids.  What we bid for is somewhat irrelevant, but I will disclose that the two bids we submitted were requesting funding of approx £30,000 each.  These proposals will now be marked by internal and external markers, followed by a panel decision.
  • The Elevator route: JISC are currently running an experimental funding stream, known as the JISC Elevator.  The idea is that proposals should be lightweight, consisting of a brief video presentation, along with a few words.  No budgets, no letters of support, no FOI statements – just an elevator pitch about the idea.  This is the first difference.  The second difference is that ‘the crowd’, which in this case consists of anyone with a .ac.uk email address, are allowed to vote on which projects should be considered for funding.  Any that get enough votes will go forward for consideration by a panel.  The number of votes required is proportional to the amount requested, with the three bands being up to £2,500, up to £5,000, and up to £10,000.  We pitched at the bottom end of this scale, meaning that we required 50 votes (which we received in less than 24 hours).

I’ll openly admit that the traditional route is often stressful.  It takes around one week of effort (full time), usually spread over 3 or 4 weeks.  The final days tend to get quite frantic as everything is pulled together: we go through internal reviews and consents, seek letters of support, and pull the bid together for final submission.

In comparison, our pitch for the elevator took about half a day – an hour to refine the idea and seek approval, an hour to write a script, an hour to record the voices, an hour to make the video, and a few minutes to upload it.

The feeling at the end of the elevator process was markedly different to the end of the traditional process – and this felt good.  However, when you look at the sums (adjusted slightly to make the numbers easier)…

Elevator: 1/2 day = £3,000 potential funding
Traditional: 5 days = £30,000 potential funding

So the actual potential return per hour invested in bid writing is the same!
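As a sanity check on that claim, the arithmetic can be sketched in a few lines (the figures are the rounded ones above, not exact accounts):

```python
# Rough comparison of potential funding per day of bid-writing effort,
# using the rounded figures above (illustrative, not exact accounts).
elevator_days, elevator_funding = 0.5, 3_000
traditional_days, traditional_funding = 5.0, 30_000

rate_elevator = elevator_funding / elevator_days        # pounds per day
rate_traditional = traditional_funding / traditional_days

print(rate_elevator, rate_traditional)  # both come out at £6,000 per day
```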

However if I extrapolate this to other bids I have written in the past, some of which have been for higher amounts, the trend does not seem to continue in a linear fashion.

My personal experience (your mileage may vary – it would be good to compare notes!) is that bids in the range of thirty to perhaps two hundred thousand pounds take a similar amount of time: a week or so for a primary proposal author, and various time commitments from other parties.  But bids above this amount start taking longer again, as project complexities, often around collaboration and external involvement, kick in.

What can I conclude from this?  I’m not sure really!  Feel free to draw your own conclusions and to make comparisons with your own experience.

What I can say, is that we enjoyed the process of making, submitting, and then publicising our elevator pitch.  We felt that we had more freedom to be inventive with our interpretation of the submission requirements, and felt quite refreshed at the end of the process, rather than frazzled!

Now we await the outcomes of both…

Back in the UK

This is just a quick blog post to say that I am now back in the UK, following almost three wonderful years in Auckland, New Zealand at The University of Auckland.

Having worked in England and Wales, this time I’ll be in Scotland, as the Head of Digital Library Services at the University of Edinburgh.  This role currently encompasses acquisitions, metadata, e-resources, digital library development, information systems, repositories, and research publications.  Exciting new areas such as research data management are also on the cards.

I’m very much looking forward to working in the UK, and meeting up with ex-collaborators, colleagues, and friends.

Edinburgh is of course the host venue for this year’s Open Repositories conference, so I hope to see many of you there!

Building a ‘blogliography’

I shall shortly be moving on to pastures new from my current role at The University of Auckland Library. Before I depart, I wanted to document a few of the projects that I have worked on during my (almost) three years in Auckland, the first of which is contained in this post: a project to build a new online bibliography for the New Zealand Asia Institute.

As a library, we host a lot of online collections, many of which are simple bibliographies of materials relating to a particular subject.  In this case we worked with the library’s Business and Economics subject team to build an online bibliography of materials concerning the business interactions between New Zealand and Asia.

Here is a list of some of the high-level requirements we were given:

  • Small scale (a couple of thousand records)
  • Import data from an EndNote library (initial import followed by periodic updates)
  • Multilingual content (English, Chinese, Japanese, Korean)
  • Additional static content
  • Provision of RSS feeds

Traditionally our library has used a product from http://www.inmagic.com/ to deliver this sort of site.  However this time we tried something a little different… we built it using blogging software.  To be more precise, we built it using the WordPress blogging platform (the same software that powers this blog).

Here are some of the reasons that we chose WordPress:

  • WordPress sites can contain a mixture of blog entries (in this case bibliography entries) and static content.  NZAIS has a static home page and other static content, along with lots of entries.  Each entry in the bibliography is a blog post.
  • Being a blogging platform, certain features such as word clouds and RSS feeds are part of the standard configuration.
  • Like all well-mannered systems, it defaults to UTF-8, meaning the multilingual content poses no problem.
  • WordPress supports themes.  It is very easy to choose a suitable theme, and then customise it for your specific needs (colour scheme, logo, etc).
  • The system can be extended using plugins.

The last point is one that was particularly pleasurable to work with: the majority of the requirements that could not be fulfilled directly by WordPress could be delivered using a free third-party plugin.  In order to turn a traditional blog into a useful online bibliography, we used the following plugins (in alphabetical order):

  • Breadcrumb NavXT:  Used to provide breadcrumb functionality to help users know where they are in the site
  • Bulk Delete:  Useful when developing to remove old content, or when performing a complete re-load of data
  • Custom Field Template:  WordPress supports ‘metadata’ through the use of ‘custom fields’ which can be set for each post.  This plugin allows that metadata to be set via a template for each post
  • Google AJAX Translation:  Allows individual entries (blog posts) to be translated on-the-fly without the page being reloaded.  Useful for translating between English / Chinese / Japanese / Korean content
  • Google Translator:  Adds a translate widget to see the whole site translated into a different language
  • Search Everything:  Modifies the search system to search all fields, including the custom fields
  • CSV importer:  Data from EndNote was exported to a text file, and a short Java program was used to convert it into a CSV (Comma Separated Values) file which was then imported using this plugin.  We modified this plugin to support multiple values in some fields (for example to allow multiple authors to be inserted for a single item)
  • Custom Field Taxonomies:  Builds controlled vocabulary functionality into custom fields / metadata

The site can be viewed at http://nzais.auckland.ac.nz/

The following screenshot shows the use of metadata fields (custom fields) for an entry in the bibliography:

[This development took place in late 2009 / early 2010.  It was a successful project and proved that WordPress is a flexible platform for delivering a site such as this.  However the technology used to power this site is about to change, not because of any problem with the site, but due to a rationalisation of platforms in the library.  With an ever increasing number of similar collections needing to be developed each year, we decided to develop a single solution for those collections that don’t fit into our traditional repository offerings (DSpace research outputs repository or ExLibris DigiTool).  To this end we developed what we’ve called the Super Index.  More about that in another post…!]

The collection is dead! Long live the collection!

“The collection is dead! Long live the collection!”  That summarises my current thoughts and feelings about the collection hierarchy structure in DSpace.

When first installed, DSpace shows its need for a community and collection hierarchy, because without at least one community containing at least one collection, it is impossible to even submit an item.  Therefore, from day one, DSpace repository managers get used to creating collections, giving them names, and creating a hierarchy.  Having created a first community with its first collection, it seems silly to have just a single community and collection.  So the repository manager creates a larger hierarchy – often mimicking the organisational structure of their institution.  Often this hierarchy extends down to departments, research groups, even individuals.

And the repository manager feels good.  There is structure which will give people a sense of belonging, and more importantly, ownership of ‘their collection’.  This will help them gather content for their repository by helping the individual or research group feel that it is their space.

Very often, of course, it means that a repository has a lot of structure with empty collections.  It also leads to other problems.  What about theses?  We want them to appear in their department’s collection, but also in our central library thesis collection for harvesting by a service such as EThOS.  Or what about an article that has been co-authored by people across different departments?  Or departments that move around in the organisation?  This collection structure is starting to feel a bit inflexible and restrictive!  What started off as a useful tool that made us feel ‘organised’ now feels the opposite.

Luckily, help is now at hand!  Since version 1.7 DSpace has included the ‘discovery’ module.  This is nothing ground-breaking as such, just a faceted search feature built on Solr and contributed to DSpace by atmire.  The real beauty and power of faceted search comes with its ability to create ‘virtual collections’.  You want a collection of theses published by the faculty of science?  Sure – just link to the type:thesis + faculty:science facet search result page.  You want a collection for Prof. S Smith?  No problem, link to the author:s. smith faceted search result page.  Want a collection of the ‘recent’ publications of a department?  Just link to department:foobar + year:2011.
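The idea of a ‘virtual collection’ is really just a saved facet query.  A minimal sketch, using standard Solr filter-query (fq) syntax – the base URL and field names are illustrative assumptions, and real DSpace Discovery URLs differ:

```python
from urllib.parse import urlencode

# A 'virtual collection' as a saved facet query. fq is standard Solr
# syntax; the base URL and field names here are illustrative only.
def virtual_collection(base_url, **facets):
    params = [("fq", f'{field}:"{value}"')
              for field, value in sorted(facets.items())]
    return base_url + "?" + urlencode(params)

# A 'collection' of science-faculty theses, defined purely by metadata:
url = virtual_collection("http://repo.example.org/solr/select",
                         type="thesis", faculty="science")
print(url)
```

Any link like this behaves as a collection page for users, while requiring no hierarchy to be maintained behind the scenes.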

Facets give us the ability to create ‘views’ over the data, based on properties (metadata) of items.  Maybe this is why Google and other search engines are more popular than http://www.dmoz.org/?  We like to have our own collections defined instantly by a search, not be forced to traverse a hierarchy dictated by others.  Of course this does rely on quality (consistent / present / correct) metadata to ensure that items all appear in their virtual collections.  To conclude – sometimes it feels like “The collection is dead!”.  We have better ways to create structure over the repository rather than through it.

But wait!  I cry “Long live the collection!”.

Whilst in our main ‘institutional repository’ at The University of Auckland Library (http://researchspace.auckland.ac.nz/) we have been rationalising the number of collections we have and removing most of the organisational structure, we have also been making use of DSpace collections in another very useful way…

We’re halfway through a project known internally as the SuperIndex.  Not (just!) because it is ‘super’, but because we are creating a super-index of many of our disparate bibliographic and digital special collections.  We have many databases of collections all over the place, and the SuperIndex project aims to bring them all together into a single system.  This will make management of the collections more consistent, while reducing the number of systems to maintain.  DSpace is our chosen central management system.

This is where the collection structure is becoming very useful.  Each collection of items really is a collection: an item in a database about fisheries in New Zealand will not (or is unlikely to!) appear in any of our other special collections.  This collection structure makes it easier to manage each collection separately.  We can run curation tasks on a collection, or control who has rights to edit a collection.  The management of this repository is much more distributed than that of the institutional repository, which is administered by a single team; we will have staff in many areas editing the items.  It also allows us to create individual websites for each collection, each with its own URL structure and branding – the end user does not know that the item they are viewing is actually managed in a DSpace somewhere, and that the DSpace contains thousands of other items in different collections.

So the ‘collection’ is starting to become less useful in the standard institutional repository of research outputs (which is, in a way, a single collection) but is having a new life for us in managing what could be seen as more traditional ‘collections’ in a single DSpace repository.

I’d be interested to hear what you think are the strengths and weaknesses, or reasons for and against forced collection hierarchies in DSpace.

“The collection is dead! Long live the collection!”