The collection is dead! Long live the collection!

“The collection is dead! Long live the collection!”  That summarises my current thoughts and feelings about the collection hierarchy structure in DSpace.

When first installed, DSpace shows its need for a community and collection hierarchy, because without at least one community, containing at least one collection, it is impossible to even submit an item.  Therefore, from day one, DSpace repository managers get used to creating collections, giving them names, and creating a hierarchy.  Having created a first community with its first collection, it seems silly to have just a single community and collection.  So the repository manager creates a larger hierarchy – often mimicking the organisational structure of their institution.  Often this hierarchy extends down to departments, research groups, even individuals.

And the repository manager feels good.  There is structure which will give people a sense of belonging, and more importantly, ownership of ‘their collection’.  This will help them gather content for their repository by helping the individual or research group feel that it is their space.

Very often, of course, it means that a repository has a lot of structure with empty collections.  It also leads to other problems – what about theses?  We want them to appear in their department’s collection, but also to appear in our central library thesis collection for harvesting by a service such as EThOS.  Or what about an article that has been co-authored by people across different departments?  Or departments that move around in the organisation?  This collection structure is starting to feel a bit inflexible and restrictive!  What started off as a useful tool that made us feel ‘organised’ now feels the opposite.

Luckily, help is now at hand!  Since version 1.7 DSpace has included the ‘discovery’ module.  This is nothing ground-breaking as such, just a faceted search feature using Solr, contributed to DSpace by Atmire.  The real beauty and power of faceted search comes with its ability to make ‘virtual collections’.  You want a collection of theses published by the faculty of science: sure – just link to the type:thesis + faculty:science facet search result page.  You want a collection for Prof. S Smith?  No problem, link to the author:s. smith faceted search result page.  Want a collection of the ‘recent’ publications of a department?  Just link to the department:foobar + year:2011 facet search result page.
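As a rough illustration of the ‘virtual collection as a link’ idea, here is a minimal Python sketch that turns a set of facet values into a search URL.  The base URL, the `virtual_collection_url` helper and the use of Solr-style `fq` filter parameters in the link are all assumptions made for the example; the actual URL scheme depends on your DSpace version and user interface, so check what your own Discovery result pages look like before copying this.

```python
from urllib.parse import urlencode


def virtual_collection_url(search_url, facets):
    """Build a link to a faceted search result page.

    `facets` maps a metadata field name to the value to filter on.
    Each pair becomes one Solr-style filter query: fq=field:"value".
    """
    params = [("q", "*:*")]  # match everything, then narrow by facet
    for field, value in facets.items():
        # Escape backslashes and double quotes so the value stays one phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{escaped}"'))
    return f"{search_url}?{urlencode(params)}"


# Hypothetical search page; not a real repository address.
SEARCH_URL = "https://repository.example.org/discover"

# The three 'virtual collections' described above.
theses_url = virtual_collection_url(
    SEARCH_URL, {"type": "thesis", "faculty": "science"}
)
smith_url = virtual_collection_url(SEARCH_URL, {"author": "s. smith"})
recent_url = virtual_collection_url(
    SEARCH_URL, {"department": "foobar", "year": "2011"}
)

for url in (theses_url, smith_url, recent_url):
    print(url)
```

Each printed URL can be dropped into a departmental web page or a navigation menu, which is all a ‘virtual collection’ needs to be.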

Facets give us the ability to create ‘views’ over the data, based on properties (metadata) of items.  Maybe this is why Google and other search engines are more popular than http://www.dmoz.org/?  We like to have our own collections defined instantly by a search, not be forced to traverse a hierarchy dictated by others.  Of course this does rely on quality (consistent / present / correct) metadata to ensure that items all appear in their virtual collections.  To conclude – sometimes it feels like “The collection is dead!”.  We have better ways to create structure over the repository rather than through it.
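To make the metadata-quality caveat concrete, here is a small Python sketch of a ‘view’ as a plain filter over item metadata.  The sample items, the field names and the `view` helper are invented for illustration and have nothing to do with how Discovery or Solr work internally; the point is only that an item with a missing or inconsistently entered value silently drops out of its virtual collection.

```python
# Invented sample records; a real repository would hold these in its index.
items = [
    {"title": "Thesis A", "type": "thesis", "faculty": "science", "year": "2011"},
    {"title": "Article B", "type": "article", "faculty": "science", "year": "2011"},
    # Inconsistent capitalisation of the faculty name:
    {"title": "Thesis C", "type": "thesis", "faculty": "Science", "year": "2010"},
    # Faculty not recorded at all:
    {"title": "Thesis D", "type": "thesis", "year": "2011"},
]


def view(records, **criteria):
    """Return the records whose metadata matches every given field exactly."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


science_theses = view(items, type="thesis", faculty="science")
titles = [record["title"] for record in science_theses]
print(titles)
```

Only "Thesis A" is returned: "Thesis C" and "Thesis D" are science theses too, but their metadata does not say so in the expected form.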

But wait!  I cry “Long live the collection!”.

Whilst in our main ‘institutional repository’ at The University of Auckland Library (http://researchspace.auckland.ac.nz/) we have been rationalising the number of collections we have and removing most of the organisational structure, we have been making use of DSpace collections in another very useful way…

We’re halfway through a project known internally as the SuperIndex.  Not (just!) because it is ‘super’, but because we are creating a super-index of many of our disparate bibliographic and digital special collections.  We have many databases of collections all over the place, and the SuperIndex project aims to bring them all together into a single system.  This will make management of the collections more consistent, while reducing the number of systems to maintain.  DSpace is our chosen central management system.

This is where the collection structure is becoming very useful. Each collection of items really is a collection.  An item in a database about fisheries in New Zealand will not (or is unlikely to!) appear in any of our other special collections.  This collection structure makes it easier to manage each collection separately.  We can run curation tasks on a collection, or control who has rights to edit a collection.  The management of this repository is much wider than the institutional repository that is just administered by one team.  We will have staff in many areas editing the items.  It also allows us to create individual websites for each collection, each with their own URL structure and branding – the end user does not know that the item they are viewing is actually managed in a DSpace somewhere, and that the DSpace contains thousands of other items in different collections.

So the ‘collection’ is starting to become less useful in the standard institutional repository of research outputs (which is, in a way, a single collection) but is having a new life for us in managing what could be seen as more traditional ‘collections’ in a single DSpace repository.

I’d be interested to hear what you think are the strengths and weaknesses, or reasons for and against forced collection hierarchies in DSpace.


6 thoughts on “The collection is dead! Long live the collection!”

  1. Mark Diggory

    Stuart,

    Great article. I want to emphasize that I agree with your position on the usefulness of Collections within DSpace as a “Management Tool”. I’m always concerned when end users try to map complex classification systems such as Organizational Hierarchies onto DSpace Community/Collection Hierarchies. This often results in complex situations where they want to use Item Mapping or linking of Collections into multiple Communities to make those cases that are not a pure “tree” fit the use case. I’m of the strong opinion that use of Communities and Collections should be focused on a need for a contained collection from a management standpoint:

    1. Need for a unique submission process for a group of users.
    2. Controlled submission rights for a subgroup of users.
    3. Specific reviewer workflow needs for a subgroup of reviewers.

    In all these cases the reason for structuring collections has to do with “Humans” rather than with “Content”. This leads to a conservative approach that limits the use of Collections to a managerial tool for those who have a stake in being able to work with the Content. I think this fits well with your points, where you, as the stakeholder needing to manage your content, seek to create a Community/Collection hierarchy that fits your management need rather than some organizational or classification scheme.

    I expect to see a number of new capabilities for Discovery coming down the pipeline in the coming year. These will/may include:

    a.) Item level Access Control in Search Results: Limiting Items that are private from viewing in search results when the user does not have permission to see them

    b.) Real time Indexing of Items in Submission, Workflow and Withdrawn states. Allowing Administrators and Collection Managers to search for content that matches specific Workflow or Submission criteria.

    c.) Spring based Configuration of Solr Indexing and View rendering, allowing addons to inject faceting and indexing rules into Discovery’s indexing process.

    d.) Indexing of External Authority Records to support a “semantic search” capability in DSpace more directly. More term completion capabilities. More default linking to external sources for authority control.

    I think there is one question on my mind that has always been controversial… Could it ever be made “optional” that a Collection be required to create an Item in a User’s Workspace? More specifically, we’ve worked very hard to make Submission Process configuration be bound to a Collection. In a world where users can create Wiki pages in their own Wiki Profile space, upload Documents into Google Docs under their own workspace, add Photos and Videos to their own Facebook accounts… Maybe it’s time to “diversify” the capability to decide on the submission process submitters want to use when submitting content into DSpace, rather than forcing a specific submission process on the Submitter that is hardcoded to the Collection. I believe this can be the nature of the Content Type centric Ingest

    From the DCAT Group…
    https://wiki.duraspace.org/pages/viewpage.action?pageId=23268096
    https://jira.duraspace.org/browse/DS-164

    Leading us to the past few years of work around improving the configurability of the Submission workflow

    https://wiki.duraspace.org/display/GSOC/GSoC+2011+-+DSpace+Submission+Enhancements

    And finally to Robin Taylor’s interesting directions on Type Based Submission.

    https://jira.duraspace.org/browse/DS-464

    I think getting content into the workflow should be the first priority of DSpace; optimizing that capability is paramount to all other features. Your work with SWORD gets us there when the Submission process is engineered externally to DSpace. But I think for those cases where institutions want to use DSpace and its Processes via the UI, we really need to get the process to empower the submitter firstly, then hand the curator control secondly.

  2. Steve Hitchcock

    Stuart, two findings from my PhD thesis (2002), which investigated overlaying pre-configured links on a collection of Web documents:

    1. In tests I found users preferred to search rather than browse links. Note that these tests were performed around the time Google first emerged, when search results were more random and less useful, so it was a more surprising result at the time.
    2. A collection is defined as much by what is left out as by what is included. This is particularly the case on the Web, where a collection may be overlaid with multiple services (such as a citation linking service), and where it is possible to have multiple views on a collection. After all, a collection is simply editorialising over content, and we have to be clear about who editorialises, how and why.

    This is all a way of saying I think you are heading in the right direction. Collections can be useful but should not be imposed as a fundamental means of organising a repository.

  3. George

    Hello Stuart, your post was very helpful, but I have a question. On this website (https://researchspace.auckland.ac.nz/) there is an autosuggest feature in the discovery module. Is this implemented in the discovery module itself, or is it an extra feature that you implemented?

    I really need a feature like this one.
