Direct from MS Word to DSpace via SWORD
As a member of the SWORD project, I have been delighted to see Microsoft’s External Research group integrate SWORD into Word 2007, their Zentity repository, and their online journal hosting system. There is a good overview of this work in a presentation given by Pablo Fernicola at the Open Repositories 2009 conference entitled ‘Connecting Authors and Repositories Through SWORD’.
This blog post is about the functionality I have added to DSpace to allow it to accept deposits from within Microsoft Word using SWORD.
If you are unaware of the authoring add-in, then before reading the rest of this blog, take a look at Pablo’s YouTube video ‘Integrating with repositories and journal submissions’ at http://www.youtube.com/watch?v=2_M2gfUyVzU. The video explains the authoring add-in, so I’ll not duplicate that information in this blog post. The rest of this post explains how I extended DSpace to work with the add-in…
In order for DSpace to be able to ingest a package, it needs an ingester that understands the format and knows how to unpack it and extract the metadata and file(s). In the case of .docx files created by Microsoft Word, it needs to know how to extract the metadata from within the file, and to archive the file as-is. This is a pretty easy task as a .docx file is actually just a zip file (try renaming it from .docx to .zip and then take a peek inside!). So I wrote an ingester that unzips the file, extracts the NLM metadata that the add-in inserted in the file, and then creates a new DSpace item with that metadata. Finally it adds the complete .docx file as a bitstream for people to download.
Some of the metadata, such as the authors’ identities, is held in the customXml/item*.xml files inside the .docx file, and other parts, such as the article title and abstract, are held in the actual document contents in word/document.xml. The ingester extracts these values for use in the new DSpace item.
<w:t>Add an S to Microsoft Word and you get SWORD</w:t>
<my:name.> <my:name.content-type.datatypeattribute.attribute.></my:name.content-type.datatypeattribute.attribute.> <my:name.name-style.datatypeattribute.attribute.></my:name.name-style.datatypeattribute.attribute.> <my:surname.>Lewis</my:surname.> <my:given-names.>Stuart</my:given-names.> </my:name.>
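To make the "it is just a zip file" point concrete, here is a minimal Java sketch of the unpacking step. It is not the DSpace ingester itself: the class name DocxPeek, its methods, and the tiny stand-in archive are all hypothetical, and the assumption that only numbered customXml/item files are of interest is mine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class DocxPeek {

    /** True for the parts the post names: word/document.xml and customXml/item*.xml. */
    static boolean isMetadataPart(String name) {
        // Assumption: only numbered item files (item1.xml, item2.xml, ...) are wanted.
        return name.equals("word/document.xml") || name.matches("customXml/item\\d+\\.xml");
    }

    /** Reads a .docx as a zip stream and returns the matching parts as text, keyed by entry name. */
    static Map<String, String> readMetadataParts(InputStream docx) {
        Map<String, String> parts = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (isMetadataPart(entry.getName())) {
                    // readAllBytes() stops at the end of the current zip entry
                    parts.put(entry.getName(), new String(zip.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }

    /** Builds a tiny stand-in archive so the sketch runs without a real Word file. */
    static byte[] sampleDocx() {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("[Content_Types].xml", "<Types/>");
        files.put("word/document.xml", "<w:t>Add an S to Microsoft Word and you get SWORD</w:t>");
        files.put("customXml/item1.xml", "<my:surname.>Lewis</my:surname.>");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        readMetadataParts(new ByteArrayInputStream(sampleDocx()))
                .forEach((name, xml) -> System.out.println(name + " -> " + xml));
    }
}
```

A real ingester would go on to parse the returned XML and map the values onto item metadata; this sketch stops at pulling the relevant parts out of the archive.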
I then configured the DSpace ingesters to use the docx ingester when it encountered .docx files:
plugin.named.org.dspace.content.packager.PackageIngester = \
org.dspace.content.packager.PDFPackager = Adobe PDF, PDF, \
org.dspace.content.packager.DSpaceMETSIngester = METS, \
org.dspace.content.packager.DSpaceDocxIngester = DOCX
I then configured the SWORD package to expose the fact that it supported .docx files in its SWORD service document:
sword.accept-packaging.Docx.identifier = application/vnd.openxmlformats-officedocument.wordprocessingml.document
sword.accept-packaging.Docx.q = 1.0
Finally the DSpace SWORD interface needed to know which packager to use for .docx files based on their MIME type:
plugin.named.org.dspace.sword.SWORDIngester = \
org.dspace.sword.SWORDMETSIngester = http://purl.org/net/sword-types/METSDSpaceSIP \
org.dspace.sword.SimpleFileIngester = SimpleFileIngester \
org.dspace.sword.DocxIngester = application/vnd.openxmlformats-officedocument.wordprocessingml.document
All that is needed to use this is a copy of the authoring add-in (http://research.microsoft.com/en-us/projects/authoring/), and a suitably formatted template for the repository that you wish to deposit the document into (dspace-swordapp-org.docx). The template is preconfigured to deposit directly into the DSpace SWORD demo repository which I have upgraded with the new code to accept .docx deposits. Feel free to create an account in that repository, install the add-in, load the template, and try out a deposit!
This complete end-to-end process allows you to create Word templates, and to mark them up with required and optional fields. It also allows you to embed details of the SWORD deposit repository URL (so the users do not need to know what it is) within the template for easy deposit. A journal editor could use this, for example, to provide a template and a deposit location for new paper submissions all-in-one. And this use case could be extended: for example if a faculty member wants all their students to submit an assignment with a template, they could do so and use the repository as the end point rather than a traditional VLE. And unlike a VLE, the repository will probably provide search and indexing facilities across the deposited documents. I’m sure as this tool gets used more, there will be a lot of new ideas for how it can be used.
Comments welcome!
In: Uncategorized · Tagged with: dspace, repositories, sword
SWORD PHP Library version 0.7 released
I have just released version 0.7 of the SWORD PHP library. It can be downloaded from http://php.swordapp.org/
This latest version adds two new features:
- When performing a deposit, the client now sets the ‘Content-Disposition:filename’ header so that the SWORD server knows what to name the file. This is required by SWORD implementations (such as the Intrallect implementation) that store the deposited file verbatim (as per http://www.swordapp.org/docs/sword-profile-1.3.html#b.9.2)
- When performing a deposit, the optional X-No-Op (pretend to perform the deposit) and X-Verbose (provide a verbose response) headers can now be sent (as per http://www.swordapp.org/docs/sword-profile-1.3.html#b.9.2)
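As an illustration of the headers described above (in Java rather than the library's PHP), here is a small sketch that assembles them for a deposit request. The class and method names are hypothetical, and the exact value formats ("filename=..." and "true") are my assumptions about how the profile expects them to be written; check the SWORD profile before relying on them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SwordDepositHeaders {

    /**
     * Builds the deposit headers mentioned in the post.
     * Assumption: the flags are sent as the literal string "true" and omitted otherwise.
     */
    static Map<String, String> build(String filename, boolean noOp, boolean verbose) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Tells the server what to call the deposited file
        headers.put("Content-Disposition", "filename=" + filename);
        if (noOp) {
            // Ask the server to pretend to perform the deposit
            headers.put("X-No-Op", "true");
        }
        if (verbose) {
            // Ask the server for a verbose response
            headers.put("X-Verbose", "true");
        }
        return headers;
    }

    public static void main(String[] args) {
        build("package.zip", true, true)
                .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```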
To show how easy it is to use the library, see the following code which requests a service document, creates a package, and then deposits it:
// Import the library
require("swordappclient.php");
// Create an instance of the client
$sac = new SWORDAPPClient();
// Request a service document
$sdr = $sac->servicedocument($url, $user, $password, $onbehalfof);
// Import the packager library
require('packager_mets_swap.php');
// Create a new package with the root and directory of the input files, and the root and directory of the package
$package = new PackagerMetsSwap($rootin, $dirin, $rootout, $fileout);
// Add metadata to the package
$package->setType($test_type);
$package->setTitle($title);
$package->setAbstract($abstract);
foreach ($creators as $creator) {
$package->addCreator($creator);
}
// Add a file to the package
$package->addFile($filename, $mimetype);
// Now deposit the package
$dr = $sac->deposit($depositurl, $username, $password, $onbehalfof, $filename, $packageformat, $packagecontenttype);
Please send feature requests for the next version, or leave them in a comment.
In: Uncategorized · Tagged with: repositories, sword
How does the Facebook SWORD client actually work?
I’ve been asked a few questions recently about how SWORD clients work, and in particular how the SWORD Facebook client works. The Facebook client is one of the most complete demonstration clients that there is, and as such ‘hides’ a lot of the work that goes on behind the scenes. This post will explain how a SWORD deposit from within Facebook actually works:
- First off, the user has to select the repository they wish to deposit into. This can either be done by selecting from a dropdown list of known demo SWORD repositories, or by manually entering the URL of a service document:

- Most repositories will require their users to authenticate using a username and password. These are typically passed to the SWORD server using HTTP BASIC authentication. Optionally, an ‘on-behalf-of’ user can be specified (see the SWORD specification for what this means):

- When this initial form is submitted, the client will visit the SWORD server, and request the service document by performing an HTTP GET of the service document URL. The SWORD Facebook application is written using the SWORD PHP library, which uses cURL to retrieve the service document. Using a 3rd party library such as the PHP library makes it *really* easy to do this. Here is the required PHP:
require("swordappclient.php");
$sac = new SWORDAPPClient();
$sdr = $sac->servicedocument($url, $user, $password, $onbehalfof);
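For anyone curious about what the library does with the username and password, HTTP Basic authentication simply base64-encodes "username:password" and sends it in an Authorization header. A minimal Java sketch (the class name BasicAuth is hypothetical, and this is not the PHP library's code):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuth {

    /** Returns the value of the Authorization header for HTTP Basic authentication. */
    static String header(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The well-known example credentials from the HTTP authentication RFCs
        System.out.println("Authorization: " + header("Aladdin", "open sesame"));
    }
}
```

Because the credentials are only encoded, not encrypted, a service document URL served over HTTPS is the safer choice.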
- Hopefully (assuming a valid service document URL, username and password) the client will receive a service document back from the SWORD server. The service document will specify which collections a user may submit to and provide some details about each collection (e.g. name, URL to deposit to, policy, preferred packaging types etc). Different repository platforms interpret the meaning of ‘collection’ differently. In DSpace, these map to DSpace collections, whereas in EPrints, they relate to workspaces within the user’s account.

- Once you have selected a collection into which you wish to deposit an item, you are presented with a form requesting metadata. You are prompted for the type of item, its peer-review status, title, abstract, and first author. You can optionally add second and third author names, and an existing URL for the item.

- So what happens with this metadata you enter? In a nutshell, it all gets crosswalked and wrapped up in a METS document, encoded in SWAP. To see what I mean, look at the following example:
<?xml version="1.0" encoding="utf-8" standalone="no" ?>
<mets ID="sort-mets_mets" OBJID="sword-mets" LABEL="DSpace SWORD Item" PROFILE="DSpace METS SIP Profile 1.0" xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd">
  <metsHdr CREATEDATE="2008-09-04T00:00:00">
    <agent ROLE="CUSTODIAN" TYPE="ORGANIZATION">
      <name>Stuart Lewis</name>
    </agent>
  </metsHdr>
  <dmdSec ID="sword-mets-dmd-1" GROUPID="sword-mets-dmd-1_group-1">
    <mdWrap LABEL="SWAP Metadata" MDTYPE="OTHER" OTHERMDTYPE="EPDCX" MIMETYPE="text/xml">
      <xmlData>
        <epdcx:descriptionSet xmlns:epdcx="http://purl.org/eprint/epdcx/2006-11-16/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://purl.org/eprint/epdcx/2006-11-16/ http://purl.org/eprint/epdcx/xsd/2006-11-16/epdcx.xsd">
          <epdcx:description epdcx:resourceId="sword-mets-epdcx-1">
            <epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/type" epdcx:valueURI="http://purl.org/eprint/entityType/ScholarlyWork" />
            <epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/title">
              <epdcx:valueString>Item Title</epdcx:valueString>
            </epdcx:statement>
            <epdcx:statement epdcx:propertyURI="http://purl.org/dc/terms/abstract">
              <epdcx:valueString>Item Abstract</epdcx:valueString>
            </epdcx:statement>
            <epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/creator">
              <epdcx:valueString>Lewis, Stuart</epdcx:valueString>
            </epdcx:statement>
            <epdcx:statement epdcx:propertyURI="http://purl.org/eprint/terms/isExpressedAs" epdcx:valueRef="sword-mets-expr-1" />
          </epdcx:description>
          <epdcx:description epdcx:resourceId="sword-mets-expr-1">
            <epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/type" epdcx:valueURI="http://purl.org/eprint/entityType/Expression" />
            <epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/language" epdcx:vesURI="http://purl.org/dc/terms/RFC3066">
              <epdcx:valueString>en</epdcx:valueString>
            </epdcx:statement>
            <epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/type" epdcx:vesURI="http://purl.org/eprint/terms/Type" epdcx:valueURI="http://purl.org/eprint/entityType/Expression" />
            <epdcx:statement epdcx:propertyURI="http://purl.org/dc/terms/available">
              <epdcx:valueString epdcx:sesURI="http://purl.org/dc/terms/W3CDTF">2009-04-28</epdcx:valueString>
            </epdcx:statement>
            <epdcx:statement epdcx:propertyURI="http://purl.org/eprint/terms/Status" epdcx:vesURI="http://purl.org/eprint/terms/Status" epdcx:valueURI="http://purl.org/eprint/status/PeerReviewed" />
            <epdcx:statement epdcx:propertyURI="http://purl.org/eprint/terms/copyrightHolder">
              <epdcx:valueString>Stuart Lewis</epdcx:valueString>
            </epdcx:statement>
          </epdcx:description>
        </epdcx:descriptionSet>
      </xmlData>
    </mdWrap>
  </dmdSec>
</mets>
- If you examine the XML, you’ll see the metadata in the METS document. For example the title is described on line 14, the abstract on line 17, the author on line 20, and the deposit date on line 31. The mapping of metadata elements from the form fields to the METS/SWAP is fixed in the software.
- The next stage of the submission is to upload a file which, together with the metadata, makes up the package to deposit. The details of the file you choose are added to the METS document using the fileSec and structMap portions of the METS standard:
<fileSec>
  <fileGrp ID="sword-mets-fgrp-1" USE="CONTENT">
    <file GROUPID="sword-mets-fgid-0" ID="sword-mets-file-0" MIMETYPE="application/pdf">
      <FLocat LOCTYPE="URL" xlink:href="SWORD Ariadne Jan 2008.pdf" />
    </file>
  </fileGrp>
</fileSec>
<structMap ID="sword-mets-struct-1" LABEL="structure" TYPE="LOGICAL">
  <div ID="sword-mets-div-1" DMDID="sword-mets-dmd-1" TYPE="SWORD Object">
    <div ID="sword-mets-div-2" TYPE="File">
      <fptr FILEID="sword-mets-file-0" />
    </div>
  </div>
</structMap>
- The METS file (mets.xml) and the uploaded file are then put into a zip file, and deposited to the repository. Again, these two steps are very easy using the PHP library:
require('packager_mets_swap.php');
// Create a new package with the root and directory of the input files, and the root and directory of the created package
$package = new PackagerMetsSwap($rootin, $dirin, $rootout, $fileout);
// Add metadata to the package
$package->setType($test_type);
$package->setTitle($title);
$package->setAbstract($abstract);
foreach ($creators as $creator) {
$package->addCreator($creator);
}
// Add a file to the package
$package->addFile($filename, $mimetype);
// Now deposit the package
require("swordappclient.php");
$sac = new SWORDAPPClient();
$dr = $sac->deposit($depositurl, $username, $password, $onbehalfof, $filename, $packageformat, $packagecontenttype);
- Once the package is sent to the repository, it is up to the repository to decide how to handle the package. In the case of DSpace, two things happen. The first is that a raw copy of the original package is archived in a new item, allowing us to see precisely what was deposited. This is hidden from users though. Secondly, the package is opened up and processed. The file is added to the item, and the metadata is crosswalked using XSL to Dublin Core as used by DSpace. An example of part of the XSL used is shown below (mapping the Dublin Core creator element to dc.contributor.author):
<!-- creator element: dc.contributor.author -->
<xsl:if test="./@epdcx:propertyURI='http://purl.org/dc/elements/1.1/creator'">
  <dim:field mdschema="dc" element="contributor" qualifier="author">
    <xsl:value-of select="epdcx:valueString"/>
  </dim:field>
</xsl:if>
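The crosswalk idea can also be pictured as a simple lookup from SWAP property URI to DSpace field. The Java sketch below is only an illustration of that idea, not how DSpace does it (DSpace uses the XSL). Only the creator mapping comes from the XSL fragment above; the title and abstract targets are my assumptions, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class SwapCrosswalkSketch {

    private static final Map<String, String> FIELDS = new LinkedHashMap<>();

    static {
        // Taken from the XSL fragment in the post
        FIELDS.put("http://purl.org/dc/elements/1.1/creator", "dc.contributor.author");
        // Assumed targets, for illustration only
        FIELDS.put("http://purl.org/dc/elements/1.1/title", "dc.title");
        FIELDS.put("http://purl.org/dc/terms/abstract", "dc.description.abstract");
    }

    /** Returns the DSpace field for a SWAP property URI, or empty if it is not mapped. */
    static Optional<String> dspaceField(String propertyUri) {
        return Optional.ofNullable(FIELDS.get(propertyUri));
    }

    public static void main(String[] args) {
        System.out.println(dspaceField("http://purl.org/dc/elements/1.1/creator").orElse("(unmapped)"));
        System.out.println(dspaceField("http://purl.org/eprint/terms/copyrightHolder").orElse("(unmapped)"));
    }
}
```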
I hope this post clears up a bit of what goes on ‘behind the scenes’ of a SWORD client. I hope it also shows how easy it can be to create a SWORD client using the PHP library which provides not only code to request service documents and deposit items, but also to create packages in a format that is accepted by DSpace and EPrints. Any questions?
In: Uncategorized · Tagged with: dspace, repositories, sword
Surfacing Google Analytics stats in DSpace
In the recent survey asking the DSpace community for their top 3 feature requests for DSpace 1.6, the number one most requested feature was statistics. As you’ll know from previous posts, I’m a big fan of Google Analytics.
For the uninitiated, you insert a small bit of JavaScript in your web pages, and Google provide a very rich and powerful analytics service for viewing your site statistics.
Recently Google announced the launch of an analytics API that allows you to remotely query and download the statistics it holds about your site.
I like playing with APIs, so thought I’d write a solution that downloads item splashscreen view statistics from Google Analytics and displays them on the item page:

The solution is quite simple. It requires the addition of one Java class to DSpace. This class should be run daily to download the statistics. The same class is used by the user interface to display the statistics. If you want to implement this solution, follow the instructions below:
- Create a new directory (java package) at [dspace-src]/dspace-api/src/main/java/org/dspace/app/googleanalytics
- Download the code shown at the bottom of this post, and save it as GoogleAnalyticsHitCounter.java in the directory that you just created.
- Edit [dspace-src]/dspace-api/pom.xml to add in the dependencies on the Google API libraries:
<dependency>
  <groupId>com.google.gdata</groupId>
  <artifactId>gdata-core</artifactId>
  <version>1.0</version>
</dependency>
<dependency>
  <groupId>com.google.gdata</groupId>
  <artifactId>gdata-analytics</artifactId>
  <version>1.0</version>
</dependency>
<dependency>
  <groupId>com.google.collect</groupId>
  <artifactId>google-collect</artifactId>
  <version>1.0</version>
</dependency>
- Then download and save gdata-src.java-1.32.1.zip and extract and save (somewhere handy) the jar files: gdata-core-1.0.jar, gdata-analytics-1.0.jar, google-collect-1.0.jar (in zip file as google-collect-1.0-rc1.jar)
- Install each of these by running the following Maven commands, adjusting paths as appropriate:
- mvn install:install-file -DgroupId=com.google.gdata -DartifactId=gdata-core -Dversion=1.0 -Dfile=gdata-core-1.0.jar -Dpackaging=jar
- mvn install:install-file -DgroupId=com.google.gdata -DartifactId=gdata-analytics -Dversion=1.0 -Dfile=gdata-analytics-1.0.jar -Dpackaging=jar
- mvn install:install-file -DgroupId=com.google.collect -DartifactId=google-collect -Dversion=1.0 -Dfile=google-collect-1.0.jar -Dpackaging=jar
- Next, edit [dspace-src]/dspace-jspui/dspace-jspui-webapp/src/main/webapp/display-item.jsp, and somewhere in the code (choose where you want it), add the following code:
<%@ page import="org.dspace.app.googleanalytics.GoogleAnalyticsHitCounter" %>
<%
// See if we can display a counter
String path = "/handle/" + item.getHandle();
String count = GoogleAnalyticsHitCounter.getPageCount(path);
if ((count != null) && (!"".equals(count)))
{
%>
<table align="center" class="miscTable">
<tr>
<td class="oddRowEvenCol" align="center">
This item has been viewed <strong><%= count %></strong> times
</td>
</tr>
</table>
<%
}
%>
- If you don’t deploy your user interface as the ROOT webapp, then you’ll have to add the context in the line: String path = "/handle/" + item.getHandle();
- Now build and deploy DSpace as you would normally (mvn package; ant update; etc…)
- Edit dspace.cfg and add in the following entries:
- googleanalytics.username = your-google-analytics@email.address.com
- googleanalytics.password = your-google-analytics-password
- googleanalytics.siteid = 123456789
- googleanalytics.filename = analyticscounts.properties
- googleanalytics.startdate = 2007-07-17
- Adjust the email address and password as appropriate.
- Log in to Google Analytics and find out the first date that you have statistics for. Set this in the start date entry, in the form of yyyy-mm-dd
- View the dashboard of your Google Analytics, and look at the URL. Part of it will include ‘id=nnnnnnn’. Copy the id number and enter it in the dspace.cfg siteid entry.
- Download and compile your statistics by running (from [dspace]/bin/)
- dsrun org.dspace.app.googleanalytics.GoogleAnalyticsHitCounter
- If everything worked as it should, you should now have a file [dspace]/analyticscounts.properties. If you look in this file, you will find entries in the form of ‘/handle/xxxx/yyyy=55’.
- Now start tomcat, view an item, and if the handle appears in the downloaded stats, you should see the item count!
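To show what the lookup against the downloaded file amounts to, here is a small standalone Java sketch that parses entries of the ‘/handle/xxxx/yyyy=55’ form and looks one up. The class name, method names and the sample handle are hypothetical. It also shows why the context path matters: the key built by the JSP has to match the path recorded by Google Analytics exactly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class CountsFileSketch {

    /** Parses the text of a counts file in java.util.Properties format. */
    static Properties parse(String text) {
        Properties counts = new Properties();
        try {
            counts.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    /** Looks up the view count for an item; returns an empty string if the page is unknown. */
    static String countFor(Properties counts, String contextPath, String handle) {
        return counts.getProperty(contextPath + "/handle/" + handle, "");
    }

    public static void main(String[] args) {
        // Made-up sample data in the same shape as the generated file
        Properties counts = parse("/handle/123456789/42=55\ntotal=55\n");
        System.out.println("ROOT webapp lookup: '" + countFor(counts, "", "123456789/42") + "'");
        System.out.println("Wrong context lookup: '" + countFor(counts, "/jspui", "123456789/42") + "'");
    }
}
```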
As with the DSpace video player solution I wrote about earlier this week, the code is not perfect, and needs to be improved a bit to make it solid, but it is a good start if you want to use this type of solution. Enjoy!
package org.dspace.app.googleanalytics;
import java.io.IOException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Properties;
import java.util.Calendar;
import java.util.Date;
import java.text.SimpleDateFormat;
import com.google.gdata.client.analytics.AnalyticsService;
import com.google.gdata.data.analytics.DataEntry;
import com.google.gdata.data.analytics.DataFeed;
import com.google.gdata.data.analytics.Metric;
import com.google.gdata.util.AuthenticationException;
import com.google.gdata.util.ServiceException;
import org.dspace.core.ConfigurationManager;
import org.apache.log4j.Logger;
public class GoogleAnalyticsHitCounter {
/** log4j category */
private static Logger log = Logger.getLogger(GoogleAnalyticsHitCounter.class);
/** Hit counter */
private static Properties counts;
/** When was the counter last loaded? */
private static Date lastloaded;
/** The filename of the counter file */
private static String filename;
/**
* Initialise the system
*/
public static void init()
{
// Load the properties file
Calendar yesterday = Calendar.getInstance();
yesterday.add(Calendar.DATE, -1);
lastloaded = yesterday.getTime();
filename = ConfigurationManager.getProperty("dspace.dir") +
System.getProperty("file.separator") +
ConfigurationManager.getProperty("googleanalytics.filename");
counts = new Properties();
loadCounter();
}
/**
* Get the count for a particular page (e.g. /handle/123/456)
*
* @param page The page path
* @return The count. Empty String if unknown
*/
public static String getPageCount(String page)
{
// Check we're initialised
if (lastloaded == null)
{
init();
}
// Reload the hits
loadCounter();
// Get the value
if (page == null)
{
page = "";
}
String count = counts.getProperty(page);
// Return the value
if (count != null)
{
return count;
}
return "";
}
/**
* (Re)load the counter. It is reloaded every hour.
*/
private static void loadCounter()
{
// Do we need to load it?
Calendar hourago = Calendar.getInstance();
hourago.add(Calendar.HOUR, -1);
if (lastloaded.before(hourago.getTime()))
{
try
{
counts.load(new FileReader(filename));
lastloaded = Calendar.getInstance().getTime();
}
catch (Exception e)
{
log.warn("Unable to load google hit counter from " + filename);
}
}
}
/**
* Command line method to collect the statistics from Google Analytics.
*
* @param args No arguments used
*/
public static void main(String args[])
{
// Set up the variables
String username = ConfigurationManager.getProperty("googleanalytics.username");
String password = ConfigurationManager.getProperty("googleanalytics.password");
String siteid = ConfigurationManager.getProperty("googleanalytics.siteid");
String startdate = ConfigurationManager.getProperty("googleanalytics.startdate");
String handle = ConfigurationManager.getProperty("handle.prefix");
String root = ConfigurationManager.getProperty("dspace.url");
String filename = ConfigurationManager.getProperty("dspace.dir") +
System.getProperty("file.separator") +
ConfigurationManager.getProperty("googleanalytics.filename");
// Get the local path
String path = "";
try
{
URL localURL = new URL(root);
path = localURL.getPath();
if (path.endsWith("/"))
{
path = path.substring(0, path.length() - 1);
}
}
catch (MalformedURLException e)
{
System.err.println("Invalid dspace.url URL (" + root + ")");
return;
}
AnalyticsService as = new AnalyticsService("gaExportAPI_acctSample_v1.0");
String baseUrl = "https://www.google.com/analytics/feeds/";
// Login to Google
try {
as.setUserCredentials(username, password);
} catch (AuthenticationException e) {
System.err.println("Authentication failed : " + e.getMessage());
return;
}
// The results
Properties counts = new Properties();
// Keep requesting pages of results from Google until a blank page is found
// pages of 1,000 results at a time
URL queryUrl;
int i = 1;
boolean found = true;
int total = 0;
// Get stats up until yesterday
Calendar yesterday = Calendar.getInstance();
yesterday.add(Calendar.DATE, -1);
SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
String enddate = format.format(yesterday.getTime());
while (found)
{
found = false;
try {
String q = baseUrl +
"data?start-index=" + i +
"&ids=ga:" + siteid +
"&start-date=" + startdate +
"&end-date=" + enddate +
"&metrics=ga:pageviews" +
"&dimensions=ga:pagePath" +
"&filters=ga:pagePath%3D~" + path + "/handle/" + handle + "/[0-9]%2B$";
queryUrl = new URL(q);
} catch (MalformedURLException e) {
System.err.println("Malformed URL: " + baseUrl);
return;
}
// Send our request to the Analytics API and wait for the results to come back
DataFeed dataFeed;
try {
dataFeed = as.getFeed(queryUrl, DataFeed.class);
} catch (IOException e) {
System.err.println("Network error trying to retrieve feed: " + e.getMessage());
return;
} catch (ServiceException e) {
System.err.println("Analytics API responded with an error message: " + e.getMessage());
return;
}
for (DataEntry entry : dataFeed.getEntries()) {
String id = entry.getId().substring(70);
id = id.substring(0, id.indexOf('&'));
for (Metric metric : entry.getMetrics()) {
counts.put(id, metric.getValue());
total = total + Integer.parseInt(metric.getValue());
}
found = true;
}
i = i + 1000;
}
// Save the properties file
counts.put("total", "" + total);
try
{
counts.store(new FileOutputStream(filename), null);
System.out.println("Saved " + total + " total hits in " + filename);
}
catch (IOException e)
{
System.err.println("Error saving results to file: " + filename);
return;
}
}
}
In: Uncategorized · Tagged with: analytics, dspace, repositories
Easy pseudo-video streaming for DSpace repositories
A few days ago someone posted an enquiry to the dspace-general email list asking how to embed a video player in DSpace web pages. This was followed up by a lot of replies along the lines of “it would be great if DSpace could do that!”.
I wrote a quick reply saying how I thought it had been implemented, and described the solution as “quick and easy”. I thought I’d better put my money where my mouth is, and prove that it really is quick and easy. So I spent the last hour of my working day making it work, and here is how to do it:
- Download the JW FLV media player from http://www.longtailvideo.com/players/jw-flv-player/
- Unzip the download, and copy player.swf and swfobject.js into [dspace-src]/dspace/modules/jspui/src/main/webapp/
- Add the following code to the bottom of [dspace-src]/dspace-jspui/dspace-jspui-api/src/main/java/org/dspace/app/webui/jsptag/ItemTag.java (before the final ‘}’):
private void showMediaPlayer() throws IOException
{
try
{
Bundle[] bundles = item.getBundles("ORIGINAL");
if (bundles.length > 0)
{
Bitstream[] bitstreams = bundles[0].getBitstreams();
boolean found = false;
for (Bitstream bitstream : bitstreams)
{
if (!found)
{
if ("video/x-flv".equals(bitstream.getFormat().getMIMEType()))
{
// We found one, don't search for any more
found = true;
// Display the player
HttpServletRequest request = (HttpServletRequest)pageContext.getRequest();
String url = request.getContextPath() +
"/bitstream/" + item.getHandle() + "/" +
bitstream.getSequenceID() + "/" +
UIUtil.encodeBitstreamName(bitstream.getName(), Constants.DEFAULT_ENCODING);
JspWriter out = pageContext.getOut();
out.println("<script type=\"text/javascript\" src=\"" + request.getContextPath() +
"/swfobject.js\"></script>\n" +
"<center><div id=\"player\">Video</div></center>" +
"<script type=\"text/javascript\">\nvar so = new SWFObject('" +
request.getContextPath() + "/player.swf','mpl','320','240','9');\n" +
"so.addParam('allowscriptaccess','always');\n" +
"so.addParam('allowfullscreen','true');\n" +
"so.addParam('flashvars','&file=" + url + "&autostart=true');\n" +
"so.write('player');\n" +
"</script>");
}
}
}
}
}
catch (SQLException sqle)
{
// Do nothing
}
}
- In the same file, find the line that reads ‘private void render() throws IOException’ and straight after the opening brace ‘{’ add a new line that reads:
showMediaPlayer();
- Rebuild and redeploy DSpace as you would normally (mvn package; ant update; etc)
- Log in to your DSpace instance as an administrator and go to the bitstream format registry.
- Enter a new format with the mime type video/x-flv and the file extension flv
- Now grab yourself an flv video. A quick way of doing this is to use http://keepvid.com/ and to enter the URL of a YouTube video. It will then download this as an flv video.
- Create a new item in DSpace, and upload this file. It should recognise it as a flash video file.
- Now view the item, and if the code is working correctly, it will have detected a video exists and will bring up the video player.

As I said, quick, and easy! Now I didn’t say the solution was beautiful, efficient, or written in the best way possible; this is just a proof of concept.
Whilst this solution doesn’t give you proper video streaming, it does give you a halfway house that integrates nicely with DSpace.
Perhaps we should make this into a pluggable system for DSpace 1.6 where you can register classes that can render file types, and then add a configurable option to register viewers for file types? Thoughts?
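To give a flavour of what such a pluggable system might look like, here is a rough Java sketch of a registry that maps MIME types to renderers. Nothing like this exists in DSpace as far as this post is concerned: the class, the interface and the method names are all hypothetical, and a real version would be driven from configuration rather than hard-coded registrations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class BitstreamRendererRegistry {

    /** Hypothetical plug-in point: turns a bitstream URL into an HTML snippet. */
    interface Renderer {
        String render(String bitstreamUrl);
    }

    private final Map<String, Renderer> byMimeType = new HashMap<>();

    /** Registers a renderer for one MIME type, replacing any earlier registration. */
    void register(String mimeType, Renderer renderer) {
        byMimeType.put(mimeType, renderer);
    }

    /** Renders the bitstream if a viewer is registered for its MIME type. */
    Optional<String> render(String mimeType, String bitstreamUrl) {
        Renderer renderer = byMimeType.get(mimeType);
        return renderer == null ? Optional.empty() : Optional.of(renderer.render(bitstreamUrl));
    }

    public static void main(String[] args) {
        BitstreamRendererRegistry registry = new BitstreamRendererRegistry();
        // A stand-in for the Flash player markup written out by showMediaPlayer()
        registry.register("video/x-flv", url -> "<div id=\"player\" data-file=\"" + url + "\">Video</div>");
        System.out.println(registry.render("video/x-flv", "/bitstream/123456789/42/1/demo.flv").orElse("(no viewer)"));
        System.out.println(registry.render("application/pdf", "/bitstream/123456789/42/2/paper.pdf").orElse("(no viewer)"));
    }
}
```

The item page would then ask the registry for a viewer for each bitstream and fall back to the normal download link when none is registered.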
In: Uncategorized · Tagged with: dspace, repositories, youtube
DSpace 1.6 survey results
Well, the results of the recent DSpace 1.6 survey in which we asked the DSpace community to list the top three features they would like to see in version 1.6 have now been published. The results will probably come as no surprise, but here are the top three features:
- Better statistics
- An embargo facility
- Batch metadata editing
We have now assigned a ‘point person’ to each of these who will drive the process forward to decide how we go about achieving these goals. Obviously this is not an exhaustive list of features that will be in 1.6, but they will help to guide development efforts.
The full results of the survey can be seen in a ‘wordle’ at http://www.wordle.net/gallery/wrdl/794098/DSpace_1.6_survey_results

We will also be using Twitter (http://twitter.com/dspacetweets) and its RSS feeds (http://twitter.com/statuses/user_timeline/37160113.rss) to provide updates on version 1.6 as it develops. It will be interesting to see whether this proves a useful way of disseminating development activities as they occur.
In: Uncategorized · Tagged with: dspace, repositories
DSpace 1.6: You decide!
From an email I sent out to the DSpace email lists today:
As you’ll have seen from recent emails, the DSpace community has now released version 1.5.2 of the DSpace software. It has many new features, some enhancements to current features, and some bug fixes. Many of you will also know that a small team of developers have been working on DSpace version 2.0 which will bring with it many essential architectural enhancements to ensure that DSpace continues to fulfil the needs of the user community over the coming years. DSpace 2.0 is likely to be released early in 2010.
In the meantime, the DSpace committers have decided to start working on DSpace version 1.6. By moving to 1.6 (rather than 1.5.3) we can add new features that require changes to the underlying DSpace database. We can’t tell you just yet what new features will be in version 1.6 because we haven’t decided! And that is where you come in…
We’d like you to tell us which three features you would like to see in version 1.6. To help you do this, we have created an online survey at http://dspacesurvey.info/. We know nobody likes to be bombarded with surveys, so we’ve kept this one really short. In fact, it asks only one question:
“What should be in DSpace version 1.6?”
The survey has three boxes to enable you to tell us what your top three new features would be. We can then look at all the survey responses to help decide where we should devote our development effort. We’ll put all the commonly requested features into JIRA to enable further commenting and voting. As always, if you want to develop your own new features, we’d love to work with you to get those features included provided that they are in scope, or if you want to work with us on the new features that the community votes for, please get in touch! The DSpace community relies on developers donating their time and expertise to help improve the software. If you want to join in, get in touch at the dspace-devel email list.
But for now, what are you waiting for? Fill in the survey… http://dspacesurvey.info/
(Please complete the survey by the 28th of April as we’ll close it then)
In: Uncategorized · Tagged with: dspace, repositories
DSpace Google Summer of Code students 2009
For the third year in a row, the DSpace Foundation has been selected to take part in the annual Google Summer of Code (a.k.a. GSoC). This year we welcome back one past student, and three new students, to work on the DSpace repository code to develop new features and to experiment a bit.
The following projects have been accepted:
- Andrius Blazinskas: Fedora DAO implementation for DSpace, beta release
- Gaurav Kejriwal: Collection Administration Enhancements
- Ashly Markose: Report Generation Tool for DSpace
- Bojan Suzic: DSpace REST webapp
Congratulations to Andrius, Gaurav, Ashly and Bojan!
In: Uncategorized · Tagged with: dspace, gsoc, repositories
DSpace 1.5.2 – What’s in it for me?
You may have seen the recent announcement saying that DSpace 1.5.2 is now released. When it comes to upgrading software, especially something as large and possibly critical as repository software, there is always a decision to be made about whether to upgrade or not. As one of the DSpace committers, I’ve worked on some of the changes in 1.5.2, so I understand quite well what it contains. This blog post lists some of the more important changes in 1.5.2 (compared to 1.5.1) to help you decide whether to upgrade or not. The changes are split into broad categories to make the list easier to follow. The full list of changes can be seen here.
Features
- SWORD support has been upgraded to the new 1.3 version of SWORD.
- Sitemaps.org sitemaps, and robots.txt files have been added to the XML (Manakin) user interface.
- DOI links can now be rendered in the JSP user interface.
- The uketd_dc OAI-PMH metadata format is now included as standard. This is required by UK institutions who want to expose their electronic theses to the EThOS service. No additional code is now required.
- A new statistics collection sub-system has been written to help collect statistics in a better fashion.
- A better PDF extraction utility (xpdf) can now be used to extract the full text from some files where the current system (pdfbox) fails, or uses too much memory.
- The exporter tool now exports nested files correctly (e.g. where you have archived a website). It also now exports any file descriptions that you have set.
- The exporter tool has a new option (-m or -migrate) which strips out any per-repository specifics that would be re-created on re-ingest. This is useful if you pre-load items into a test repository, and then want to move them into a production repository without taking with them their temporary handles, provenance information, deposit dates, etc.
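To give a rough idea of what the migrate option does, here is a minimal Python sketch. It is not DSpace's own code (DSpace is written in Java), and the list of fields to strip is my assumption for illustration; the real exporter may remove a different set. It takes the contents of a dublin_core.xml file, as found in an exported item, and drops the fields that a new repository would re-create. The function name, the field list, and the example handle are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed (element, qualifier) pairs that are specific to one repository.
# This list is illustrative only, not taken from the DSpace source code.
REPOSITORY_SPECIFIC = {
    ("identifier", "uri"),          # the item's handle
    ("description", "provenance"),  # provenance statements
    ("date", "accessioned"),        # deposit dates
    ("date", "available"),
}


def strip_repository_specifics(dublin_core_xml: str) -> str:
    """Return the metadata XML with repository-specific fields removed."""
    root = ET.fromstring(dublin_core_xml)
    # Copy the list first so that removing elements is safe while looping.
    for dcvalue in list(root.findall("dcvalue")):
        key = (dcvalue.get("element"), dcvalue.get("qualifier"))
        if key in REPOSITORY_SPECIFIC:
            root.remove(dcvalue)
    return ET.tostring(root, encoding="unicode")


# A made-up exported item with a temporary handle and provenance.
EXAMPLE = """<dublin_core>
  <dcvalue element="title" qualifier="none">A test item</dcvalue>
  <dcvalue element="identifier" qualifier="uri">http://hdl.handle.net/123456789/1</dcvalue>
  <dcvalue element="description" qualifier="provenance">Submitted by a test user</dcvalue>
</dublin_core>"""

if __name__ == "__main__":
    print(strip_repository_specifics(EXAMPLE))
```

Running this on the example keeps the title but drops the handle and the provenance statement, which is the effect you want before loading the item into a production repository.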
Security
- Prior to 1.5.2, items that were restricted (anonymous access disabled) were still accessible via OAI-PMH, RSS feeds, and daily subscription emails. There are now configuration options to disable each of these.
- Provenance information used to be included in the metadata that was exposed in the <head> element of the item web pages (for consumption by search engines and tools such as Zotero). The provenance information is not needed, and has been removed.
Authentication
- Shibboleth support has been added.
- Support for hierarchical LDAP servers has been added (where users are spread across branches of an LDAP tree, rather than all existing in the same branch).
- Automatic groups for all LDAP and Password users have been added, allowing you to create groups for all authenticated users without having to add them to a group manually.
- IP authentication now works for users who are not logged in, as well as for users who are logged in. Prior to 1.5.2, users had to first log in, before they could access collections or items that were protected by IP address. It now works in a better fashion so that, for example, internal users can access such materials without having to have an account.
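The IP authentication change can be pictured with a small Python sketch. This is not how DSpace implements it (DSpace is Java, and I am not reproducing its code here); it only illustrates the idea that group membership granted by IP address no longer depends on the user being logged in. The group names, the address range, and the function name are all made up for the example.

```python
import ipaddress

# Hypothetical mapping from a group name to the IP ranges that grant it.
# 192.0.2.0/24 is a documentation-only range, standing in for a campus network.
IP_GROUPS = {
    "Campus Network": [ipaddress.ip_network("192.0.2.0/24")],
}


def groups_for_request(client_ip: str, logged_in: bool = False) -> list:
    """Work out which groups a request belongs to.

    The IP check runs for every request, whether or not the user has
    logged in, which mirrors the behaviour described for 1.5.2.
    """
    address = ipaddress.ip_address(client_ip)
    groups = ["Anonymous"]
    if logged_in:
        groups.append("Authenticated")
    for name, networks in IP_GROUPS.items():
        if any(address in network for network in networks):
            groups.append(name)
    return groups


if __name__ == "__main__":
    # An internal user who has not logged in still gets the IP-based group.
    print(groups_for_request("192.0.2.17"))
    # An external user does not.
    print(groups_for_request("198.51.100.5"))
```

In the older behaviour, the loop over the IP ranges would in effect only have been reached once logged_in was true; moving it outside that condition is what lets internal users reach protected material without an account.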
Internationalisation
- New or updated translations of the software have been added for Thai, Italian, Ukrainian and Greek.
Bug fixes
There have been many bug fixes, but of note are:
- SWORD deposits are now a lot faster. For example, a deposit of a 10MB PDF file on my development server used to take about a minute; it now takes 9 seconds.
- Statistics now work correctly in the XML user interface. Prior to 1.5.2 they did work, but unfortunately the interface looked in the wrong directory for the reports, so the reports did not show.
- If you have multiple authentication methods available (e.g. LDAP and Password Authentication) you won’t suffer any more database connection leaks.
- The handling of UTF-8 has been standardized throughout the XML user interface.
- When non-admin users deposit items using the XML user interface, the admin menu options no longer appear immediately following a deposit. Non-admin users could never use these links, but they shouldn’t have appeared.
For a neat little overview of the changes in 1.5.2, see this Wordle made using the CHANGES file from DSpace:
In: Uncategorized · Tagged with: dspace, repositories, sword
New job, New Zealand
This is a quick blog post to mark the fact that I have now moved to New Zealand. Up until the end of February I was working for Information Services at Aberystwyth University as a team leader, project manager, repository developer, and JISC project person. Following a week of holiday in the UK, I have now moved to Auckland in New Zealand along with my wife and two young children.
I am now working as a programmer in the Digital Services Team at the University of Auckland Library. The team is a multi-disciplinary group of people working on all things digital within the library. As this is the biggest university in New Zealand, the work involves many dozens of projects and lots of new and exciting things. In the five days I have been working here, I have already started working with EAD XSL files for DigiTool, extraction of XMP metadata from PDF files, DSpace item migration tools, and research management systems. Lots of new fun stuff to learn about!
No doubt there will be many more blog posts on my work as it develops further.
In: Uncategorized · Tagged with: life


