Daily Archives: September 9, 2014

GitHub to repository deposit

Over the past few months there have been positive shifts in the infrastructure available to archive software. To ‘archive software’ can mean many things to many people, but for the purposes of this blog post, I’ll take it to mean taking (well-managed) code out of an existing source code control system, making a point-in-time snapshot of that code, and depositing the snapshot into a long-term repository, along with some basic descriptive metadata.

To this end, both Figshare and Zenodo have recently developed and released integrations with GitHub. Both allow the depositor to easily take a copy of their code from GitHub and deposit it into the respective repository. One of the key benefits of doing this is that the repository platforms are then able to assign a persistent DataCite DOI (Digital Object Identifier) to the software, which makes it easier to cite and track through scholarly literature.

As one of the developers of the open SWORD deposit protocol that facilitates the deposit of resources into repositories, I thought it would be good to try and re-create this functionality using SWORD.  Below is the ‘recipe’ of how this works…

Step one (optional): Set up your browser with a bookmark
To make it easier to deposit code from GitHub, you can install a ‘bookmarklet’ that automatically detects the GitHub repository you are viewing and lets the deposit system know where it is. This means that from any GitHub repository, you can click on the bookmark to deposit the code. To install it, visit http://easydeposit.swordapp.org/example/github/easydeposit/ and drag the bookmarklet at the bottom of the page to your browser’s bookmark bar:

Install bookmarklet

Step two: Choose the GitHub repository to deposit

GitHub makes use of accounts and repositories. Each user of the service has an account, and each account can create multiple code repositories. GitHub URLs take the form https://github.com/{account}/{repository}. For example, the source code of the PHP programming language is hosted on GitHub at https://github.com/php/php-src (php is the account name, and php-src is the code repository for the PHP language).
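As a rough illustration of that URL structure, here is a minimal Python sketch that splits a GitHub URL into its account and repository parts. This is not the code the deposit client uses; the function name `split_github_url` is hypothetical and only there for the example.

```python
from urllib.parse import urlparse


def split_github_url(url):
    """Return (account, repository) for a URL of the form
    https://github.com/{account}/{repository}.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {url}")
    # Drop empty segments so that a trailing slash is harmless.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected /account/repository in: {url}")
    account, repository = parts[0], parts[1]
    # Clone URLs often end in ".git"; strip it for consistency.
    if repository.endswith(".git"):
        repository = repository[: -len(".git")]
    return account, repository


print(split_github_url("https://github.com/php/php-src"))
# prints ('php', 'php-src')
```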

Choose the GitHub repository that you wish to deposit in the repository by opening the repository in your browser.  In the example below, this is the DSpace repository platform’s code repository:

Choose repository

Step three: Click the bookmark!
If you click the ‘GitHub Deposit’ bookmark that you created earlier, you will be redirected to a SWORD deposit system. The bookmarklet contains JavaScript that passes the URL of the GitHub repository to the deposit client, which populates the form automatically. Alternatively, you can just visit http://easydeposit.swordapp.org/example/github/easydeposit/ and enter the URL of the repository yourself:

Click bookmark

Step four: Download the code

Clicking ‘Next >’ will initiate the download of the latest version of the code (the ‘master’ branch in git terminology). Depending on the size of the repository, this may take a few seconds. The code isn’t doing anything clever and, unlike the Zenodo and Figshare integrations, it doesn’t make use of the GitHub API. Instead, it downloads the master.zip file by constructing a URL such as https://codeload.github.com/DSpace/DSpace/zip/master. It then gathers basic metadata such as the title of the repository (title), the account holder (author), the URL of the repository (link), and the latest check-in comment and revision hash (abstract). These are then presented back to you to confirm:
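The URL construction and metadata mapping described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the deposit client's own code: the function names are hypothetical, the exact formatting of the abstract is my guess, and the commit comment and hash are simply passed in as arguments, since how they are obtained is not covered here.

```python
import shutil
import urllib.request


def codeload_zip_url(account, repository, branch="master"):
    """Build the zip download URL for a branch, following the pattern
    https://codeload.github.com/{account}/{repository}/zip/{branch}."""
    return f"https://codeload.github.com/{account}/{repository}/zip/{branch}"


def download_zip(url, destination):
    """Stream the zip file to disk (defined here but not called,
    so the example runs without network access)."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


def basic_metadata(account, repository, commit_message, commit_hash):
    """Map repository details onto the four metadata fields.

    The abstract format is an assumption made for this sketch.
    """
    return {
        "title": repository,
        "author": account,
        "link": f"https://github.com/{account}/{repository}",
        "abstract": f"{commit_message} ({commit_hash})",
    }


print(codeload_zip_url("DSpace", "DSpace"))
# prints https://codeload.github.com/DSpace/DSpace/zip/master
```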

Verify metadata

Step five: Perform the deposit
Upon clicking the deposit button, the code translates the metadata into a METS file and zips that up alongside the downloaded code bundle. The resulting package is then deposited into the demo DSpace server (http://demo.dspace.org/). Assuming the deposit works, you’ll be presented with the URL of the deposited code. In this case it is a ‘handle’ rather than a DOI, but to all intents and purposes it plays the same role as a persistent identifier (DOIs are themselves built on the Handle System), and DSpace can be configured to issue DOIs.
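To give a feel for the packaging step, here is a minimal Python sketch that writes a skeletal METS document and zips it together with the code bundle. It is only an approximation: the mapping of the four fields onto Dublin Core elements, the file name mets.xml, and the function names are all assumptions for illustration, and a real repository will expect a specific METS profile that this skeleton does not claim to satisfy.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed mapping from the deposit form's fields to Dublin Core elements.
FIELD_TO_DC = (
    ("title", "title"),
    ("author", "creator"),
    ("link", "identifier"),
    ("abstract", "description"),
)


def build_mets(metadata, code_filename):
    """Return a skeletal METS document (as bytes) describing one file."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    ET.register_namespace("dc", DC_NS)

    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata section wrapping simple Dublin Core.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="dmd-1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="DC")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    for field, dc_name in FIELD_TO_DC:
        element = ET.SubElement(xml_data, f"{{{DC_NS}}}{dc_name}")
        element.text = metadata[field]

    # File section pointing at the code bundle inside the package.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE="CONTENT")
    file_el = ET.SubElement(
        group, f"{{{METS_NS}}}file", ID="file-1", MIMETYPE="application/zip"
    )
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": code_filename},
    )

    # Structural map tying the metadata to the file.
    struct = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct, f"{{{METS_NS}}}div", DMDID="dmd-1")
    ET.SubElement(div, f"{{{METS_NS}}}fptr", FILEID="file-1")

    return ET.tostring(mets, encoding="utf-8", xml_declaration=True)


def build_package(metadata, code_zip_bytes, code_filename="master.zip"):
    """Zip the METS file alongside the downloaded code bundle."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("mets.xml", build_mets(metadata, code_filename))
        package.writestr(code_filename, code_zip_bytes)
    return buffer.getvalue()
```

The bytes returned by `build_package` are what would then be sent to the repository's SWORD endpoint; the deposit request itself is left out of this sketch.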

Handle issued

Step six: View the code

To see the deposited code in the repository, just click on the handle link!  For example, http://hdl.handle.net/10673/51. This will take you to the repository, where the metadata can be seen, and the code downloaded!

Code in the repository

This isn’t a highly polished integration. It was thrown together in a couple of hours by adding it as an optional ‘step’ in the configurable web-based deposit client ‘EasyDeposit’. But it is a good demonstration that small tools that archive code into SWORD-compliant repositories (DSpace, EPrints, Fedora, etc.) can be created quite quickly!