On Boundaries

"These are the boundaries that have been dictated to us" I think, looking at some of the pictures drawn for the presentation on the different services in GeoCrossWalk that I'm giving tomorrow - http://prezi.com/144735/view/.
 
These kinds of boundaries are also the things that aren't recordable in the world; parish, or city, or electoral ward boundaries. Things that OpenStreetMap can't yet do well, because they can't be measured from the outside.
 
I look at the flickr alpha shapes project - http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ - and wonder why they kept to shapes of places, rather than shapes in places. Implicitly the closest WOEIDs are already referenced to a larger shape by virtue of containment - Edinburgh contains Newington, and so on. The naming of points is already constrained by Yahoo's gazetteer data set.
 
Bruce Gittings ( http://www.geo.ed.ac.uk/scotgaz/ ) suggested shapes of areas where certain old musical styles were spread. Why not also infer shapes for turns of phrase in language, the works of architects, kinds of craft practice? These things can be observed, noted, recorded. Would there be interesting research applications?
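
As a rough sketch of what inferring such a shape could look like: assuming each noted observation (a turn of phrase heard, a building by a given architect) comes with an x/y coordinate pair, the crudest possible outline is a convex hull. This is a simpler stand-in for the alpha shapes flickr uses, which can follow concave outlines; the function name and the sample coordinates are made up for illustration.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical observation points; the two interior ones drop out.
sightings = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
outline = convex_hull(sightings)
```

A hull will always over-claim territory compared with an alpha shape, so it is only useful as a first look at where something has been observed.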

Oh, and the name "Spacewalk" has been thumbs-downed-on. Which is fine by me as long as a better one can be found.

 

Building it up at the Repository Fringe

One of two parts: the other
 
"The most successful repository is the internet: embrace it", said Ben Steen in the introductory session at the Repository Fringe. Presenting the Enlighten project, William Nixon pointed out that over 80% of accesses to their repository were direct from search engines, not through their own access interfaces, whether human or machine oriented.
 
The OpenDOAR repository index project has a trial search interface across text contents in repositories.
"This service does not use the OAI-PMH protocol, or the metadata held within repositories. Instead, it relies on Google's indexes, which in turn rely on repositories being suitably structured and configured for the Googlebot web crawler"
 
This is a problem with repository aggregation and indexing; the same work is being done in many places, all collections are partial, and Google right now will do it better than anyone else.
 
Repositories collect a lot of metadata mostly about print publication and institutional context; its relevance to a "discovery" use case is such that indexes like OpenDOAR and IESR can, at a baseline, ignore it.
Meanwhile, it's the necessity of collecting this kind of metadata that makes the repository seem an admin burden to researchers. "When there is one-click deposit, I will happily participate".
 
The web started as a new medium for scholarly communications. Linked data and distributed storage are the original problems, not our new solutions: http://ltt-www.lcs.mit.edu/Papers/linking.html

Before indexes, there were links; after indexes, there are still links.

 



Named entity recognition techniques can go some way to generating links - picking out topical terms, collections of common keyphrases, geographical and temporal associations, to help researchers follow their noses in ways that indexes can't predict.
 
Tools like Zotero allow researchers to create ad-hoc packages of works that are conceptually linked, and publish them to inform collaborators and to attract new ones. One can imagine similar services for linking works together in packages (originating from different systems within an institution, or across institutions - from anywhere that is exposed to the web.) This starts to answer some of the "marketing" questions about repositories. Authors create stories rather than just works; publishing packages of knowledge about a research subject helps facilitate re-use.
 
"Machine learning" techniques could help bootstrap this; inferring linkedness from citations and annotations can take this effort further.
But human curatorial energy is necessary, both to create collections and to provide a peer review process that can lend authority to collections.
(But who provides this curatorial energy, and who provides the support to do it? How can researchers benefit, both in terms of discovering interesting work, and gaining support for their own work?)
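
To make "inferring linkedness from citations" a little more concrete, here is a minimal sketch that assumes we already hold a reference list for each work. It scores pairs of works by how much their reference lists overlap (bibliographic coupling, measured with a Jaccard ratio). The identifiers and the function name are invented, and this is a simple counting heuristic rather than machine learning proper; the suggested links would still need a human curator to confirm them.

```python
from itertools import combinations


def coupling_scores(citations):
    """Score each pair of works by the Jaccard overlap of what they cite."""
    scores = {}
    for a, b in combinations(sorted(citations), 2):
        refs_a, refs_b = set(citations[a]), set(citations[b])
        union = refs_a | refs_b
        if not union:
            continue
        score = len(refs_a & refs_b) / len(union)
        if score > 0:
            scores[(a, b)] = score
    return scores


# Hypothetical works and reference identifiers.
citations = {
    "paper-a": ["ref-1", "ref-2", "ref-3"],
    "paper-b": ["ref-2", "ref-3", "ref-4"],
    "paper-c": ["ref-9"],
}
links = coupling_scores(citations)
```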
 
In short, I think I'd like to see:

  • Different kinds of linking services - "clear, focussed, services that only do one or two things", like sameAs http://www.sameas.org/about.php
  • Effort towards knowledge packages rather than "data repositories" - recognising the need to expose existing systems to the public internet rather than build new infrastructures which magnify the tensions already exemplified in the world of repositories
  • Funder support that values reflection on, and curation of, existing bodies of research, and tools to facilitate that rather than creating pressure to do more.
  • Less handwaving about The Cloud, while recognising that "separating storage from service" is a key to the *technical* resolution of these problems. More focus on emerging, "ground-up" efforts like OpenGRID

(This probably makes more sense in the context of part one)

Keeping it real at the Repository Fringe

One of two parts: the next

I'm still processing what I learned at Beyond the Repository Fringe 2009. These words from Clifford Lynch's closing keynote resonated for me: "Distance conversation from implementation".
 
So I want to try to separate out reflections on organisation and culture, from reflections on services and implementations. The following is a mash-up of things said by many people at the Repository Fringe. The next entry is the geeky part.


Here's Sheila Cannell exhorting us to consider a Giant Purple Cow. Original from flickr


"La crisis" should compel a change in models of "scholarly communication". "Traditional" peer-reviewed journals depend on their subscriptions for existence. Academic libraries, squeezed for resources, are cancelling the subscriptions that sustain the access-barrier model for academic publishing. The "serials crisis" is perceived as serious enough to have its own Wikipedia page. Meanwhile the ground tremors of academic communications practice changing online are rumbling.
 
Yet the peer-review process that precedes publication in scholarly journals is still very much needed. If nothing else, it helps provide the primary "metric" for academic research - the spread and depth of citation of works, the novelty and quality of works - that helps guarantee researchers, their groups and departments continued funding.
 
If "traditional" journals fade away, what replaces them? We have the "author pays" open access journal model as run by PLoS - after acceptance by peer review, authors pay a publication fee which goes to support the running of the journal and guarantee open access. This fee can be factored into research grants, but not everyone can afford, or wishes, to pay. There are huge bodies of archival material for which there is no living or identifiable "author".
 
University libraries and presses used to have a bigger role in journal publishing (I wish I knew more about the histories). Free software such as Open Journal Systems allows libraries to resume this role with few resources, rescuing "faded journals" as Edinburgh has done with Critical African Studies. A "workflow" is provided for editors and contributors to manage their own peer-review process. Journals become a kind of "packaging" for materials whose primary point of access, to many, is through an open access repository.
 
So now universities have repositories; they are seen in many ways by many people. Repositories seem to be a placeholder, standing in for different ideals about the benefit of the internet to the institution. For some, repositories are marketing tools; for others, institutional memories. Some people want to see long-term preservation of materials; others, collaboration tools for "research pools", or a means to an end of universal open access. Repositories become loaded up with all the valuable purposes that they *could* serve - too many to agree on what they are.
 
So tensions arise; repositories are run at the "institutional" level because that is where the financial support is. (In places like Southampton they've arisen because a broadminded technology department, or a digital library, has thrown up an e-prints service and it's gained critical mass.) Funders and grantmakers set the pace and create a mandate for dissemination of research materials in specific ways.
 
But researchers are oriented towards topic or department, as that's where their direct (social and financial) support is coming from; an institutional repository feels like another administrative burden, another hoop to jump through for the research assessment exercise.
Faster-moving "centres" and "research institutes" with commercial or governmental ties capture the energy flow which could be used by departments or subject groups across many institutions to build their own infrastructures. It makes sense for a repository to broaden scope, to become a social memory for towns and for corporations, to pick up all sorts of research-relevant materials that are otherwise likely to "fade"; but that can't be supported through academic resources.
 
So I was startled to hear that contributors to repository services aren't often users of them. If the practical and promotional needs of individuals aren't answered by the current shape of repositories, then collective needs at institutional level are unlikely to be met either.

As the late Rachel Heery wrote in her Digital Repositories Roadmap Review for JISC: "There needs to be a shift in emphasis from the repository to the objective." Clifford Lynch echoed this with a call for a shift in emphasis to "repository services", rather than repositories; so the conversation shifts from "what can repositories be?" to "what services meet the different objectives within repositories?"

Next, the technological speculation.

The trouble with OpenSearch browser search box plugins

I've seen a couple of academic registry/repository sites recently which expose an OpenSearch interface - IESR at MIMAS, http://iesr.ac.uk/use/opensearch/ and the Enlighten repository out of University of Glasgow - http://enlightenrepository.wordpress.com/2009/07/09/opensearch-plug-in-for-en...

 It is good to see that there is life beyond OAI-PMH and that OpenSearch is at least being trialled to see if it meets a good subset of users' needs.

But whenever I am presented with one of these browser search box plugins, the story is always the same:

* Step 1, enthusiastically add the plugin to the browser search box, run a few sample searches, gaze at the metadata.
* Step 2, wander off and do something else. A day passes...
* Step 3, the next day, use the search box expecting to see search engine results. Realise that this is an unrelated registry/repository result set. Switch the search box back to Google.

 And I never look at the browser search box plugin again.

 Is this my own lazy behaviour? I never really saw the benefits of OpenSearch as being able to expose a search interface for just one site;
I imagined it better suited for aggregators who want to crawl through and collect search results from many different sites.
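
For the aggregator case, the mechanical part is small: each site's OpenSearch description carries a URL template with placeholders such as {searchTerms}, and a trailing ? marks a placeholder as optional. The sketch below fills such templates for several sites at once. The two template URLs are hypothetical stand-ins for whatever each site's description document declares, and fill_template is a name invented here; fetching and parsing the responses is left out.

```python
import re
from urllib.parse import quote


def fill_template(template, **params):
    """Fill an OpenSearch-style URL template with URL-encoded values."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            # Optional placeholders we have no value for become empty.
            return ""
        raise KeyError(name)

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", substitute, template)


# Hypothetical templates, as they might appear in two sites' descriptions.
templates = [
    "http://repo-one.example/search?q={searchTerms}&page={startPage?}",
    "http://repo-two.example/find?terms={searchTerms}",
]
urls = [fill_template(t, searchTerms="open access") for t in templates]
```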

 I imagined the OpenSearch Geo Extensions would work like that. An individual repository node wouldn't expect users to engage with its OpenSearch directly; instead a "portal"-type node would run a set of OpenSearches, cache the results, update them for specific requests periodically based on how popular the search terms were. http://www.opensearch.org/Specifications/OpenSearch/Extensions/Geo/1.0/Draft_2 - I ought to update this to match the OGC Discussion Draft version.
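
A minimal sketch of the caching half of that "portal" idea, under my own assumptions: results are kept per search term, and the more often a term is asked for, the shorter its cache lifetime, so popular searches get refreshed more often. The class name, the fetch callable and the lifetime rule are all invented for illustration; a real portal would run its OpenSearch queries inside fetch and would probably refresh in the background rather than on request.

```python
import time


class SearchCache:
    """Cache search results, refreshing popular terms more frequently."""

    def __init__(self, fetch, base_ttl=3600.0, clock=time.time):
        self.fetch = fetch        # callable: term -> list of results
        self.base_ttl = base_ttl  # lifetime in seconds for a term seen once
        self.clock = clock        # injectable so the cache can be tested
        self.entries = {}         # term -> (results, fetched_at)
        self.hits = {}            # term -> number of requests so far

    def ttl_for(self, term):
        # Assumed rule: lifetime shrinks in proportion to popularity.
        return self.base_ttl / max(1, self.hits.get(term, 0))

    def search(self, term):
        self.hits[term] = self.hits.get(term, 0) + 1
        now = self.clock()
        cached = self.entries.get(term)
        if cached is None or now - cached[1] > self.ttl_for(term):
            cached = (self.fetch(term), now)
            self.entries[term] = cached
        return cached[0]
```

Dividing the lifetime by the request count is the simplest rule I could think of; it never forgets old popularity, so a decaying count would be a sensible next step.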

Perhaps I have misunderstood OpenSearch Nature. But it occurs to me that the browser search box is an obvious place for big aggregator sites, not so much for smaller registries. Ultimately it's less context-switching just to visit a link (I know I can visit IESR just by typing 'ie' into the address bar) than it is to select one OpenSearch from a list of many in the browser search box.

I'd like to know how widespread OpenSearch is becoming within UK HE/FE and its repository scene; I wonder if there's an index.