Discovering Online Resources
Reports From the Front: Domain-Specific Perspectives on Cross-Domain Discovery
Contents
- Introduction
- UKOLN MODELS 4: evaluation of cross-domain resource discovery, Rosemary Russell, UK Office for Library and Information Networking
- Archaeology Data Service: evaluation of resource discovery metadata for archaeological resources, Paul Miller, Archaeology Data Service
- History Data Service: evaluation of resource discovery metadata for historical data sets, Cressida Chappell, History Data Service
- Oxford Text Archive: evaluation of resource discovery metadata for electronic texts and linguistic corpora, Michael Popham, Alan Morrison, Jakob Fix, Oxford Text Archive
- Performing Arts Data Service: evaluation of resource discovery metadata for moving image resources, Celia Duffy, Performing Arts Data Service
- Performing Arts Data Service: evaluation of resource discovery metadata for sound resources, Celia Duffy, Performing Arts Data Service
- Visual Arts Data Service: evaluation of resource discovery metadata for the visual arts, museums, and cultural heritage communities, Catherine Grout, Visual Arts Data Service
1 Introduction
The Dublin Core evaluation process undertaken by the AHDS and UKOLN involved
a series of workshops held between December 1996 and May 1997. The first
of these involved representatives from a wide range of academic, computing,
and curatorial communities who explored whether and how access might be
provided to networked scholarly information resources irrespective of their
intellectual content and of where, how, and by whom they are managed. Addressing
these general issues, it established a framework which helped to focus
six further and more specialist workshops that formally evaluated the Dublin
Core against the resource discovery requirements of a particular arts and
humanities community and the information resources of interest to that
community. Between them, the specialist workshops represented most of the
subject, media, and curatorial perspectives of interest to the arts and
humanities. In addition, their use of the Dublin Core as a touchstone in
the evaluation process ensured that a common formalism was used to express
different resource discovery requirements. Accordingly, the survey produced
compatible results and enabled the AHDS and UKOLN to identify both consensus
and conflict in the resource discovery needs of large and varied
communities. A final meeting of workshop convenors was held to resolve
conflicts, and thus to develop a unifying approach to resource discovery.
That approach, based upon the Dublin Core, is reported in Chapter 3. The
present chapter summarises the more focused evaluations of resource discovery
requirements upon which it is built. The full text of the workshop reports
is available online via the AHDS Web site at http://ahds.ac.uk.
2 UKOLN MODELS 4: evaluation of cross-domain resource discovery
Rosemary Russell, UK Office for Library and Information Networking
2.1 The MODELS 4 workshop: integrating access to resources across domains
This first workshop in the AHDS/UKOLN series involved nearly 50 participants
representing a range of curatorial professions (e.g. libraries, museums,
archives, and data archives), systems experts, and scholarly users in a
discussion of whether and how to integrate access to information resources
across domains. The topic reflected the hypothesis that people want to
search for and locate relevant information resources, irrespective of their
format and of whether they are held in libraries, museums, data archives,
or any other organisational structure. The workshop set out to test the
hypothesis and, if it proved true, to explore how resources and systems
might be structured in order to realise it in information service environments.
The workshop was the fourth in a series of MODELS workshops (MOving
to Distributed Environments for Library Services) conducted by UKOLN and
sponsored by the Electronic Libraries Programme (more information about
MODELS may be found at <URL: http://www.ukoln.ac.uk/models/>).
MODELS was ideally situated to launch the AHDS's and UKOLN's work on metadata
for resource discovery. As an ongoing concern, it aims to develop a framework
for managing distributed library and information services by facilitating
informed and focused discussion of salient strategic and practical issues,
and by articulating the technical service models which emerge from those
discussions. Even prior to this workshop, MODELS had established a track
record for involving key stakeholders in discussions which ultimately initiated
work of national significance.
2.2 Significant findings
2.2.1 Cross-domain resource discovery
It was agreed at an early stage in the workshop that cross-domain searching
is highly desired by both scholarly and curatorial communities, making
technical and service models worth pursuing. That is, the group agreed
about the desirability of being able to search across a range of potentially
complementary online information resources in a manner which, from the
users' point of view, obscured any differences that existed in the underlying
resources' hardware, software, record structures, and query languages.
What constituted cross-domain discovery, however, was seen to differ. For
some it involved searching across the holdings of several institutions
operating within a single curatorial tradition. In the archival community,
for example, finding aids differ so considerably in both their structure
and content that cross-domain discovery could be seen as searching across
the holdings of two or more archives. From other perspectives, cross-domain
discovery could entail searching across the different databases maintained
by a single institution for different aspects of its collection - for example,
by a library about its book, manuscript, and print collections. For still
others, cross-domain resource discovery might entail searching for information
across a range of library catalogues, museum databases, archival finding
aids, data archive catalogues, and subject-based catalogues of World Wide
Web resources. The different definitions of cross-domain discovery encouraged
the group to focus upon...
2.2.2 Defining domains
Flexibility in approach was deemed desirable in defining the domains which
surrounded discrete collections of information resources. Thus, domains
could reflect the curatorial traditions which shaped the way in which those
resources were managed (libraries, museums, archives, etc.), the academic
disciplines within which they were principally created or used (archaeology,
film studies, geology), or the regional settings where they were stored
(north Wales, south-east). Crucially, the group felt that facilitating
cross-domain discovery could help to break down the institutional,
disciplinary, geographical, and other barriers which may impede access to
and use of information.
2.2.3 A search model and its implication for metadata
With regard to cross-domain discovery itself, an iterative or staged approach
was identified. At the first stage a user requires rudimentary information
about relevant information resources in order simply to be made aware of
their existence. To sustain this stage, generic metadata such as the Dublin
Core is required. At a second stage the user, having found a potentially
interesting information resource, might need richer metadata to determine
whether to acquire, browse, or analyse it. At a third stage the same user,
having acquired a resource, might need still further descriptive information
in order to use the resource effectively. At both of these later stages,
metadata in more specialist formats (e.g. MARC records for books, ISAD(G)
records for archival material, TEI headers for electronic texts) would
likely be required. It was here that the group articulated a search model
which enabled the user, in a single search environment, to 'drill down'
or move progressively through a hierarchy of increasingly rich and specialist
metadata as they moved through a continuum from resource discovery to resource
evaluation, access, and use.
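The staged model lends itself to a simple linked structure. The following is a minimal Python sketch, assuming a hypothetical layout in which each generic, Dublin Core-style discovery record carries links to richer, format-specific records such as a TEI header; the `MetadataRecord` type, the `discover` and `drill_down` helpers, and the example values are illustrative inventions, not part of any standard or of the workshop's output.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """One level of description; `fmt` is an illustrative format label."""
    fmt: str
    fields: dict
    richer: list = field(default_factory=list)  # links to more detailed records


def discover(records, term):
    """Stage 1: match a term against the values of generic, Dublin Core-style records."""
    term = term.lower()
    return [r for r in records
            if any(term in str(v).lower() for v in r.fields.values())]


def drill_down(record, fmt):
    """Stages 2 and 3: follow a link to a richer record in the requested format, if any."""
    for child in record.richer:
        if child.fmt == fmt:
            return child
    return None


# Hypothetical example: a Dublin Core record for an electronic text linked to a TEI header.
tei = MetadataRecord("TEI header", {"sourceDesc": "Transcribed from a printed edition"})
dc = MetadataRecord("Dublin Core", {"title": "Hamlet", "creator": "Shakespeare, William"},
                    richer=[tei])

hits = discover([dc], "hamlet")
detail = drill_down(hits[0], "TEI header")
print(detail.fields["sourceDesc"])  # Transcribed from a printed edition
```

In practice the richer records would usually live in separate systems; holding them in memory simply keeps the sketch self-contained.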
2.3 Areas for further investigation
2.3.1 Metadata for resource discovery
Domain-specific approaches to resource description, though fundamentally
different, need not impede cross-domain resource discovery, particularly
if discovery is seen as part of the continuum described above. Some consensus
is required across domains, however, about the minimum level of metadata
that needs to be associated with an information resource if it is to be
located meaningfully by those who might wish to gain access. As a first
step towards that consensus, domain specialists should, in light of the
documentation standards and best practice that are current within their
domains, formally identify their resource discovery requirements and express
them using a common formalism for comparative purposes. Within this process
the Dublin Core could provide not only the common formalism but also a starting
point for discussions about domain-specific resource discovery requirements.
This recommendation was particularly formative for the six more specialist
workshops convened by the AHDS and UKOLN.
2.3.2 Collections description
The issue of collections description first arose at the third MODELS workshop
(Dempsey and Russell 1996), where discussion focused on mechanisms for giving
both users and resource discovery tools some advance knowledge of the contents
of a particular collection catalogue. In the context of cross-domain resource
discovery, collections description was seen as a higher level of resource
discovery metadata; that is, as metadata to help users select, from a range
of online catalogues, finding aids, Web-based gateways, etc., those worth
including in a particular search. Discussion touched on the possibilities
for using Centroids (Knight and Hamilton 1997). More importantly, it called
for further investigation into communities' collections description practices
and into the collections description requirements of users involved in real
cross-domain discovery.
Whereas the former investigation requires traditional research (an eLib
supporting study of collections description is currently being co-ordinated
by UKOLN and is due for completion in autumn 1997), the latter requires
the development of cross-domain discovery services such as the one being
built by the AHDS and reported in Chapter 4 of this volume.
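Centroids, as used in the WHOIS++ index service, summarise a server's holdings as the distinct words occurring in each attribute, so that a query need only be routed to servers which might hold matches. The minimal Python sketch below illustrates that idea as a form of collections description; it simplifies the real centroid format considerably, and the catalogue names, records, and helper functions are hypothetical.

```python
from collections import defaultdict


def build_centroid(records):
    """Summarise a catalogue as {attribute: set of distinct lower-cased words}."""
    centroid = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            centroid[attribute].update(value.lower().split())
    return centroid


def select_catalogues(centroids, attribute, word):
    """Return the names of catalogues whose centroid could satisfy the query."""
    word = word.lower()
    return sorted(name for name, c in centroids.items()
                  if word in c.get(attribute, set()))


# Hypothetical catalogues from different domains.
catalogues = {
    "library": [{"title": "Roman Britain", "subject": "archaeology history"}],
    "museum": [{"title": "Samian ware bowl", "subject": "archaeology pottery"}],
    "data_archive": [{"title": "Census of 1851", "subject": "population"}],
}
centroids = {name: build_centroid(recs) for name, recs in catalogues.items()}
print(select_catalogues(centroids, "subject", "Archaeology"))  # ['library', 'museum']
```

Only the summaries are consulted when choosing where to search, which is the sense in which such metadata sits one level above the catalogues themselves.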
2.3.3 The Z39.50 protocol for search and retrieve
The Z39.50 protocol seemed potentially capable of permitting users to implement
the reiterative cross-domain search and retrieve model outlined above.
Although a relatively unexplored area, some investigative work was at the
time of the MODELS workshop either underway (for example, by a group of
UK-based archivists) or intended (for example, by the AHDS). The Z39.50
Digital Collections Profile appeared particularly promising (Library of
Congress 1996). It provides a generic or over-arching framework capable of
accommodating more specialist profiles, each designed to navigate the
heterogeneous collections databases developed by particular communities,
for example museums, libraries, or curators of geospatial information.
2.3.4 Controlled vocabularies in a cross-domain environment
Data description standards ensure that information resources are described
with reference to a common range of attributes. Controlled vocabularies
ensure a degree of consistency in the use of attribute values, and thus
a degree of consistency in search and retrieval. In a cross-domain discovery
scenario, the user is likely to encounter different, possibly competing
controlled vocabularies which reflect the underlying domains' different
resource discovery and description requirements. Resolving these conflicts
is essential if users are to be assisted in meaningfully searching a wide
range of information resources. Mapping between controlled vocabularies
may be one option, although here too it was felt that experience of user
behaviour in testbed cross-domain discovery environments was an essential
first step in addressing this issue.
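Mapping between vocabularies can be pictured as a crosswalk from each domain's terms to a shared set of concepts, used to expand a user's query into every domain's local terms. The minimal Python sketch below assumes two small, hypothetical vocabularies and a hand-built mapping table (`CROSSWALK`); real thesauri are far larger and rarely map one-to-one, which is one reason the group wanted evidence from testbed environments first.

```python
# Hypothetical crosswalk: domain -> {local term: shared concept}
CROSSWALK = {
    "library": {"Motion pictures": "film", "Sound recordings": "audio"},
    "museum": {"Cinematograph film": "film", "Phonograph record": "audio"},
}


def expand_query(term, source_domain):
    """Translate a term from one domain into the equivalent terms of every domain."""
    concept = CROSSWALK[source_domain].get(term)
    if concept is None:
        return {source_domain: [term]}  # unmapped: search the original domain only
    return {
        domain: sorted(t for t, c in mapping.items() if c == concept)
        for domain, mapping in CROSSWALK.items()
        if concept in mapping.values()
    }


print(expand_query("Motion pictures", "library"))
# {'library': ['Motion pictures'], 'museum': ['Cinematograph film']}
```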
2.4 Conclusion
The MODELS 4 workshop brought together a wide range of representatives
from relevant communities, and initiated discussion on the hitherto little
explored area of cross-domain resource discovery. It confirmed the desirability
of this approach to resource discovery and debated possible strategies
and implementations. It was particularly successful in identifying the
metadata requirements and how these might build upon work already conducted
under the Dublin Core initiative. As such, it laid an important foundation
for the more specialist AHDS/UKOLN workshops which followed.
3 Archaeology Data Service: evaluation of resource discovery metadata for archaeological resources
Paul Miller, Archaeology Data Service
3.1 Introduction
As with the other workshops in this series, that held for archaeology was
interested primarily in those elements of metadata essential to facilitating
effective discovery of resources (Miller
and Wise 1997). Undeniably important questions about metadata for other
purposes such as management or re-use of information were declared outside
the scope of the meeting, and are being explored through other fora with
which the Archaeology Data Service is involved.
Those participating in the workshop and its follow-up consultation process
represented the major national heritage agencies, local government, the
museum and research communities, and others, ensuring thorough consideration
of issues from a number of archaeological interests with very different
priorities and experiences. Although not all participants were versed in the
language of metadata per se, their everyday interaction with diverse
resources meant that most had at least an intuitive grasp of the key issues.
A number of problematic issues were highlighted for further exploration,
but the workshop found the proposed Dublin Core Metadata Element Set to
be essentially fit for the purpose of facilitating the online discovery
of archaeological resources.
3.2 Significant problems and potential solutions
3.2.1 Notions of 'the resource'
As also highlighted by the Visual Arts Data Service,
this workshop encountered some confusion in defining what metadata should
relate to. Debate focused upon the question of whether resource discovery
metadata such as that proposed should describe the archaeological resource
itself (an archaeological site, perhaps) or the manifestation of that resource
as data (a digital excavation archive). It was felt that metadata for the
latter was more appropriate, but the division between metadata for archaeology
and metadata for data continues to blur occasionally, leading to potential
confusion and misuse.
3.2.2 Collection versus item level description
Most archaeological resources are grouped together to form collections
of related information, such as an excavation archive which comprises plans,
photographs, databases, artefacts, artefact reports, environmental evidence,
etc. These collections may themselves be grouped into larger collections
formed on regional, national, or content criteria.
In terms of creating metadata, it is undoubtedly easiest to do this
at the level of the most encompassing collection, with a single record
created for each of the National Monuments Records, for example. In facilitating
access, of course, it is more useful for comprehensive metadata to be searchable
at the lowest possible level with, ultimately, complete metadata records
for each individual object or computer file. The most practical and effective
implementation lies somewhere between these extremes, and is
determined by the nature of the resources themselves and the ease of user
access to those resources shown to be worthy of further exploration by
a metadata-based search.
The manner in which collections are defined, the ways in which they
interrelate, and the level to which Dublin Core-style metadata should be
provided were all identified as important areas for further exploration,
some of which is now underway.
3.2.3 Notions of authorship
As with other workshops in this series, Archaeology found the Dublin Core's
notions of primary intellectual responsibility as expressed in the DC.creator
element difficult to align with the realities of digital surrogates of
physical archaeological resources.
3.2.4 Overloading the Core
In exploring effective use of the Dublin Core, the Archaeology workshop
encountered two important and related problems associated with overloading
the Dublin Core model.
The first of these was element overload, where a very few elements (principally
DC.subject and DC.coverage) were employed as 'data buckets' into which
large quantities of information might be placed, normally within a complex
system of Dublin Core TYPE sub-qualifiers.
The second was overload of the entire Dublin Core itself, where a structure
designed exclusively for resource discovery was employed to provide more
detailed metadata better disseminated by means of more specialised structures.
A large number of standards exist for describing the detail of archaeological
resources (Miller and Wise 1997), and the Warwick Framework model (Lagoze
et al. 1996) should be explored as a means by which this detail might be
linked to the more general Dublin Core record.
3.3 Recommendations regarding the Dublin Core
3.3.1 Registering SCHEMEs and TYPEs
It was felt that implementations of Dublin Core such as that proposed
by the AHDS required a central registry for recommended Dublin Core
SCHEMEs and TYPEs, and that such registries should be kept closely linked
to less formal developments in the wider Dublin Core community.
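A registry of this kind can be as simple as a lookup table against which qualified elements are checked before records are accepted. The minimal Python sketch below assumes a hypothetical in-memory `REGISTRY`; the SCHEME and TYPE names in it are illustrative only and are not the AHDS's actual recommendations.

```python
# Hypothetical registry: element -> permitted SCHEMEs and TYPEs
REGISTRY = {
    "subject": {"schemes": {"LCSH", "MDA"}, "types": set()},
    "coverage": {"schemes": {"TGN"}, "types": {"spatial", "temporal"}},
    "date": {"schemes": {"ISO8601"}, "types": {"created", "issued"}},
}


def validate(element, scheme=None, type_=None):
    """Return a list of problems with a qualified element; empty if it is registered."""
    entry = REGISTRY.get(element)
    if entry is None:
        return [f"unregistered element: {element}"]
    problems = []
    if scheme is not None and scheme not in entry["schemes"]:
        problems.append(f"unregistered SCHEME for {element}: {scheme}")
    if type_ is not None and type_ not in entry["types"]:
        problems.append(f"unregistered TYPE for {element}: {type_}")
    return problems


print(validate("coverage", scheme="TGN", type_="spatial"))  # []
print(validate("coverage", type_="period"))  # ['unregistered TYPE for coverage: period']
```

Keeping the registry as data rather than code would make it straightforward to update as the wider Dublin Core community's conventions evolve.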
3.3.2 Disciplinary definitions for Dublin Core elements
The current definitions for Dublin Core elements (Weibel,
this volume) remain largely rooted in a documentary paradigm, although
the meaning behind the definitions is more widely applicable. It is suggested
that definitions suitable for individual subject communities be developed,
with careful validation to ensure that the meaning remains the same, regardless
of the words used.
3.3.3 Greater attention to DC.coverage
The Dublin Core's coverage element is essential to the archaeological community,
both spatially and temporally. This element requires greater attention
and a coherent development programme in order to ensure fitness for purpose.
This element, perhaps more than any other, is in danger of overload, and
its role needs careful assessment alongside more detailed schemas such as
those under development within CEN TC287 (CEN 1997) and ISO TC211 (ISO 1997).
4 History Data Service: evaluation of resource discovery metadata for historical data sets
Cressida Chappell, History Data Service
4.1 Introduction
This workshop assessed historians' resource discovery requirements with
regard to digital resources (Chappell
and Anderson 1997). It focused in particular on historical databases and,
implicitly, on the social science databases with which historical databases
have so much in common, both structurally and semantically. The two documentation
standards which are most widely used to record and catalogue information
about such data sets were reviewed - ISAD(G), or the General International
Standard Archival Description (ICA
1997) developed by an ad hoc commission of the International Council
for Archives during the early 1990s for archival materials in general rather
than necessarily for digital ones; and the Standard Study Description developed
by a consortium of Social Science Data Archives in the 1970s specifically
for machine-readable files (DDI
1997). The group benefited substantially from members' experience of
extant online catalogues, notably the Data Archive's catalogue,
BIRON (Data
Archive 1997a), and the Council of European Social Science Data Archives'
Integrated Data Catalogue which acts as a gateway to permit searching in
parallel across the holdings of eleven data archives (Data
Archive 1997b).
4.2 Significant problems and potential solutions
4.2.1 Defining resource discovery
The workshop accepted in principle that resource discovery could be based
on a minimum set of descriptive elements provided that fuller information
is supplied, possibly in Warwick Framework-style packages. As in other
workshops, however, there was some difficulty in identifying the boundary
between resource discovery and the fuller assessment of a resource which
might be required before deciding actually to use it, and thus, precisely
how much metadata was necessary for resource discovery.
After discussion, members agreed that six categories of information
were absolutely essential: the source(s) on which a resource was based;
the geographical area (e.g. the British Isles, Madison County), chronological
period (e.g. 1900-1945), and subject (e.g. labour history, urbanisation)
to which it referred; the data set's title; and the person(s) or
organisation(s) responsible for its creation.
More pressing than the categories of essential metadata was the depth
of information supplied in any one category for any particular data resource.
In general, members preferred comprehensive information supplied to a fine
level of granularity, particularly with regard to a resource's geographic
and chronological coverage, and to the source(s) on which it was based.
They wanted to be able to search for all data sets spanning a given year
or range of years, and then retrieve information about the periodicity
of the underlying data (e.g. annually collected data, a decennial census).
Beyond merely being able to locate data sets based upon their geographic
extent, members also wished to be able to refine such a selection
based upon the level of detail offered by each data set. Thus a search
for the English county of Essex might recover not only all data sets indexed
by the term 'Essex', but also all data sets from a higher
level (e.g. England, the United Kingdom, or Europe) which are known to
include county-level data for Essex. The search might also extend to data
sets covering a smaller area identifiable as a subset of the county, such
as parish-level records.
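The search behaviour members asked for combines two tests: whether a data set's date range overlaps the years requested, and whether its geographic coverage is related to the requested place through a place hierarchy. The minimal Python sketch below assumes a hypothetical parent-child gazetteer (`GAZETTEER`, with level labels such as "nation" chosen for the sketch) and a hypothetical record layout in which each data set lists the places it covers and the finest level at which it holds data; neither comes from the workshop report.

```python
# Hypothetical gazetteer: place -> (parent place, level); LEVELS runs coarse to fine.
LEVELS = ["continent", "country", "nation", "county", "parish"]
GAZETTEER = {
    "Europe": (None, "continent"),
    "United Kingdom": ("Europe", "country"),
    "England": ("United Kingdom", "nation"),
    "Essex": ("England", "county"),
    "Writtle": ("Essex", "parish"),
}


def ancestors(place):
    """Return the places that contain `place`, nearest first."""
    result = []
    parent = GAZETTEER[place][0]
    while parent is not None:
        result.append(parent)
        parent = GAZETTEER[parent][0]
    return result


def spatial_match(dataset, place):
    """Match a data set indexed by the place itself, by an area within it, or by a
    containing area whose finest level of detail reaches the place's level."""
    wanted = LEVELS.index(GAZETTEER[place][1])
    finest = LEVELS.index(dataset["finest_level"])
    for covered in dataset["places"]:
        if covered == place or place in ancestors(covered):
            return True  # the same place, or a smaller area inside it
        if covered in ancestors(place) and finest >= wanted:
            return True  # a larger area known to hold data at this level
    return False


def temporal_match(dataset, year_from, year_to):
    """True if the data set's inclusive year range overlaps the requested range."""
    return dataset["start"] <= year_to and year_from <= dataset["end"]


def search(datasets, place, year_from, year_to):
    return [d["name"] for d in datasets
            if spatial_match(d, place) and temporal_match(d, year_from, year_to)]


# Hypothetical catalogue records.
DATASETS = [
    {"name": "Essex poll books", "places": ["Essex"], "finest_level": "county",
     "start": 1830, "end": 1872},
    {"name": "England county returns", "places": ["England"], "finest_level": "county",
     "start": 1801, "end": 1901},
    {"name": "UK national aggregates", "places": ["United Kingdom"],
     "finest_level": "country", "start": 1801, "end": 1901},
    {"name": "Writtle parish registers", "places": ["Writtle"], "finest_level": "parish",
     "start": 1700, "end": 1850},
]
print(search(DATASETS, "Essex", 1840, 1860))
# ['Essex poll books', 'England county returns', 'Writtle parish registers']
```

The UK-wide aggregates are excluded because they hold no county-level detail. Periodicity (annual data, a decennial census) could be carried as a further field and reported alongside each hit in the same way.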
4.2.2 Collection versus item level description
Users' resource discovery requirements would in some cases be based on
the content of dataset tables, records, or fields rather than on the information
contained in a resource's metadata. Thus, an individual searching through
a range of data resources based on censuses or census-like lists might
wish to locate only those resources which made explicit reference to a
particular named individual, thus blurring the boundary between resource
discovery on the one hand, and resource browsing or analysis on the other.
Though such functionality would be useful, cost and technical considerations
are likely to militate against its being supplied across the holdings of a
relatively large data service.
4.3 Recommendations regarding the Dublin Core
Given the essential requirements for initial resource discovery documented
above, the group agreed that the Dublin Core made a useful starting point,
but it identified the following problem areas.
4.3.1 DC.creator and DC.contributor
The relationship between the elements DC.creator and DC.contributor was
seen as hopelessly confused, and members preferred to eliminate DC.contributor
altogether and use DC.creator with an appropriate list of controlled responsibility
statements.
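The group's preferred approach, a single creator element qualified by a role drawn from a controlled list, might be represented as follows. This is a minimal Python sketch; the role terms in `ROLES` and the `Creator` type are hypothetical illustrations, not a list the workshop agreed.

```python
from dataclasses import dataclass

# Hypothetical controlled list of responsibility statements.
ROLES = frozenset({"principal investigator", "data compiler", "depositor", "transcriber"})


@dataclass(frozen=True)
class Creator:
    name: str
    role: str

    def __post_init__(self):
        # Reject roles outside the controlled list rather than accepting free text.
        if self.role not in ROLES:
            raise ValueError(f"role not in controlled list: {self.role!r}")


creators = [
    Creator("Smith, J.", "principal investigator"),
    Creator("Jones, A.", "data compiler"),
]
print([f"{c.name} ({c.role})" for c in creators])
# ['Smith, J. (principal investigator)', 'Jones, A. (data compiler)']
```

Because every contributor is recorded as a creator with an explicit role, there is no need to decide whether a given person belongs in DC.creator or DC.contributor.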
4.3.2 DC.type
The preliminary list of object types proposed by Knight and Hamilton (Knight
and Hamilton 1997) was felt to be inadequate for the needs of historians
and historical data. The provisional hierarchy developed since this workshop
(Tennant
1997) would seem to better encapsulate the data forms encountered by
historians. It is, however, likely that more history-specific information
about such details as data structure will either need to be added to the
current model or represented within a SCHEME specifically for historical
data sets.
4.3.3 DC.coverage
The coverage element was seen as potentially problematic because
historically crucial information relating to both spatial location and
temporal duration was forced into it. It was felt preferable
for information relating to space and time to be separated into two elements
or, failing this, for a rigorous system of TYPEs to be employed within
the element to clearly distinguish between different forms of the two.
As well as providing locational or durational data, this element will be
required to provide significant amounts of contextual information on such
details as data granularity, locational precision, and temporal periodicity.
5 Oxford Text Archive: evaluation of resource discovery metadata for electronic texts and linguistic corpora
Michael Popham, Alan Morrison, Jakob Fix, Oxford Text Archive
5.1 Introduction
This workshop focused on identifying the metadata essential to finding
electronic texts of interest to those working in the fields of literary
and linguistic studies, and encompassed texts of every type and period
(Popham
et al. 1997). It worked with a broad definition of what might constitute
a 'text' in order to consider various forms of text collection (e.g. collected
works, anthologies), linguistic corpora, and other works (e.g. dictionaries,
reference works).
Arguably, this workshop should have encountered the fewest challenges
when evaluating the Dublin Core against the communities' resource discovery
needs. The Dublin Core was initially envisaged as metadata for document-like
objects, and there has been substantial work 'mapping' between the Dublin
Core and the two text documentation standards which focused the group's
attention: MARC (Library
of Congress 1997b), and the Text Encoding Initiative's Header (Giordano
1996). Despite this, two significant challenges were identified which
tempered the consensus that emerged regarding the Dublin Core's suitability
for resource discovery.
5.2 Significant problems and potential solutions
5.2.1 Defining resource discovery
This crucial issue bears directly on how much information (metadata) is
actually required for any given resource. The consensus was that the more
information that could be fed back to a user in response to an enquiry,
the easier it would be for that user to identify the resources likely to
be of interest.
5.2.2 Variety of users' resource discovery requirements
The workshop focused initially on the needs of literary and linguistic
scholars, but rejected early on the possibility of considering the disciplines
in any uniform way. The problem was further compounded by the fact that texts
(whether electronic or not) are frequently of interest to scholars working
across the humanities and other disciplines, who therefore
represent an extremely broad range of resource discovery requirements.
The group did feel that Warwick Framework-style packaging of more detailed
and specialist documentation offered a reasonable mechanism for satisfying
the resource discovery requirements of diverse user communities. Currently,
such a model is employed by academics working with conventional library
catalogues to discover paper-based texts. The catalogue provides basic
search facilities for author, title, keyword, and subject. The initial
inquiry can then be followed either by browsing the complete library catalogue
record (if available, e.g. online), and/or by consulting a copy of the
work itself. With this in mind, it was felt that the basic information
necessary for the successful discovery of non-electronic resources in literary
and linguistic studies would also appear to be sufficient for discovering
their electronic counterparts, and that the Dublin Core made a good starting
point for satisfying these basic information requirements.
5.2.3 Scope: collection versus item level description
The problem is easily stated though not easily addressed. In an anthology
of verse or the collected works of an individual playwright, should the
metadata relate only to description at the collection level, or should
each individual work (or even section - e.g. chapter, verse, act, scene)
within a collection also have its own descriptive metadata? If the latter,
then in certain circumstances (e.g. a collection of works by the same author),
perhaps certain metadata could be inherited from the collection-level description
by each of the works constituting the collection. Similarly, the collection-level
metadata description should perhaps be sufficient to convey basic information
about each of the individual works within the collection (but would this
be feasible in the case of, say, an anthology of 500 poems produced by
different authors?). These issues are of even greater concern when considering
large-scale literary or linguistic corpora which may contain many thousands
of individual texts. The concept of scope also raised a number of related
issues, such as the possible requirement to identify discrete resources
(e.g. a number of specific texts within a corpus, a specific act within
a play), and the need to know whether or not a resource was static or dynamic
(i.e. liable to change), as knowing such information might aid initial
resource discovery when searching across large volumes of material.
Here the problems seemed less surmountable and it was later agreed at
a meeting of workshop convenors that they were likely to be addressed by
individual service or information providers who would weigh up their users'
resource discovery needs against the size of their collections and the
costs and redundancy entailed in item-level description.
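The inheritance idea raised above can be sketched as item-level records that fall back on collection-level values for any field they do not set themselves. The minimal Python sketch below uses the standard library's `collections.ChainMap` for that lookup; the field names, the `effective_record` helper, and the example values are hypothetical.

```python
from collections import ChainMap

# Hypothetical collection-level description of a single-author collection.
collection = {"creator": "Marlowe, Christopher", "type": "drama", "language": "en"}

# Item-level records hold only what differs from, or adds to, the collection.
items = [
    {"title": "Doctor Faustus"},
    {"title": "The Jew of Malta"},
    {"title": "Hero and Leander", "type": "verse"},
]


def effective_record(item, parent):
    """Look up fields in the item first, then fall back on the collection."""
    return ChainMap(item, parent)


for item in items:
    record = effective_record(item, collection)
    print(record["title"], "|", record["creator"], "|", record["type"])
```

For an anthology by many authors, `creator` would have to be set on every item, which is exactly the case where the text questions whether item-level description is feasible.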
5.3 Recommendations regarding the Dublin Core
5.3.1 Problematic elements and element usage
The elements DC.subject and DC.description presented difficulties with
purely literary texts (for example, there are many potential keywords for
Shakespeare's play Hamlet related to notions of love, betrayal, insanity,
etc., but a text about the play might require only a handful of subject
keywords), though no such difficulties were envisaged for linguistic resources.
The relationship between DC.source and DC.relation was considered to
be confused, and the group felt unsure about where best to express the
relations familiar to those studying literary materials (e.g. an adaptation
by X of Y's translation of a work by Z).
DC.type was considered useful but not essential and presented problems
as the group was sceptical about the usefulness of the proposed Dublin
Core object types (at the time represented by the work of Knight
and Hamilton 1997). It recommended instead the use of one of the many
existing controlled vocabulary lists, such as those used by conventional
library cataloguing staff to describe genres of literary resources.
5.3.2 Element qualifiers
The group argued that these were necessary for the Dublin Core elements
DC.title, DC.creator, DC.contributor, DC.date, and DC.identifier. With
regard to DC.date, it argued for a controlled list of types allowing for the
date of original creation of a work, the publication date of the relevant
printed edition of that work, and the release date of the electronic version
of the printed edition.
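A controlled list of date types of the kind the group proposed can be represented by pairing each date with a type drawn from a fixed set. The minimal Python sketch below assumes three hypothetical type labels matching the three dates named in the text and, as a further assumption, lets each value be either a single year or a range of years; `DateType`, `TypedDate`, and the values shown are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DateType(Enum):
    """Hypothetical controlled list of DC.date types."""
    CREATED = "created"              # original creation of the work
    PRINTED = "printed edition"      # publication of the printed edition used
    RELEASED = "electronic release"  # release of the electronic version


@dataclass(frozen=True)
class TypedDate:
    kind: DateType
    start: int                 # a year
    end: Optional[int] = None  # None means a single year rather than a range

    def overlaps(self, year_from, year_to):
        """True if this date or date range overlaps the requested years (inclusive)."""
        end = self.start if self.end is None else self.end
        return self.start <= year_to and year_from <= end


# Illustrative values for a single text.
dates = [
    TypedDate(DateType.CREATED, 1599, 1601),
    TypedDate(DateType.PRINTED, 1623),
    TypedDate(DateType.RELEASED, 1997),
]
printed = [d for d in dates if d.kind is DateType.PRINTED]
print(printed[0].start)  # 1623
print([d.kind.value for d in dates if d.overlaps(1600, 1650)])
# ['created', 'printed edition']
```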
5.3.3 Implementation issues
The group's discussions pinpointed three key implementation issues:
- the importance of controlled vocabularies, particularly for DC.creator;
- mechanisms for coping with date ranges in DC.date;
- the desirability of more rather than less comprehensive information in
DC.source, possibly including pointers to metadata for the source(s).
6 Performing Arts Data Service: evaluation of resource discovery metadata for moving image resources
Celia Duffy, Performing Arts Data Service
6.1 Introduction
Moving image resources are, of course, of interest to a great many more
discipline areas than those of the performing arts. This workshop (Duffy
and Owen 1997a), however, focused on the discipline areas of film,
TV, and theatre studies and considered resource discovery issues relating
chiefly to movies, TV, drama, and recordings of staged performances. Participants
represented a cross-section of expertise and interest from both service
providers and user groups.
There is a marked difference between the specialised, individualistic
cataloguing practices at film archives (which often adopt their own in-house
procedures and systems) and those of general libraries. The difficulty
for general libraries is that moving image resources are generally
not amenable to descriptive methods designed for text-based materials.
The Dublin Core, with its origins in describing document-like objects,
shares these difficulties. The workshop examined its potential use for
describing moving image resources, tested it against a variety of examples,
and critically reviewed its application. It concluded that the Dublin Core
model could be used to describe moving image resources, subject to the
provisos noted below.
6.2 Significant problems and potential solutions
6.2.1 Dublin Core terminology
If one of the aims of the Dublin Core is for non-library-trained researchers
to supply metadata records with their data, the language used and definitions
given have to be meaningful to those researchers. At least for those working
with moving images, this was not felt to be the case at present.
DC.coverage was so problematic that its use with moving image resources
is not recommended at all. DC.publisher was similarly difficult, since notions
of 'publication' are far less straightforward than for text-based resources.
The provision of clear guidelines for both general and subject-specific users
should alleviate this problem and help to ensure that the core set is used
consistently. The definitions of other elements have been clarified for use
in an AHDS context, but there are still likely to be a large number of SCHEME
and TYPE qualifiers. It is questionable whether these are intuitive enough
for non-specialists to use.
6.2.2 Qualifiers to Dublin Core elements
Bearing in mind that the Dublin Core is intended for core
description and not to replace more precise and specialised cataloguing
methods, the workshop concluded that the Dublin Core would be useful
for moving image resources only with ample provision of qualifying statements.
Now that the difficult distinction between primary and secondary contributors
in DC.creator and DC.contributor has been jettisoned, the number of roles
(director, producer, performer, etc.) attributed to individuals under
DC.creator and DC.publisher will be the most difficult to restrict.
Moving image resources will often need qualifiers which make a clear
distinction between original works, their various manifestations in production,
and their digital surrogates with respect to DC.creator, DC.publisher,
DC.date, and others. As it remains difficult to recommend use of DC.coverage
in its present form, important information relating to place will have
to be moved to other elements, again with appropriate qualification.
A definitive list of qualifiers for each element is still to be determined.
6.2.3 Specialist versus interdisciplinary users
Throughout its discussions of the Dublin Core the workshop tried to maintain
a balance between the needs and expectations of a non-specialist, interdisciplinary
searcher (which are difficult to predict) and those of a specialist user.
Many of the problems encountered arise from knowledge and experience
of the needs of searchers within the disciplines of film, television, or
theatre studies. Although interdisciplinary searching is generally agreed
to be a positive goal, the more precise needs of the interdisciplinary
searcher have still to be determined. There is a case for more research
into the behaviour of interdisciplinary searchers.
6.3 Recommendations regarding the Dublin Core
6.3.1 The problem of authorship
It is often neither possible nor desirable to assign a principal 'author'
to a moving image resource; even if the convention of using the director
is adhered to for movie resources, it cannot be consistently applied for
television and other recorded performances. The option of listing those
of 'secondary' importance within DC.contributor implies a hierarchy of
artistic effort which is problematic in the extreme. The contents of DC.creator
and DC.contributor should therefore be combined within DC.creator, with
the roles of each named individual clearly specified.
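Here is a minimal sketch of the combined element. It assumes each DC.creator value is stored as a (name, role) pair, and the role labels and personal names are placeholders, not a recommended vocabulary. Grouping by role lets a searcher find, say, all performers without implying that any role outranks another.

```python
from collections import defaultdict

# Each DC.creator entry is a (name, role) pair; there is no separate
# DC.contributor, so no entry is marked as 'secondary'.
Creator = tuple[str, str]


def creators_by_role(creators: list[Creator]) -> dict[str, list[str]]:
    """Group creator names by role, keeping the original order within each role."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for name, role in creators:
        grouped[role].append(name)
    return dict(grouped)


# Placeholder names and roles for a hypothetical television recording.
creators = [
    ("Name A", "director"),
    ("Name B", "producer"),
    ("Name C", "performer"),
    ("Name D", "performer"),
]
print(creators_by_role(creators))
# {'director': ['Name A'], 'producer': ['Name B'], 'performer': ['Name C', 'Name D']}
```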
6.3.2 Coverage
This element, as defined, cannot be used consistently for moving image
resources. However, the concepts of place and duration (teased out by implication
from the current definition's "spatial locations and temporal duration")
are extremely important for moving image resources and can be included
under other elements.
Place might be accommodated within DC.subject (tagged for provenance,
country of production, etc.) and DC.publisher (tagged with place of release/broadcast/production).
This leads, however, to potential overload, particularly of the subject
element. Running time (duration) fits more naturally along with playback
information in DC.format than within DC.coverage.
6.3.3 Element qualifiers
The only element for which some kind of SCHEME or TYPE qualifier is not
potentially useful is the free-text DC.description.
7 Performing Arts Data Service: evaluation of resource discovery metadata
for sound resources
Celia Duffy, Performing Arts
Data Service
7.1 Introduction
This workshop (Duffy
and Owen 1997b) focused on the discipline of music, considering both
sound and printed music resources and how they might be described using
the Dublin Core metadata element set.
In the discipline of music, it can be difficult to divorce the needs
for information retrieval in sound recordings from those of the printed
versions of the same music; particularly in the field of Western art music,
users are likely to be searching for both. It is a characteristic of music
resources that the same work can manifest itself in many different ways,
e.g. as a sound recording, a score, an arrangement, a manuscript, or a
MIDI file. This multiplicity of representations of a work (a good selection
of which was covered in the workshop) is a less prominent issue in
book cataloguing, but it causes conflicts between music librarianship
and traditional book-based approaches.
UKMARC is a widely used standard in music libraries, although, as it
was developed primarily for book-based media, its use for music resources
can be problematic. Further discussion of MARC and of the special issues
faced in cataloguing sound and printed music, with a detailed consideration
of current standards, appears in
Malcolm Jones' briefing paper (1997) and the workshop report itself
(Duffy
and Owen 1997b).
7.2 Significant problems and potential solutions
7.2.1 Mapping existing conventions and standards to the Dublin Core
The workshop found that there was a reasonably good correspondence between
descriptions necessary for sound and printed music resources and the Dublin
Core, in the sense that a home could be found somewhere within the Dublin
Core structure for the most important categories. There were cases where
important search terms had to be shoehorned into unexpected places (such
as DC.coverage), or scattered over various Dublin Core elements in a way
that separated information that would usually be linked together in one
statement (for example, separating place from date of recording, or date
of publication from publisher). DC.coverage gave most cause for concern
here. The provision of clear guidelines for both general and subject-specific
users should alleviate this problem and help to ensure that the core set
is used consistently.
7.2.2 Overloading the core
The workshop found that most of the Dublin Core elements needed to contain
not one, but several pieces of information to make sense for music resources.
Although an expert group inevitably tends to construct a 'wish list'
for the particular cataloguing requirements of its discipline, and some
requirements noted in the workshop report are likely to be too detailed for
a core level of description, the amount of qualification still necessary
for music resources may prove problematic in an interdisciplinary searching
environment. Many of the proposed qualifications attempt to separate information
about an original work from its recreation in the form of a recording;
for example, users need to be able to distinguish a composer from a performer
in the DC.creator element and the date of original composition from the
date of recording in DC.date.
Many elements are in danger of being overloaded. The recommendation
to combine DC.creator and DC.contributor into a single element for all
individuals who have a creative input to the resource will necessitate
a large number of tags to explain roles. Other elements in danger of
being overloaded are DC.subject (including information on genre, medium,
associated names, and places), DC.publisher (again, this is not as
straightforward for sound resources as it may be for text-based
resources, and many qualifications are necessary), and DC.date (again,
several dates must be disentangled: recording, original composition, and release).
A definitive list of qualifiers and a mechanism for providing more detailed
and specialist documentation within the core are still to be determined.
7.3 Recommendations regarding the Dublin Core
7.3.1 The problem of authorship
As in the area of moving
image resources, it is often neither possible nor desirable to assign
one 'author' to a recorded sound resource; the convention of citing the
composer as the main author is arguable for Western art music, but cannot
be consistently applied to other types of music. The option of listing
those of 'secondary' importance implies a hierarchy of artistic effort
which is itself problematic. It is therefore recommended that DC.creator
and DC.contributor are combined, with appropriate tagging schemes to explain
roles.
7.3.2 Coverage
This element, as defined, cannot be used consistently for sound or printed
music resources. However, the concepts of place and duration (teased out
by implication from the current definition's "spatial locations and temporal
duration") are individually extremely important and can be included under
other elements. The workshop felt that place of recording (or provenance/origination)
was such a significant element for sound resources that it should ideally
have a field of its own. Place does not fit naturally with duration. Normal
practice would be to combine place and date of recording, and duration
with format.
As a result of further discussion, and in line with the moving image
workshop, it is recommended that this element is not used for sound resources.
Duration fits naturally with DC.format. Place of publication, or release,
should go with DC.publisher. Place of origination, or recording, and any
associated place name should be included in DC.subject.
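The redistribution recommended here can be expressed as a simple mapping. The following Python sketch assumes a record is a dictionary from element name to a list of (qualifier, value) pairs. The qualifier names and the sample record are hypothetical. A qualifier with no agreed destination is rejected rather than guessed.

```python
# Destination element for each kind of information formerly put in DC.coverage.
# The qualifier names are illustrative assumptions, not an agreed vocabulary.
TARGET_ELEMENT = {
    "duration": "DC.format",
    "place of publication": "DC.publisher",
    "place of release": "DC.publisher",
    "place of origination": "DC.subject",
    "place of recording": "DC.subject",
    "associated place": "DC.subject",
}

Record = dict[str, list[tuple[str, str]]]


def redistribute_coverage(record: Record) -> Record:
    """Return a copy of record with every DC.coverage entry moved to its new element."""
    result: Record = {
        element: list(values)
        for element, values in record.items()
        if element != "DC.coverage"
    }
    for qualifier, value in record.get("DC.coverage", []):
        target = TARGET_ELEMENT.get(qualifier)
        if target is None:
            raise ValueError(f"no agreed home for coverage qualifier {qualifier!r}")
        result.setdefault(target, []).append((qualifier, value))
    return result


sample = {
    "DC.title": [("", "Example recording")],
    "DC.coverage": [("duration", "42 min"), ("place of recording", "Glasgow")],
}
print(redistribute_coverage(sample))
# {'DC.title': [('', 'Example recording')], 'DC.format': [('duration', '42 min')],
#  'DC.subject': [('place of recording', 'Glasgow')]}
```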
7.3.3 Difficulties of definition
Usage has been clarified for DC.type, which should no longer contain genre
statements and thus not overlap with DC.subject. Usage has also been clarified
for DC.format (relating to playback or handling information), which should
also include duration. The definition of DC.publisher needs clearly to
state its inclusiveness: a publisher in the case of recorded music may
be interpreted as a record company, distributor, agent, or broadcasting
organisation. This definition of publisher is wider than usual.
The potential confusion between DC.source and DC.relation should be
resolved by restricting DC.source to describing the original (usually
analogue) from which a copy (usually digital) has been made. DC.relation
can then be used for hierarchical relationships (for example, tracks on
an album, or individual songs within a song cycle) as outlined in the
AHDS guidelines in Chapter
3.
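The following Python sketch shows the division of labour proposed here. It assumes a deliberately simplified record with one DC.source value and one hierarchical DC.relation link. The field names, identifiers, and titles are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    identifier: str
    title: str
    # DC.source: only the (usually analogue) original this (usually digital) copy was made from.
    source: Optional[str] = None
    # DC.relation: a hierarchical link to the containing resource, e.g. an album.
    part_of: Optional[str] = None


def parts_of(records: list[Record], parent: str) -> list[str]:
    """Titles of records whose hierarchical relation points at parent."""
    return [r.title for r in records if r.part_of == parent]


def copies_of(records: list[Record], original: str) -> list[str]:
    """Identifiers of copies whose DC.source names the given original."""
    return [r.identifier for r in records if r.source == original]


records = [
    Record("album-1", "Example album"),
    Record("track-1", "First track", source="lp-0001", part_of="album-1"),
    Record("track-2", "Second track", source="lp-0001", part_of="album-1"),
]
print(parts_of(records, "album-1"))   # ['First track', 'Second track']
print(copies_of(records, "lp-0001"))  # ['track-1', 'track-2']
```

Keeping the two links in separate fields means a query for the contents of an album never returns the analogue original, and a query for digitised copies never returns sibling tracks.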
7.3.4 Element qualifiers
The only element for which some kind of SCHEME or TYPE qualification is
not potentially useful is the free-text DC.description.
8 Visual Arts Data Service: evaluation of resource discovery
metadata for the visual arts, museums, and cultural heritage communities
Catherine Grout, Visual Arts
Data Service
8.1 Introduction
This workshop aimed to examine the descriptive information needed to enable
the discovery of visual arts, museums, and cultural heritage resources
on the Internet, particularly in the form of digital images. It aimed to
decide which kinds of information were of 'core' significance, to indicate
relevant specialist standards, terminology resources, syntaxes, etc., and to consider
the effectiveness of Dublin Core as a basis for resource discovery metadata
in this domain. This exercise was in some respects a complex one, as a
very significant corpus of information description standards already exists
for use by members of the three communities represented. A review of these
standards was provided in a document circulated to participants in advance
(Gill et al. 1997). The workshop was followed by an extensive process of
reporting and consultation in order both to recommend solutions to the
problems identified at the workshop and to subject these recommendations
to a process of rigorous review by members of the relevant communities.
The reports which detail this process are available on the Visual Arts
Data Service site on the World Wide Web (Gill
and Grout 1997).
8.2 Significant problems and potential solutions
8.2.1 Identification of the source of intellectual content
One of the most significant problems which the workshop began to address
was the need to identify the source of intellectual content when creating
and using resource discovery metadata. In essence, the discussions at the
workshop led to the need to find an effective answer to the following
question: how can Dublin Core be used to make a clear distinction between
originals, surrogates, and online resources?
The process of creating metadata about digital networked resources
will often involve the description of a number of different entities,
since intellectual content can be invested at many stages.
Because information in visual arts, museums, and cultural heritage is often
derived from physical, tangible original objects such as works of art,
objects in a collection, or sites of historic interest, the ability to
specify, optionally, exactly what the metadata describes matters more here
than in less object-focused research areas. Those areas tend to centre
on the retrieval of bibliographic resources, where information about the
physical manifestation(s) of a work is usually inconsequential compared
with information about the work itself.
The principal solution proposed to this problem by the Visual Arts Data
Service was the application of optional 'intellectual content source' qualifiers.
These were original, surrogate, and resource, which could be further refined
by the use of the optional sub-qualifiers analogue or digital. These qualifiers
could be applied to any of the Dublin Core's elements. However, despite
the merits of such a logical system, it seems likely that it
will prove syntactically too complex for widespread implementation
and could also represent a barrier to cross-domain searching.
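To make the proposal concrete, the following Python sketch validates the qualifiers and counts how many combinations an indexer would have to handle for each element. The slash notation (e.g. surrogate/digital) and the sample values are assumptions made for illustration.

```python
from itertools import product
from typing import Optional

# The three 'intellectual content source' qualifiers and the optional
# sub-qualifiers, as listed above.
SOURCES = ("original", "surrogate", "resource")
MEDIA = ("analogue", "digital")


def content_source(source: str, medium: Optional[str] = None) -> str:
    """Validate a qualifier pair and render it in a hypothetical 'source/medium' form."""
    if source not in SOURCES:
        raise ValueError(f"unknown content source: {source!r}")
    if medium is None:
        return source
    if medium not in MEDIA:
        raise ValueError(f"unknown medium: {medium!r}")
    return f"{source}/{medium}"


# Every qualifier a search system would need to understand, for each element.
ALL_QUALIFIERS = [content_source(s, m) for s, m in product(SOURCES, (None, *MEDIA))]

# Invented example: creators of a painting and of a digital photograph of it.
creators = [
    ("DC.creator", content_source("original"), "Painter (placeholder)"),
    ("DC.creator", content_source("surrogate", "digital"), "Photographer (placeholder)"),
]
print(len(ALL_QUALIFIERS))  # 9
print(creators[1][1])       # surrogate/digital
```

Nine variants for every element, each of which a cross-domain search service would have to interpret, give a sense of the syntactic burden anticipated above.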
8.2.2 Granularity: items and collections
The Dublin Core originated in the library community and was initially
intended to provide a simple means of describing document-like objects,
a category defined by example. Over the course of the Dublin Core workshop
series, however, the element set was refined and the notion of a document-like
object extended to include any networked resource that appears to be identical
to diverse users. This means that the Dublin Core can now be used to describe
a much wider range of networked resources.
This also paves the way for the application of the Dublin Core to descriptions
at varying levels of granularity; it can still be used to describe a discrete
individual item such as a Web page or a digital image, but it can also
now be applied to more general resources, such as a collection of Web pages
forming a site, or multiple digital images arranged as a collection.
This tension between item-level and collection-level descriptions is particularly
pertinent for this domain, as both the original works and the digital resources
based upon them will tend to exist both as individual items and as parts
of larger collections; descriptions of both the items and the collections
will inevitably be used for retrieval by users, depending upon their search
goals.
The problem is further exacerbated by the fact that collections can
be contained within larger collections. For example, a collection of objects
donated by an individual may form part of the larger collection of
a museum or gallery, but will still need to retain its unique identity
and provenance.
To address this issue, the VADS recommended using qualifiers for
DC.relation, as suggested by members of the wider Dublin Core community
(Guenther 1997). The qualifier felt to be most useful for describing
relationships between items and collections was DC.relation.isMemberOf.
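The following Python sketch shows how DC.relation.isMemberOf links could be followed upwards to recover every collection a resource belongs to, including a donor's collection nested inside a museum's. For simplicity it assumes that each resource is a member of at most one collection. The identifiers are hypothetical.

```python
def enclosing_collections(is_member_of: dict[str, str], resource: str) -> list[str]:
    """Follow isMemberOf links upwards, returning collections innermost first."""
    chain: list[str] = []
    seen = {resource}
    current = resource
    while current in is_member_of:
        current = is_member_of[current]
        if current in seen:
            # Guard against badly formed metadata that loops back on itself.
            raise ValueError(f"isMemberOf cycle detected at {current!r}")
        seen.add(current)
        chain.append(current)
    return chain


# Hypothetical identifiers: an object from a donated collection that forms
# part of a museum's larger collection.
links = {
    "object-0042": "donor-collection",
    "donor-collection": "museum-collection",
}
print(enclosing_collections(links, "object-0042"))
# ['donor-collection', 'museum-collection']
```

Because the donated collection is described as a resource in its own right, it keeps its own identity and provenance while still being retrievable through the larger collection.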
8.3 Recommendations regarding the Dublin Core
8.3.1 Need for user documentation and implementation guidelines
The workshop recommended that more information be made available to allow
consistent interpretations and implementations of the
Dublin Core. The need was borne out by the experience of members of an editorial
group elected at the workshop, who used the VADS's Edinburgh Recommendations
as a basis for constructing sample metadata for items from their
collections. Although a template was supplied, implementations still differed
substantially between the authors, suggesting that domain-specific as well
as general guidelines will be needed in future to allow for consistent
resource description and discovery.
8.3.2 Need for wider awareness of the high-granularity resource discovery
needs of this domain
Essentially, the workshop highlighted an issue at the heart of resource
discovery requirements for visual arts, museums, and cultural heritage
material. While the Dublin Core was invented to describe document-like
objects, it is anticipated that members of these communities will wish
to use the Core to describe and retrieve information about more complex
and multi-level entities such as an electronic exhibition catalogue which
could, for example, contain descriptions of the life and work of several
artists, each accompanied by several digital images. Given the diverse
and complex electronic resources which exist in this domain, it is
particularly important when using Dublin Core to define where the basis
of intellectual content lies and exactly what the metadata sets out
to describe.
Conclusion
Together these statements of requirement reflect the needs of a wide range
of scholarly communities, curatorial domains, and of humanities information
resources. Their expression in terms of a common formalism readily highlights
both convergent and divergent requirements and the outlines of a conceptual
map of metadata for cross-domain resource discovery. At a final meeting
of workshop convenors and AHDS and UKOLN representatives, that map was
more fully charted, particularly by paying attention to apparently conflicting
domain-specific requirements. Where possible, such conflicts were resolved
on the day and reported back to the communities represented at each of
the domain-specific workshops for further review and comment. A few issues
were referred to small groups of workshop convenors or back to the
Dublin Core discussion lists for input. The result - an implementation
of the Dublin Core appropriate to cross-domain discovery of humanities
resources - is reported in the following chapter.
Send comments or questions to info@ahds.ac.uk
Last modified: Monday, 17-Nov-97 16:52:01 GMT by D. Greenstein
URL: http://www.ahds.ac.uk/public/arlist.html