Speaking Notes from a Tech Talks Presentation by Daniel Alemneh, December 6, 2006



Digital life cycle management starts at the point an item is created or, if it is not born-digital, selected for digitization. It continues through image cleanup, metadata capture, and derivative creation, and extends to ensuring long-term access. Digital library data quality has two aspects: the quality of the data in the objects themselves, and the quality of the metadata associated with those objects. Metadata errors take many forms, but in whatever form they occur, they affect discovery, access, use, and long-term management of the resources.

Metadata quality

Metadata quality in terms of:

Error free

  • Typographical Errors:
    • Character transposition, e.g. 9198 for 1998
    • Letter omission, e.g. omt for omit
    • Letter insertion, e.g. asnd for and
    • Letter substitution or misstrokes, e.g. likw for like
  • Adding/selecting the wrong information, or entering information in the wrong field/subfield

No omissions

  • Null
  • Incomplete information

Unambiguous

  • Inconsistency, e.g. multiple spellings, multiple possible meanings, mixed cases, initials, etc.
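
The four typographical error types listed above can be told apart mechanically when a mistyped value differs from the intended value by a single edit. The following is a minimal sketch, not a tool described in the talk; the function name `classify_typo` is hypothetical, and comparing case-insensitively is an assumption made here so that a capitalization difference is not counted as a typo.

```python
from typing import Optional


def classify_typo(typed: str, intended: str) -> Optional[str]:
    """Classify a single-edit typo (hypothetical helper, illustration only).

    Returns "transposition", "omission", "insertion", "substitution",
    or None when the two strings match or differ by more than one edit.
    """
    # Assumption: case differences are ignored for this check.
    typed, intended = typed.lower(), intended.lower()
    if typed == intended:
        return None

    if len(typed) == len(intended):
        # Positions where the two strings disagree.
        diffs = [i for i, (a, b) in enumerate(zip(typed, intended)) if a != b]
        if len(diffs) == 1:
            return "substitution"
        if (
            len(diffs) == 2
            and diffs[1] == diffs[0] + 1
            and typed[diffs[0]] == intended[diffs[1]]
            and typed[diffs[1]] == intended[diffs[0]]
        ):
            # Two adjacent characters were swapped.
            return "transposition"
        return None

    if len(typed) + 1 == len(intended):
        # Dropping one character from the intended value yields the typed value.
        for i in range(len(intended)):
            if intended[:i] + intended[i + 1:] == typed:
                return "omission"
        return None

    if len(typed) == len(intended) + 1:
        # Dropping one character from the typed value yields the intended value.
        for i in range(len(typed)):
            if typed[:i] + typed[i + 1:] == intended:
                return "insertion"
        return None

    return None


# The examples from the list above.
for typed, intended in [("9198", "1998"), ("omt", "omit"),
                        ("asnd", "and"), ("likw", "like")]:
    print(typed, "->", intended, ":", classify_typo(typed, intended))
```

A check like this only helps when the intended value is known, for example from a controlled vocabulary; it does not detect wrong information placed in the wrong field.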

Factors influencing metadata quality

The metadata quality characteristics depend on various factors, including:

Local requirements

  • What type of objects will the repository contain? [Heterogeneity]
  • How will they be described? And used? And by whom? [Granularity]
  • What functionality is required locally?
  • How will it be interfaced?
  • What entry points will be used?

Collaborators' requirements

  • What is required for interoperability? (Structural, semantic, and syntactic.)
  • Are requirements formal or informal?
  • Will metadata be meaningful within aggregations of various kinds?
  • Will access restrictions be imposed?


  • Are resources sufficient to produce the required metadata quality? (very unlikely)
  • If not, what are the priorities? (Cost-benefit analysis)

Managing metadata quality

The metadata quality assurance cycle:

Determine level of quality required

  • Partners may have much in common, but they have diverse and sometimes conflicting metadata requirements as well.

Determine nature of gap and how to close it

  • effectiveness, efficiency, practicability, scalability


  • One size does not fit all!


  • Resources unlikely to be available to meet all requirements

Test the workflow and the quality cycle

  • retest


UNTL metadata quality assurance mechanisms

  • Metadata Template Creator

    • Sample workflow
  • Metadata analysis tool demonstration

    • NULL Values
      • No null values are allowed for mandatory elements (Title, Subject, Description, Language, Coverage, Resource Type, Format)
    • List All Values
      • Shows every value in the selected field and the number of times it appears in the metadata.
      • All descriptive elements can be visually inspected for errors with the aid of various tools, including:
        • Highlighter (On/Off)
        • Qualifiers (Use/Ignore)
        • Sub-elements (All Qualifiers/Each Qualifier)
    • List Values by Institution
      • Lists values by institution, with the total number of records from each institution.
    • List Values by Collection
      • Lists values by collection, with the total number of records from each collection.
    • Authorities Values
      • Maintains the details of every controlled list to ensure key uniqueness.
    • Other Graphical Reports:
      • Records Added Over Time
      • Records Added Per Month
      • Files Added Over Time
      • Clickable Map of Texas:
        • By Collection
        • By Institution
      • Word Clouds
        • By Collection
        • By Institution
  • UNTL Metadata Quality Cycle

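
The NULL Values and List All Values checks described above can be sketched in a few lines. This is an illustration under stated assumptions, not the UNTL analysis tool itself: the record layout (a dict mapping element names to lists of string values) and the function names `missing_mandatory`, `list_all_values`, and `case_variants` are all hypothetical. Only the list of mandatory elements comes from the notes.

```python
from collections import Counter

# Mandatory elements as listed in the notes.
MANDATORY = ("Title", "Subject", "Description", "Language",
             "Coverage", "Resource Type", "Format")


def missing_mandatory(record):
    """Return the mandatory elements that are absent or hold only empty values.

    Assumption: a record is a dict of element name -> list of string values.
    """
    return [
        element for element in MANDATORY
        if not any(value and value.strip() for value in record.get(element, []))
    ]


def list_all_values(records, element):
    """Count how many times each value of one element appears across records."""
    return Counter(value for record in records for value in record.get(element, []))


def case_variants(counts):
    """Group values that differ only in letter case or spacing.

    Groups with more than one member point to possible inconsistencies.
    """
    groups = {}
    for value in counts:
        key = " ".join(value.lower().split())
        groups.setdefault(key, []).append(value)
    return {key: sorted(values) for key, values in groups.items() if len(values) > 1}


# Invented sample records, for illustration only.
records = [
    {"Title": ["Sample item A"], "Subject": ["Texas"], "Language": ["eng"]},
    {"Title": ["Sample item B"], "Subject": ["texas"], "Language": [""]},
]

for record in records:
    print(record["Title"][0], "is missing:", missing_mandatory(record))

subject_counts = list_all_values(records, "Subject")
print(subject_counts.most_common())
print(case_variants(subject_counts))
```

Counting values first and grouping them afterwards mirrors the visual review step: the frequency list shows what is in the field, and the grouped variants highlight entries worth checking against the controlled lists.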



About the Author:  Daniel Gelaw Alemneh is a metadata management specialist for the UNT Libraries' Digital Projects Unit.


Wednesday, December 6, 2006 - 12:00pm