ETD at the UNT Libraries: Setting the Scene

Version 1.0, August 2007

Introduction

Responsibility for storing and providing user access to new University of North Texas (UNT) Electronic Theses and Dissertations (ETDs) is moving from Academic Computing Services to the UNT Libraries. The first section of this document provides background on ETDs at UNT and elsewhere, followed by a description of current local practice. Finally, it offers recommendations for an optimal environment to ensure long-term access to and preservation of UNT’s ETDs. For the current status of the project, see ETD Progress Report. If you have questions or comments regarding this document or the ETD project in general, please contact Mark E. Phillips at mark.phillps@unt.edu or Daniel Gelaw Alemneh at daniel.alemneh@unt.edu.

Scope

Developing a viable workflow requires balancing the needs of multiple stakeholders. The preliminary analysis presented in this document is based only on UNT manuals and guideline documents, and its scope is limited to the UNT Libraries’ projected stewardship role. In view of these limitations, the workflow based on this document’s recommendations will be tested for feasibility by all ETD stakeholders. We anticipate an ETD pilot in Fall 2007, with eventual campus-wide implementation of the full ETD workflow.

Background

Theses and dissertations represent a wealth of content created by university students in the degree-seeking process. Historically, the UNT Libraries housed two copies of each paper-based UNT thesis or dissertation, depositing one copy in the University Archives and placing one copy for use in the Libraries’ general collections.

Among the first five American universities that required ETDs for graduation, UNT began accepting theses and dissertations in electronic format in 1999. This switch to an electronic delivery system of scholarly output fundamentally changed the way these documents were handled and stored. The ETDs were loaded onto UNT Academic Computing Services servers and the UNT Libraries provided bibliographic access through the Libraries’ online catalog.

New technology for digital interchange provides opportunities for extensive dissemination of graduate students’ scholarly work. Over the past few years, institutional ETD programs have become the norm, not the exception.

An ETD program provides processes, standards, software that automates functions, and a digital infrastructure that facilitates access and preservation. Commentators (ETD 2007) agree that implementation of comprehensive strategies for placing a collection of institutional intellectual output on the Web requires some changes in institutional policies and practices. It also needs the support of a wide range of stakeholders on campuses. These include graduate students, faculty members, libraries, graduate schools, and, in some cases, commercial publishers and other external players.

In the early 2000s, the first wave of ETDs was usually stored as part of digital library collections administered by university libraries. These collections later served as the foundation for institutional repositories. By extending their existing objectives and by working together at the state, national, and international levels, university libraries are now playing a vital role in ensuring permanent and persistent access to this indigenous knowledge base.

Current ETD Workflow at UNT [prior to Fall 2007]

Currently [prior to Fall 2007], ETDs are handled at the University of North Texas using the following workflow:

  1. Degree candidates submit their ETDs to the Graduate School in the Portable Document Format (PDF).
  2. The Graduate Reader reviews the files for formatting errors and permissions issues and ensures that candidates supply any needed corrections or documentation. When the files meet all the requirements of the Graduate School, the Graduate Reader approves them.
  3. Two copies of each approved PDF are made:

    • UNT version

    i. File folder is created with the student’s name, as “lastname_firstname”. The student folder is saved in either an “Open” or “Restricted” folder. (An open thesis/dissertation will be available to the entire Internet community. A restricted thesis/dissertation will be limited to use by those with a valid UNT login. According to the Graduate School’s electronic document filing form, all ETDs are openly available unless compelling reasons exist to restrict the document.)

    ii. Thesis or dissertation file is named either “thesis” or “dissertation” and saved in the student folder. The file is saved with protections that prevent copy/pasting and/or printing, either in whole or in part.

    iii. Index page (to be used by catalogers) is created using Microsoft FrontPage. File is named “index” and also saved in student folder.

    • ProQuest copy: File saved as “lastname-f” with no protections. Theses are saved in the Theses folder; dissertations are stored in the Dissertations folder.

    At the end of the semester, after the Registrar has formally closed the semester (usually 6-8 weeks after commencement), the Graduate School distributes the approved files:

    • UNT files are loaded onto Academic Computing Services (ACS) file server; hard copies of title pages/abstracts are sent to the UNT Libraries’ catalogers. Once all files have been cataloged, ACS transfers the files for the semester over to its Web server. The PDF files are delivered password-protected so that users are not allowed to print or copy text from them. The Graduate School controls the password and also stores a copy of each ETD that is not password-protected.
    • ProQuest copies are burned to CD and sent via FedEx (along with all accompanying material) to ProQuest.

Rarely, UNT’s Vice President for Technology Transfer directs that certain theses or dissertations be completely locked down due to patent or proprietary concerns. These ETDs are neither released to the UNT Libraries nor sent to ProQuest for a period of one year. Each year, for a period of three years, the VP is responsible for letting the Graduate Reader know whether the lock-down should continue for another year. At the end of three years, all lock-downs are released.
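The naming conventions in step 3 of the workflow can be sketched in Python. This is an illustration only, not the Graduate School's actual tooling: the function names are hypothetical, names are assumed to be lowercased (as in the sample path "abunasser_rima" in the catalog record), and the "f" in “lastname-f” is assumed to stand for the first initial.

```python
def unt_folder_name(last_name: str, first_name: str) -> str:
    """Folder name for the UNT copy, following the "lastname_firstname" pattern."""
    return f"{last_name.strip().lower()}_{first_name.strip().lower()}"


def proquest_file_name(last_name: str, first_name: str) -> str:
    """File name for the ProQuest copy, following the "lastname-f" pattern.

    Assumption: "f" is the first letter of the first name.
    """
    return f"{last_name.strip().lower()}-{first_name.strip().lower()[:1]}"


def unt_document_name(document_type: str) -> str:
    """The UNT copy is named after its type: "thesis" or "dissertation"."""
    kind = document_type.strip().lower()
    if kind not in {"thesis", "dissertation"}:
        raise ValueError(f"unknown document type: {document_type!r}")
    return f"{kind}.pdf"


print(unt_folder_name("Abunasser", "Rima"))      # abunasser_rima
print(proquest_file_name("Abunasser", "Rima"))   # abunasser-r
print(unt_document_name("Dissertation"))         # dissertation.pdf
```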

Diagram of the File/Folder Structure

The ETD files are stored in a directory structure which follows this convention:

year
  |
  |-Author
      |-index.html
      |-dissertation.pdf

Table 1: File and folder structure in the existing ETD directory
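A directory tree that follows this convention is easy to audit programmatically. The sketch below is a hypothetical helper, not part of the existing system: it walks a root folder laid out as year/Author/ and reports, for each author folder, whether an index file is present and which PDFs it holds. It accepts both "index.htm" and "index.html", since both spellings appear in this document.

```python
import tempfile
from pathlib import Path


def inventory(root: Path) -> list[dict]:
    """List every author folder under root/<year>/<author>/ with its files."""
    records = []
    for year_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for author_dir in sorted(p for p in year_dir.iterdir() if p.is_dir()):
            records.append({
                "year": year_dir.name,
                "author": author_dir.name,
                # True if an index.htm or index.html file exists in the folder.
                "has_index": any(author_dir.glob("index.htm*")),
                "pdfs": sorted(f.name for f in author_dir.glob("*.pdf")),
            })
    return records


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    author = Path(tmp) / "2003" / "abunasser_rima"
    author.mkdir(parents=True)
    (author / "index.html").write_text("<html></html>")
    (author / "dissertation.pdf").write_bytes(b"%PDF-1.4")
    demo = inventory(Path(tmp))

print(demo)
```

A report like this could flag folders that lack an index page or a PDF before the files are migrated.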

Description of Metadata

In the current [prior to Fall 2007] practice, metadata for each ETD exists in two places: in the Libraries’ online catalog, and with the ETD itself in the form of an HTML file with an active link to the text of the ETD.

Online Catalog

The Libraries’ online catalog provides both a description of each ETD and an access link to the ETD’s full text. The regular display visible to catalog users is based on MARC (MAchine-Readable Cataloging) records (see figure 1). The creation of standardized MARC records for ETDs ensures that researchers worldwide will be able to locate ETDs not only in the UNT online catalog, but also through consortial catalogs such as WorldCat.

ETD Catalog records

Figure 1: Sample UNT Libraries ETD catalog record display

You can also view the regular display at the UNT Libraries Catalog page. Table 2 shows the MARC display format. See also the actual MARC display at the UNT Libraries Catalog page.

LEADER 00000nam  2200000Ia 4500
001    54104498
003    OCoLC
005    20040129095735.0
006    m        d
007    cr an---------
008    040129s2003    txu     sbm   000 0 eng d
040    INT|cINT
049    INTT
099    Electronic Dissertation
100 1  Abunasser, Rima Jamil.
245 10 Corporate Christians and terrible Turks|h[electronic
       resource] :|baesthetics, economics, and empire in the
       early British travel narrative, 1630-1780 /|cRima Jamil
       Abunasser.
260    [Denton, Tex.] :|bUniversity of North Texas,|c2003.
440  0 NT dissertation, English ;|v2003
500    Title from title page display.
502    Thesis (Ph. D.)--University of North Texas, Dec., 2003.
504    Includes bibliographical references.
538    Mode of access: Internet, via World Wide Web.
538    System requirements: Adobe Acrobat Reader.
650  0 Travelers' writings, British|xHistory and criticism.
650  0 English prose literature|y17th century.
650  0 English prose literature|y18th century.
650  0 Economics in literature.
650  0 Aesthetics in literature.
650  0 Politics in literature.
856 40 |uhttps://www.library.unt.edu/theses/open/20033/
       abunasser_rima/index.htm|zconnect to online resource.
910    eTHESES
959    Rec'd on Proquest s.o. .o3364306 $15.00 (fiche)

Table 2: MARC display
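In the display above, subfields within a field are separated by "|" followed by a one-letter subfield code, and the text before the first "|" carries no explicit code. A minimal sketch of splitting such a display string follows. It assumes that the uncoded leading text is subfield "a" and that continuation lines have already been joined; the function name is hypothetical.

```python
def split_subfields(value: str) -> list[tuple[str, str]]:
    """Split a catalog display string such as 'Title|h[medium] :|bsubtitle'.

    Assumption: text before the first '|' belongs to subfield 'a'.
    """
    parts = value.split("|")
    subfields = []
    if parts[0].strip():
        subfields.append(("a", parts[0].strip()))
    for part in parts[1:]:
        if part:  # skip empty fragments such as a trailing '|'
            subfields.append((part[0], part[1:].strip()))
    return subfields


field_260 = "[Denton, Tex.] :|bUniversity of North Texas,|c2003."
print(split_subfields(field_260))
# [('a', '[Denton, Tex.] :'), ('b', 'University of North Texas,'), ('c', '2003.')]
```

Fields that begin directly with a coded subfield, such as the 856 link field, produce no implicit "a" entry.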


Metadata Associated With the ETD

Metadata that is stored with the ETD in the form of an HTML file provides an abstract of and key facts about the thesis/dissertation. This information is used by catalogers to help create MARC records for the Libraries’ online catalog. The HTML file is named “index” and also saved in the student folder. It contains the following information in a table format:

LABEL: Author's Name
DESCRIPTION: Student responsible for the creation of the thesis/dissertation [last name, first name, middle initial]
EXAMPLE: Woods, Christopher P

LABEL: Document Type
DESCRIPTION: Type of Resources [Controlled vocabularies of: Dissertation or Thesis]
EXAMPLE: Dissertation

LABEL: Title
DESCRIPTION: Title of the thesis/dissertation. [Title Information exactly as it appears on the document]
EXAMPLE: A Transcription of Op. 94 Morceau de Concert, by Camille Saint-Saëns For Solo Bass Trombone and Brass Ensemble

LABEL: Degree
DESCRIPTION: Degree Information [Controlled vocabularies of all UNT degrees]
EXAMPLE: Doctor of Musical Arts

LABEL: Major
DESCRIPTION: Degree Information. [Can be from a controlled list]
EXAMPLE: Performance

LABEL: Committee
DESCRIPTION: Name of Committee Members, including Major Professor (thesis/dissertation advisor)
EXAMPLE: Vern Kagarice, Major Professor; Gene Cho; Brian Bowman; Graham Phipps; Thomas Clark

LABEL: Keywords
DESCRIPTION: Subject [One or more subject values denoting the discipline and/or area for the given thesis/dissertation]
EXAMPLE: Camille Saint-Saëns, Morceau de Concert, bass trombone, brass ensemble

LABEL: Graduation Date
DESCRIPTION: Month (in English) and Year
EXAMPLE: May 2001

LABEL: Availability
DESCRIPTION: [Controlled vocabularies of: Open or Restricted]
EXAMPLE: restricted

LABEL: Abstract
DESCRIPTION: [Brief description of the content of the thesis/dissertation] (Abstract supplied by the author)
EXAMPLE: The transcription is an addition to the repertoire for brass ensemble and bass trombone. Consideration is given to the nineteenth-century orchestration treatises of Berlioz and Strauss as well as the twentieth-century texts of Erik Leidzén, Walter Piston, and Samuel Adler. The transcription process is shaped by the principles of these writers. The score is contained in the appendix.

LABEL: Files
DESCRIPTION: Link to the PDF file
EXAMPLE: dissertation.pdf

LABEL: Special Conditions
DESCRIPTION: If any


Table 3: Description of sample metadata HTML file associated with a UNT ETD
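The fields described in Table 3 can be modeled as a simple record with checks on the two controlled vocabularies. The sketch below is illustrative only: the class and attribute names are hypothetical, and the comparison is case-insensitive because the sample value in the table ("restricted") is lowercase.

```python
from dataclasses import dataclass

DOCUMENT_TYPES = {"Dissertation", "Thesis"}
AVAILABILITY = {"Open", "Restricted"}


@dataclass
class IndexRecord:
    author: str
    document_type: str
    title: str
    degree: str
    major: str
    committee: list[str]
    keywords: list[str]
    graduation_date: str
    availability: str
    abstract: str = ""
    file_name: str = "dissertation.pdf"

    def problems(self) -> list[str]:
        """Return controlled-vocabulary violations (empty list if none)."""
        found = []
        if self.document_type.capitalize() not in DOCUMENT_TYPES:
            found.append(f"document type: {self.document_type!r}")
        if self.availability.capitalize() not in AVAILABILITY:
            found.append(f"availability: {self.availability!r}")
        return found


record = IndexRecord(
    author="Woods, Christopher P",
    document_type="Dissertation",
    title="A Transcription of Op. 94 Morceau de Concert, by Camille "
          "Saint-Saëns For Solo Bass Trombone and Brass Ensemble",
    degree="Doctor of Musical Arts",
    major="Performance",
    committee=["Vern Kagarice", "Gene Cho", "Brian Bowman",
               "Graham Phipps", "Thomas Clark"],
    keywords=["Camille Saint-Saëns", "Morceau de Concert",
              "bass trombone", "brass ensemble"],
    graduation_date="May 2001",
    availability="restricted",
)
print(record.problems())  # []
```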

Desired Situation (Recommendations)

The following sections describe the desired environment for storage and preservation of and access to UNT’s ETDs. Based on these recommendations, we will develop a workflow for placing ETDs in the UNT Libraries’ Digital Collections (DC) operated by the Digital Projects Unit (DPU).

Environment

The Keystone Digital Library System serves as a framework for the creation, management, and public display of digital objects collected by the UNT Libraries and housed in the UNT Libraries’ Digital Collections. It is also the primary development framework for all other digital collections managed by the Libraries’ Digital Projects Unit. Other projects include The Portal to Texas History℠ and the Congressional Research Service Reports Archive. Altogether, the Digital Projects Unit manages over 30,000 digital objects consisting of over 210,000 files. We have developed processes and workflows to manage and preserve large collections of digital objects, with the metadata housed in these systems being a key component.

Metadata

Metadata for the ETDs should be created in a way that supports the international standards set by the Networked Digital Library of Theses and Dissertations (NDLTD) as well as the published standards set by the Texas Digital Library (TDL), of which UNT is a member. As can be seen from the sample description in Table 3, the existing metadata as received from the Graduate School lacks some metadata elements (such as the degree information and degree grantor institution name) which are important for resource sharing at the national and international levels. In light of these new requirements, the UNT Libraries Metadata schema currently used in the Libraries’ Digital Collections would need to be modified to comply with TDL recommendations. Further developing the elements in the ETD metadata will facilitate wider access to the ETDs through various retrieval systems including the Libraries’ Digital Collections, the Libraries’ online catalog, and search engines such as Google and Yahoo. Wider access to ETDs will in turn increase the visibility of UNT and its scholarship.

Files

The Libraries will store the ETDs in the Digital Collections system, which is built on the Keystone Digital Library System framework. We will store all metadata in XML files in the system, with references to the presentation PDF files that are stored on the display servers. Archival copies of all PDFs that make up each ETD will reside in the Libraries’ Digital Archive with the required preservation metadata.

By storing the files in these systems, we will be able to respond to changes in technology that would otherwise affect the accessibility of the ETD files. Moreover, we can create reports based on characteristics of the ETDs themselves.

Files stored in the Libraries’ Digital Collections and ultimately placed in the Libraries’ Digital Archive should be stored and made available with the fewest possible proprietary and software-based rights management mechanisms enabled. Rights management decisions should preferably be made at the system level, where they can control the type of access that is available for the ETDs.
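To illustrate the idea of an XML metadata file that points to a presentation PDF rather than containing it, the following sketch builds a small record with Python's standard library. The element names are borrowed loosely from the element outline in Appendix-1; the exact XML structure, the function name, and the example URL are assumptions, not the schema actually used by the Digital Collections.

```python
import xml.etree.ElementTree as ET


def build_record(title, creator, degree_name, grantor, pdf_url):
    """Build a minimal, hypothetical ETD metadata record as an XML element."""
    record = ET.Element("metadata")
    ET.SubElement(record, "title").text = title
    creator_el = ET.SubElement(record, "creator")
    ET.SubElement(creator_el, "name").text = creator
    degree = ET.SubElement(record, "degree")
    ET.SubElement(degree, "name").text = degree_name
    ET.SubElement(degree, "grantor").text = grantor
    # A reference to the presentation copy, not the file itself.
    ET.SubElement(record, "identifier", {"type": "url"}).text = pdf_url
    return record


record = build_record(
    title="A Transcription of Op. 94 Morceau de Concert",
    creator="Woods, Christopher P",
    degree_name="Doctor of Musical Arts",
    grantor="University of North Texas",
    pdf_url="https://example.org/etd/woods_christopher/dissertation.pdf",  # placeholder URL
)
print(ET.tostring(record, encoding="unicode"))
```

Because access decisions are meant to be made at the system level, a record like this carries no protection settings of its own.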

Services

The Digital Projects Unit has developed various services that make use of the data stored in the Libraries’ Digital Collections. These services include full-text and fielded keyword and phrase searching, collection and subject level browsing, and syndication services such as RSS and Atom feeds. Because the Digital Collections are searchable using the SRU and OpenSearch protocols, the UNT Libraries’ Digital Collections can be included in federated search systems.

The Digital Collections’ metadata is harvestable using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Many groups, including Google, OCLC, OAIster, and the NDLTD, use this protocol to harvest metadata records for inclusion in their search systems. We are working toward the creation of Sitemaps to allow other search engines, such as Yahoo and Microsoft’s Live Search, to crawl and index the Digital Collections content.
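As a brief illustration of how a harvester addresses an OAI-PMH repository, the sketch below builds a ListRecords request URL. The verb and the metadataPrefix, set, and from arguments are defined by the protocol; the base URL and the set name "etd" are placeholders, since this document does not give the actual endpoint or set names.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real base URL is not given in this document.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access is made)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec          # restrict to one set, e.g. ETDs only
    if from_date:
        params["from"] = from_date        # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL, set_spec="etd", from_date="2007-08-01"))
# https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=etd&from=2007-08-01
```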

All content placed in the Libraries’ Digital Collections benefits from development projects carried out in the Digital Projects Unit. We are planning several user studies to identify ways to create and refine interfaces to enhance access to the various collections held by the Libraries.

The Digital Collections system incorporates stable URLs, sometimes referred to as permanent URLs. Users will be able to cite a thesis or dissertation with confidence that others will be able to find that document in the system at a later date.

We will also develop new features specifically for the ETD collection. For example, for the ETDs in the Digital Collections we plan to provide “citations on-the-fly” in several formats commonly used by our students.
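One way such a feature could work is to fill citation templates from the stored metadata. The sketch below is only a rough illustration: the two templates are simplified patterns invented for this example and do not faithfully implement any published style guide, and the record keys and URL are hypothetical.

```python
# Simplified, illustrative templates; not faithful to any official citation style.
FORMATS = {
    "author-date": "{author} ({year}). {title}. {document_type}, {institution}. {url}",
    "title-first": "\"{title},\" by {author}. {document_type}, {institution}, {year}. {url}",
}


def cite(record: dict, style: str = "author-date") -> str:
    """Render a citation string for one ETD record in the requested style."""
    return FORMATS[style].format(**record)


record = {
    "author": "Woods, Christopher P",
    "year": 2001,
    "title": "A Transcription of Op. 94 Morceau de Concert, by Camille "
             "Saint-Saëns For Solo Bass Trombone and Brass Ensemble",
    "document_type": "Dissertation",
    "institution": "University of North Texas",
    "url": "https://example.org/etd/1234",  # placeholder for a stable URL
}

for style in FORMATS:
    print(cite(record, style))
```

Adding another format then only requires adding another template, with no change to the stored metadata.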

Summary

Responsibility for storing and providing user access to current UNT ETDs is moving from UNT Academic Computing Services to the UNT Libraries. As depicted in figure 2, the UNT Libraries will house the ETD files in the Libraries’ Digital Collections and Digital Archive.

ETD-Structure

Figure 2: UNT Libraries ETDs By Type

With the exception of “Problem in Lieu of Thesis,” we will continue to catalog ETDs and link to the texts from the Libraries’ online catalog. We will modify the metadata accompanying ETD files to meet TDL standards and provide appropriate access through both the Libraries’ Digital Collections and the TDL Repository. By maintaining the ETDs in the Libraries’ well-established systems, we will be able to respond to changes in technology and ensure long-term preservation of the files. Users will benefit from searching, browsing, syndication services, and regular enhancements available in the Digital Collections.

In the next phase of this project we will develop a workflow detailing the specific steps that the Libraries will follow to receive, store, describe, monitor, preserve, and provide access to UNT’s ETDs.

Resources

Appendices

Appendix-1 Metadata for UNT ETDs

(For complete recommendations and implementation, see ETD at the UNT.)

ETD Metadata Element Outline

  • Title
    • type
  • Creator
    • name
    • type
    • role
    • information
  • Contributor
    • name
    • type
    • role
    • information
  • Publisher
    • name
    • place
    • information
  • Date
    • originalCreationDate
    • digitalCreationDate
  • Language
  • Description
    • contentDescription
    • physicalDescription
  • Subject
    • authority
  • Primary Source
  • Coverage
    • placeName
    • timePeriod
    • date
    • dateRange
      • startDateRange
      • endDateRange
  • Source
  • Relation
  • Collection
  • Institution
  • Rights
    • access
    • license
    • holder
    • rightsStatement
  • Resource Type
  • Format
  • Identifier
    • type
  • Degree
    • name
    • level
    • discipline
    • department
    • grantor
  • Note

Appendix-2 UNT-ETD Metadata to MARC Crosswalk Specification

UNTL-ETD Element  MARC Element Description  Remark
Title:            245a                      (246 for alternatives & 242 for translation)
Creator:          100a
Contributor:      720a                      720e (for role)
Publisher:        260b                      260a (for place)
Date:             008 positions 7-10
Language:         008 positions 35-37
Description:      520a
Subject:          653a
Primary Source:   ---
Coverage:         651 or 690
Source:           ---
Relation:         ---
Collection:       ---
Institution:      ---
Rights:           540
Resource Type:    655                       leader 6&7 (As text objects, 6 set to 'a' and as monographs 'm' in 7)
Format:           856q
Identifier:       856u
Degree:           502a
Note:             504                       (for note 5xx)
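The one-to-one rows of this crosswalk lend themselves to a simple lookup table. The sketch below is a hypothetical illustration, not the Libraries' conversion code: it covers only the elements that map to a single MARC field and subfield, and leaves out the positional or conditional rows (Date, Language, Coverage, Resource Type) and the elements with no MARC equivalent.

```python
# One-to-one rows of the crosswalk; elements mapped to "---" are omitted.
CROSSWALK = {
    "title": "245a",
    "creator": "100a",
    "contributor": "720a",
    "publisher": "260b",
    "description": "520a",
    "subject": "653a",
    "rights": "540",
    "format": "856q",
    "identifier": "856u",
    "degree": "502a",
    "note": "504",
}


def to_marc_pairs(record: dict) -> list[tuple[str, str]]:
    """Return (MARC target, value) pairs for a record, ordered by MARC tag."""
    pairs = []
    for element, values in record.items():
        target = CROSSWALK.get(element)
        if target is None:
            continue  # no simple MARC equivalent in the table
        if isinstance(values, str):
            values = [values]
        pairs.extend((target, value) for value in values)
    return sorted(pairs, key=lambda pair: pair[0])


record = {
    "title": "A Transcription of Op. 94 Morceau de Concert",
    "creator": "Woods, Christopher P",
    "subject": ["bass trombone", "brass ensemble"],
    "degree": "Doctor of Musical Arts",
    "collection": "UNT Theses and Dissertations",  # skipped: maps to "---"
}
for target, value in to_marc_pairs(record):
    print(target, value)
```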

     

Appendix-3 UNT-ETD Metadata Crosswalks to NDLTD and TDL

UNTL-ETD Element                       NDLTD                                      TDL
Title:                                 Title (and alternative)                    Title Information
Creator:                               Creator                                    Name of Author
Contributor:                           Contributor (and role)                     Name of Thesis Advisor & Committee Members
Publisher:                             Publisher                                  ---
Date:                                  Date                                       Original Information
Language:                              Language                                   Language
Description:                           Description                                Abstract
Subject and Keywords:                  Subject                                    Subject
Primary source:                        ---                                        ---
Coverage:                              Coverage                                   Subject
Source:                                ---                                        ---
Relation:                              ---                                        ---
Collection:                            ---                                        ---
Institution:                           ---                                        ---
Rights Management:                     Rights                                     ---
Resource type:                         Type                                       Type of resources
Format:                                Format                                     Physical Description
Identifier:                            Identifier                                 Identifier (and Location)
Metadata Information:                  ---                                        Record Information
Note:                                  (Description-Note)                         ---
[Degree Information - Name]            Degree (name, level, discipline, grantor)  Degree Information
[Degree Information - Level]           Degree (name, level, discipline, grantor)  Degree Information
[Degree Information - Discipline]      Degree (name, level, discipline, grantor)  Degree Information
[Degree Information - Degree Grantor]  Degree (name, level, discipline, grantor)  Name of Degree Grantor
