Metadata used in the American Memory Project at the Library of Congress

 

 

 

Prepared for the

International Music Metadata Projects Working Group

 

by Lynda Aldana

January, 2001

 

The purpose of this report is to provide descriptive and technical information to the International Music Metadata Projects Working Group (IMMPWG) on the music and music-related collections that are part of the American Memory Project.  The online collections of the American Memory Project, specifically the music and music-related projects, are one example of how libraries and archives across the country and internationally are developing new ways to provide access to materials and are exploring options beyond conventional cataloging records.

The American Memory Historical Collections for the National Digital Library is a group of over 90 collections that have been digitized and are available on the Library of Congress web site.  At least 21 of these 90 collections are music collections or contain music-related materials.  The materials currently available in the American Memory Project come from collections within the Library of Congress as well as from universities.  They cover a wide variety of formats that have been digitized and made accessible to outside users, including print and manuscript materials, photographs, sheet music, sound recordings, motion pictures, and maps.  The American Memory Project is not static: at least 27 collections are listed on the “Future Collections” link at the American Memory home page as being “currently in progress.”  In the December 2000 issue of Notes, Judy Tsou reviews the American Memory Project for the Digital Media Reviews section.  In the review she describes, among other things, the music collections, ways to search, the collection finder, and the learning pages.

Numerous resources at the American Memory site provide information on the technical specifications of the collections.  Additionally, each collection links from its opening title page to one or both of the sections titled “Building the Digital Collection” and “Cataloging the Collection.”

 According to information found on the American Memory Project pages:

“In its presentation of historical collections, the Library of Congress uses Standard Generalized Markup Language (SGML) for two types of documents: finding aids and the full texts of books, pamphlets, manuscripts, and other historical texts.”

 

The finding aids are encoded in SGML using Encoded Archival Description (EAD).  A finding aid typically lists the scope, contents, and provenance of a collection.  The contents of the collection are listed according to the way they are physically arranged and stored.  For users with the appropriate viewer, some of the collections offer the option of viewing the finding aid in SGML.  Currently, the Leonard Bernstein Collection and the Federal Theatre Project are music collections that offer the finding aid in SGML format as well as in HTML.  For further discussion of EAD, see the report to the IMMPWG prepared by Lois Schultz.
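The hierarchical listing that a finding aid provides can be illustrated with a short sketch.  It assumes a small, invented fragment that uses standard EAD element names (archdesc, did, unittitle, dsc, c01, c02, container) and is written as XML so that Python's standard library can parse it; the Library's finding aids are SGML and far richer, so this is an illustration of the structure only, not a description of how the Library processes them.

```python
import xml.etree.ElementTree as ET

# Invented sample finding aid, written as XML for illustration only.
SAMPLE = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Collection</unittitle></did>
    <scopecontent><p>Correspondence and photographs.</p></scopecontent>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><container type="box">1</container><unittitle>Letters, A-M</unittitle></did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Photographs</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def outline(component, depth=0):
    """Return indented titles for a component and its sub-components."""
    title = component.findtext("did/unittitle", default="(untitled)")
    box = component.findtext("did/container")
    label = f"{title} [box {box}]" if box else title
    lines = ["  " * depth + label]
    for child in component:
        # Nested components are named c01, c02, ... in this sample.
        if child.tag.startswith("c0"):
            lines.extend(outline(child, depth + 1))
    return lines


def finding_aid_outline(xml_text):
    """List the collection title followed by its physical arrangement."""
    root = ET.fromstring(xml_text)
    archdesc = root.find("archdesc")
    lines = [archdesc.findtext("did/unittitle")]
    for series in archdesc.find("dsc"):
        lines.extend(outline(series, 1))
    return lines
```

Calling finding_aid_outline(SAMPLE) returns the collection title followed by each series and file, indented to reflect the arrangement described in the finding aid.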

A Document Type Definition (DTD) can be developed for a specific purpose, and SGML tag sets can be created to meet the requirements of the DTD.  A DTD specifically for the American Memory Project was developed in 1992, when the project began.  The American Memory DTD conforms to the guidelines for humanities texts established by the Text Encoding Initiative (TEI).  The full-text materials are marked up in SGML and then converted to HTML so that they can be viewed on the web with most browsers.
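The conversion from SGML markup to HTML can be pictured with a minimal sketch.  It assumes a tiny TEI-style fragment and a hypothetical tag map (TAG_MAP); the report does not list the element names of the American Memory DTD or describe the conversion software, so nothing below should be read as the Library's actual implementation.

```python
import re

# Hypothetical mapping from TEI-style element names to HTML element names.
TAG_MAP = {"head": "h2", "p": "p", "hi": "em"}


def sgml_to_html(text):
    """Rename mapped tags and drop tags that have no HTML equivalent."""
    def swap(match):
        slash, name = match.group(1), match.group(2).lower()
        html_name = TAG_MAP.get(name)
        if html_name is None:
            return ""  # structural tag with no mapping: keep only its content
        return f"<{slash}{html_name}>"

    # Matches opening and closing tags; any attributes are discarded.
    return re.sub(r"<(/?)(\w+)[^>]*>", swap, text)
```

For example, sgml_to_html("<div1><head>Preface</head><p>A <hi>short</hi> text.</p></div1>") yields "<h2>Preface</h2><p>A <em>short</em> text.</p>".  A real conversion would also need to handle entities, attributes, and omitted end tags, which SGML permits.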

SGML is a powerful markup language that lets users move easily around large documents, jumping from section to section.  It emphasizes the description of a document's structure rather than its appearance, and it has the added benefit of not being hardware dependent.  It allows for more thorough indexing of these full-text documents than might otherwise be possible.  This expands the user's ability to search a document or a set of documents.

The American Memory Project uses the full-text search engine Inquery for search and retrieval.  Search results are ranked by relevance to the words that were entered.  Inquery is not fully aware of SGML tags.  For the American Memory projects, all metadata records that are not in MARC format are converted to "pseudo-SGML," with tags that represent index fields such as "creator," "title," and "subject."  The American Memory Project uses the same set of indexes for MARC and pseudo-SGML records.
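The idea of one set of indexes serving both kinds of record can be sketched as follows.  The field names "creator," "title," and "subject" come from the description above; the tag syntax, the function names, and the mapping from MARC tags 100, 245, and 650 to those fields are assumptions made for illustration, since the report does not document the actual pseudo-SGML format or the Library's mapping.  The sample records are invented.

```python
import re

# Index fields named in the report; the tag syntax itself is an assumption.
INDEX_FIELDS = ("creator", "title", "subject")

# Hypothetical mapping from MARC tags to the shared index fields.
MARC_TO_INDEX = {"100": "creator", "245": "title", "650": "subject"}

TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def to_pseudo_sgml(record):
    """Wrap each indexable value of a non-MARC record in a field tag."""
    lines = []
    for field in INDEX_FIELDS:
        for value in record.get(field, []):
            lines.append(f"<{field}>{value}</{field}>")
    return "\n".join(lines)


def index_pseudo_sgml(text):
    """Collect tagged values into the shared index fields."""
    entry = {field: [] for field in INDEX_FIELDS}
    for name, value in TAG_RE.findall(text):
        if name.lower() in entry:
            entry[name.lower()].append(value.strip())
    return entry


def index_marc(fields):
    """Collect (tag, value) pairs from a MARC record into the same fields."""
    entry = {field: [] for field in INDEX_FIELDS}
    for tag, value in fields:
        field = MARC_TO_INDEX.get(tag)
        if field:
            entry[field].append(value)
    return entry


# Invented sample data, for illustration only.
record = {"creator": ["Doe, Jane"], "title": ["Example march"],
          "subject": ["Band music"], "note": ["not indexed"]}
marc = [("100", "Doe, Jane"), ("245", "Example march"),
        ("260", "Boston"), ("650", "Band music")]
```

With these samples, index_pseudo_sgml(to_pseudo_sgml(record)) and index_marc(marc) produce the same index entry, which is the property that lets a single search interface cover both record types.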

For most of the textual materials and collections there is the option to search either the bibliographic record or the full text.  A search of the bibliographic record covers the author, title, and subject terms assigned by the person who created the record.  A full-text search includes the digitized version of the text material.  Each type of search has its advantages and disadvantages.  Each collection generally has information about searching and browsing on its home page.
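The difference between the two search options can be shown with a small sketch.  The record layout (id, author, title, subjects, full_text) and the scoring by simple term counts are assumptions made for illustration; this is not Inquery's ranking method, which the report does not describe.

```python
def tokenize(text):
    """Lower-case the text and strip surrounding punctuation from each word."""
    words = (w.strip('.,;:!?"()').lower() for w in text.split())
    return [w for w in words if w]


def search(items, query, scope="bibliographic"):
    """Return item ids ranked by how often the query terms occur.

    scope="bibliographic" searches author, title, and subject terms;
    scope="full_text" searches the digitized text instead.
    """
    terms = tokenize(query)
    scored = []
    for item in items:
        if scope == "bibliographic":
            text = " ".join([item["author"], item["title"], *item["subjects"]])
        else:
            text = item["full_text"]
        words = tokenize(text)
        score = sum(words.count(term) for term in terms)
        if score:
            scored.append((score, item["id"]))
    # Highest score first; ties are broken by id so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored]
```

A word that appears only in the body of a text is found by the full-text search but not by the bibliographic search, while a term assigned by the cataloger may match an item whose text never uses that word.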

The Leonard Bernstein Collection is an example of a collection whose finding aid is available in both SGML and HTML formats.  The African-American Sheet Music, 1850-1920 (Brown University), Band Music from the Civil War Era, and Music for the American Nation projects have MARC records.  However, the MARC records for the Band Music from the Civil War Era and Music for the American Nation projects do not necessarily conform to AACR2.  "We'll Sing to Abe Our Song," "Historic American Sheet Music, 1850-1920," and "America Singing" use "pseudo-SGML."  "Historic American Sheet Music, 1850-1920" is actually available in two forms: in pseudo-SGML at the American Memory site and in EAD at the Duke University site.

According to the collection's home page, the Leonard Bernstein Collection, ca. 1920-1989, is one "of the largest and most varied of the many special collections held by the Library of Congress Music Division."  The online collection contains a selection of photographs, scripts from the "Young People's Concerts" and the "Thursday Evening Previews," and over 1,100 pieces of correspondence.  The finding aid is a work in progress and will be updated as more of the collection is processed.  It provides links to selected photographs from the collection that have been digitized, and its online version is available in either SGML or HTML.  The collection can be searched by keyword, and it also has author, subject, and title indexes for browsing.

The collection African-American Sheet Music, 1850-1920: Selected from the Collections of Brown University is an example of a collection that was cataloged according to the MARC scores format, then transferred, marked up, and added to the American Memory site.  In 1980 Brown University began a project to catalog 1,700 titles from the African-American Sheet Music Collection at the John Hay Library, and the cataloging from this project has been used for the bibliographic records linked to the digital reproductions in the African-American Sheet Music collection.  According to the section on the collection's home page titled "Interoperability between the Library of Congress and Brown University":

"… the image files are mounted at the Library of Congress and presented through the same page-turning interface as used for many of the Library's own American Memory collections.  Copies of MARC catalog records created at Brown are held in the online catalog at Brown, indexed using InQuery by the Library of Congress for American Memory, and incorporated into union catalogs at OCLC and the Research Libraries Group.  All these records link to the digital reproductions mounted at the Library of Congress."  

Brown University also created separate bibliographic records for the digital reproductions in order to distinguish them from the original items.  This included adding information about the digital reproduction to the original record, as described in the Draft Interim Guidelines for Cataloging Electronic Resources.  As cataloging records were delivered to the Library of Congress for mounting and indexing, slight modifications were made to them, including removal of the information about the creation of the digital object.

Band Music from the Civil War Era is an unusual Music Division online collection in that it is not based on a permanent collection within the division but has instead been created solely for presentation online.  The collection presents manuscripts, musical scores and parts, photographs, and music.  It contains over 700 musical compositions and 19 recorded examples of brass band music.  The eight full-score modern editions were created using the music notation software Finale.  The bibliographic records are MARC records but do not necessarily conform to AACR2 standards.

"We'll Sing to Abe Our Song!": Sheet Music about Lincoln, Emancipation, and the Civil War from the Alfred Whital Stern Collection of Lincolniana is an example of a collection from the American Memory Project that uses pseudo-SGML.  Pseudo-SGML is used for metadata records that are not in the MARC format: the records are marked up with tags that represent index fields such as title and creator.  This enables Inquery to index these files.

American Memory and the National Digital Library Program are clearly making use of tools that expand the possibilities for providing access to large collections, or even to materials that are related but not physically in the same location.  Many of the collections are also multi-format, providing access to a field recording as well as transcriptions of the text in the same “place.”  A user can potentially read the transcription while listening to the recording, view a picture of the performer, and then read the full text of the fieldnotes made as the performer was being recorded, without moving from one room to the next or travelling out of state.


Analysis of Sample Collections from the American Memory Collection

Representing finding aids, collections cataloged according to MARC, MARC (but not necessarily AACR2), and pseudo-SGML

 

 

Project title: Leonard Bernstein Collection, ca. 1920-1989

 

I.          Project Description

            Bibliographic documentation: HTML; Finding aid: EAD

            Database Structure: Hierarchical

Metadata

                        Administrative: text, some scanned images

                        Structural:

            Level of Detail: Item, but also collection since the finding aid is available

II.         Background Documentation: The collection contains more than 400,000 items documenting the life and career of one of 20th-century America's most important musical figures.  This online collection makes available 85 photographs, 177 scripts from the Young People's Concerts, 74 scripts from the Thursday Evening Previews, and over 1,100 pieces of correspondence.  The collection's complete finding aid is also available online.

III.       Information retrieval

Client-Server Architecture: Yes

Supports Boolean Searching: No, searching is by keyword

Supports browsing: Browsing by Title Index, Name Index, and Subject Index

IV.       Accessibility/display

No special viewers are needed to view the collection.  The finding aid is available in HTML or SGML, but a special viewer is needed to view the SGML version.

SGML/XML compliant: Yes

SGML browser (e.g. Dynaweb/Panorama): Yes, to view and work with the finding aid

HTML display: Yes

Sound files: No

Image files: Yes

 

Project Title: African-American Sheet Music, 1850-1920: Selected from the Collections of Brown University.

 

I.          Project Description

Bibliographic documentation: MARC records

Database Structure: Hierarchical

Metadata

Administrative: Scanned Images of sheet music, digital images of music notation and the lyrics, text indexing

Structural:

Level of Detail: Item

II.         Background Documentation

The sheet music in this digital collection has been selected from the Sheet Music Collection at the John Hay Library at Brown University.  One of the most important categories in the Sheet Music Collection is the African-Americana, which comprises approximately 6,000 items of music by and relating to African Americans, from the 1820s to the present day.  Of that number, 1,700 items are fully cataloged in MARC format, and the 1,305 titles digitized in this project have been drawn from them.

III.       Information retrieval

Client-Server Architecture: Yes

Supports Boolean Searching: No

Supports browsing: Browse by Subject Index, Author Index, or Title Index

IV.       Accessibility/display

SGML/XML compliant:

SGML browser (e.g. Dynaweb/Panorama): No

HTML display: Yes

Sound files: No

Image files: Yes

 

Project Title: Band Music from the Civil War Era

 

I.          Project Description

            Bibliographic documentation: MARC but not necessarily AACR2

            Database Structure: Hierarchical

Metadata

                        Administrative:

                        Structural:

            Level of Detail: Item

II.         Background Documentation: This online collection includes both printed and manuscript music (mostly in the form of "part books" for individual instruments) selected from the collections of the Music Division of the Library of Congress and the Walter Dignam Collection of the Manchester Historic Association  (Manchester, New Hampshire).  The collection features over 700 musical compositions, as well as 8 full-score modern editions and 19 recorded examples of brass band music in performance.  This collection is not based on a permanent collection within the Music Division but was created solely for presentation online.

III.       Information retrieval

Searching is by keyword

Client-Server Architecture: Yes

Supports Boolean Searching:    No

Supports browsing: Browse by Subject Index and Title Index

IV.       Accessibility/display:

SGML/XML compliant: Yes

SGML browser (e.g. Dynaweb/Panorama): No

HTML display: Yes

Sound files: Yes, available to users in WAV, RealAudio, and MP3 file formats

Image files: Yes.  Some images are also available in high-resolution .tif format; a viewer that can display .tif files is needed.

 

Project Title: "We'll Sing to Abe Our Song!": Sheet Music about Lincoln, Emancipation, and the Civil War from the Alfred Whital Stern Collection of Lincolniana

 

I.          Project Description

            Bibliographic Documentation: pseudo-SGML

            Database Structure: Hierarchical

            Metadata

                        Administrative:

                        Structural:

II.         Background Documentation: This collection includes more than two hundred sheet-music compositions that represent Lincoln and the war as reflected in popular music.  It spans the years from 1859 to 1909, the centenary of Lincoln's birth.  The materials come from the collection of Alfred Whital Stern, who is considered the greatest private collector of materials relating to the life and times of Abraham Lincoln.

III.       Information retrieval:

Client-Server Architecture: Yes

Supports Boolean Searches: No, searching is by keyword

Supports Browsing: Browse the Title Index, Author Index, Subject Index, or Publisher Index

IV.       Accessibility/display

            SGML/XML compliant: Yes

            SGML browser (e.g. Dynaweb/Panorama): No

HTML display: Yes.  Additionally, lyrics were transcribed at the Library of Congress and marked up in HTML for viewing online

            Sound files: No

Image Files: Yes, high-resolution versions of images are provided as .tif files