Audio Metadata Primer

This document covers the small subset of metadata that pertains to audio preservation projects, specifically the metadata used to interface with an audio preservation service provider such as The Audio Archive. We provide technical references as well as practical advice on how to use the various metadata standards in conjunction with audio files.

BWF Metadata

Background

Where to put the metadata

Catastrophic metadata

WAVE RIFF INFO_List versus BWF <bext> chunk

Compatibility

Providing Metadata Field Descriptions

Example: WAVE RIFF INFO_List Tags

Example: BWF extension fields

MP3 Metadata

Background

MP3 ID3 Tags

AES-X098B Technical Metadata

File Naming Schema

BWF Metadata - Background

The Broadcast Wave Format (BWF) was introduced in 1996, and is an extension to the WAVE file format. A BWF file is a WAVE file that carries an additional chunk, the Broadcast Audio Extension (<bext>) chunk, alongside the standard WAVE chunks.

BWF files continue to use the .wav extension in the file name (that is, there is no such thing as a .bwf file name extension). The beauty of the BWF is that when an application does not recognize the BWF data, the application will still read the WAVE portion of the file, which gives BWF good general compatibility with non-BWF WAVE applications.

To learn more about BWF, see the EBU BWF standards documents in our Standards section on this website.

For the purposes of audio preservation, there are two sets of headers and associated fields that we need to concern ourselves with in the WAVE-BWF file:

RIFF INFO_List tags (available to all WAVE files)

BWF extension (also known as the “bext chunk”)

Both are covered later in this document.
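The chunk structure described in this section can be illustrated with a short Python sketch. It is a minimal illustration rather than a full parser: the function names `list_riff_chunks` and `make_chunk` are hypothetical, and the demo file is synthetic (a 602-byte empty <bext> payload, which is the fixed minimum size defined in EBU Tech 3285, plus tiny "fmt " and "data" chunks).

```python
import struct


def list_riff_chunks(data: bytes):
    """Return (chunk_id, size) pairs for the top-level chunks of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12  # skip "RIFF", the overall size, and the "WAVE" form type
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((chunk_id.decode("ascii", "replace"), size))
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    return chunks


def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Wrap a payload in a RIFF chunk header (hypothetical helper for the demo)."""
    pad = b"\x00" if len(payload) & 1 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# Synthetic demo file: an empty <bext> chunk, a PCM "fmt " chunk, and 4 bytes of audio.
fmt = make_chunk(b"fmt ", struct.pack("<HHIIHH", 1, 1, 48000, 96000, 2, 16))
bext = make_chunk(b"bext", bytes(602))
audio = make_chunk(b"data", bytes(4))
body = b"WAVE" + bext + fmt + audio
demo = b"RIFF" + struct.pack("<I", len(body)) + body

print(list_riff_chunks(demo))
```

An application that does not know the <bext> chunk can skip it in the same way this loop does, by reading the chunk size and jumping ahead, which is why plain WAVE applications can still play BWF files.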

Where to put the metadata

There are two choices on where to store metadata:

In a separate file, application, or database

Embedded in the audio file itself

The choice will primarily depend upon your institution and the choices you have made on how to handle metadata. If you provide us with the metadata that you want stored in the audio file itself, we will embed it in the WAVE-BWF file with the appropriate tags and in the proper data chunks.

Good cases can be made for either choice. We've seen some institutions even do both.

Catastrophic Metadata

In general, we do advocate a minimum set of metadata be embedded in the WAVE-BWF file. This is referred to as catastrophic metadata. It allows you to identify your audio files in the event of an IT disaster. If for whatever reason the WAVE-BWF file somehow gets completely disassociated from an external file, application or database, it will still be possible to identify the audio file from the catastrophic metadata.

There are different approaches to catastrophic metadata. For example, it would be sufficient to simply store a USID (Unique Source Identifier) in the <bext> chunk as the Originator Reference. This may be less human-readable, but with the right software tools, the USID is all that would be needed to identify the file.

Another approach would be to embed just enough descriptive information (Description, Originator, Date, etc.) to identify the recording.
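As a rough sketch of how catastrophic metadata can be recovered from a file, the Python below reads the identifying fields from a <bext> payload. The field sizes follow the fixed layout in EBU Tech 3285; the function name `parse_bext`, the placeholder reference "USID-PLACEHOLDER", and the demo payload are assumptions made for illustration only.

```python
import struct

# Fixed-size text fields at the start of the <bext> payload (EBU Tech 3285).
BEXT_TEXT_FIELDS = [
    ("Description", 256),
    ("Originator", 32),
    ("OriginatorReference", 32),
    ("OriginationDate", 10),
    ("OriginationTime", 8),
]


def parse_bext(payload: bytes) -> dict:
    """Extract the identifying fields from a <bext> chunk payload."""
    fields = {}
    pos = 0
    for name, size in BEXT_TEXT_FIELDS:
        raw = payload[pos:pos + size]
        # Text fields are NUL-padded; keep only the part before the first NUL.
        fields[name] = raw.split(b"\x00", 1)[0].decode("ascii", "replace")
        pos += size
    # The time reference is a 64-bit sample count stored as two 32-bit words.
    low, high = struct.unpack_from("<II", payload, pos)
    fields["TimeReference"] = (high << 32) | low
    return fields


# Synthetic demo payload; the reference value is a made-up placeholder.
demo_payload = (
    b"H 029 John Cage".ljust(256, b"\x00")
    + b"John Cage".ljust(32, b"\x00")
    + b"USID-PLACEHOLDER".ljust(32, b"\x00")
    + b"1945-02-18"
    + b"09-59-59"
    + struct.pack("<II", 0, 0)
).ljust(602, b"\x00")

for name, value in parse_bext(demo_payload).items():
    print(f"{name}: {value}")
```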

WAVE RIFF INFO_List versus BWF <bext> chunk

The INFO_List and <bext> chunks are two different chunks within the WAVE header.

The RIFF INFO_List tags are not recognized as an archival standard, but are used in various commercial audio editing applications as well as some audio players and music distribution systems. Their popularity does not earn them "de facto standard" status, but they are frequently used nonetheless. Whether to populate these tags is up to the individual archive, and may depend on whether you have applications that can take advantage of these fields.

The BWF <bext> chunk is a recognized and frequently used location for archival metadata. We strongly recommend using the BWF <bext> chunk for at least catastrophic metadata.

If you use JHOVE, the JSTOR/Harvard Object Validation Environment, you will find that the RIFF INFO_List tags are unsupported.

BWF Metadata - Compatibility

As a practical matter, there exists a wide range of implementations when it comes to the number of characters found in the WAVE RIFF INFO_List tags, and how many characters are displayed in various applications.

For maximum compatibility across software applications, The Audio Archive recommends that all RIFF INFO_List tags and BWF field descriptions be limited to no more than 64 characters. Should you require more than 64 characters, we recommend that the first 64 characters contain all the critical and essential information, and that the remaining 192 characters (for a total of 256 characters) contain only secondary details, because they may be truncated, discarded, or not displayed in some software applications. Note that the BWF Originator and Originator Reference fields are themselves fixed at 32 characters by the EBU specification, so the 64- and 256-character guidance applies in practice to the BWF Description field and the INFO_List tags.
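The 64-character guideline can be checked automatically before metadata is submitted. The following Python sketch is one way to do it; the function name `check_field` and the constant names are hypothetical, and the limits are simply the 64 and 256 values recommended above.

```python
CRITICAL_LIMIT = 64    # characters that most applications are expected to show
ABSOLUTE_LIMIT = 256   # 64 critical characters plus 192 that may be lost


def check_field(text: str) -> dict:
    """Report how a field description fares against the recommended limits."""
    length = len(text)  # spaces count toward the limit
    return {
        "length": length,
        "within_critical_limit": length <= CRITICAL_LIMIT,
        "within_absolute_limit": length <= ABSOLUTE_LIMIT,
        "always_visible": text[:CRITICAL_LIMIT],
        "may_be_truncated": text[CRITICAL_LIMIT:ABSOLUTE_LIMIT],
    }


report = check_field(
    "H 029 John Cage. Imaginary Landscape (1 of 2) - copies 1/3, 2/3"
)
print(report["length"], report["within_critical_limit"])
```

Reviewing the "may_be_truncated" portion of each report is a quick way to confirm that nothing essential sits beyond the first 64 characters.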

BWF Metadata - Providing Metadata Field Descriptions

Depending on the complexity of the BWF descriptions, either a Microsoft Excel spreadsheet or a Microsoft Word document is a suitable basic tool for providing the descriptive metadata. If you have XML representations of your metadata, we can work with that, too.

An Excel spreadsheet is preferred if the character count for the descriptions is straightforward and obviously no more than 64 characters. Otherwise, provide the descriptions in a Microsoft Word document and use the “Word Count” tool in Word to check the number of characters in each description.

NOTE: When using the “Word Count” tool, be sure to: (a) highlight the field description for which you want to count the characters, and (b) look at the “Characters (with spaces)” value because spaces also count against the maximum number of characters.

BWF Metadata - Example: WAVE RIFF INFO_List Tags

One way to use the WAVE RIFF INFO_List tags in conjunction with the BWF fields is to make the INFO_List tags common to a group of recordings, whereas the BWF fields will contain details describing just one unique recording. For example:

Title: John Cage Disc Collection
Subject: Music, 1939 - 1951
Engineer: Eric Jacobs, The Audio Archive
Copyright:
Genre: 20th century music, Piano, Percussion, Electronic, Aleatory
Artist: Cage, Couper, Russell, Roldan, Harrison, Beyer, Cowell, Wolff
Keywords: Piano, Percussion, Music, Experimental, Chance, Radio, Dance
Originator Software: WaveLab 6.10
Creation Date: 2006-09-11
Original Medium: 16-inch Electric Transcription (ET) Disc

There is also a "Comment" field available.

[Screenshot: the RIFF INFO_List tags as they appear in the Cube-Tec Audiocube and Wavelab software]

The WAVE-RIFF INFO_List tags corresponding to these fields are:

INAM (Title)

ISBJ (Subject)

IENG (Engineer)

ICOP (Copyright)

IGNR (Genre)

IART (Artist)

IKEY (Keywords)

ISFT (Originator Software)

ICRD (Creation Date)

IMED (Original Medium)

ICMT (Comment)
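To show how these tag identifiers are used inside the file, here is a minimal Python sketch that assembles a RIFF LIST chunk of type INFO from labelled values. The label-to-tag table mirrors the list above; the function name `build_info_list` is hypothetical, and the ASCII encoding is an assumption made for simplicity (real-world tools vary in how they encode these strings).

```python
import struct

# Label-to-tag mapping taken from the list above.
INFO_TAGS = {
    "Title": "INAM",
    "Subject": "ISBJ",
    "Engineer": "IENG",
    "Copyright": "ICOP",
    "Genre": "IGNR",
    "Artist": "IART",
    "Keywords": "IKEY",
    "Originator Software": "ISFT",
    "Creation Date": "ICRD",
    "Original Medium": "IMED",
    "Comment": "ICMT",
}


def build_info_list(fields: dict) -> bytes:
    """Build a RIFF LIST chunk of type INFO from {label: text} pairs."""
    body = b"INFO"
    for label, value in fields.items():
        if not value:
            continue  # skip empty fields such as a blank Copyright
        tag = INFO_TAGS[label].encode("ascii")
        text = value.encode("ascii", "replace") + b"\x00"  # NUL-terminated string
        body += tag + struct.pack("<I", len(text)) + text
        if len(text) & 1:
            body += b"\x00"  # sub-chunks are padded to an even length
    return b"LIST" + struct.pack("<I", len(body)) + body


chunk = build_info_list({
    "Title": "John Cage Disc Collection",
    "Subject": "Music, 1939 - 1951",
    "Copyright": "",
    "Creation Date": "2006-09-11",
})
print(len(chunk), chunk[:16])
```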

BWF Metadata - Example: BWF extension fields

Description: H 029 John Cage. Imaginary Landscape (1 of 2) - copies 1/3, 2/3
Originator: John Cage
Origination Date (yyyy-mm-dd): 1945-02-18
Coding History: <entered by audio engineer>
Originator reference: <unique source identifier>
Origination Time (hh-mm-ss): 09-59-59
Time Reference: 0s
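For readers curious how the example fields above are laid out in the file, the Python sketch below packs them into a <bext> chunk payload. The byte sizes follow the fixed layout in EBU Tech 3285; the helper names `fixed` and `pack_bext` are hypothetical, the reference "USID-PLACEHOLDER" stands in for a real unique source identifier, and the sketch writes a version 0 chunk with an empty Coding History for brevity.

```python
import struct


def fixed(text: str, size: int) -> bytes:
    """Encode text as ASCII and NUL-pad it to exactly `size` bytes."""
    raw = text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    if len(raw) > size:
        raise ValueError(f"{text!r} exceeds {size} bytes")
    return raw.ljust(size, b"\x00")


def pack_bext(description, originator, originator_reference,
              date, time, time_reference=0, coding_history=""):
    """Assemble a <bext> chunk payload (without the 8-byte chunk header)."""
    return (
        fixed(description, 256)
        + fixed(originator, 32)
        + fixed(originator_reference, 32)
        + fixed(date, 10)   # yyyy-mm-dd
        + fixed(time, 8)    # hh-mm-ss
        # 64-bit sample count since midnight, stored low word first
        + struct.pack("<II", time_reference & 0xFFFFFFFF, time_reference >> 32)
        + struct.pack("<H", 0)  # Version 0: UMID and loudness fields unused
        + bytes(64)             # UMID (left empty)
        + bytes(190)            # loudness values and reserved bytes (left empty)
        + coding_history.encode("ascii")
    )


payload = pack_bext(
    description="H 029 John Cage. Imaginary Landscape (1 of 2) - copies 1/3, 2/3",
    originator="John Cage",
    originator_reference="USID-PLACEHOLDER",  # made-up placeholder value
    date="1945-02-18",
    time="09-59-59",
)
print(len(payload))
```

Because `fixed` rejects over-length text, a description longer than 256 characters or an originator longer than 32 characters fails loudly instead of being silently truncated.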

Please see the official EBU Standard documents listed in our Standards section on this website. The EBU BWF User Guide provides an excellent introduction to the use of BWF fields.

The Audio Archive can help you generate the correct BWF extension fields for your project.

[Screenshot: the BWF extension fields as they appear in the Cube-Tec Audiocube and Wavelab applications]

MP3 Metadata - Background

There are many MP3 fields (also known as ID3 tags), but the most common fields for compatibility among most MP3 players are listed further in this section.

ID3 tags are a separate tagging convention rather than part of the MP3 audio standard itself. ID3v1 limits each text field to 30 characters, while ID3v2 fields can hold 255 characters and more, but various MP3 players handle the display of these fields differently. For example, an iPod only displays the first 20 characters or so. The number of characters displayed is variable and depends on font kerning and the iPod model screen size. Another iPod limitation is that the "Artist", "Album Title" and song "Title" are truncated when scrolling through lists. When playing a song, the iPod will display the entire song “Title”, but the “Artist” and “Album Title” will still be truncated.

Windows Media Player, the Apple iTunes player, and other computer-based players will display all 255 characters, although you may have to adjust the window size and column width to see the entire description.

Generally, The Audio Archive advises limiting each field to 32 characters if you envision the MP3 files being accessed on portable players or other specialized applications. Otherwise, using the same field lengths as for the WAVE-RIFF and BWF fields is fine.

If you would like 32-character field descriptions, you will need to provide shorter descriptions for the MP3 ID3 tags listed in the next section.

MP3 Metadata - MP3 ID3 Tags

Our recommendations on how to generate MP3 ID3 tags if you already use the RIFF INFO_List tags are as follows:

Artist: <USE WAVE-RIFF TITLE>
Album Title: <USE WAVE-RIFF SUBJECT>
Title: <USE BWF DESCRIPTION>
Year: <USE BWF ORIGINATION DATE>
Track Number: <possibly use this to sequence titles in a group>
Genre: <USE WAVE-RIFF GENRE>
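The mapping above is mechanical enough to express as a small function. The Python sketch below is illustrative only: the function name `id3_from_wave`, the dictionary keys, and the optional `limit` parameter (for the 32-character case) are assumptions, and the result is a plain dictionary rather than an actual ID3 tag written to a file.

```python
def id3_from_wave(info: dict, bext: dict, track=None, limit=None) -> dict:
    """Derive ID3 field values from RIFF INFO_List and BWF values."""

    def clip(text: str) -> str:
        # Optionally shorten text, e.g. limit=32 for portable players.
        return text if limit is None else text[:limit]

    tags = {
        "Artist": clip(info.get("Title", "")),
        "Album Title": clip(info.get("Subject", "")),
        "Title": clip(bext.get("Description", "")),
        "Year": bext.get("OriginationDate", "")[:4],  # yyyy from yyyy-mm-dd
        "Genre": clip(info.get("Genre", "")),
    }
    if track is not None:
        tags["Track Number"] = str(track)
    return tags


info = {
    "Title": "John Cage Disc Collection",
    "Subject": "Music, 1939 - 1951",
    "Genre": "20th century music, Piano, Percussion, Electronic, Aleatory",
}
bext = {
    "Description": "H 029 John Cage. Imaginary Landscape (1 of 2) - copies 1/3, 2/3",
    "OriginationDate": "1945-02-18",
}
for name, value in id3_from_wave(info, bext, track=1, limit=32).items():
    print(f"{name}: {value}")
```

Blind clipping at 32 characters is only a fallback. A hand-written short description usually reads better than a truncated long one, which is why shorter descriptions are requested for these fields.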

AES-X098B Technical Metadata

The Audio Archive is a member of the AES-X098 standards group, and has been contributing to the development of this draft standard for audio technical metadata over the past two years. We expect this standard to be finalized in late 2008, and have been providing customers with data that conforms to this draft standard since 2006.

AES-X098B technical metadata (over 30 fields) captures the following for each audio item:

Media type

Condition of the media

Processes used to conserve and play back the media

Playback anomalies (dropouts, noise, speed, EQ)

Capturing this information benefits archivists, researchers and listeners through:

Assessment of media condition

Assessment of recording quality

Identification of chronology, sequence, or other relationships between recordings based on physical or electrical properties (media type and manufacturer, playback speed, length and other dimensions, common noise problems, similar damage or deterioration of the media, format)

Reduced need to access the physical collection

Examples of AES-X098B are available from The Audio Archive as Microsoft Excel spreadsheets upon request.

File Naming Schema

There is no single best way to name the audio files. Naming conventions will depend on the collection and how it is organized, as well as the media type and format. Possible elements that can be used in a file-naming schema include:

Accession number

Collection name

Date of recording

Item number

Serial number

Disc or tape side

Part number (recordings that span multiple media)

Media type (open reel, coarse groove, LP, cassette)

Format (sample rate, word length, bit rate)

Intended use (preservation master, access copy)

The above elements are hardly an exhaustive list, but these are commonly used elements for human-generated file names. Depending upon the nature of your collection, you might want to name files based on artist, instrument, geographical location, or any number of possibilities. If you are uncertain which elements to incorporate into a file-naming schema, we will work with you to identify a reasonable schema.

For maximum compatibility across operating systems and media types, we recommend that file names be limited to a maximum of 32 characters (not including the file extension). ISO standards for file systems often restrict string lengths to 128 characters (including the file extension).

Yet another option is to use random or rule-based machine-generated file names. Often a database application can generate these names or "handles".
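Both naming approaches can be sketched in a few lines of Python. Everything here is illustrative: the function names `build_name` and `machine_name`, the underscore separator, and the sample elements (collection code, item number, date, side, intended use) are assumptions rather than a recommended schema. The 32-character limit is the recommendation given above.

```python
import re
import uuid

MAX_STEM = 32  # recommended maximum length, not counting the file extension


def build_name(elements, extension="wav"):
    """Join schema elements into a file name and enforce the length limit."""
    # Keep only letters and digits in each element for cross-platform safety.
    parts = [re.sub(r"[^A-Za-z0-9]+", "", str(e)) for e in elements]
    stem = "_".join(p for p in parts if p)
    if len(stem) > MAX_STEM:
        raise ValueError(f"{stem!r} is {len(stem)} characters; limit is {MAX_STEM}")
    return f"{stem}.{extension}"


def machine_name(extension="wav"):
    """Generate a random machine name; a UUID in hex form is 32 characters."""
    return f"{uuid.uuid4().hex}.{extension}"


# Hypothetical elements: collection code, item number, date, side, intended use.
print(build_name(["JC", "H029", "1945-02-18", "s1", "pm"]))
print(machine_name())
```

Raising an error on over-length names, rather than truncating, avoids two different recordings silently collapsing to the same file name.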



Tools

WAV Properties Extension for viewing BWF fields and more (Windows)

A handy tool for viewing the BWF <bext> and INFO_List chunks (among others) in the WAVE RIFF header. This tool integrates with the Properties viewer (i.e. right-click on the file name in the Windows file browser and select "Properties") by adding an "Info" tab. The "Info" tab then displays a structured view of the WAVE RIFF header chunks. You can even copy and paste header text from the new "Info" tab. Not fancy. Not pretty. But it works well and has low system overhead. Works best for ad hoc file review. You can find it in the utilities area of www.mda-vst.com. Although the software instructions do not explicitly support Windows XP, we find it does work on this operating system (as well as the older versions of Windows).

RIFF File Viewer (Windows)

Identical to the WAV Properties Extension tool above, except it is wrapped in its own dedicated user interface. It is not as at-your-fingertips as the WAV Properties Extension because you need to locate the RIFF File Viewer software (it does not automatically install into your Programs list) and start it. But it works best when you have to review a bunch of files at the same time. You can also find this software in the utilities area of www.mda-vst.com.
© The Audio Archive, Inc. - All Rights Reserved Privacy Notice Graphic Design by A.J. Ross