Bagging

Bag Names





Name                          OK?   Comment
photos.tar                    No    Institution name is missing.
ncsu.photos.tar               Yes*  Should untar to a directory called ncsu.photos
ncsu.edu.photos.tar           Yes*  Should untar to a directory called ncsu.edu.photos
ncsu.edu.photos.b1.tar        No    Should be b01.of10, assuming there are 10 parts to this bag.
ncsu.edu.photos.b01.of10.tar  Yes   Note the dot before "of10", and note that ".tar" comes at the end.


* Because early versions of this document were unclear, some institutions uploaded bags with names like "institution.bag.tar" while others used "institution.edu.bag.tar," and the system accepted both naming schemes. The system will continue to accept both naming schemes, but for the sake of consistency, and to simplify your internal processes, you should stick with one or the other.
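The naming rules in the table can be checked before upload. The following is a minimal sketch, not the check the APTrust ingest service actually runs. The function name check_bag_name is hypothetical, and the rule that the part number is zero-padded to the same width as the total is an assumption inferred from the "b01.of10" example.

```python
import re

# Assumption: a multi-part suffix looks like ".b01.of10" at the end of the name.
MULTIPART_RE = re.compile(r"\.b(\d+)\.of(\d+)$")
# A ".b<digits>" ending with no ".of<digits>" after it is treated as malformed.
DANGLING_PART_RE = re.compile(r"\.b\d+$")


def check_bag_name(filename):
    """Return (ok, reason) for a tar file name, following the table above."""
    if not filename.endswith(".tar"):
        return False, "name must end with .tar"
    stem = filename[: -len(".tar")]

    match = MULTIPART_RE.search(stem)
    if match:
        part, total = match.groups()
        # Assumption: part number is zero-padded to the width of the total.
        if len(part) != len(total):
            return False, "part number should be zero-padded, e.g. b01.of10"
        if not 1 <= int(part) <= int(total):
            return False, "part number is out of range"
        stem = stem[: match.start()]
    elif DANGLING_PART_RE.search(stem):
        return False, "multi-part suffix must look like .b01.of10"

    # At least an institution segment and a bag-name segment must remain.
    pieces = stem.split(".")
    if len(pieces) < 2 or not all(pieces):
        return False, "institution name is missing"
    return True, "ok"
```

For example, check_bag_name("ncsu.edu.photos.b1.tar") reports a failure, while check_bag_name("ncsu.edu.photos.b01.of10.tar") passes. The sketch cannot tell whether the first segment is really your institution's identifier, so treat a pass as necessary rather than sufficient.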

File and Directory Names

File and folder names must follow POSIX conventions.

Other considerations:
  • Generic Files in APTrust are referenced by their URI, which is the original file path relative to the bag. This will support atomistic updating of items in the future.
  • File and folder names should be unique across multi-part bags, so that every item is processed as a new file rather than treated as an update to an existing one.
  • Though APTrust does not currently version files, you can create item versions in APTrust by including a datetime stamp in the name of each file you write to a bag.
  • File and folder names are treated as case-sensitive for processing purposes.
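Two of the points above lend themselves to small helpers: building a datetime-stamped file name, and spotting paths that repeat across the parts of a multi-part bag. This is a sketch under assumptions. The names versioned_name and duplicate_paths are hypothetical, and the stamp format (UTC, YYYYMMDDTHHMMSSZ, placed before the extension) is only one reasonable choice; APTrust does not prescribe one here.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def versioned_name(relative_path, when):
    """Insert a UTC datetime stamp before the file extension.

    `when` should be a timezone-aware datetime.
    """
    path = PurePosixPath(relative_path)
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return str(path.with_name(f"{path.stem}.{stamp}{path.suffix}"))


def duplicate_paths(parts):
    """Find payload paths that occur in more than one bag part.

    `parts` maps a part name to the list of paths in that part. The
    comparison is case-sensitive, matching how names are processed.
    """
    owners = {}
    for part_name, paths in parts.items():
        for path in paths:
            owners.setdefault(path, []).append(part_name)
    return {path: names for path, names in owners.items() if len(names) > 1}
```

Because the comparison is case-sensitive, data/B.txt and data/b.txt count as two different files and are not reported as duplicates.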

Bag Structure

Bags must have the following structure. Items marked (required) are required. Others are optional. Additional notes appear below. Note the new rules on manifests!

<institution_id.item_uid[.b###.of###]>/
|    aptrust-info.txt (required)
|    bag-info.txt (required)
|    bagit.txt (required)
|    manifest-md5.txt and/or manifest-sha256.txt (required)
|    tagmanifest-md5.txt
|    tagmanifest-sha256.txt
|    [custom tag files]
\----data/ (required)
    |    [payload files]
\----[custom_tag_dir]/
    |    [custom tag files]
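A quick local check of this layout might look like the sketch below. It takes the set of required items from the Quick Checklist at the end of this page (the three tag files, at least one payload manifest, and the data directory). The function name missing_items is hypothetical, and the check looks only at presence, not at file contents.

```python
from pathlib import Path

# Required items, as listed in the Quick Checklist on this page.
REQUIRED_FILES = ("bagit.txt", "bag-info.txt", "aptrust-info.txt")
MANIFESTS = ("manifest-md5.txt", "manifest-sha256.txt")


def missing_items(bag_dir):
    """Return the required items that are absent from an untarred bag."""
    root = Path(bag_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Either manifest satisfies the requirement; both may be present.
    if not any((root / name).is_file() for name in MANIFESTS):
        missing.append("manifest-md5.txt or manifest-sha256.txt")
    if not (root / "data").is_dir():
        missing.append("data/")
    return missing
```

An empty list means every required item is present.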

Required Tag Files

bagit.txt

This file is required by the BagIt specification and should contain the following:

BagIt-Version: 0.97
Tag-File-Character-Encoding: UTF-8
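If you generate bags with your own scripts, a sketch like the following can produce and check this file. The names BAGIT_TXT, parse_tag_file, and bagit_txt_ok are hypothetical. The parser is deliberately simple: it splits each line at the first colon and does not handle the continuation lines that BagIt allows in tag files.

```python
# The two declarations this page asks for, as they would appear in bagit.txt.
BAGIT_TXT = "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n"

EXPECTED = {
    "BagIt-Version": "0.97",
    "Tag-File-Character-Encoding": "UTF-8",
}


def parse_tag_file(text):
    """Parse 'Label: value' lines into a dict (no continuation-line support)."""
    tags = {}
    for line in text.splitlines():
        label, separator, value = line.partition(":")
        if separator:
            tags[label.strip()] = value.strip()
    return tags


def bagit_txt_ok(text):
    """True if the text declares exactly the two expected tags and values."""
    return parse_tag_file(text) == EXPECTED
```

Whitespace around the value is stripped before comparison, so one or two spaces after the colon are treated the same way.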

bag-info.txt

Valid APTrust bags MUST contain a bag-info.txt file with the following fields, which may be blank:

This file MAY contain additional fields.

aptrust-info.txt

This file MUST be present and MUST contain the following tag fields.
  • Title: Human-readable title for searching and listing in APTrust. This cannot be empty.
  • Access: One of three enumerated access conditions: “Consortia”, “Restricted”, or “Institution”.
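The two fields can be checked with a short script before a bag is tarred. This is a sketch, not the validation APTrust performs. The function name check_aptrust_info is hypothetical, and matching the labels and the Access value without regard to case is an assumption made for convenience.

```python
# Allowed Access values, lower-cased for comparison (case-insensitivity is
# an assumption of this sketch).
ACCESS_VALUES = {"consortia", "restricted", "institution"}


def check_aptrust_info(text):
    """Return a list of problems found in the text of aptrust-info.txt."""
    tags = {}
    for line in text.splitlines():
        # Split at the first colon only, so titles may contain colons.
        label, separator, value = line.partition(":")
        if separator:
            tags[label.strip().lower()] = value.strip()

    problems = []
    if not tags.get("title"):
        problems.append("Title is missing or empty")
    if tags.get("access", "").lower() not in ACCESS_VALUES:
        problems.append("Access must be Consortia, Restricted, or Institution")
    return problems
```

For a file containing "Title: Campus Photos" and "Access: Institution", the function returns an empty list.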

Bag Serialization

Bags serialized for use by APTrust MUST use TAR as their serialization format, MUST NOT use compression, MUST follow the file and folder naming restrictions above, and MUST end with the .tar extension.
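One way to produce such a file is Python's standard tarfile module, sketched below. The function name tar_bag is hypothetical. Mode "w" writes a plain, uncompressed archive, and arcname makes the archive untar to a directory with the same name as the bag directory.

```python
import tarfile
from pathlib import Path


def tar_bag(bag_dir, dest_dir):
    """Write <bag directory name>.tar into dest_dir, without compression.

    dest_dir should be outside bag_dir so the archive does not include itself.
    """
    bag_dir = Path(bag_dir)
    tar_path = Path(dest_dir) / f"{bag_dir.name}.tar"
    # Mode "w" means no compression ("w:gz" or "w:bz2" would compress).
    with tarfile.open(tar_path, "w") as tar:
        # arcname keeps only the bag directory name as the top-level entry.
        tar.add(bag_dir, arcname=bag_dir.name)
    return tar_path
```

A bag directory named ncsu.edu.photos yields ncsu.edu.photos.tar, which untars back to a directory called ncsu.edu.photos.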

Bag Size

Initially, bags sent to APTrust should be limited to 250 GB for the final tarred bag. The space available for temporary file processing puts a practical limit on total bag size in APTrust. We expect this limit to grow over time, but the initial performance data will help determine the final limits for the service.

Quick Checklist

Valid bags meet all of the following criteria:
  • The bag was submitted as a tar file, without compression
  • Bag name follows the pattern <institution.edu>.bag_name[.b###.of###].tar
  • Bag untars to a directory whose name matches the name of the tar file, minus the .tar extension.
  • Bag contains an md5 or sha256 manifest (or both)
  • Bag contains the data directory
  • Bag contains bagit.txt, as described above
  • Bag contains bag-info.txt as described above
  • Bag contains aptrust-info.txt as described above
  • All data files are in the manifest, and all checksums match
  • All tag files mentioned in the tag manifest are present, and checksums match (you may omit tag files from the tag manifests)
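The payload manifest items in the checklist can be tested locally with a sketch like the one below. The function name verify_manifest is hypothetical. It assumes each manifest line holds a checksum followed by whitespace and a path relative to the bag root, and it checks the payload manifest only, not the tag manifests.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir, algorithm="md5"):
    """Compare manifest-<algorithm>.txt against the files under data/."""
    root = Path(bag_dir)
    manifest = root / f"manifest-{algorithm}.txt"

    # Assumed line format: "<checksum> <path relative to the bag root>".
    listed = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, path = line.split(None, 1)
            listed[path.strip()] = checksum.lower()

    problems = []
    for path, expected in listed.items():
        target = root / path
        if not target.is_file():
            problems.append(f"missing: {path}")
            continue
        digest = hashlib.new(algorithm)
        with target.open("rb") as handle:
            # Read in 1 MiB chunks so large payload files fit in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected:
            problems.append(f"checksum mismatch: {path}")

    # Every payload file must also appear in the manifest.
    for target in sorted((root / "data").rglob("*")):
        if target.is_file():
            relative = target.relative_to(root).as_posix()
            if relative not in listed:
                problems.append(f"not in manifest: {relative}")
    return problems
```

Pass algorithm="sha256" to check manifest-sha256.txt instead. An empty list means every listed file is present with a matching checksum and no payload file is unlisted.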



Updated on Jun 23, 2016 by Andrew Diamond (Version 14)