Bagging

Bag Names

Bag root directories are named by combining the institutional ID, as determined by the institutional profile inside APTrust, with the unique identifier of the item to be preserved. Bag root directory names must conform to the naming conventions listed under ‘File and Directory Names’ below. Dots in the bag root name are used as delimiters between the name parts described above, and any dots or other special characters normally found in either the institution ID or the item unique ID should be removed or converted to dashes or underscores. See the bag name examples below for more information.

Multipart bag names must end with ‘b###.of###’, where the first ### is the number of that bag and the second ### is the total bag count. Bag count sequences begin at 001.

For example, if the University of Virginia has institutional code ‘virginia.edu’ and is creating a bag for an item with the unique ID ‘uva-lib:1229365’, then the bag root directory should be named ‘virginia.edu.uva-lib_1229365’.


If this were a 200-part multipart bag, the first bag root directory would be named ‘virginia.edu.uva-lib_1229365.b001.of200’, the second ‘virginia.edu.uva-lib_1229365.b002.of200’, and the last ‘virginia.edu.uva-lib_1229365.b200.of200’. When tarred, these carry the .tar extension, for example ‘virginia.edu.uva-lib_1229365.b016.of200.tar’.


We enforce bag naming conventions because when we untar bags in a staging area to validate their contents, we don't want bags untarring to the same directory and overwriting each other.
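The naming rules above can be sketched as a small helper. This is an illustration only, not APTrust tooling: the function name `bag_name` is hypothetical, and it assumes that the institution ID is used as given and that every character in the item ID outside letters, digits, dashes, and underscores is converted to an underscore.

```python
import re


def bag_name(institution_id, item_id, part=None, total=None):
    """Build a bag root directory name (hypothetical helper, not an APTrust API)."""
    # Assumption: characters outside A-Z a-z 0-9 _ - in the item ID become "_".
    safe_item = re.sub(r"[^A-Za-z0-9_-]", "_", item_id)
    name = f"{institution_id}.{safe_item}"
    if part is not None:
        # Multipart suffix uses three digits, and sequences begin at 001.
        if total is None or not 1 <= part <= total <= 999:
            raise ValueError("need 1 <= part <= total <= 999")
        name += f".b{part:03d}.of{total:03d}"
    return name


print(bag_name("virginia.edu", "uva-lib:1229365"))
print(bag_name("virginia.edu", "uva-lib:1229365", part=16, total=200) + ".tar")
```

With the University of Virginia example, the two calls produce ‘virginia.edu.uva-lib_1229365’ and ‘virginia.edu.uva-lib_1229365.b016.of200.tar’.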


Bag Name Examples

 Name                          OK?   Comment
 photos.tar                    No    Institution name is missing.
 ncsu.photos.tar               Yes*  Should untar to a directory called ncsu.photos
 ncsu.edu.photos.tar           Yes*  Should untar to a directory called ncsu.edu.photos
 ncsu.edu.photos.b1.tar        No    Should be b01.of10, assuming there are 10 parts to this bag.
 ncsu.edu.photos.b01.of10.tar  Yes   Note the dot before "of10", and note that ".tar" comes at the end.
 

File and Directory Names

File and Folder names must follow POSIX conventions:


  • Contain upper or lower case letters, numbers, dots, underscores or dashes. (A–Z a–z 0–9 . _ -)

  • Are considered case sensitive.

  • MUST NOT begin with a dash. (-)

  • Restricted to 255 characters in length including extension.

  • MUST be at least 1 character in length.
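As a rough sketch, the rules in this list can be checked with a single regular expression plus a length test. The function name `is_valid_name` is hypothetical, and the check applies to one path component (a single file or folder name), not to a full path.

```python
import re

# First character: letter, digit, dot or underscore (no leading dash).
# Remaining characters: letters, digits, dots, underscores or dashes.
_NAME_RE = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_valid_name(name):
    """Return True if a single file or folder name satisfies the listed rules."""
    return 1 <= len(name) <= 255 and _NAME_RE.fullmatch(name) is not None


print(is_valid_name("photo_01.tif"))   # allowed characters only
print(is_valid_name("-draft.txt"))     # begins with a dash
print(is_valid_name("my file.txt"))    # contains a space
```

Because matching is done on the literal characters, names that differ only in case are treated as different names, in line with the case-sensitivity rule.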



Other Considerations to be aware of:


  • Generic Files in APTrust are referenced by their URI, which is the original file path relative to the bag. This will support atomistic updating of items in the future.

  • File and folder names should be unique across multi-part bags to make sure all items are processed and not treated as a file update.

  • Though APTrust does not currently version files, you can easily create item versions in APTrust by writing files to a bag using the datetime stamp in the filename.

  • File and Folder names are treated as case sensitive for processing purposes.
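One way to follow the datetime-stamp suggestion above is to insert a UTC timestamp before the file extension. The sketch below is an assumption about how a depositor might do this, not an APTrust convention: the helper name `versioned_path` and the compact `YYYYMMDDTHHMMSSZ` stamp format are illustrative, and the format was chosen because it avoids colons, which the naming rules do not allow.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def versioned_path(relpath, when):
    """Insert a UTC timestamp before the extension (hypothetical helper).

    `when` must be a timezone-aware datetime.
    """
    p = PurePosixPath(relpath)
    # Compact ISO 8601 form with no colons, so the name stays within the allowed characters.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return str(p.with_name(f"{p.stem}.{stamp}{p.suffix}"))


when = datetime(2016, 3, 29, 12, 0, 0, tzinfo=timezone.utc)
print(versioned_path("data/report.pdf", when))
```

For the sample input this yields ‘data/report.20160329T120000Z.pdf’, so each version gets a distinct file path and is not treated as an update to an earlier file.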

Bag Structure


\----[custom_tag_dir]/

    |    [custom tag files]

Manifests

Tag Manifests

Custom Tag Files

As of March 29, 2016, we preserve all tag files, except bagit.txt, which will be recreated when you restore a bag. Custom tag files may be in any format, including binary. We will not try to parse them, but we will validate their checksums if they are listed in the tag manifests.

Required Tag Files

bagit.txt


This file is required by the BagIt specification, and should contain the following:


BagIt-Version:  0.97

Tag-File-Character-Encoding:  UTF-8

bag-info.txt file

Valid APTrust bags MUST contain a bag-info.txt file with the following fields, which may be blank:


Source-Organization:  This should be the human readable name of the APTrust partner organization.

Bagging-Date: as per specification using ISO 8601 UTC format.

Bag-Count:  as per specification

Internal-Sender-Description:  [Optional] Human readable description of the contents of the bag.

Internal-Sender-Identifier:  [Optional] Internal or alternate identifier used at the sender's location.


This file MAY contain additional fields.
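A minimal sketch of generating the fields listed above follows. The function name `bag_info_text` and its parameters are hypothetical, the "N of T" form of Bag-Count follows the BagIt specification, and the full-timestamp rendering of Bagging-Date is one reading of "ISO 8601 UTC format" rather than a confirmed APTrust requirement.

```python
from datetime import datetime, timezone


def bag_info_text(source_org, bag_count="", description="", identifier="", when=None):
    """Render bag-info.txt content as a string (hypothetical helper)."""
    when = when or datetime.now(timezone.utc)
    fields = [
        ("Source-Organization", source_org),
        # Assumption: a full UTC timestamp is acceptable for Bagging-Date.
        ("Bagging-Date", when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("Bag-Count", bag_count),
        ("Internal-Sender-Description", description),
        ("Internal-Sender-Identifier", identifier),
    ]
    # Every field is written even when its value is blank.
    return "".join(f"{key}: {value}\n" for key, value in fields)


print(bag_info_text("University of Virginia", bag_count="1 of 200"))
```

All five fields are always emitted, so a bag with no sender description still carries the field name with an empty value.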

aptrust-info.txt

This file MUST be present and MUST contain the following tag fields.


Title:  Human readable title for searching and listing in APTrust.

Access:  One of three enumerated access conditions.  [“Consortia”, “Restricted”, “Institution”]
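The two required fields could be written and checked as in the sketch below. The helper name `aptrust_info_text` is hypothetical, and treating the Access values as case-sensitive exact matches is an assumption made for this example.

```python
ACCESS_VALUES = ("Consortia", "Restricted", "Institution")


def aptrust_info_text(title, access):
    """Render aptrust-info.txt content as a string (hypothetical helper)."""
    # Assumption: Access must match one of the three values exactly.
    if access not in ACCESS_VALUES:
        raise ValueError(f"Access must be one of {ACCESS_VALUES}, got {access!r}")
    return f"Title: {title}\nAccess: {access}\n"


print(aptrust_info_text("Historic Campus Photographs", "Institution"))
```

Rejecting an unknown Access value before the bag is built avoids uploading a bag that cannot pass validation.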


Bag Serialization

Bags serialized for use by APTrust MUST use TAR as their serialization format, MUST NOT use compression, MUST follow the file and folder naming restrictions, and MUST end with the .tar extension.

Bag Size

Initially, bags sent to APTrust should be limited to 250 GB for the final tarred bag. Space available for temporary file processing puts a practical limit on total bag sizes in APTrust. We expect this limit to grow over time, but the initial performance data will help determine the final limits for the service.


Checklist


 Bag name follows the pattern <institution.edu>.bag_name[.b###.of###].tar     
 Bag untars to a directory whose name matches the name of the tar file, minus the .tar extension. 
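The serialization rules and the checklist above can be exercised with Python's standard `tarfile` module. This is a sketch under stated assumptions, not APTrust's own validator: the function names `tar_bag` and `untars_to_matching_dir` are hypothetical, and the bag directory is assumed to already be named according to the bag naming rules.

```python
import os
import tarfile


def tar_bag(bag_dir, out_dir):
    """Write <bag name>.tar in out_dir without compression and return its path."""
    bag_dir = os.path.normpath(bag_dir)
    bag_name = os.path.basename(bag_dir)
    tar_path = os.path.join(out_dir, bag_name + ".tar")
    # Mode "w" writes a plain, uncompressed tar archive.
    with tarfile.open(tar_path, "w") as tar:
        # arcname makes the archive unpack to a single directory named after the bag.
        tar.add(bag_dir, arcname=bag_name)
    return tar_path


def untars_to_matching_dir(tar_path):
    """Check that every entry sits under a directory named like the tar file."""
    base = os.path.basename(tar_path)
    if not base.endswith(".tar"):
        return False
    expected = base[: -len(".tar")]
    # Mode "r:" reads uncompressed archives only; compressed input raises ReadError.
    with tarfile.open(tar_path, "r:") as tar:
        top_levels = {name.split("/")[0] for name in tar.getnames()}
    return top_levels == {expected}
```

A typical use is to call `tar_bag` on the finished bag directory and then confirm `untars_to_matching_dir` returns True before upload.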


Updated on Apr 1, 2016 by Andrew Diamond (Version 7)