…or how to run multiple commands in parallel
You can first try s3cmd, and if it doesn’t work, go for the more advanced solution below, which supports millions of files.
s3cmd restore \
--recursive s3://bucket.raw.rifiniti.com \
--restore-days=10
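The restore is asynchronous either way, so you may want to check whether an object has already come back before downloading it. A minimal sketch with the AWS CLI (the key here is just a placeholder); once a restore has been requested, the response includes a Restore field showing whether the request is still ongoing:

aws s3api head-object \
  --bucket bucket.raw.rifiniti.com \
  --key some/prefix/example-object.csv
# Look for: "Restore": "ongoing-request=\"false\", expiry-date=\"...\""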
To bulk-request files to be restored from Glacier, I use the script below. I hope it will be useful to you as well.
#!/bin/bash
#
# Request restore of S3 objects stored in Glacier, optionally filtered by prefix.
# The prefix is optional!
#
# How to use:
# ./export-prefix.sh bucketName 30 2019-04-30
# ./export-prefix.sh bucketName 30
#
export bucket=$1
# How many days to keep the restored objects available
export day=$2
export prefix=$3

# aws2 is the AWS CLI v2 preview command; use plain "aws" if that is what you have installed.
if [ -z "$prefix" ]
then
    cmd="aws2 s3api list-objects --bucket $bucket"
else
    cmd="aws2 s3api list-objects --bucket $bucket --prefix $prefix"
fi

# Keep only the objects that are not in STANDARD storage.
# jq prints each key with its surrounding double quotes, so keys containing spaces survive intact.
readarray -t KEYS < <($cmd | jq '.Contents[] | select( .StorageClass != "STANDARD" ) | ."Key"')

# Start from an empty file so re-running the script does not append duplicate commands.
> /tmp/commands.sh
for key in "${KEYS[@]}"; do
    echo "aws s3api restore-object --bucket $bucket --key ${key} --restore-request '{\"Days\":$day,\"GlacierJobParameters\":{\"Tier\":\"Standard\"}}'" >> /tmp/commands.sh
done

echo "Generated file /tmp/commands.sh"
echo "Splitting the huge file into small files: /tmp/sub-commands*"
split -l 1000 /tmp/commands.sh /tmp/sub-commands.sh.
chmod a+x /tmp/sub-commands*
The script generates a /tmp/commands.sh file with all the commands you need to run.
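For illustration, each generated line looks roughly like this (the key is made up, and the bucket and days come from the script arguments):

aws s3api restore-object --bucket bucketName --key "2019-04-30/some file.csv" --restore-request '{"Days":30,"GlacierJobParameters":{"Tier":"Standard"}}'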
When you have a lot of files, running one huge bash script may not be possible, because it can get killed at some point. To avoid this, we split /tmp/commands.sh into parts, which is what the last part of the script does.
Now use this snippet to run the commands file by file.
for x in /tmp/sub-commands*; do
    echo "working on $x"
    bash "$x"
done
Or, if you have GNU parallel installed, you can run them much faster with:
for x in /tmp/sub-commands*; do
    echo "working on $x"
    parallel -j 10 < "$x"
done
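If you prefer, you can also skip the outer loop; assuming GNU parallel, feeding all the split files to it at once should behave the same way, running 10 restore requests at a time (tune -j to your taste):

cat /tmp/sub-commands* | parallel -j 10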
Update: made the script work with keys containing spaces
Update 2: made it work with a lot of files and added a parallel example