dockerToEcr

Pulled/Modified from original source: https://github.com/wellcomecollection/platform-infrastructure/blob/4b16beef44efbe8faa9a62f5459ab6f706e07032/builds/copy_docker_images_to_ecr.py

Docker Hub is starting to introduce rate limits for anonymous users. [1]

We pull images from Docker Hub in our CI setup. We were starting to hit the rate limits in CI (partly because we run a lot of parallel workers, partly because our tests pull a lot of different images).

This script mirrors images from Docker Hub to repositories in ECR inside our AWS account.

Our CI workers are EC2 instances that run inside the same AWS account, so our CI can pull images from ECR instead of Docker Hub without paying AWS egress charges or hitting Docker Hub rate limits.

If you want to use this script to mirror images from Docker Hub to ECR:

  • Get some local AWS credentials, so that boto3.client("ecr") returns an ECR client authenticated against the account you want to mirror images to
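  • The account ID is looked up with get_aws_account_id(), so there is no ACCOUNT_ID constant to set
  • List every image you want to mirror under mirrored_repos in the properties file; the script builds IMAGE_TAGS from that list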

To run:

ENVIRONMENT=build.prod mirror_docker_images_to_ecr.py

  • ENVIRONMENT tells the script which properties file to read for the list of images (e.g. ENVIRONMENT=build.prod selects properties.build.prod.json).
  • The script assumes it lives in a CDK repo, so it looks in ../cdk/ for the properties file.
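The lookup described above can be sketched as a small helper. This is a minimal sketch rather than the script's own code: `properties_path` and its `base_dir` parameter are hypothetical names, and the `../cdk` default simply follows the description above.

```python
from pathlib import Path


def properties_path(environment, base_dir="../cdk"):
    """Return the path of the properties file for one environment.

    `base_dir` is an assumption for illustration; point it at wherever
    your properties files actually live.
    """
    return Path(base_dir) / f"properties.{environment}.json"


# ENVIRONMENT=build.prod would select properties.build.prod.json.
path = properties_path("build.prod")
print(path.name)
```

Keeping the directory as a parameter makes it easy to run the script from a different working directory.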

The properties.ENVIRONMENT.json file should contain a mirrored_repos entry like this:

"mirrored_repos": [
    { "name": "node", "tags": [ "14.4.0-alpine", "15-alpine", "15.10.0" ] },
    { "name": "openjdk", "tags": [ "14-alpine" ] },
    { "name": "amazoncorretto", "tags": [ "16-alpine", "11-alpine" ] }
]
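Each repo/tag pair in that entry becomes one Docker Hub image reference of the form `name:tag`. The sketch below shows the expansion on the example data. `expand_image_tags` is a hypothetical helper name, and the example is wrapped in an outer object here only so that it parses as standalone JSON.

```python
import json


def expand_image_tags(properties):
    """Flatten the `mirrored_repos` entries into `name:tag` strings."""
    return [
        f"{repo['name']}:{tag}"
        for repo in properties["mirrored_repos"]
        for tag in repo["tags"]
    ]


# The example entry, wrapped in an object so that it is valid JSON.
example = json.loads("""
{
  "mirrored_repos": [
    { "name": "node", "tags": [ "14.4.0-alpine", "15-alpine", "15.10.0" ] },
    { "name": "openjdk", "tags": [ "14-alpine" ] },
    { "name": "amazoncorretto", "tags": [ "16-alpine", "11-alpine" ] }
  ]
}
""")

image_tags = expand_image_tags(example)
for image_tag in image_tags:
    print(image_tag)
```

With the example data this yields six references, starting with `node:14.4.0-alpine`.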
#!/usr/bin/env python3
"""
Pulled/Modified from original source: https://github.com/wellcomecollection/platform-infrastructure/blob/4b16beef44efbe8faa9a62f5459ab6f706e07032/builds/copy_docker_images_to_ecr.py

Docker Hub is starting to introduce rate limits for anonymous users. [1]

We pull images from Docker Hub in our CI setup.  We were starting to hit the
rate limits in CI (partly because we run a lot of parallel workers, partly because
our tests pull a lot of different images).

This script mirrors images from Docker Hub to repositories in ECR inside
our AWS account.

Our CI workers are EC2 instances that run inside the same AWS account, so our CI
can pull images from ECR instead of Docker Hub without paying AWS egress
charges or hitting Docker Hub rate limits.

If you want to use this script to mirror images from Docker Hub to ECR:

*   Get some local AWS credentials, so that ``boto3.client("ecr")`` returns
    an ECR client authenticated against the account you want to mirror images to
*   The account ID is looked up with ``get_aws_account_id()``, so there is no
    ``ACCOUNT_ID`` constant to set
*   List every image you want to mirror under ``mirrored_repos`` in the
    properties file; the script builds ``IMAGE_TAGS`` from that list

To run:

`ENVIRONMENT=build.prod mirror_docker_images_to_ecr.py`

* ENVIRONMENT tells the script which properties file to read for the list of
  images (e.g. ENVIRONMENT=build.prod selects properties.build.prod.json).
* The script assumes it lives in a CDK repo, so it looks in ../cdk/ for the
  properties file.

The properties.ENVIRONMENT.json file should contain a ``mirrored_repos`` entry like this:

```
"mirrored_repos": [
    { "name": "node", "tags": [ "14.4.0-alpine", "15-alpine", "15.10.0" ] },
    { "name": "openjdk", "tags": [ "14-alpine" ] },
    { "name": "amazoncorretto", "tags": [ "16-alpine", "11-alpine" ] }
]
```

[1]: https://www.docker.com/blog/what-you-need-to-know-about-upcoming-docker-hub-rate-limiting/

"""

# import base64
# import subprocess
import json
import os
# import boto3
from utils.aws import get_aws_account_id, ecr_login
from utils.docker import docker
# from botocore.exceptions import ClientError

REGION = os.environ.get('AWS_DEFAULT_REGION', 'us-east-1')


# def get_ecr_repo_names_in_account(*, ecr_client):
#     """
#     Returns a set of all the ECR repository names in an AWS account.
#     """
#     repo_names = set()
#
#     paginator = ecr_client.get_paginator("describe_repositories")
#     for page in paginator.paginate(registryId=account_id):
#         for repo in page["repositories"]:
#             repo_names.add(repo["repositoryName"])
#
#     return repo_names


# def docker_login_to_ecr(ecr_client, *, account_id):
#     """
#     Authenticate Docker against the ECR repository in a particular account.
#
#     The authorization token obtained from ECR is good for twelve hours, so this
#     function is cached to save repeatedly getting a token and running `docker login`
#     in quick succession.
#     """
#     response = ecr_client.get_authorization_token(registryIds=[account_id])
#
#     try:
#         auth = response["authorizationData"][0]
#     except (IndexError, KeyError):
#         raise RuntimeError("Unable to get authorization token from ECR!")
#
#     auth_token = base64.b64decode(auth["authorizationToken"]).decode()
#     username, password = auth_token.split(":")
#
#     cmd = [
#         "docker",
#         "login",
#         "--username",
#         username,
#         "--password",
#         password,
#         auth["proxyEndpoint"],
#     ]
#
#     subprocess.check_call(cmd)


# def create_ecr_repository(ecr_client, *, name):
#     """
#     Create a new ECR repository.
#     """
#     try:
#         ecr_client.create_repository(repositoryName=name)
#     except ClientError as err:
#         if err.response["Error"]["Code"] == "RepositoryAlreadyExistsException":
#             pass
#         else:
#             raise


# def mirror_docker_hub_images_to_ecr(ecr_client, *, account_id, image_tags):
def mirror_docker_hub_images_to_ecr(image_tags):
    """
    Given the name/tag of images in Docker Hub, mirror those images to ECR.
    """

    print("Authenticating Docker with ECR...")
    ecr_login()
    # docker_login_to_ecr(ecr_client, account_id=account_id)

    # Look up the account ID once rather than on every iteration.
    registry = f"{get_aws_account_id()}.dkr.ecr.{REGION}.amazonaws.com"

    for hub_tag in image_tags:
        ecr_tag = f"{registry}/mirrored/{hub_tag}"
        print(f"Mirroring {hub_tag} to {ecr_tag}")
        docker("pull", hub_tag)
        docker("tag", hub_tag, ecr_tag)
        docker("push", ecr_tag)


if __name__ == "__main__":
    # ENVIRONMENT selects the properties file, e.g. properties.build.prod.json.
    properties_file = f"properties.{os.environ['ENVIRONMENT']}.json"
    with open(properties_file) as env_file:
        env_data = json.load(env_file)

    # Expand every repo/tag pair into a Docker Hub "name:tag" reference.
    IMAGE_TAGS = [
        f"{repo['name']}:{tag}"
        for repo in env_data["mirrored_repos"]
        for tag in repo["tags"]
    ]

    # mirror_docker_hub_images_to_ecr(ecr_client=boto3.client("ecr"), account_id=get_aws_account_id(), image_tags=IMAGE_TAGS)
    mirror_docker_hub_images_to_ecr(image_tags=IMAGE_TAGS)
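The mirroring function above boils down to three Docker commands per image: pull from Docker Hub, tag with the ECR name, push to ECR. The sketch below builds those commands without running them. `ecr_image_name` and `mirror_commands` are hypothetical helpers, the account ID is a placeholder, and the sketch assumes the `docker(...)` helper from `utils.docker` simply forwards its arguments to the `docker` CLI, which the source does not show.

```python
def ecr_image_name(hub_tag, account_id, region):
    """Map a Docker Hub reference to its mirrored name in ECR."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/mirrored/{hub_tag}"


def mirror_commands(hub_tag, account_id, region):
    """Return the docker CLI invocations needed to mirror one image."""
    ecr_tag = ecr_image_name(hub_tag, account_id, region)
    return [
        ["docker", "pull", hub_tag],
        ["docker", "tag", hub_tag, ecr_tag],
        ["docker", "push", ecr_tag],
    ]


# 123456789012 is a placeholder account ID, not a real one.
commands = mirror_commands("node:15-alpine", "123456789012", "us-east-1")
for cmd in commands:
    # To actually run a command, pass it to subprocess.check_call(cmd).
    print(" ".join(cmd))
```

CI jobs would then reference the mirrored image by the name `ecr_image_name` returns. The push step assumes Docker is already logged in to ECR and that the target repository (here `mirrored/node`) already exists.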