dockerToEcr
Pulled/Modified from original source: https://github.com/wellcomecollection/platform-infrastructure/blob/4b16beef44efbe8faa9a62f5459ab6f706e07032/builds/copy_docker_images_to_ecr.py
Docker Hub is starting to introduce rate limits for anonymous users. [1]
We pull images from Docker Hub in our CI setup. We were starting to hit the rate limits in CI (partly because we run a lot of parallel workers, partly because our tests pull a lot of different images).
This script mirrors images from Docker Hub to repositories in ECR inside our AWS account.
Our CI workers are EC2 instances that run inside the same AWS account, so our CI can pull images from ECR instead of Docker Hub without paying AWS egress charges or hitting Docker Hub rate limits.
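The mirroring amounts to a pull, a tag and a push per image. The following is a minimal sketch of that naming scheme, assuming the `mirrored/` repository prefix and the registry hostname pattern that appear in the script listing. `ecr_tag_for` and `mirror_commands` are hypothetical helper names used only for illustration, and the account ID is a placeholder. The sketch only builds the command lines; it does not run Docker.

```python
def ecr_tag_for(hub_tag, *, account_id, region="us-east-1"):
    """Return the ECR tag that a Docker Hub tag would be mirrored to."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/mirrored/{hub_tag}"


def mirror_commands(hub_tag, *, account_id, region="us-east-1"):
    """Return the docker CLI invocations for one image, without running them."""
    ecr_tag = ecr_tag_for(hub_tag, account_id=account_id, region=region)
    return [
        ["docker", "pull", hub_tag],
        ["docker", "tag", hub_tag, ecr_tag],
        ["docker", "push", ecr_tag],
    ]


# 123456789012 is a placeholder account ID (assumption, not a real account).
for cmd in mirror_commands("node:15-alpine", account_id="123456789012"):
    print(" ".join(cmd))
```

CI jobs would then reference the image by its ECR tag rather than its Docker Hub tag.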
If you want to use this script to mirror images from Docker Hub to ECR:
- Get some local AWS credentials, so that boto3.client("ecr") returns an ECR client authenticated against the account you want to mirror images to
- Put your own account ID in ACCOUNT_ID
- Replace the list IMAGE_TAGS with the tags of every image you want to mirror
To run:
ENVIRONMENT=build.prod mirror_docker_images_to_ecr.py
- ENVIRONMENT selects which properties file the script reads to get the list of images (e.g. ENVIRONMENT=build.prod selects properties.build.prod.json).
- It assumes this lives in a CDK repo, so it will look in ../cdk/ for the properties file
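As a rough sketch of how ENVIRONMENT could map to a file path: `properties_path` is a hypothetical helper, and the `../cdk` default is an assumption taken from the description above. The script as listed opens `properties.<ENVIRONMENT>.json` relative to the current working directory, so adjust `base_dir` to match where the file actually lives.

```python
import os
from pathlib import Path


def properties_path(environment, base_dir="../cdk"):
    """Build the path of the properties file for a given ENVIRONMENT value."""
    return Path(base_dir) / f"properties.{environment}.json"


# Fall back to "build.prod" here only so the sketch runs without the variable set.
environment = os.environ.get("ENVIRONMENT", "build.prod")
print(properties_path(environment))
```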
The properties.ENVIRONMENT.json file should contain a mirrored_repos list, for example:
"mirrored_repos": [
{ "name": "node", "tags": [ "14.4.0-alpine", "15-alpine", "15.10.0" ] },
{ "name": "openjdk", "tags": [ "14-alpine" ] },
{ "name": "amazoncorretto", "tags": [ "16-alpine", "11-alpine" ] }
]
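The script expands each name/tags entry into `name:tag` strings before mirroring. A minimal sketch of that expansion follows, assuming the fragment above sits inside a top-level JSON object (the script indexes the loaded data by `mirrored_repos`). `image_tags_from_properties` is a hypothetical name for illustration.

```python
import json

# The example fragment wrapped in a top-level object (assumption).
EXAMPLE = """
{
  "mirrored_repos": [
    { "name": "node", "tags": [ "14.4.0-alpine", "15-alpine", "15.10.0" ] },
    { "name": "openjdk", "tags": [ "14-alpine" ] },
    { "name": "amazoncorretto", "tags": [ "16-alpine", "11-alpine" ] }
  ]
}
"""


def image_tags_from_properties(properties):
    """Flatten the mirrored_repos entries into a list of name:tag strings."""
    return [
        f"{repo['name']}:{tag}"
        for repo in properties["mirrored_repos"]
        for tag in repo["tags"]
    ]


image_tags = image_tags_from_properties(json.loads(EXAMPLE))
for image_tag in image_tags:
    print(image_tag)
```

With the example above this yields six image tags, three for node, one for openjdk and two for amazoncorretto.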
#!/usr/bin/env python3
"""
Pulled/Modified from original source: https://github.com/wellcomecollection/platform-infrastructure/blob/4b16beef44efbe8faa9a62f5459ab6f706e07032/builds/copy_docker_images_to_ecr.py

Docker Hub is starting to introduce rate limits for anonymous users. [1]

We pull images from Docker Hub in our CI setup. We were starting to hit the
rate limits in CI (partly because we run a lot of parallel workers, partly because
our tests pull a lot of different images).

This script mirrors images from Docker Hub to repositories in ECR inside
our AWS account.

Our CI workers are EC2 instances that run inside the same AWS account, so our CI
can pull images from ECR instead of Docker Hub without having to pay AWS egress
charges or hitting Docker Hub rate limits.

If you want to use this script to mirror images from Docker Hub to ECR:

* Get some local AWS credentials, so that ``boto3.client("ecr")`` returns
  an ECR client authenticated against the account you want to mirror images to
* Put your own account ID in ``ACCOUNT_ID``
* Replace the list ``IMAGE_TAGS`` with the tags of every image you want
  to mirror

To run:

`ENVIRONMENT=build.prod mirror_docker_images_to_ecr.py`

* ENVIRONMENT tells the script which properties file to use
  (e.g. properties.build.prod.json) to get the list of images.
* It assumes this lives in a CDK repo, so it will look in ../cdk/ for the properties file

The properties.ENVIRONMENT.json file should contain the following as an example:

```
"mirrored_repos": [
    { "name": "node", "tags": [ "14.4.0-alpine", "15-alpine", "15.10.0" ] },
    { "name": "openjdk", "tags": [ "14-alpine" ] },
    { "name": "amazoncorretto", "tags": [ "16-alpine", "11-alpine" ] }
]
```

[1]: https://www.docker.com/blog/what-you-need-to-know-about-upcoming-docker-hub-rate-limiting/

"""

# import base64
# import subprocess
import json
import os
# import boto3
from utils.aws import get_aws_account_id, ecr_login
from utils.docker import docker
# from botocore.exceptions import ClientError

REGION = os.environ.get('AWS_DEFAULT_REGION', 'us-east-1')


# def get_ecr_repo_names_in_account(*, ecr_client):
#     """
#     Returns a set of all the ECR repository names in an AWS account.
#     """
#     repo_names = set()
#
#     paginator = ecr_client.get_paginator("describe_repositories")
#     for page in paginator.paginate(registryId=account_id):
#         for repo in page["repositories"]:
#             repo_names.add(repo["repositoryName"])
#
#     return repo_names


# def docker_login_to_ecr(ecr_client, *, account_id):
#     """
#     Authenticate Docker against the ECR repository in a particular account.
#
#     The authorization token obtained from ECR is good for twelve hours, so this
#     function is cached to save repeatedly getting a token and running `docker login`
#     in quick succession.
#     """
#     response = ecr_client.get_authorization_token(registryIds=[account_id])
#
#     try:
#         auth = response["authorizationData"][0]
#     except (IndexError, KeyError):
#         raise RuntimeError("Unable to get authorization token from ECR!")
#
#     auth_token = base64.b64decode(auth["authorizationToken"]).decode()
#     username, password = auth_token.split(":")
#
#     cmd = [
#         "docker",
#         "login",
#         "--username",
#         username,
#         "--password",
#         password,
#         auth["proxyEndpoint"],
#     ]
#
#     subprocess.check_call(cmd)


# def create_ecr_repository(ecr_client, *, name):
#     """
#     Create a new ECR repository.
#     """
#     try:
#         ecr_client.create_repository(repositoryName=name)
#     except ClientError as err:
#         if err.response["Error"]["Code"] == "RepositoryAlreadyExistsException":
#             pass
#         else:
#             raise


# def mirror_docker_hub_images_to_ecr(ecr_client, *, account_id, image_tags):
def mirror_docker_hub_images_to_ecr(image_tags):
    """
    Given the name/tag of images in Docker Hub, mirror those images to ECR.
    """

    print("Authenticating Docker with ECR...")
    ecr_login()
    # docker_login_to_ecr(ecr_client, account_id=account_id)

    # Look the account ID up once rather than once per image.
    account_id = get_aws_account_id()

    for hub_tag in image_tags:
        ecr_tag = f"{account_id}.dkr.ecr.{REGION}.amazonaws.com/mirrored/{hub_tag}"
        print(f"Mirroring {hub_tag} to {ecr_tag}")
        docker("pull", hub_tag)
        docker("tag", hub_tag, ecr_tag)
        docker("push", ecr_tag)


if __name__ == "__main__":
    # ENVIRONMENT selects the properties file, e.g. properties.build.prod.json.
    properties_file = 'properties.' + os.environ["ENVIRONMENT"] + '.json'

    # Use a context manager so the file handle is always closed.
    with open(properties_file, "r") as env_file:
        env_data = json.load(env_file)

    # Expand every repo entry into "name:tag" strings.
    IMAGE_TAGS = []
    for s in env_data['mirrored_repos']:
        for t in s['tags']:
            IMAGE_TAGS.append(s['name'] + ":" + t)

    # mirror_docker_hub_images_to_ecr(ecr_client=boto3.client("ecr"), account_id=get_aws_account_id(), image_tags=IMAGE_TAGS)
    mirror_docker_hub_images_to_ecr(image_tags=IMAGE_TAGS)