Detect mitotic figures in whole slide images with Amazon Rekognition

Republished by Plato


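More than a hundred years after its introduction, histology remains the gold standard in tumor diagnosis and prognosis. Anatomic pathologists evaluate histology to stratify cancer patients into groups according to their tumor genotypes and phenotypes and their clinical outcomes [1,2]. However, human evaluation of histological slides is subjective and not repeatable [3]. Furthermore, histological assessment is a time-consuming process that requires highly trained professionals.

With the significant technological advances of the last decade, techniques such as whole slide imaging (WSI) and deep learning (DL) are now widely available. WSI is the scanning of conventional microscopy glass slides to produce a single, high-resolution image from each slide. This allows large sets of pathology images to be digitized and collected, which would otherwise be prohibitively time-consuming and expensive. The availability of such datasets opens up new ways of accelerating diagnosis, using technologies such as machine learning (ML) to help pathologists quickly identify features of interest.

In this post, we explore how developers without previous ML experience can use Amazon Rekognition Custom Labels to train a model that classifies cellular features. Amazon Rekognition Custom Labels is a feature of Amazon Rekognition that enables you to build your own specialized ML-based image analysis capabilities to detect unique objects and scenes integral to your specific use case. In particular, we use a dataset containing whole slide images of canine mammary carcinoma [1] to demonstrate how to process these images and train a model that detects mitotic figures. This dataset is used with permission from Prof. Dr. Marc Aubreville, who kindly agreed to let us use it for this post. For more information, see the Acknowledgments section at the end of this post.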

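Solution overview

The solution consists of two components:

Amazon Rekognition Custom Labels model: to enable Amazon Rekognition to detect mitotic figures, we complete the following steps:
- Sample the WSI dataset to produce adequately sized images, using Amazon SageMaker Studio and Python code running in a Jupyter notebook. Studio is a web-based integrated development environment (IDE) for ML that provides all the tools you need to take your models from experimentation to production while boosting your productivity. We use Studio to split the images into smaller ones to train our model.
- Train an Amazon Rekognition Custom Labels model to recognize mitotic figures in hematoxylin-eosin samples, using the data prepared in the previous step.
Frontend application: to demonstrate how to use a model like the one trained in the previous step, we build a simple Streamlit application and deploy it to AWS with an AWS CDK script, as described later in this post.

The following diagram illustrates the solution architecture.

All the resources needed to deploy the implementation discussed in this post, along with the code for every section, are available on GitHub. You can clone or fork the repository, make any changes you want, and run it yourself.

In the following steps, we walk through the code to understand the different steps involved in obtaining and preparing the data, training the model, and using it from a sample application.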

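Costs

When you run the steps in this walkthrough, you incur small costs from using the following AWS services:

Amazon Rekognition
AWS Fargate
Application Load Balancer
AWS Secrets Manager

Additionally, if you are no longer within the Free Tier period or conditions, you may incur costs from the following services:

CodePipeline
CodeBuild
Amazon ECR
Amazon SageMaker

If you complete the cleanup steps correctly after finishing this walkthrough, you can expect costs of less than 10 USD, provided the Amazon Rekognition Custom Labels model and the web application run for one hour or less.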

Prerequisites

To complete all the steps, you need the following:

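Train the mitotic figure classification model

We run all the steps required to train the model from a Studio notebook. If you have never used Studio before, you may need to onboard first. For more information, see Onboard Quickly to Amazon SageMaker Studio.

Some of the following steps require more RAM than is available in a standard ml.t3.medium notebook. Make sure you have selected an ml.m5.large notebook. You should see a 2 vCPU + 8 GiB indication in the upper-right corner of the page.

The code for this section is available as a Jupyter notebook file.

After you onboard to Studio, follow these instructions to grant Studio the permissions it needs to call Amazon Rekognition on your behalf.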

Dependencies

To begin, we need to complete the following steps:

Update the Linux packages and install the required dependencies, such as OpenSlide:

!apt update > /dev/null && apt dist-upgrade -y > /dev/null
!apt install -y build-essential openslide-tools python-openslide libgl1-mesa-glx > /dev/null

Install the fastai and SlideRunner libraries with pip:

!pip install SlideRunner SlideRunner_dataAccess fastai==1.0.61 > /dev/null

Download the dataset (we provide a script that does this automatically):
```
from dataset import download_dataset
download_dataset()
```

Process the dataset

We begin by importing some of the packages that we use throughout the data preparation stage. Then we download and load the annotation database for this dataset. This database contains the positions, within the whole slide images, of the mitotic figures (the features we want to classify). See the following code:

%reload_ext autoreload
%autoreload 2
import os
from typing import List
import urllib.request  # the request submodule must be imported explicitly
import numpy as np
from SlideRunner.dataAccess.database import Database
from pathlib import Path

DATABASE_URL = 'https://github.com/DeepPathology/MITOS_WSI_CMC/raw/master/databases/MITOS_WSI_CMC_MEL.sqlite'
DATABASE_FILENAME = 'MITOS_WSI_CMC_MEL.sqlite'

Path("./databases").mkdir(parents=True, exist_ok=True)
local_filename, headers = urllib.request.urlretrieve(
    DATABASE_URL,
    filename=os.path.join('databases', DATABASE_FILENAME),
)
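Before loading the annotations through SlideRunner, it can help to confirm that the download really is a readable SQLite file. The following is a minimal sketch that uses only the Python standard library; the helper name list_tables is hypothetical, and the sketch makes no assumption about the schema SlideRunner expects.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables in a SQLite database file."""
    # Open read-only through a URI so that a mistyped path raises an error
    # instead of silently creating a new, empty database file.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

In the notebook, you could call it as list_tables(os.path.join('databases', DATABASE_FILENAME)); an empty list or an exception suggests the download did not complete.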

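Because we're using SageMaker, we create a new SageMaker Session object to simplify tasks such as uploading our dataset to an Amazon Simple Storage Service (Amazon S3) bucket. We also use the S3 bucket that SageMaker creates by default to upload our processed image files.

The slidelist_test array contains the IDs of the slides that we use as part of the test dataset to evaluate the performance of the trained model. See the following code:

import sagemaker

sm_session = sagemaker.Session()
size = 512
bucket_name = sm_session.default_bucket()

database = Database()
database.open(os.path.join('databases', DATABASE_FILENAME))

slidelist_test = ['14','18','3','22','10','15','21']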

The next step is to obtain a set of training and test slides, along with the labels in them, from which we can extract smaller areas to train our model. The code for get_slides is in the sampling.py file on GitHub.

from sampling import get_slides

image_size = 512
lbl_bbox, training_slides, test_slides, files = get_slides(database, slidelist_test, negative_class=1, size=image_size)

We want to sample randomly from the training and test slides. We take the lists of training and test slides and randomly select n_training_images files for training and n_test_images files for testing:

n_training_images = 500
n_test_images = int(0.2 * n_training_images)

training_files = list([
    (y, files[y]) for y in np.random.choice(
        [x for x in training_slides], n_training_images)
])
test_files = list([
    (y, files[y]) for y in np.random.choice(
        [x for x in test_slides], n_test_images)
])
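Note that np.random.choice draws with replacement by default, so the same slide can be selected many times; each selection later yields a different random crop. The following sketch shows an equivalent, seeded draw using only the standard library, which makes the sample reproducible across runs. The function name sample_slides and the slide IDs are hypothetical and serve only as illustration.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def sample_slides(slide_ids: Sequence[int], n_images: int, seed: int) -> List[int]:
    """Draw n_images slide IDs with replacement, reproducibly."""
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return rng.choices(list(slide_ids), k=n_images)


# Hypothetical slide IDs, used only to illustrate the behaviour.
picks = sample_slides([1, 2, 5, 8], n_images=20, seed=42)
per_slide: Dict[int, int] = Counter(picks)
print(len(picks), sorted(per_slide))
```

Running the function twice with the same seed returns the same list, which is useful when you want to compare model versions trained on identical samples.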

Next, we create one directory for the training images and one for the test images:

Path("rek_slides/training").mkdir(parents=True, exist_ok=True)
Path("rek_slides/test").mkdir(parents=True, exist_ok=True)

Before we produce the smaller images needed to train the model, we need some helper code that generates the metadata describing the training and test data. The following code makes sure that a given bounding box around a feature of interest (a mitotic figure) lies well within the area we are cutting out, and produces a line of JSON that describes the image and the features in it in Amazon SageMaker Ground Truth format, which is the format Amazon Rekognition Custom Labels requires. For more information about this manifest file for object detection, see Object localization in manifest files.

import datetime
import json

def check_bbox(x_start: int, y_start: int, bbox) -> bool:
    return (bbox._left > x_start and
            bbox._right < x_start + image_size and
            bbox._top > y_start and
            bbox._bottom < y_start + image_size)

def get_annotation_json_line(filename, channel, annotations, labels):
    objects = list([{'confidence' : 1} for i in range(0, len(annotations))])

    return json.dumps({
        'source-ref': f's3://{bucket_name}/data/{channel}/{filename}',
        'bounding-box': {
            'image_size': [{
                'width': size,
                'height': size,
                'depth': 3
            }],
            'annotations': annotations,
        },
        'bounding-box-metadata': {
            'objects': objects,
            'class-map': dict({ x: str(x) for x in labels }),
            'type': 'groundtruth/object-detection',
            'human-annotated': 'yes',
            'creation-date': datetime.datetime.now().isoformat(),
            'job-name': 'rek-pathology',
        }
    })

def generate_annotations(x_start: int, y_start: int, bboxes, labels, filename: str, channel: str):
    annotations = []

    for bbox in bboxes:
        if check_bbox(x_start, y_start, bbox):
            # Get coordinates relative to this slide.
            x0 = bbox.left - x_start
            y0 = bbox.top - y_start

            annotation = {
                'class_id': 1,
                'top': y0,
                'left': x0,
                'width': bbox.right - bbox.left,
                'height': bbox.bottom - bbox.top
            }

            annotations.append(annotation)

    return get_annotation_json_line(filename, channel, annotations, labels)
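To make the coordinate handling concrete, here is a small self-contained sketch of the same idea: a box given in whole-slide coordinates is kept only if it lies strictly inside the patch, and is then expressed relative to the patch origin. The Box dataclass and the function name to_patch_annotation are hypothetical stand-ins and are not part of SlideRunner or the repository code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Box:
    """Hypothetical stand-in for an annotation box in slide coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def to_patch_annotation(box: Box, x_start: int, y_start: int,
                        patch_size: int) -> Optional[Dict[str, int]]:
    """Return an annotation relative to the patch origin, or None when the
    box is not strictly inside the patch."""
    inside = (box.left > x_start and box.right < x_start + patch_size
              and box.top > y_start and box.bottom < y_start + patch_size)
    if not inside:
        return None
    return {
        'class_id': 1,
        'left': box.left - x_start,
        'top': box.top - y_start,
        'width': box.right - box.left,
        'height': box.bottom - box.top,
    }


# A 50x50 box at (1100, 2100), inside a 512-pixel patch with origin (1000, 2000).
print(to_patch_annotation(Box(1100, 2100, 1150, 2150), 1000, 2000, 512))
# A box that crosses the right edge of the same patch is rejected.
print(to_patch_annotation(Box(1500, 2100, 1550, 2150), 1000, 2000, 512))
```

The first call yields left and top offsets of 100 pixels within the patch; the second returns None because the box extends past x = 1512.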

With the generate_annotations function in place, we can write the code that produces the training and test images:

import datetime
import json
import random

from fastai import *
from fastai.vision import *
from tqdm.notebook import tqdm

# Margin size, in pixels, for training images. This is the space we leave on
# each side for the bounding box(es) to be well into the image.
margin_size = 64

training_annotations = []
test_annotations = []

def check_bbox(x_start: int, y_start: int, bbox) -> bool:
    return (bbox._left > x_start and
            bbox._right < x_start + image_size and
            bbox._top > y_start and
            bbox._bottom < y_start + image_size)

def generate_images(file_list) -> None:
    for f_idx in tqdm(range(0, len(file_list)), desc='Writing training images...'):
        slide_idx, f = file_list[f_idx]
        bboxes = lbl_bbox[slide_idx][0]
        labels = lbl_bbox[slide_idx][1]

        # Calculate the minimum and maximum horizontal and vertical positions
        # that bounding boxes should have within the image.
        x_min = min(map(lambda x: x.left, bboxes)) - margin_size
        y_min = min(map(lambda x: x.top, bboxes)) - margin_size
        x_max = max(map(lambda x: x.right, bboxes)) + margin_size
        y_max = max(map(lambda x: x.bottom, bboxes)) + margin_size

        # Keep drawing random crop origins until at least one bounding box
        # lies entirely inside the crop.
        result = False
        while not result:
            x_start = random.randint(x_min, x_max - image_size)
            y_start = random.randint(y_min, y_max - image_size)

            for bbox in bboxes:
                if check_bbox(x_start, y_start, bbox):
                    result = True
                    break

        filename = f'slide_{f_idx}.png'
        channel = 'test' if slide_idx in test_slides else 'training'
        annotation = generate_annotations(x_start, y_start, bboxes, labels, filename, channel)

        if channel == 'training':
            training_annotations.append(annotation)
        else:
            test_annotations.append(annotation)

        img = Image(pil2tensor(f.get_patch(x_start, y_start) / 255., np.float32))
        img.save(f'rek_slides/{channel}/{filename}')

generate_images(training_files)
generate_images(test_files)
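The loop above retries random crop origins until one happens to contain a bounding box. As an alternative sketch (not the approach used in the repository), you can pick a bounding box first and then draw the origin from the range that is guaranteed to contain it, which needs no retries. The helper name origin_containing is hypothetical, and the sketch assumes that patch origins must not be negative.

```python
import random
from typing import Tuple


def origin_containing(left: int, top: int, right: int, bottom: int,
                      patch_size: int, rng: random.Random) -> Tuple[int, int]:
    """Pick a patch origin so that the box lies strictly inside the patch."""
    # Strict containment needs x_start < left and right < x_start + patch_size,
    # so x_start ranges over right - patch_size + 1 .. left - 1 (same for y).
    x_lo, x_hi = max(0, right - patch_size + 1), left - 1
    y_lo, y_hi = max(0, bottom - patch_size + 1), top - 1
    if x_lo > x_hi or y_lo > y_hi:
        raise ValueError('box cannot fit strictly inside a patch of this size')
    return rng.randint(x_lo, x_hi), rng.randint(y_lo, y_hi)


rng = random.Random(0)
x_start, y_start = origin_containing(1100, 2100, 1150, 2150, 512, rng)
print(x_start, y_start)
```

The trade-off is that crops are always centered loosely around one annotated figure, whereas the retry loop samples from the whole annotated region of the slide.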

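The last step toward having all the required data is to write a manifest.json file for each of the datasets. Each annotation goes on its own line, so the entries are joined with a newline character:

with open('rek_slides/training/manifest.json', 'w') as mf:
    mf.write("\n".join(training_annotations))

with open('rek_slides/test/manifest.json', 'w') as mf:
    mf.write("\n".join(test_annotations))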
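Because the manifest is a JSON Lines file, every line must be a complete JSON object, and entries must be separated by a newline character. The following sketch is a hypothetical sanity check (the name check_manifest is not part of any AWS library) that you could run before uploading; it only verifies the three top-level keys produced by get_annotation_json_line.

```python
import json
from typing import List


def check_manifest(text: str) -> List[str]:
    """Return a list of problems found in a JSON Lines manifest."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append(f'line {number}: empty line')
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f'line {number}: invalid JSON ({err.msg})')
            continue
        if not isinstance(entry, dict):
            problems.append(f'line {number}: not a JSON object')
            continue
        for key in ('source-ref', 'bounding-box', 'bounding-box-metadata'):
            if key not in entry:
                problems.append(f'line {number}: missing {key}')
    return problems


good = json.dumps({'source-ref': 's3://bucket/data/training/slide_0.png',
                   'bounding-box': {}, 'bounding-box-metadata': {}})
print(check_manifest('\n'.join([good, good])))  # entries separated by newlines
print(check_manifest('n'.join([good, good])))   # entries glued by a literal "n"
```

The first call reports no problems, while the second reports an invalid line, which is the kind of error that would otherwise only surface when training starts.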

Transfer the files to S3

We use the upload_data method that the SageMaker session object exposes to upload the images and manifest files to the default SageMaker S3 bucket:

import sagemaker

sm_session = sagemaker.Session()
data_location = sm_session.upload_data(
    './rek_slides',
    bucket=bucket_name,
)
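The manifest entries reference objects under s3://<bucket>/data/<channel>/, so the uploaded keys need to match that layout. The following sketch computes the keys we would expect, assuming that upload_data keeps the directory structure below the local folder and uses its default key prefix of data. Treat that as an assumption to verify against your SageMaker SDK version; the helper expected_s3_keys is hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def expected_s3_keys(local_dir: str, key_prefix: str = 'data') -> Dict[str, str]:
    """Map each file under local_dir to the S3 key it is assumed to receive."""
    root = Path(local_dir)
    return {
        str(path): f'{key_prefix}/{path.relative_to(root).as_posix()}'
        for path in sorted(root.rglob('*')) if path.is_file()
    }


# Build a tiny throwaway directory tree to illustrate the mapping.
with tempfile.TemporaryDirectory() as tmp:
    for channel in ('training', 'test'):
        os.makedirs(os.path.join(tmp, 'rek_slides', channel))
        Path(tmp, 'rek_slides', channel, 'manifest.json').write_text('{}')
    keys = sorted(expected_s3_keys(os.path.join(tmp, 'rek_slides')).values())
    print(keys)
```

If the keys printed for your real rek_slides folder do not start with data/training/ and data/test/, the source-ref entries in the manifests will not resolve.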

Train an Amazon Rekognition Custom Labels model

With the data already in Amazon S3, we can start training a custom model. We use the Boto3 library to create an Amazon Rekognition client and create a project:

import boto3

project_name = 'rek-mitotic-figures-workshop'

rek = boto3.client('rekognition')
response = rek.create_project(ProjectName=project_name)

# If you have already created the project, use the describe_projects call to
# retrieve the project ARN.
# response = rek.describe_projects()['ProjectDescriptions'][0]

project_arn = response['ProjectArn']
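Taking the first entry returned by describe_projects only works when the account has a single project. The following sketch looks the project up by name instead, following NextToken pagination. It assumes the project name is the path segment right after project/ in the ARN, which you should confirm against your own ARNs; find_project_arn is a hypothetical helper, and client can be any object with a describe_projects method, such as the rek client above.

```python
from typing import Any, Optional


def find_project_arn(client: Any, project_name: str) -> Optional[str]:
    """Return the ARN of the project called project_name, or None."""
    kwargs = {}
    while True:
        page = client.describe_projects(**kwargs)
        for description in page.get('ProjectDescriptions', []):
            arn = description['ProjectArn']
            # Assumed ARN layout:
            # arn:aws:rekognition:<region>:<account>:project/<name>/<timestamp>
            parts = arn.split('/')
            if len(parts) >= 2 and parts[1] == project_name:
                return arn
        token = page.get('NextToken')
        if not token:
            return None
        kwargs = {'NextToken': token}
```

In the notebook, this would be used as project_arn = find_project_arn(rek, project_name), falling back to create_project when the result is None.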

With the project ready to use, you now need a project version that points to the training and test datasets in Amazon S3. Each version ideally points to a different dataset (or a different version of it). This lets us keep several versions of a model, compare their performance, and switch between them as needed. See the following code:

version_name = '1'

output_config = {
    'S3Bucket': bucket_name,
    'S3KeyPrefix': 'output',
}

training_dataset = {
    'Assets': [
        {
            'GroundTruthManifest': {
                'S3Object': {
                    'Bucket': bucket_name,
                    'Name': 'data/training/manifest.json'
                }
            },
        },
    ]
}

testing_dataset = {
    'Assets': [
        {
            'GroundTruthManifest': {
                'S3Object': {
                    'Bucket': bucket_name,
                    'Name': 'data/test/manifest.json'
                }
            },
        },
    ]
}

def describe_project_versions():
    describe_response = rek.describe_project_versions(
        ProjectArn=project_arn,
        VersionNames=[version_name],
    )

    for model in describe_response['ProjectVersionDescriptions']:
        print(f"Status: {model['Status']}")
        print(f"Message: {model['StatusMessage']}")

    return describe_response

response = rek.create_project_version(
    VersionName=version_name,
    ProjectArn=project_arn,
    OutputConfig=output_config,
    TrainingData=training_dataset,
    TestingData=testing_dataset,
)

# Block until training finishes (successfully or not).
waiter = rek.get_waiter('project_version_training_completed')
waiter.wait(
    ProjectArn=project_arn,
    VersionNames=[version_name],
)

describe_response = describe_project_versions()

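After we create the project version, Amazon Rekognition starts the training process automatically. The training time depends on several factors, such as the size and number of the images and the number of classes. In this case, with 500 images, training takes about 90 minutes to finish.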
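Because training runs for a long time, you may prefer to poll the status yourself and log progress rather than block on a waiter. The following is a minimal sketch under that assumption; wait_for_training is a hypothetical helper, client is any object exposing describe_project_versions (such as the Boto3 Rekognition client), and the sleep function is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Any, Callable


def wait_for_training(client: Any, project_arn: str, version_name: str,
                      poll_seconds: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll describe_project_versions until training is no longer in progress."""
    while True:
        response = client.describe_project_versions(
            ProjectArn=project_arn, VersionNames=[version_name])
        versions = response['ProjectVersionDescriptions']
        if not versions:
            raise LookupError(f'no project version named {version_name!r}')
        status = versions[0]['Status']
        print(f'Status: {status}')
        # Any status other than TRAINING_IN_PROGRESS ends the wait, for
        # example TRAINING_COMPLETED or TRAINING_FAILED.
        if status != 'TRAINING_IN_PROGRESS':
            return status
        sleep(poll_seconds)
```

A call such as status = wait_for_training(rek, project_arn, version_name) returns the final status, so the notebook can stop early when training fails instead of trying to start the model.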

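Test the model

After training, every model in Amazon Rekognition Custom Labels is in the STOPPED state. To use it for inference, you need to start it. We get the project version ARN from the project version description and pass it to start_project_version. Notice the MinInferenceUnits parameter: we start with one inference unit. The actual maximum number of transactions per second (TPS) that this inference unit supports depends on the complexity of your model. To learn more about TPS, see this blog post.

model_arn = describe_response['ProjectVersionDescriptions'][0]['ProjectVersionArn']

response = rek.start_project_version(
    ProjectVersionArn=model_arn,
    MinInferenceUnits=1,
)

# Block until the model is running and ready to serve requests.
waiter = rek.get_waiter('project_version_running')
waiter.wait(
    ProjectArn=project_arn,
    VersionNames=[version_name],
)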

When your project version is listed as RUNNING, you can start sending images to Amazon Rekognition for inference.

We use one of the files in the test dataset to test the newly started model. You can use any suitable PNG or JPEG file instead.

import io  # needed for io.BytesIO below

from matplotlib import pyplot as plt
from PIL import Image, ImageDraw

# We'll use one of our test images to try out our model.
with open('./rek_slides/test/slide_0.png', 'rb') as image_file:
    image_bytes = image_file.read()

# Send the image data to the model.
response = rek.detect_custom_labels(
    ProjectVersionArn=model_arn,
    Image={
        'Bytes': image_bytes
    }
)

img = Image.open(io.BytesIO(image_bytes))
draw = ImageDraw.Draw(img)

# Bounding boxes are returned as ratios of the image size; scale them to pixels.
for custom_label in response['CustomLabels']:
    geometry = custom_label['Geometry']['BoundingBox']
    w = geometry['Width'] * img.width
    h = geometry['Height'] * img.height
    l = geometry['Left'] * img.width
    t = geometry['Top'] * img.height
    draw.rectangle([l, t, l + w, t + h], outline=(0, 0, 255, 255), width=5)

plt.imshow(np.asarray(img))
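The bounding boxes in the response are expressed as ratios of the image dimensions, which is why the code multiplies them by the image width and height. The conversion can be isolated in a small function, sketched below; to_pixel_box is a hypothetical helper, and the clamping is a defensive choice rather than documented behaviour of the service.

```python
from typing import Dict, Tuple


def to_pixel_box(bounding_box: Dict[str, float], image_width: int,
                 image_height: int) -> Tuple[int, int, int, int]:
    """Convert a relative BoundingBox into (left, top, right, bottom) pixels."""

    def clamp(value: float, upper: int) -> int:
        # Round to whole pixels and keep the result inside the image.
        return max(0, min(upper, round(value)))

    left = bounding_box['Left'] * image_width
    top = bounding_box['Top'] * image_height
    right = left + bounding_box['Width'] * image_width
    bottom = top + bounding_box['Height'] * image_height
    return (clamp(left, image_width), clamp(top, image_height),
            clamp(right, image_width), clamp(bottom, image_height))


print(to_pixel_box({'Left': 0.25, 'Top': 0.5, 'Width': 0.125, 'Height': 0.25}, 512, 512))
# prints (128, 256, 192, 384)
```

The returned tuple can be passed directly to draw.rectangle, which accepts a sequence of four coordinates.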

Streamlit application

To demonstrate the integration with Amazon Rekognition, we use a very simple Python application. We use the Streamlit library to build a spartan user interface, where we prompt the user to upload an image file.

We use the Boto3 library and the detect_custom_labels method, together with the project version ARN, to invoke the inference endpoint. The response is a JSON document that contains the positions and classes of the different objects detected in the image. In our case, these are the mitotic figures that the algorithm has found in the image we sent to the endpoint. See the following code:

import os

import boto3
import io
import streamlit as st
from PIL import Image, ImageDraw

rek_client = boto3.client('rekognition')

uploaded_file = st.file_uploader('Image file')
if uploaded_file is not None:
    image_bytes = uploaded_file.read()

    result = rek_client.detect_custom_labels(
        ProjectVersionArn='<YOUR_PROJECT_ARN_HERE>',
        Image={
            'Bytes': image_bytes
        }
    )

    img = Image.open(io.BytesIO(image_bytes))
    draw = ImageDraw.Draw(img)

    st.write(result['CustomLabels'])

    for custom_label in result['CustomLabels']:
        st.write(f"Label {custom_label['Name']}, confidence {custom_label['Confidence']}")
        geometry = custom_label['Geometry']['BoundingBox']
        w = geometry['Width'] * img.width
        h = geometry['Height'] * img.height
        l = geometry['Left'] * img.width
        t = geometry['Top'] * img.height
        st.write(f"Left, top = ({l}, {t}), width, height = ({w}, {h})")

        draw.rectangle([l, t, l + w, t + h], outline=(0, 0, 255, 255), width=5)

    st_img = st.image(img)
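The application draws every label in the response. If you want to show only the more certain detections, you can filter on the Confidence value, which is reported as a percentage. The following sketch does this on the client side; confident_labels is a hypothetical helper, the threshold of 80 is an arbitrary choice, and the sample list only mimics the shape of result['CustomLabels']. The detect_custom_labels call also accepts a MinConfidence parameter if you prefer to filter on the service side.

```python
from typing import Any, Dict, List


def confident_labels(custom_labels: List[Dict[str, Any]],
                     min_confidence: float = 80.0) -> List[Dict[str, Any]]:
    """Keep labels at or above min_confidence, highest confidence first."""
    kept = [label for label in custom_labels
            if label['Confidence'] >= min_confidence]
    return sorted(kept, key=lambda label: label['Confidence'], reverse=True)


# Hypothetical response fragment, shaped like result['CustomLabels'].
sample = [
    {'Name': '1', 'Confidence': 62.5},
    {'Name': '1', 'Confidence': 97.1},
    {'Name': '1', 'Confidence': 88.0},
]
print([label['Confidence'] for label in confident_labels(sample)])
# prints [97.1, 88.0]
```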

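Deploy the application to AWS

To deploy the application, we use an AWS CDK script. The entire project can be found on GitHub. Let's look at the different resources deployed by the script.

Create an Amazon ECR repository

As the first step in setting up the deployment, we create an Amazon ECR repository, where we can store our application container images: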

aws ecr create-repository --repository-name rek-wsi

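Create and store your GitHub token in AWS Secrets Manager

CodePipeline needs a GitHub personal access token to monitor your GitHub repository for changes and pull code. To create the token, follow the instructions in the GitHub documentation. The token requires the following GitHub scopes:

The repo scope, which is used for full control to read and pull artifacts from public and private repositories into a pipeline.
The admin:repo_hook scope, which is used for full control of repository hooks.

After you create the token, store it in a new secret in AWS Secrets Manager as follows. The JSON document is wrapped in single quotes so that the shell passes its inner double quotes through unchanged:

aws secretsmanager create-secret --name rek-wsi/github --secret-string '{"oauthToken":"YOUR-TOKEN-VALUE-HERE"}'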
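Shell quoting around JSON is easy to get wrong. One way to avoid it, sketched here as an assumption rather than as part of the original walkthrough, is to let Python build the JSON document and quote it for the shell. The helper create_secret_command is hypothetical; the secret name and the oauthToken field match the values used elsewhere in this post.

```python
import json
import shlex


def create_secret_command(secret_name: str, token: str) -> str:
    """Build an AWS CLI command line that stores the token as a JSON secret."""
    secret_string = json.dumps({'oauthToken': token})
    return ' '.join([
        'aws', 'secretsmanager', 'create-secret',
        '--name', shlex.quote(secret_name),
        '--secret-string', shlex.quote(secret_string),
    ])


print(create_secret_command('rek-wsi/github', 'YOUR-TOKEN-VALUE-HERE'))
```

The printed command can be pasted into a POSIX shell. Avoid printing real tokens in shared notebooks or logs.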

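Write configuration parameters to AWS Systems Manager Parameter Store

The AWS CDK script reads some configuration parameters from AWS Systems Manager Parameter Store, such as the name and owner of the GitHub repository, and the target account and Region. Before you launch the AWS CDK script, you need to create these parameters in your own account.

You can do so with the AWS CLI. Simply invoke the put-parameter command with a name, a value, and the type of the parameter: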

aws ssm put-parameter --name <PARAMETER-NAME> --value <PARAMETER-VALUE> --type <PARAMETER_TYPE>

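The following is a list of all the parameters that the AWS CDK script requires. All of them are of type String:

/rek_wsi/prod/accountId: the ID of the account where we deploy the application.
/rek_wsi/prod/ecr_repo_name: the name of the Amazon ECR repository where the container images are stored.
/rek_wsi/prod/github/branch: the branch in the GitHub repository from which CodePipeline needs to pull the code.
/rek_wsi/prod/github/owner: the owner of the GitHub repository.
/rek_wsi/prod/github/repo: the name of the GitHub repository where our code is stored.
/rek_wsi/prod/github/token: the name or ARN of the secret in Secrets Manager that contains your GitHub authentication token. This is necessary for CodePipeline to be able to communicate with GitHub.
/rek_wsi/prod/region: the Region where we deploy the application.

Notice the prod segment in all the parameter names. Although we don't need this level of detail for such a simple example, it makes it possible to reuse this approach in other projects where different environments may be necessary.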
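Creating seven parameters by hand is repetitive, so you might generate the commands from a single dictionary. The following sketch does that; every value in PARAMETERS is a placeholder to replace with your own settings (the repository name rek-wsi and the secret name rek-wsi/github are the ones used earlier in this post), and put_parameter_commands is a hypothetical helper.

```python
import shlex
from typing import Dict, List

# Placeholder values: replace every one of them with your own settings.
PARAMETERS: Dict[str, str] = {
    '/rek_wsi/prod/accountId': '111122223333',
    '/rek_wsi/prod/ecr_repo_name': 'rek-wsi',
    '/rek_wsi/prod/github/branch': 'main',
    '/rek_wsi/prod/github/owner': 'YOUR-GITHUB-USER',
    '/rek_wsi/prod/github/repo': 'YOUR-REPO-NAME',
    '/rek_wsi/prod/github/token': 'rek-wsi/github',
    '/rek_wsi/prod/region': 'us-east-1',
}


def put_parameter_commands(parameters: Dict[str, str]) -> List[str]:
    """Build one 'aws ssm put-parameter' command line per parameter."""
    return [
        ' '.join(['aws', 'ssm', 'put-parameter',
                  '--name', shlex.quote(name),
                  '--value', shlex.quote(value),
                  '--type', 'String'])
        for name, value in parameters.items()
    ]


for command in put_parameter_commands(PARAMETERS):
    print(command)
```

Keeping the environment segment (prod) in one place like this also makes it straightforward to emit the same set of parameters for another environment.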

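Resources created by the AWS CDK script

The application running in the Fargate task needs permission to invoke Amazon Rekognition. So we first create an AWS Identity and Access Management (IAM) task role with the AmazonRekognitionReadOnlyAccess managed policy attached (its construct ID in the code is RekognitionReadOnlyPolicy). Notice that the assumed_by parameter in the following code takes the ecs-tasks.amazonaws.com service principal. This is because we use Amazon ECS as the orchestrator, so we need Amazon ECS to assume the role and pass the credentials to the Fargate task.

streamlit_task_role = iam.Role(
    self, 'StreamlitTaskRole',
    assumed_by=iam.ServicePrincipal('ecs-tasks.amazonaws.com'),
    description='ECS Task Role assumed by the Streamlit task deployed to ECS+Fargate',
    managed_policies=[
        iam.ManagedPolicy.from_managed_policy_arn(
            self, 'RekognitionReadOnlyPolicy',
            managed_policy_arn='arn:aws:iam::aws:policy/AmazonRekognitionReadOnlyAccess'
        ),
    ],
)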

Once built, our application container image sits in a private Amazon ECR repository. We need an object that describes it, which we can pass when we create the Fargate service:

ecs_container_image = ecs.ContainerImage.from_ecr_repository(
    repository=ecr.Repository.from_repository_name(self, 'ECRRepo', 'rek-wsi'),
    tag='latest'
)

We create a new VPC and cluster for this application. You can modify this part to use your own VPC by using the from_lookup method of the Vpc class:

vpc = ec2.Vpc(self, 'RekWSI', max_azs=3)
cluster = ecs.Cluster(self, 'RekWSICluster', vpc=vpc)

Now that we have a VPC and a cluster to deploy to, we create the Fargate service. We use 0.25 vCPU (cpu=256) and 512 MB of RAM for this task, and we place a public Application Load Balancer (ALB) in front of it. After deployment, we use the ALB CNAME to access the application. See the following code:

fargate_service = ecs_patterns.ApplicationLoadBalancedFargateService(
    self, 'RekWSIECSApp',
    cluster=cluster,
    cpu=256,
    memory_limit_mib=512,
    desired_count=1,
    task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
        image=ecs_container_image,
        container_port=8501,
        task_role=streamlit_task_role,
    ),
    public_load_balancer=True,
)

To automatically build and deploy a new container image every time we push code to our main branch, we create a simple pipeline consisting of a GitHub source action and a build step. This is where we use the secrets and parameters that we stored in AWS Secrets Manager and AWS Systems Manager Parameter Store in the previous steps.

pipeline = codepipeline.Pipeline(self, 'RekWSIPipeline')

# Create an artifact that points at the code pulled from GitHub.
source_output = codepipeline.Artifact()

# Create a source stage that pulls the code from GitHub. The repo parameters are
# stored in SSM, and the OAuth token in Secrets Manager.
source_action = codepipeline_actions.GitHubSourceAction(
    action_name='GitHub',
    output=source_output,
    oauth_token=SecretValue.secrets_manager(
        ssm.StringParameter.value_from_lookup(self, '/rek_wsi/prod/github/token'),
        json_field='oauthToken'),
    trigger=codepipeline_actions.GitHubTrigger.WEBHOOK,
    owner=ssm.StringParameter.value_from_lookup(self, '/rek_wsi/prod/github/owner'),
    repo=ssm.StringParameter.value_from_lookup(self, '/rek_wsi/prod/github/repo'),
    branch=ssm.StringParameter.value_from_lookup(self, '/rek_wsi/prod/github/branch'),
)

# Add the source stage to the pipeline.
pipeline.add_stage(
    stage_name='GitHub',
    actions=[source_action]
)

CodeBuild needs permissions to push container images to Amazon ECR. To grant these permissions, we add the AmazonEC2ContainerRegistryFullAccess policy to a bespoke IAM role that the CodeBuild service principal can assume:

# Create an IAM role that grants CodeBuild access to Amazon ECR to push containers.
build_role = iam.Role(
    self, 'RekWsiCodeBuildAccessRole',
    assumed_by=iam.ServicePrincipal('codebuild.amazonaws.com'),
)

# Permissions are granted through an AWS managed policy, AmazonEC2ContainerRegistryFullAccess.
managed_ecr_policy = iam.ManagedPolicy.from_managed_policy_arn(
    self, 'cb_ecr_policy',
    managed_policy_arn='arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess',
)
build_role.add_managed_policy(policy=managed_ecr_policy)

The CodeBuild project logs in to the private Amazon ECR repository, builds the Docker image with the Streamlit application, and pushes the image to the repository. It also outputs an appspec.yaml file and an imagedefinitions.json file as build artifacts.

The appspec.yaml file describes the task (port, Fargate platform version, and so on), whereas the imagedefinitions.json file maps the names of the container images to their corresponding Amazon ECR URIs. See the following code:

container_name = fargate_service.task_definition.default_container.container_name

build_project = codebuild.PipelineProject(
    self, 'RekWSIProject',
    build_spec=codebuild.BuildSpec.from_object({
        'version': '0.2',
        'phases': {
            'pre_build': {
                'commands': [
                    'env',
                    'COMMIT_HASH=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)',
                    'export TAG=${COMMIT_HASH:=latest}',
                    # Log in to the private Amazon ECR registry.
                    'aws ecr get-login-password --region $AWS_DEFAULT_REGION | '
                    'docker login --username AWS '
                    '--password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com',
                ]
            },
            'build': {
                'commands': [
                    # Build the Docker image
                    'cd streamlit_app && docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .',
                    # Tag the image
                    'docker tag $IMAGE_REPO_NAME:$IMAGE_TAG '
                    '$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG',
                ]
            },
            'post_build': {
                'commands': [
                    # Push the container into ECR.
                    'docker push '
                    '$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG',
                    # Generate imagedefinitions.json. The inner double quotes
                    # are escaped so that they reach printf intact.
                    'cd ..',
                    "printf '[{\"name\":\"%s\",\"imageUri\":\"%s\"}]' "
                    f"{container_name} "
                    "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG "
                    "> imagedefinitions.json",
                    'ls -l',
                    'pwd',
                    # Fill in the placeholders in appspec.yaml.
                    'sed -i s"|REGION_NAME|$AWS_DEFAULT_REGION|g" appspec.yaml',
                    'sed -i s"|ACCOUNT_ID|$AWS_ACCOUNT_ID|g" appspec.yaml',
                    'sed -i s"|TASK_NAME|$IMAGE_REPO_NAME|g" appspec.yaml',
                    f'sed -i s"|CONTAINER_NAME|{container_name}|g" appspec.yaml',
                ]
            }
        },
        'artifacts': {
            'files': [
                'imagedefinitions.json',
                'appspec.yaml',
            ],
        },
    }),
    environment=codebuild.BuildEnvironment(
        build_image=codebuild.LinuxBuildImage.STANDARD_5_0,
        privileged=True,
    ),
    environment_variables={
        'AWS_ACCOUNT_ID': codebuild.BuildEnvironmentVariable(value=self.account),
        'IMAGE_REPO_NAME': codebuild.BuildEnvironmentVariable(
            value=ssm.StringParameter.value_from_lookup(self, '/rek_wsi/prod/ecr_repo_name')),
        'IMAGE_TAG': codebuild.BuildEnvironmentVariable(value='latest'),
    },
    role=build_role,
)
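The printf command in the post_build phase writes a one-element JSON array that pairs the container name with the image URI. To illustrate the resulting file format, the following sketch builds the same document in Python, where json.dumps takes care of the quoting. The function image_definitions is a hypothetical helper, and the container name, account ID, and Region in the example call are made up.

```python
import json


def image_definitions(container_name: str, account_id: str, region: str,
                      repo_name: str, tag: str = 'latest') -> str:
    """Return the JSON text of an imagedefinitions.json file for one container."""
    image_uri = f'{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{tag}'
    return json.dumps([{'name': container_name, 'imageUri': image_uri}])


# Hypothetical container name, account ID, and Region.
print(image_definitions('web', '111122223333', 'us-east-1', 'rek-wsi'))
```

The deploy action reads this file to learn which image each container in the task definition should run.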

Finally, we put the different pipeline stages together. The last action is the EcsDeployAction, which takes the container image built in the previous stage and performs a rolling update of the tasks in our ECS cluster:

# Create an artifact to store the build output.
build_output = codepipeline.Artifact()

# Create a build action that ties the build project, the source artifact from the
# previous stage, and the output artifact together.
build_action = codepipeline_actions.CodeBuildAction(
    action_name='Build',
    project=build_project,
    input=source_output,
    outputs=[build_output],
)

# Add the build stage to the pipeline.
pipeline.add_stage(
    stage_name='Build',
    actions=[build_action]
)

deploy_action = codepipeline_actions.EcsDeployAction(
    action_name='Deploy',
    service=fargate_service.service,
    # image_file=build_output
    input=build_output,
)

pipeline.add_stage(
    stage_name='Deploy',
    actions=[deploy_action],
)

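Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution.

Amazon Rekognition Custom Labels model

Before you shut down your Studio notebook, make sure you stop the Amazon Rekognition Custom Labels model. If you don't, it continues to incur costs.

rek.stop_project_version(
    ProjectVersionArn=model_arn,
)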

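Alternatively, you can use the Amazon Rekognition console to stop the service:

On the Amazon Rekognition console, choose Use Custom Labels in the navigation pane.
Choose Projects in the navigation pane.
Choose version 1 of the rek-mitotic-figures-workshop project.
On the Use Model tab, choose Stop.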

Streamlit application

To destroy all the resources associated with the Streamlit application, run the following code from the AWS CDK application directory:

cdk destroy RekWsiStack

AWS Secrets Manager

To delete the GitHub token, follow the documentation.

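Conclusion

In this post, we walked through the steps needed to train an Amazon Rekognition Custom Labels model for a digital pathology application using real-world data. We then learned how to use the model from a simple application deployed to Fargate through a CI/CD pipeline.

Amazon Rekognition Custom Labels enables you to build ML-enabled healthcare applications, which you can easily build and deploy with services such as Fargate, CodeBuild, and CodePipeline.

Can you think of an application that would help researchers, physicians, or their patients make their lives easier? If so, use the code in this walkthrough to build your next application. If you have any questions, share them in the comments section.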

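Acknowledgments

We would like to thank Prof. Dr. Marc Aubreville for kindly giving us permission to use the MITOS_WSI_CMC dataset for this blog post. The dataset can be found on GitHub.

References

[1] Aubreville, M., Bertram, C.A., Donovan, T.A. et al. A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research. Sci Data 7, 417 (2020). https://doi.org/10.1038/s41597-020-00756-z

[2] Khened, M., Kori, A., Rajkumar, H. et al. A generalized deep learning framework for whole-slide image segmentation and analysis. Sci Rep 11, 11579 (2021). https://doi.org/10.1038/s41598-021-90444-8

[3] PNAS March 27, 2018, 115 (13) E2970-E2979; first published March 12, 2018; https://doi.org/10.1073/pnas.1717139115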

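About the authors

Pablo Nuñez Pölcher, MSc, is a Senior Solutions Architect working for the Public Sector team at Amazon Web Services. Pablo focuses on helping healthcare public sector customers build new, innovative products on AWS in accordance with best practices. He received his MSc in Biological Sciences from Universidad de Buenos Aires. In his spare time, he enjoys cycling and tinkering with ML-enabled embedded devices.

Razvan Ionasec, PhD, MBA, is the technical leader for healthcare at Amazon Web Services in Europe, the Middle East, and Africa. His work focuses on helping healthcare customers solve business problems by leveraging technology. Previously, Razvan was the global head of artificial intelligence (AI) products at Siemens Healthineers, in charge of AI-Rad Companion, a family of AI-powered and cloud-based digital health solutions for imaging. He holds more than 30 patents in AI/ML for medical imaging and has published more than 70 international peer-reviewed technical and clinical publications on computer vision, computational modeling, and medical image analysis. Razvan received his PhD in Computer Science from the Technical University of Munich and his MBA from the University of Cambridge, Judge Business School.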