AWS Glue interactive sessions allow engineers to build, test, and run data preparation and analytics workloads in an interactive notebook. Interactive sessions provide isolated development environments, take care of the underlying compute cluster, and can be configured to stop idle resources.
AWS Glue interactive sessions provide default recommended configurations and also allow users to customize the session to meet their needs. For example, you can provision more workers to experiment on a larger dataset or set the idle timeout for long-running workloads. Because these options can be changed freely depending on the workload, you may need to ensure that they stay within specific boundaries by applying a control mechanism.
In this post, we present the process of deploying a reusable solution to enforce AWS Glue interactive session limits on three options: connection, number of workers, and maximum idle time. The first option addresses the need for applying custom inspection and controls on traffic, for example by enforcing that an interactive session only runs inside a VPC. The other two limit the costs and usage of AWS Glue resources by enforcing an upper boundary on the number of workers and idle time per session. You can further extend the solution for other properties or services within AWS Glue.
Overview of solution
The proposed architecture is built on serverless components and runs whenever a new AWS Glue interactive session is created.
The workflow steps are as follows:
- A data engineer creates a new AWS Glue interactive session either through the AWS Management Console or in a Jupyter notebook locally.
- The interactive session produces a new AWS CloudTrail CreateSession event with all relevant information to identify and inspect a session as soon as the session is initiated.
- An Amazon EventBridge rule filters the CloudTrail events and invokes an AWS Lambda function to inspect the CreateSession event.
- The Lambda function inspects the CreateSession event and checks for all defined boundary conditions. Currently, the boundaries configurable with this solution are limited to maximum number of workers, idle timeout in minutes, and deployment with connection enforced.
- If any of the defined boundary conditions are not met (for example, too many workers are provisioned for the session), then, depending on the provided configuration, the function ends the interactive session immediately and sends an email via Amazon Simple Notification Service (Amazon SNS). If the session hasn't started yet, the function waits for it to start before taking any action.
- If the session was stopped, an email is sent to an SNS topic. There is no information available in the interactive session notebook on the reason for the ending of the session. Therefore, additional context information is provided through the SNS topic to the data engineers.
- If the function fails, the sessions are logged in a dead-letter queue in Amazon Simple Queue Service (Amazon SQS). Furthermore, the queue is monitored, and any message in it triggers an Amazon CloudWatch alarm.
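The boundary check in the workflow above can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the `Limits` dataclass, the `find_violations` function, and the assumed shape of the CloudTrail `requestParameters` (keys `numberOfWorkers`, `idleTimeout`, and `connections`) are assumptions made for this example and should be verified against a real CreateSession event.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Limits:
    """Hypothetical container for the three boundaries described in this post."""
    max_workers: int
    max_idle_timeout_minutes: int
    enforce_vpc_connection: bool


def find_violations(request_parameters: Dict[str, Any], limits: Limits) -> List[str]:
    """Return one message per boundary that the requested session breaks.

    The key names read from request_parameters are assumed, not confirmed.
    Values that are absent from the event are not checked in this sketch.
    """
    violations: List[str] = []

    workers = request_parameters.get("numberOfWorkers")
    if workers is not None and workers > limits.max_workers:
        violations.append(
            f"numberOfWorkers={workers} exceeds the limit of {limits.max_workers}"
        )

    idle = request_parameters.get("idleTimeout")
    if idle is not None and idle > limits.max_idle_timeout_minutes:
        violations.append(
            f"idleTimeout={idle} exceeds the limit of {limits.max_idle_timeout_minutes}"
        )

    # Assumed nesting: {"connections": {"connections": ["name", ...]}}
    connections = (request_parameters.get("connections") or {}).get("connections") or []
    if limits.enforce_vpc_connection and not connections:
        violations.append("no AWS Glue connection is attached to the session")

    return violations


# Example: a session asking for 20 workers without a connection.
example_limits = Limits(max_workers=10, max_idle_timeout_minutes=60, enforce_vpc_connection=True)
for message in find_violations({"numberOfWorkers": 20, "idleTimeout": 30}, example_limits):
    print(message)
```

Returning a list of messages rather than a single boolean keeps enough context to explain in the notification email why a session was stopped.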
The following steps walk you through how to build and deploy the solution. The code is available in the GitHub repo.
Prerequisites
For this walkthrough, you should have the following prerequisites:
Overview of the deployed resources
All the necessary resources are defined in an AWS CloudFormation file located under cfn/template.yaml. To deploy those resources, we use AWS Serverless Application Model (AWS SAM), which lets us conveniently build and package all the dependencies and also manages the AWS CloudFormation steps for us.
The CloudFormation stack deploys the following resources:
- A Lambda function with its library, both defined under the directory src/functions. The function is the control. It will validate that the session is started within the limits defined.
- An EventBridge rule. This rule listens to CloudTrail events and, when a new interactive session is created, triggers the control Lambda function.
- An SQS dead-letter queue (DLQ) attached to the Lambda function. This keeps a record of events that triggered a Lambda function failure.
- Two CloudWatch alarms monitoring the Lambda function failures and the messages in the DLQ.
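To make the role of the EventBridge rule more concrete, the following sketch shows an event pattern that would select CreateSession API calls recorded by CloudTrail, together with a deliberately simplified local matcher. The pattern in the repository's template may differ; `CREATE_SESSION_PATTERN` and `matches` are names invented for this example, and the matcher only supports exact-value lists and nested objects, which is a small subset of real EventBridge pattern semantics.

```python
from typing import Any, Dict

# Assumed pattern: Glue API calls delivered via CloudTrail, filtered to CreateSession.
CREATE_SESSION_PATTERN: Dict[str, Any] = {
    "source": ["aws.glue"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["glue.amazonaws.com"],
        "eventName": ["CreateSession"],
    },
}


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified matcher for local experiments only.

    A list in the pattern means "the event value must be one of these";
    a dict means "recurse into the nested object".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Example: a trimmed-down event as the Lambda function might receive it.
sample_event = {
    "source": "aws.glue",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "glue.amazonaws.com", "eventName": "CreateSession"},
}
print(matches(CREATE_SESSION_PATTERN, sample_event))
```

Filtering in the rule rather than in the function means the Lambda function is only invoked for session creation and not for every AWS Glue API call.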
If notification via email is enabled, two more resources are deployed:
Additionally, AWS CloudFormation deploys all the necessary AWS Identity and Access Management (IAM) roles and policies, and an AWS Key Management Service (AWS KMS) key to ensure that the exchanged data is encrypted.
Deploy the solution
To facilitate the deployment lifecycle, including the setup of the user's local environment, we provide a Makefile that describes all the necessary steps. Make sure your AWS credentials are up to date and that you have access to your account. For more information, refer to Configuration and credential file settings.
- Explore the Makefile and adjust the Region and stack name as needed by modifying the values of the variables AWS_REGION and STACK_NAME.
- Set KILL_SESSION = "True" if you want to immediately stop an interactive session that has been found out of boundaries. Allowed values are True or False; the default is True.
- Set NOTIFICATION_EMAIL_ADDRESS = <your.email@provider.com> in the Makefile if you want to get notified when a session has been found out of boundaries.
- Set values for your controls:
  - ENFORCE_VPC_CONNECTION to stop sessions not running inside a VPC (true or false).
  - MAX_WORKERS to set the maximum number of workers for a session (numeric).
  - MAX_IDLE_TIMEOUT_MINUTES to define the maximum idle time for sessions in minutes (numeric).
- Install all the prerequisite libraries:
These will be installed under a newly created Python virtual environment inside this repository in the directory .venv.
- Deploy the new stack:
This command will complete the following tasks:
- Check if the prerequisites are met.
- Run pytest unit tests on the Python files.
- Validate the CloudFormation template.
- Build the artifacts (Lambda function and Lambda layers).
- Deploy the resources via AWS SAM.
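The control values set in the Makefile (KILL_SESSION, ENFORCE_VPC_CONNECTION, MAX_WORKERS, and MAX_IDLE_TIMEOUT_MINUTES) are plain strings, so they need to be parsed and validated before they can be compared against a session. The sketch below assumes that these values reach the Lambda function as environment variables with the same names; that wiring, and the helper names `parse_bool` and `load_controls`, are assumptions for illustration and are not taken from the repository.

```python
import os
from typing import Any, Dict, Mapping, Optional


def parse_bool(value: str) -> bool:
    """Accept only true/false (case-insensitive), as the post describes."""
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"expected True or False, got {value!r}")


def load_controls(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read the control settings from a mapping (os.environ by default).

    KILL_SESSION falls back to True, matching the default stated in the post.
    The other three settings are required in this sketch and raise KeyError
    if absent, or ValueError if they cannot be parsed.
    """
    env = os.environ if env is None else env
    return {
        "kill_session": parse_bool(env.get("KILL_SESSION", "True")),
        "enforce_vpc_connection": parse_bool(env["ENFORCE_VPC_CONNECTION"]),
        "max_workers": int(env["MAX_WORKERS"]),
        "max_idle_timeout_minutes": int(env["MAX_IDLE_TIMEOUT_MINUTES"]),
    }


# Example with an explicit mapping instead of the real environment.
print(load_controls({
    "ENFORCE_VPC_CONNECTION": "true",
    "MAX_WORKERS": "10",
    "MAX_IDLE_TIMEOUT_MINUTES": "60",
}))
```

Failing loudly on an unparsable value is a deliberate choice here: a control that silently falls back to a permissive default would defeat the purpose of enforcing boundaries.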
Test the solution
Refer to Introducing AWS Glue interactive sessions for Jupyter for information about running an interactive session. If you follow the instructions in the post (see the section Run your first code cell and author your AWS Glue notebook), the initialization of the interactive session should fail with an error similar to the following.
Example of code in the cell:
Received output:
If you enabled the email feature, you should also get an email notification.
You can also check on the AWS Glue console that your session ID isn’t listed.
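The same check can be done programmatically instead of on the console. The following sketch pages through the AWS Glue `list_sessions` API and reports whether a given session ID is still listed. The function name `session_is_listed` is hypothetical; the client is passed in as a parameter so the logic can be tried with a stand-in object before pointing it at a real account.

```python
from typing import Any, Dict


def session_is_listed(glue_client: Any, session_id: str) -> bool:
    """Return True if session_id appears in any page of list_sessions.

    glue_client is expected to behave like boto3.client("glue"): its
    list_sessions call returns a dict with an "Ids" list and, when more
    pages exist, a "NextToken" string.
    """
    kwargs: Dict[str, Any] = {}
    while True:
        response = glue_client.list_sessions(**kwargs)
        if session_id in response.get("Ids", []):
            return True
        token = response.get("NextToken")
        if not token:
            return False
        kwargs["NextToken"] = token
```

With boto3 installed and credentials configured, a call such as `session_is_listed(boto3.client("glue"), "<your-session-id>")` should return False once the control has removed the session from the list.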
Clean up
Clean up the deployed resources by running the following command:
Note that the resources deployed by following the recommended post, Introducing AWS Glue interactive sessions for Jupyter, will not be removed with the previous command.
Limitations
The delivery guarantee for CloudTrail events to EventBridge is best effort. This means CloudTrail attempts to deliver all events to EventBridge, but in some rare cases, an event might not be delivered. For more information, refer to Events from AWS services.
Conclusion
This post described how to build, deploy, and test a solution that enforces boundary conditions on AWS Glue interactive sessions, constraining the number of workers, the idle timeout, and the AWS Glue connection.
You can adapt this solution based on your needs and further extend it to allow controls on other options.
To learn more about how to use AWS Glue interactive sessions, refer to Introducing AWS Glue interactive sessions for Jupyter and Author AWS Glue jobs with PyCharm using AWS Glue interactive sessions.
About the authors
Nicolas Jacob Baer is a Senior Cloud Application Architect with a strong focus on data engineering and machine learning, based in Switzerland. He works closely with enterprise customers to design data platforms and build advanced analytics/ML use cases.
Luca Mazzaferro is a Senior DevOps Architect at Amazon Web Services. He likes to have infrastructure automated, reproducible and secured. In his free time he likes to cook, especially pizza.
Kemeng Zhang is a Cloud Application Architect with a strong focus on machine learning and UX, based in Switzerland. She works closely with customers to design user experiences and build advanced analytics/ML use cases.
Mark Walser, a Senior Global Data Architect at Amazon Web Services, collaborates with customers to develop innovative Big Data solutions that solve business problems and speed up the adoption of AWS services. Outside of work, he finds pleasure in running, swimming, and all things related to technology.
Gal Heyne is a Product Manager for AWS Glue with a strong focus on AI/ML, data engineering, and BI, based in California. She is passionate about developing a deep understanding of customers' business needs and collaborating with engineers to design easy-to-use data products.
Source: https://aws.amazon.com/blogs/big-data/enforce-boundaries-on-aws-glue-interactive-sessions/