Techniques for training large neural networks

Pipeline parallelism splits a model “vertically” by layer. It’s also possible to “horizontally” split certain operations within a layer, which is usually called Tensor Parallel training. For many modern models (such as the Transformer), the computation bottleneck is multiplying an activation batch matrix with a large weight matrix. Matrix multiplication can be thought of as dot products between pairs of rows and columns; it’s possible to compute independent dot products on different GPUs, or to compute parts of each dot product on different GPUs and sum up the results. With either strategy, we can slice the weight matrix into even-sized “shards”, host each shard on a different GPU, and use that shard to compute the relevant part of the overall matrix product before later communicating to combine the results.
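
As a minimal sketch of the two sharding strategies described above, the NumPy snippet below simulates each “GPU” with an ordinary array slice; the shapes and shard count are illustrative, and the concatenation/summation stands in for the cross-device communication a real implementation would perform.

```python
import numpy as np

# Illustrative shapes; each "shard" stands in for one GPU's slice of the weights.
batch, d_in, d_out, n_shards = 8, 16, 32, 4

x = np.random.randn(batch, d_in)   # activation batch
W = np.random.randn(d_in, d_out)   # large weight matrix

# Strategy 1: column sharding -- each device holds a slice of W's columns and
# computes independent dot products; the outputs are concatenated (all-gather).
col_shards = np.split(W, n_shards, axis=1)
col_partials = [x @ shard for shard in col_shards]
y_col = np.concatenate(col_partials, axis=1)

# Strategy 2: row sharding -- each device holds a slice of W's rows (and the
# matching slice of x) and computes a partial product; results are summed (all-reduce).
row_shards = np.split(W, n_shards, axis=0)
x_shards = np.split(x, n_shards, axis=1)
row_partials = [xs @ ws for xs, ws in zip(x_shards, row_shards)]
y_row = sum(row_partials)

# Both strategies recover the unsharded matrix product.
assert np.allclose(y_col, x @ W) and np.allclose(y_row, x @ W)
```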

One example is Megatron-LM, which parallelizes matrix multiplications within the Transformer’s self-attention and MLP layers. PTD-P uses tensor, data, and pipeline parallelism; its pipeline schedule assigns multiple non-consecutive layers to each device, reducing bubble overhead at the cost of more network communication.
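
To make the interleaved idea concrete, the following hypothetical sketch (not the actual PTD-P schedule) contrasts a naive pipeline assignment, where each device owns one contiguous block of layers, with an interleaved assignment, where each device owns several smaller, non-consecutive blocks; the layer and device counts are made up.

```python
# Illustrative layer-to-device assignment only.
n_layers, n_devices, chunks_per_device = 16, 4, 2
layers_per_device = n_layers // n_devices

# Naive pipeline: one contiguous block of layers per device.
naive = {d: list(range(d * layers_per_device, (d + 1) * layers_per_device))
         for d in range(n_devices)}

# Interleaved pipeline: several smaller, non-consecutive blocks per device,
# which shrinks pipeline bubbles at the cost of more frequent communication.
block = n_layers // (n_devices * chunks_per_device)
interleaved = {
    d: [layer
        for c in range(chunks_per_device)
        for layer in range((c * n_devices + d) * block,
                           (c * n_devices + d + 1) * block)]
    for d in range(n_devices)
}

print(naive)        # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], ...}
print(interleaved)  # {0: [0, 1, 8, 9], 1: [2, 3, 10, 11], ...}
```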

Sometimes the input to the network can be parallelized across a dimension with a high degree of parallel computation relative to cross-communication. Sequence parallelism is one such idea, where an input sequence is split across time into multiple sub-examples, proportionally decreasing peak memory consumption by allowing the computation to proceed with more granularly-sized examples.
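
The snippet below is a simplified sketch of splitting an input sequence along the time dimension, assuming a purely per-timestep layer so each chunk can be processed independently; real attention layers would need additional cross-chunk communication, and all shapes here are illustrative.

```python
import numpy as np

# Illustrative shapes; split one long sequence into smaller sub-examples along time.
seq_len, d_model, n_chunks = 1024, 64, 4

tokens = np.random.randn(seq_len, d_model)
W = np.random.randn(d_model, d_model)

# Process each time-chunk separately, lowering peak activation memory.
chunks = np.split(tokens, n_chunks, axis=0)
outputs = [np.tanh(chunk @ W) for chunk in chunks]
full = np.concatenate(outputs, axis=0)

# For this per-timestep layer, chunked processing matches the unchunked result.
assert np.allclose(full, np.tanh(tokens @ W))
```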
