CiM Integration for ML Inference Acceleration

Semiconductor Engineering

Timestamp: January 16, 2024 11:47

A technical paper titled “WWW: What, When, Where to Compute-in-Memory” was published by researchers at Purdue University.

Abstract:

“Compute-in-memory (CiM) has emerged as a compelling solution to alleviate high data movement costs in von Neumann machines. CiM can perform massively parallel general matrix multiplication (GEMM) operations in memory, the dominant computation in Machine Learning (ML) inference. However, re-purposing memory for compute poses key questions on 1) What type of CiM to use: Given a multitude of analog and digital CiMs, determining their suitability from systems perspective is needed. 2) When to use CiM: ML inference includes workloads with a variety of memory and compute requirements, making it difficult to identify when CiM is more beneficial than standard processing cores. 3) Where to integrate CiM: Each memory level has different bandwidth and capacity, that affects the data movement and locality benefits of CiM integration.
In this paper, we explore answers to these questions regarding CiM integration for ML inference acceleration. We use Timeloop-Accelergy for early system-level evaluation of CiM prototypes, including both analog and digital primitives. We integrate CiM into different cache memory levels in an Nvidia A100-like baseline architecture and tailor the dataflow for various ML workloads. Our experiments show CiM architectures improve energy efficiency, achieving up to 0.12x lower energy than the established baseline with INT-8 precision, and upto 4x performance gains with weight interleaving and duplication. The proposed work provides insights into what type of CiM to use, and when and where to optimally integrate it in the cache hierarchy for GEMM acceleration.”
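The "when to use CiM" question in the abstract turns on how memory-bound a given GEMM is. As a rough illustration (not the paper's method, and not a Timeloop-Accelergy model), the sketch below estimates arithmetic intensity, in operations per byte moved, for a GEMM at INT-8 precision. The function name, the layer shapes, and the assumption that each operand and the output are moved exactly once are all hypothetical simplifications.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_element=1):
    """Operations per byte for an (m x k) @ (k x n) GEMM.

    Assumption: both inputs and the output are each moved exactly once
    and share one element width (1 byte for INT-8). Real hardware
    re-fetches data depending on the dataflow and cache sizes.
    """
    ops = 2 * m * n * k  # one multiply and one add per MAC
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return ops / bytes_moved


# Hypothetical shapes, chosen only to contrast the two regimes.
shapes = {
    "matrix-vector (m=1)": (1, 4096, 4096),
    "square GEMM": (4096, 4096, 4096),
}

for name, (m, n, k) in shapes.items():
    intensity = gemm_arithmetic_intensity(m, n, k)
    print(f"{name}: {intensity:.1f} ops/byte")
```

Under these assumptions, the matrix-vector shape performs only about two operations per byte moved, so its cost is dominated by data movement, while the square GEMM reuses each byte thousands of times. Shapes in the first regime are the ones where the data movement cost named in the abstract matters most. Whether CiM actually wins for a given workload depends on the system-level factors the paper evaluates.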

The technical paper was published in December 2023 as a preprint.

Sharma, Tanvi, Mustafa Ali, Indranil Chakraborty, and Kaushik Roy. “WWW: What, When, Where to Compute-in-Memory.” arXiv preprint arXiv:2312.15896 (2023).

Related Reading
Increasing AI Energy Efficiency With Compute In Memory
How to process zettascale workloads and stay within a fixed power budget.
Modeling Compute In Memory With Biological Efficiency
Generative AI forces chipmakers to use compute resources more intelligently.
SRAM In AI: The Future Of Memory
Why SRAM is viewed as a critical element in new and traditional compute architectures.

Source: https://semiengineering.com/cim-integration-for-ml-inference-acceleration/
