Introduction
In the rapidly growing landscape of artificial intelligence and machine learning, TinyLlama 1.1B emerges as a noteworthy development. In an era where computational constraints make it hard to run larger, more complex models, TinyLlama defies expectations and showcases just how much performance a compact model can deliver.
This article aims to provide an analysis of TinyLlama 1.1B, a compact large language model. We will delve into its core aspects: how it was trained, how it performs on benchmarks, and how to implement it in practice using the Hugging Face platform. We will even run the model on the free tier of Google Colab and test its math and reasoning abilities.
Learning Objectives
- Gain a comprehensive understanding of TinyLlama 1.1B
- Explore the intricate training process that the model has gone through
- Analyze the performance and benchmark results to assess its efficacy
- Learn the practical steps to implement TinyLlama 1.1B using coding examples
This article was published as a part of the Data Science Blogathon.
What is TinyLlama 1.1B?
TinyLlama 1.1B, a part of the broader Llama project, is a testament to language modeling advancements. It’s a model with 1.1 billion parameters, trained on a staggering 3 trillion tokens, which puts it in a unique position in the AI landscape. Unlike its larger counterparts, TinyLlama 1.1B is designed to be more efficient and manageable, making it a good choice for applications with limited computational resources.
This open-source model democratizes access to state-of-the-art AI technology, allowing many developers and researchers to explore and innovate in the field of natural language processing. It is a model known for its ability to balance performance with resource consumption, a critical consideration in today’s diverse computational environments.
Training Process of TinyLlama 1.1B
The training process of TinyLlama 1.1B is as fascinating as the model itself. Training took just 90 days on 16 A100-40G GPUs. Pretraining was done on 3 trillion tokens, and the TinyLlama team published intermediate checkpoints at every half-trillion tokens.
As for the data, Slimpajama and Starcoderdata were combined into a dataset of about 950 billion tokens. The natural-language-to-code ratio was kept at 7:3, i.e., 70% of the data was natural language and 30% was code. Thus, to reach the 3 trillion token mark, TinyLlama was trained for roughly 3 epochs over this dataset (3 × 950 billion ≈ 2.85 trillion tokens).
There is even a chat version of TinyLlama, called TinyLlama-Chat. Initially, this model underwent fine-tuning on the UltraChat dataset, which contains diverse synthetic conversations generated by ChatGPT. This step was crucial in enabling the model to handle different conversational contexts and styles.
Further refinement was achieved using the DPOTrainer on the UltraFeedback dataset. This training phase focused on aligning the model's responses with human-preferred conversational patterns. The result is a model that not only grasps information on different topics but also interacts in a natural and engaging way.
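To make that alignment step more concrete, here is a minimal DPO sketch with the Hugging Face TRL library. This is only an illustration of the technique, not the TinyLlama team's actual training script: the tiny inline preference rows and hyperparameters are invented for the example, and the exact DPOTrainer arguments can differ between trl versions.

# Minimal DPO sketch with TRL (illustrative only; not the official TinyLlama recipe).
# The toy preference rows and hyperparameters below are assumptions for the example.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # start from the SFT chat checkpoint
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# DPO expects prompt / chosen / rejected triples (UltraFeedback provides these at scale)
prefs = Dataset.from_dict({
    "prompt":   ["<|im_start|>user\nWhat is DPO?<|im_end|>\n<|im_start|>assistant\n"],
    "chosen":   ["DPO directly optimizes a model on preference pairs without a separate reward model."],
    "rejected": ["DPO is a kind of GPU."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=None,                      # TRL keeps a frozen copy as the reference model
    beta=0.1,                            # strength of the pull toward the reference model
    args=TrainingArguments(output_dir="tinyllama-dpo", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=5e-7),
    train_dataset=prefs,
    tokenizer=tokenizer,
)
trainer.train()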
You can also read: Getting Started with LlaMA 2: A Beginner’s Guide
Performance and Benchmark Results
Evaluating the performance of TinyLlama 1.1B reveals its capability to deliver high-quality responses swiftly. Its training has endowed it with the ability to cater to multilingual applications, an important feature in our globalized world. Despite its smaller size, TinyLlama 1.1B comes surprisingly close to its larger counterparts in response quality and speed, making it a potent tool for different AI applications.
The benchmarks for TinyLlama 1.1B, while less extensive than those for larger models, still demonstrate its proficiency in handling complex language tasks. Its ability to generate coherent and contextually relevant responses in multiple languages is particularly impressive. The model was tested on different benchmarks like HellaSwag, WinoGrande, ARC, MMLU, and others. The combined average score came out to be 52.99. This is noticeably better than Pythia 1B, another 1-billion-parameter model, which achieved an average score of 48.3. The table below lists the individual benchmark scores.
| Benchmark | TinyLlama 1.1B Score |
|---|---|
| HellaSwag | 59.2 |
| Obqa | 36.0 |
| WinoGrande | 59.12 |
| ARC_c | 30.12 |
| ARC_e | 55.25 |
| boolq | 57.83 |
| piqa | 73.29 |
| Average | 52.9 |
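If you want to reproduce numbers like these yourself, a common route is EleutherAI's lm-evaluation-harness. The Colab cell below is only a hedged sketch: the model id, task names, and CLI flags shown are assumptions that may vary across harness versions, and a full run on a free GPU can take a while.

!pip3 install lm-eval

# Evaluate a TinyLlama checkpoint on a few of the benchmarks listed above
!lm_eval --model hf \
    --model_args pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
    --tasks hellaswag,winogrande,arc_easy,arc_challenge,boolq,piqa,openbookqa \
    --device cuda:0 \
    --batch_size 8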
TinyLlama – Getting Started
In this section, we will download the quantized version of TinyLlama Chat and run it in Google Colab. Before downloading the model, we have to install the following Python packages:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python
!pip3 install huggingface-hub
- The CMAKE_ARGS="-DLLAMA_CUBLAS=on" and FORCE_CMAKE=1 flags allow llama_cpp_python to utilize the Nvidia GPU available in the free Colab tier.
- Then we install the llama_cpp_python package through pip3.
- We also install huggingface-hub, which we will use to download the quantized TinyLlama 1.1B Chat model (a quick import check is sketched after this list).
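Optionally, before moving on, we can run a quick import check (this cell is not from the original article) to confirm that both packages installed correctly:

# Quick sanity check: these imports fail if the installs above did not succeed
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
print("llama-cpp-python and huggingface-hub are ready")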
To test the TinyLlama 1.1B Chat model, we first need to download its quantized version. To download it, we will run the following code:
from huggingface_hub import hf_hub_download
# specifying the model name
model_name = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
# specifying the type of quantization of the model
model_file = "tinyllama-1.1b-chat-v1.0.Q8_0.gguf"
# download the model by specifying the model name and quantized model name
model_path = hf_hub_download(model_name, filename=model_file)
Here, the huggingface_hub library takes care of downloading the quantized model. For this, we import hf_hub_download, which takes the following parameters:
- model_name: To this variable, we pass the name of the model we wish to download. Here, it is the TinyLlama 1.1B Chat GGUF model.
- model_file: Here we specify which quantized file we want; we will download the 8-bit quantized version of TinyLlama 1.1B Chat.
- Finally, we pass these parameters to hf_hub_download, which downloads the specified model. After downloading, it returns the path where the model is stored.
- This returned path is stored in the model_path variable.
Now, we can load this model through the llama_cpp_python library. The code for loading the model looks like the one below.
from llama_cpp import Llama
llm = Llama(
    model_path=model_path,   # path to the downloaded GGUF file
    n_ctx=512,               # context length: number of input tokens the model can take
    n_threads=8,             # number of CPU threads to use
    n_gpu_layers=40,         # number of model layers to offload to the GPU
)
We import the Llama class from llama_cpp, which takes the following parameters:
- model_path: This variable takes the path where our model is stored. We obtained this path in the previous step and provide it here.
- n_ctx: Here, we give the context length for the model. For now, we provide 512 tokens as the context length.
- n_threads: Here we mention the number of threads to be used by the Llama class.
- n_gpu_layers: We specify this if we have a GPU available, which we do in the free Colab tier. We pass 40, which means we want to offload the entire model to the GPU and do not want any part of it to run in system RAM.
- Finally, we create an object of this Llama class and assign it to the variable llm.
Running this code will load the TinyLlama 1.1B Chat quantized model onto the GPU and set the appropriate context length. Now, it's time to perform some inference with this model. For this, we work with the code below:
output = llm(
    "<|im_start|>user\nWho are you?<|im_end|>\n<|im_start|>assistant\n",  # User Prompt
    max_tokens=512,   # Maximum number of output tokens to generate
    stop=["</s>"],    # Token that tells the LLM to stop
)
print(output['choices'][0]['text'])  # Model generated text
To infer the model, we pass the following parameters to the LLM:
- prompt/chat template: This is the prompt template needed to chat with the model. The template shown above (with the <|im_start|> and <|im_end|> tags) is the one that works for the TinyLlama 1.1B Chat model. In the template, the text after user is the user prompt, and the generation appears after assistant. For convenience, a small helper that builds this template is sketched after this list.
- max_tokens: To this variable, we pass a value that defines the maximum number of tokens a Large Language Model can output when a Prompt is given. For now, we are limiting it to 512 tokens.
- stop: To this variable, we pass the stop token. The stop token tells the Large Language Model to stop generating further tokens. For TinyLlama 1.1B Chat, the stop token is </s>.
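For convenience, here is a tiny helper (not part of the original article's code) that wraps a user message in the template shown above, so we do not have to retype the special tokens for every prompt:

def build_prompt(user_message: str) -> str:
    # Wrap the user message in the chat template expected by TinyLlama 1.1B Chat
    return f"<|im_start|>user\n{user_message}<|im_end|>\n<|im_start|>assistant\n"

output = llm(build_prompt("Who are you?"), max_tokens=512, stop=["</s>"])
print(output['choices'][0]['text'])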
The generated text is stored in the output variable when we run this. The result is generated in a format similar to the OpenAI API call. Hence, we can access the generation through the given print statement, similar to how we access the generation from the OpenAI responses. The output generated can be seen below
For a model of this size, the generated response is top-notch. This is unexpected for a model of this size; the grammar and tone are perfectly fine, and there is no sign of repeated sentences. Let's test the model's reasoning capabilities:
output = llm(
    "<|im_start|>user\nIf all students who study hard get good grades, "
    "and John got good grades, can we conclude that John studied hard?"
    "<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=512,
    stop=["</s>"],
)
print(output['choices'][0]['text'])
output = llm(
    "<|im_start|>user\nHow fast can a snake fly?\n<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=512,
    stop=["</s>"],
)
print(output['choices'][0]['text'])
So far, so good. From the examples we have seen, the model generates good answers. But this may not hold in all cases, because we have only tested it on a limited number of questions. Let's also test the model's mathematical reasoning capabilities:
output = llm(
    "<|im_start|>user\nJohn is twice as old as Sarah, and Sarah is three years "
    "older than Mary. If Mary is 10 years old, how old is John?\n<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=512,
    stop=["</s>"],
)
print(output['choices'][0]['text'])
output = llm(
    "<|im_start|>user\nWhat is the missing number in this pattern: "
    "1, 4, 9, 16, __, 36?\n<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=512,
    stop=["</s>"],
)
print(output['choices'][0]['text'])
From the examples we have seen, it is clear that TinyLlama Chat performs poorly on simple math aptitude questions. This is expected, because the model was not pretrained on any math-focused dataset. The quality of the generations can be improved by fine-tuning the model on a math dataset.
Coming to fine-tuning, TinyLlama is a go-to choice for those who are restricted to limited hardware and wish to fine-tune a large language model on their own dataset.
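As a concrete (and purely illustrative) example of what such a fine-tune could look like, below is a minimal LoRA sketch using the Hugging Face transformers, peft, and datasets libraries. The dataset choice (GSM8K), target modules, and hyperparameters are assumptions picked for brevity, not a recommendation from the TinyLlama authors.

# Minimal LoRA fine-tuning sketch (illustrative; dataset and hyperparameters are assumptions)
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Attach small low-rank adapters instead of updating all 1.1 billion weights
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # typically well under 1% of the parameters

# GSM8K used here only as an example of a math word-problem dataset
data = load_dataset("gsm8k", "main", split="train")

def tokenize(example):
    # Concatenate question and answer into one training sequence
    text = example["question"] + "\n" + example["answer"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=512)

data = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tinyllama-math-lora",
                           per_device_train_batch_size=2,
                           num_train_epochs=1,
                           learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

Because only the adapter weights are trained, a run like this fits comfortably on a single consumer GPU, which is exactly the setting where a 1.1B model shines.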
Potential Use Cases and Applications
Given the compact size of TinyLlama, which boasts 1.1 billion parameters, its applications are mainly suited to environments where larger models might not be feasible due to hardware limitations or where efficiency is a priority. Here are some specific use cases, keeping its size in consideration:
Mobile Applications: TinyLlama’s smaller size makes it a good choice for integrating into mobile apps where on-device processing is necessary. This includes language translation apps, personal assistant features, and chatbots that can operate efficiently on smartphones.
Embedded Systems in IoT Devices: In the Internet of Things (IoT) field, the computing resources are often limited; TinyLlama can be used to add intelligent language processing capabilities to different equipment like smart home assistants, wearable tech, and other such connected equipment.
Edge Computing: For applications that benefit from processing data closer to the source rather than in a centralized cloud environment, TinyLlama can be employed effectively. This includes real-time language processing in automotive systems, manufacturing equipment, and other edge devices.
Low-Resource Language Research: Due to its smaller size and lower computational requirements, TinyLlama can be a valuable tool in linguistic research, especially for under-resourced languages where large-scale model training isn’t feasible.
Educational Tools: In educational settings, especially those with limited access to high-end computing resources, TinyLlama can be used to develop language learning apps, interactive educational tools, and other learning aids.
Content Generation for Small Businesses: Small businesses with limited resources can use TinyLlama for generating content, like product descriptions, marketing copy, and customer correspondence, without the need for extensive computing power.
Prototyping and Experimentation: Developers and researchers who wish to experiment with language models but lack access to high-powered computing resources can use TinyLlama to prototype and develop new NLP applications.
Efficient Data Analysis: TinyLlama can be used for text analysis and data extraction in scenarios where quick and efficient processing is needed, like analyzing customer feedback, survey responses, or social media interactions.
Conclusion
TinyLlama 1.1B is a testament to the advancements in the field of AI and natural language processing. Its development and widespread availability are important steps toward more efficient, smaller, faster-inference language models. By balancing a small parameter footprint with robust performance, TinyLlama 1.1B addresses the need for powerful yet practical models across a wide array of applications. Its ability to understand and generate language in a human-like manner, while remaining light enough for diverse computing environments, makes it a go-to choice for people who struggle to run large language models on their machines. The model can be fine-tuned easily on a custom dataset and trained with limited computing resources.
The key takeaways from this article include:
- Designed for efficiency, TinyLlama 1.1B is available to a wider audience, including those with limited computational resources, making it suitable for several applications.
- The model underwent an extensive training process, including training on 3 trillion tokens over 90 days using 16 A100-40G GPUs.
- Despite its smaller size, TinyLlama 1.1B delivers high-quality, contextually relevant responses in multiple languages, making it a model to consider.
- With its compact size and efficiency, it is a good choice for mobile applications, IoT equipment, educational tools, and more.
- Its lower computational requirements make it a valuable tool in linguistic research, especially for under-resourced languages.
- The model is a good choice for those experimenting with language models or developing new NLP Apps, mainly in settings with limited computational power.
Frequently Asked Questions
Q. What is TinyLlama 1.1B?
A. TinyLlama 1.1B is a compact, efficient large language model with 1.1 billion parameters, trained on 3 trillion tokens, suitable for applications with limited computational resources.
Q. How was TinyLlama 1.1B trained?
A. It was trained over 90 days using 16 A100-40G GPUs on datasets including Slimpajama and Starcoderdata, with a natural language to code ratio of 7:3.
Q. How does TinyLlama 1.1B perform on benchmarks?
A. TinyLlama 1.1B shows its skill in handling complex language tasks, scoring an average of 52.99 across benchmarks like HellaSwag, MMLU, and WinoGrande.
Q. What are the potential use cases of TinyLlama 1.1B?
A. It’s suitable for applications where size and speed matter. These include mobile apps, IoT equipment like home automation devices, content generation for small businesses, and efficient data analysis.
Q. Can TinyLlama 1.1B be used with limited computing resources?
A. Absolutely, it’s a good choice for developers and researchers who lack access to high-powered computing resources for prototyping and developing new NLP applications. The TinyLlama model can even be run on a Raspberry Pi.
Q. What are the limitations of TinyLlama 1.1B?
A. While it excels in many language tasks, it shows limitations in mathematical reasoning, which can be improved by fine-tuning on relevant datasets.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.