More Data Science Cheatsheets

We recently realized that we hadn’t brought you any data science cheatsheets in a while. And it’s not for their lack of availability; data science cheatsheets are everywhere, ranging from the introductory to the advanced, covering topics from algorithms, to statistics, to interview tips, and beyond.

But what makes a good cheatsheet? What makes a cheatsheet worthy of being singled out as a particularly good one? It’s difficult to put your finger on exactly what makes a good cheatsheet, but obviously one that conveys essential information concisely, whether or not that information is of a specific nature, is definitely a good start. And that is what makes our candidates today noteworthy. So read on for four curated, complementary cheatsheets to help you in your data science learning or review.

First up is Aaron Wang’s Data Science Cheatsheet 2.0, a four-page compilation of statistical abstractions, fundamental machine learning algorithms, and deep learning topics and concepts. It is not meant to be exhaustive, but rather a quick reference for situations such as interview preparation and exam review, and anything else requiring a similar level of review depth. The author notes that while those with a basic understanding of statistics and linear algebra will find this resource most beneficial, beginners should be able to glean useful information from its content as well.

Figure
Screenshot from Aaron Wang’s Data Science Cheatsheet 2.0

Our next cheatsheet offering today is that which Aaron Wang’s resource is based on, Maverick Lin’s Data Science Cheatsheet (Wang’s reference to his own as 2.0 is a direct nod to Lin’s “original”). We can think of Lin’s cheatsheet as more in-depth than Wang’s (though Wang’s decision to make his less in-depth seems intentional and a useful alternative), covering more fundamental data science concepts such as data cleaning, the idea of modeling, doing “big data” with Hadoop, SQL, and even the basics of Python.

Clearly this will appeal to those who are more firmly in the “beginner” camp. It does a good job of whetting appetites and making readers aware of the broad field of data science and the many varied concepts it encompasses. This is definitely another solid resource, especially if the reader is a newcomer to data science.

Figure
Screenshot from Maverick Lin’s Data Science Cheatsheet

As we move further back in time, seeking the inspiration for Lin’s cheatsheet, we come across William Chen’s Probability Cheatsheet 2.0. Chen’s cheatsheet has garnered much attention and praise over the years, and so you may have come across it at some point. Clearly with a different focus (given its name), Chen’s cheatsheet is a crash course on, or deep dive review of, probability concepts, including a variety of distributions, covariance and transformations, conditional expectation, Markov chains, various formulas of importance, and much more.

At 10 pages, you should be able to imagine the breadth of probability topics being covered herein. But don’t let that deter you; Chen’s ability to boil concepts down to their essential bullet points and explain in plain English while not sacrificing on essentials is noteworthy. It is also rich in explanatory visualizations, something quite useful when space is limited and the desire to be concise is strong.

Chen’s compilation is a quality one and worthy of your time. Moreover, as a beginner or someone interested in a full review, I would work in reverse order of how these resources were presented: from Chen’s cheatsheet, to Lin’s, and finally to Wang’s, building on top of concepts as you go.

Figure
Screenshot from William Chen’s Probability Cheatsheet 2.0

One final resource I’m including here, though not technically a cheatsheet, is Rishabh Anand’s Machine Learning Bites. Billing itself as “[a]n interview guide on common Machine Learning concepts, best practices, definitions, and theory,” Anand has compiled a wide ranging collection of knowledge “bites,” the usefulness of which definitely transcends the originally intended interview preparation. Topics covered within include:

Model Scoring Metrics
Parameter Sharing
k-Fold Cross Validation
Python Data Types
Improving Model Performance
Computer Vision Models
Attention and its Variants
Handling Class Imbalance
Computer Vision Glossary
Vanilla Backpropagation
Regularization
References

Figure
Screenshot from Machine Learning Bites

While machine learning “concepts, best practices, definitions, and theory” are touched on, as promised in the resource’s description of itself, these “bites” are definitely geared toward the practical, which makes the site complementary to much of the material covered in the three previously mentioned cheatsheets. If I were looking to cover all of the material in all four of the resources in this post, I would certainly look at this after the other three.

So there you have four cheatsheets (or three cheatsheets and one cheatsheet-adjacent resource) to use for your learning or review. Hopefully something here is useful to you, and I invite anyone to share the cheatsheets they have found useful in the comments below.