Beginner's Guide to Data Science


Ready to learn the "Sexiest Job of the 21st Century" in 2018? Excellent! This guide puts together the best YouTube tutorials for gaining hands-on Data Science skills at beginner to intermediate level.

A Data Scientist masters the fields of Computer Science, Math & Statistics and combines them with Subject Matter Expertise.

The resources listed on this page primarily focus on acquiring Programming, Data Visualization, Relational Databases and Machine Learning skills along with a quick refresher in Statistics & Probability theory at undergraduate level. Whether you have a background in engineering, law, or art does not matter, as this guide does not assume any prior knowledge except for a bit of high school math.

Happy learning!

268 Free Data Science Resources


  1. Python Programming

    1. Python Programming for Data Science
    2. Channel: Corey Schafer, codebasics
      Level: Introductory
      Hours: 4
      Upvotes: 98% (average)
      Resources: Github
      Requirements: Command line (see tip below)
      Contents
      • Install Anaconda
      • Start Jupyter Notebook
      • Launch Python File
      • Run cells
      • Insert cells
      • Code / Markdown Cells
      • Magic commands
      • Opening External Notebooks
      • Variables
      • Printing
      • Math Operators (+, -, /, **, %)
      • Rounding
      • Strings
      • Slicing
      • Lists
      • Append items
      • Input
      • If, Elif, Else Statement
      • In
      • For loop
      • Range function
      • Functions
      • Return statement
      • Arguments / Parameters
      • Dictionaries (keys, values)
      • Tuples
      • Modules
      • Reading / Writing Text Files
      • JSON
      • __name__
      • Exception Handling (try, except, finally)
      • Classes
      • List / Set Comprehensions
      • Zip
      • (Frozen) Sets
      Review

      For this guide I assumed that you have no prior knowledge or experience whatsoever, which is why the first two videos of the series teach you all you need to know about an “Integrated Development Environment” (often abbreviated as IDE) that is very popular for Data Science: Jupyter Notebooks. In case you plan to use an online programming environment, I would still recommend watching these tutorials, as they also explain how to work with Jupyter Notebooks.

      In the next 18 videos, common Python concepts in the field of Data Science are discussed. Please note that even though the teacher uses another IDE, you can still follow along with a Jupyter Notebook. I still picked the codebasics tutorials because the pace and teaching style are excellent for absolute beginners.

      Tip 1: Do the interactive demo “How to use the command line” (by General Assembly) in advance to be more comfortable with the terminal.

      Tip 2: Adjust the playback speed to 1.5x or 2.0x (via the gear icon in the play bar) to go through the codebasics videos more quickly!

      Tip 3: The easiest way to create a module in Jupyter Notebook is by downloading the notebook as a .py-file (File >> Download as >> Python) and subsequently importing it in your notebook. You may need to remove the “.html” extension of the downloaded file (e.g. “my_module.py.html”).
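
      The import step from Tip 3 can be tried out without the download. This is only a minimal sketch: the module name `my_module` and its function `greet` are hypothetical, and a temporary folder stands in for the folder where your notebook lives.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents; in practice this would be the .py file
# you downloaded from your notebook.
source = '''
def greet(name):
    return f"Hello, {name}!"
'''

# Write the module to a temporary folder (stand-in for your working directory).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "my_module.py").write_text(source)

# Make the folder importable, then import the module by its file name
# (without the .py extension).
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
my_module = importlib.import_module("my_module")

print(my_module.greet("data scientist"))
```

      If the .py file sits next to your notebook, a plain `import my_module` is normally enough; the `sys.path` line is only needed here because the file lives in a temporary folder.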

    3. Python Programming Fundamentals
    4. Channel: Joe James
      Level: Introductory
      Hours: 1.75
      Upvotes: 100%
      Resources: Github
      Requirements: None
      Contents
      • Variables
      • (Im)mutability
      • If-elif-else statements
      • For-loops
      • Functions
      • Lists
      • Tuples
      • Sets
      • Dictionaries
      • Reading and Writing a Text-file
      • Lambda
      • Map
      • Filter
      • Reduce
      • Recursion
      • Sorting
      • Exception Handling
      Review

      After an enervating intro song you can expect an outstanding clarification of the building blocks of Python. I would specifically recommend the 7th tutorial, about the most prominent Python data structures, which is an excellent summary (as you can read in the comments below the video).

      Note that the first few videos in the series have significant overlap with the previous recommendation. However, both teachers are great and as a beginner it doesn't hurt to hear an explanation from two unique perspectives.
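
      To give a feel for the later topics in this series (lambda, map, filter, reduce, recursion and sorting), here is a small sketch. The numbers and words are made up for illustration and are not taken from the videos.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Lambda: a small anonymous function.
square = lambda x: x * x

# Map: apply a function to every item.
squares = list(map(square, numbers))

# Filter: keep only the items that pass a test.
evens = list(filter(lambda x: x % 2 == 0, numbers))

# Reduce: fold a list into a single value (here: the sum).
total = reduce(lambda acc, x: acc + x, numbers)


# Recursion: a function that calls itself on a smaller input.
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)


# Sorting with a key function: longest word first.
words = sorted(["pandas", "sql", "python"], key=len, reverse=True)

print(squares, evens, total, factorial(5), words)
```

      In everyday Python, list comprehensions often replace `map` and `filter`, but knowing both styles helps when reading other people's code.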

    5. Data Analysis in Python with pandas
    6. Channel: Data School
      Level: Introductory
      Hours: 7
      Upvotes: 99%
      Resources: Github
      Requirements: Python (beginner)
      Contents
      • Load CSV-Data
      • Select Columns
      • Rename Columns
      • Remove Columns
      • Create DataFrame
      • Sort DataFrame
      • Filter DataFrame
      • Row- / Column means
      • String Methods
      • Change Datatypes
      • Groupby Statement
      • Aggregate Functions
      • Handle Missing data
      • DataFrame Index
      • Create DataSeries
      • Concatenate DataSeries
      • Backwards fill
      • Convert to CSV/Pickle
      • Data Sampling
      • Dummy Variables
      • Date Time Data
      • Handle Duplicates
      Review

      Although having a basic understanding of Python is essential for data science, in practice you’ll most likely opt for a data manipulation library such as pandas. The playlist consists of 31 videos which provide an excellent foundation in the first few steps of the data pipeline: from reading data to filtering/grouping data-frames. The teacher, Kevin, is a gifted communicator who masters the craft of teaching. Therefore it’s no surprise that he gets an upvote ratio of 99% (n=443).

      Tip: Adjust the playback speed to 1.5x or 2.0x (via the gear icon in the play bar) to go through the videos more quickly!

  2. Sign Up for the Data Science Email Course

    Only watching YouTube videos does not make you a data science unicorn 🦄
    Leave your email address below and I'll send you some helpful Python Programming, SQL and Statistics exercises (and solutions) to test your understanding!

  3. Data Visualization

    1. Matplotlib
    2. Channel: codebasics
      Level: Introductory
      Hours: 0.75
      Upvotes: 100%
      Resources: Github
      Requirements: Python (beginner)
      Contents
      • Linecharts
      • Plot parameters (color, linewidth, linestyle, marker, markersize, alpha)
      • Chart properties (labels, title)
      • Format strings
      • Multiple plots
      • Legends
      • Grid
      • Bar chart (horizontal/vertical)
      • Histograms
      • Pie charts (though I highly discourage you from ever using them ;-))
      Review

      Once you have seen how to work with Matplotlib it’s a piece of cake, but first use can be intimidating. This video series provides an excellent introduction to the most used chart parameters (color, marker(size), alpha) and types of plots (lineplot, bar chart, histograms).

    3. Plotting in Python using Matplotlib
    4. Channel: Max Schallwig
      Level: Introductory
      Hours: 1.5
      Upvotes: 100%
      Resources: Football data
      Requirements: Python (beginner)
      Contents
      • Lineplots
      • Scatterplots
      • Chart properties (xlim, ylim, color, size)
      • Multiple plots in one chart
      • Axis labels
      • Annotation (text in graph, arrows)
      • Legends
      • 1D/2D Histograms
      • Scaling axes (logarithmic, linear)
      Review

      Max does a great job in his brief and easy-to-follow tutorials which give you a good introduction to the basic functionality of Matplotlib. By the way, he offers the exact same content on Udemy for $20, lucky you!

      There is a bit of overlap with the previous recommendation (videos about "Axes" and "Histogram"), though the majority of the content covers different aspects. Besides, it doesn't hurt to hear a similar story from two angles.

  4. Relational Databases

    1. SQL Tutorials
    2. Channel: Corey Schafer
      Level: Introductory
      Hours: 0.5
      Upvotes: 97%
      Requirements: None
      Contents
      • Create Database
      • Create Table
      • Insert Records
      • Update Records
      • Delete Records
      • Retrieve Records (SELECT, WHERE, ORDER BY)
      Review

      This 5-part series serves as preparation for the following two recommendations. It teaches you how to get started with SQL, set up a database, create a table and run very basic queries.

      In the videos Corey shows you how to do that with PSequel, a beautiful and intuitive SQL editor that does not overload you with features you are never going to use. Moreover, it allows you to easily bookmark your queries (handy for homework exercises!). In other words, I would highly recommend Mac users to install this (free!) application.

      Pro-tip: Try out the following shortcuts in PSequel for toggling between the views (Cmd+1; Cmd+2; Cmd+3; Cmd+4; Cmd+5).
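
      If you want to practise the statements from this series before installing anything, the sketch below runs the same kind of queries from Python. Note the assumptions: it uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for the PostgreSQL setup in the videos, and the `employees` table and its rows are made up.

```python
import sqlite3

# In-memory SQLite database (a stand-in for the PostgreSQL database in the videos).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create table
cur.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)

# Insert records (made-up data)
cur.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("Alice", 5000), ("Bob", 4000), ("Carol", 6000)],
)

# Update a record
cur.execute("UPDATE employees SET salary = 4500 WHERE name = 'Bob'")

# Delete a record
cur.execute("DELETE FROM employees WHERE name = 'Carol'")

# Retrieve records (SELECT, WHERE, ORDER BY)
cur.execute(
    "SELECT name, salary FROM employees WHERE salary > 4000 ORDER BY salary DESC"
)
rows = cur.fetchall()
print(rows)

conn.close()
```

      The SQL strings themselves are the part that carries over to a tool like PSequel; only the column type names may need small adjustments for PostgreSQL.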

    3. SQL Server Queries
    4. Channel: WiseOwlTutorials
      Level: Introductory
      Hours: 2.25
      Upvotes: 97%
      Resources: See the video's description for a written version of the tutorial and the Movie Database
      Requirements: Knowledge about how to create a database
      Contents
      • SELECT
      • FROM
      • WHERE
      • ORDER BY
      • Calculated Columns
      • CASE
      • INNER JOIN
      • OUTER JOIN
      • Functions
      • GROUP BY
      • HAVING
      • Subqueries
      • Correlated Subqueries
      Review

      This series goes into more depth than the previous one, which explains why the tutorials are typically a bit longer. Even though the videos were published in 2012 (more than 5 years ago!) and the tutorials are actually meant for Microsoft SQL Server, most of the explanations are also relevant for PostgreSQL (an open-source relational database system) because the core query syntax is almost identical. If you are watching tutorials 7-9 (Functions, Text Calculations, Date Calculations) though, errors will be raised if you try to execute the same code as in the videos, since many built-in functions differ between the two systems. If you want to follow along, you can look for the corresponding PostgreSQL commands in the documentation (a great exercise by the way).

      I still picked these somewhat outdated tutorials because, in my opinion, the explanations are among the best available on YouTube.
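
      The query types from this series can be sketched in a few lines. As an assumption for illustration, the example again uses Python's built-in `sqlite3` module instead of SQL Server or PostgreSQL, and the `directors` and `films` tables are made up (they are not the Movie Database from the videos).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE directors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT, director_id INTEGER, minutes INTEGER);
INSERT INTO directors VALUES (1, 'Director A'), (2, 'Director B'), (3, 'Director C');
INSERT INTO films VALUES
    (1, 'Film 1', 1, 100),
    (2, 'Film 2', 1, 140),
    (3, 'Film 3', 2, 90);
""")

# INNER JOIN + GROUP BY + HAVING: directors with more than one film.
busy = conn.execute("""
    SELECT d.name, COUNT(*) AS film_count
    FROM films AS f
    INNER JOIN directors AS d ON d.id = f.director_id
    GROUP BY d.name
    HAVING COUNT(*) > 1
""").fetchall()

# LEFT OUTER JOIN: directors without films are kept, with a count of 0.
all_directors = conn.execute("""
    SELECT d.name, COUNT(f.id) AS film_count
    FROM directors AS d
    LEFT OUTER JOIN films AS f ON f.director_id = d.id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()

# CASE + subquery: label each film relative to the average running time.
labelled = conn.execute("""
    SELECT title,
           CASE WHEN minutes > (SELECT AVG(minutes) FROM films)
                THEN 'long' ELSE 'short' END AS length_label
    FROM films
    ORDER BY id
""").fetchall()

print(busy, all_directors, labelled)
conn.close()
```

      The difference between the two joins is visible in the results: 'Director C' has no films and therefore only appears in the outer join.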

    5. SQL (Advanced), Data Modelling and Normalization
    6. Channel: Techtud, kudvenkat, Bart Baesens, channel5567
      Level: Intermediate
      Hours: 2
      Upvotes: 98% (average)
      Requirements: SQL (beginner)
      Contents
      • IN
      • ANY
      • ALL
      • Correlated queries
      • COUNT
      • MIN
      • MAX
      • AVG
      • Check constraint
      • Unique constraint
      • Views
      • OVER
      • PARTITION BY
      • ROW_NUMBER()
      • RANK()
      • Entity Relationship Model
      • Cardinalities
      • Normalization (1NF, 2NF, 3NF, 4NF)
      Review

      Since I was not able to find a single playlist that contained all the topics which were treated in my own Foundations of Databases (JBP050) university course, I have composed a mixed playlist that consists of videos from multiple YouTube channels. The majority are SQL tutorials, and the last two tutorials cover theory about ER-models and Normalization, which is something you must know if you want to set up a database from scratch.
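
      Window functions (OVER, PARTITION BY, ROW_NUMBER(), RANK()) are easiest to understand by looking at their output. The following is a sketch under a few assumptions: it uses Python's built-in `sqlite3` module, it needs SQLite 3.25 or newer (the first version with window functions), and the `scores` table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (student TEXT, course TEXT, grade INTEGER);
INSERT INTO scores VALUES
    ('Ann', 'SQL', 9), ('Ben', 'SQL', 9), ('Cas', 'SQL', 7),
    ('Ann', 'Stats', 6), ('Ben', 'Stats', 8);
""")

# PARTITION BY restarts the numbering for every course.
# ROW_NUMBER() always counts 1, 2, 3, ...; RANK() gives ties the same rank
# and then skips the ranks that the ties "used up".
ranked = conn.execute("""
    SELECT course, student, grade,
           ROW_NUMBER() OVER (PARTITION BY course ORDER BY grade DESC, student) AS row_num,
           RANK()       OVER (PARTITION BY course ORDER BY grade DESC)          AS grade_rank
    FROM scores
    ORDER BY course, row_num
""").fetchall()

for row in ranked:
    print(row)
conn.close()
```

      In the SQL course Ann and Ben share rank 1 and Cas gets rank 3, while their row numbers are simply 1, 2 and 3.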

  5. Statistics & Probability

    1. Quantitative Methods
    2. Channel: dataminingincae
      Level: Introductory
      Hours: 3
      Upvotes: 98%
      Software: Gretl
      Requirements: High school math
      Contents
      • Simple Linear Regression
      • Multiple Linear Regression
      • Standard Error
      • Confidence Intervals
      • P-values
      • R-Squared
      • Adjusted R-Squared
      • Regression Standard Error (RSE)
      • Point Estimates
      • Dummy Variables
      • Backwards Elimination
      • Logistic Regression
      Review

      This series is taught by a professor at a prestigious Business School (INCAE). The first part lays a solid theoretical foundation in statistical modeling, after which an open-source WYSIWYG (i.e. drag and drop) tool - Gretl - is used to illustrate the concepts in practice. This way you can play around with the mathematical tools even if you cannot program yet. You may ask yourself why you should spend your time learning Gretl. Well, I believe it’s not really a wasted effort, since the output of statistical packages in Python (e.g. Statsmodels) and R looks very similar.
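
      Once you can program a little, the core of simple linear regression fits in a few lines. The sketch below computes the least-squares slope, intercept and R-squared by hand in plain Python. The data (hours studied versus exam score) is made up, and this is not how Gretl or Statsmodels compute or present their results.

```python
# Made-up data: hours studied (x) and exam score (y).
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares estimates: slope = S_xy / S_xx, intercept = mean_y - slope * mean_x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# R-squared: the share of the variance in y that the fitted line explains.
predictions = [intercept + slope * xi for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"score = {intercept:.1f} + {slope:.1f} * hours, R-squared = {r_squared:.3f}")
```

      The slope is the point estimate you would read from a regression table: the expected change in score for one extra hour of study.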

    3. Statistics
    4. Channel: statslectures
      Level: Introductory
      Hours: 5.5
      Upvotes: 98%
      Requirements: None
      Contents
      • Sample vs Population
      • Sampling Methods
      • Bar Charts
      • Histograms
      • Mean, Mode, Median
      • Standard Deviation
      • Variance
      • Skewness
      • Binomial Distribution
      • Normal Distribution
      • Poisson Distribution
      • Z-scores
      • Combinations vs Permutations
      • Correlation
      • Linear Regression
      • Confidence Intervals
      • T-distribution
      • Type I and II error
      • Effect size
      • One-sample / Two-sample tests
      • Independent / dependent tests
      • ANOVA
      • Chi-Square Test
      Review

      Ever attended everlasting stats lectures at uni without having any clue of what’s actually going on? Then a new world opens up once you have watched these 86 extremely concise statistics videos. Theory is accompanied by concrete and understandable everyday examples. The teacher does not shy away from writing out entire examples, which makes the material very tangible. This may give you even more admiration for all the data science tools that make our lives so much easier.

      Tip: On the channel you will also find an SPSS playlist in which he demonstrates how analytics software can perform the complex operations from the other videos with only a few mouse clicks.
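
      After working a few examples by hand, you can check your answers with Python's built-in `statistics` and `math` modules. A minimal sketch with a made-up sample:

```python
import math
import statistics

sample = [4, 8, 8, 5, 10]  # made-up observations

mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Sample variance and standard deviation (dividing by n - 1).
variance = statistics.variance(sample)
stdev = statistics.stdev(sample)

# Z-score: how many standard deviations a value lies from the mean.
z_scores = [(value - mean) / stdev for value in sample]

# Combinations ignore order, permutations do not.
combinations = math.comb(5, 2)  # ways to choose 2 out of 5
permutations = math.perm(5, 2)  # ways to arrange 2 out of 5

print(mean, median, mode, variance, round(stdev, 3))
print([round(z, 2) for z in z_scores])
print(combinations, permutations)
```

      For the population versions (dividing by n), the same module offers `statistics.pvariance` and `statistics.pstdev`.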

  6. Machine Learning

    1. Machine Learning in Python with scikit-learn
    2. Channel: Data School
      Level: Introductory
      Hours: 7
      Upvotes: 98%
      Resources: Scikit learn videos, Pycon 2016 (Text ML)
      Requirements: Python (beginner), Pandas (beginner)
      Contents
      • Supervised/unsupervised learning definitions
      • scikit-learn vs R for machine learning
      • Jupyter Notebook
      • kNN algorithm (+ choosing number of neighbours)
      • Logistic Regression
      • Linear Regression (+ interpretation of coefficients)
      • Train / Test split
      • K-fold cross validation
      • Manual feature selection
      • Performance metrics (accuracy, Mean Absolute Error, Mean Squared Error, Root Mean Squared Error)
      • Grid search (+ RandomizedSearchCV)
      • Text Machine Learning (vectorization, Naive Bayes (high level theory + practice), parameter tuning (stopwords, n-grams, min/max word frequency))
      Review

      Characteristic of Kevin’s teaching style is a step-by-step approach using very detailed Jupyter Notebooks, which contain a lot of comments and easy-to-understand explanations and thus make excellent reference material (check out his Github page!). Because he explains every single line of code he types, you will fully understand what you are doing, which makes his videos accessible to newcomers. Although most videos stay at a basic level, plenty of additional resources (blogs/videos/books) are mentioned throughout the video series for further study.

      Tip: The last video in the series is a live recording from PyCon 2016; if you have watched the previous videos you can skip the first 24 minutes and start with part 2: Representing text as numerical data.
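
      To see what happens under the hood of the kNN algorithm and a train/test split, here is a plain-Python sketch. It deliberately does not use scikit-learn (in the videos you would use `KNeighborsClassifier` and `train_test_split` instead); the function `knn_predict` and the data points are made up for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2D data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

# Train/test split: shuffle the indices (fixed seed) and hold out 25% for testing.
indices = list(range(len(points)))
random.Random(0).shuffle(indices)
test_idx, train_idx = indices[:2], indices[2:]

train_points = [points[i] for i in train_idx]
train_labels = [labels[i] for i in train_idx]

# Accuracy: the share of held-out points that is predicted correctly.
correct = sum(
    knn_predict(train_points, train_labels, points[i]) == labels[i]
    for i in test_idx
)
accuracy = correct / len(test_idx)
print(f"Accuracy on the held-out points: {accuracy:.0%}")
```

      The model is only evaluated on points it has not seen during training, which is the whole purpose of the split.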

    3. Machine Learning
    4. Channel: Cognitive Class
      Level: Intermediate
      Hours: 1.5
      Upvotes: 100%
      Requirements: A high-level understanding of data science
      Contents
      • Supervised / unsupervised definitions
      • Machine Learning Applications
      • Statistical Modelling vs Machine Learning
      • K-Nearest Neighbours
      • Regression vs Classification
      • Decision Trees
      • Information Gain / Entropy
      • Random Forest
      • Bias-variance tradeoff
      • MAE, MSE, RMSE
      • Train-Test Split
      • K-fold Cross-Validation
      • Overfitting / Underfitting
      • K-Means Clustering
      • Hierarchical Clustering
      • Agglomerative vs Divisive Clustering
      • Dendrogram
      • Single Linkage Criteria
      • Complete Linkage Criteria
      • Average Linkage
      • Centroid Linkage
      • Density Based Clustering
      • Dimensionality Reduction
      Review

      This course is more on a conceptual level and goes beyond the scope of the previous recommendation. In particular, more attention is paid to unsupervised clustering methods such as K-Means Clustering, Hierarchical Clustering and Density Based Clustering. The course excels in its presentation style, which includes visual, animated slides. Some topics might be a bit more advanced than usual, which is why it has been categorized as an Intermediate level course.

      Tip: If you liked this series you can find more similar online courses at Big Data University!
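
      Entropy and information gain, the quantities a decision tree uses to choose a split, are easy to compute yourself. A minimal sketch, assuming made-up "yes"/"no" labels and two hypothetical helper functions `entropy` and `information_gain`:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Made-up example: 5 "yes" and 5 "no" labels, split into two groups.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

gain = information_gain(parent, split)
print(f"Parent entropy: {entropy(parent):.3f} bits")
print(f"Information gain of the split: {gain:.3f} bits")
```

      A split that separates the classes perfectly would have a gain equal to the parent entropy; a split that leaves the class mix unchanged would have a gain of zero.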

  7. Collaboration & Academic Writing

    1. Version Control with Git and Github
    2. Channel: Data School
      Level: Introductory
      Hours: 0.5
      Upvotes: 98%
      Requirements: Command line (see below)
      Contents
      • README.md
      • Creating a repository
      • Cloning a repository
      • Forking a repository
      • Committing changes
      • Pulling changes
      • Git ignore
      Review

      It should be clear by now that I am a huge fan of Kevin’s comprehensible tutorials, and this series is no exception. What I liked in particular is that he keeps such a complicated topic as version control from becoming overly complex.

    3. LaTeX
    4. Channel: Michelle Krummel
      Level: Introductory
      Hours: 4
      Upvotes: 98%
      Resources: LaTeX Source Code
      Requirements: None
      Contents
      • Line breaks
      • Math mode
      • Fractions
      • Tables
      • Equations
      • Ordered list
      • Unordered list
      • Text size
      • Sections
      • Subsections
      • Table of Contents
      • Macros
      • Images
      • Footnotes
      • Captions
      • References
      Review

      So far Microsoft Word or Google Docs may have been your go-to text processing tool. Although very user-friendly, those tools aren’t considered very effective for scientific writing in mathematical domains such as Data Science. And even if your paper does not contain any equation whatsoever, it can still be a time-saver to use LaTeX thanks to its automatic section and figure numbering (you don’t want to update all figure numbers manually after you inserted a new figure, do you?).

      As for the content, Michelle slowly builds up a LaTeX document from scratch with line-by-line, bite-sized instructions. The pace gradually goes up as you move through the series, with tutorial 14 as a great finale: it gives a realistic use case of LaTeX as a tool for doing Calculus homework.

      Tip: The last two tutorials can be considered optional (but interesting!) as they are not related to academic writing.

FAQ

There are plenty of "Top Data Science Course" rankings out there, how is this one any different?


Well, I have looked into plenty of those rankings myself, but to be honest most MOOC recommendations weren't really what I was looking for as a beginner. This guide is different in the sense that it mainly centres around practical YouTube tutorials taught by industry practitioners who know the science of teaching. That is a rare ability among programmers, especially the ability to describe the "why" in addition to the "how": simply typing out code is not that difficult, but explaining to others in plain English why you are doing something makes you a next-level instructor. Furthermore, all recommendations are accompanied by an overview of the contents and a review, so that you can get an idea of what to expect, which is also not very common for most rankings.

On a final note, the weekly email course has been designed because challenging online programming exercises are difficult to find online. The exercises that come with the course are beyond beginner level and can typically not be solved by a single line of code.

The curriculum does not include any R videos (only Python). Why is that?


Very sharp, you are absolutely right! One of the questions people new to the field of data science typically ask themselves is: should I learn Python or R? If you simply Google "Python vs R for Data Science" you will find an abundance of discussions on this topic. From what I've read Python usually comes out as the ultimate winner (especially for Deep Learning applications - a subset of Machine Learning that tries to mimic the human brain). That does not mean that you should neglect R of course, but as a beginner I think it's better to become very knowledgeable in a single programming language rather than be mediocre at two.

What makes a Data Science unicorn? 🦄


At the beginning of this academic year I interviewed the Academic Director of my university, prof. dr. Willem-Jan van der Heuvel, and that was also one of the questions I asked him. In summary he mentioned the following skills: "Analytical Skills" (e.g. Statistics & Probability), "Engineering Skills" (e.g. Programming but, for example, also distributed computing frameworks) and "Domain Knowledge" (e.g. a subversion expert). If you are interested, you can read the corresponding Medium article here.

How do I become a Data Science unicorn?


That is the logical follow-up question, obviously. First, let me say that I am not the one to answer that question. Instead, I will share the advice I got from data people way smarter and more experienced than me to answer a slightly rephrased question: What is the best way to learn data science?

If you look into the week-by-week schedule of data science (and coding) bootcamps, you will find that the last weeks of the program are usually dedicated to students' capstone projects. These are learning projects which entail all steps of the data science pipeline: from data collection to presenting to the end customer. Because of this scope, students experience all the tasks involved in an everyday and thus representative data science project. This typically implies you spend the majority of your time wrangling and cleaning the data (remember the often-quoted rule of thumb that data scientists spend about 80% of their time cleaning data). On that note, some folks aren't that keen on Kaggle competitions (i.e. machine learning model building contests) for learning purposes, since most data sets available on the platform have already been fully cleaned. But the main take-away here is to think of any topic you like (and in which you preferably have some kind of subject matter expertise) and work from there. After all, data is everywhere these days, so it should not be that difficult to think of intriguing research questions related to your interests.

On a personal note, I would add that while watching tutorials on YouTube you may sometimes get the impression that you master the material. I want to warn you against that false impression, because coding something from scratch is significantly harder than retyping (or even worse: copy-pasting) others' code. A better way to test your understanding is to apply your knowledge to an entirely different use case.

What does "Upvotes" mean and how is it calculated?


On Coursera, edX and Udemy course ratings are displayed as a star rating (between 0-5 stars). On YouTube there is a similar mechanism to measure user feedback: up- and downvotes. A derivative thereof, the upvote ratio: #upvotes/(#upvotes + #downvotes) * 100%, has been included for each of my recommendations (calculated for the first video of the series).

Note that I deliberately did not take into account the total number of views or the subscriber count, since there is a wealth of hidden gems out there which - as the name implies - are still relatively unknown. 💎
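
The upvote ratio formula above can be written as a small function. This is just a sketch: the function name `upvote_ratio` and the vote counts in the example are made up.

```python
def upvote_ratio(upvotes, downvotes):
    """Return the upvote ratio as a percentage: upvotes / (upvotes + downvotes) * 100."""
    total = upvotes + downvotes
    if total == 0:
        raise ValueError("The video has no votes yet, so the ratio is undefined.")
    return 100 * upvotes / total


# Made-up counts: 99 upvotes and 1 downvote.
print(f"{upvote_ratio(99, 1):.0f}%")
```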

What selection criteria did you take into account?


The following aspects have been considered: Communication Skills (making the complex simple, for example by providing concrete examples), Teaching Style (e.g. live demos for programming lessons, animated PowerPoints, pace, etc.), Required Level of Prior Knowledge (i.e. accessible to newcomers), Upvote Ratio (see point above), Availability of additional resources or reference material (Github, blogs, etc.).

The title says "Top Data Science Videos 2018" but most of the content originates from way before 2018, doesn't it?


Yeah again, you are right! Think of it as a catchy slogan ;-)

Where do I find the lecture notes of the Data Analytics for Engineers (2IAB0) tutorials?


Great to have you here, but you are probably looking for my Github repository which you can find over here. Simply download the zip-file (easiest way) or clone it to your local hard drive (for instructions on how to do that see the Git tutorials above).

But since you are already here, let me give you another bonus tip: carefully study the Python, pandas, matplotlib, SQL and hypothesis testing tutorials listed above. I can assure you they will be extremely helpful for assignment 1 and 2!

Why did you create this guide?


In the first place, because I could not find something similar on the world wide web (see first FAQ-question). And if you search for the topics listed on this page on YouTube yourself you will come to the conclusion that many recommendations do not show up in the top results. As these free resources have really helped me learn the very basics of data science I think it's worth sharing them. Second, it goes hand in hand with my motto: "Learn from the best".

Aside from those two motivations there is a practical reason too: it functions as a handy personal reference book for now and the years to come.

Is this guide updated regularly?


Indeed, it's a very dynamic list. For example, at the moment of writing I am taking online (and university) courses myself which means that there is a high probability that the number of free resources will soon increase even further. Sign up for the email course to get notified once new content has been added.

Can I contribute to this list and if so how?


Yes, you can! To add a recommendation to the list, fork this repository, update the README markdown file while maintaining the structure of the page and lastly create a pull request in which you briefly mention the change. Alternatively, send me an 💌 (see below).

How do I contact you?


Just shoot me an email at hoi@royklaassebos.nl

Built with ❤ in Eindhoven