This looks great! Thanks for sharing. Interestingly enough, judging from the table of contents, this book seems to take a more pragmatic (and welcome) approach: you write some Python code first, look at data visualisation techniques, etc., before delving into stats.
I haven't done the course yet; I've just found it. But, from the rationale video, the course seems to be more about weaving recurring fundamental data science concepts throughout, emphasizing one particular concept or technique in each chapter, so I guess it would make more sense to take it as a whole.
It is intended as a "glue" course, taken after CS fundamentals and before core data science courses like statistics, machine learning and databases, giving students context for what lies ahead and just enough to be dangerous and start doing data science stuff.
If this is what you are after, you may also want to consider CMU's "Practical Data Science", which seems to take a similar approach. It has videos, much more machine learning and big data, and is also very current, but it has much less statistics and doesn't have such a nice companion online book (though the notes look great): http://datasciencecourse.org
Both look like great DS intro courses from top universities; we are spoilt.
And then, also from Berkeley, there is "Data 8", which is intended for those who want an intro to data science, but don't have any programming or college math knowledge yet; it also has a similar online book with working links to Jupyter notebooks: http://data8.org/sp19/ (and videos: https://www.youtube.com/playlist?list=PLXbeRfilLvMoC3QZKxRrp...)
As I understand things, the Cartesian product (AKA the cross join) cannot be nicely depicted using Venn diagrams, you're right. However, Venn diagrams are a great way to depict the set logic that applies to the join keys of left, right, inner, and outer joins.
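To make that set logic concrete, here is a minimal sketch in plain Python. The key sets `left_keys` and `right_keys` are made-up data, not taken from the linked example, and the sketch only models which join keys survive each join type, not the row duplication that repeated keys would cause.

```python
from itertools import product

# Hypothetical join keys from two tables (illustrative data only).
left_keys = {"alice", "bob", "carol"}
right_keys = {"bob", "carol", "dave"}

# Keys that appear in the result of each join type.
inner_keys = left_keys & right_keys   # intersection: keys present in both tables
outer_keys = left_keys | right_keys   # union: keys present in either table
left_join_keys = left_keys            # every left key, matched or not
right_join_keys = right_keys          # every right key, matched or not

# A cross join ignores keys entirely and pairs every left row with every
# right row, which is why it has no natural Venn-diagram picture.
cross_pairs = list(product(sorted(left_keys), sorted(right_keys)))

print(sorted(inner_keys))
print(sorted(outer_keys))
print(len(cross_pairs))
```

The first four results correspond to regions of a two-circle Venn diagram; the cross join is a grid of pairs rather than a region.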
I thought a Cartesian (cross) product produced ordered output (tuples), with one element from each set?
I don't have any experience with data science, but my brain wants to apply linear algebra and set theory...
So, in the above linked example, to clean we would first do an intersect operation on user names to remove people who don't appear in each set.
Then, to put the tables together (to append emails to appropriate names) we do a cross product between the filtered sets (assuming the sets have been ordered).
Is my intuition correct? I also have zero experience with DBs.
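One way to test this intuition is a small sketch in plain Python. The tables `names` and `emails` and their contents are hypothetical, not the data from the linked example. It intersects on user names, takes the full cross product of the filtered rows, and then keeps only the pairs whose keys agree, which is the result an inner join on that key would give.

```python
from itertools import product

# Hypothetical tables as lists of (user, value) rows; all data is made up.
names = [("u1", "Ann"), ("u2", "Ben"), ("u3", "Cy")]
emails = [("u2", "ben@example.com"), ("u3", "cy@example.com"), ("u4", "dee@example.com")]

# Step 1: intersect on user names to drop users missing from either table.
shared = {user for user, _ in names} & {user for user, _ in emails}
names_f = [row for row in names if row[0] in shared]
emails_f = [row for row in emails if row[0] in shared]

# Step 2a: the cross product pairs EVERY remaining name row with EVERY
# remaining email row, including mismatched users.
cross = list(product(names_f, emails_f))

# Step 2b: keeping only pairs whose keys match gives the inner-join result.
joined = [(n[0], n[1], e[1]) for n, e in cross if n[0] == e[0]]

print(len(cross))
print(joined)
```

In this sketch the cross product alone is larger than the desired table because it includes mismatched pairs; it is the key-equality filter that lines emails up with names. If both filtered lists were sorted by user and keys were unique, `zip(names_f, emails_f)` would give the same pairing without the cross product.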
I'd assume it's because you have the --no-install-recommends flag on your apt-get call. Maybe something you're doing requires the recommended (but not strictly required) packages. I haven't tried it yet, but that's my assumption at first glance, so take it with a grain of salt.
I haven’t looked at the material yet, but I did try to read Deborah Nolan’s book Data Science in R, and it was a confounding experience. I remember thinking “the material in this book is so far from anything that I’ve ever heard described as ‘Data Science’ that it renders the phrase useless.”
I've never looked at Data Science in R, but Hadley Wickham's R for Data Science is great in my opinion. Really applicable, down to earth, and focused much more on the meat of data science (data manipulation and munging, visualization, relational data, and efficient programming) than the typical "fit a neural network to this idealized toy data set!"
Course design: https://youtu.be/HITIm3KoU2U
Course website: http://www.ds100.org/sp19/