# Data Science Roadmap

A curriculum covering Data Science topics relevant to Machine Learning, Bioinformatics, and Artificial Intelligence. Includes necessary background in mathematics and the sciences.

Suggestions:

- Do all courses/books in order
- Don’t skip anything. If you’re tempted to skip a course because you think you already know it, take the final; if you pass, skip it.
- Projects are listed at the end of each Level. Consider them assigned at the beginning of the Level and due at the end, so work on them alongside the books and courses as you progress.
- For any project, feel free to use Python or R. Python is introduced earlier in the guide, so you’ll likely use it almost exclusively for the earlier projects. Lean more heavily on R in the later projects to ensure you get substantial practice in both.
- Make a point of building your GitHub profile during your study so that over time you show a history of data science projects.

If you haven’t already, begin blogging about what you’re learning on Medium (or elsewhere). Try to focus on tutorials for topics you’ve learned well that might be helpful to others.

## Blogs

Bookmark these blogs and try to read at least one article from one of them per week.

- Data School
- Machine Learning Mastery
- No Free Hunch – Kaggle’s blog
- Pete Warden’s Blog
- The Unofficial Google Data Science Blog
- Data Science Central

## Level 0 – Prep

- Course: Question Everything: Scientific Thinking in Real Life
- Book: College Algebra or Course: College Algebra and Problem Solving or thorough knowledge of Algebra.
- Book: Precalculus or Course: Precalculus or Khan Academy or thorough knowledge of Precalculus
- Course: Introduction to Computer Science and Programming Using Python
- Course Series: Data Science Path on Cognitive Class
- Book: Think Python
- Project: Do all problems in Rosalind’s Python Village
- Project: Titanic: Machine Learning from Disaster
- Project: Complete one Bot Programming Competition on CodinGame

## Level 1 – Foundations

- Course: M001: MongoDB Basics – note: this course is offered only intermittently; sign up at the start of the Level, then proceed to the other courses and return to this one when it’s in session
- Course: M220P: MongoDB for Python Developers – note: this course is offered only intermittently; sign up at the start of the Level, then proceed to the other courses and return to this one when it’s in session
- Course: Introduction to Biology – The Secret of Life
- Course: Single Variable Calculus
- Course: Introduction to Probability and Statistics
- Course Series: Deep Learning Path on Cognitive Class
- Course: Introduction to Computational Thinking and Data Science
- Course: Data Science
- Book: Think Stats
- Course: Programming for the Web with JavaScript
- Course: Introduction to Solid State Chemistry
- Course: Multivariable Calculus
- Book: An Introduction to Statistical Learning
- Course: Principles of Biochemistry
- Book: The Elements of Statistical Learning
- Course: Intro to Machine Learning
- Project: Complete the Hackerrank Python Track
- Project: Do 10 problems (of your choice) on Rosalind
- Project: House Prices: Advanced Regression Techniques
- Project: Complete one competition of your choice from Crowd Analytix
- Project: Complete one Bot Programming Competition on CodinGame
- Project: Complete Deep Learning – TensorFlow on CodinGame

## Level 2 – Develop more expertise

- Course: MongoDB Performance – note: this course is offered only intermittently; sign up at the start of Level 2, then proceed to the other courses and return to this one when the next session opens
- Book: Biology
- Course: Linear Algebra
- Book: Think Bayes
- Course: Proteins: Biology’s Workforce
- Course: Mathematics for Computer Science
- Course: Python for Data Science
- Course: Data Visualization and D3.js
- Course: Statistics and Probability in Data Science using Python
- Book: Think DSP
- Course: DNA: Biology’s Genetic Code
- Course: Machine Learning
- Book: Think Complexity
- Course: Database Mini-Courses – take all mini-courses
- Course: Deep Learning
- Project: Do 20 problems (of your choice) on Rosalind
- Project: Digit Recognizer
- Project: Complete the Hackerrank Probability Challenges
- Project: Complete the Hackerrank Linear Algebra Foundations Challenges
- Project: Complete one competition of your choice from Crowd Analytix
- Project: Complete one Bot Programming Competition on CodinGame

## Level 3 – Add more rigor

- Course: Introduction to Algorithms (Python – requires book purchase) or Algorithms, Part I and Algorithms, Part II (Java)
- Course: Design and Analysis of Algorithms (Python – requires book purchase) or Analysis of Algorithms (Java)
- Book: The Art of R Programming
- Course: Mathematical Biostatistics Boot Camp 1
- Course: Mathematical Biostatistics Boot Camp 2
- Course Series: Big Data Path on Cognitive Class
- Course: Convex Optimization
- Book: R for Data Science
- Course: Probability: Basic Concepts & Discrete Random Variables
- Course Series: Hadoop Path on Cognitive Class
- Course: Probability: Distribution Models & Continuous Random Variables
- Course: Introduction to Mechanics, Part 1
- Course: Electricity & Magnetism, Part 2
- Course: Statistics for Applications
- Course: Intro to Hadoop and MapReduce
- Course: Differential Equations
- Project: Do 30 problems (of your choice) on Rosalind
- Project: Complete one competition of your choice on Kaggle
- Project: Complete the Hackerrank Algorithms Challenges
- Project: Complete one competition of your choice from Crowd Analytix
- Project: Complete one competition of your choice from Analytics Vidhya

## Level 4 – Practical applications

- Book: Machine Learning with R
- Course: Statistics and R for the Life Sciences
- Course: Introduction to Linear Models and Matrix Algebra
- Book: The Quest for Artificial Intelligence
- Course: Statistical Inference and Modeling for High-throughput Experiments
- Course: High-Dimensional Data Analysis
- Course: Introduction to Bioconductor: Annotation and Analysis of Genomes and Genomic Assays
- Course: High-performance Computing for Reproducible Genomics
- Course: Case Studies in Functional Genomics
- Course: Quantum Mechanics for Everyone
- Course: Artificial Intelligence (AI)
- Course: Machine Learning
- Book: Multiagent Systems
- Course: Robotics
- Course: Animation and CGI Motion
- Project: Do 30 problems (of your choice) on Rosalind
- Project: Complete two competitions of your choice on Kaggle
- Project: Complete the Hackerrank Artificial Intelligence Challenges
- Project: Complete one competition of your choice from Crowd Analytix
- Project: Complete one competition of your choice from Analytics Vidhya
- Project: Complete one competition of your choice from Driven Data

## Level 5 – Bonus Round – Advanced

- Course: Topics in Mathematics of Data Science
- Book: Bayesian Methods for Hackers
- Book: Bayesian Methods in the Search for MH370
- Course: Mathematics of Machine Learning
- Book: Mining of Massive Datasets
- Book: Informatics in the Future
- Course: Discrete Stochastic Processes
- Book: Bisociative Knowledge Discovery
- Course: Dynamic Systems and Control
- Book: New Horizons for a Data-Driven Economy
- Book: The Challenge of Chance

## Attribution

- Many of the courses listed closely mimic the list from Open Source Society University – Computer Science
- Many of the topics added to augment that list were inspired by Google Interview University
- Many of the projects were inspired by (or are directly taken from) Free Code Camp, The Odin Project and Udacity
