Getting Started with Data Science

Photo by Carlos Muza on Unsplash

INTRODUCTION

Data science is arguably the most sought-after career of this century. In today’s high-tech world, everyone has pressing questions that must be answered by “big data”. It is also a very wide field, which can make learning data science overwhelming and leave a beginner with many questions about where to start.

1.) What/Who is a Data Scientist?

The first step is knowing what a data scientist is and does. A data scientist starts with a question and then typically works through these steps:

  • Gather data that might help you to answer that question
  • Clean the data
  • Explore, analyze, and visualize the data
  • Build and evaluate a machine learning model
  • Communicate results

2.) Choose a Programming Language

Python and R are both great choices as programming languages for data science. R tends to be more popular in academia, and Python tends to be more popular in industry, but both languages have a wealth of packages that support the data science workflow. Each also has a great community.

3.) Learn data analysis, manipulation, and visualization

If you choose to work with Python, you will want to learn pandas and NumPy. pandas provides a high-performance data structure (called a “DataFrame”). It includes tools for reading and writing data, handling missing data, filtering data, cleaning messy data, merging datasets, visualizing data, and much more. Learning pandas will increase your efficiency when working with data.
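As a rough illustration of the pandas tasks listed above (handling missing data, filtering, merging), here is a minimal sketch. The two small tables, their column names, and their values are invented for this example and do not come from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, made up purely for illustration.
sales = pd.DataFrame(
    {
        "city": ["Lagos", "Abuja", "Lagos", "Kano", "Abuja"],
        "units": [10, np.nan, 7, 3, 5],
    }
)
regions = pd.DataFrame(
    {
        "city": ["Lagos", "Abuja", "Kano"],
        "region": ["South West", "North Central", "North West"],
    }
)

# Handle missing data: treat the missing unit count as 0.
sales["units"] = sales["units"].fillna(0)

# Filter data: keep only sales of at least 5 units.
large_sales = sales[sales["units"] >= 5]

# Merge datasets: attach the region that each city belongs to.
merged = large_sales.merge(regions, on="city", how="left")

# Summarize: total units sold per region.
units_by_region = merged.groupby("region")["units"].sum()
print(units_by_region)
```

Filling missing values with 0 is only one possible choice. Depending on the data, dropping those rows or filling with an average may make more sense.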

4.) Learn machine learning

“Machine learning models” can be used to predict the future or automatically extract insights from data. Building them is a very interesting part of data science.
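To make this concrete, here is a minimal sketch of building a model, assuming Python and the scikit-learn library. The iris dataset bundled with scikit-learn and the choice of logistic regression are assumptions made for illustration, not a recommendation for any particular problem.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the rows so the model is checked on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple classifier on the training rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict labels for the held-out rows and measure accuracy on them.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```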

5.) Understand machine learning in more depth

Machine learning is a complex field. Although scikit-learn provides the tools you need to do effective machine learning, it doesn’t directly answer many important questions:

  • How do I interpret the results of my model?
  • How do I evaluate whether my model will generalize to future data?
  • How do I select which features should be included in my model?
  • And so on…

A great book for going deeper is Pattern Recognition and Machine Learning by Christopher M. Bishop. It dives deep into pattern recognition and machine learning, and it deals with tough topics that require at least some knowledge of multivariate calculus, basic linear algebra, and data science. It is an excellent book for really understanding pattern recognition.
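Two of the questions above, whether a model will generalize and which features matter, can at least be approached with tools that scikit-learn provides. The sketch below is one possible starting point, not a complete answer. The iris dataset and the small decision tree are assumptions chosen only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X, y = data.data, data.target

model = DecisionTreeClassifier(max_depth=3, random_state=0)

# Generalization: 5-fold cross-validation trains on four folds and scores
# on the fifth, rotating so that every row is used for testing once.
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean accuracy: {scores.mean():.2f} (std {scores.std():.2f})")

# Feature selection: a fitted tree reports how much each feature
# contributed to its splits. This is a rough signal, not a final verdict.
model.fit(X, y)
importances = model.feature_importances_
for name, importance in zip(data.feature_names, importances):
    print(f"{name}: {importance:.2f}")
```

A large gap between the cross-validated score and the score on the training data is a common sign that a model is overfitting.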

6.) Focus on practical applications and not just theory

  • Solve problems with data science: find out about real-life problems that can be solved with data science and work on them. You can find a list of interesting problems here.
  • Kaggle and Zindi competitions are a great way to practice data science without coming up with the problem yourself. Don’t worry about your rankings; just focus on learning something new with every competition.
  • If you create your own data science projects, you should share them on GitHub and include writeups. That will help to show others that you know how to do reproducible data science.

7.) Join a community of data scientists around you

The importance of joining a community cannot be overemphasized. Why? Because a community keeps you motivated. Starting out in a new field may seem overwhelming when you do it alone, but having people around you who are doing the same thing helps you keep going.

8.) Keep learning

To keep learning, surround yourself with every source of knowledge you can find, from books and blog posts to video tutorials, online communities, and leading data scientists you follow on social media. Some places to look:

  1. StackExchange
  2. Reddit
  3. KDnuggets
  4. Towards Data Science on Medium
  5. DataViz
  6. Quora

Summary

Your data science journey has only begun!

Data Scientist | Python Developer | Technical Writer