Getting Started with Data Science

Grace Omojola
4 min read · May 27, 2020
Photo by Carlos Muza on Unsplash

INTRODUCTION

Data science is arguably the most sought-after career of this century. In today’s high-tech world, everyone has pressing questions that must be answered by “big data”. It is also a very wide field, which can make learning data science overwhelming, with so many questions to be asked.

That is why I wrote this guide. The idea is to offer a simple guide that can help you start on the path to learning data science.

1.) What/Who is a Data Scientist?

The first step is knowing who a data scientist is.

Data science is a complex and often confusing field, and it involves several skills, which makes defining the profession a constant struggle.

A simple definition: data science is the process of asking interesting questions and then answering those questions using data.

Essentially, a data scientist is someone who gathers and analyzes data with the aim of answering a question (reaching a conclusion).

Generally speaking, a data scientist’s workflow looks like this:

  • Ask a question
  • Gather data that might help you to answer that question
  • Clean the data
  • Explore, analyze, and visualize the data
  • Build and evaluate a machine learning model
  • Communicate results
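
To make these steps concrete, here is a minimal sketch in plain Python. The question, the study-hours records, and the hand-rolled straight-line fit are all made up for illustration; a real project would use real data and, most likely, a library such as scikit-learn for the modeling step.

```python
from statistics import mean

# 1. Ask a question: do more study hours go with higher exam scores?
# 2. Gather data (hypothetical records; None marks a missing value).
records = [
    {"hours": 1, "score": 52}, {"hours": 2, "score": 58},
    {"hours": 3, "score": None}, {"hours": 4, "score": 71},
    {"hours": 5, "score": 74}, {"hours": 6, "score": 83},
]

# 3. Clean the data: drop rows with a missing score.
clean = [r for r in records if r["score"] is not None]

# 4. Explore: look at a simple summary statistic.
hours = [r["hours"] for r in clean]
scores = [r["score"] for r in clean]
print("average score:", mean(scores))

# 5. Build a model: fit score = slope * hours + intercept by least squares,
#    using every row except the last, which is held out for evaluation.
train_x, train_y = hours[:-1], scores[:-1]
mx, my = mean(train_x), mean(train_y)
slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sum(
    (x - mx) ** 2 for x in train_x
)
intercept = my - slope * mx

#    Evaluate: compare the prediction with the held-out score.
prediction = slope * hours[-1] + intercept
error = abs(prediction - scores[-1])

# 6. Communicate results.
print(f"each extra hour adds about {slope:.1f} points")
print(f"held-out prediction was off by {error:.1f} points")
```

Holding out one row is far too little for a real evaluation; it is only there to show where that step sits in the workflow.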

2.) Choose a Programming Language

Python and R are both great choices as programming languages for data science. R tends to be more popular in academia, and Python tends to be more popular in industry, but both languages have a wealth of packages that support the data science workflow. Python also has a great community.

I generally prefer Python.

Note: You don’t need to become a Python expert to move on (that is, to begin a career in data science). Instead, you should focus on mastering the following: data types, data structures, imports, functions, conditional statements, comparisons, loops, and comprehensions. Everything else can wait until later!
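
As a rough illustration of what those fundamentals look like together, here is a small sketch. The names and values are invented for the example.

```python
import math  # import: bring in a standard-library module

# Data types and data structures
name = "Ada"                              # str
ages = [23, 35, 17, 41]                   # list of int
person = {"name": name, "age": ages[0]}   # dict


# Function with a conditional statement and a comparison
def describe(age):
    """Return a label for an age."""
    if age >= 18:
        return "adult"
    return "minor"


# Loop: build a list one item at a time
labels = []
for age in ages:
    labels.append(describe(age))

# Comprehension: the same result as the loop, in one line
labels_again = [describe(age) for age in ages]

print(labels)
print(math.sqrt(person["age"] + 2))
```

If you can read and write code like this comfortably, you likely know enough Python to move on to the next step.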

3.) Learn data analysis, manipulation, and visualization

If you choose to work with Python, you will want to learn pandas and NumPy. The pandas library provides a high-performance data structure (called a “DataFrame”). It includes tools for reading and writing data, handling missing data, filtering data, cleaning messy data, merging datasets, visualizing data, and so much more. Learning pandas will increase your efficiency when working with data.
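
Here is a short sketch of a few of those pandas tools in action: handling missing data, filtering, and merging. The store names and revenue figures are hypothetical, and filling a gap with the column mean is just one of several possible choices.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data; np.nan marks a missing value.
sales = pd.DataFrame({
    "store": ["A", "B", "C", "D"],
    "revenue": [1200.0, np.nan, 950.0, 1800.0],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C", "D"],
    "region": ["north", "south", "north", "south"],
})

# Handle missing data: fill the gap with the column mean.
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].mean())

# Filter data: keep stores with revenue above 1000.
big = sales[sales["revenue"] > 1000]

# Merge datasets on the shared "store" column.
merged = sales.merge(regions, on="store")

# Summarize: total revenue per region.
totals = merged.groupby("region")["revenue"].sum()
print(totals)
```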

There are other tools, APIs, and software that can help with data analysis, manipulation, and visualization, such as SQL (Structured Query Language), Tableau (data visualization software), Excel (a powerful data visualization and analysis tool), and many others.
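
To get a feel for SQL without installing a database server, you can try it from Python using the built-in sqlite3 module. This is only a sketch: the "orders" table and its rows are made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 20.0), ("bob", 35.5), ("ann", 14.5)],
)

# Total spent per customer, largest total first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```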

4.) Learn machine learning

“Machine learning models” can be used to predict the future or automatically extract insights from data. Building them is a very interesting part of data science.

For machine learning in Python, you should learn how to use the scikit-learn library.
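
A first scikit-learn model can be quite short. The sketch below uses the iris dataset that ships with scikit-learn and a logistic regression classifier; both are my choices for illustration, not the only reasonable starting point.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: flower measurements (X) and species (y).
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the rows so the model is tested on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train the model, then measure accuracy on the held-back rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

The same fit-then-score pattern applies to most scikit-learn models, so swapping in a different one usually means changing only the line that creates the model.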

5.) Understand machine learning in more depth

Machine learning is a complex field. Although scikit-learn provides the tools you need to do effective machine learning, it doesn’t directly answer many important questions:

  • How do I know which machine learning model will work “best” with my dataset?
  • How do I interpret the results of my model?
  • How do I evaluate whether my model will generalize to future data?
  • How do I select which features should be included in my model?
  • And so on…
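
As one example, the questions about picking a model and about generalizing to future data are often approached with cross-validation. The sketch below compares two candidate models on the built-in iris dataset. The two models and the five folds are arbitrary choices for illustration, and a higher score here does not by itself settle which model is “best”.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Two candidate models to compare (arbitrary picks for this sketch).
candidates = {
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# 5-fold cross-validation: train on four fifths of the data, score on the
# remaining fifth, and repeat so every row is used for testing once.
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {results[name]:.3f}")
```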

If you want to become great at machine learning, you need to be able to answer those questions, which requires both experience and further study. Here are some resources to help you along that path:

  • My top recommendation is to read An Introduction to Statistical Learning (PDF). It will help you to gain both a theoretical and practical understanding of many important methods for regression and classification, without requiring a background in advanced mathematics.
  • Another great book is Pattern Recognition and Machine Learning by Christopher M. Bishop. It dives deep into the mysterious world of pattern recognition and machine learning. The book deals with tough topics that require at least some knowledge of multivariate calculus, basic linear algebra, and data science. It is a great book for really driving pattern recognition home!

6.) Focus on practical applications and not just theory

  • Solve problems with data science. Find out about real-life problems that can be solved with data science and get to work on them. You can find a list of interesting problems here.
  • Kaggle and Zindi competitions are a great way to practice data science without having to come up with the problem yourself. Don’t worry about your rankings; just focus on learning something new with every competition.
  • If you create your own data science projects, you should share them on GitHub and include writeups. That will help to show others that you know how to do reproducible data science.

7.) Join a community of data scientists around you

The value of joining a community cannot be overemphasized. Why? Because a community keeps you motivated. Starting out in a new field may seem a bit overwhelming when you do it alone, but having people around you who are doing the same will help you keep going.

8.) Keep learning

To keep learning, you have to surround yourself with every source of knowledge you can find, from books and blog posts to video tutorials, online communities, and top data scientists you can follow on social media.

Read about data science every day and make it a habit to stay up to date with recent developments.

Here is a list of data scientists that you can follow.

Online forums to check out

  1. Analytics Vidhya
  2. StackExchange
  3. Reddit
  4. KDnuggets
  5. Towards Data Science on Medium
  6. DataViz
  7. Quora

Summary

Your data science journey has only begun!

There is so much to learn in the field of data science that it would take more than a lifetime to master. Just remember: you don’t have to master it all to launch your data science career. You just have to get started!

Grace Omojola

Data Scientist | Python Developer | Technical Writer