Hands-on tutorial for Pandas and data.table libraries

Photo by NordWood Themes on Unsplash

Python and R are the two key players in the data science ecosystem. Both of these programming languages offer a rich selection of highly useful libraries.

When it comes to data analysis and manipulation, two libraries stand out: “data.table” for R and Pandas for Python.

I have been using both, but I cannot really declare one superior to the other. Although I personally prefer “data.table”, I haven’t come across any task that cannot be done with both.

In this article, I will walk you through 10 typical data analysis operations using Pandas…

…or what good data scientists have in common.

Photo by Leon on Unsplash

This article is built around my experience of working with very good data scientists. I do not claim to be one of them as of today. However, I keep working and studying to become one.

I’m not in a position to judge whether a data scientist is good or not. What follows reflects my observations of the common practices and skills of well-performing data scientists.

In this sense, the title of the article might be “What Good Data Scientists Have in Common”.

Learning from others is a highly valuable skill…

Efficient, organized, and elegant.

Photo by Sigmund on Unsplash

Real-life data is usually messy. It requires a lot of preprocessing to be ready for use. Pandas, one of the most widely used data analysis and manipulation libraries, offers several functions to preprocess raw data.

In this article, we will focus on one particular function that organizes multiple preprocessing operations into a single one: the pipe function.

When it comes to software tools and packages, I learn best by working through examples. I keep this in mind when creating content. I will do the same in this article.

Let’s start by creating a data frame with mock data.

import…
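A minimal sketch of how pipe can chain several preprocessing steps into a single expression. The mock data and the helper functions (drop_missing, strip_whitespace, to_centimeters) are hypothetical and not taken from the original article; the only requirement pipe imposes is that each function accepts a data frame as its first argument and returns one.

```python
import numpy as np
import pandas as pd

# Hypothetical mock data with a missing name, a missing height, and stray spaces
df = pd.DataFrame({
    "name": ["Jane", "John", "Ashley", "Mike", "Emily", None],
    "height": [1.65, 1.78, np.nan, 1.83, 1.60, 1.72],
    "city": [" Houston", "Dallas ", "Austin", "Houston", " Dallas", "Austin"],
})

# Hypothetical preprocessing steps: each takes a data frame and returns a new one
def drop_missing(data):
    return data.dropna()

def strip_whitespace(data, column):
    data = data.copy()  # avoid modifying the caller's frame
    data[column] = data[column].str.strip()
    return data

def to_centimeters(data, column):
    data = data.copy()
    data[column] = (data[column] * 100).round().astype(int)
    return data

# pipe() passes the frame as the first argument; keyword arguments go along with it
clean = (
    df.pipe(drop_missing)
      .pipe(strip_whitespace, column="city")
      .pipe(to_centimeters, column="height")
)
print(clean)
```

Because every step returns a new data frame, the chain reads top to bottom in the order the operations are applied, instead of being nested inside out as in to_centimeters(strip_whitespace(drop_missing(df), "city"), "height").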

Which took me a long time to realize.

Photo by NeONBRAND on Unsplash

I would like to start by stating a basic truth, in case you have not realized it by now: data science is an extremely broad field.

Data science can be applied to any business or industry where we can collect data. Besides, advancements in data-related technology have made it easier than ever to collect, process, store, and transfer data. Thus, it is safe to say that data science applications will cover a broader scope in the future.

Although data science is ubiquitous, its applications differ greatly across domains. It would be an uphill battle to learn about all…

Practical guide with examples

Photo by Gaëtan Werp on Unsplash

Data structures are an essential part of any programming language. How you store and manage data is one of the key factors for creating efficient programs.

Python has 4 built-in data structures:

  • List
  • Set
  • Tuple
  • Dictionary

They differ in how they store and give access to data. These differences matter because they determine which structure fits a particular task best. The ways you interact with and manipulate these structures also differ.
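A short comparison of the four built-in structures, using made-up values, shows the main differences: lists and tuples keep order and duplicates, sets drop duplicates, dictionaries map keys to values, and tuples cannot be changed after creation.

```python
mylist = [1, 2, 2, "a"]     # ordered, mutable, allows duplicates
mytuple = (1, 2, 2, "a")    # ordered, immutable, allows duplicates
myset = {1, 2, 2, "a"}      # unordered, mutable, duplicates removed
mydict = {"a": 1, "b": 2}   # key-value pairs, keys are unique

mylist.append(3)   # lists can grow in place
myset.add(3)       # sets can grow too, but only keep unique values
mydict["c"] = 3    # add a new key-value pair

try:
    mytuple[0] = 10  # item assignment is not allowed on tuples
except TypeError as err:
    print("tuple is immutable:", err)

print(mylist)       # [1, 2, 2, 'a', 3]
print(len(myset))   # 4
print(mydict["c"])  # 3
```

The set is not printed directly because its iteration order is not guaranteed; its length shows that the duplicate 2 was dropped.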

List

List is a collection of objects, represented in square brackets.

mylist = [1, 2, "a", True]
  • Lists can be used for storing objects with any…
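Continuing with the list defined above, a few common operations illustrate that lists are indexed from 0, support slicing, and can be modified in place:

```python
mylist = [1, 2, "a", True]  # the list from the text: mixed types are allowed

print(mylist[0])    # 1 (indexing starts at 0)
print(mylist[-1])   # True (negative indices count from the end)
print(mylist[1:3])  # [2, 'a'] (slicing returns a new list)

mylist[2] = "b"     # lists are mutable: replace an item in place
mylist.append(3.5)  # add an item to the end
mylist.pop(0)       # remove and return the item at position 0
print(mylist)       # [2, 'b', True, 3.5]
```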

This is how you contribute to a project

Photo by Chang Duong on Unsplash

After a long period of hard work and dedication, you have landed your first job as a data scientist. The orientation and getting-familiar-with-the-environment period is over. You are now expected to work on real-life projects.

You are asked to write a function that performs a particular task in a project. Your function will be part of an existing project that is currently running.

You cannot just write the function in your local working environment and share it by email. It should be implemented in the project. …

A practical Pandas tutorial.

Photo by Andre Taissin on Unsplash

Pandas provides plenty of functions for efficient data analysis and manipulation. In this article, we will focus on Pandas functions for a particular data manipulation operation.

The core data structure of Pandas is the data frame, which consists of labelled rows and columns. The index of a row or column can be considered as its address.

We can use the indices to access rows in a data frame. Although columns are mostly accessed by their names, it is possible to use column indices as well. Integer positions for both rows and columns start from 0.
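A minimal sketch of the two ways of addressing rows and columns, using a hypothetical data frame with string row labels: iloc selects by integer position (starting from 0), while loc selects by label, so the two differ whenever the labels are not 0, 1, 2, and so on.

```python
import pandas as pd

# Hypothetical sample data frame with custom row labels
df = pd.DataFrame(
    {"name": ["Jane", "John", "Ashley"], "age": [26, 31, 28]},
    index=["a", "b", "c"],
)

print(df.iloc[0])            # first row, selected by integer position
print(df.iloc[1, 1])         # row position 1, column position 1 -> 31
print(df.loc["c", "name"])   # row label "c", column name "name" -> Ashley
print(df["age"].tolist())    # column accessed by its name -> [26, 31, 28]
```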

We will go over 4 Pandas…

Everything in data science starts from data

Photo by Mockup Graphics on Unsplash

There are several software tools and packages in the data science ecosystem. These tools speed up routine processes and help us manage, explore, and analyze data.

Whatever tool you use or whatever project you work on, everything in data science starts from data. Without proper data, your data products are likely to fail.

When studying data science and practicing with software tools, it can be a challenge to find data to play with. Although there are several free data resources online, they may not always fit your needs.

In this article, we will generate mock sales data using the…
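The name of the library used in the article is cut off here, so the following is only a sketch under an assumption: it generates mock sales data with NumPy’s random number generator and a Pandas data frame. The column names and value ranges are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # fixed seed so the data is reproducible

n = 1000
# Hypothetical columns for daily mock sales records
sales = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=n, freq="D"),
    "store": rng.choice(["A", "B", "C"], size=n),
    "product_code": rng.integers(100, 200, size=n),  # 100..199
    "quantity": rng.integers(1, 10, size=n),          # 1..9
    "price": rng.uniform(5, 50, size=n).round(2),
})
sales["revenue"] = sales["quantity"] * sales["price"]
print(sales.head())
```

Fixing the seed makes the mock data identical on every run, which helps when the same data set is reused across examples.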

Discover its full potential

Photo by Daphné Be Frenchie on Unsplash

Pandas is a widely-used data analysis and manipulation library. It provides numerous functions and methods to perform typical operations simply and efficiently.

A typical task in data analysis is filtering data points (or observations). When working with tabular data, a data point is represented by a row. We sometimes need to filter rows based on the values of some features (or columns).

There are several Pandas methods for filtering data points. In this article, we will focus on one of them: the query function.

Let’s first import libraries and create a sample data frame.

import numpy…
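A minimal sketch of filtering with query, assuming a small hypothetical data frame rather than the article’s original one. The condition is written as a string, and local variables can be referenced with the @ prefix.

```python
import pandas as pd

# Hypothetical sample data frame
df = pd.DataFrame({
    "group": ["A", "B", "A", "C", "B"],
    "value": [12, 7, 25, 3, 18],
})

# Rows where value is greater than 10 (index 0, 2, 4)
print(df.query("value > 10"))

# Conditions can be combined with and / or (index 0)
print(df.query("group == 'A' and value < 20"))

# Reference a local variable with @ (index 2, 4)
threshold = 15
print(df.query("value > @threshold"))
```

The same filters could be written with boolean masks, such as df[df["value"] > 10]; query keeps longer conditions more readable because the column names do not have to be repeated with the data frame name.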

I mastered Python and SQL but am I ready?

Photo by Leon on Unsplash

Data science has gained tremendous popularity in recent years. The ever-increasing ability to collect, transfer, store, and process data is a significant factor in the prevalence of data science.

More and more businesses are able to create value out of data. They apply data science techniques or data-oriented strategies to improve their processes. Data-driven business decisions have also proven to be highly efficient and accurate.

As a result, a great number of people are changing careers to become data scientists. …

Soner Yıldırım

Writing about Data Science, AI, ML, DL, Python, SQL, Stats, Math | linkedin.com/in/soneryildirim/ | twitter.com/snr14
