What I plan to write in this article is built around my experience of working with very good data scientists. I do not claim that I’m one of them as of today. However, I keep working and studying to become one.
I’m not in a position to declare or evaluate a data scientist as good or not good. What follows are my observations of the common practices and skills of well-performing data scientists.
In this sense, the title of the article might be “What Good Data Scientists Have in Common”.
Learning from others is a highly valuable skill…
Real-life data is usually messy. It requires a lot of preprocessing to be ready for use. Pandas, one of the most widely used data analysis and manipulation libraries, offers several functions to preprocess raw data.
In this article, we will focus on one particular function that organizes multiple preprocessing operations into a single one: the pipe function.
When it comes to software tools and packages, I learn best by working through examples. I keep this in mind when creating content. I will do the same in this article.
Let’s start by creating a data frame with mock data.
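A minimal sketch of what such a pipeline might look like. The mock data, the column names, and the three helper functions (`drop_missing`, `remove_outliers`, `to_category`) are invented here for illustration and are not taken from the article; only `DataFrame.pipe` itself is the Pandas feature being shown.

```python
import numpy as np
import pandas as pd

# Mock data: column names and values are made up for this example.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "price": [10.0, 12.5, np.nan, 11.0, 250.0, 9.5],
    "category": ["a", "b", "a", None, "b", "a"],
})

def drop_missing(frame):
    """Drop rows that contain any missing value."""
    return frame.dropna()

def remove_outliers(frame, column, upper):
    """Keep rows whose value in `column` is at most `upper`."""
    return frame[frame[column] <= upper]

def to_category(frame, column):
    """Return a copy with `column` converted to the category dtype."""
    frame = frame.copy()
    frame[column] = frame[column].astype("category")
    return frame

# Each pipe call passes the current data frame as the first argument
# of the given function, followed by any extra arguments.
processed = (
    df.pipe(drop_missing)
      .pipe(remove_outliers, "price", 100)
      .pipe(to_category, "category")
)
print(processed)
```

Because each helper returns a new data frame, the original `df` is left unchanged and the steps can be reordered or reused independently.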
I would like to start with stating a ground truth just in case you have not realized by now: Data science is an extremely broad field.
Data science can be applied to any business or industry where we can collect data. Besides, the advancements in data-related technology have made it easier than ever to collect, process, store, and transfer data. Thus, it is safe to say that data science applications will cover a broader scope in the future.
Although data science is ubiquitous, its applications differ greatly in different domains. It would be an uphill battle to learn about all…
Data structures are an essential part of any programming language. How you store and manage data is one of the key factors for creating efficient programs.
Python has 4 built-in data structures: list, set, tuple, and dictionary.
They all have different features in terms of storing and accessing data. These differences matter because what fits best for a particular task depends on them. How you can interact with or manipulate these data structures is also different.
A list is a collection of objects, written inside square brackets.
mylist = [1, 2, "a", True]
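For comparison, here is a small sketch that sets the list next to the other three built-in structures. The variable names other than `mylist` and all of the values are illustrative assumptions.

```python
mylist = [1, 2, "a", True]    # list: ordered, mutable, allows duplicates
mytuple = (1, 2, "a", True)   # tuple: ordered, immutable
myset = {1, 2, "a"}           # set: unordered, holds unique items only
mydict = {"a": 1, "b": 2}     # dictionary: maps keys to values

# Lists can be changed in place and accessed by position.
mylist.append(3.5)
first = mylist[0]

# Tuples cannot be changed after creation.
try:
    mytuple[0] = 99
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

# Adding an item that is already in a set has no effect.
myset.add(2)

# Dictionary values are looked up by key, not by position.
value = mydict["b"]
```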
This is how you contribute to a project
After a long period of hard work and dedication, you have landed your first job as a data scientist. The orientation and getting-familiar-with-the-environment period is over. You are now expected to work on real-life projects.
You are assigned a task to write a function that performs a particular task in a project. Your function will be a part of an existing project that is currently running.
You cannot just write the function in your local working environment and share it by email. It should be implemented in the project. …
Pandas provides plenty of functions for efficient data analysis and manipulation. In this article, we will focus on the Pandas functions for one particular data manipulation operation.
The core data structure of Pandas is the data frame, which consists of labelled rows and columns. The index of a row or column can be considered as its address.
We can use the indices to access rows in a data frame. Although the columns are mostly accessed via their names, it is possible to use column indices as well. Both the column and row indices start from 0.
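A short sketch of index-based access, assuming a small data frame whose column names and values are made up for this example. It uses `loc` for label-based access and `iloc` for position-based access.

```python
import pandas as pd

# Illustrative data; the default row index is 0, 1, 2.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "age": [28, 35, 41],
    "city": ["Oslo", "Rome", "Lima"],
})

# Row at position 0, returned as a Series.
first_row = df.iloc[0]

# The same cell reached by column name and by column position.
by_label = df.loc[0, "age"]
by_position = df.iloc[0, 1]

# All rows of the column at position 1 (the "age" column).
second_col = df.iloc[:, 1]
```

With the default integer row index, row labels and row positions coincide, which is why `loc[0, ...]` and `iloc[0, ...]` point at the same row here.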
We will go over 4 Pandas…
There are several software tools and packages in the data science ecosystem. These tools accelerate routine processes and help us manage, explore, and analyze data.
Whatever tool you use or whatever project you work on, everything in data science starts from data. Without proper data, your data products are likely to fail.
When studying data science and practicing software tools, it sometimes becomes a challenge to find data to play with. Although there are several free data resources online, they may not always fit your needs.
In this article, we will generate mock sales data using the…
Pandas is a widely-used data analysis and manipulation library. It provides numerous functions and methods to perform typical operations simply and efficiently.
A typical task in data analysis is filtering data points (or observations). With tabular data, a data point is represented by a row. We sometimes need to filter rows based on some feature (or column) values.
There are several Pandas methods for filtering data points. In this article, we will focus on one of them: the query method.
Let’s first import libraries and create a sample data frame.
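A minimal sketch of that setup followed by two filters. The data frame, its column names, and the `threshold` variable are assumptions made for illustration.

```python
import pandas as pd

# Sample data frame with made-up values.
df = pd.DataFrame({
    "group": ["A", "B", "A", "C", "B", "A"],
    "value": [12, 7, 25, 3, 18, 9],
    "score": [0.4, 0.9, 0.7, 0.2, 0.5, 0.8],
})

# The filter is written as a string; conditions can be combined with and/or.
high_a = df.query("group == 'A' and value > 10")

# A Python variable can be referenced inside the expression with @.
threshold = 0.6
above = df.query("score > @threshold")

print(high_a)
print(above)
```

Both calls return new data frames and keep the original row labels, so `df` itself is not modified.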
Data science has gained tremendous popularity in recent years. The ever-increasing ability to collect, transfer, store, and process data is a significant factor in the prevalence of data science.
More and more businesses are able to create value out of data. They apply data science techniques or data-oriented strategies to improve their processes. Data-based business decisions are also proven to be highly efficient and accurate.
As a result, a great number of people are changing careers to become data scientists. …
Python and R are the two dominant programming languages in the data science ecosystem. Both have many libraries that offer efficient and simple methods to perform data analysis tasks.
In this article, we will focus on the data.table package for R. The examples will demonstrate typical data analysis and manipulation tasks on tabular data.
The examples use the Melbourne housing dataset available on Kaggle. I created a data table that contains a subset of columns from this dataset. The fread function can be used to read a CSV file and create a data table.