Time is an essential feature in many data science tasks. For instance, daily sales and inventory information are of crucial importance for retail analytics, and algorithmic trading requires transactional data at the minute level.
The way we represent and use time-related information changes depending on the task. For a scientific experiment, we may talk about measurements recorded at the microsecond level. However, we do not need such precision for demographic information such as population, average household income, and so on.
The datetime module of Python helps us handle time-related information at any level of precision. …
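As a quick illustration of handling different precision levels with datetime (the timestamps below are made up for the example):

from datetime import datetime, timedelta

# A timestamp with microsecond precision, e.g. a measurement in an experiment log
measurement = datetime(2021, 5, 14, 10, 30, 45, 123456)

# A coarser, day-level date, e.g. a daily sales record
sales_day = datetime(2021, 5, 14)

# timedelta lets us shift or compare timestamps at any precision
next_reading = measurement + timedelta(microseconds=250)
print(next_reading - measurement)  # 0:00:00.000250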
Generators in Python are one of those tools that we use frequently but rarely talk about. For instance, most for loops are accompanied by the range function, which behaves like a generator in that it produces its values lazily.
Generators allow us to produce a sequence of values over time. The main advantage of using a generator is that we do not have to create the entire sequence at once and allocate memory for it. Instead, the generator returns one value at a time and waits until the next value is requested.
In this article, we will go over 6 examples to demonstrate how generators are used in…
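The six examples themselves are truncated here, but the core idea can be sketched as follows:

def squares(n):
    """Yield the squares of 0..n-1 one value at a time."""
    for i in range(n):
        yield i ** 2  # nothing is computed until the caller asks for the next value

gen = squares(5)
print(next(gen))   # 0
print(next(gen))   # 1
print(list(gen))   # [4, 9, 16] -- the remaining values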
Everything in Python is an object, and we define objects through classes. When we create an object, we actually create an instance of a class. Thus, classes are among the most fundamental pieces of Python.
Classes have data attributes and procedural attributes (i.e., methods). We can create our own classes using both kinds of attributes. A class is our playground, so we can implement various functions to customize it.
In addition to the user-defined functions, it is possible to use built-in…
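The built-in functions mentioned above are cut off in this preview; for context, here is a minimal sketch of a user-defined class with a data attribute and a method (the names are made up for illustration):

class Employee:
    def __init__(self, name, salary):
        # data attributes
        self.name = name
        self.salary = salary

    # procedural attribute (method)
    def give_raise(self, amount):
        self.salary += amount

emp = Employee("John", 50000)
emp.give_raise(5000)
print(emp.salary)  # 55000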
Spark is an analytics engine used for large-scale data processing. It lets you spread both data and computations over clusters to achieve a substantial performance increase.
As the cost of collecting, storing, and transferring data decreases, we are likely to have huge amounts of data when working on a real-life problem. Thus, distributed engines like Spark are becoming the predominant tools in the data science ecosystem.
PySpark is a Python API for Spark. It combines the simplicity of Python with the efficiency of Spark, a combination that is highly appreciated by both data scientists and engineers.
…
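The preview cuts off here, but a minimal sketch of getting started with PySpark (assuming pyspark is installed locally) looks like this:

from pyspark.sql import SparkSession

# Create (or reuse) a local Spark session
spark = SparkSession.builder.appName("example").getOrCreate()

# A tiny DataFrame built from in-memory rows, just for illustration
df = spark.createDataFrame(
    [("A", 10), ("B", 20), ("A", 30)],
    ["group", "value"],
)

df.groupBy("group").sum("value").show()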
Data science has been attracting a great number of people from many different industries. As the ability to collect and store data increases and gets cheaper, more and more businesses invest in data science to perform better in their field of expertise.
Since data science is still a maturing field and has not yet been well established in the traditional education system, the vast majority of data scientists come from a variety of other professions.
Data scientists who come from a different background decide to work in this field either in grad school or at some point in their professional careers. …
Pandas is one of the predominant tools for manipulating and analyzing structured data. It provides numerous functions and methods to play around with tabular data.
However, Pandas may not be your best friend as the data size gets larger. When working with large-scale data, it becomes necessary to distribute both the data and the computations, which cannot be achieved with Pandas.
A highly popular option for such tasks is Spark, which is an analytics engine used for large-scale data processing. It lets you spread both data and computations over clusters to achieve a substantial performance increase.
It has become extremely easy to…
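The rest of the preview is truncated; as a rough sketch, handing a small Pandas DataFrame over to Spark might look like this (the column names and values are made up for illustration):

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pandas-to-spark").getOrCreate()

# A small Pandas DataFrame; in practice the data would be much larger
pdf = pd.DataFrame({"product": ["A", "B", "C"], "sales": [100, 150, 80]})

# Spark can build a distributed DataFrame directly from a Pandas one
sdf = spark.createDataFrame(pdf)
sdf.show()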
Spark is an analytics engine used for large-scale data processing. It lets you spread both data and computations over clusters to achieve a substantial performance increase.
Since it is getting easier and less expensive to collect and store data, we are likely to have huge amounts of data when working on a real-life problem. Thus, distributed engines like Spark are becoming the predominant tools in the data science ecosystem.
PySpark is a Python API for Spark. …
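Again the preview ends early; as a rough illustration of a typical PySpark workflow (the file name and column names below are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

# Read a CSV file into a Spark DataFrame (hypothetical file and columns)
df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# A simple filter-and-aggregate pipeline
(df.filter(F.col("price") > 10)
   .groupBy("store")
   .agg(F.avg("price").alias("avg_price"))
   .show())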
Text files are commonly used to store data, so they are essential in the data science ecosystem. Python provides versatile functions and methods to handle text files.
There are many options for creating a text file. We will not cover all of them, since the focus here is to manipulate text files, not to create them. Let's go with a method that you can use in a Jupyter notebook.
%%writefile myfile.txt
Python is awesome
This is the second line
This is the last line
We now have a text file named “myfile.txt” in the current working directory. …
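Picking up from that point, the file can be read back with the built-in open function; a minimal sketch:

# Read the whole file into a single string
with open("myfile.txt", "r") as f:
    content = f.read()

print(content)

# Or read it line by line into a list
with open("myfile.txt", "r") as f:
    lines = f.readlines()

print(lines[0])  # 'Python is awesome\n'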
Python is the go-to language in the data science ecosystem. One of the reasons why Python is so popular among data scientists is the rich selection of libraries it offers.
In this article, we will not focus on any particular library, though. Instead, we will go over 4 small but very useful tips for base Python. These tips also matter when using external libraries, because those libraries build on features of base Python.
A list is a built-in data structure in Python. It is an ordered collection of items. …
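The preview ends here; as a small sketch of what working with a list looks like:

# A list preserves insertion order and can be modified in place
cities = ["Houston", "Rome", "Dallas"]

cities.append("Berlin")      # add an item to the end
cities[0] = "Austin"         # lists are mutable, so items can be replaced
print(cities[:2])            # slicing: ['Austin', 'Rome']
print(len(cities))           # 4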
How about that print statement?
String interpolation is a way of embedding variables into strings. Instead of typing out an entire plain string, we place placeholders in the string that hold the values of variables.
String interpolation allows us to use print statements more efficiently. Whether it is for debugging code or confirming results, print statements are likely to be all over your script.
In this article, we will go over three ways of doing string interpolation in Python. They accomplish the same thing in slightly different ways. After going through the examples, you will probably pick your favorite.
This…
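The three ways are not spelled out in this preview; presumably they are f-strings, the format method, and %-formatting, each of which can be sketched as follows:

name = "Ashley"
age = 30

# f-string (Python 3.6+)
print(f"{name} is {age} years old.")

# str.format method
print("{} is {} years old.".format(name, age))

# %-formatting
print("%s is %d years old." % (name, age))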
Writing about Data Science, AI, ML, DL, Python, SQL, Stats, Math | linkedin.com/in/soneryildirim/ | twitter.com/snr14