Data science is one of those things that experienced a tremendous popularity increase in a short time. This is an identifying characteristic of hype. Some might even think that data science is a hype that awaits its dreadful end.
I want to state my opinion first so that you know what to expect in the rest of the article. Data science is absolutely not hype. It has been around for a long time but in a different costume.
Line plot is an essential part of data analysis. It gives us an overview of how a quantity changes over sequential measurements. In case of working with time series, the importance of line plots becomes crucial.
Trend, seasonality, and correlation are some features that can be observed on carefully generated line plots. In this article, we will create interactive line plots using two Python libraries: Pandas and Altair.
Pandas provides the data and Altair makes beautiful and informative line plots. Although Pandas is also able to plot data, it is not an explicit data visualization library. …
Time series data consists of data points attached to sequential time stamps. Daily sales, hourly temperature values, and second-level measurements in a chemical process are some examples of time series data.
Time series data has different characteristics than ordinary tabular data. Thus, time series analysis has its own dynamics and can be considered as a separate field. There are books over 500 pages to cover time series analysis concepts and techniques in depth.
Pandas was created by Wes Mckinney to provide an efficient and flexible tool to work with financial data which is kind of a time series. …
List comprehension is used for creating lists based on iterables. It can also be described as representing for and if loops with a simpler and more appealing syntax. List comprehensions are relatively faster than for loops.
The syntax of a list comprehension is actually easy to understand. However, when it comes to complex and nested operations, it might get a little tricky to figure out how to structure a list comprehension.
In such cases, writing the loop version first makes it easier to write the code for the list comprehension. …
When I first started to learn about data science, support vector machine was my favorite algorithm. It was, of course, not the best one out there but it had a cool name. Besides, how it dynamically changes its classification strategy with the c and gamma parameters just amazed me.
Machine learning sounds appealing and charming. I think it plays key role in driving a lot of people to the field of data science. All those algorithms with fancy names make the newcomers thrilled.
I see the algorithms as the shining surface of the machine learning box. They are carefully designed…
Los generadores en Python son una de esas herramientas que usamos con frecuencia pero de las que no hablamos mucho. Por ejemplo, la mayoría de los bucles for están acompañados de la función de rango, que es un generador.
Los generadores permiten generar una secuencia de valores en el tiempo. La principal ventaja de usar un generador es que no tenemos que crear la secuencia completa a la vez y asignar memoria. En cambio, el generador devuelve un valor a la vez y espera hasta que se llame al siguiente valor.
En este artículo, repasaremos 6 ejemplos para demostrar cómo…
Read more in Planeta Chatbot : todo sobre los Chat bots, Voice apps e Inteligencia Artificial · 4 min read
What are the average house prices in different cities of the US? What are the total sales amounts of different product groups in a store? What are the average salaries in different companies?
All these questions can be answered by using a grouping operation given that we have proper data. Most data analysis libraries and frameworks implement a function to perform such operations.
In this article, we will compare two of the most popular data analysis libraries with regards to tasks that involve grouping. The first one is Python Pandas and the other is R data table.
Altair is a statistical data visualization library for Python. It provides a simple and easy-to-understand syntax for creating both static and interactive visualizations.
What I think separates Altair from other common data visualization libraries is that it integrates data analysis components into the visualizations seamlessly. Thus, it serves as a highly practical tool for data exploration.
In this article, we will see, step-by-step, how to create a visualization that includes filtering, grouping, and merging operations. Eventually, we will create an informative plot that can be used as part of an exploratory data analysis process.
We first generate mock data which…
Pandas is a data analysis and manipulation library for Python. It is one of the most popular tools among data scientists and analysts.
Pandas can handle an entire data analytics pipeline. It provides several functions and methods to clean, transform, analyze, and plot the data. In this article, we will do 30 examples that demonstrate the most commonly used functions in each of these steps.
Data science has gained tremendous popularity in recent years. More and more businesses are investing in data science which keeps the demand for data scientists at its current high level.
I decided to make a career change two years ago with an ultimate goal in mind: to become a data scientist. It was a tough decision to make because I already had a professional work experience of 6 years in a different domain.
Despite the inevitable challenges, I did make that decision and it took me almost two years to land my first job as a data scientist. …