A Simple Way to Compare Pandas DataFrames in Unit Tests
A single function for all your needs.
Testing is a fundamental part of any software. Without a thorough testing process, no software product is deployed into production.
Data analytics products also require testing. Let’s say you’re building a forecasting engine. No one can or should rely on the output of your product if you are testing it continuously.
The scope of the tests vary. There is no limit on what you should or should not be testing. It all depends on the data and the product.
A greater fraction of the data-based products work with tabular data. Therefore, tabular data structures such as Pandas DataFrames or SQL tables are quite commonly used in the data science ecosystem.
DataFrame is a two-dimensional data structure with labeled rows and columns. The testing procedure may require you to check if two DataFrames are equal. You can write a unit test for this task.
For two DataFrames, being equal might mean different things. For instance, the DataFrames shown in the image below are equal in terms of their shapes. Both have 4 rows and 3 columns.