Introduction to Pandas
What is Pandas?
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
The name “Pandas” refers to both “Panel Data” and “Python Data Analysis”. The library was created by Wes McKinney in 2008.
In general, the Pandas library is very commonly used when working with data, because it covers several important tasks in data science:
- data analysis
- data cleaning
- data exploration
- data manipulation
Pandas – “Panel Data” and “Python Data Analysis”; panel data is multidimensional data involving measurements over time
Pandas does not work on its own. It is built on NumPy, which handles n-dimensional arrays, so both libraries are required
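The link between the two libraries can be seen by moving data back and forth between them. The following is a minimal sketch; the array values and the column names "a" and "b" are invented for illustration.

```python
import numpy as np
import pandas as pd

# A 2-D NumPy array (values are made up for illustration).
arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Wrap the array in a DataFrame; the column labels are hypothetical.
df = pd.DataFrame(arr, columns=["a", "b"])

# Convert the DataFrame back to a plain NumPy array.
back = df.to_numpy()

print(type(back))   # a numpy.ndarray
print(back.shape)   # (3, 2)

# NumPy functions can be applied directly to Pandas objects.
roots = np.sqrt(df["a"])
print(roots)
```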
Features – Series and DataFrame objects, data alignment, slicing, indexing, subsetting, handling of missing data, group-by functionality
Features – merging and joining, hierarchical labeling of axes, time-series functionality, reshaping, and robust input/output tools
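A few of these features can be tried out in a short sketch. All data below (cities, temperatures, countries) is invented, and filling a missing value with the column mean is only one possible choice, assumed here for illustration.

```python
import numpy as np
import pandas as pd

# Series: a one-dimensional labeled array.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Alignment: values are matched by label; labels found in only
# one Series produce NaN in the result.
total = s1 + s2

# DataFrame: a two-dimensional labeled table (made-up data).
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "temp": [3.0, np.nan, 15.0, 17.0],
})

# Subsetting with a boolean mask.
rome = df[df["city"] == "Rome"]

# Missing data: fill the NaN with the column mean (an assumed strategy).
filled = df.assign(temp=df["temp"].fillna(df["temp"].mean()))

# Group-by: mean temperature per city.
per_city = filled.groupby("city")["temp"].mean()

# Merging: attach a country to each row via the shared "city" column.
countries = pd.DataFrame({
    "city": ["Oslo", "Rome"],
    "country": ["Norway", "Italy"],
})
merged = filled.merge(countries, on="city", how="left")

print(total)
print(per_city)
print(merged)
```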
Pandas – as a rule of thumb, works well with more than 500k rows; suited to tabular data, arbitrary matrix data, and time-series data
NumPy – as a rule of thumb, suited to fewer than 500k rows, but more memory efficient
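The memory side of this comparison can be checked on your own data by asking each library how many bytes it uses. The sketch below uses an invented array of one million integers. The exact numbers depend on the dtypes, the index type, and the library versions, so treat it as a way to measure rather than as a benchmark.

```python
import numpy as np
import pandas as pd

# One million 64-bit integers (made-up data).
arr = np.arange(1_000_000, dtype=np.int64)

# The same values held in a Pandas Series, which also carries an index.
s = pd.Series(arr)

array_bytes = arr.nbytes                    # bytes used by the raw array
series_bytes = s.memory_usage(index=True)   # values plus index

print("NumPy array  :", array_bytes, "bytes")
print("Pandas Series:", series_bytes, "bytes")
```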
Why Use Pandas?
Pandas allows us to analyze large data sets and draw conclusions based on statistical theories.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.
What Can Pandas Do?
Pandas gives you answers about the data, such as:
- Is there a correlation between two or more columns?
- What is the average value?
- What is the maximum value?
- What is the minimum value?
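Each of these questions maps to a short Pandas call. The sketch below uses an invented table with two hypothetical columns, "hours" and "score".

```python
import pandas as pd

# Made-up data: hours studied and the resulting test score.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 58, 65, 71, 80],
})

# Correlation between two columns (Pearson by default).
corr = df["hours"].corr(df["score"])

# Pairwise correlation between all numeric columns.
corr_matrix = df.corr()

# Average, maximum, and minimum of one column.
avg = df["score"].mean()
highest = df["score"].max()
lowest = df["score"].min()

print(corr)
print(corr_matrix)
print(avg, highest, lowest)
```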
Pandas is also able to delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
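A minimal cleaning sketch follows. The table is invented, and the rule that a negative age counts as a wrong value is an assumption made only for this example.

```python
import numpy as np
import pandas as pd

# Made-up data with an empty name, an empty age, and an impossible age.
df = pd.DataFrame({
    "name": ["Ann", "Bob", None, "Dee"],
    "age": [34, np.nan, 29, -5],
})

# Remove every row that contains at least one empty (NaN/None) value.
no_missing = df.dropna()

# Remove rows with a wrong value; a negative age is assumed to be invalid.
cleaned = no_missing[no_missing["age"] >= 0]

print(no_missing)
print(cleaned)
```

Both steps return new DataFrames and leave the original df unchanged, which makes it easy to compare the data before and after cleaning.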