Learn Pandas (Data Analysis Library)
Summary
- What is Pandas?
- Setup and Installation
- Create virtualenv
- Install Pandas
- Install Jupyter Notebook
- Sample Examples:
What is Pandas?
In computer programming, pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license.
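To make "numerical tables and time series" concrete, here is a minimal sketch of the two core pandas structures: a DataFrame (a table with labeled columns) and a Series indexed by dates (a simple time series). The values are made up for illustration.

```python
import pandas as pd

# A DataFrame: a two-dimensional table with labeled columns (illustrative values)
table = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# A Series indexed by dates: a simple daily time series (illustrative values)
dates = pd.date_range("2024-01-01", periods=3, freq="D")
series = pd.Series([10, 20, 30], index=dates)

print(table.shape)                             # (rows, columns)
print(series.loc[pd.Timestamp("2024-01-02")])  # value stored for that date
```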
Setup and Installation
Before we explore some features of Pandas, let's get Pandas installed on your system. I recommend creating a virtual environment and installing Pandas inside it.
Create virtualenv
virtualenv -p python3 env1
source env1/bin/activate
Install Pandas
pip install pandas
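To check that the installation worked, you can import pandas from the activated environment and print its version. A minimal sketch; the exact version string depends on what pip installed:

```python
import pandas as pd

# Confirm that pandas imports inside the active environment and report its version
print(pd.__version__)
```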
Jupyter Notebook
If you are learning Pandas, I recommend diving in with a Jupyter notebook. Jupyter notebooks display your data after each step, which makes it easier to understand what is going on.
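pip install jupyter
jupyter notebook
The first command installs Jupyter; the second starts the notebook server and opens it in your web browser.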
Sample Examples:
I created a simple dataset of countries.
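The original table is not reproduced here, so the sketches in this post assume a small hypothetical country dataset with the columns Country, Quantity and Total (the columns selected later on). The values are invented for illustration only, and `country_data.csv` is a hypothetical file name.

```python
import pandas as pd

# Hypothetical country data; the numbers are invented for illustration
df = pd.DataFrame(
    {
        "Country": ["India", "USA", "Germany", "Japan"],
        "Quantity": [120, 80, 45, 60],
        "Total": [2400.0, 3200.0, 1350.0, 1800.0],
    }
)

# Save it as a CSV file (without the numeric row index) so it can be loaded with read_csv
df.to_csv("country_data.csv", index=False)
print(df)
```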

Load data into Pandas
With Pandas, we can load data from many different sources. The loaded data is stored in a Pandas data structure called a DataFrame. DataFrames are conventionally assigned to a variable named df, so whenever you see df from here on, read it as a DataFrame.
From a CSV file
import pandas as pd
df = pd.read_csv("path_of_csv")
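Because `read_csv` accepts any file-like object as well as a path, you can also load a CSV from an in-memory string, which is handy for quick experiments. A minimal sketch, using `io.StringIO` in place of a real file and hypothetical country values:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """Country,Quantity,Total
India,120,2400.0
USA,80,3200.0
Germany,45,1350.0
Japan,60,1800.0
"""

# read_csv accepts a path or any file-like object; column types are inferred
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
```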

From an Excel sheet
import pandas as pd
df = pd.read_excel("path_of_excel_sheet")  # reading .xlsx files also requires the openpyxl package

Understanding Data
1. Show a glimpse of the data (the first five rows by default)
df.head()
2. Show summary statistics for the numeric columns of your data
df.describe()
3. List the column headers
df.columns.values
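Here are the three calls applied to a small hypothetical country DataFrame (invented values), so you can see what each one returns. Note that `describe()` summarizes only the numeric columns (Quantity and Total) when numeric columns are present.

```python
import pandas as pd

# Hypothetical country data (invented values)
df = pd.DataFrame(
    {
        "Country": ["India", "USA", "Germany", "Japan"],
        "Quantity": [120, 80, 45, 60],
        "Total": [2400.0, 3200.0, 1350.0, 1800.0],
    }
)

print(df.head(2))         # first two rows; head() with no argument shows five
stats = df.describe()     # count, mean, std, min, quartiles and max per numeric column
print(stats)
print(df.columns.values)  # array of the column header names
```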

Pick & Choose your Data
● Indexes
Indexes are labels used to refer to your data. For columns, these labels are your column headers, for example Year, MSHA_ID, Production, Labor_Hours, etc.
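You can inspect both sets of labels directly: `df.columns` holds the column labels and `df.index` holds the row labels. A minimal sketch on a tiny hypothetical DataFrame (invented values); with no row labels supplied, pandas uses a default numeric RangeIndex.

```python
import pandas as pd

# Tiny hypothetical DataFrame (invented values)
df = pd.DataFrame(
    {"Country": ["India", "USA"], "Quantity": [120, 80], "Total": [2400.0, 3200.0]}
)

# Column labels come from the headers and are stored in an Index object
print(df.columns)
# Row labels default to a RangeIndex: 0, 1, ...
print(df.index)
```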
● Selecting Columns
Create a list of the columns to select
columns_to_be_selected = ["Total", "Quantity", "Country"]
Use the list to index the DataFrame
df[columns_to_be_selected]
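A runnable version of the column selection, on hypothetical country data (invented values). It also shows the difference between passing a list of labels, which returns a DataFrame with the columns in the listed order, and passing a single label, which returns a one-dimensional Series.

```python
import pandas as pd

# Hypothetical country data (invented values)
df = pd.DataFrame(
    {
        "Country": ["India", "USA", "Germany", "Japan"],
        "Quantity": [120, 80, 45, 60],
        "Total": [2400.0, 3200.0, 1350.0, 1800.0],
    }
)

columns_to_be_selected = ["Total", "Quantity", "Country"]

# A list of labels returns a DataFrame with the columns in the listed order
subset = df[columns_to_be_selected]

# A single label (not wrapped in a list) returns a Series
countries = df["Country"]
print(subset)
print(countries)
```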


● Selecting Rows
Unlike the columns, our current DataFrame has no labels that we can use to refer to rows. Like an array, however, a DataFrame provides numerical indexing (0, 1, 2, …) by default.
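Position-based row selection uses `iloc`. If you want rows to have meaningful labels, you can promote a column to the index with `set_index` and then select by label with `loc`. A minimal sketch on hypothetical country data (invented values); using Country as the row label is an illustrative choice.

```python
import pandas as pd

# Hypothetical country data (invented values)
df = pd.DataFrame(
    {
        "Country": ["India", "USA", "Germany", "Japan"],
        "Quantity": [120, 80, 45, 60],
        "Total": [2400.0, 3200.0, 1350.0, 1800.0],
    }
)

# Position-based selection: the first row, then rows at positions 1 and 2
first_row = df.iloc[0]
middle_rows = df.iloc[1:3]

# Label-based selection: use Country as the row index, then look up a row by name
by_country = df.set_index("Country")
germany = by_country.loc["Germany"]
print(first_row)
print(middle_rows)
print(germany)
```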

● Filtering Rows
In a real-world scenario, you would rarely want to select rows by their position. A more typical requirement is to keep only the rows that satisfy a certain condition. With our dataset, we can filter on a condition involving any of its columns, such as Country, Quantity or Total.
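Filtering is done with boolean masks: comparing a column produces a True/False value per row, and indexing the DataFrame with that mask keeps only the True rows. A minimal sketch on hypothetical country data (invented values); the thresholds 50 and 2500 are arbitrary examples.

```python
import pandas as pd

# Hypothetical country data (invented values)
df = pd.DataFrame(
    {
        "Country": ["India", "USA", "Germany", "Japan"],
        "Quantity": [120, 80, 45, 60],
        "Total": [2400.0, 3200.0, 1350.0, 1800.0],
    }
)

# A comparison on a column yields a boolean Series, one value per row
mask = df["Quantity"] > 50

# Indexing with the mask keeps only the rows where it is True
high_quantity = df[mask]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
both = df[(df["Quantity"] > 50) & (df["Total"] < 2500)]

# isin keeps rows whose value appears in a list of allowed values
selected = df[df["Country"].isin(["USA", "Japan"])]
print(high_quantity)
print(both)
print(selected)
```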

Thank You For Reading
Written by Amal Satheesh