Learn Pandas (Data Analysis Library)

Summary

  1. What is Pandas?
  2. Setup and Installation
  3. Create virtualenv
  4. Install Pandas
  5. Install Jupyter Notebook
  6. Sample Examples:

What is Pandas?

In computer programming, pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license.

Setup and Installation

Before we look at some of the features of Pandas, let’s get Pandas installed on your system. I recommend creating a virtual environment and installing Pandas inside it.

Create virtualenv

# virtualenv -p python3 env1

# source env1/bin/activate

Install Pandas

# pip install pandas

Jupyter Notebook

If you are learning Pandas, I recommend diving in with a Jupyter notebook. Seeing the data rendered in the notebook after each step makes it easier to understand what is going on.

# pip install jupyter

# jupyter notebook


Sample Examples:

I created a simple dataset of country data.


Load data into Pandas

With Pandas, we can load data from different sources. The loaded data is stored in a Pandas data structure called a DataFrame. DataFrames are usually referred to by the variable name df, so anytime you see df from here on, you should associate it with a DataFrame.

From CSV File

# import pandas

# df = pandas.read_csv("path_of_csv")
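To try read_csv without a file on disk, the sketch below reads CSV text from an in-memory buffer. The column names Country, Quantity and Total echo the ones used later in this post; the rows themselves are made up for illustration and are not the author's dataset.

```python
import io

import pandas

# Hypothetical sample data; the values are invented for illustration.
csv_text = """Country,Quantity,Total
India,10,250.0
Brazil,4,120.5
Japan,7,310.0
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
df = pandas.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # headers taken from the first line of the CSV
```

With a real file, the call looks the same, with the path passed in place of the StringIO object.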


From an Excel sheet

# import pandas

# df = pandas.read_excel("path_of_excel_sheet")


Understanding Data

1. A quick look at the first few rows of the data

# df.head()

2. Some statistical information about your data

# df.describe()

3. List of column headers

# df.columns.values
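The three calls above can be tried together on a small DataFrame. This is a minimal sketch; the data is hypothetical and only stands in for the country dataset.

```python
import pandas

# Hypothetical stand-in for the country dataset.
df = pandas.DataFrame({
    "Country": ["India", "Brazil", "Japan", "Kenya"],
    "Quantity": [10, 4, 7, 3],
    "Total": [250.0, 120.5, 310.0, 90.0],
})

first_rows = df.head(2)      # first 2 rows; head() with no argument returns 5
stats = df.describe()        # count, mean, std, min, quartiles, max of numeric columns
headers = df.columns.values  # NumPy array holding the column names

print(first_rows)
print(stats)
print(headers)
```

Note that describe() skips the text column Country by default and summarizes only the numeric columns.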


Pick & Choose your Data

     Indexes

Indexes are labels used to refer to your data. These labels are usually your column headers, for example Year, MSHA_ID, Production, Labor_Hours, etc.
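As a small illustration, a DataFrame carries labels on both axes: the column headers and the row labels. The sketch below uses hypothetical data and simply prints both.

```python
import pandas

# Hypothetical data, used only to show the two sets of labels.
df = pandas.DataFrame({
    "Country": ["India", "Brazil", "Japan"],
    "Quantity": [10, 4, 7],
    "Total": [250.0, 120.5, 310.0],
})

column_labels = df.columns.tolist()  # labels along the column axis
row_labels = df.index.tolist()       # default row labels are 0, 1, 2, ...

print(column_labels)
print(row_labels)
```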

     Selecting Columns

Create a list of columns to be selected

# columns_to_be_selected = ["Total", "Quantity", "Country"]

Use it as an index to the DataFrame

# df[columns_to_be_selected]
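Put together, column selection might look like the sketch below. The data and the extra Year column are hypothetical; they are there only to show that unlisted columns are left out.

```python
import pandas

# Hypothetical data; "Year" is an extra column that will not be selected.
df = pandas.DataFrame({
    "Country": ["India", "Brazil", "Japan"],
    "Quantity": [10, 4, 7],
    "Total": [250.0, 120.5, 310.0],
    "Year": [2017, 2017, 2018],
})

columns_to_be_selected = ["Total", "Quantity", "Country"]

# A list of names returns a DataFrame with the columns in the listed order.
subset = df[columns_to_be_selected]

# A single name (not wrapped in a list) returns a Series instead.
single = df["Country"]

print(subset)
print(type(single).__name__)
```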

     Selecting Rows

Unlike the columns, our current DataFrame does not have labels that we can use to refer to the row data. But like arrays, a DataFrame provides numerical indexing (0, 1, 2…) by default.
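One way to use this numerical indexing is the iloc accessor, which selects rows by position. The original post does not show which accessor it used, so treat this as a sketch on hypothetical data.

```python
import pandas

# Hypothetical data with the default 0, 1, 2, ... row index.
df = pandas.DataFrame({
    "Country": ["India", "Brazil", "Japan", "Kenya"],
    "Quantity": [10, 4, 7, 3],
    "Total": [250.0, 120.5, 310.0, 90.0],
})

first_row = df.iloc[0]    # the row at position 0, returned as a Series
first_two = df.iloc[0:2]  # rows at positions 0 and 1; the end is exclusive

print(first_row["Country"])
print(first_two)
```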

     Filtering Rows

Now, in a real-world scenario, you would most probably not want to select rows based on an index. A more realistic requirement is to keep only the rows that satisfy a certain condition. With respect to our dataset, we can filter by any of the following conditions
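The original list of conditions is not reproduced here, so the sketch below uses a few hypothetical conditions on hypothetical data to show the general pattern: build a boolean mask from a comparison, then use it to index the DataFrame.

```python
import pandas

# Hypothetical data; the conditions below are illustrative assumptions.
df = pandas.DataFrame({
    "Country": ["India", "Brazil", "Japan", "Kenya"],
    "Quantity": [10, 4, 7, 3],
    "Total": [250.0, 120.5, 310.0, 90.0],
})

# A comparison on a column yields a boolean Series (the mask).
large_orders = df[df["Quantity"] > 5]

# isin() checks membership in a list of values.
india_or_japan = df[df["Country"].isin(["India", "Japan"])]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
combined = df[(df["Quantity"] > 5) & (df["Total"] < 300)]

print(large_orders)
print(india_or_japan)
print(combined)
```

The filtered results keep their original row labels, so the row indexes in the output may no longer be consecutive.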




Thank You For Reading

Written by Amal Satheesh
