Python Pandas Tutorial

  • date 3rd August, 2019 |
  • by Prwatech |

Python Pandas Tutorial for Beginners

Welcome to the Python Pandas Tutorial for Beginners by Prwatech. This tutorial introduces Python pandas and covers its features, its installation, its data structures (Series, DataFrame, and Panel), and the most common operations on them, with examples throughout. So, let's start the Python Pandas tutorial for beginners.

What is Python Pandas?

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built around its powerful data structures. The name pandas is derived from "panel data", an econometrics term for multidimensional data sets. The pandas library is built on top of NumPy, which means pandas needs NumPy to operate. It provides an easy way to create, manipulate, and wrangle data.

Pandas helps us to perform the following operations:

1. Loading the data
2. Preparing the data
3. Manipulating the data
4. Modeling the data
5. Analyzing the data

Python with pandas is used in many academic and commercial domains, including finance, economics, statistics, and analytics.

Features of Python Pandas

  • Tools for loading data into in-memory data objects from different file formats.
  • Data alignment and integrated handling of missing data values.
  • Reshaping and pivoting of data sets.
  • Label-based slicing, indexing, and subsetting of large data sets.
  • Columns can be inserted into and deleted from a data structure.
  • Group-by operations over the data set.
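A few of these features can be seen in a short sketch. The data and the names (a, b, team, score, bonus) are made up for illustration and are not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# Two Series with partially overlapping labels (made-up data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on labels; "x" has no partner in b, so the result is NaN.
total = a + b

# Missing values can be replaced explicitly.
filled = total.fillna(0.0)

# Insert a column, delete it again, then group rows and aggregate.
df = pd.DataFrame({"team": ["red", "blue", "red"], "score": [3, 5, 4]})
df["bonus"] = df["score"] * 2
del df["bonus"]
per_team = df.groupby("team")["score"].sum()

print(total)
print(filled)
print(per_team)
```

Note that alignment happens automatically: nothing in the code matches the labels "y" and "z" by hand.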

Installation of Python Pandas

For Mac OS:

Step 1) Open the terminal.
Step 2) Run: pip install pandas

For Windows users (the menu path below matches the PyCharm IDE):

Step 1) Go to the File menu.
Step 2) Go to Settings.
Step 3) Go to Project.
Step 4) Go to Project Interpreter.
Step 5) Click on the '+' icon.
Step 6) Type pandas.
Step 7) Select it and install it.
Step 8) Add the line import pandas as pd to your script.
Step 9) Use it.

Data Structures in Python Pandas

Pandas provides three data structures:

  • Series
  • DataFrame
  • Panel

Mutability:

All pandas data structures are value mutable, and all except Series are also size mutable. A Series is size immutable.

Series:

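A Series is a one-dimensional labeled array that is size immutable and value mutable.

Syntax) pandas.Series(data, index, dtype, copy)

data: takes various forms such as an ndarray, a list, or a constant.
index: the axis labels. They must be hashable and the same length as data (duplicate labels are allowed, although unique labels are usually easier to work with).
dtype: the data type of the values. If omitted, it is inferred from data.
copy: whether to copy the input data. By default, its value is False.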

Array:

If data is an array, then the index passed must be of the same length. If no index is passed, the default index is range(n), where n is the array length, i.e. [0, 1, 2, ..., len(array)-1].

Ex)
    import pandas as pd
    import numpy as np
    a = np.array([1, 4, 5, 6, 7])
    s = pd.Series(a)
    print(s)
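The index rule above can be checked with a small sketch. The labels "a" to "e" are arbitrary choices made for this illustration.

```python
import numpy as np
import pandas as pd

values = np.array([1, 4, 5, 6, 7])

# No index passed: labels default to 0 .. len(values) - 1.
default_index = pd.Series(values)

# Explicit index: one label per value.
labelled = pd.Series(values, index=["a", "b", "c", "d", "e"])

print(list(default_index.index))
print(labelled["c"])

# An index of the wrong length is rejected with a ValueError.
try:
    pd.Series(values, index=["a", "b"])
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("length mismatch:", err)
```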

Dictionary

A dictionary can be passed as input. If no index is specified, the dictionary keys are used to construct the index; current pandas versions keep the keys in insertion order, while older versions sorted them. If an index is passed, the values in data corresponding to the labels in the index are pulled out.

Ex)
    import pandas as pd
    a = {'a': 'add', 's': 'sub', 'd': 'dvd'}
    s = pd.Series(a)
    print(s)
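As a sketch of the second case, the index below asks for the labels in a different order and includes one label, "m", that is not a key of the dictionary. The label "m" is a made-up example.

```python
import pandas as pd

a = {"a": "add", "s": "sub", "d": "dvd"}

# Values are pulled out by label; the missing label "m" becomes NaN,
# and the key "s" is dropped because it is not in the requested index.
s = pd.Series(a, index=["d", "a", "m"])
print(s)
```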

Constants

If data is a constant, then an index must be provided. The value will be repeated to match the length of the index.

Ex)
    import pandas as pd
    s = pd.Series(4, index=[0, 1, 2])
    print(s)

Accessing Data from Series with Position

Ex)
    import pandas as pd
    import numpy as np
    a = np.array([1, 4, 5, 6, 7])
    s = pd.Series(a)
    print(s[2])
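When the index holds labels rather than the default integers, it helps to keep positional and label-based access apart. A minimal sketch, assuming the made-up labels "a" to "e":

```python
import pandas as pd

s = pd.Series([1, 4, 5, 6, 7], index=["a", "b", "c", "d", "e"])

third = s.iloc[2]         # by position (zero-based)
first_three = s.iloc[:3]  # positional slice
by_label = s["c"]         # by label
several = s[["a", "e"]]   # several labels at once

print(third, by_label)
print(first_three)
print(several)
```

Using iloc for positions makes the intent explicit and keeps the code unambiguous when the labels themselves are integers.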

Data frames in Python Pandas

A DataFrame is a two-dimensional, size-mutable structure whose columns can hold different data types.

Syntax) pandas.DataFrame(data, index, columns, dtype, copy)

data: takes values in various forms such as an ndarray, Series, map, list, dictionary, constants, or another DataFrame.
index: the row labels of the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
columns: the column labels. Optional; defaults to np.arange(n) if no column labels are passed.
dtype: the data type to force on the columns. If omitted, it is inferred.
copy: whether to copy the input data. By default it is False.

DataFrames can be created from various inputs.
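The parameters above can be tried out with a short sketch. The row labels r1 to r3 and column labels x and y are assumptions made for this example.

```python
import pandas as pd

data = [[9, 2], [3, 4], [5, 6]]

# Explicit row labels, column labels, and a forced dtype.
df = pd.DataFrame(data, index=["r1", "r2", "r3"], columns=["x", "y"], dtype=float)

# With no labels passed, both axes default to 0, 1, 2, ...
default = pd.DataFrame(data)

print(df)
print(default)
```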

List:

Ex)
    import pandas as pd
    data = [9, 2, 3, 4, 5]
    df = pd.DataFrame(data)
    print(df)

Dictionary:

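A dictionary of scalar values needs an explicit index, because pandas cannot infer the number of rows from scalars alone.

Ex)
    import pandas as pd
    a = {'a': 'add', 's': 'sub', 'd': 'dvd'}
    df = pd.DataFrame(a, index=[0])
    print(df)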

Series:

Ex)
    import pandas as pd
    # Build a Series first, then use it as the column 'opp' of a DataFrame
    sr = pd.Series(['add', 'sub', 'dvd'], index=['a', 's', 'd'])
    df = pd.DataFrame({'opp': sr})
    print(df)

Numpy n-dimensional array:

Ex)
    import pandas as pd
    import numpy as np
    a = np.array([1, 2, 3, 4, 5])
    df = pd.DataFrame(a)
    print(df)

Another DataFrame:

Ex)
    import pandas as pd
    a = {'a': 'add', 's': 'sub', 'd': 'dvd'}
    df1 = pd.DataFrame(a, index=[0])
    # Pass the existing DataFrame as the data argument
    df2 = pd.DataFrame(df1)
    print(df2)

Column additions:

Ex)
    import pandas as pd
    d = {'one': pd.Series([2, 3, 4], index=['a', 'b', 'c']),
         'two': pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)

    print("Adding a new column by passing as Series:")
    df['three'] = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
    print(df)

    print("Adding a new column using the existing columns in DataFrame:")
    df['four'] = df['one'] + df['three']
    print(df)

Column Deletion:

A column can be deleted with either the del statement or the pop() method.

Ex)
    import pandas as pd
    d = {'one': pd.Series([2, 3, 4], index=['a', 'b', 'c']),
         'two': pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd']),
         'three': pd.Series([100, 200, 300], index=['a', 'b', 'c'])}
    df = pd.DataFrame(d)
    print("Our dataframe is:")
    print(df)

    # using the del statement
    print("Deleting the first column using DEL function:")
    del df['one']
    print(df)

    # using the pop method
    print("Deleting another column using POP function:")
    df.pop('two')
    print(df)
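Both del and pop() change the DataFrame in place. As a complementary sketch (not part of the original tutorial), drop() returns a new DataFrame and leaves the original untouched, while pop() also hands back the removed column:

```python
import pandas as pd

df = pd.DataFrame({"one": [2, 3, 4], "two": [5, 6, 7], "three": [100, 200, 300]})

# pop() removes the column in place and returns it as a Series.
popped = df.pop("two")

# drop() returns a new DataFrame; df itself still has "one" and "three".
trimmed = df.drop(columns=["one"])

print(popped)
print(df.columns.tolist())
print(trimmed.columns.tolist())
```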

Panel:

A Panel is a 3D container of data elements. The term panel data is derived from econometrics and is partially responsible for the name pandas: pan(el)-da(ta)-s. The names of the three axes are intended to give some semantic meaning to operations involving panel data.

Note: Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the Panel examples in this tutorial only run on older pandas versions.

Syntax) pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)

data: can take various forms such as an ndarray, Series, map, list, dictionary, constants, or another DataFrame.
items: axis zero; each item corresponds to a DataFrame contained inside.
major_axis: axis one; it is the index (rows) of each of the DataFrames.
minor_axis: axis two; it is the columns of each of the DataFrames.
dtype: describes the data type of each column.
copy: whether to copy the data. By default, its value is False.

How to Create Panel in Python Pandas?

A Panel can be created in multiple ways (on pandas versions that still include Panel):

From an n-dimensional array
Ex)
    import pandas as pd
    import numpy as np
    data = np.random.rand(6, 8, 1)
    p = pd.Panel(data)
    print(p)

From a dictionary of DataFrames
Ex)
    import pandas as pd
    import numpy as np
    data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
            'Item2': pd.DataFrame(np.random.randn(4, 2))}
    p = pd.Panel(data)
    print(p)

Selecting the Data from Panel

Using items:
Ex)
    import pandas as pd
    import numpy as np
    data = {'Item1': pd.DataFrame(np.random.randn(5, 3)),
            'Item2': pd.DataFrame(np.random.randn(5, 2))}
    p = pd.Panel(data)
    print(p['Item1'])

Using major_axis:
Ex)
    import pandas as pd
    import numpy as np
    data = {'Item1': pd.DataFrame(np.random.randn(9, 3)),
            'Item2': pd.DataFrame(np.random.randn(9, 2))}
    p = pd.Panel(data)
    print(p.major_xs(1))

Using minor_axis:
Ex)
    import pandas as pd
    import numpy as np
    data = {'Item1': pd.DataFrame(np.random.randn(8, 3)),
            'Item2': pd.DataFrame(np.random.randn(8, 2))}
    p = pd.Panel(data)
    print(p.minor_xs(1))
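On pandas versions without Panel, one common substitute is a DataFrame with a MultiIndex. The sketch below is an assumption about how the Item1/Item2 example could be rewritten, not the tutorial's own method; it uses fixed numbers instead of random ones so the result is reproducible.

```python
import numpy as np
import pandas as pd

item1 = pd.DataFrame(np.arange(12).reshape(4, 3))
item2 = pd.DataFrame(np.arange(8).reshape(4, 2))

# The dictionary keys become the outer level of the row MultiIndex.
# item2 has no column 2, so those cells are NaN.
stacked = pd.concat({"Item1": item1, "Item2": item2})

by_item = stacked.loc["Item1"]           # similar in spirit to p['Item1']
by_row = stacked.xs(1, level=1)          # similar in spirit to p.major_xs(1)
by_column = stacked[1].unstack(level=0)  # similar in spirit to p.minor_xs(1)

print(by_item)
print(by_row)
print(by_column)
```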

Series: Basic Functions

The examples in the table assume the following setup:

    import pandas as pd
    import numpy as np
    s = pd.Series(np.random.randn(9))

Name | Description | Example
axes | Returns a list of the row axis labels of the Series. | print(s.axes)
empty | Returns a Boolean indicating whether the object is empty; True means it is empty. | print(s.empty)
ndim | Returns the number of dimensions of the object. | print(s.ndim)
size | Returns the number of elements in the Series. | print(s.size)
values | Returns the actual data in the Series as an array. | print(s.values)
head() | Returns the first n records of the Series. | print(s.head(3))
tail() | Returns the last n records of the Series. | print(s.tail(3))

Basic DataFrame Functions

The examples in the table assume the following setup:

    import pandas as pd
    df = pd.DataFrame({'Name': ['Tom', 'dick', 'harry'], 'Age': [20, 21, 19]})

Name | Description | Example
T | Transposes rows and columns. | print(df.T)
axes | Returns a list with the row axis labels and the column axis labels. | print(df.axes)
dtypes | Returns the data type of each column. | print(df.dtypes)
empty | Returns a Boolean indicating whether the DataFrame is empty. | print(df.empty)
ndim | Returns the number of dimensions, i.e. 2. | print(df.ndim)
shape | Returns a tuple representing the dimensionality of the DataFrame. | print(df.shape)
size | Returns the number of elements present. | print(df.size)
values | Returns the actual data as an array. | print(df.values)
head() | Returns the top n records. | print(df.head(2))
tail() | Returns the bottom n records. | print(df.tail(2))
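Put together as one runnable sketch, using the same Name and Age data as the table:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "dick", "harry"], "Age": [20, 21, 19]})

print(df.shape)    # (rows, columns)
print(df.size)     # rows * columns
print(df.ndim)     # a DataFrame always has 2 dimensions
print(df.empty)    # False, because the frame holds data
print(df.T.shape)  # transposing swaps rows and columns
print(df.head(2))
print(df.tail(2))
```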

Pandas - Descriptive Statistics

Ex)
    import pandas as pd
    import numpy as np

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tomy', 'Jimy', 'Ricky', 'Viny', 'Steven', 'Smithen', 'Jacky',
                            'Lee', 'Dravid', 'Gaspery', 'Betin', 'Andru']),
         'Age': pd.Series([22, 23, 26, 21, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([5.23, 3.44, 3.95, 2.66, 4.20, 4.6, 6.8, 1.78, 3.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

Because the Name column holds strings, purely numeric statistics are restricted to the numeric columns, either with numeric_only=True or by selecting the Age and Rating columns.

Name | Description | Example
count() | Counts the number of non-null observations. | print(df.count())
sum() | Sums the values. | print(df.sum())
mean() | Finds the mean of the values. | print(df.mean(numeric_only=True))
median() | Finds the median of the values. | print(df.median(numeric_only=True))
mode() | Finds the modes of the values. | print(df.mode())
std() | Finds the standard deviation of the values. | print(df.std(numeric_only=True))
min() | Finds the minimum value. | print(df.min())
max() | Finds the maximum value. | print(df.max())
abs() | Finds the absolute values. | print(df[['Age', 'Rating']].abs())
prod() | Gives the product of the values. | print(df.prod(numeric_only=True))
cumsum() | Gives the cumulative sum. | print(df[['Age', 'Rating']].cumsum())
cumprod() | Gives the cumulative product. | print(df[['Age', 'Rating']].cumprod())
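Several of these statistics can also be obtained in one call with describe(). A minimal sketch on a three-row subset of the data (the subset is chosen here only to keep the numbers easy to check):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tomy", "Jimy", "Ricky"],
    "Age": [22, 23, 26],
    "Rating": [5.23, 3.44, 3.95],
})

# numeric_only=True skips the string column instead of failing on it.
means = df.mean(numeric_only=True)

# describe() summarises count, mean, std, min, quartiles, and max
# for the numeric columns of a mixed DataFrame.
summary = df.describe()

print(means)
print(summary)
```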

Iterations in Pandas

The behavior of basic iteration over pandas objects depends on the type. When iterating over a Series, it is regarded as array-like, and basic iteration produces the values. Other data structures, such as DataFrame and Panel, follow the dictionary-like convention of iterating over the keys of the object.
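The difference is easy to see in a small sketch (the data and labels are made up for illustration):

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Iterating over a Series yields its values.
series_items = [value for value in s]

# Iterating over a DataFrame yields its keys, i.e. the column labels.
frame_items = [key for key in df]

print(series_items)
print(frame_items)
```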

Iteration functions over DataFrames

Ex)
    import pandas as pd
    import numpy as np

    N = 20
    df = pd.DataFrame({
        'D': pd.date_range(start='2019-08-01', periods=N, freq='D'),
        'z': np.linspace(0, stop=N-1, num=N),
        'c': np.random.rand(N),
        'W': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
        'R': np.random.normal(900, 90, size=(N)).tolist()
    })

Name | Description | Example
items() | Iterates over the (key, value) pairs, i.e. each column label with its column as a Series. Older pandas versions call this iteritems(); that name was removed in pandas 2.0. | for key, value in df.items(): print(key, value)
iterrows() | Returns an iterator yielding each index value along with a Series containing the data in each row. | for row_index, row in df.iterrows(): print(row_index, row)
itertuples() | Returns an iterator yielding a named tuple for each row in a DataFrame. | for row in df.itertuples(): print(row)
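A compact sketch of the three iteration styles on a tiny DataFrame. The column names x and y and the row labels r1 and r2 are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3.5, 4.5]}, index=["r1", "r2"])

# items(): one (column label, column Series) pair per column.
column_labels = [label for label, column in df.items()]

# iterrows(): one (index label, row Series) pair per row.
row_sums = {index: row["x"] + row["y"] for index, row in df.iterrows()}

# itertuples(): one named tuple per row; the row label is in the Index field.
tuples = [(row.Index, row.x, row.y) for row in df.itertuples()]

# For plain arithmetic, a column-wise expression avoids the loop entirely.
vectorised = df["x"] + df["y"]

print(column_labels)
print(row_sums)
print(tuples)
print(vectorised)
```

Row-wise loops are convenient for inspection, but column-wise expressions like the last one are usually the faster choice on large data.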

Pandas methods to work with textual data

Ex)
    import pandas as pd
    import numpy as np
    s = pd.Series(['Tom', 'Dick', 'Harry', 'Allen', np.nan, '6234', 'SteveJobs'])

Name | Description | Example
lower() | Converts all characters to lower case. | print(s.str.lower())
upper() | Converts all characters to upper case. | print(s.str.upper())
len() | Returns the number of characters in each string. | print(s.str.len())
strip() | Strips whitespace (including newlines) from both sides of each string in the Series. | print(s.str.strip())
split(' ') | Splits each string at the given delimiter. | print(s.str.split(' '))
cat(sep=' ') | Concatenates the Series elements with the given separator. | print(s.str.cat(sep=' '))
get_dummies() | Returns a DataFrame with one-hot encoded values. | print(s.str.get_dummies())
contains(pattern) | Returns True for each element that contains the given pattern. | print(s.str.contains('a'))
replace(a, b) | Replaces the value a with b. | print(s.str.replace('@', '$'))
repeat(value) | Repeats each element the specified number of times. | print(s.str.repeat(2))
count(pattern) | Returns the number of occurrences of the pattern in each element. | print(s.str.count('s'))
startswith(pattern) | Returns True if the string starts with the given pattern. | print(s.str.startswith('I'))
endswith(pattern) | Returns True if the string ends with the given pattern. | print(s.str.endswith('m'))
find(pattern) | Returns the position of the first occurrence of the pattern, or -1 if it is not found. | print(s.str.find('r'))
findall(pattern) | Returns a list of all occurrences of the pattern. | print(s.str.findall('ra'))
swapcase() | Swaps lower case to upper case and vice versa. | print(s.str.swapcase())
islower() | Returns True if all characters are in lower case. | print(s.str.islower())
isupper() | Returns True if all characters are in upper case. | print(s.str.isupper())
isnumeric() | Returns True if all characters are numeric. | print(s.str.isnumeric())
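The sample Series contains a missing value (np.nan), which the string methods pass over instead of raising an error. A small sketch of how a few of the methods behave on it:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Tom", "Dick", "Harry", "Allen", np.nan, "6234", "SteveJobs"])

# len() reports a missing length for the missing entry.
lengths = s.str.len()

# na=False makes contains() report the missing entry as False.
has_a = s.str.contains("a", na=False)

# isnumeric() is True only where every character is numeric.
numeric = s.str.isnumeric()

print(lengths)
print(has_a)
print(numeric)
```

Note that contains() is case-sensitive by default, so "Allen" does not match the pattern "a".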
         

Pandas - Window Statistics Functions

For numerical data, pandas provides several kinds of window operations: rolling, expanding, and exponentially weighted windows. Statistics such as sum, mean, median, variance, covariance, and correlation can be computed over these windows.

rolling() Function:

This function can be applied to a series of data. Specify the window=n argument and apply an appropriate statistical function on top of it.

Ex)
    import pandas as pd
    import numpy as np
    df = pd.DataFrame(np.random.randn(5, 4),
                      index=pd.date_range('2/1/2022', periods=5),
                      columns=['A', 'B', 'C', 'D'])
    print(df.rolling(window=4).mean())

Output) (the data is random, so the numbers differ on every run)
                   A         B         C         D
2022-02-01       NaN       NaN       NaN       NaN
2022-02-02       NaN       NaN       NaN       NaN
2022-02-03       NaN       NaN       NaN       NaN
2022-02-04  0.628267 -0.047040 -0.287467 -0.161110
2022-02-05  0.398233  0.003517  0.099126 -0.405565

Since the window size is 4, the first three rows are NaN; from the fourth row on, each value is the average of rows n, n-1, n-2, and n-3.
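With fixed numbers instead of random ones, the rolling mean can be checked by hand. A minimal sketch, assuming a made-up Series of the values 1 to 5:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range("2022-02-01", periods=5))

# Each result needs 4 observations, so the first three entries are NaN.
rolled = s.rolling(window=4).mean()
print(rolled)

# Fourth entry: (1 + 2 + 3 + 4) / 4 = 2.5
# Fifth entry:  (2 + 3 + 4 + 5) / 4 = 3.5
```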

expanding() Function:

This function can be applied to a series of data. Specify the min_periods=n argument and apply the appropriate statistical function on top of it.

Ex)
    import pandas as pd
    import numpy as np
    df = pd.DataFrame(np.random.randn(5, 4),
                      index=pd.date_range('2/1/2022', periods=5),
                      columns=['A', 'B', 'C', 'D'])
    print(df.expanding(min_periods=3).mean())

Output) The values are random, but the pattern is fixed: with min_periods=3 the first two rows are NaN in every column, and from the third row on each value is the mean of all rows up to and including that row.
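The expanding window can also be verified on fixed numbers. A minimal sketch, again assuming a made-up Series of the values 1 to 5:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range("2022-02-01", periods=5))

# The window grows with every row; min_periods=3 hides the first two results.
expanded = s.expanding(min_periods=3).mean()
print(expanded)

# Third entry:  (1 + 2 + 3) / 3 = 2.0
# Fourth entry: (1 + 2 + 3 + 4) / 4 = 2.5
# Fifth entry:  (1 + 2 + 3 + 4 + 5) / 5 = 3.0
```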

ewm() Function:

ewm is applied over a series of data. Specify one of the com, span, or halflife arguments and apply the appropriate statistical function on top of it. It assigns the weights exponentially.

Ex)
    import pandas as pd
    import numpy as np
    df = pd.DataFrame(np.random.randn(5, 4),
                      index=pd.date_range('2/1/2022', periods=5),
                      columns=['A', 'B', 'C', 'D'])
    print(df.ewm(com=0.5).mean())

Output) (the data is random, so the numbers differ on every run)
                   A         B         C         D
2022-02-01  1.088512 -0.650942 -2.547450 -0.566858
2022-02-02  0.865131 -0.453626 -1.137961  0.058747
2022-02-03 -0.132245 -0.807671 -0.308308 -1.491002
2022-02-04  1.084036  0.555444 -0.272119  0.480111
2022-02-05  0.425682  0.025511  0.239162 -0.153290

Window functions are mostly used to find trends in data graphically by smoothing the curve. If the everyday data varies a lot and many data points are available, one approach is to take samples and plot them; another is to apply the window computations and plot the results. Either way, the curve or trend becomes smoother.
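To make "assigns the weights exponentially" concrete, the sketch below recomputes the result by hand. It assumes the default adjust=True behaviour, where the smoothing factor is alpha = 1 / (1 + com) and an observation that is i steps old gets the weight (1 - alpha) ** i. The three-value Series is made up for illustration.

```python
import numpy as np
import pandas as pd

com = 0.5
alpha = 1.0 / (1.0 + com)  # 2/3 for com=0.5

s = pd.Series([1.0, 2.0, 3.0])
ewm_mean = s.ewm(com=com).mean()

# Recompute each value as a weighted average of all observations so far.
manual = []
for t in range(len(s)):
    # Exponents t, t-1, ..., 0: the oldest value gets the smallest weight.
    weights = (1 - alpha) ** np.arange(t, -1, -1)
    window = s.to_numpy()[: t + 1]
    manual.append(float(np.sum(weights * window) / np.sum(weights)))

print(ewm_mean.tolist())
print(manual)
```

The second value, for example, is (2 + 1/3 * 1) / (1 + 1/3) = 1.75: the newer observation counts three times as much as the older one.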

Using SQL in Pandas

Ex)
    import pandas as pd
    url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
    tips = pd.read_csv(url)
    print(tips.head())

The location of tips.csv in the pandas repository has changed over time, so this URL may need to be updated.

Condition | Description | Example
Select | Column selection is done by passing a list of column names to the DataFrame. | print(tips[['total_bill', 'tip', 'smoker', 'time']].head(5))
Where | DataFrames can be filtered in multiple ways, just like a WHERE condition in SQL. | print(tips[tips['time'] == 'Dinner'].head(5))
GroupBy | Fetches the count of records in each group of the data set. | print(tips.groupby('sex').size())
Top N rows | Returns the top n records. | print(tips.head(5))
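The same operations can be tried without a network connection. The sketch below uses a handful of made-up rows that only mimic some column names of the tips data; the numbers are not taken from the real file.

```python
import pandas as pd

# Made-up rows for illustration only.
tips = pd.DataFrame({
    "total_bill": [18.50, 9.75, 22.00, 25.40],
    "tip": [1.00, 1.50, 3.50, 3.25],
    "sex": ["Female", "Male", "Male", "Male"],
    "time": ["Dinner", "Lunch", "Dinner", "Dinner"],
})

# SELECT total_bill, tip FROM tips LIMIT 2
selected = tips[["total_bill", "tip"]].head(2)

# SELECT * FROM tips WHERE time = 'Dinner'
dinner = tips[tips["time"] == "Dinner"]

# SELECT * FROM tips WHERE time = 'Dinner' AND tip > 3.00
big_dinner_tips = tips[(tips["time"] == "Dinner") & (tips["tip"] > 3.00)]

# SELECT sex, COUNT(*) FROM tips GROUP BY sex
counts = tips.groupby("sex").size()

print(selected)
print(dinner)
print(big_dinner_tips)
print(counts)
```

Each condition in a combined filter needs its own parentheses, because & binds more tightly than the comparison operators.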

Performing SQL join in Pandas

Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame objects.

Ex)
    import pandas as pd
    left = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                         'Name': ['Ali', 'Any', 'Amen', 'Arik', 'Amy'],
                         'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
    right = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                          'Name': ['Bil', 'Briany', 'Bany', 'Brycy', 'Betten'],
                          'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

Name | Description | Example
left join | Keeps every row of the 1st DataFrame and adds the matching rows of the 2nd. | print(pd.merge(left, right, on='subject_id', how='left'))
right join | Keeps every row of the 2nd DataFrame and adds the matching rows of the 1st. | print(pd.merge(left, right, on='subject_id', how='right'))
outer join | Keeps all rows of both DataFrames. | print(pd.merge(left, right, how='outer', on='subject_id'))
inner join | Keeps only the rows whose key appears in both DataFrames. | print(pd.merge(left, right, on='subject_id', how='inner'))
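The row counts of the four joins can be checked on the same left and right data. The sketch also uses the indicator=True option of merge(), which adds a _merge column showing where each row came from.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                     "Name": ["Ali", "Any", "Amen", "Arik", "Amy"],
                     "subject_id": ["sub1", "sub2", "sub4", "sub6", "sub5"]})
right = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                      "Name": ["Bil", "Briany", "Bany", "Brycy", "Betten"],
                      "subject_id": ["sub2", "sub4", "sub3", "sub6", "sub5"]})

inner = pd.merge(left, right, on="subject_id", how="inner")
left_join = pd.merge(left, right, on="subject_id", how="left")
right_join = pd.merge(left, right, on="subject_id", how="right")
outer = pd.merge(left, right, on="subject_id", how="outer", indicator=True)

# sub1 exists only in left and sub3 only in right, so the inner join has
# 4 rows, the left and right joins 5 rows each, and the outer join 6 rows.
print(len(inner), len(left_join), len(right_join), len(outer))

# Columns that share a name in both inputs get the suffixes _x and _y.
print(inner.columns.tolist())
print(outer[["subject_id", "_merge"]])
```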