I use the Jupyter Notebook that ships with Anaconda, and I am working through the book Pandas for Everyone by Daniel Y. Chen. One thing I learned is that operations on a Series or a DataFrame are vectorized: they are applied to every element at once, without writing an explicit loop.
The examples use the scientists.csv dataset.
# Import the necessary library
import pandas as pd
# Import the dataset
scientists = pd.read_csv('scientists.csv')
scientists.head()
>>> # Data Output
                   Name        Born        Died  Age    Occupation
0     Rosaline Franklin  1920-07-25  1958-04-16   37       Chemist
1        William Gosset  1876-06-13  1937-10-16   61  Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90         Nurse
3           Marie Curie  1867-11-07  1934-07-04   66       Chemist
4         Rachel Carson  1907-05-27  1964-04-14   56     Biologist
ages = scientists['Age']
print(ages)
>>> # Data Output
0 37
1 61
2 90
3 66
4 56
5 45
6 41
7 77
Name: Age, dtype: int64
print(ages + ages)
>>> # Data Output
0 74
1 122
2 180
3 132
4 112
5 90
6 82
7 154
Name: Age, dtype: int64
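To make "vectorized" concrete, here is a minimal sketch comparing the one-line addition with an explicit loop. It rebuilds ages from the values printed above instead of reading the CSV (an assumption, so the snippet runs on its own); the names vectorized and looped are only illustrative.

```python
import pandas as pd

# Assumption: ages is rebuilt from the printed output, not read from the CSV.
ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77], name="Age")

# Vectorized form: one expression, applied element by element.
vectorized = ages + ages

# The same result written as an explicit loop, for comparison only.
looped = pd.Series([a + a for a in ages], name="Age")

# Both approaches produce the same values, index, and dtype.
print(vectorized.equals(looped))
```

The vectorized form is shorter and is usually faster on large data, because the loop runs inside the library instead of in Python.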
Vectors with integers (scalars)
When you operate on a vector using a scalar, the scalar will be recycled across the elements in the vector.
print(ages + 100)
>>> # Data Output
0 137
1 161
2 190
3 166
4 156
5 145
6 141
7 177
Name: Age, dtype: int64
print(ages * 2)
>>> # Data Output
0 74
1 122
2 180
3 132
4 112
5 90
6 82
7 154
Name: Age, dtype: int64
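One way to picture the recycling is to build the repeated vector by hand and check that it gives the same answer. This is a sketch only: ages is rebuilt from the printed values (an assumption), and hundreds is a hypothetical helper name.

```python
import pandas as pd

# Assumption: ages is rebuilt from the printed output, not read from the CSV.
ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77], name="Age")

# The "recycled" scalar spelled out: 100 repeated once per row,
# sharing the same index labels as ages.
hundreds = pd.Series([100] * len(ages), index=ages.index)

# Adding the scalar and adding the hand-built vector give the same values.
print((ages + 100).equals(ages + hundreds))
```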
Vectors with different lengths
When you are working with vectors of different lengths, the behavior depends on the type of the vectors.
With a Series, the operation is matched by index label. Any label that appears in only one of the two Series gets a 'missing' value in the result, which is denoted with NaN, for 'not a number'. Because NaN is a floating-point value, the result is converted to float64.
How operands of different shapes are combined is usually called 'broadcasting', and the rules differ between languages. In Pandas, a shorter Series is not stretched or recycled to fit the longer one; only the labels the two have in common are combined.
print(ages + pd.Series([1, 100]))
>>> # Data Output
0 38.0
1 161.0
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
dtype: float64
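If the NaN values are not what you want, the Series.add method accepts a fill_value that stands in for the missing side. The sketch below is not from the book excerpt above; ages is rebuilt from the printed values (an assumption), and short, plain and filled are illustrative names.

```python
import pandas as pd

# Assumption: ages is rebuilt from the printed output, not read from the CSV.
ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77], name="Age")
short = pd.Series([1, 100])

# Plain + leaves NaN wherever a label exists in only one Series.
plain = ages + short

# Series.add with fill_value=0 treats the missing side as 0 instead.
filled = ages.add(short, fill_value=0)

print(plain.isna().sum())   # number of unmatched labels
print(filled.tolist())
```

Whether 0 is a sensible stand-in depends on the data, so treat fill_value as a deliberate choice and not as a default.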
Vectors with common index labels
Data alignment is very common in Pandas, and it is almost always automatic. When an operation involves two Series, the values are matched by index label and not by position, so the order of the rows does not change the result.
print(ages)
>>> # Data Output
0 37
1 61
2 90
3 66
4 56
5 45
6 41
7 77
Name: Age, dtype: int64
rev_ages = ages.sort_index(ascending=False)
print(rev_ages)
>>> # Data Output
7 77
6 41
5 45
4 56
3 66
2 90
1 61
0 37
Name: Age, dtype: int64
print(ages * 2)
>>> # Data Output
0 74
1 122
2 180
3 132
4 112
5 90
6 82
7 154
Name: Age, dtype: int64
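To see the alignment in action, rev_ages can be added back to ages. The sketch below rebuilds ages from the values printed above (an assumption) and contrasts the label-based result with a purely positional one; by_label and by_position are illustrative names.

```python
import pandas as pd

# Assumption: ages is rebuilt from the printed output, not read from the CSV.
ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77], name="Age")
rev_ages = ages.sort_index(ascending=False)

# Addition pairs values by index label, so each age meets itself.
by_label = ages + rev_ages

# Dropping the labels with to_numpy() forces a positional addition instead:
# the first value is added to the last, the second to the second-to-last, etc.
by_position = ages.to_numpy() + rev_ages.to_numpy()

print(by_label.sort_index().tolist())
print(by_position.tolist())
```

The label-based result matches ages * 2, while the positional one does not, which is the practical meaning of automatic alignment.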
You can download the book and the dataset and practice along.
Happy Learning!!!