Monday, March 11, 2024

Pandas Series and DataFrame

What is a Series?

In pandas, a Series is a one-dimensional labeled array capable of holding any data type. It is a fundamental data structure in pandas and can be thought of as a single column of a spreadsheet or of a DataFrame. A Series is similar to a NumPy array, but each value also carries an index label, so values can be looked up and aligned by label rather than only by position.

import pandas as pd

# Creating a Series with custom indices
data = [1, 2, 3, 4, 5]
custom_indices = ['a', 'b', 'c', 'd', 'e']
# 1st argument is the data, next is the index
series = pd.Series(data, index=custom_indices)

# Displaying the Series with custom indices
print(series)

O/p:

a    1
b    2
c    3
d    4
e    5
dtype: int64
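
To illustrate what labeled indices add over a plain NumPy array, here is a minimal sketch of label alignment. The two Series and their values are made up for illustration and are not part of the example above. When two Series are added, pandas matches values by label; labels present in only one of them produce NaN.

```python
import pandas as pd

# Two illustrative Series whose labels only partly overlap
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition pairs values by label, not by position
total = s1 + s2
print(total)
# 'b' and 'c' exist in both Series, so they are summed (12.0 and 23.0).
# 'a' and 'd' exist in only one Series, so the result there is NaN.
```

Because NaN is a floating-point value, the result has dtype float64 even though both inputs hold integers.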


How to access a value from a Series?

print(series['c'])
O/p:
3
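
Beyond a single label lookup, a Series can also be read by integer position or by several labels at once. The following is a small sketch that rebuilds the same Series as above; the variable names by_label, by_position, subset and label_slice are only illustrative.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Explicit label-based access, equivalent to series['c']
by_label = series.loc['c']

# Position-based access (0-based), so position 2 is also the value at 'c'
by_position = series.iloc[2]

# A list of labels returns a smaller Series
subset = series[['a', 'e']]

# Label slices with .loc include both endpoints ('b', 'c' and 'd')
label_slice = series.loc['b':'d']

print(by_label, by_position)
print(subset)
print(label_slice)
```

Using .loc and .iloc makes it explicit whether a lookup is by label or by position, which helps when the index labels are themselves integers.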

What is a DataFrame?

In pandas, a DataFrame is a two-dimensional labeled data structure that is widely used for handling and manipulating tabular data. It can be thought of as a table, similar to a spreadsheet or a SQL table, where data is organized in rows and columns. Each column is a Series, and all columns share the same row index.

import pandas as pd
import numpy as np
from numpy.random import randn

np.random.seed(121)
# Create a DataFrame: 5x4 random values, row labels A-E, column labels W-Z
df = pd.DataFrame(randn(5, 4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])
print("DataFrame:\n")
print(df)
# Access Column: Return type -> Series
print("Access Column named 'W'\n")
print(df['W'])
# Access list of Column : Return type -> Dataframe
print("Access Columns named Y & Z\n")
print(df[['Y', 'Z']])
# Create a new column
df['V'] = df['Y'] + df['Z']
print("DataFrame:\n")
print(df)
# Drop the new column in place (axis=1 means columns; use axis=0 to drop a row)
df.drop('V', axis=1, inplace=True)
print("DataFrame:\n")
print(df)
# Access Row using its label : Return type -> Series
print("Access Row labeled as 'B':\n")
print(df.loc['B'])
# Access Row using its integer position (0-based) : Return type -> Series
print("Access Row 1:\n")
print(df.iloc[1])
# Access multiple rows : Return type -> DataFrame
print("Access Row labeled as B & C:\n")
print(df.loc[['B','C']])

O/p:
DataFrame:

          W         X         Y         Z
A -0.212033 -0.284929 -0.573898 -0.440310
B -0.330111  1.183695  1.615373  0.367062
C -0.014119  0.629642  1.709641 -1.326987
D  0.401873 -0.191427  1.403826 -1.968769
E -0.790415 -0.732722  0.087744 -0.500286
Access Column named 'W'

A   -0.212033
B   -0.330111
C   -0.014119
D    0.401873
E   -0.790415
Name: W, dtype: float64
Access Columns named Y & Z

          Y         Z
A -0.573898 -0.440310
B  1.615373  0.367062
C  1.709641 -1.326987
D  1.403826 -1.968769
E  0.087744 -0.500286
DataFrame:

          W         X         Y         Z         V
A -0.212033 -0.284929 -0.573898 -0.440310 -1.014208
B -0.330111  1.183695  1.615373  0.367062  1.982435
C -0.014119  0.629642  1.709641 -1.326987  0.382653
D  0.401873 -0.191427  1.403826 -1.968769 -0.564943
E -0.790415 -0.732722  0.087744 -0.500286 -0.412542
DataFrame:

          W         X         Y         Z
A -0.212033 -0.284929 -0.573898 -0.440310
B -0.330111  1.183695  1.615373  0.367062
C -0.014119  0.629642  1.709641 -1.326987
D  0.401873 -0.191427  1.403826 -1.968769
E -0.790415 -0.732722  0.087744 -0.500286
Access Row labeled as 'B':

W   -0.330111
X    1.183695
Y    1.615373
Z    0.367062
Name: B, dtype: float64
Access Row 1:

W   -0.330111
X    1.183695
Y    1.615373
Z    0.367062
Name: B, dtype: float64
Access Row labeled as B & C:

          W         X         Y         Z
B -0.330111  1.183695  1.615373  0.367062
C -0.014119  0.629642  1.709641 -1.326987

Similarly, .loc accepts a row label and a column label together:

# Access a particular cell
print(df.loc['B', 'Y'])
# Access a subset of DF
print(df.loc[['B','C'],['X','Y']])

O/p:
1.6153729283493805

          X         Y
B  1.183695  1.615373
C  0.629642  1.709641
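
The drop call earlier used inplace=True and axis=1. As a complement, here is a minimal sketch of dropping a row with axis=0 and of dropping without inplace, in which case drop returns a new DataFrame and leaves the original untouched. The small DataFrame and the names without_row and without_col are made up for illustration.

```python
import pandas as pd

# A small illustrative DataFrame (values are arbitrary)
df = pd.DataFrame({'W': [1, 2, 3], 'X': [4, 5, 6]}, index=['A', 'B', 'C'])

# axis=0 drops a row by its label; without inplace=True a new DataFrame is returned
without_row = df.drop('B', axis=0)

# The columns= keyword is an alternative to passing axis=1
without_col = df.drop(columns='X')

print(without_row)
print(without_col)
print(df)  # the original still has all three rows and both columns
```

Returning a new object instead of modifying in place keeps the original data available, which tends to make step-by-step exploration easier to follow.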
