Mini Bites

Tuesday, March 19, 2024

Git steps

git init (initialize a Git repo)
git status (shows untracked and modified files)
git add <filename> (add a file to the Git staging area) [git add . adds all files]
git commit -m "<commit msg>" (commit the staged changes to the Git repo)
git log (shows the commit history)

                    git add                   git commit
Working Directory ----------> Staging Area ---------------> Local Git Repo

git diff <filename> shows the difference of content of the file.

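git checkout <filename> discards uncommitted changes to the file in the working directory. If the file has staged changes it is restored to the staged version, otherwise to the last committed version.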

git remote add <name> <url>

By convention, the remote name is origin.

git push -u <remote name> <branch name>

Local Git Repo ---------------------------------------------------------> Remote Repo

git push -u origin main

To remove files from the staging area (the files stay in the working directory),

git rm --cached -r . (rm means remove, -r means recursive)

Use .gitignore file to ignore some files.

Use git clone to copy a remote repo to a local repo.

git clone <url>

For creating a new branch,

git branch <branch-name>

To check available branches,

git branch

To switch branches,

git checkout <branch-name>

To merge changes,

git merge <name-of-branch> (switch to the main branch first, then merge)

Thursday, March 14, 2024

Apply any function of your choice

DataFrame.apply lets you pass a function and apply it along an axis of the DataFrame. By default the function receives each column as a pandas Series; here the lambda subtracts 2 from every value.

import pandas as pd
data = {"K1": [12, 23], "K2": [25,50]}
df = pd.DataFrame(data, index=['R1', 'R2'])
print(df.apply(lambda x: x-2))

Output:

    K1  K2
R1  10  23
R2  21  48
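
As a rough sketch of how the axis argument changes what the function receives, the snippet below reuses the K1/K2 data from above. The names col_max and row_total are made up for this illustration.

```python
import pandas as pd

data = {"K1": [12, 23], "K2": [25, 50]}
df = pd.DataFrame(data, index=["R1", "R2"])

# axis=0 (the default): the function is called once per column
col_max = df.apply(lambda col: col.max())

# axis=1: the function is called once per row
row_total = df.apply(lambda row: row["K1"] + row["K2"], axis=1)

print(col_max)
print(row_total)
```

Here col_max is indexed by column name and row_total by row label.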

Monday, March 11, 2024

Conditional Operation on DataFrame

import pandas as pd

import numpy as np
from numpy.random import randn

np.random.seed(121)
# Create Dataframe
df = pd.DataFrame(randn(5, 4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])
print("DataFrame:\n")
print(df)
print("Check if value > 0")
print(df[df>0])
print("Get all the row where Col. X value is >0")
print(df[df['X']>0])

O/p:

DataFrame:

          W         X         Y         Z
A -0.212033 -0.284929 -0.573898 -0.440310
B -0.330111  1.183695  1.615373  0.367062
C -0.014119  0.629642  1.709641 -1.326987
D  0.401873 -0.191427  1.403826 -1.968769
E -0.790415 -0.732722  0.087744 -0.500286
Check if value > 0
          W         X         Y         Z
A       NaN       NaN       NaN       NaN
B       NaN  1.183695  1.615373  0.367062
C       NaN  0.629642  1.709641       NaN
D  0.401873       NaN  1.403826       NaN
E       NaN       NaN  0.087744       NaN
Get all the row where Col. X value is >0
          W         X         Y         Z
B -0.330111  1.183695  1.615373  0.367062
C -0.014119  0.629642  1.709641 -1.326987
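
Conditions can also be combined. The sketch below uses a small hand-written frame (not the random one above) so the result is easy to check by eye. Note that pandas expects & and | with parentheses around each condition, not the Python keywords and / or.

```python
import pandas as pd

# Small hand-written frame, assumed only for this illustration
df = pd.DataFrame(
    {"X": [-1.0, 2.0, 3.0, -4.0], "Y": [5.0, -6.0, 7.0, 8.0]},
    index=["A", "B", "C", "D"],
)

# Rows where BOTH X and Y are positive
both_positive = df[(df["X"] > 0) & (df["Y"] > 0)]

# Rows where EITHER X or Y is negative
any_negative = df[(df["X"] < 0) | (df["Y"] < 0)]

print(both_positive)
print(any_negative)
```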

What is Series?

In pandas, a series is a one-dimensional labeled array capable of holding any data type. It is a fundamental data structure in pandas and can be thought of as a column in a spreadsheet or a single column of a DataFrame. The Series is similar to a NumPy array but comes with additional features like labeled indices, which makes it more powerful and flexible.

import pandas as pd

# Creating a Series with custom indices
data = [1, 2, 3, 4, 5]
custom_indices = ['a', 'b', 'c', 'd', 'e']
# 1st argument is data, next is index
series = pd.Series(data, index=custom_indices)

# Displaying the Series with custom indices
print(series)

O/P:

a    1
b    2
c    3
d    4
e    5
dtype: int64
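
One place where the labeled index shows its value is arithmetic: two Series are matched by label, not by position. A minimal sketch, with s1 and s2 invented for this example:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition aligns on labels; a label present in only one Series gives NaN
total = s1 + s2
print(total)
```

Only 'b' and 'c' appear in both Series, so only those two labels get a numeric sum.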

How to access value from Series?

print(series['c'])

O/p:

3

What is DataFrame?

In pandas, a DataFrame is a two-dimensional labeled data structure that is widely used for handling and manipulating tabular data. It can be thought of as a table, similar to a spreadsheet or a SQL table, where data is organized in rows and columns.

import pandas as pd
import numpy as np
from numpy.random import randn

np.random.seed(121)
# Create Dataframe
df = pd.DataFrame(randn(5, 4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])
print("DataFrame:\n")
print(df)
# Access Column: Return type -> Series
print("Access Column named 'W'\n")
print(df['W'])
# Access list of Column : Return type -> Dataframe
print("Access Columns named Y & Z\n")
print(df[['Y', 'Z']])
# Create a new column
df['V'] = df['Y'] + df['Z']
print("DataFrame:\n")
print(df)
# Drop the new column in place (axis=1 drops a column; use axis=0 to drop a row)
df.drop('V', axis=1, inplace=True)
print("DataFrame:\n")
print(df)
# Access Row using its label : Return type -> Series
print("Access Row labeled as 'B':\n")
print(df.loc['B'])
# Access Row using its integer position (0-based) : Return type -> Series
print("Access Row 1:\n")
print(df.iloc[1])
#Access multiple rows : Return type-> DataFrame
print("Access Row labeled as B & C:\n")
print(df.loc[['B','C']])

O/p:
DataFrame:

          W         X         Y         Z
A -0.212033 -0.284929 -0.573898 -0.440310
B -0.330111  1.183695  1.615373  0.367062
C -0.014119  0.629642  1.709641 -1.326987
D  0.401873 -0.191427  1.403826 -1.968769
E -0.790415 -0.732722  0.087744 -0.500286
Access Column named 'W'

A   -0.212033
B   -0.330111
C   -0.014119
D    0.401873
E   -0.790415
Name: W, dtype: float64
Access Columns named Y & Z

          Y         Z
A -0.573898 -0.440310
B  1.615373  0.367062
C  1.709641 -1.326987
D  1.403826 -1.968769
E  0.087744 -0.500286
DataFrame:

          W         X         Y         Z         V
A -0.212033 -0.284929 -0.573898 -0.440310 -1.014208
B -0.330111  1.183695  1.615373  0.367062  1.982435
C -0.014119  0.629642  1.709641 -1.326987  0.382653
D  0.401873 -0.191427  1.403826 -1.968769 -0.564943
E -0.790415 -0.732722  0.087744 -0.500286 -0.412542
DataFrame:

          W         X         Y         Z
A -0.212033 -0.284929 -0.573898 -0.440310
B -0.330111  1.183695  1.615373  0.367062
C -0.014119  0.629642  1.709641 -1.326987
D  0.401873 -0.191427  1.403826 -1.968769
E -0.790415 -0.732722  0.087744 -0.500286
Access Row labeled as 'B':

W   -0.330111
X    1.183695
Y    1.615373
Z    0.367062
Name: B, dtype: float64
Access Row 1:

W   -0.330111
X    1.183695
Y    1.615373
Z    0.367062
Name: B, dtype: float64
Access Row labeled as B & C:

          W         X         Y         Z
B -0.330111  1.183695  1.615373  0.367062
C -0.014119  0.629642  1.709641 -1.326987

Similarly,

# Access a particular cell
print(df.loc['B', 'Y'])
# Access a subset of DF
print(df.loc[['B','C'],['X','Y']])

O/p:

1.6153729283493805

          X         Y
B  1.183695  1.615373
C  0.629642  1.709641
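
The same cell and subset lookups can be done by position with iloc. The sketch below uses a small integer frame, assumed just for this example, to compare the two. One difference worth remembering is that loc slices include the end label while iloc slices exclude the end position.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["A", "B", "C"],
    columns=["X", "Y", "Z"],
)

# Label-based: row 'B', column 'Y'
cell_by_label = df.loc["B", "Y"]
# Position-based: second row, second column
cell_by_position = df.iloc[1, 1]

# loc slices include the end label; iloc slices exclude the end position
by_label = df.loc["A":"B", "X":"Y"]
by_position = df.iloc[0:2, 0:2]

print(cell_by_label, cell_by_position)
print(by_label.equals(by_position))
```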

Friday, March 8, 2024

Cool features of Numpy

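import numpy as np

a = np.array([6,0,9,0,0,5,0])
def nonzero(a):
    # Returns a boolean array: True where the element is not zero
    return a!=0
# Indexing with the boolean array keeps only the True positions
print(a[nonzero(a)])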

output:

[6 9 5]
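
What makes this work is boolean masking: a != 0 produces an array of True/False values, and indexing with it keeps only the True positions. A short sketch that shows the intermediate mask, plus np.flatnonzero for when you want the positions instead of the values:

```python
import numpy as np

a = np.array([6, 0, 9, 0, 0, 5, 0])

mask = a != 0                  # boolean array, True where the value is nonzero
values = a[mask]               # same result as the nonzero() helper above
positions = np.flatnonzero(a)  # indices of the nonzero entries

print(mask)
print(values)
print(positions)
```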

Wednesday, March 6, 2024

Numpy Array vs Python List

import numpy as np

py_list = [1, 2, 'Heaven', 'Hell']
arr = np.array(py_list)

# Please note, NumPy arrays generally do not hold strings. NumPy is mainly used
# for numerical operations. Here every element is converted to a string.

py_list[2:] = 3

This is not allowed for a normal Python list. It throws an error like the one below:

    py_list[2:] = 3
TypeError: can only assign an iterable

However, it is allowed for a NumPy array, and it changes the original NumPy array.

import numpy as np
py_list = [1, 2, 'Heaven', 'Hell']
arr = np.array(py_list)
arr[2:] = 3
print(arr)

Output: ['1' '2' '3' '3']

Now, to avoid changing the original array, we can use the copy() method as below.

Without copy:
import numpy as np
py_list = [1, 2, 'Heaven', 'Hell']
arr = np.array(py_list)
arr_copy = arr  # plain assignment: both names refer to the same array
arr_copy[2:] = 3
print(arr)
print(arr_copy)

O/p:
['1' '2' '3' '3']
['1' '2' '3' '3']

With copy:
import numpy as np
py_list = [1, 2, 'Heaven', 'Hell']
arr = np.array(py_list)
arr_copy = arr.copy()  # independent copy of the data
arr_copy[2:] = 3
print(arr)
print(arr_copy)

O/p:
['1' '2' 'Heaven' 'Hell']
['1' '2' '3' '3']
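
A related difference: slicing a NumPy array gives a view on the same data, while slicing a Python list gives a new list. The sketch below is only an illustration with made-up variable names; np.shares_memory is used to check whether two arrays overlap in memory.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
py_list = [1, 2, 3, 4, 5]

view = arr[2:]                 # basic slicing returns a view on the same data
independent = arr[2:].copy()   # explicit copy, safe to modify
list_slice = py_list[2:]       # list slicing always builds a new list

view[0] = 99          # also changes arr[2]
independent[1] = -1   # leaves arr untouched
list_slice[0] = 99    # leaves py_list untouched

print(arr)
print(py_list)
print(np.shares_memory(arr, view))
print(np.shares_memory(arr, independent))
```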


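The + operator also behaves differently: it concatenates lists but adds arrays element by element.

import numpy as np
list_one = [1,2,3]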
list_two = [4,5,6]
print(f"Result of list addition: {list_one + list_two}")
arr_one = np.array(list_one)
arr_two = np.array(list_two)
print(f"Result of array addition: {arr_one + arr_two}")

O/p:
Result of list addition: [1, 2, 3, 4, 5, 6]
Result of array addition: [5 7 9]
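
The same pattern shows up with other operators. A small sketch, assuming the same list_one as above: * repeats a list but multiplies an array element by element, and a scalar added to an array is applied to every element.

```python
import numpy as np

list_one = [1, 2, 3]
arr_one = np.array(list_one)

repeated = list_one * 2   # list repetition
doubled = arr_one * 2     # element-wise multiplication
shifted = arr_one + 10    # the scalar is broadcast to every element

print(repeated)
print(doubled)
print(shifted)
```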

Monday, March 4, 2024

Lambda Function

A lambda function is a small, anonymous function defined using the lambda keyword.

passwords = ['72tK3', '6zW5P3', '1P3=it3?', 'i"u7/auX{m']


allowed_passwords = list(filter(lambda x: len(x) >= 8, passwords))
print(allowed_passwords)

O/p:
['1P3=it3?', 'i"u7/auX{m']
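
Lambdas are handy anywhere a short throwaway function is needed, not only with filter. The sketch below reuses the same passwords list with map and sorted, and shows a list comprehension that should give the same result as the filter call. The names lengths, by_length and allowed are made up for this example.

```python
passwords = ['72tK3', '6zW5P3', '1P3=it3?', 'i"u7/auX{m']

# map: apply the lambda to every element
lengths = list(map(lambda p: len(p), passwords))

# sorted: use the lambda as the sort key (longest first)
by_length = sorted(passwords, key=lambda p: len(p), reverse=True)

# The filter call can also be written as a list comprehension
allowed = [p for p in passwords if len(p) >= 8]

print(lengths)
print(by_length)
print(allowed)
```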