Built-in Data Structures:

List

A list in Python is a mutable, ordered collection used to store multiple items in a single variable. It is one of the most commonly used data structures in Python.

Key Characteristics of Lists

Ordered
- Items are stored in the order they are added.
- The order is preserved unless explicitly modified.
Mutable
- Lists can be changed after creation.
- You can:
  - Add (append(), insert(), extend())
  - Remove (remove(), pop(), del)
  - Modify items (list[index] = new_value)
Heterogeneous
- A list can store elements of different data types:
```
my_list = [42, "hello", 3.14, [1, 2]]
```
Indexed
- Elements are accessed by position (index), starting at 0:
```
print(my_list[0])  # Output: 42
```
Dynamic Size

Lists can grow or shrink in size dynamically as elements are added or removed.
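
As a minimal sketch of the mutating operations listed above, the following walks one list through each of them. The list name `items` and its values are illustrative assumptions, not part of the original notes.

```python
# Illustrative list; the name and values are assumptions for this sketch.
items = [10, 20, 30]

items.append(40)         # add one element at the end
items.insert(1, 15)      # insert 15 at index 1
items.extend([50, 60])   # add every element of another iterable
items.remove(20)         # remove the first occurrence of the value 20
last = items.pop()       # remove and return the last element
del items[0]             # delete the element at index 0
items[0] = 99            # modify an element in place

print(items)   # [99, 30, 40, 50]
print(last)    # 60
```

The list grows and shrinks as these calls run, which is what "dynamic size" means in practice.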

List Accessing:

In Python, you access list elements using indexing or slicing.

1. Indexing

We can use the index operator [] to access an item in a list. List indices start at 0.
You can use positive or negative indices.

my_list = ['apple', 'banana', 'cherry', 'date']

Element	Index	Negative Index
'apple'	0	-4
'banana'	1	-3
'cherry'	2	-2
'date'	3	-1

print(my_list[1]) # Output: banana

print(my_list[-1]) # Output: date

2. Slicing

Slicing is used to access a range of elements. The slice list[start:stop:step] returns a new list containing the items from index start up to, but not including, index stop, taking every step-th item. Omitted values default to the beginning of the list, the end of the list, and a step of 1.

# Syntax: list[start:stop:step]
print(my_list[1:3])     # Output: ['banana', 'cherry']
print(my_list[:2])      # Output: ['apple', 'banana']
print(my_list[::2])     # Output: ['apple', 'cherry']
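
Slices also accept negative indices and a negative step. The sketch below reuses the fruit list from the text; the variable names on the left-hand side are illustrative assumptions.

```python
# Same example list as in the text above.
my_list = ['apple', 'banana', 'cherry', 'date']

last_two = my_list[-2:]          # negative start: the last two items
every_second = my_list[1::2]     # start at index 1, take every 2nd item
reversed_copy = my_list[::-1]    # negative step walks the list backwards
shallow_copy = my_list[:]        # a full slice returns a new list

shallow_copy.append('elderberry')   # changes the copy only

print(last_two)        # ['cherry', 'date']
print(every_second)    # ['banana', 'date']
print(reversed_copy)   # ['date', 'cherry', 'banana', 'apple']
print(len(my_list))    # 4 (the original list is unchanged)
```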

Looping through a list:

for fruit in my_list:
    print(fruit)

Output:
apple
banana
cherry
date

List Operations:
 1. Addition (+ Operator)
Combines two lists into a new list.
list1 = [1, 2, 3]
list2 = [4, 5]
result = list1 + list2
print(result) 

Output: [1, 2, 3, 4, 5]
2. Multiplication (* Operator)
Repeats the elements in a list.
list1 = ['A', 'B']
result = list1 * 3
print(result)  

Output: ['A', 'B', 'A', 'B', 'A', 'B']
3. Updation of List Elements
You can update a single element:
my_list = [10, 20, 30]
my_list[1] = 99
print(my_list) 

Output: [10, 99, 30]
Or multiple elements using slicing:
my_list[0:2] = [1, 2]
print(my_list)  

Output: [1, 2, 30]

4. Deletion of List Elements

Using del Statement:
my_list = [10, 20, 30, 40]
del my_list[1]
print(my_list)  

Output: [10, 30, 40]
Using remove() Method:
Removes the first occurrence of a value.
my_list = [1, 2, 3, 2]
my_list.remove(2)
print(my_list)  

Output: [1, 3, 2]
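
The deletion options differ in what they take and what they return. The following sketch, with an illustrative list named `numbers`, compares pop(), del with a slice, and remove() on a missing value (which raises ValueError).

```python
numbers = [1, 2, 3, 2]

popped = numbers.pop(0)   # removes by index and returns the removed value
del numbers[0:2]          # del also accepts a slice; nothing is returned

missing_value_error = False
try:
    numbers.remove(99)    # remove() raises ValueError if the value is absent
except ValueError:
    missing_value_error = True

print(popped)                # 1
print(numbers)               # [2]
print(missing_value_error)   # True
```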

Built-in functions
Python provides a rich set of built-in functions that work with lists.
The most commonly used ones are listed below:
1. len()
Returns the number of items in the list.
Example:
fruits = ['apple', 'banana', 'cherry']
print(len(fruits)) → Output: 3
2. max() and min()
Returns the maximum or minimum value from the list.
Example:
numbers = [10, 50, 20, 5]
print(max(numbers)) → Output: 50
print(min(numbers)) → Output: 5
3. sum()
Returns the sum of all numeric elements in the list.
Example:
numbers = [10, 20, 30]
print(sum(numbers)) → Output: 60
4. sorted()
Returns a new list that is a sorted version of the original.
Example:
nums = [3, 1, 4, 1, 5]
print(sorted(nums)) → Output: [1, 1, 3, 4, 5]
5. list()
Converts an iterable (like a string, tuple, or range) into a list.
Example:
text = "hello"
print(list(text)) → Output: ['h', 'e', 'l', 'l', 'o']
6. any() and all()
any() returns True if at least one element is truthy.
all() returns True only if all elements are truthy.
Example:
values = [0, 1, 2]
print(any(values)) → Output: True
print(all(values)) → Output: False
7. enumerate()
Returns an enumerate object that includes index and value.
Example:
fruits = ['apple', 'banana', 'cherry']
for index, fruit in enumerate(fruits):
    print(index, fruit)
Output:
0 apple
1 banana
2 cherry
8. zip()
Combines multiple iterables into tuples.
Example:
names = ['Alice', 'Bob']
scores = [85, 90]
result = list(zip(names, scores))
print(result) → Output: [('Alice', 85), ('Bob', 90)]
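
sorted() returns a new list, whereas the list method sort() reorders the list in place and returns None. The sketch below also uses the key and reverse parameters of sorted(); the list `words` is an illustrative assumption.

```python
words = ['banana', 'kiwi', 'apple']

by_length = sorted(words, key=len)        # order by the result of len()
descending = sorted(words, reverse=True)  # reverse alphabetical order
unchanged = list(words)                   # sorted() left `words` untouched

in_place_result = words.sort()            # sorts `words` itself, returns None

print(by_length)        # ['kiwi', 'apple', 'banana']
print(descending)       # ['kiwi', 'banana', 'apple']
print(words)            # ['apple', 'banana', 'kiwi']
print(in_place_result)  # None
```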

Tuple
A tuple is a built-in data structure in Python that works similarly to a list.
However, the key difference is that tuples are immutable,
meaning their values cannot be changed once assigned, 
whereas lists are mutable and allow modifications.

Key Differences Between List and Tuple:
A list is enclosed in square brackets, like my_list = [1, 2, 3]
A tuple is enclosed in parentheses, like my_tuple = (1, 2, 3)
Lists allow changes to their elements, while tuples do not.
Benefits of Using Tuples:
Tuples can be used as keys in dictionaries because they are hashable, provided every element they contain is hashable too. Lists cannot be used as dictionary keys.
By convention, tuples are often used for heterogeneous data (different types), while lists are preferred for homogeneous data (similar types).
Because tuples are immutable, Python can store them more compactly than lists, and creating a tuple is usually slightly faster. Any difference in iteration speed is small.
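
A short sketch of these two properties, immutability and use as dictionary keys. The dictionary `grid` and its contents are illustrative assumptions.

```python
# A tuple of numbers is hashable, so it can be a dictionary key.
grid = {(0, 0): 'origin', (1, 2): 'point A'}
label = grid[(1, 2)]

# Assigning to an element of a tuple raises TypeError.
point = (1, 2)
error_message = ''
try:
    point[0] = 5
except TypeError as exc:
    error_message = str(exc)

# A list is not hashable, so it is rejected as a key.
list_key_rejected = False
try:
    bad = {[0, 0]: 'origin'}
except TypeError:
    list_key_rejected = True

print(label)               # point A
print(error_message)       # 'tuple' object does not support item assignment
print(list_key_rejected)   # True
```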
Creating a Tuple in Python
A tuple is created by placing all the elements inside parentheses (), with each element separated by a comma.
tup1 = (1, 2, 3, 4, 5)
tup2 = ('apple', 'mango', 'guava', 'orange')
You can also create an empty tuple that contains no elements. This is done by simply using empty parentheses:
tup = ()
To write a tuple containing a single value, there must be a comma at the end of the value, even though there is only one value.
tup=(10,)
A tuple can have any number of items, and they may be of different types
(integer, float, list, string etc.)
tup=(10,'Swathi',97.3,'B',)
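
The trailing-comma rule is easy to get wrong, so here is a small sketch of it, together with unpacking a tuple into separate variables. All names are illustrative assumptions.

```python
not_a_tuple = (10)     # parentheses alone only group: this is an int
single = (10,)         # the trailing comma makes it a one-element tuple
also_tuple = 10, 20    # the commas, not the parentheses, build the tuple

# Unpacking assigns each element to its own variable.
name, score = ('Swathi', 97.3)

print(type(not_a_tuple).__name__)   # int
print(type(single).__name__)        # tuple
print(also_tuple)                   # (10, 20)
print(name, score)                  # Swathi 97.3
```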

Accessing The Tuple Elements:
There are various ways in which we can access the elements of a tuple.
1. Indexing
We can use the index operator [] to access an item in a tuple, where the index starts from 0.
So, a tuple having 6 elements will have indices from 0 to 5. Trying to access an index outside this range (6, 7, ... in this example)
will raise an IndexError.
The index must be an integer; using a float or another type will result in a TypeError.
Example:
# Accessing tuple elements using indexing
my_tuple = ('p','e','r','m','i','t')
print(my_tuple[0]) # 'p' 
print(my_tuple[5]) # 't'

Output:
p
t

2. Negative Indexing
Python allows negative indexing for its sequences.
The index of -1 refers to the last item, -2 to the second last item and so on.
Example:
# Negative indexing for accessing tuple elements
my_tuple = ('p', 'e', 'r', 'm', 'i', 't')
# Output: 't'
print(my_tuple[-1])
# Output: 'p'
print(my_tuple[-6])

Output:
t
p

3. Slicing
We can access a range of items in a tuple by using the slicing operator colon :.
Example:
# Accessing tuple elements using slicing
my_tuple = ('p','r','o','g','r','a','m')
# elements 2nd to 4th
print(my_tuple[1:4])
# elements beginning to 2nd
print(my_tuple[:-5])
# elements 4th to end
print(my_tuple[3:])
# elements beginning to end
print(my_tuple[:])

Output:

('r', 'o', 'g')

('p', 'r')

('g', 'r', 'a', 'm')

('p', 'r', 'o', 'g', 'r', 'a', 'm')

Tuple Operations

Various operations can be performed on tuples. They are described below:

a) Adding Tuple:

Two tuples can be joined into a new tuple by using the concatenation operator (+).

eg:

data1=(1,2,3,4)

data2=('x','y','z')

data3=data1+data2

print (data1)

print (data2)

print (data3)

Output:

(1, 2, 3, 4)

('x', 'y', 'z')

(1, 2, 3, 4, 'x', 'y', 'z')

b) Replicating Tuple:

Replicating means repeating. The '*' operator repeats a tuple a specified number of times.

Eg:

tuple1=(10,20,30)

tuple2=(40,50,60)

print (tuple1*2)

print (tuple2*3)

Output:

(10, 20, 30, 10, 20, 30)

(40, 50, 60, 40, 50, 60, 40, 50, 60)

c) Deleting elements from Tuple:

Deleting an individual element from a tuple is not supported. However, the whole tuple can be deleted using the del statement.

Eg:

data=(10,20,'rahul',40.6,'z')

print (data)

del (data) #will delete the tuple data

print (data) #will show an error since tuple data is already deleted

Output:

(10, 20, 'rahul', 40.6, 'z')

Traceback (most recent call last):
  File "C:/Python27/t.py", line 4, in <module>
    print (data)
NameError: name 'data' is not defined
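
Although an element cannot be deleted in place, a new tuple without that element can be built. The sketch below shows two common ways, slicing and a round trip through a list; the variable names are illustrative assumptions.

```python
data = (10, 20, 'rahul', 40.6, 'z')

# Option 1: join the slices before and after the unwanted index (here index 2).
without_index_2 = data[:2] + data[3:]

# Option 2: convert to a list, remove by value, convert back to a tuple.
as_list = list(data)
as_list.remove(40.6)
without_value = tuple(as_list)

print(without_index_2)   # (10, 20, 40.6, 'z')
print(without_value)     # (10, 20, 'rahul', 'z')
print(data)              # (10, 20, 'rahul', 40.6, 'z'), the original is unchanged
```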

Built-in tuple functions

Function	Description	Example Usage
`len(tuple)`	Returns the number of items in the tuple.	`len((1, 2, 3))` → `3`
`max(tuple)`	Returns the largest item. Works with numbers or strings.	`max((3, 5, 1))` → `5`
`min(tuple)`	Returns the smallest item.	`min((3, 5, 1))` → `1`
`tuple(sequence)`	Converts a list, string, or other iterable into a tuple.	`tuple([1, 2])` → `(1, 2)`
`sum(tuple)`	Returns the total sum of elements (if all are numeric).	`sum((1, 2, 3))` → `6`
`sorted(tuple)`	Returns a sorted list from the tuple's items (does not modify tuple).	`sorted((3, 1, 2))` → `[1, 2, 3]`
`any(tuple)`	Returns `True` if at least one element is `True`.	`any((0, False, 5))` → `True`
`all(tuple)`	Returns `True` if all elements are `True`.	`all((1, 2, 3))` → `True`

Example:

# Define a tuple

my_tuple = (4, 7, 1, 9, 0)

print("Tuple:", my_tuple)

# Length of tuple

print("Length:", len(my_tuple))

# Maximum value

print("Max value:", max(my_tuple))

# Minimum value

print("Min value:", min(my_tuple))

# Sum of elements

print("Sum:", sum(my_tuple))

# Sorted version (returns a list)

print("Sorted:", sorted(my_tuple))

# Convert a list to tuple

my_list = [10, 20, 30]

converted = tuple(my_list)

print("Converted from list:", converted)

# Use of any() and all()

bool_tuple = (1, True, 0)

print("Any true?", any(bool_tuple)) # True, because 1 or True present

print("All true?", all(bool_tuple)) # False, because 0 is False

Output:

Tuple: (4, 7, 1, 9, 0)

Length: 5

Max value: 9

Min value: 0

Sum: 21

Sorted: [0, 1, 4, 7, 9]

Converted from list: (10, 20, 30)

Any true? True

All true? False

Set

In Python, a set is an unordered, unindexed collection of unique elements.
Sets can store elements of different data types (integers, strings, floats, etc.).
Sets are similar to lists and tuples, but they do not maintain order and do not allow duplicates. Any repeated item is automatically removed.
The elements of a set must be hashable (for example numbers, strings, or tuples, but not lists), while the set itself is mutable (you can add or remove elements).
Sets are defined using curly braces {} and elements are separated by commas.
Python implements the set data type using the built-in set class.
Because sets are unordered, items appear in an arbitrary order and cannot be accessed by index.
Sets are useful when you need to store unique items and perform mathematical set operations (like union, intersection, difference).

Creating a Set in Python

A set is created using curly braces {} or the set() constructor. Note that empty braces {} create an empty dictionary, so an empty set must be written as set().
Elements inside a set must be separated by commas.
Sets automatically remove duplicate values.
You can store elements of different data types in a single set.

Method 1: Using Curly Braces

# Creating a set with curly braces

my_set = {1, 2, 3, 4}

print(my_set)

Output: {1, 2, 3, 4}

Method 2: Using the set() Constructor

# Creating a set from a list

my_list = [1, 2, 2, 3]

my_set = set(my_list)

print(my_set)

Output: {1, 2, 3}

Accessing Set Elements

Sets are unordered and unindexed, so you cannot access elements using an index like you do with lists or tuples.

my_set = {10, 20, 30}

print(my_set[0]) # This will raise a TypeError

To access elements, you generally use a loop.

my_set = {'apple', 'banana', 'cherry'}

for item in my_set:
    print(item)
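
Besides looping, the usual way to "look something up" in a set is a membership test with the in operator. When a predictable order is needed, sorted() returns the elements as a sorted list. A minimal sketch, with illustrative variable names:

```python
my_set = {'apple', 'banana', 'cherry'}

has_apple = 'apple' in my_set        # membership test
has_mango = 'mango' in my_set

ordered = sorted(my_set)             # a sorted list built from the set

print(has_apple)   # True
print(has_mango)   # False
print(ordered)     # ['apple', 'banana', 'cherry']
```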

Set Operations

Adding elements to a set

1. `add()`

Adds a single element.

my_set = {1, 2}

my_set.add(3)

Output:

{1, 2, 3}

2. `update()`

Adds multiple elements (from any iterable like list, tuple, or another set).

my_set = {1, 2}
my_set.update([3, 4]) # {1, 2, 3, 4}
my_set.update((5, 6)) # {1, 2, 3, 4, 5, 6}
my_set.update({7, 8}) # {1, 2, 3, 4, 5, 6, 7, 8}

Removing elements from set

1. remove()

Removes a specific element. Raises an error if the element doesn’t exist.

my_set = {1, 2, 3}
my_set.remove(2) # {1, 3}
my_set.remove(10) # KeyError

2. discard()

Removes a specific element. Does nothing if the element doesn’t exist.

my_set = {1, 2, 3}
my_set.discard(3) # {1, 2}
my_set.discard(10) # No error

3. `pop()`

Removes and returns an arbitrary element (sets are unordered, so you cannot choose which one).

my_set = {1, 2, 3}
removed = my_set.pop()
print(removed) # Could be 1, 2, or 3

4. `clear()`

Removes all elements from the set.

my_set = {1, 2, 3}
my_set.clear() # my_set becomes set()

Operation	Symbol	Python Method	Meaning	Example Result
Union	∪	`union()`	All elements in A or B	`{1, 2, 3, 4, 5, 6}`
Intersection	∩	`intersection()`	Elements in both A and B	`{3, 4}`
Difference (A - B)	−	`difference()`	Elements in A but not in B	`{1, 2}`
Symmetric Difference	△	`symmetric_difference()`	Elements in A or B but not both	`{1, 2, 5, 6}`

Example:

# Define two example sets

A = {1, 2, 3, 4}

B = {3, 4, 5, 6}

print("Set A:", A)

print("Set B:", B)

# Union

union_set = A.union(B)

print("Union (A ∪ B):", union_set)

# Intersection

intersection_set = A.intersection(B)

print("Intersection (A ∩ B):", intersection_set)

# Difference (A - B)

difference_set = A.difference(B)

print("Difference (A - B):", difference_set)

# Symmetric Difference

sym_diff_set = A.symmetric_difference(B)

print("Symmetric Difference (A △ B):", sym_diff_set)

Output:

Set A: {1, 2, 3, 4}

Set B: {3, 4, 5, 6}

Union (A ∪ B): {1, 2, 3, 4, 5, 6}

Intersection (A ∩ B): {3, 4}

Difference (A - B): {1, 2}

Symmetric Difference (A △ B): {1, 2, 5, 6}
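
The same four operations are also available as operators. The sketch below uses the sets A and B from the example and adds a subset test; the result variable names are illustrative assumptions.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union_set = A | B          # same as A.union(B)
intersection_set = A & B   # same as A.intersection(B)
difference_set = A - B     # same as A.difference(B)
sym_diff_set = A ^ B       # same as A.symmetric_difference(B)

is_subset = {3, 4} <= A    # True when every element of {3, 4} is in A

print(union_set == A.union(B))   # True
print(is_subset)                 # True
```

One difference worth knowing: the methods accept any iterable (for example a list), while the operators require both operands to be sets.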

Dictionary

In Python, a dictionary is a collection of elements where each element is a pair of key and value.
The dictionary data type (data structure) is implemented by the built-in class dict.
All the elements of a dictionary must be enclosed in curly braces, each element must be separated with a comma symbol, and every pair of key and value must be separated with colon ( : ) symbol.

Creating Dictionary

Creating a dictionary is as simple as placing items inside curly braces {} separated by commas.
An item has a key and a corresponding value that is expressed as a pair (key: value).

Example:

# 1. Empty dictionary

my_dict1 = {}

print("1. Empty dictionary:", my_dict1)

# 2. Dictionary with integer keys

my_dict2 = {1: 'apple', 2: 'ball'}

print("2. Dictionary with integer keys:", my_dict2)

# 3. Dictionary with mixed keys

my_dict3 = {'name': 'John', 1: [2, 4, 3]}

print("3. Dictionary with mixed keys:", my_dict3)

# 4. Using dict() with another dictionary

my_dict4 = dict({1: 'apple', 2: 'ball'})

print("4. Using dict() with a dictionary:", my_dict4)

# 5. Using dict() with list of tuples (sequence of key-value pairs)

my_dict5 = dict([(1, 'apple'), (2, 'ball')])

print("5. Using dict() with list of tuples:", my_dict5)

Output:

1. Empty dictionary: {}

2. Dictionary with integer keys: {1: 'apple', 2: 'ball'}

3. Dictionary with mixed keys: {'name': 'John', 1: [2, 4, 3]}

4. Using dict() with a dictionary: {1: 'apple', 2: 'ball'}

5. Using dict() with list of tuples: {1: 'apple', 2: 'ball'}

Accessing Elements from Dictionary

In Python, dictionary elements are organized by key, so a value is accessed through its key. Python provides the following ways to access the elements of a dictionary.

Using Key as index - The elements of a dictionary can be accessed using the key as an index. A missing key raises a KeyError.
get( key ) - This method returns the value associated with the given key, or None if the key is not present.
Accessing the whole dictionary - In Python, we use the name of the dictionary to access the whole dictionary.
items( ) - This is a built-in method that returns a view of all the key-value pairs in the dictionary.
keys( ) - This is a built-in method that returns a view of all the keys in the dictionary.
values( ) - This is a built-in method that returns a view of all the values in the dictionary.

# Creating a dictionary with student details

student_dictionary = {

'rollNo': 1,

'name': 'Nani',

'department': 'BSC',

'year': 2

}

# 1. Displaying the type of the dictionary

print(type(student_dictionary))

# 2. Accessing a value using the key as an index

print(student_dictionary['rollNo'])

# 3. Accessing a value using the get() method

print(student_dictionary.get('name'))

# 4. Accessing the whole dictionary

print(student_dictionary)

# 5. Accessing all key-value pairs using items() method

print(student_dictionary.items())

# 6. Accessing all keys using keys() method

print(student_dictionary.keys())

# 7. Accessing all values using values() method

print(student_dictionary.values())

Output:

<class 'dict'>
1
Nani
{'rollNo': 1, 'name': 'Nani', 'department': 'BSC', 'year': 2}
dict_items([('rollNo', 1), ('name', 'Nani'), ('department', 'BSC'), ('year', 2)])
dict_keys(['rollNo', 'name', 'department', 'year'])
dict_values([1, 'Nani', 'BSC', 2])
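
The difference between indexing and get() shows up when a key is missing. The sketch below reuses the student dictionary; the key 'email' and the default text are illustrative assumptions.

```python
student_dictionary = {'rollNo': 1, 'name': 'Nani', 'department': 'BSC', 'year': 2}

email = student_dictionary.get('email')                      # missing key: None
email_or_default = student_dictionary.get('email', 'not provided')

key_error_raised = False
try:
    student_dictionary['email']          # indexing a missing key raises KeyError
except KeyError:
    key_error_raised = True

# items() is most often used to loop over keys and values together.
pairs = [f"{key}={value}" for key, value in student_dictionary.items()]

# Wrap a view in list() when an actual list is needed.
key_list = list(student_dictionary.keys())

print(email)              # None
print(email_or_default)   # not provided
print(key_error_raised)   # True
print(pairs)              # ['rollNo=1', 'name=Nani', 'department=BSC', 'year=2']
print(key_list)           # ['rollNo', 'name', 'department', 'year']
```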

Operations in Dictionary

Dictionaries in Python allow us to perform various operations such as adding, updating, deleting, and more.

Adding elements - To add a new key-value pair, simply assign a value to a new key.
Updating elements - If the key already exists, assigning a new value will update it.

Removing elements

pop( key ) - This method removes the element with the specified key from the dictionary and returns its value.
popitem( ) - This method removes and returns the last inserted key-value pair (Python 3.7 and later).
clear( ) - This method removes all the elements from the dictionary, leaving it empty. It returns None.
del keyword with dict[key] - This statement deletes the element with the specified key. The rest of the dictionary remains accessible.
del keyword - Applied to the dictionary name, this statement deletes the dictionary completely. After that, the name cannot be used in the rest of the code.

Example:

# Creating a dictionary with student details

student_dictionary = {

'rollNo': 1,

'name': 'Ram',

'department': 'CSE',

'year': 2,

'section': 'A',

'percentage': 80.5

}

# Displaying the original dictionary

print(f'Dictionary is {student_dictionary}')

# Updating the value of the 'name' key to 'Swathi'

student_dictionary['name'] = 'Swathi'

# Removing the element with key 'year' using pop()

student_dictionary.pop('year')

print(f'The dictionary after removing element with key "year":\n{student_dictionary}')

# Removing the last inserted item using popitem()

student_dictionary.popitem()

print(f'The dictionary after popitem():\n{student_dictionary}')

# Deleting the key 'section' using del

del student_dictionary['section']

print(f'The dictionary after deleting "section":\n{student_dictionary}')

# Clearing all elements from the dictionary using clear()

student_dictionary.clear()

print(f'The dictionary after clear():\n{student_dictionary}')

# Deleting the dictionary completely

del student_dictionary

# Uncommenting the line below would raise an error because the dictionary no longer exists

# print(f'The dictionary after del:\n{student_dictionary}') # ❌ This line causes an error

Output:

Dictionary is {'rollNo': 1, 'name': 'Ram', 'department': 'CSE', 'year': 2, 'section': 'A', 'percentage': 80.5}

The dictionary after removing element with key "year":

{'rollNo': 1, 'name': 'Swathi', 'department': 'CSE', 'section': 'A', 'percentage': 80.5}

The dictionary after popitem():

{'rollNo': 1, 'name': 'Swathi', 'department': 'CSE', 'section': 'A'}

The dictionary after deleting "section":

{'rollNo': 1, 'name': 'Swathi', 'department': 'CSE'}

The dictionary after clear():

{}
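
The removal methods also return what they removed, and pop() accepts a fallback for missing keys. A minimal sketch, which also shows adding a new key by assignment; the shortened dictionary `student` is an illustrative assumption.

```python
student = {'rollNo': 1, 'name': 'Ram'}

student['section'] = 'A'        # new key: adds a pair
student['name'] = 'Swathi'      # existing key: updates the value

removed_value = student.pop('rollNo')     # returns the removed value
fallback = student.pop('year', None)      # missing key: returns the default
last_pair = student.popitem()             # returns the last inserted pair

print(removed_value)   # 1
print(fallback)        # None
print(last_pair)       # ('section', 'A')
print(student)         # {'name': 'Swathi'}
```

Without the second argument, pop() on a missing key raises a KeyError.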

Functions

A function is a block of statements used to perform a specific task.
Functions allow us to divide a larger problem into smaller subparts, reuse code, and make programs easier to read and understand.

Types of Functions in Python:

1. Built-in Functions

These are the functions that come pre-defined with Python. They help perform common tasks like printing, taking input, working with numbers or sequences, etc.

Examples:

print() – Prints output to the console.
input() – Takes user input as a string.
len() – Returns the length of a sequence (like a list or string).
type() – Returns the data type of a variable.
range() – Returns a sequence of numbers.
int(), float(), str() – Type conversion functions.

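Some functions used in Data Analysis (most come from third-party libraries such as pandas, NumPy, matplotlib, seaborn, SciPy, and statsmodels, not from Python itself)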

1. Data Loading

Function	Description	Library
`open()`	Built-in function to open files	built-in
`pd.read_csv()`	Loads data from a CSV file	pandas
`pd.read_excel()`	Loads data from Excel files	pandas
`pd.read_json()`	Loads JSON data	pandas
`np.loadtxt()`	Loads text data	NumPy
`np.genfromtxt()`	Handles missing values in text files	NumPy

2. Data Cleaning

Function	Description	Library
`df.isnull()`	Checks for missing values	pandas
`df.dropna()`	Removes missing values	pandas
`df.fillna()`	Fills missing values	pandas
`df.duplicated()`	Checks for duplicates	pandas
`df.drop_duplicates()`	Removes duplicate rows	pandas
`str.strip()`, `str.lower()`	String cleaning	built-in, pandas
`df.astype()`	Converts data types	pandas

3. Data Exploration

Function	Description	Library
`df.head()`	Shows top rows	pandas
`df.tail()`	Shows bottom rows	pandas
`df.info()`	Summary of data types	pandas
`df.describe()`	Summary stats	pandas
`df.value_counts()`	Frequency of values	pandas
`df.columns`, `df.shape`	Structure of data	pandas
`df.corr()`	Correlation matrix	pandas

4. Data Manipulation

Function	Description	Library
`df.sort_values()`	Sort rows by column	pandas
`df.groupby()`	Group data by column(s)	pandas
`df.merge()`	Merge two DataFrames	pandas
`df.join()`	Join based on index	pandas
`df.pivot_table()`	Pivoting data	pandas
`df.apply()`	Apply custom function	pandas
`df.replace()`	Replace values	pandas

5. Data Visualization

Function	Description	Library
`plt.plot()`	Line plot	matplotlib
`plt.bar()`	Bar chart	matplotlib
`plt.hist()`	Histogram	matplotlib
`plt.scatter()`	Scatter plot	matplotlib
`sns.heatmap()`	Correlation heatmap	seaborn
`sns.boxplot()`	Box plot	seaborn
`sns.pairplot()`	Pairwise relationships	seaborn

6. Statistical Analysis

Function	Description	Library
`mean()`, `median()`, `mode()`	Central tendencies	statistics (NumPy provides `mean()` and `median()` only)
`std()`, `var()`	Spread metrics	pandas, NumPy
`ttest_ind()`	T-test	scipy.stats
`pearsonr()`	Pearson correlation	scipy.stats
`linregress()`	Linear regression	scipy.stats
`ols()`	Ordinary least squares	statsmodels
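
As a small illustration of the first two rows, the standard-library statistics module can compute central tendency and spread without any third-party install. The list `scores` is made-up sample data for this sketch; note that variance() and stdev() here are the sample (n - 1) versions.

```python
import statistics

# Made-up sample data for illustration.
scores = [70, 80, 80, 90, 100]

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)

sample_variance = statistics.variance(scores)   # divides by n - 1
sample_std = statistics.stdev(scores)           # square root of the variance

print(mean_score)        # 84
print(median_score)      # 80
print(mode_score)        # 80
print(sample_variance)   # 130
print(round(sample_std, 2))
```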

2. User-defined Functions

These are functions that you create to perform a specific task or set of tasks. They make your code modular, reusable, and easier to manage.

Syntax:

def function_name(parameters):
    # function body
    return result

Example:

def greet(name):
    print(f"Hello, {name}!")

greet("Swathi")

Output:

Hello, Swathi!
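
The greet() example prints its result. A function can instead hand a value back to the caller with return, which matches the syntax shown above. The function name `rectangle_area` is a hypothetical example, not from the original notes.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle (hypothetical example function)."""
    return width * height

# The returned value can be stored and reused.
area = rectangle_area(3, 4)
doubled = area * 2

print(area)      # 12
print(doubled)   # 24
```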

Function arguments:

Function arguments are the values you pass to a function when calling it. These values are assigned to the function’s parameters and used inside the function.

1. Positional Arguments

Positional arguments are passed to a function in the same order as the parameters appear in the function definition. The number and order of arguments in the call must match the definition: a mismatch in number raises a TypeError, and a mismatch in order assigns values to the wrong parameters. Positional arguments without default values are also known as required arguments.

Example:

def greet(name, age):
    print(f"Hello {name}, you are {age} years old.")

greet("Swathi", 24)

Output:

Hello Swathi, you are 24 years old.
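
The two kinds of mismatch behave differently, as this sketch shows. It uses a variant of greet() that returns the text instead of printing it, an adaptation made here so the result can be inspected.

```python
def greet(name, age):
    # Variant of the example above that returns the message.
    return f"Hello {name}, you are {age} years old."

# Wrong order: Python cannot detect it, so the values land in the wrong parameters.
swapped = greet(24, "Swathi")

# Wrong number of arguments: Python raises TypeError.
missing_argument_error = False
try:
    greet("Swathi")
except TypeError:
    missing_argument_error = True

print(swapped)                  # Hello 24, you are Swathi years old.
print(missing_argument_error)   # True
```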

2. Default Arguments

A default argument is a parameter that is given a default value in the function definition. If the function is called with a value for it, the function runs with the provided value; otherwise, it runs with the default value.

Example:

def greet(name="Swathi"):
    print(f"Hello {name}")

greet() # Uses default
greet("Anu") # Overrides default

Output:

Hello Swathi

Hello Anu

3. Keyword Arguments

A keyword argument is passed along with the parameter name (parameter_name = value). When keyword arguments are used, the order of the arguments does not matter, because the Python interpreter uses the keyword to match each value with its parameter.

Example:

def greet(name, city):
    print(f"{name} is from {city}.")

greet(name="Swathi", city="Hyderabad")

Output:

Swathi is from Hyderabad.
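
Keyword arguments can be given in any order and combine naturally with default values. The sketch below is a variant of greet() that returns its message and gives city a default; both changes are assumptions made for illustration.

```python
def greet(name, city="Hyderabad"):
    # Variant of the example above: returns the text, and city has a default.
    return f"{name} is from {city}."

reordered = greet(city="Chennai", name="Anu")   # order does not matter
defaulted = greet("Swathi")                     # city falls back to the default
mixed = greet("Ram", city="Delhi")              # positional first, then keyword

print(reordered)   # Anu is from Chennai.
print(defaulted)   # Swathi is from Hyderabad.
print(mixed)       # Ram is from Delhi.
```

Positional arguments must come before keyword arguments in a call.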

4. Variable-length arguments

Sometimes we do not know in advance how many arguments will be passed to a function, or the number may change from call to call. Python provides variable-length arguments, which let us pass an arbitrary number of arguments. Extra positional arguments are collected into a tuple (*args) and can be accessed by index or by iteration; extra keyword arguments are collected into a dictionary (**kwargs).

a. *args → Accepts multiple positional arguments.

Example:

def hobbies(*args):
    print("Swathi's hobbies are:")
    for hobby in args:
        print("-", hobby)

hobbies("Reading", "Painting", "Music")

Output:

Swathi's hobbies are:

- Reading

- Painting

- Music

b. **kwargs → Accepts multiple keyword arguments.

Example:

def profile(**kwargs):
    print("Swathi's Profile:")
    for key, value in kwargs.items():
        print(f"{key}: {value}")

profile(age=22, city="Hyderabad", skill="Python")

Output:

Swathi's Profile:

age: 22

city: Hyderabad

skill: Python
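
Both forms can appear in one function alongside a regular parameter. The sketch below returns what each parameter received so the types are visible; the function name `describe` is a hypothetical example.

```python
def describe(title, *args, **kwargs):
    # args is a tuple of the extra positional arguments;
    # kwargs is a dict of the extra keyword arguments.
    return title, args, kwargs

title, extras, options = describe("Profile", "Reading", "Music",
                                  city="Hyderabad", age=22)

print(title)       # Profile
print(extras)      # ('Reading', 'Music')
print(extras[0])   # Reading
print(options)     # {'city': 'Hyderabad', 'age': 22}
```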

Scope of variables

Variable scope is the region of a program in which a variable can be accessed or used.

Python has the following types of variable scopes:

1. Local Scope

A variable declared inside a function is local to that function.

def greet():
    name = "Swathi" # local variable
    print("Hello", name)

greet()

# print(name) ❌ This will cause an error because 'name' is local

Output:

Hello Swathi
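
The error mentioned in the comment is a NameError. The sketch below triggers it safely inside try/except; the names `build_greeting` and `local_greeting` are illustrative assumptions.

```python
def build_greeting():
    local_greeting = "Hello Swathi"   # exists only while the function runs
    return local_greeting

message = build_greeting()

visible_outside = True
try:
    local_greeting        # not defined in the global scope
except NameError:
    visible_outside = False

print(message)           # Hello Swathi
print(visible_outside)   # False
```

Returning the value, as done here, is the usual way to get data out of a function's local scope.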

2. Global Scope

A variable declared outside any function is global and can be read anywhere in the file, including inside functions.

name = "Swathi" # global variable

def greet():
    print("Hello", name) # can access global variable

greet()

print("Name outside function:", name)

Output:

Hello Swathi

Name outside function: Swathi

3. Global Keyword

If you want to modify a global variable inside a function, use the global keyword.

name = "Swathi"

def change_name():
    global name
    name = "Anu"

change_name()

print("Name after change:", name)

Output:

Name after change: Anu

4. Nonlocal Keyword (for nested functions)

Used to modify a variable in the outer function scope (not global).

def outer():
    name = "Swathi"

    def inner():
        nonlocal name
        name = "Anu"
        print("Inner:", name)

    inner()
    print("Outer:", name)

outer()

Output:

Inner: Anu

Outer: Anu
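
A common practical use of nonlocal is a function that remembers state between calls. The following is a minimal sketch of that pattern; `make_counter` and `increment` are hypothetical names.

```python
def make_counter():
    count = 0                 # lives in the enclosing (outer) function scope

    def increment():
        nonlocal count        # rebind the outer variable instead of creating a local
        count += 1
        return count

    return increment

counter = make_counter()
first = counter()
second = counter()

another = make_counter()      # each call to make_counter gets its own count
fresh = another()

print(first, second, fresh)   # 1 2 1
```

Without the nonlocal line, count += 1 would raise UnboundLocalError, because the assignment would make count local to increment().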

Files and Operating System

In data analysis, interacting with files and the operating system is a common and essential task. Analysts and data scientists frequently:

Read and write data files such as CSV, Excel, and JSON.
Manage folders and file paths across different environments.
Load and save datasets during different stages of analysis.
Interact with the operating system to automate workflows and organize data.

Python provides built-in modules that greatly simplify these tasks. The most commonly used ones include:

os: Interacts with the operating system (e.g., working with paths, environment variables).
shutil: Performs high-level file operations such as copying and moving files.
open(): A built-in function for reading from and writing to files.
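
The sketch below uses all three together: open() to write and read a file, os to build paths and create a folder, and shutil to copy the file. It works inside a temporary directory so nothing is left behind; the file and folder names are illustrative assumptions.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Build the path with os.path.join so it works on any operating system.
    source_path = os.path.join(workdir, "data.txt")

    # Write a small file with open().
    with open(source_path, "w", encoding="utf-8") as handle:
        handle.write("rollNo,name\n1,Nani\n")

    # Create a sub-folder and copy the file into it.
    backup_dir = os.path.join(workdir, "backup")
    os.makedirs(backup_dir, exist_ok=True)
    backup_path = shutil.copy(source_path, backup_dir)

    # Read the copy back.
    with open(backup_path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()

    listing = sorted(os.listdir(workdir))

print(lines)     # ['rollNo,name', '1,Nani']
print(listing)   # ['backup', 'data.txt']
```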

Merits:

Feature	Explanation
Platform Independence	Most Python file-handling code runs on Windows, macOS, and Linux with little or no modification.
File & Directory Manipulation	Python can easily create, delete, rename, and move files and directories.
Access Control	You can manage file permissions and check whether files exist or are accessible.
Cross-Platform Compatibility	File handling functions behave consistently across operating systems.
File I/O Operations	Python provides simple functions for reading and writing text, binary, and structured files.
Error Handling	Exceptions like `FileNotFoundError` or `PermissionError` help manage common file-related issues.
File Compression	Python’s `zipfile` and `tarfile` modules support compressing and extracting files.
Interoperability	Python integrates well with other tools and libraries used in data science, such as Pandas and NumPy.
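
As an illustration of the error-handling row, the sketch below catches the two exceptions named in the table instead of letting the program crash. The helper `read_text_or_none` and the file name are hypothetical.

```python
import os
import tempfile

def read_text_or_none(path):
    """Hypothetical helper: return the file's text, or None if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except (FileNotFoundError, PermissionError):
        return None

with tempfile.TemporaryDirectory() as workdir:
    missing_path = os.path.join(workdir, "missing.csv")   # never created
    exists_before = os.path.exists(missing_path)
    missing_result = read_text_or_none(missing_path)

print(exists_before)    # False
print(missing_result)   # None
```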

Demerits:

Concern	Explanation
Platform-Specific Features	Some OS-level features (like symbolic links) may behave differently or be unsupported on certain platforms.
Security Concerns	Improper handling of file input/output can expose sensitive data or cause unintentional data loss.
Learning Curve	File paths, permission errors, and working with different file formats can be challenging for beginners.
File System Errors	Errors such as missing files or incorrect paths are common and need to be handled properly.
Limited Low-Level Control	Python abstracts away many low-level details, which may limit control in certain complex scenarios.
Performance Impact	Reading and processing very large files can be memory-intensive and slow without optimization.
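
One common way to soften the performance concern is to read a file line by line, so only one line is held in memory at a time. The sketch below assumes a text file with one integer per line, created here only for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "numbers.txt")

    # Create a sample file with the numbers 0..999, one per line.
    with open(path, "w", encoding="utf-8") as handle:
        for number in range(1000):
            handle.write(f"{number}\n")

    # Iterating over the file object yields one line at a time,
    # so the whole file is never loaded into memory at once.
    total = 0
    line_count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            total += int(line)
            line_count += 1

print(line_count)   # 1000
print(total)        # 499500
```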

Applications:

1. Data Processing and Analysis

Use: Load datasets from files, clean data, and perform operations.
Example: Use pandas.read_csv() to read a CSV file and filter rows by conditions.

2. Configuration Management

Use: Store and load configuration settings from files like .ini, .json, or .yaml.
Example: Load API keys or data paths from a JSON config file.

3. Logging and Error Handling

Use: Maintain logs of data processing steps or catch and record errors.
Example: Use the logging module to write error messages to a file during batch processing.

4. File Compression and Archiving

Use: Compress datasets into .zip or .tar formats for sharing or storage.
Example: Use zipfile.ZipFile to archive old data logs.

5. Database Interaction

Use: Store processed data in databases such as SQLite or MySQL.
Example: Use sqlite3 to save cleaned data from a CSV into a SQLite database.

6. Web Development

Use: Handle file uploads (e.g., CSV files) and downloads in web-based data dashboards.
Example: Use Flask to create an interface where users can upload data for analysis.

7. Network Communication

Use: Download data from the internet or save data to a remote server.
Example: Use requests or ftplib to download a CSV from a public URL.

8. Automated Testing

Use: Automatically generate or modify test data files during software testing.
Example: Use os and open() to create mock input files for testing data pipelines.
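
Several of these applications can be combined in one short script. The following is a minimal sketch using only the standard library: it creates a mock input file (item 8), reads its location from a JSON config file (item 2), and saves the cleaned rows to SQLite (item 5). The file names, column names, and sample values are made up for illustration.

```python
import csv
import json
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()

# Item 8: create a small mock input file (names and values are made up).
csv_path = os.path.join(workdir, "scores.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    f.write("name,score\nAsha,91\nRavi,\nMeena,78\n")

# Item 2: store the data path in a JSON config file, then load it back.
config_path = os.path.join(workdir, "config.json")
with open(config_path, "w", encoding="utf-8") as f:
    json.dump({"data_path": csv_path}, f)
with open(config_path, encoding="utf-8") as f:
    config = json.load(f)

# Item 5: clean the rows (drop missing scores) and save them to SQLite.
with open(config["data_path"], newline="", encoding="utf-8") as f:
    rows = [(r["name"], int(r["score"])) for r in csv.DictReader(f) if r["score"]]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing left on disk
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)
conn.commit()
saved = conn.execute("SELECT name, score FROM scores ORDER BY score").fetchall()
conn.close()

print(saved)  # [('Meena', 78), ('Asha', 91)]
```

A real pipeline would point the config at an existing dataset and use a database file instead of `:memory:`.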

NumPy Basics: Arrays and Vectorized Computations

NumPy (Numerical Python) is a fundamental package for scientific computing in Python. It provides:

Efficient array data structures.
High-performance functions for mathematical and statistical operations.
The ability to perform vectorized computations, which are faster and more concise than using loops.

Creating Arrays

NumPy arrays are created using the np.array() function. Unlike regular Python lists, they store elements of a single data type and support element-wise arithmetic.

import numpy as np

a = np.array([1, 2, 3]) # 1D array

b = np.array([[1, 2], [3, 4]]) # 2D array

Array Creation Functions:

Function	Description
`np.zeros(shape)`	Creates an array filled with zeros.
`np.ones(shape)`	Creates an array filled with ones.
`np.arange(start, stop, step)`	Creates evenly spaced values from `start` up to, but not including, `stop`.
`np.linspace(start, stop, num)`	Generates `num` evenly spaced values from `start` to `stop` (both ends included by default).
`np.eye(n)`	Creates an identity matrix of size `n`.
`np.random.rand(d0, d1, ...)`	Generates uniform random values in [0, 1) with the given shape.
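
As a quick illustration of the functions in the table, the sketch below calls each one with small, arbitrarily chosen arguments. The variable names are placeholders.

```python
import numpy as np

zeros = np.zeros((2, 3))        # 2x3 array of 0.0
ones = np.ones(4)               # 1D array of four 1.0 values
steps = np.arange(0, 10, 2)     # start 0, stop before 10, step 2
points = np.linspace(0, 1, 5)   # 5 values from 0 to 1, both ends included
identity = np.eye(3)            # 3x3 identity matrix
noise = np.random.rand(2, 2)    # 2x2 uniform samples from [0, 1)

print(steps)    # [0 2 4 6 8]
print(points)   # [0.   0.25 0.5  0.75 1.  ]
print(zeros.shape, identity.shape, noise.shape)  # (2, 3) (3, 3) (2, 2)
```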

NumPy arrays have various useful attributes:

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape) # (2, 3)

print(arr.ndim) # 2 (dimensions)

print(arr.size) # 6 (total elements)

print(arr.dtype) # int64 (data type; may be int32 on some platforms)

Array Indexing and Slicing

You can access and modify array elements using indices:

arr = np.array([10, 20, 30, 40])

print(arr[2]) # 30

arr[1:3] = [25, 35] # slice assignment
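
To make the effect of the slice assignment visible, the sketch below prints the array afterwards and adds a small 2D example. The `grid` array is an assumption introduced only for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])
arr[1:3] = [25, 35]          # replaces the elements at index 1 and 2
print(arr)                   # [10 25 35 40]
print(arr[-1])               # 40 (negative indices count from the end)

grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid[1, 2])            # 6 (row 1, column 2)
print(grid[:, 0])            # [1 4] (first column of every row)
```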

Vectorized Computations

Vectorization refers to performing operations on entire arrays instead of using loops. This is faster and more efficient.

a = np.array([1, 2, 3])

b = np.array([4, 5, 6])

print(a + b) # [5 7 9]

print(a * b) # [ 4 10 18]

print(a ** 2) # [1 4 9]
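
For comparison, here is a small sketch that computes the same squares with an explicit loop and with a vectorized expression. Both give identical results; the vectorized form is shorter and, on large arrays, typically much faster.

```python
import numpy as np

values = np.arange(1, 6)          # [1 2 3 4 5]

# Loop version: square each element one at a time.
squares_loop = np.empty_like(values)
for i in range(len(values)):
    squares_loop[i] = values[i] ** 2

# Vectorized version: one expression over the whole array.
squares_vec = values ** 2

print(squares_vec)                                # [ 1  4  9 16 25]
print(np.array_equal(squares_loop, squares_vec))  # True
```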

Broadcasting

Broadcasting is a powerful feature in NumPy that allows arithmetic operations between arrays of different shapes. It works by automatically expanding the smaller array to match the shape of the larger array — without actually copying the data.

import numpy as np

a = np.array([1, 2, 3])

b = np.array([[1], [2], [3]])

# Shapes: a = (3,), b = (3,1)
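
Adding the two arrays above shows broadcasting in action. This is a minimal sketch; the name `result` is introduced here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])          # shape (3,)
b = np.array([[1], [2], [3]])    # shape (3, 1)

# a is treated as shape (1, 3) and b as (3, 1); both stretch to (3, 3).
result = a + b
print(result.shape)  # (3, 3)
print(result)
# [[2 3 4]
#  [3 4 5]
#  [4 5 6]]
```

Each row of the result is `a` plus one element of `b`.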

Merits

1. Efficient Element-Wise Operations

NumPy allows you to perform arithmetic operations on entire arrays without using loops.

2. Memory Efficiency

NumPy arrays use fixed data types (e.g., int32, float64) which are more memory-efficient than Python lists.

3. Broadcasting

Broadcasting allows NumPy to perform operations on arrays with different shapes by “stretching” one to match the other.

4. Universal Functions (ufuncs)

NumPy has built-in universal functions (ufuncs) that operate element-wise on arrays.

5. Multi-dimensional Arrays

NumPy can handle n-dimensional arrays, perfect for working with matrices, tensors, and structured datasets.

6. Comprehensive Math Functions

NumPy includes a wide variety of advanced mathematical functions:

Category	Examples
Linear Algebra	`np.dot()`, `np.linalg.inv()`
Statistics	`np.mean()`, `np.std()`
Fourier Transform	`np.fft.fft()`

7. Interoperability with Other Libraries

NumPy arrays are universally accepted across the Python data ecosystem:

Pandas (DataFrames are built on NumPy arrays)
Matplotlib (for plotting arrays)
TensorFlow / PyTorch (for deep learning tensors)
OpenCV (image processing with arrays)

8. Random Number Generation

NumPy provides fast random number generation functions.

Demerits

While NumPy is a powerful tool for numerical computing, it's important to understand its limitations and potential pitfalls, especially when starting out or working in constrained environments.

1. Learning Curve

NumPy arrays, with their mathematical syntax and rules (like broadcasting and indexing), can be less intuitive than Python lists for beginners.

2. Fixed Size and Type

Once a NumPy array is created, its size and data type cannot be changed easily.

3. Verbose Syntax for Some Logic

For conditional filtering, NumPy uses logical operators that require extra syntax compared to Python's if conditions.

4. In-Place Mutations

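Slices of a NumPy array are views rather than copies, and operators such as += modify an array in place, so a change made through one name can silently alter data used elsewhere and cause bugs in larger programs.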

5. Memory Overhead for Small Arrays

For very small datasets or simple tasks, Python lists can be more memory-efficient than NumPy arrays.
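
Two of the pitfalls above, the filtering syntax (item 3) and unintended in-place changes (item 4), are easier to see in code. The sketch below uses a small made-up array.

```python
import numpy as np

data = np.array([3, 12, 7, 20, 15])

# Item 3: comparisons give boolean arrays, combined with & or | (not and/or).
# The parentheses are required because & binds tighter than the comparisons.
mask = (data > 5) & (data < 18)
print(data[mask])      # [12  7 15]

# Item 4: a basic slice is a view, so writing to it changes the original.
view = data[:2]
view[0] = 99
print(data)            # [99 12  7 20 15]

# An explicit copy is independent of the original array.
safe = data.copy()
safe[1] = -1
print(data[1])         # 12
```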

Applications

Area	Example Use
Numerical Computing	Solving mathematical equations, running simulations, performing linear algebra and numerical integrations.
Data Analysis (with Pandas)	Cleaning, transforming, and aggregating large datasets; NumPy arrays are the core data structure behind Pandas.
Machine Learning	Serving as input/output data for models, performing matrix operations (dot product, mean, variance), and preparing datasets.
Signal/Image Processing	Applying filters, edge detection, Fourier transforms, and convolutions — especially in combination with OpenCV or SciPy.
Finance	Modeling risk, performing statistical analyses, analyzing time series data, and pricing financial instruments.
Physics & Engineering	Simulating physical systems, working with tensors and matrices, solving differential equations, and modeling dynamic systems.
Bioinformatics	Processing gene expression data, analyzing DNA sequences, and managing biological datasets efficiently.
Geospatial Analysis	Handling raster data, working with map grids, terrain modeling, and integrating with GIS tools like GDAL or Rasterio.
Quantum Computing	Simulating quantum circuits, quantum states, and performing complex number computations in quantum algorithms.

The NumPy NDArray

An N-dimensional array (also called an ndarray) is a generalized data structure in NumPy that can represent:

1D arrays (vectors)
2D arrays (matrices)
3D arrays (cubes of data)
N-dimensional arrays (tensors)

In NumPy, these arrays are represented using the ndarray object, which stores homogeneous data in multidimensional format.

Syntax: Creating an N-Dimensional Array

import numpy as np
arr = np.array([[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]])

arr has 3 dimensions. 
Shape is (2,2,2).
It has 2 blocks. Each block has 2 rows. Each row has 2 columns.

Example:

import numpy as np
arr = np.array([[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]])
print("Array:\n", arr)
print("Dimensions (ndim):", arr.ndim)
print("Shape:", arr.shape)
print("Size:", arr.size)
print("Data type:", arr.dtype)

Output:
Array:
 [[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
Dimensions (ndim): 3
Shape: (2, 2, 2)
Size: 8
Data type: int64
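
Elements of the same 3D array can be reached with one index per dimension. The sketch below also shows `reshape()`, which is not covered above and is included only as an illustration of how the same eight elements can be viewed with a different shape.

```python
import numpy as np

arr = np.array([[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]])

print(arr[1, 0, 1])        # 6 (block 1, row 0, column 1)
print(arr[0])              # first block, a 2x2 array
# [[1 2]
#  [3 4]]

flat = arr.reshape(8)      # same 8 elements viewed as a 1D array
print(flat)                # [1 2 3 4 5 6 7 8]
print(arr.reshape(4, 2).shape)  # (4, 2)
```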

Universal Functions
Ufuncs (short for Universal Functions) are core functions in NumPy that allow for fast and efficient element-wise operations on arrays.
These functions are optimized in C under the hood, making them significantly faster than Python loops for numerical computations.

Features:
Operate element-by-element on ndarray objects
Automatically handle broadcasting rules
Support arithmetic, logical, trigonometric, exponential, and bitwise operations (statistical helpers such as np.mean() are ordinary NumPy functions rather than ufuncs, but they are vectorized in the same way)
Work efficiently with large datasets and high-dimensional arrays

Example:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.add(a, b))      # [5 7 9]
print(np.multiply(a, b)) # [ 4 10 18]
print(np.sqrt(b))        # [2.         2.23606798 2.44948974]
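
The features listed above can be checked directly. This sketch shows that `np.add` is a ufunc object, that a scalar is broadcast across an array, and two standard ufunc facilities not mentioned in the text: the `out=` argument and the `reduce` method.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(isinstance(np.add, np.ufunc))   # True

# Broadcasting: the scalar 10 is applied to every element.
print(np.multiply(a, 10))             # [10 20 30]

# out= writes the result into an existing array instead of allocating one.
result = np.empty(3, dtype=a.dtype)
np.add(a, b, out=result)
print(result)                         # [5 7 9]

# Every binary ufunc also has a reduce method; for np.add it sums the array.
print(np.add.reduce(b))               # 15
```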

Arithmetic Operations:

Function	Description
`np.add()`	Element-wise addition
`np.subtract()`	Subtraction
`np.multiply()`	Multiplication
`np.divide()`	Division
`np.power()`	Exponentiation
`np.mod()`	Modulo

Trigonometric Functions:

Function	Description
`np.sin()`	Sine
`np.cos()`	Cosine
`np.tan()`	Tangent
`np.arcsin()`	Inverse sine
`np.radians()` / `np.degrees()`	Conversion between radians and degrees

Exponential and Logarithmic Functions:

Function	Description
`np.exp()`	Exponential (e^x)
`np.log()`	Natural logarithm
`np.log10()`	Base-10 logarithm
`np.log2()`	Base-2 logarithm

Statistical Functions:

Function	Description
`np.mean()`	Mean of elements
`np.std()`	Standard deviation
`np.var()`	Variance
`np.min()`, `np.max()`	Minimum and maximum

Bitwise Operations:

Function	Description
`np.bitwise_and()`	AND operation
`np.bitwise_or()`	OR operation
`np.invert()`	Bitwise NOT
`np.left_shift()`	Shift bits left
`np.right_shift()`	Shift bits right

Merits of ufuncs

1. Efficiency: Implemented in C, faster than pure Python loops
2. Vectorization: Operate on entire arrays without loops
3. Broadcasting: Handle different shaped arrays easily
4. Code Readability: Cleaner and more expressive
5. Integration with NumPy Ecosystem: Works well with other tools/libraries

Demerits of ufuncs

1. Learning Curve: Can be tricky for beginners to understand broadcasting/vectorization
2. Element-wise Limitation: Not suitable for complex, non-element-wise tasks
3. Memory Usage: Temporary arrays may increase memory consumption during operations

Applications

1. Scientific Computing

Ufuncs help in solving complex mathematical equations, running scientific simulations, and performing operations like trigonometric calculations, exponentials, and logarithms.

Use Cases:
Solving systems of equations using matrix operations
Simulating physical systems using mathematical models
Applying functions like np.sin(), np.exp(), np.log() to numerical datasets

2. Data Analysis & Statistics

In data science and analytics, ufuncs are used to process large datasets efficiently and perform statistical computations in a vectorized way.

Use Cases:
Computing mean, standard deviation, variance using np.mean(), np.std(), np.var()
Applying element-wise filters or transformations (e.g., normalization)
Aggregating or comparing data without writing loops

3. Image & Signal Processing

Ufuncs are used to manipulate image data (as pixel arrays) and to process signals in audio and video analysis.

Use Cases:
Adjusting brightness/contrast of images with operations like np.add() or np.multiply()
Filtering pixel values using comparison ufuncs (e.g., np.greater()) together with helpers such as np.where() and np.clip()
Performing mathematical transforms (e.g., Fourier using np.fft combined with ufuncs)

4. Machine Learning

Ufuncs are essential for preparing data for machine learning models and for applying element-wise transformations during training and evaluation.

Use Cases:
Preprocessing input features (e.g., scaling, normalizing using ufuncs)
Applying activation functions such as ReLU (np.maximum(0, x)), sigmoid (1 / (1 + np.exp(-x)))
Speeding up training pipelines through fast matrix and vector operations
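
The activation functions and scaling mentioned under Machine Learning can be sketched in a few lines. The input values are arbitrary, and min-max scaling is used here as one common choice of normalization, not the only one.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

relu = np.maximum(0, x)               # negative values become 0
sigmoid = 1 / (1 + np.exp(-x))        # squashes values into (0, 1)

# Min-max scaling to the range [0, 1], a common preprocessing step.
scaled = (x - x.min()) / (x.max() - x.min())

print(relu)      # [0.  0.  0.  1.5 3. ]
print(scaled)    # [0.  0.3 0.4 0.7 1. ]
```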

File Input and Output with Arrays
Working with data often involves reading from or writing to files.
NumPy provides efficient and easy-to-use functions for handling input and output (I/O) operations.
Whether you're saving a large array for future use or loading data from a file, NumPy has got you covered.

NumPy provides functions to:
Save arrays to disk
Load arrays from disk
Support both binary (.npy, .npz) and text (.txt, .csv) formats

Each method is designed for specific use cases, from performance-focused binary formats to human-readable text formats.

Binary I/O Using .npy Format
What is the .npy Format?

The .npy file format is a binary file format specifically designed by NumPy to store ndarray objects efficiently. 
It stores not only the data but also metadata such as the array’s shape, data type, and endianness. 
This allows the array to be accurately reconstructed on any machine, regardless of platform or architecture.

Saving an Array Using np.save()

The np.save() function allows you to save a single NumPy array to a file in binary .npy format.
Syntax:

np.save(file, array)

file: The name of the file (with or without the .npy extension)
array: The NumPy array you want to save

Example:

import numpy as np

x = np.array([[1, 3, 5], [8, 9, 10]])
np.save('file1.npy', x)

Loading an Array Using np.load()
The np.load() function is used to load a previously saved .npy file back into a NumPy array.
Syntax:

array = np.load(file)

Example:

import numpy as np

y = np.load('file1.npy')

print(y)

Output:

[[ 1  3  5]
 [ 8  9 10]]

This successfully retrieves the array from the binary file and prints it.
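
The .npz format mentioned earlier stores several arrays in one file. The following is a minimal sketch using np.savez(); the array names `features` and `labels` and the file name `dataset.npz` are assumptions made for illustration.

```python
import os
import tempfile
import numpy as np

features = np.array([[1, 3, 5], [8, 9, 10]])
labels = np.array([0, 1])

# Save both arrays; the keyword names become the keys used on load.
path = os.path.join(tempfile.mkdtemp(), "dataset.npz")
np.savez(path, features=features, labels=labels)

# np.load on an .npz file returns a dict-like object of arrays.
with np.load(path) as archive:
    print(sorted(archive.files))     # ['features', 'labels']
    loaded_features = archive["features"]
    loaded_labels = archive["labels"]

print(np.array_equal(loaded_features, features))  # True
```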

Text I/O Using `.txt` Format

Text files are useful when you need:

Human-readable data
Compatibility with tools like Excel or MATLAB
Lightweight file handling for small datasets

Saving an Array Using `np.savetxt()`

The np.savetxt() function allows you to save a NumPy array to a text file in a readable format.

Syntax:

np.savetxt(fname, array, fmt='%.18e', delimiter=' ')

fname: The file name (e.g., 'file.txt')
array: The NumPy array to save
fmt: Format string (optional), e.g., '%.2f' for two decimal places
delimiter: Character that separates the values (default is space)

Example:

import numpy as np

x = np.array([[1, 3, 5], [8, 9, 10]])

np.savetxt('file1.txt', x, fmt='%d', delimiter=',')

This creates a file file1.txt containing:

1,3,5
8,9,10

The fmt='%d' ensures values are saved as integers, and delimiter=',' separates values with commas.

Loading an Array Using np.loadtxt()

The np.loadtxt() function is used to read a text file containing numerical data back into a NumPy array.

Syntax:

array = np.loadtxt(fname, delimiter=' ')

Example:

import numpy as np

x = np.loadtxt('file1.txt', delimiter=',')

print(x)

Output:

[[ 1.  3.  5.]
 [ 8.  9. 10.]]

By default, loadtxt() loads values as floating-point numbers, even if they were saved as integers.
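
If integers are needed back, the `dtype` argument of np.loadtxt() can request them. This sketch writes the file to a temporary directory (an assumption made so that the example leaves nothing behind) and loads it both ways.

```python
import os
import tempfile
import numpy as np

x = np.array([[1, 3, 5], [8, 9, 10]])
path = os.path.join(tempfile.mkdtemp(), "file1.txt")
np.savetxt(path, x, fmt="%d", delimiter=",")

as_float = np.loadtxt(path, delimiter=",")             # default dtype is float
as_int = np.loadtxt(path, delimiter=",", dtype=int)    # request integers

print(as_float.dtype)   # float64
print(as_int)
# [[ 1  3  5]
#  [ 8  9 10]]
```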

Linear Algebra
Linear Algebra is the branch of mathematics that deals with vectors, matrices, and linear equations.
It provides tools to express and solve problems involving relationships between variables, transformations, and decompositions.

Importance in Data Analysis
Foundation for Machine Learning: Many algorithms (e.g., Linear Regression, PCA, Neural Networks) rely on matrix operations.
Data Transformations: Useful for scaling, rotation, and projection of data.
Solving Systems of Equations: Helps in optimization and regression problems.
Decomposition Techniques: Such as eigenvalue decomposition and singular value decomposition (SVD) for dimensionality reduction.

Creating Vectors & Matrices

import numpy as np
v = np.array([2, 4, 6]) # Vector (1D array)
M = np.array([[1, 2], [3, 4]]) # Matrix (2D array)
print("Vector:", v)
print("Matrix:\n", M)

Output:
Vector: [2 4 6]
Matrix:
 [[1 2]
 [3 4]]

Matrix Multiplication

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
result = np.dot(A, B)   # or A @ B
print(result)

Expected Output:

[[19 22]
 [43 50]]

💡 In data analysis, multiplication often means combining features or applying transformations.

Transpose

print(A.T)

Output:
[[1 3]
 [2 4]]

💡 Useful for switching rows ↔ columns (e.g., from samples-as-rows to features-as-rows).

Determinant & Inverse

det = np.linalg.det(A)
inv = np.linalg.inv(A)

print("Determinant:", det)
print("Inverse:\n", inv)

Output:

Determinant: -2.0000000000000004
Inverse:
 [[-2.   1. ]
 [ 1.5 -0.5]]

💡 The determinant helps check if a matrix is invertible. Inverse is used in solving systems.
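
The link between the determinant and invertibility can be checked with a short sketch. The singular matrix `S` is a made-up example whose second row is twice the first.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
inv = np.linalg.inv(A)

# A times its inverse should be the identity, up to rounding error.
print(np.allclose(A @ inv, np.eye(2)))    # True

# A singular matrix (second row is twice the first) has determinant 0.
S = np.array([[1, 2], [2, 4]])
print(np.isclose(np.linalg.det(S), 0.0))  # True

# Asking for its inverse raises LinAlgError.
try:
    np.linalg.inv(S)
    invertible = True
except np.linalg.LinAlgError:
    invertible = False
print(invertible)                         # False
```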

Solving Linear Systems

Example:
Solve 2x + y = 8, x − y = 2

coeff = np.array([[2, 1], [1, -1]])
const = np.array([8, 2])

solution = np.linalg.solve(coeff, const)
print("Solution:", solution)

Expected Output:

Solution: [3.33333333 1.33333333]

💡 This is common in regression and optimization problems.
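
A solution returned by np.linalg.solve() can be verified by substituting it back into the system. The sketch below does this and also compares it with the result obtained through the inverse.

```python
import numpy as np

coeff = np.array([[2, 1], [1, -1]])
const = np.array([8, 2])
solution = np.linalg.solve(coeff, const)

# Substituting the solution back should reproduce the constants.
print(np.allclose(coeff @ solution, const))   # True

# Using the inverse gives the same answer, but solve() is usually preferred
# because it avoids forming the inverse explicitly.
via_inverse = np.linalg.inv(coeff) @ const
print(np.allclose(solution, via_inverse))     # True
```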

Eigenvalues & Eigenvectors

vals, vecs = np.linalg.eig(A)
print("Eigenvalues:", vals)
print("Eigenvectors:\n", vecs)

Output:

Eigenvalues: [-0.37228132  5.37228132]
Eigenvectors:
 [[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]

💡 Used in PCA for feature extraction.
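
The defining property of an eigenpair, A v = λ v, can be checked numerically. This sketch assumes the same matrix `A` as above and also checks two standard identities: the eigenvalues sum to the trace and multiply to the determinant.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
vals, vecs = np.linalg.eig(A)

# Each column vecs[:, i] pairs with vals[i] and satisfies A v = lambda v.
for i in range(len(vals)):
    v = vecs[:, i]
    print(np.allclose(A @ v, vals[i] * v))         # True

# The eigenvalues sum to the trace and multiply to the determinant.
print(np.isclose(vals.sum(), np.trace(A)))         # True
print(np.isclose(vals.prod(), np.linalg.det(A)))   # True
```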

Applications in Data Analysis
Regression Analysis: Representing and solving linear models.
Dimensionality Reduction: Techniques like Principal Component Analysis (PCA) use eigenvectors.
Image Processing: Images can be represented as matrices; transformations use matrix operations.
Signal Processing: Fourier transforms and filtering rely on matrix math.

Advantages in Data Analysis
1. Foundation for Data Science & AI
Essential for machine learning algorithms (e.g., regression, PCA, neural networks).
Supports vectorized operations, making computations faster.

2. Efficient Computation with Libraries
NumPy, SciPy, and similar libraries use optimized C/Fortran code for linear algebra, enabling large-scale calculations.

3. Compact Data Representation
Matrices store large amounts of data in an organized way.
Makes transformations (rotation, scaling, translation) mathematically precise.

4. Real-World Applications
Image processing, computer graphics, recommendation systems, natural language processing, and more.

5. Supports Parallel Processing
Many matrix operations can run on GPUs for massive performance boosts.

Disadvantages in Data Analysis
1. Memory Usage for Large Data
Very large matrices can consume huge amounts of memory (especially with dense data).

2. Numerical Stability Issues
Floating-point errors can affect accuracy in large or ill-conditioned systems.

3. Complexity in Non-Linear Problems
Linear algebra is not suited for problems where relationships are fundamentally non-linear (needs advanced techniques).

4. Dependency on Specialized Libraries
Without libraries like NumPy/SciPy, implementing efficient linear algebra manually can be slow.

5. Steep Learning Curve for Beginners
Concepts like eigenvalues, singular value decomposition (SVD), and tensor algebra can be challenging.

Pseudo Random Number Generation
Random means something that cannot be predicted logically. In NumPy, the random module provides functions for generating pseudo-random numbers: values produced by a deterministic algorithm that look random but can be reproduced from the same starting seed. These functions can be useful for generating random inputs for testing algorithms.

The numpy.random module includes functions that allow you to generate random:
Integers
Floating-point numbers
Arrays
Samples from various probability distributions
These tools are incredibly useful for creating random datasets, simulating noise, or testing algorithms with varied inputs.
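
Sampling from probability distributions can be sketched as follows. The distribution parameters and the list of colour labels are arbitrary choices made for illustration, and the printed mean varies from run to run.

```python
import numpy as np

# 1000 samples from a normal distribution with mean 50 and std. dev. 10.
heights = np.random.normal(loc=50, scale=10, size=1000)

# 5 floats drawn uniformly from the interval [-1, 1).
offsets = np.random.uniform(low=-1.0, high=1.0, size=5)

# 3 items picked at random (with replacement) from a list of labels.
picks = np.random.choice(["red", "green", "blue"], size=3)

print(heights.shape)     # (1000,)
print(heights.mean())    # close to 50, differs on every run
print(picks)
```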

Generating Random Integer and Float

Example:
import numpy as np

# Generate a random integer between 0 and 9
i = np.random.randint(10)
print("Random Integer:", i)

# Generate a random float in the half-open interval [0.0, 1.0)
f = np.random.random()
print("Random Float:", f)

Output:
Random Integer: 1
Random Float: 0.7199076683556623

Generating Random Arrays
    
You can also generate entire arrays filled with random numbers using NumPy's randint() and rand() functions.

Example:
import numpy as np

# Generate an array of 5 random integers between 0 and 99
i = np.random.randint(0, 100, size=5)
print("Random Integer Array:", i)

# Generate a 3x3 array of random floats between 0.0 and 1.0
f = np.random.rand(3, 3)
print("Random Float Array:\n", f)

Output:
Random Integer Array: [ 8 41 80 46 10]
Random Float Array:
 [[0.2591196  0.86353412 0.8476862 ]
 [0.96882763 0.22314657 0.52989077]
 [0.46073914 0.28446913 0.15327149]]

Reproducibility with Random Seeds

By default, random number generators in NumPy produce different results every time you run the code. 
However, for debugging or testing, you might want reproducible results.
This is where np.random.seed() becomes useful.
Setting a seed ensures that the same sequence of random numbers is generated each time the code is executed.

Example:
import numpy as np

# Set a seed value
np.random.seed(42)

# Generate a random array
x = np.random.randint(0, 100, size=5)
print("Array of random integers with seed 42:", x)

Output:
Array of random integers with seed 42: [51 92 14 71 60]

The np.random.seed(42) call fixes the starting state of the random number generator, so you will always get the same result when you rerun this block of code with the same seed.
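
np.random.seed() sets a global state shared by all np.random functions. NumPy also offers a Generator object, created with np.random.default_rng(), which keeps the seed local to one object. The sketch below is an alternative to the approach above, not a replacement for it, and no specific values are shown because they differ from those produced by np.random.seed().

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

ints_a = rng_a.integers(0, 100, size=5)   # high bound is exclusive
ints_b = rng_b.integers(0, 100, size=5)
print(np.array_equal(ints_a, ints_b))     # True

floats = rng_a.random((2, 3))             # 2x3 floats in [0, 1)
print(floats.shape)                       # (2, 3)
```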
