Fundamentals of Python Programming & NumPy
Built-in Data Structures:
List
A list in Python is a mutable, ordered collection used to store multiple items in a single variable. It is one of the most commonly used data structures in Python.
Key Characteristics of Lists
- Ordered
  - Items are stored in the order they are added.
  - The order is preserved unless explicitly modified.
- Mutable
  - Lists can be changed after creation. You can:
    - Add items (append(), insert(), extend())
    - Remove items (remove(), pop(), del)
    - Modify items (list[index] = new_value)
- Heterogeneous
  - A list can store elements of different data types, such as [1, 'two', 3.0].
- Indexed
  - Elements are accessed by position (index), starting at 0; for example, my_list[0] is the first element.
- Dynamic Size
  - Lists can grow or shrink dynamically as elements are added or removed.
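A short sketch of the characteristics above. The list contents and the name items are arbitrary examples; the comments trace how the list changes after each call.

```python
# A list keeps insertion order and can hold mixed types
items = [10, "apple", 3.5, True]

items.append("new")       # add to the end
items.insert(1, "first")  # insert at index 1
items.extend([7, 8])      # add several items from another iterable

items.remove("apple")     # remove the first matching value
last = items.pop()        # remove and return the last item (8)
del items[0]              # delete by index (removes 10)

items[0] = "changed"      # modify an item in place

print(items)       # ['changed', 3.5, True, 'new', 7]
print(len(items))  # 5
```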
List Accessing:
1. Indexing
We can use the index operator [] to access an item in a list. List indices start at 0.
You can use positive or negative indices. For the list ['apple', 'banana', 'cherry', 'date'], the indices are:
| Element | Index | Negative Index |
|---|---|---|
| 'apple' | 0 | -4 |
| 'banana' | 1 | -3 |
| 'cherry' | 2 | -2 |
| 'date' | 3 | -1 |
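The table can be checked with a few lookups. The variable name fruits is a hypothetical choice for the list shown in the table.

```python
fruits = ['apple', 'banana', 'cherry', 'date']  # the list from the table

print(fruits[0])    # apple   (first element)
print(fruits[2])    # cherry
print(fruits[-1])   # date    (last element)
print(fruits[-3])   # banana

# An index outside the valid range raises IndexError
try:
    fruits[4]
except IndexError as exc:
    print("IndexError:", exc)
```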
2. Slicing
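Slicing accesses a range of elements. The slice operator list[start:stop:step] returns a new list containing the elements from index start up to, but not including, index stop; step (default 1) sets the spacing between elements. If start is omitted it defaults to 0, and if stop is omitted the slice runs to the end of the list. Negative indices count from the end, where -1 is the last element.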
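A minimal slicing sketch using the same example list (the name fruits is again hypothetical). Each comment shows the list the slice returns.

```python
fruits = ['apple', 'banana', 'cherry', 'date']

print(fruits[1:3])   # ['banana', 'cherry']   (stop index 3 is excluded)
print(fruits[:2])    # ['apple', 'banana']    (start defaults to 0)
print(fruits[2:])    # ['cherry', 'date']     (stop defaults to the end)
print(fruits[-2:])   # ['cherry', 'date']     (negative indices count from the end)
print(fruits[::2])   # ['apple', 'cherry']    (every second element)
print(fruits[::-1])  # ['date', 'cherry', 'banana', 'apple']  (reversed copy)
```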
Tuple Operations
A tuple is an ordered, immutable sequence written with parentheses, such as (1, 2, 3). Python provides several built-in functions that work with tuples:
| Function | Description | Example Usage |
|---|---|---|
| len(tuple) | Returns the number of items in the tuple. | len((1, 2, 3)) → 3 |
| max(tuple) | Returns the largest item. Works with numbers or strings. | max((3, 5, 1)) → 5 |
| min(tuple) | Returns the smallest item. | min((3, 5, 1)) → 1 |
| tuple(sequence) | Converts a list, string, or other iterable into a tuple. | tuple([1, 2]) → (1, 2) |
| sum(tuple) | Returns the total of the elements (all elements must be numeric). | sum((1, 2, 3)) → 6 |
| sorted(tuple) | Returns a new sorted list of the tuple's items (the tuple is not modified). | sorted((3, 1, 2)) → [1, 2, 3] |
| any(tuple) | Returns True if at least one element is truthy. | any((0, False, 5)) → True |
| all(tuple) | Returns True if every element is truthy. | all((1, 2, 3)) → True |
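The functions in the table can be tried on one tuple; the name scores and its values are arbitrary. The last lines show that tuples, unlike lists, cannot be modified in place.

```python
scores = (3, 5, 1)

print(len(scores))         # 3
print(max(scores))         # 5
print(min(scores))         # 1
print(sum(scores))         # 9
print(sorted(scores))      # [1, 3, 5]  (a new list; the tuple is unchanged)
print(tuple([1, 2]))       # (1, 2)
print(any((0, False, 5)))  # True   (5 is truthy)
print(all((1, 0, 3)))      # False  (0 is falsy)

# Tuples are immutable: item assignment raises TypeError
try:
    scores[0] = 10
except TypeError:
    print("tuples cannot be modified")
```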
Set
- In Python, a set is a collection of unordered and unindexed elements.
- Sets can store elements of different data types (integers, strings, floats, etc.).
- Sets are similar to lists and tuples, but they do not maintain order and do not allow duplicates.
- The elements of a set must be immutable (hashable) values such as numbers, strings, or tuples, but the set itself is mutable (you can add or remove elements).
- Sets are defined using curly braces {}, and elements are separated by commas.
- Python implements the set data type using the built-in set class.
- Sets do not allow duplicate values. Any repeated item is automatically removed.
- Sets are unordered, so items appear in an arbitrary order and cannot be accessed by index.
- Sets are useful when you need to store unique items and perform mathematical set operations (like union, intersection, difference).
Creating a Set in Python
- A set is created using curly braces {} or the set() constructor. Elements inside the braces are separated by commas. Note that empty braces {} create an empty dictionary, so use set() to create an empty set.
- Sets automatically remove duplicate values.
- You can store elements of different data types in a single set.
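Accessing Set Elements
Because sets are unordered, their elements cannot be accessed by index. Instead, loop over the set with a for loop or check membership with the in operator.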
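A minimal sketch of creating sets with braces and with the set() constructor, then accessing their elements by membership test and iteration. All variable names and values are illustrative.

```python
# Curly braces: duplicates are removed automatically
colors = {"red", "green", "red", "blue"}
print(len(colors))  # 3

# set() constructor: build a set from any iterable
numbers = set([1, 2, 2, 3])   # {1, 2, 3}
letters = set("hello")        # {'h', 'e', 'l', 'o'} in arbitrary order
empty = set()                 # {} on its own would create an empty dict

# Mixed types are allowed, but every element must be hashable
# (a tuple works; a list would raise TypeError)
mixed = {1, "two", 3.0, (4, 5)}

# Access: no indexing, but membership tests and iteration work
print("red" in colors)        # True
for color in sorted(colors):  # sorted() gives a predictable order for printing
    print(color)
```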
Set Operations
1. add()
Adds a single element.
2. update()
Adds multiple elements (from any iterable such as a list, tuple, or another set).
my_set = {1, 2}
my_set.update([3, 4])  # {1, 2, 3, 4}
my_set.update((5, 6))  # {1, 2, 3, 4, 5, 6}
my_set.update({7, 8})  # {1, 2, 3, 4, 5, 6, 7, 8}
3. remove()
Removes a specific element. Raises a KeyError if the element does not exist.
my_set = {1, 2, 3}
my_set.remove(2)   # {1, 3}
my_set.remove(10)  # KeyError
4. discard()
Removes a specific element. Does nothing if the element does not exist.
my_set = {1, 2, 3}
my_set.discard(3)   # {1, 2}
my_set.discard(10)  # No error
5. pop()
Removes and returns an arbitrary element (sets are unordered, so you cannot choose which one).
my_set = {1, 2, 3}
removed = my_set.pop()
print(removed)  # Could be 1, 2, or 3
6. clear()
Removes all elements from the set.
my_set.clear()  # my_set becomes set()
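The methods above can be combined in one runnable sequence. This sketch also shows that update() accepts several iterables at once and that discard() and remove() differ only in how they treat a missing element.

```python
my_set = {1, 2}
my_set.add(3)                 # {1, 2, 3}
my_set.add(2)                 # already present: no change
my_set.update([4, 5], (6,))   # several iterables at once: {1, 2, 3, 4, 5, 6}

my_set.discard(10)            # missing element: silently ignored
try:
    my_set.remove(10)         # missing element: KeyError
except KeyError:
    print("10 is not in the set")

my_set.remove(6)              # {1, 2, 3, 4, 5}
removed = my_set.pop()        # removes an arbitrary element
size_after_pop = len(my_set)  # 4

my_set.clear()
print(my_set)                 # set()
```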
Mathematical set operations, with example results for A = {1, 2, 3, 4} and B = {3, 4, 5, 6}:
| Operation | Symbol | Python Method | Meaning | Example Result |
|---|---|---|---|---|
| Union | ∪ | union() | All elements in A or B | {1, 2, 3, 4, 5, 6} |
| Intersection | ∩ | intersection() | Elements in both A and B | {3, 4} |
| Difference (A - B) | − | difference() | Elements in A but not in B | {1, 2} |
| Symmetric Difference | △ | symmetric_difference() | Elements in A or B but not both | {1, 2, 5, 6} |
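The table's example results can be reproduced with A = {1, 2, 3, 4} and B = {3, 4, 5, 6}. Python also offers the operators |, &, - and ^ as shorthand for the four methods; the comments give the resulting set values.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B            # same as A.union(B): {1, 2, 3, 4, 5, 6}
common = A & B           # same as A.intersection(B): {3, 4}
only_a = A - B           # same as A.difference(B): {1, 2}
either = A ^ B           # same as A.symmetric_difference(B): {1, 2, 5, 6}

print(sorted(union), sorted(common), sorted(only_a), sorted(either))
```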
Dictionary
- In Python, a dictionary is a collection of elements where each element is a key-value pair.
- Python implements the dictionary data type with the built-in dict class.
- A dictionary is written inside curly braces {}: elements are separated by commas, and each key is separated from its value by a colon ( : ), as in {key: value}.
- Keys must be unique and immutable (hashable), such as strings, numbers, or tuples; values can be of any type.
Accessing Elements from Dictionary
- Using the key as an index - dict[key] returns the value stored under that key and raises a KeyError if the key does not exist.
- get( key ) - Returns the value associated with the given key, or None (or a supplied default) if the key is missing.
- Accessing the whole dictionary - Use the name of the dictionary to access (for example, print) the whole dictionary.
- items( ) - Returns a view of all the elements as (key, value) pairs.
- keys( ) - Returns a view of all the keys.
- values( ) - Returns a view of all the values.
These views (dict_items, dict_keys, dict_values) can be converted to lists with list().
Sample output:
<class 'dict'>
1
Nani
{'rollNo': 1, 'name': 'Nani', 'department': 'BSC', 'year': 2}
dict_items([('rollNo', 1), ('name', 'Nani'), ('department', 'BSC'), ('year', 2)])
dict_keys(['rollNo', 'name', 'department', 'year'])
dict_values([1, 'Nani', 'BSC', 2])
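A minimal sketch that produces the sample output listed above, assuming the dictionary is stored in a variable named student (a hypothetical name). The last two lines also show how get() behaves with a missing key.

```python
student = {'rollNo': 1, 'name': 'Nani', 'department': 'BSC', 'year': 2}

print(type(student))        # <class 'dict'>
print(student['rollNo'])    # 1     (key used as an index)
print(student.get('name'))  # Nani
print(student)              # {'rollNo': 1, 'name': 'Nani', 'department': 'BSC', 'year': 2}
print(student.items())      # dict_items([('rollNo', 1), ('name', 'Nani'), ('department', 'BSC'), ('year', 2)])
print(student.keys())       # dict_keys(['rollNo', 'name', 'department', 'year'])
print(student.values())     # dict_values([1, 'Nani', 'BSC', 2])

print(student.get('age'))      # None: missing key, no error
print(student.get('age', 18))  # 18: default value for a missing key
# student['age'] would raise KeyError
```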
Operations in Dictionary
Dictionaries in Python allow us to perform various operations such as adding, updating, and deleting elements.
- Adding elements - To add a new key-value pair, assign a value to a new key: dict[new_key] = value.
- Updating elements - If the key already exists, assigning a new value replaces the old one.
- pop( key ) - This method removes the element with the specified key and returns its value.
- popitem( ) - This method removes and returns the last inserted key-value pair (Python 3.7 and later).
- clear( ) - This method removes all the elements, leaving the dictionary empty. It returns None.
- del keyword with dict[key] - Deletes the element with the specified key. The dictionary itself still exists and can be used afterwards.
- del keyword - del dict deletes the dictionary completely. Once a dictionary has been deleted with del, it cannot be accessed in the rest of the code (doing so raises a NameError).
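The operations above, applied in sequence to the same hypothetical student dictionary. The key 'section' and its value are illustrative additions.

```python
student = {'rollNo': 1, 'name': 'Nani', 'department': 'BSC', 'year': 2}

student['section'] = 'A'          # add: the key is new
student['year'] = 3               # update: the key already exists

dept = student.pop('department')  # remove by key and return its value: 'BSC'
last = student.popitem()          # remove and return the last inserted pair: ('section', 'A')
del student['rollNo']             # delete one key; the dictionary itself still exists
print(student)                    # {'name': 'Nani', 'year': 3}

result = student.clear()          # empty the dictionary; clear() returns None
print(student, result)            # {} None

del student                       # delete the whole dictionary
# Using student after this point would raise NameError
```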
Functions
- A function is a block of statements used to perform a specific task.
- Functions let us divide a larger problem into smaller parts that are easier to solve, reuse code instead of repeating it, and make programs easier to read and understand.
1. Built-in Functions
These are the functions that come pre-defined with Python. They help perform common tasks like printing, taking input, working with numbers or sequences, etc.
Examples:
- print() – Prints output to the console.
- input() – Takes user input as a string.
- len() – Returns the length of a sequence (like a list or string).
- type() – Returns the data type of a variable.
- range() – Returns a sequence of numbers.
- int(), float(), str() – Type conversion functions.
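A quick sketch of these built-ins. input() is left as a comment so the example runs without waiting for the keyboard; the prompt text there is illustrative.

```python
values = [4, 8, 15]

print(len(values))            # 3
print(type(values))           # <class 'list'>
print(list(range(1, 10, 3)))  # [1, 4, 7]  (start 1, stop 10 excluded, step 3)
print(int("42") + 1)          # 43
print(float("3.5"))           # 3.5
print(str(7) + "!")           # 7!

# input() always returns a string, so convert it before doing arithmetic:
# age = int(input("Enter your age: "))
```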
Loading data (apart from open(), the functions in these tables come from libraries rather than Python's built-ins):
| Function | Description | Library |
|---|---|---|
| open() | Opens files for reading or writing | built-in |
| pd.read_csv() | Loads data from a CSV file | pandas |
| pd.read_excel() | Loads data from Excel files | pandas |
| pd.read_json() | Loads JSON data | pandas |
| np.loadtxt() | Loads numeric data from a text file | NumPy |
| np.genfromtxt() | Loads text data and handles missing values | NumPy |
Cleaning data:
| Function | Description | Library |
|---|---|---|
| df.isnull() | Checks for missing values | pandas |
| df.dropna() | Removes missing values | pandas |
| df.fillna() | Fills missing values | pandas |
| df.duplicated() | Checks for duplicate rows | pandas |
| df.drop_duplicates() | Removes duplicate rows | pandas |
| str.strip(), str.lower() | String cleaning (pandas exposes these as df['col'].str.strip(), etc.) | built-in, pandas |
| df.astype() | Converts data types | pandas |
Exploring data:
| Function | Description | Library |
|---|---|---|
| df.head() | Shows the top rows | pandas |
| df.tail() | Shows the bottom rows | pandas |
| df.info() | Summary of columns, data types, and non-null counts | pandas |
| df.describe() | Summary statistics | pandas |
| df['col'].value_counts() | Frequency of each value in a column | pandas |
| df.columns, df.shape | Column names and (rows, columns) dimensions | pandas |
| df.corr() | Correlation matrix | pandas |
Manipulating data:
| Function | Description | Library |
|---|---|---|
| df.sort_values() | Sorts rows by column values | pandas |
| df.groupby() | Groups data by column(s) | pandas |
| df.merge() | Merges two DataFrames | pandas |
| df.join() | Joins DataFrames on their index | pandas |
| df.pivot_table() | Creates a pivot table | pandas |
| df.apply() | Applies a custom function | pandas |
| df.replace() | Replaces values | pandas |
Visualizing data:
| Function | Description | Library |
|---|---|---|
| plt.plot() | Line plot | matplotlib |
| plt.bar() | Bar chart | matplotlib |
| plt.hist() | Histogram | matplotlib |
| plt.scatter() | Scatter plot | matplotlib |
| sns.heatmap() | Correlation heatmap | seaborn |
| sns.boxplot() | Box plot | seaborn |
| sns.pairplot() | Pairwise relationships | seaborn |
Statistical analysis:
| Function | Description | Library |
|---|---|---|
| mean(), median(), mode() | Central tendency | statistics, pandas (NumPy provides mean() and median() but no mode()) |
| std(), var() | Spread (standard deviation, variance) | pandas, NumPy |
| ttest_ind() | Independent two-sample t-test | scipy.stats |
| pearsonr() | Pearson correlation coefficient | scipy.stats |
| linregress() | Simple linear regression | scipy.stats |
| ols() | Ordinary least squares regression | statsmodels |
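The central-tendency and spread functions can be tried with only the standard-library statistics module, so no third-party install is needed. The data values are made up so that the results come out as round numbers.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))    # 5
print(statistics.median(data))  # 4.5  (average of the two middle values, 4 and 5)
print(statistics.mode(data))    # 4    (most frequent value)
print(statistics.pstdev(data))  # 2.0  (population standard deviation)
```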
2. User-defined Functions
These are functions that you create to perform a specific task or set of tasks. They make your code modular, reusable, and easier to manage.
Syntax:
def function_name(parameters):
    # function body
    return result
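A sketch of user-defined functions, including the variable-length argument forms *args and **kwargs and the global keyword. All function and variable names here are hypothetical.

```python
counter = 0  # a global variable

def add(a, b):
    """Return the sum of two numbers."""
    return a + b

def total(*args):
    # args is a tuple holding every positional argument
    return sum(args)

def describe(**kwargs):
    # kwargs is a dict holding every keyword argument, in call order
    return ", ".join(f"{key}={value}" for key, value in kwargs.items())

def increment():
    global counter  # without this line, counter += 1 would raise UnboundLocalError
    counter += 1

print(add(2, 3))                      # 5
print(total(1, 2, 3, 4))              # 10
print(describe(name="Nani", year=2))  # name=Nani, year=2
increment()
increment()
print(counter)                        # 2
```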
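- *args → Accepts any number of positional arguments, collected into a tuple.
- **kwargs → Accepts any number of keyword arguments, collected into a dictionary.
Scope of variables
Variables created inside a function are local to that function; variables created outside all functions are global. To assign a new value to a global variable from inside a function, declare it with the global keyword.
Files and Operating System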
In data analysis, interacting with files and the operating system is a common and essential task. Analysts and data scientists frequently:
- Read and write data files such as CSV, Excel, and JSON.
- Manage folders and file paths across different environments.
- Load and save datasets during different stages of analysis.
- Interact with the operating system to automate workflows and organize data.
Python provides built-in modules that greatly simplify these tasks. The most commonly used ones include:
- os: Interacts with the operating system (e.g., working with paths, environment variables).
- shutil: Performs high-level file operations such as copying and moving files.
- open(): A built-in function for reading from and writing to files.
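A minimal sketch that uses all three tools together. It works inside a temporary folder (via the standard tempfile module) so it leaves nothing behind; the folder and file names are hypothetical.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    data_dir = os.path.join(base, "data")  # build paths portably
    os.makedirs(data_dir, exist_ok=True)   # create the folder

    path = os.path.join(data_dir, "notes.txt")
    with open(path, "w", encoding="utf-8") as f:  # write a text file
        f.write("line 1\nline 2\n")

    with open(path, encoding="utf-8") as f:       # read it back
        lines = f.read().splitlines()

    backup = os.path.join(base, "notes_backup.txt")
    shutil.copy(path, backup)                     # copy with shutil
    copied = os.path.exists(backup)
    listing = sorted(os.listdir(data_dir))

print(lines)    # ['line 1', 'line 2']
print(copied)   # True
print(listing)  # ['notes.txt']
```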
| Feature | Explanation |
|---|---|
| Platform Independence | Python code that relies on portable modules such as os and shutil typically runs on Windows, macOS, and Linux without modification. |
| File & Directory Manipulation | Python can easily create, delete, rename, and move files and directories. |
| Access Control | You can manage file permissions and check whether files exist or are accessible. |
| Cross-Platform Compatibility | File handling functions behave consistently across operating systems. |
| File I/O Operations | Python provides simple functions for reading and writing text, binary, and structured files. |
| Error Handling | Exceptions like FileNotFoundError or PermissionError help manage common file-related issues. |
| File Compression | Python’s zipfile and tarfile modules support compressing and extracting files. |
| Interoperability | Python integrates well with other tools and libraries used in data science, such as Pandas and NumPy. |
| Concern | Explanation |
|---|---|
| Platform-Specific Features | Some OS-level features (like symbolic links) may behave differently or be unsupported on certain platforms. |
| Security Concerns | Improper handling of file input/output can expose sensitive data or cause unintentional data loss. |
| Learning Curve | File paths, permission errors, and working with different file formats can be challenging for beginners. |
| File System Errors | Errors such as missing files or incorrect paths are common and need to be handled properly. |
| Limited Low-Level Control | Python abstracts away many low-level details, which may limit control in certain complex scenarios. |
| Performance Impact | Reading and processing very large files can be memory-intensive and slow without optimization. |
Common use cases:
- Use: Load datasets from files, clean data, and perform operations.
  Example: Use pandas.read_csv() to read a CSV file and filter rows by conditions.
- Use: Store and load configuration settings from files like .ini, .json, or .yaml.
  Example: Load API keys or data paths from a JSON config file.
- Use: Maintain logs of data processing steps or catch and record errors.
  Example: Use the logging module to write error messages to a file during batch processing.
- Use: Compress datasets into .zip or .tar formats for sharing or storage.
  Example: Use zipfile.ZipFile to archive old data logs.
- Use: Store processed data in databases such as SQLite or MySQL.
  Example: Use sqlite3 to save cleaned data from a CSV into a SQLite database.
- Use: Handle file uploads (e.g., CSV files) and downloads in web-based data dashboards.
  Example: Use Flask to create an interface where users can upload data for analysis.
- Use: Download data from the internet or save data to a remote server.
  Example: Use requests or ftplib to download a CSV from a public URL.
- Use: Automatically generate or modify test data files during software testing.
  Example: Use os and open() to create mock input files for testing data pipelines.
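The sqlite3 use case can be sketched with the standard library alone. Assumptions: the CSV content is hypothetical and held in memory with io.StringIO (in practice it would come from open() on a real file), and the database is in-memory (pass a file name such as "clean.db" to keep it).

```python
import csv
import io
import sqlite3

# Hypothetical CSV data with messy whitespace and capitalization
raw_csv = io.StringIO("name,city\n  Alice ,Paris\nBob,  london \n")

# Clean each row: strip whitespace and normalise the city to title case
rows = [(r["name"].strip(), r["city"].strip().title()) for r in csv.DictReader(raw_csv)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
conn.commit()

saved = conn.execute("SELECT name, city FROM people ORDER BY name").fetchall()
print(saved)  # [('Alice', 'Paris'), ('Bob', 'London')]
conn.close()
```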
NumPy Basics: Arrays and Vectorized Computations
NumPy (Numerical Python) is a fundamental package for scientific computing in Python. It provides:
- Efficient array data structures.
- High-performance functions for mathematical and statistical operations.
- The ability to perform vectorized computations, which are faster and more concise than using loops.
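A small comparison of a Python loop with the equivalent vectorized NumPy expression (assumes NumPy is installed). Both produce the same squares; the vectorized form applies one operation to the whole array at once.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]

# Loop version: square each element one at a time
squared_loop = []
for v in values:
    squared_loop.append(v ** 2)

# Vectorized version: one expression applied to the whole array
arr = np.array(values)
squared_vec = arr ** 2

print(squared_loop)  # [1.0, 4.0, 9.0, 16.0]
print(squared_vec)   # the same values, as a NumPy array
print(arr.mean())    # 2.5  (statistics without writing a loop)
```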
Creating Arrays
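Arrays are usually created from Python lists or tuples with the np.array() function. These arrays are more powerful than regular Python lists for numerical work. NumPy also provides functions that build arrays directly:
| Function | Description |
|---|---|
| np.zeros(shape) | Creates an array filled with zeros. |
| np.ones(shape) | Creates an array filled with ones. |
| np.arange(start, stop, step) | Creates evenly spaced values from start up to, but not including, stop. |
| np.linspace(start, stop, num) | Generates num evenly spaced values over the interval [start, stop] (stop is included by default). |
| np.eye(n) | Creates an n × n identity matrix. |
| np.random.rand(d0, d1, ...) | Generates random values uniformly distributed in [0, 1), with the given dimensions. |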
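Each function in the table, called once. The shapes and ranges are arbitrary choices, and the random values differ on every run.

```python
import numpy as np

a = np.array([1, 2, 3])          # from a Python list
z = np.zeros((2, 3))             # 2 x 3 array of 0.0
o = np.ones(4)                   # four 1.0 values
r = np.arange(0, 10, 2)          # [0, 2, 4, 6, 8]  (stop is excluded)
lin = np.linspace(0, 1, 5)       # [0.0, 0.25, 0.5, 0.75, 1.0]  (stop is included)
ident = np.eye(3)                # 3 x 3 identity matrix
rand_vals = np.random.rand(2, 2) # 2 x 2 array of random floats in [0, 1)

print(z.shape, o.shape, ident.shape, rand_vals.shape)  # (2, 3) (4,) (3, 3) (2, 2)
```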
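Broadcasting
Broadcasting lets NumPy apply arithmetic to arrays of different shapes: the smaller array (or a single scalar) is conceptually stretched across the larger one, without copying data, so the operation runs element by element with no explicit loop.
NumPy arrays also use fixed data types (such as int32, float64), which are more memory-efficient than Python lists.
NumPy includes built-in functions for many areas of mathematics:
| Category | Examples |
|---|---|
| Linear Algebra | np.dot(), np.linalg.inv() |
| Statistics | np.mean(), np.std() |
| Fourier Transform | np.fft.fft() |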
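A sketch of broadcasting, fixed data types, and a few of the functions from the table. The arrays are small illustrative examples.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])    # shape (3,)

# Broadcasting: row is added to each row of matrix, no loop needed
print(matrix + row)  # [[11 22 33]
                     #  [14 25 36]]
doubled = matrix * 2 # a scalar is broadcast to every element

# Fixed data types: each int32 element takes 4 bytes
small = np.array([1, 2, 3], dtype=np.int32)
print(small.dtype, small.itemsize)  # int32 4

# A few built-in mathematical functions
m = np.array([[2.0, 0.0], [0.0, 4.0]])
product = np.dot(m, m)            # matrix product
inverse = np.linalg.inv(m)        # matrix inverse
spectrum = np.fft.fft([1.0, 0.0, 0.0, 0.0])  # discrete Fourier transform
print(np.mean(row), np.std(row))
```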
NumPy arrays are widely supported across the Python data ecosystem:
- Pandas (DataFrames are built on NumPy arrays)
- Matplotlib (for plotting arrays)
- TensorFlow / PyTorch (tensors convert to and from NumPy arrays)
- OpenCV (image processing with arrays)
Common application areas of NumPy:
| Area | Example Use |
|---|---|
| Numerical Computing | Solving mathematical equations, running simulations, performing linear algebra and numerical integrations. |
| Data Analysis (with Pandas) | Cleaning, transforming, and aggregating large datasets; NumPy arrays are the core data structure behind Pandas. |
| Machine Learning | Serving as input/output data for models, performing matrix operations (dot product, mean, variance), and preparing datasets. |
| Signal/Image Processing | Applying filters, edge detection, Fourier transforms, and convolutions — especially in combination with OpenCV or SciPy. |
| Finance | Modeling risk, performing statistical analyses, analyzing time series data, and pricing financial instruments. |
| Physics & Engineering | Simulating physical systems, working with tensors and matrices, solving differential equations, and modeling dynamic systems. |
| Bioinformatics | Processing gene expression data, analyzing DNA sequences, and managing biological datasets efficiently. |
| Geospatial Analysis | Handling raster data, working with map grids, terrain modeling, and integrating with GIS tools like GDAL or Rasterio. |
| Quantum Computing | Simulating quantum circuits, quantum states, and performing complex number computations in quantum algorithms. |
The NumPy NDArray
The N-dimensional array (ndarray) is a generalized data structure in NumPy that can represent:
- 1D arrays (vectors)
- 2D arrays (matrices)
- 3D arrays (cubes of data)
- N-dimensional arrays (tensors)
NumPy is built around the ndarray object, which stores homogeneous data (every element has the same data type) in a multidimensional format.
Syntax: Creating an N-Dimensional Array
np.array(object, dtype=None)
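A sketch of arrays with one, two, and three dimensions, checked with the ndim and shape attributes. The last array shows homogeneity: mixing integers and a float makes NumPy store every element as a float.

```python
import numpy as np

vector = np.array([1, 2, 3])         # 1D
matrix = np.array([[1, 2], [3, 4]])  # 2D
cube = np.zeros((2, 3, 4))           # 3D: 2 blocks of 3 x 4
mixed = np.array([1, 2.5, 3])        # homogeneous: ints are upcast to float

for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(name, arr.ndim, arr.shape)
# vector 1 (3,)
# matrix 2 (2, 2)
# cube 3 (2, 3, 4)

print(mixed.dtype)  # float64
```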
Text I/O Using .txt Format
Saving arrays as plain text files is useful when you need:
- Human-readable data
- Compatibility with tools like Excel or MATLAB
- Lightweight file handling for small datasets
Saving an Array Using np.savetxt()
The np.savetxt() function saves a NumPy array to a text file in a readable format:
np.savetxt(fname, array, fmt='%.18e', delimiter=' ')
- fname: The file name (e.g., 'file.txt')
- array: The NumPy array to save, 1D or 2D (NumPy's documentation names this parameter X)
- fmt: Format string (optional), e.g., '%.2f' for two decimal places
- delimiter: Character that separates the values (default is a space)
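For example, saving an integer array with np.savetxt('file1.txt', array, fmt='%d', delimiter=',') creates file1.txt containing the values as comma-separated integers: fmt='%d' ensures values are saved as integers, and delimiter=',' separates values with commas.
Loading an Array Using np.loadtxt()
The np.loadtxt() function reads a text file containing numerical data back into a NumPy array:
array = np.loadtxt(fname, delimiter=' ')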
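By default, np.loadtxt() loads values as floating-point numbers, even if they were saved as integers; pass dtype=int to load them as integers. The delimiter must match the one used when the file was saved.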
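A round-trip sketch with np.savetxt() and np.loadtxt(), using fmt='%d' and delimiter=',' as described above. The file is written to a temporary folder (an assumption, so the example cleans up after itself); the array values are illustrative.

```python
import os
import tempfile

import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

with tempfile.TemporaryDirectory() as folder:
    fname = os.path.join(folder, "file1.txt")

    np.savetxt(fname, data, fmt='%d', delimiter=',')  # integers, comma-separated

    with open(fname) as f:
        text = f.read()
    print(text)
    # 1,2,3
    # 4,5,6

    loaded = np.loadtxt(fname, delimiter=',')                # floats by default
    loaded_int = np.loadtxt(fname, delimiter=',', dtype=int) # explicit integer dtype

print(loaded.dtype)                      # float64
print(np.array_equal(loaded_int, data))  # True
```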