Python
Python is a high-level, interpreted programming language known for its simplicity, readability, and versatility. It is widely used for web development, data science, automation, artificial intelligence, and more.
Key Features & Use Cases
Key Features
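- Easy to Read & Write – Uses clean syntax in which indentation defines code blocks.
- Interpreted – Runs without a separate compile step (CPython compiles source to bytecode and executes it on a virtual machine), which shortens the edit-run-debug cycle.
- Dynamically Typed – Types are checked at runtime, so variables need no explicit type declarations.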
- Large Standard Library – Includes built-in modules for file handling, networking, math, and more.
- Cross-Platform – Runs on multiple operating systems (Windows, macOS, Linux).
- Object-Oriented & Functional – Supports multiple programming paradigms.
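A minimal sketch of a few of the features above, using only the standard library. The names `describe` and `Square` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from functools import reduce


def describe(value):
    """Return the runtime type name of any value (dynamic typing)."""
    return type(value).__name__


# The same name can be rebound to values of different types.
item = 42
first = describe(item)
item = "forty-two"
second = describe(item)


@dataclass
class Square:
    """Object-oriented style: data and behavior live together."""
    side: float

    def area(self) -> float:
        return self.side ** 2


squares = [Square(1), Square(2), Square(3)]

# Functional style: map and reduce over the same objects.
total_area = reduce(lambda acc, a: acc + a, map(Square.area, squares), 0)

print(first, second, total_area)
```

The type annotations on `Square` are optional hints; Python does not enforce them at runtime.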
Use Cases
- Data cleaning and transformation (removing duplicates, handling missing data)
- Exploratory data analysis (summarizing datasets, visualizing distributions)
- Creating data visualizations (bar charts, line graphs, heatmaps)
- Statistical analysis (regression analysis, hypothesis testing)
- Building and evaluating machine learning models (classification, clustering)
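The first two use cases can be sketched without any third-party package. The example below is only an illustration on made-up records: it removes an exact duplicate, fills a missing value with the median, and summarizes the result.

```python
import statistics

# Hypothetical raw records: one exact duplicate and one missing age.
rows = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": None},
    {"name": "Alice", "age": 25},
    {"name": "Charlie", "age": 35},
]

# Remove exact duplicates while keeping the original order.
seen = set()
unique_rows = []
for row in rows:
    key = (row["name"], row["age"])
    if key not in seen:
        seen.add(key)
        unique_rows.append(row)

# Fill missing ages with the median of the known ages.
known_ages = [r["age"] for r in unique_rows if r["age"] is not None]
median_age = statistics.median(known_ages)
cleaned = [
    {**r, "age": r["age"] if r["age"] is not None else median_age}
    for r in unique_rows
]

# Summarize the cleaned column.
ages = [r["age"] for r in cleaned]
summary = {"count": len(ages), "mean": statistics.mean(ages), "max": max(ages)}
print(summary)
```

In practice these steps are usually done with a DataFrame library, which handles larger tables and more file formats with less code.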
Common Libraries for Analytics
pandas
- Offers data structures and operations for manipulating numerical tables and time series
- Provides high-level data manipulation tools like DataFrame objects for easy cleaning, merging, and reshaping
- Integrates well with libraries like NumPy for numerical operations
Terminal: pip install pandas
Example Import & Use Case:
import pandas as pd
# Example usage
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}
df = pd.DataFrame(data)
print(df)
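The cleaning, merging, and reshaping mentioned above might look like the following sketch. The `people` and `cities` tables are made-up sample data, and filling the missing age with the column mean is just one possible choice.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Charlie"],
    "Age": [25, 30, 30, None],
})
cities = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "City": ["Paris", "Lima", "Oslo"],
})

# Cleaning: drop the duplicate row, then fill the missing age with the mean.
clean = people.drop_duplicates().reset_index(drop=True)
clean["Age"] = clean["Age"].fillna(clean["Age"].mean())

# Merging: attach each person's city with a left join on Name.
merged = clean.merge(cities, on="Name", how="left")

# Reshaping: turn the wide table into long (Name, variable, value) rows.
long_form = merged.melt(id_vars="Name", value_vars=["Age", "City"])

print(merged)
print(long_form)
```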
polars
- A high-performance DataFrame library designed for efficient data manipulation
- Can handle large datasets quickly by leveraging Apache Arrow under the hood
- Offers a DataFrame API that feels familiar to pandas users, built around column expressions and optional lazy evaluation, with a focus on speed and parallelization
Terminal: pip install polars
Example Import & Use Case:
import polars as pl
# Example usage
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}
df = pl.DataFrame(data)
print(df)
numpy
- Provides support for large, multi-dimensional arrays and matrices
- Includes a vast library of high-level mathematical functions to operate on these arrays
- Foundation for most scientific computing libraries in Python
Terminal: pip install numpy
Example Import & Use Case:
import numpy as np
# Example usage
arr = np.array([1, 2, 3, 4, 5])
print(arr.mean()) # Calculate the mean
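For multi-dimensional arrays, a short sketch of axis-wise reduction and broadcasting may help. The `matrix` values are arbitrary sample numbers.

```python
import numpy as np

# A 2-D array: 3 rows (samples) by 2 columns (features).
matrix = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

# Reduce along an axis: axis=0 gives one mean per column.
col_means = matrix.mean(axis=0)

# Broadcasting: the shape-(2,) means are subtracted from every row.
centered = matrix - col_means

# Element-wise math functions apply to the whole array at once.
roots = np.sqrt(matrix)

print(col_means)
print(centered)
```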
matplotlib & seaborn
- Core Python libraries for creating static, interactive, and animated visualizations
- Matplotlib offers a low-level approach to plotting, while Seaborn provides high-level wrappers for common statistical graphics
Terminal: pip install matplotlib seaborn
Example Import & Use Case:
import matplotlib.pyplot as plt
import seaborn as sns
# Example usage
data = [10, 15, 8, 12, 20]
plt.plot(data)
plt.title("Simple Line Chart")
plt.show()
# Seaborn example
sns.set_theme()
tips = sns.load_dataset("tips")  # Downloads the sample dataset on first use (needs internet access)
sns.barplot(x="day", y="total_bill", data=tips)
plt.show()
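Matplotlib's lower-level, object-oriented interface can be sketched as follows. This version assumes a machine without a display, so it selects the non-interactive `Agg` backend and writes the chart to an in-memory PNG; the category labels are made up.

```python
import io

import matplotlib
matplotlib.use("Agg")  # Non-interactive backend: render without a display.
import matplotlib.pyplot as plt

values = [10, 15, 8, 12, 20]
labels = ["A", "B", "C", "D", "E"]  # Hypothetical category names.

# Create the figure and axes explicitly, then configure each element.
fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(labels, values, color="steelblue")
ax.set_title("Simple Bar Chart")
ax.set_xlabel("Category")
ax.set_ylabel("Value")

# Write the chart to an in-memory PNG instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
print(len(bars), "bars rendered")
```

To save to disk instead, pass a file path such as `"chart.png"` to `fig.savefig`.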
scikit-learn
- Offers simple and efficient tools for predictive data analysis and machine learning
- Covers classification, regression, clustering, model selection, and preprocessing
Terminal: pip install scikit-learn
Example Import & Use Case:
from sklearn.linear_model import LinearRegression
import numpy as np
# Example usage
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])
model = LinearRegression()
model.fit(X, y)
prediction = model.predict(np.array([[6]]))
print(prediction)
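Preprocessing, classification, and model selection can be combined in one workflow. The sketch below is one possible arrangement, assuming the bundled iris dataset and a logistic regression classifier; neither choice comes from the text above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Chain scaling and the classifier so the scaler is fit on training data only.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model selection: 5-fold cross-validation on the training split.
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)

# Final fit and evaluation on the held-out test split.
pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print(cv_scores.mean(), test_accuracy)
```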
statsmodels
- Provides classes and functions for statistical modeling and hypothesis testing
- Offers advanced techniques like time series analysis, ARIMA, and generalized linear models
Terminal: pip install statsmodels
Example Import & Use Case:
import statsmodels.api as sm
# Example usage
data = sm.datasets.get_rdataset("Guerry", "HistData").data  # Fetches the dataset online (needs internet access)
X = data[["Literacy", "Pop1831"]]
y = data["Donations"]
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
print(model.summary())
PySpark
- Offers a Python API for Apache Spark, enabling large-scale data processing in a distributed environment
- Ideal for handling datasets too large to fit in the memory of a single machine
Terminal: pip install pyspark
Example Import & Use Case:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("ExampleApp").getOrCreate()
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
df = spark.createDataFrame(data, ["Name", "Age"])
df.show()
TensorFlow
- An end-to-end open-source platform for machine learning developed by Google
- Provides tools for building, training, and deploying deep learning models
Terminal: pip install tensorflow
Example Import & Use Case:
import tensorflow as tf
# Example usage
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')
PyTorch
- A deep learning framework originally developed by Facebook's AI Research lab (now part of Meta)
- Known for its dynamic computation graph, making it intuitive for rapid experimentation
Terminal: pip install torch
Example Import & Use Case:
import torch
# Example usage
x = torch.randn(5, 3)
y = torch.randn(5, 3)
print(x + y)
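The dynamic computation graph can be illustrated with automatic differentiation. In this minimal sketch the graph is recorded while ordinary Python code runs, including the `if` expression, and is then used to compute gradients; the input values are arbitrary.

```python
import torch

# requires_grad=True asks autograd to record operations on this tensor.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Ordinary Python control flow decides which operations are recorded.
y = x ** 2 if x.sum() > 0 else -x
loss = y.sum()

# Backpropagate through the graph that was just built: d(loss)/dx = 2x.
loss.backward()
print(x.grad)
```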
XGBoost
- A popular gradient boosting library designed for efficiency, flexibility, and portability
- Often used in machine learning competitions for its high performance on structured data
Terminal: pip install xgboost
Example Import & Use Case:
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # Fixed seed for a reproducible split
model = xgb.XGBClassifier()
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
Plotly
- A library for creating interactive, publication-quality graphs
- Allows for dynamic data exploration, suitable for web-based dashboards and presentations
Terminal: pip install plotly
Example Import & Use Case:
import plotly.express as px
# Example usage
df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp",
                 color="continent", size="pop", hover_name="country")
fig.show()
Keras
- A high-level neural networks API that ships with TensorFlow as tf.keras, for building and training deep learning models; since Keras 3 it can also run on JAX and PyTorch backends
- Provides abstractions for layers, optimizers, and training loops, making it simpler to prototype and deploy models
Terminal: pip install tensorflow
Example Import & Use Case:
import tensorflow as tf
from tensorflow import keras
import numpy as np
# Generate synthetic data for demonstration
X = np.random.random((1000, 20))
y = np.random.randint(2, size=(1000, 1))
# Define a simple feedforward model
model = keras.Sequential([
    keras.Input(shape=(20,)),  # Declare the input shape explicitly (preferred over input_shape= on a layer)
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])
# Compile the model with an optimizer, loss function, and metric
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train the model on the synthetic dataset
model.fit(X, y, epochs=5, batch_size=32)