Pandas: Data Transformation and Manipulation
Applying Functions
Because sometimes, you need to force your data into submission with custom functions.
- Using `.apply()` to Apply Custom Functions
```python
import pandas as pd

df = pd.DataFrame({"Numbers": [1, 2, 3, 4]})

def square(x):
    return x ** 2

df["Squared"] = df["Numbers"].apply(square)
print(df)
```

`.apply()` lets you apply a function to every element in a column, making data transformations easy and efficient.
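`.apply()` also works on an entire DataFrame, where `axis=1` passes one row at a time to the function. A minimal sketch reusing the `df` built above (the `RowSum` column name is just illustrative):

```python
# Sketch: DataFrame.apply with axis=1 receives each row as a Series.
df["RowSum"] = df.apply(lambda row: row["Numbers"] + row["Squared"], axis=1)
print(df)
```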
- Using `.map()` for Element-Wise Transformations in a Series
df["Numbers"] = df["Numbers"].map(lambda x: x * 10)
print(df).map() works similarly but is meant for Series objects, ideal for quick, element-wise operations.
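Besides functions, `.map()` also accepts a dictionary (or another Series) as a lookup table, which is handy for value replacement. A small sketch using the scaled numbers from above; the label mapping is hypothetical, and keys missing from it would become NaN:

```python
# Sketch: mapping values through a dict acts as a simple lookup/replace.
labels = {10: "ten", 20: "twenty", 30: "thirty", 40: "forty"}
df["Label"] = df["Numbers"].map(labels)
print(df)
```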
- Implementing Lambda Functions for Quick Transformations
df["Doubled"] = df["Numbers"].apply(lambda x: x * 2)Lambda functions allow for quick, anonymous functions without cluttering your code.
Grouping and Aggregating Data
Why analyze each row when you can summarize entire groups?
- Using `.groupby()` for Data Segmentation
```python
df = pd.DataFrame({"Category": ["A", "B", "A", "B"], "Values": [10, 20, 30, 40]})
grouped = df.groupby("Category").sum()
print(grouped)
```

`.groupby()` creates groups based on a column and allows for operations on each group.
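A grouped object can also be iterated when each group needs its own handling, yielding the group key together with its sub-DataFrame. A brief sketch based on the `df` above:

```python
# Sketch: iterating a groupby yields (key, sub-DataFrame) pairs.
for category, group in df.groupby("Category"):
    print(category, group["Values"].tolist())
```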
- Applying Aggregation Functions (`.sum()`, `.mean()`, `.count()`)
```python
print(df.groupby("Category")["Values"].mean())
```

Aggregation functions provide quick insights without looping through the data manually.
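To get several of these summaries in a single pass, `.agg()` accepts a list of function names. A small sketch covering the three functions mentioned above:

```python
# Sketch: multiple aggregations at once, one column per function.
print(df.groupby("Category")["Values"].agg(["sum", "mean", "count"]))
```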
- Custom Aggregation with `.apply()`
```python
def custom_agg(series):
    return series.max() - series.min()

print(df.groupby("Category")["Values"].apply(custom_agg))
```

Custom aggregations let you define your own summary statistics.
Merging, Concatenating, and Joining DataFrames
Because real-world data never comes in one clean table.
- Merging DataFrames with `.merge()` (Inner, Outer, Left, Right Joins)
```python
df1 = pd.DataFrame({"ID": [1, 2, 3], "Value": ["A", "B", "C"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Score": [90, 80, 70]})
merged = df1.merge(df2, on="ID", how="inner")
print(merged)
```

Merging is like SQL joins: inner, outer, left, and right.
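Only the inner join is shown above; the other `how=` options keep unmatched keys and fill the gaps with NaN. A quick sketch with the same `df1` and `df2`:

```python
# Sketch: an outer join keeps every ID from both frames; missing values become NaN.
print(df1.merge(df2, on="ID", how="outer"))

# Sketch: a left join keeps every row of df1, matched or not.
print(df1.merge(df2, on="ID", how="left"))
```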
- Concatenating DataFrames using `pd.concat()`
```python
df3 = pd.DataFrame({"ID": [4, 5], "Value": ["D", "E"]})
concat_df = pd.concat([df1, df3])
print(concat_df)
```

Concatenation stacks DataFrames on top of each other, row-wise or column-wise.
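By default the stacking is row-wise and the original index labels are kept; `ignore_index=True` renumbers the rows, and `axis=1` places the frames side by side instead. A brief sketch reusing `df1` and `df3`:

```python
# Sketch: renumber the combined rows instead of keeping the original indexes.
print(pd.concat([df1, df3], ignore_index=True))

# Sketch: column-wise concatenation aligns on the index and widens the result.
print(pd.concat([df1, df3], axis=1))
```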
- Joining DataFrames using `.join()`
```python
df1.set_index("ID", inplace=True)
df2.set_index("ID", inplace=True)
joined_df = df1.join(df2, how="outer")
print(joined_df)
```

`.join()` is similar to `.merge()` but works with index-based joins.
Pivot Tables and Reshaping Data
Because sometimes your boss wants the data presented in an entirely different format for no reason.
- Creating Pivot Tables using `.pivot_table()`
```python
pivot = df.pivot_table(values="Values", index="Category", aggfunc="sum")
print(pivot)
```

Pivot tables allow for multi-level summarization.
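The example above has a single grouping level; passing a list to `index=` is what produces the multi-level summary. A sketch with a hypothetical `Region` column added purely for illustration:

```python
# Sketch: a hypothetical DataFrame with two grouping columns.
sales = pd.DataFrame({
    "Category": ["A", "A", "B", "B"],
    "Region": ["East", "West", "East", "West"],
    "Values": [10, 30, 20, 40],
})

# Two index levels give a multi-level summary.
print(sales.pivot_table(values="Values", index=["Category", "Region"], aggfunc="sum"))
```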
- Reshaping DataFrames with `.melt()`
```python
melted = df.melt(id_vars=["Category"], value_vars=["Values"])
print(melted)
```

`.melt()` converts wide-form data into a long format.
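Melting pays off most when several measurement columns need to become rows. A small sketch with a hypothetical wide table of quarterly values (the `Q1`/`Q2` columns are made up for the example):

```python
# Sketch: a hypothetical wide table with one column per quarter.
wide = pd.DataFrame({"Category": ["A", "B"], "Q1": [10, 20], "Q2": [30, 40]})

# Each quarter column becomes rows of "variable"/"value" pairs.
print(wide.melt(id_vars=["Category"], value_vars=["Q1", "Q2"]))
```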
- Stacking and Unstacking Data using `.stack()` and `.unstack()`
```python
stacked = pivot.stack()
unstacked = stacked.unstack()
print(stacked)
print(unstacked)
```

These functions reshape multi-level indexed DataFrames.
Hands-On Exercise
- Apply Functions to Data: Use `.apply()` and `.map()` to transform dataset values.
- Group and Aggregate Data: Perform grouping and aggregations on a sample dataset.
- Merge and Concatenate DataFrames: Practice merging and concatenating datasets.
- Create Pivot Tables and Reshape Data: Generate pivot tables and reshape data using `.melt()` and `.stack()`.