Pandas: Data Transformation and Manipulation

Applying Functions

Because sometimes, you need to force your data into submission with custom functions.

  • Using .apply() to Apply Custom Functions
import pandas as pd
df = pd.DataFrame({"Numbers": [1, 2, 3, 4]})
def square(x):
    return x ** 2
df["Squared"] = df["Numbers"].apply(square)
print(df)

.apply() calls a function on every element of a column, making arbitrary transformations easy. Note that it invokes Python code once per element, so vectorized operations (e.g. df["Numbers"] ** 2) are usually faster when one exists.
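.apply() also works on a whole DataFrame, where axis=1 passes each row to the function instead of each element. A minimal sketch (the Width/Height columns are illustrative, not from the example above):

```python
import pandas as pd

df = pd.DataFrame({"Width": [2, 3], "Height": [4, 5]})
# With axis=1, the lambda receives each row as a Series.
df["Area"] = df.apply(lambda row: row["Width"] * row["Height"], axis=1)
print(df)
```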

  • Using .map() for Element-Wise Transformations in a Series
df["Numbers"] = df["Numbers"].map(lambda x: x * 10)
print(df)

.map() works similarly but is defined only on Series, and it also accepts a dict or Series as a lookup table, making it ideal for quick element-wise substitutions.
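For example, passing a dict to .map() turns it into a lookup table (the animal labels below are illustrative); values with no matching key become NaN:

```python
import pandas as pd

s = pd.Series(["cat", "dog", "cat", "bird"])
# Dict lookup: "bird" has no entry, so it maps to NaN.
codes = s.map({"cat": 0, "dog": 1})
print(codes)
```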

  • Implementing Lambda Functions for Quick Transformations
df["Doubled"] = df["Numbers"].apply(lambda x: x * 2)

Lambda functions allow for quick, anonymous functions without cluttering your code.

Grouping and Aggregating Data

Why analyze each row when you can summarize entire groups?

  • Using .groupby() for Data Segmentation
df = pd.DataFrame({"Category": ["A", "B", "A", "B"], "Values": [10, 20, 30, 40]})
grouped = df.groupby("Category").sum()
print(grouped)

.groupby() creates groups based on a column and allows for operations on each group.

  • Applying Aggregation Functions (.sum(), .mean(), .count())
print(df.groupby("Category")["Values"].mean())

Aggregation functions provide quick insights without looping through the data manually.
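Several aggregations can also be computed in one call with .agg(), which accepts a list of function names. A sketch using the same Category/Values frame as above:

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A", "B", "A", "B"], "Values": [10, 20, 30, 40]})
# One pass over the groups, three summary columns out.
summary = df.groupby("Category")["Values"].agg(["sum", "mean", "count"])
print(summary)
```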

  • Custom Aggregation with .apply()
def custom_agg(series):
    return series.max() - series.min()
print(df.groupby("Category")["Values"].apply(custom_agg))

Custom aggregations let you define your own summary statistics.

Merging, Concatenating, and Joining DataFrames

Because real-world data never comes in one clean table.

  • Merging DataFrames with .merge() (Inner, Outer, Left, Right Joins)
df1 = pd.DataFrame({"ID": [1, 2, 3], "Value": ["A", "B", "C"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Score": [90, 80, 70]})
merged = df1.merge(df2, on="ID", how="inner")
print(merged)

Merging works like a SQL join; the how parameter selects inner, outer, left, or right behavior.
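To see how the join types differ, compare an outer and a left join on the same frames as above: the outer join keeps every ID from both sides (filling gaps with NaN), while the left join keeps every row of df1.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Value": ["A", "B", "C"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Score": [90, 80, 70]})
# Outer join: union of IDs from both frames.
outer = df1.merge(df2, on="ID", how="outer")
# Left join: every row of df1, matched against df2 where possible.
left = df1.merge(df2, on="ID", how="left")
print(outer)
print(left)
```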

  • Concatenating DataFrames using .concat()
df3 = pd.DataFrame({"ID": [4, 5], "Value": ["D", "E"]})
concat_df = pd.concat([df1, df3], ignore_index=True)  # renumber rows to avoid duplicate index labels
print(concat_df)

Concatenation stacks DataFrames on top of each other, row-wise or column-wise.
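Column-wise concatenation uses axis=1, which places the frames side by side and aligns their rows on the index. A small sketch (df_a and df_b are illustrative frames, not the ones above):

```python
import pandas as pd

df_a = pd.DataFrame({"X": [1, 2]})
df_b = pd.DataFrame({"Y": [3, 4]})
# axis=1 concatenates column-wise, aligning on the shared index.
wide = pd.concat([df_a, df_b], axis=1)
print(wide)
```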

  • Joining DataFrames using .join()
df1.set_index("ID", inplace=True)
df2.set_index("ID", inplace=True)
joined_df = df1.join(df2, how="outer")
print(joined_df)

.join() is similar to .merge() but joins on the index by default, which is why both frames set ID as their index above.

Pivot Tables and Reshaping Data

Because sometimes your boss wants the data presented in an entirely different format for no reason.

  • Creating Pivot Tables using .pivot_table()
pivot = df.pivot_table(values="Values", index="Category", aggfunc="sum")
print(pivot)

Pivot tables allow for multi-level summarization.
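Passing a list of columns to index produces that multi-level summary. A sketch with a hypothetical Region column added so there are two grouping levels:

```python
import pandas as pd

# Region is an illustrative second grouping column.
df = pd.DataFrame({
    "Category": ["A", "A", "B", "B"],
    "Region": ["East", "West", "East", "West"],
    "Values": [10, 30, 20, 40],
})
# Two index levels -> a MultiIndex on the result.
pivot = df.pivot_table(values="Values", index=["Category", "Region"], aggfunc="sum")
print(pivot)
```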

  • Reshaping DataFrames with .melt()
melted = df.melt(id_vars=["Category"], value_vars=["Values"])
print(melted)

.melt() converts wide-form data into a long format.

  • Stacking and Unstacking Data using .stack() and .unstack()
stacked = pivot.stack()
unstacked = stacked.unstack()
print(stacked)
print(unstacked)

These functions reshape multi-level indexed DataFrames.

Hands-On Exercise

  1. Apply Functions to Data: Use .apply() and .map() to transform dataset values.
  2. Group and Aggregate Data: Perform grouping and aggregations on a sample dataset.
  3. Merge and Concatenate DataFrames: Practice merging and concatenating datasets.
  4. Create Pivot Tables and Reshape Data: Generate pivot tables and reshape data using .melt() and .stack().

References