Pandas: Working with Data Structures

Creating Series and DataFrames

Creating a Pandas Series

A Pandas Series is basically a glorified one-dimensional array, except every value comes with an index label (0, 1, 2, and so on by default) and it makes you feel smarter for using it.

  • From a List:
import pandas as pd
s = pd.Series([42, 23, 16, 15, 8, 4])
print(s)
  • From a Dictionary:
data = {"Apples": 3, "Bananas": 5, "Cherries": 7}
s = pd.Series(data)
print(s)
  • From a NumPy Array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
s = pd.Series(arr)
print(s)
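The "comes with labels" part is easier to see in action. Here is a minimal sketch, reusing the fruit dictionary from above; the `scores` Series and its `a`, `b`, `c` labels are made up purely for illustration:

```python
import pandas as pd

# Dictionary keys become the index labels of the Series.
stock = pd.Series({"Apples": 3, "Bananas": 5, "Cherries": 7})
print(stock.index.tolist())  # ['Apples', 'Bananas', 'Cherries']

# Look a value up by its label, or by its position with .iloc.
print(stock["Bananas"])  # 5
print(stock.iloc[0])     # 3

# A plain list has no labels, so pass them explicitly with index=
# (these labels are hypothetical).
scores = pd.Series([42, 23, 16], index=["a", "b", "c"])
print(scores["b"])  # 23
```

If you skip `index=`, pandas simply numbers the rows starting at 0, which is what the list and NumPy examples above end up with.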

Creating a Pandas DataFrame

A DataFrame is like an overachieving spreadsheet on steroids: a two-dimensional table with labeled rows and columns, where each column is a Series. Here’s how you create one:

  • From a Dictionary:
data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}
df = pd.DataFrame(data)
print(df)
  • From a List of Lists:
data = [["Alice", 25], ["Bob", 30], ["Charlie", 35]]
df = pd.DataFrame(data, columns=["Name", "Age"])
print(df)
  • From a NumPy Array:
arr = np.array([[1, 2], [3, 4], [5, 6]])
df = pd.DataFrame(arr, columns=["Column1", "Column2"])
print(df)
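Once a DataFrame exists, it is worth a quick sanity check that it has the shape and columns you expected. The sketch below rebuilds the same people data two ways and inspects the result; the variable names (`from_dict`, `from_rows`, `from_array`) are just illustrative:

```python
import numpy as np
import pandas as pd

# The same illustrative data, built from a dictionary and from a list of lists.
from_dict = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})
from_rows = pd.DataFrame(
    [["Alice", 25], ["Bob", 30], ["Charlie", 35]],
    columns=["Name", "Age"],
)

print(from_dict.shape)             # (3, 2): three rows, two columns
print(from_rows.columns.tolist())  # ['Name', 'Age']
print(from_dict.dtypes)            # one dtype per column

# Each column of a DataFrame is itself a Series.
ages = from_dict["Age"]
print(type(ages).__name__)  # Series
print(ages.sum())           # 90

# A NumPy array carries no column names, so without columns=
# pandas falls back to 0, 1, ...
from_array = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6]]))
print(from_array.columns.tolist())  # [0, 1]
```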

Importing Data from Various Sources

Let’s be real: no one enjoys manually typing in data. Here’s how to import it like a sane person:

  • Reading CSV Files:
df = pd.read_csv("data.csv")
  • Reading Excel Files:
df = pd.read_excel("data.xlsx")  # Reading .xlsx files needs the openpyxl package installed
  • Reading JSON Files:
df = pd.read_json("data.json")
  • Reading from a Database:
import sqlite3
conn = sqlite3.connect("database.db")
df = pd.read_sql("SELECT * FROM table_name", conn)
conn.close()  # Close the connection once the data is loaded
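If you don't have `data.csv` or `database.db` lying around, you can still try the readers on in-memory data. This is only a sketch under that assumption: the CSV and JSON text, the `people` table, and the query are all invented for the demo, and `io.StringIO` stands in for a real file:

```python
import io
import sqlite3

import pandas as pd

# CSV text wrapped in a file-like object instead of a file on disk.
csv_text = "Name,Age\nAlice,25\nBob,30\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# JSON as a list of records: one object per row.
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'
df_json = pd.read_json(io.StringIO(json_text))

# An in-memory SQLite database with a hypothetical "people" table.
conn = sqlite3.connect(":memory:")
df_csv.to_sql("people", conn, index=False)
df_sql = pd.read_sql("SELECT * FROM people WHERE Age > 26", conn)
conn.close()

print(df_csv)
print(df_json)
print(df_sql)
```

The nice side effect of `read_sql` is that the filtering happens in the database, so only the matching rows ever reach pandas.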

Viewing, Summarizing, and Exploring Datasets

Your dataset is a mysterious black box until you poke at it. Here’s how:

  • First and Last Few Rows:
print(df.head())  # First 5 rows
print(df.tail())  # Last 5 rows
  • Checking Dataset Structure:
df.info()  # Prints the summary itself; wrapping it in print() adds a stray "None"
  • Summarizing Numerical Data:
print(df.describe())
  • Checking for Missing Values:
print(df.isnull().sum())
  • Getting Unique Values and Value Counts:
print(df["column_name"].unique())
print(df["column_name"].value_counts())
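To see what these calls report, it helps to poke at a tiny dataset with a couple of deliberate holes in it. The data below is made up for illustration (including the `City` column, which does not appear elsewhere in this section):

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing age and one missing city.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, np.nan, 30],
    "City": ["Paris", "Oslo", "Paris", None],
})

df.info()                  # column names, non-null counts, dtypes
print(df.describe())       # statistics for the numeric columns only
print(df.isnull().sum())   # missing values per column

# value_counts() skips missing values unless you pass dropna=False.
print(df["City"].value_counts())
print(df["City"].value_counts(dropna=False))
```

Note that `describe()` quietly ignores the missing age when computing the mean, so the count it reports is 3, not 4.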

Selecting, Indexing, and Slicing Data

Time to chop up your dataset like a chef:

  • Selecting Columns:
print(df["column_name"])
  • Selecting Rows by Label:
print(df.loc[0])  # Row whose index label is 0 (the first row only with the default index)
  • Selecting Rows by Index Position:
print(df.iloc[0])  # First row
  • Filtering Rows with Conditions:
filtered_df = df[df["Age"] > 30]
print(filtered_df)
  • Resetting and Setting Index:
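df.set_index("column_name", inplace=True)  # Use the column's values as row labels
df.reset_index(inplace=True)  # Move the index back into a column and restore 0, 1, 2, ...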
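With the default index, `.loc[0]` and `.iloc[0]` return the same row, which hides the difference between them. A small sketch with a deliberately non-default index (the labels 10, 20, 30 are arbitrary) makes it visible:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],  # deliberately not 0, 1, 2
)

print(df.iloc[0, 0])       # Alice: row position 0, column position 0
print(df.loc[10, "Name"])  # Alice: row label 10, column label "Name"
# df.loc[0] would raise a KeyError here, because no row is labeled 0.

# Slices differ too: .iloc excludes the end position, .loc includes the end label.
print(len(df.iloc[0:1]))   # 1
print(len(df.loc[10:20]))  # 2

# Combine conditions with & and |, wrapping each one in parentheses.
picked = df[(df["Age"] > 25) & (df["Name"] != "Charlie")]
print(picked["Name"].tolist())  # ['Bob']

# Without inplace=True, set_index returns a new DataFrame and leaves df alone.
by_name = df.set_index("Name")
print(by_name.loc["Bob", "Age"])  # 30
```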

Hands-On Exercise

  1. Create Series and DataFrames: Generate a Pandas Series and DataFrame using lists, dictionaries, and NumPy arrays.
  2. Load External Data: Import datasets from a CSV file, an Excel file, and a JSON file.
  3. Explore a Dataset: Load a sample dataset and apply .head(), .info(), and .describe().
  4. Select and Filter Data: Use .loc[] and .iloc[] to extract specific rows and columns.

References