A Pandas Series is basically a glorified one-dimensional array, except it comes with labels and makes you feel smarter for using it.
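To make the "comes with labels" part concrete, here is a minimal sketch. The values and the labels "a", "b", "c" are made up for illustration; the point is only that the same element can be reached by its label or by its position.

```python
import pandas as pd

# Hypothetical data: the labels "a", "b", "c" are invented for this example.
s = pd.Series([42, 23, 16], index=["a", "b", "c"])

by_label = s["b"]        # look up by label
by_position = s.iloc[1]  # look up by integer position

print(by_label, by_position)
```

If you don't pass an index, pandas labels the values 0, 1, 2, and so on, which is what the examples that follow rely on.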
import pandas as pd
s = pd.Series([42, 23, 16, 15, 8, 4])
print(s)

data = {"Apples": 3, "Bananas": 5, "Cherries": 7}
s = pd.Series(data)
print(s)

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
s = pd.Series(arr)
print(s)

A DataFrame is like an overachieving spreadsheet on steroids. Here’s how you create one:
data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}
df = pd.DataFrame(data)
print(df)

data = [["Alice", 25], ["Bob", 30], ["Charlie", 35]]
df = pd.DataFrame(data, columns=["Name", "Age"])
print(df)

arr = np.array([[1, 2], [3, 4], [5, 6]])
df = pd.DataFrame(arr, columns=["Column1", "Column2"])
print(df)

Let’s be real: no one enjoys manually typing in data. Here’s how to import it like a sane person:
df = pd.read_csv("data.csv")

df = pd.read_excel("data.xlsx")

df = pd.read_json("data.json")

import sqlite3
conn = sqlite3.connect("database.db")
df = pd.read_sql("SELECT * FROM table_name", conn)

Your dataset is a mysterious black box until you poke at it. Here’s how:
print(df.head()) # First 5 rows
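print(df.tail()) # Last 5 rows

df.info() # Column types and non-null counts (prints its own output)

print(df.describe()) # Summary statistics for numeric columns

print(df.isnull().sum()) # Missing values per column

print(df["column_name"].unique())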
print(df["column_name"].value_counts())

Time to chop up your dataset like a chef:
print(df["column_name"]) # Select one column

print(df.loc[0]) # Row whose index label is 0

print(df.iloc[0]) # First row by position

filtered_df = df[df["Age"] > 30]
print(filtered_df)

df.set_index("column_name", inplace=True)
df.reset_index(inplace=True)

Use .head(), .info(), and .describe() to get a first look at a dataset.
Use .loc[] and .iloc[] to extract specific rows and columns.
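
The difference between .loc[] and .iloc[] only shows up once the index stops being a tidy 0, 1, 2. Here is a rough sketch of that, assuming a small made-up DataFrame that reuses the Name and Age columns from earlier: after filtering, the surviving row keeps its original label, so position 0 and label 0 are no longer the same thing.

```python
import pandas as pd

# Hypothetical sample data, reusing the Name/Age columns from earlier.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# Keep only rows where Age > 30; the original index label (2) is preserved.
filtered_df = df[df["Age"] > 30]

# Position 0 exists in the filtered result, but label 0 was filtered out.
first_by_position = filtered_df.iloc[0]["Name"]
has_label_zero = 0 in filtered_df.index

# Without inplace=True, set_index returns a new DataFrame and leaves df alone.
by_name = df.set_index("Name")
bob_age = by_name.loc["Bob", "Age"]

print(first_by_position, has_label_zero, bob_age)
```

Calling filtered_df.loc[0] here would raise a KeyError, which is why .iloc[] is the safer choice when you mean "the first row".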