MySQL for Python Developers — Database Integration, Pandas & Data Engineering
If you work with Python and data, MySQL will cross your path sooner or later. Whether you’re building a web app, running analytics pipelines, or just learning backend development — knowing how to connect Python to MySQL cleanly and efficiently is a skill that pays off in almost every data-related job.

The problem is that most tutorials online show you five lines of code, call it a day, and leave you confused when things break in a real project. This guide is different. We’ll go from setting up your first connection all the way to integrating MySQL with Pandas, building data pipelines, and applying real data engineering patterns — with working code at every step.

## Why MySQL and Python work so well together

MySQL is one of the world’s most widely used relational databases — open source, fast, and supported by virtually every hosting platform on the planet. Python, on the other hand, has become the go-to language for data work, scripting, and backend development. The two naturally complement each other.

In a typical workflow, Python handles the logic — fetching data, transforming it, making decisions — while MySQL handles persistence and structured storage. This pattern shows up everywhere: Django and Flask apps using MySQL as the backend, ETL pipelines pulling production data into analytics databases, data science notebooks querying live databases to analyze customer behavior.

Understanding how to connect these two properly — with connection pooling, error handling, and clean query patterns — is what separates a developer who can get things working from one who builds things that actually hold up in production.

## Choosing your connector: mysql-connector-python vs PyMySQL vs SQLAlchemy

Before you write a single line of code, you need to decide which library you’re using to communicate between Python and MySQL. There are three main options, and they’re not interchangeable:

**Option 1: mysql-connector-python.** The official Oracle library. Pure Python, no extra dependencies.
Best for straightforward use cases and learners starting out.

**Option 2: PyMySQL.** A lightweight alternative, also pure Python. Compatible with most MySQL versions and often preferred in legacy codebases and serverless environments.

**Option 3: SQLAlchemy.** The most powerful option — a toolkit and ORM that abstracts database interactions. Preferred for larger applications, Pandas integration, and data engineering pipelines.

## Setting up your Python MySQL connection

### Step 1 — Install the library

```bash
# Install mysql-connector-python
pip install mysql-connector-python

# Or if you prefer PyMySQL
pip install pymysql

# For SQLAlchemy + MySQL (recommended for data engineering)
pip install sqlalchemy pymysql
```

### Step 2 — Create your first connection

```python
import mysql.connector

connection = mysql.connector.connect(
    host="localhost",
    user="your_username",
    password="your_password",
    database="your_database"
)

if connection.is_connected():
    print("Connected to MySQL successfully")
    print(f"Server version: {connection.get_server_info()}")

connection.close()
```

Always close your connection when done. Leaving connections open is one of the most common causes of “Too many connections” errors in MySQL, especially in scripts that run repeatedly.

### Step 3 — Use environment variables for credentials (never hardcode)

Hardcoding database credentials is a serious security mistake — and one that beginners make all the time. If your code ever ends up on GitHub, those credentials are exposed. The right way to handle this from day one:

```python
import os

import mysql.connector
from dotenv import load_dotenv

load_dotenv()  # loads the .env file into environment variables

connection = mysql.connector.connect(
    host=os.getenv("DB_HOST"),
    user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
    database=os.getenv("DB_NAME")
)
```

Create a `.env` file in your project root and add it to `.gitignore`. This is standard practice and takes two minutes to set up.
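Closing connections reliably is easy to get wrong once exceptions enter the picture. One way to make it automatic is `contextlib.closing` from the standard library, which works with any object that has a `close()` method — including `mysql.connector` and PyMySQL connections. Below is a minimal sketch of the pattern; it uses SQLite (same DB-API interface) purely so it runs without a live MySQL server, and the `count_users` function and table are invented for illustration:

```python
import sqlite3
from contextlib import closing

# contextlib.closing() guarantees conn.close() runs even if an exception
# is raised mid-query. The identical pattern applies to a
# mysql.connector.connect(...) or pymysql.connect(...) call.
def count_users(db_path: str = ":memory:") -> int:
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.cursor()
        cursor.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        cursor.execute("INSERT INTO users (name) VALUES (?)", ("Ayesha",))
        conn.commit()
        cursor.execute("SELECT COUNT(*) FROM users")
        return cursor.fetchone()[0]

print(count_users())  # the connection is already closed by the time we get here
```

With this wrapper in place, a script that crashes halfway through a query no longer leaks an open connection back to the server.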
## CRUD operations in Python — Create, Read, Update, Delete

Once your connection is working, the most common thing you’ll do is perform CRUD operations. Here’s how each one works in Python with MySQL:

### Create — inserting data

```python
cursor = connection.cursor()

insert_query = """
    INSERT INTO employees (name, department, salary)
    VALUES (%s, %s, %s)
"""
data = ("Ayesha Khan", "Analytics", 85000)

cursor.execute(insert_query, data)
connection.commit()  # required — changes are not saved without this
print(f"Inserted row ID: {cursor.lastrowid}")
```

Always use parameterized queries with `%s` placeholders — never use f-strings or string formatting to build SQL queries. That’s how SQL injection vulnerabilities happen.

### Read — fetching data

```python
cursor.execute(
    "SELECT id, name, salary FROM employees WHERE department = %s",
    ("Analytics",)
)
rows = cursor.fetchall()

for row in rows:
    print(f"ID: {row[0]}, Name: {row[1]}, Salary: {row[2]}")

# Use fetchone() when you only need a single row
# Use fetchmany(n) when you want to paginate large result sets
```

### Update — modifying records

```python
update_query = "UPDATE employees SET salary = %s WHERE name = %s"
cursor.execute(update_query, (92000, "Ayesha Khan"))
connection.commit()
print(f"Rows affected: {cursor.rowcount}")
```

### Delete — removing records

```python
delete_query = "DELETE FROM employees WHERE id = %s"
cursor.execute(delete_query, (7,))
connection.commit()
print(f"Deleted {cursor.rowcount} row(s)")
```

Notice that every write operation — INSERT, UPDATE, DELETE — requires a `connection.commit()` call. Without it, MySQL rolls back your changes when the connection closes. This is not a bug — it’s how transaction-safe databases are supposed to work.

## Using Pandas with MySQL — read_sql and to_sql

This is where things get genuinely powerful for data work. Pandas can read directly from MySQL into a DataFrame and write DataFrames back to MySQL — with just a few lines of code. But to do this properly, you need SQLAlchemy to create the database engine that Pandas talks to.
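The commit/rollback behavior described above is standard DB-API semantics, so it can be demonstrated end to end without a MySQL server. The sketch below uses SQLite, whose connections follow the same commit-or-discard rules as the MySQL connectors; the `employees` rows are invented sample data:

```python
import sqlite3

# DB-API transaction semantics: nothing is permanent until commit();
# rollback() (or closing without commit) discards pending changes.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.commit()

cur.execute("INSERT INTO employees VALUES (?, ?)", ("Ayesha Khan", 85000))
conn.rollback()  # discard the uncommitted insert

cur.execute("INSERT INTO employees VALUES (?, ?)", ("Bilal Ahmed", 70000))
conn.commit()    # this one is saved

cur.execute("SELECT COUNT(*) FROM employees")
surviving_rows = cur.fetchone()[0]
print(surviving_rows)  # only the committed row survived
conn.close()
```

This is also why a script that “mysteriously loses data” usually just has a missing `commit()` before the connection goes out of scope.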
### Reading MySQL data into a Pandas DataFrame

```python
import pandas as pd
from sqlalchemy import create_engine

# Create engine — format: dialect+driver://user:password@host/database
engine = create_engine("mysql+pymysql://username:password@localhost/your_database")

# Read an entire table
df = pd.read_sql("SELECT * FROM employees", con=engine)

# Or use a more targeted query
df_filtered = pd.read_sql(
    "SELECT name, salary FROM employees WHERE salary > 80000",
    con=engine
)
print(df_filtered.head())
```

### Writing a Pandas DataFrame back to MySQL

```python
# if_exists options: 'fail', 'replace', 'append'
df.to_sql(
    name="employees_backup",
    con=engine,
    if_exists="replace",
    index=False,    # don't write the DataFrame index as a column
    chunksize=1000  # write in batches for large datasets
)
```

Use `if_exists='append'` when you’re adding new rows to an existing table. Use `'replace'` only when you intentionally want to drop and recreate the table — it’s destructive.

### Reading in chunks for large datasets

When you’re dealing with millions of rows, loading everything into memory at once is a bad idea. Pandas gives you a chunked reading option that processes data in manageable pieces:

```python
chunk_iter = pd.read_sql(
    "SELECT * FROM large_transactions_table",
    con=engine,
    chunksize=50000  # yields DataFrames of up to 50,000 rows each
)

for chunk in chunk_iter:
    # transform or aggregate each chunk here, then let it go out of memory
    print(len(chunk))
```
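The whole round trip — `to_sql` out, chunked `read_sql` back in — can be tried end to end without a database server. The sketch below swaps the MySQL URL for an in-memory SQLite engine so it is self-contained (in a real pipeline you would use the `mysql+pymysql://...` URL shown above); the table name and salary figures are made-up sample data:

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite engine keeps the demo self-contained; the pandas calls
# are identical with a mysql+pymysql engine.
engine = create_engine("sqlite://")

df = pd.DataFrame({
    "name": ["Ayesha", "Bilal", "Sana", "Omar"],
    "salary": [85000, 70000, 92000, 65000],
})
df.to_sql("employees", con=engine, if_exists="replace", index=False)

# Stream the table back two rows at a time and aggregate incrementally,
# so only one small chunk is ever held in memory.
total = 0
for chunk in pd.read_sql("SELECT * FROM employees", con=engine, chunksize=2):
    total += chunk["salary"].sum()

print(total)  # 312000
```

Incremental aggregation like this is the core trick behind chunked reads: you keep running totals (or write each transformed chunk straight back out with `to_sql(..., if_exists='append')`) instead of materializing the full result set.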