In this blog we will connect to a MySQL database, read its tables into pandas DataFrames, and add records to a MySQL table from a CSV file and from a pandas DataFrame.
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL. SQLAlchemy provides a full suite of well known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language.
Python needs a driver module to interface with a MySQL database, so install SQLAlchemy together with the MySQLdb bindings (the mysqlclient package):
sudo -E pip install sqlalchemy
sudo apt-get install python-mysqldb
sudo -E pip install mysqlclient
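After installing, a quick sanity check that the packages import cleanly; this is just a sketch and only prints the SQLAlchemy version:
import sqlalchemy
import MySQLdb  # provided by the mysqlclient package
print(sqlalchemy.__version__)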
import pandas as pd
from sqlalchemy import create_engine
username = "root"
password = "nutannutan"
port = 3306
database = "test"
I have several databases on my system; here I will read a MySQL database called test, but you can use any database you have. My test database contains one table, student.
The Engine is the starting point for any SQLAlchemy application. It’s “home base” for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database/DBAPI combination.
An Engine references both a Dialect and a Pool, which together interpret the DBAPI’s module functions as well as the behavior of the database.
Creating an engine is just a matter of issuing a single call, create_engine():
# check the MySQL connection URL
print('mysql+mysqldb://%s:%s@localhost:%i/%s'
      % (username, password, port, database))
engine = create_engine('mysql+mysqldb://%s:%s@localhost:%i/%s'
                       % (username, password, port, database))
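create_engine() does not actually connect until the engine is first used, so it is worth verifying the credentials right away. A minimal sketch, using SQLAlchemy's text() construct:
from sqlalchemy import text
# open a connection and run a trivial query to verify the credentials
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())  # should print 1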
pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None): Read SQL query into a DataFrame.
Returns a DataFrame corresponding to the result set of the query string. Optionally provide an index_col parameter to use one of the columns as the index, otherwise default integer index will be used.
Parameters
sql: str
SQL query to be executed.
con: SQLAlchemy connectable, str, or sqlite3 connection
Using SQLAlchemy makes it possible to use any DB supported by that library.
index_col: str or list of str, optional, default: None
Column(s) to set as index (MultiIndex).
coerce_float: bool, default True
Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point. Useful for SQL result sets.
params: list, tuple or dict, optional, default: None
List of parameters to pass to execute method. The syntax used to pass parameters is database driver dependent.
parse_dates: list or dict, default: None
1. List of column names to parse as dates.
2. Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.
3. Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.
chunksize: int, default None
If specified, return an iterator where chunksize is the number of rows to include in each chunk.
dtype: Type name or dict of columns
Data type for data or columns. E.g. np.float64 or {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’}
Returns
DataFrame or Iterator[DataFrame]
sql = "SELECT * FROM student;"
df = pd.read_sql_query(sql, engine).set_index('id')
df.head()
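For large tables, the chunksize parameter described above turns the result into an iterator of smaller DataFrames instead of one big frame. A minimal sketch (the chunk size of 5 is arbitrary):
# read the same query in chunks of 5 rows at a time
for chunk in pd.read_sql_query("SELECT * FROM student;", engine, chunksize=5):
    print(chunk.shape)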
I have created the student.csv file and added some student records.
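The file is expected to have the same columns as the student table (id, name, age, qualification); the two rows below are only illustrative, not the actual contents:
id,name,age,qualification
9,Asha,19,12
10,Kiran,22,BE
Read the file into a DataFrame with pandas.read_csv: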
inputFile = '/static/images/nutan/blog/python/student.csv'
students = pd.read_csv(inputFile).set_index('id')
print(students)
DataFrame.to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
Write records stored in a DataFrame to a SQL database.
Databases supported by SQLAlchemy are supported. Tables can be newly created, appended to, or overwritten.
Parameters
name: str
Name of SQL table.
con: sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that library.
schema: str, optional
Specify the schema (if database flavor supports this). If None, use default schema.
if_exists: {‘fail’, ‘replace’, ‘append’}, default ‘fail’
How to behave if the table already exists.
1. fail: Raise a ValueError.
2. replace: Drop the table before inserting new values.
3. append: Insert new values to the existing table.
index: bool, default True
Write DataFrame index as a column. Uses index_label as the column name in the table.
index_label: str or sequence, default None
Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
chunksize: int, optional
Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once.
dtype: dict or scalar, optional
Specifying the datatype for columns.
method: {None, ‘multi’, callable}, optional
Controls the SQL insertion clause used:
1. None : Uses standard SQL INSERT clause (one per row).
2. ‘multi’: Pass multiple values in a single INSERT clause.
3. callable with signature (pd_table, conn, keys, data_iter).
Details and a sample callable implementation can be found in the “insert method” section of the pandas documentation.
Raises
ValueError
When the table already exists and if_exists is ‘fail’ (the default).
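Before appending the CSV data, here is a hedged sketch of how the dtype, chunksize and if_exists options can be combined; the student_copy table name and the column types are assumptions for illustration only:
from sqlalchemy.types import Integer, String
# illustrative only: write to a hypothetical student_copy table with explicit types
students.to_sql(
    'student_copy',                               # hypothetical table name
    engine,
    if_exists='replace',                          # drop and recreate if it already exists
    index=True,                                   # write the 'id' index as a column
    chunksize=500,                                # insert 500 rows per batch
    dtype={'name': String(50), 'age': Integer()},
)
Now append the students read from the CSV to the existing student table: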
# index=True writes the 'id' index back as a column; with index=False the ids from the CSV would be dropped
students.to_sql('student', engine, if_exists='append', index=True)
Log in to the MySQL console and run the query "select * from student;" to confirm the rows were inserted.
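Something along these lines, using the root user and the test database from above:
mysql -u root -p
mysql> USE test;
mysql> SELECT * FROM student;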
sql = "SELECT * FROM student;"
df = pd.read_sql_query(sql, engine).set_index('id')
df
We can see the MySQL student table loaded into a pandas DataFrame. From here you can do whatever you like with it.
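For example (this assumes the age column used in the inserts below exists in your table):
# a couple of quick operations on the DataFrame
print(df['age'].mean())       # average student age
print(df[df['age'] > 20])     # students older than 20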
new_students = {
    'id': [11, 12],
    'name': ['Roy', 'Mishal'],
    'age': [20, 21],
    'qualification': ['12', 'BE'],
}
new_students
df1 = pd.DataFrame(new_students).set_index('id')
df1
# as with the CSV, index=True writes the explicit ids 11 and 12 from the index into the table
df1.to_sql('student', engine, if_exists='append', index=True)
sql = "SELECT * FROM student;"
df2 = pd.read_sql_query(sql, engine).set_index('id')
df2
import matplotlib.pyplot as plt
# bar chart of the numeric columns (age), indexed by student id
df2.plot(kind="bar")
plt.xlabel("ID")
plt.ylabel("Age")
plt.show()
That's it.