How to Import Data From a MySQL Database Into a Pandas DataFrame

In this blog we will connect to a MySQL database, read a table, and convert it into a pandas DataFrame. We will also add records to the MySQL table from a CSV file and from a pandas DataFrame.

Install SQLAlchemy and MySQLdb

SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL. SQLAlchemy provides a full suite of well known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language.

Python needs a driver module to talk to the MySQL server. Install the mysqlclient package (the maintained fork of MySQL-Python):

sudo -E pip install sqlalchemy

sudo apt-get install python-mysqldb

sudo -E pip install mysqlclient

Import modules

In [11]:
import pandas as pd

MySQL database details

In [12]:
username = "root"
password = "nutannutan"
port = 3306
database = "test"

I have several databases on my system; here I read one called test. Use whichever database you have. My test database contains one table, student.

Engine Configuration

The Engine is the starting point for any SQLAlchemy application. It’s “home base” for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database/DBAPI combination.

An Engine references both a Dialect and a Pool, which together interpret the DBAPI's module functions as well as the behavior of the database.

Creating an engine is just a matter of issuing a single call, create_engine():

from sqlalchemy import create_engine
In [13]:
# to check the mysql url
print('mysql+mysqldb://%s:%s@localhost:%i/%s'
      % (username, password, port, database))
In [14]:
engine = create_engine('mysql+mysqldb://%s:%s@localhost:%i/%s'
                       %(username, password, port, database))

Read database table using pandas module

pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None): Read SQL query into a DataFrame.

Returns a DataFrame corresponding to the result set of the query string. Optionally provide an index_col parameter to use one of the columns as the index, otherwise default integer index will be used.


sql: str
    SQL query to be executed.

con: SQLAlchemy connectable, str, or sqlite3 connection
    Using SQLAlchemy makes it possible to use any DB supported by that library.

index_col: str or list of str, optional, default: None
    Column(s) to set as index (MultiIndex).

coerce_float: bool, default True
    Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point. Useful for SQL result sets.

params: list, tuple or dict, optional, default: None
    List of parameters to pass to the execute method. The syntax used to pass parameters is database-driver dependent.

parse_dates: list or dict, default: None
    1. List of column names to parse as dates.
    2. Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.
    3. Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.

chunksize: int, default None
    If specified, return an iterator where chunksize is the number of rows to include in each chunk.

dtype: Type name or dict of columns
    Data type for data or columns. E.g. np.float64 or {'a': np.float64, 'b': np.int32, 'c': 'Int64'}

Returns: DataFrame or Iterator[DataFrame]
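The parameters above can be tried against a small table. A minimal sketch, using an in-memory SQLite database in place of MySQL (the pandas calls are identical with a SQLAlchemy engine); the table and column names mirror the student table used below:

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the MySQL database here; with a
# SQLAlchemy engine the pandas calls would be the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, "
    "age INTEGER, qualification TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "John", 25, "BE"), (2, "Nesha", 20, "BE"), (3, "Maria", 30, "BTech")],
)
conn.commit()

# index_col makes 'id' the DataFrame index; params fills the ? placeholder
# safely instead of formatting the query string by hand.
df = pd.read_sql_query(
    "SELECT * FROM student WHERE age >= ?", conn, index_col="id", params=(21,)
)
```

With params=(21,), only John (25) and Maria (30) come back, and the frame is indexed by id rather than the default integer index.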
In [15]:
sql = "SELECT * FROM student;"
df = pd.read_sql_query(sql, engine).set_index('id')
In [16]:
       name  age        qualification
id
1      John   25                   BE
2     Nesha   20                   BE
3     Maria   30                BTech
4     Suraj   35  Electrical Engineer
5    Martha   16                  XII

Insert data into student table

Create student data

I have created the student.csv file and added some student records.

Read csv file

In [18]:
inputFile = '/static/images/nutan/blog/python/student.csv'
students = pd.read_csv(inputFile).set_index('id')
       name  age        qualification
id
6     Heena   21             Graduate
7     Rinki   15                   10
8   Deepika   18                   12
9    Neesha   23  Electrical Engineer
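If you do not have the CSV file handy, the same read_csv plus set_index pattern can be tried on inline text. A small sketch (the rows are just two of the records above; io.StringIO makes the string file-like so read_csv can consume it):

```python
import io

import pandas as pd

# Inline CSV text standing in for the student.csv file.
csv_text = """id,name,age,qualification
6,Heena,21,Graduate
7,Rinki,15,10
"""

# set_index('id') moves the id column into the DataFrame index,
# matching the shape of the frames read from MySQL.
students = pd.read_csv(io.StringIO(csv_text)).set_index("id")
```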

Insert data into MySQL

DataFrame.to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)

Write records stored in a DataFrame to a SQL database.

Databases supported by SQLAlchemy are supported. Tables can be newly created, appended to, or overwritten.


    name: str
        Name of SQL table.

    con: sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
        Using SQLAlchemy makes it possible to use any DB supported by that library.

    schema: str, optional
        Specify the schema (if database flavor supports this). If None, use default schema.

    if_exists: {'fail', 'replace', 'append'}, default 'fail'
        How to behave if the table already exists.
        1. fail: Raise a ValueError.
        2. replace: Drop the table before inserting new values.
        3. append: Insert new values to the existing table.

    index: bool, default True
        Write DataFrame index as a column. Uses index_label as the column name in the table.

    index_label: str or sequence, default None
        Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

    chunksize: int, optional
        Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once.
    dtype: dict or scalar, optional
        Specifying the datatype for columns. 

    method: {None, 'multi', callable}, optional
        Controls the SQL insertion clause used:
        1. None : Uses standard SQL INSERT clause (one per row).
        2. ‘multi’: Pass multiple values in a single INSERT clause.
        3. callable with signature (pd_table, conn, keys, data_iter).

        Details and a sample callable implementation can be found in the section insert method.


    Raises: ValueError when the table already exists and if_exists is 'fail' (the default).
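The if_exists behaviors can be sketched with the same in-memory SQLite stand-in (pandas accepts a plain sqlite3 connection as con): 'fail' creates the table on the first write, and 'append' adds rows instead of raising ValueError:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# First write: the table does not exist yet, so if_exists='fail' simply
# creates it. index=True (the default) writes the 'id' index as a column.
first = pd.DataFrame({"id": [1, 2], "name": ["John", "Nesha"]}).set_index("id")
first.to_sql("student", conn, if_exists="fail")

# Second write: the table exists now, so 'append' inserts the new rows;
# repeating if_exists='fail' here would raise ValueError.
more = pd.DataFrame({"id": [3], "name": ["Maria"]}).set_index("id")
more.to_sql("student", conn, if_exists="append")

result = pd.read_sql_query("SELECT * FROM student ORDER BY id", conn)
```

In the blog's own cell below, index=False is passed instead, so the id index is not written and MySQL assigns the ids.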
In [19]:
students.to_sql('student', engine, if_exists='append', index=False)

View data in mysql after appending new students

Log in to the MySQL console and run the query "select * from student;".

Read SQL and convert into a pandas DataFrame

In [20]:
sql = "SELECT * FROM student;"
df = pd.read_sql_query(sql, engine).set_index('id')
In [22]:
        name  age        qualification
id
1       John   25                   BE
2      Nesha   20                   BE
3      Maria   30                BTech
4      Suraj   35  Electrical Engineer
5     Martha   16                  XII
6    Rishika   16                   II
7      Heena   21             Graduate
8      Rinki   15                   10
9    Deepika   18                   12
10    Neesha   23  Electrical Engineer

We can see the MySQL student table loaded into a pandas DataFrame. From here you can do whatever you want with the DataFrame.
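For instance, once the table is a DataFrame, the usual pandas operations apply. A small sketch on a frame shaped like the student table (the rows here are just a subset, for illustration):

```python
import pandas as pd

# A frame shaped like the student table, with 'id' as the index.
df = pd.DataFrame(
    {
        "name": ["John", "Nesha", "Maria"],
        "age": [25, 20, 30],
        "qualification": ["BE", "BE", "BTech"],
    },
    index=pd.Index([1, 2, 3], name="id"),
)

# Typical post-load analysis: boolean filtering and aggregation.
adults = df[df["age"] >= 21]
mean_age_by_qual = df.groupby("qualification")["age"].mean()
```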

Let us add some students using a pandas DataFrame

Create student dataframe

In [32]:
new_students = {
    'id': [11, 12],
    'name': ['Roy', 'Mishal'],
    'age': [20, 21],
    'qualification': ['12', 'BE'],
}
new_students
{'id': [11, 12],
 'name': ['Roy', 'Mishal'],
 'age': [20, 21],
 'qualification': ['12', 'BE']}

Convert new student dictionary into pandas dataframe

In [34]:
df1 = pd.DataFrame(new_students).set_index('id')
      name  age qualification
id
11     Roy   20            12
12  Mishal   21            BE

Insert into sql table

In [35]:
df1.to_sql('student', engine, if_exists='append', index=False)

View student records after adding new record

In [36]:
sql = "SELECT * FROM student;"
df2 = pd.read_sql_query(sql, engine).set_index('id')
In [37]:
name age qualification
1 John 25 BE
2 Nesha 20 BE
3 Maria 30 BTech
4 Suraj 35 Electrical Engineer
5 Martha 16 XII
6 Rishika 16 II
7 Heena 21 Graduate
8 Rinki 15 10
9 Deepika 18 12
10 Neesha 23 Electrical Engineer
11 Roy 20 12
12 Mishal 21 BE

Plot numeric columns

In [38]:
import matplotlib.pyplot as plt
In [42]:
df2.plot(kind="bar")

That's it.
