Working With MongoDB & Python Using PyMongo and Pandas

In this blog, we will connect MongoDB with Python with the help of PyMongo. Then we will create a database, a collection, and documents. After that, we will load the documents into a Pandas DataFrame.

Prerequisites

Install Python and pip
Install Jupyter notebook
Install MongoDB community server

What is MongoDB?

MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Server Side Public License (SSPL).

What is PyMongo?

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python. The pymongo package is a native Python driver for MongoDB.

Install PyMongo

PyMongo can be installed with pip:

pip install pymongo

Or with conda:

conda install -c anaconda pymongo

Connect to MongoDB

import pymongo

from pymongo import MongoClient

class pymongo.mongo_client.MongoClient(host='localhost', port=27017, document_class=dict, tz_aware=False, connect=True, **kwargs)

Client for a MongoDB instance, a replica set, or a set of mongoses. In other words, MongoClient is the entry point for connecting to MongoDB.

client = MongoClient()
client

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

MongoClient with the optional connection string given explicitly:

client = MongoClient("mongodb://localhost:27017")
client

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)
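Creating a MongoClient does not by itself prove that the server is reachable, because PyMongo connects in the background. A minimal sketch of a connectivity check follows; the helper name `server_version` is hypothetical, and it assumes a client like the one created above.

```python
def server_version(client):
    """Ping the server and return its version string.

    `client` is expected to be a pymongo.MongoClient. Both calls below
    raise a PyMongo error (once server selection times out) if no
    server can be reached.
    """
    client.admin.command("ping")            # cheap round-trip to the server
    return client.server_info()["version"]  # version reported by the server
```

With the client from above, `server_version(client)` returns the server's version string. Passing `serverSelectionTimeoutMS=2000` to `MongoClient` shortens the default 30-second wait when the server is down.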

Show databases

print(client.list_database_names())

['admin', 'config', 'local']

Create a new database

I am creating a new database called "sampledb". You can choose any database name you like.

db = client.sampledb
db

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'sampledb')

View the list of databases again after creating the new database

print(client.list_database_names())

['admin', 'config', 'local']

The new database is not listed yet. If the database doesn't exist, MongoDB creates it for you, but only when the first document is stored in it.
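This lazy creation is easy to check for yourself. The sketch below is only an illustration; `database_exists` is a hypothetical helper name, and it assumes a `client` like the one created earlier.

```python
def database_exists(client, name):
    """Return True if `name` appears in the server's list of databases."""
    return name in client.list_database_names()
```

Right after `db = client.sampledb`, `database_exists(client, "sampledb")` should be False; it should become True once the first document has been inserted.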

Create a collection

student_collection = db["students"]
student_collection

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'sampledb'), 'students')

Insert a document into the collection

An ObjectId is generated automatically for each document, so there is no need to add an _id field yourself.

student1 = { "name": "John", "age": 10, "class": "VI", "section": "A" }

result = student_collection.insert_one(student1)
result

<pymongo.results.InsertOneResult at 0x2223a301700>

print(f"Inserted new record id: {result.inserted_id}")

Inserted new record id: 623df5c9a2f230bd4ff5fe28
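One detail worth knowing: insert_one() adds the generated `_id` to the dictionary you pass in, so `student1` now carries an `_id` key, and inserting the same dictionary a second time raises a duplicate key error. A minimal sketch of one way around this, assuming you want to keep the original dictionary untouched (`insert_copy` is a hypothetical helper name):

```python
def insert_copy(collection, document):
    """Insert a shallow copy of `document` and return the generated _id."""
    to_insert = dict(document)                 # PyMongo adds "_id" to the dict it receives
    result = collection.insert_one(to_insert)  # the caller's dict is left unchanged
    return result.inserted_id
```

Calling `insert_copy(student_collection, student1)` repeatedly would store a new document each time, so use it only when duplicates are acceptable.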

Get the inserted record/document using find()

find(filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, batch_size=0, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None, session=None, allow_disk_use=None)

Query the database.

The filter argument is a prototype document that all results must match.

for student in student_collection.find():
    print(student)

{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'), 'name': 'John', 'age': 10, 'class': 'VI', 'section': 'A'}
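The filter does not have to be an exact match. As a sketch of a slightly richer query, the hypothetical helper below combines the `$gte` operator, a projection, and a sort; the field names are the ones used in this blog, and the function name and `min_age` parameter are assumptions made for illustration.

```python
def students_at_least(collection, min_age):
    """Return name and age of students aged `min_age` or older, oldest first."""
    query = {"age": {"$gte": min_age}}            # $gte: greater than or equal to
    projection = {"_id": 0, "name": 1, "age": 1}  # return only name and age
    cursor = collection.find(query, projection).sort("age", -1)  # -1 = descending
    return list(cursor)
```

At this point only John is stored, so `students_at_least(student_collection, 10)` should return just his name and age.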

Insert multiple documents

insert_many(documents, ordered=True, bypass_document_validation=False, session=None)

Insert an iterable of documents.

Create the list of students.

students = [
    { "name": "Maria", "age": 9, "class": "VI", "section": "B"},
    { "name": "Michel", "age": 11, "class": "VII", "section": "A"},
    { "name": "Priyanka", "age": 8, "class": "IV", "section": "B"},
    { "name": "Jeena", "age": 12, "class": "X", "section": "A" }
]

result = student_collection.insert_many(students)
result

<pymongo.results.InsertManyResult at 0x2223a301b80>
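The result object is more useful than its repr suggests: `inserted_ids` holds the generated ids in insertion order. A small sketch, assuming you also want to confirm how many documents the collection holds afterwards (`insert_and_count` is a hypothetical helper name):

```python
def insert_and_count(collection, documents):
    """Insert `documents` and return (number inserted, total in collection)."""
    result = collection.insert_many(documents)
    inserted = len(result.inserted_ids)     # one generated _id per document
    total = collection.count_documents({})  # an empty filter counts everything
    return inserted, total
```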

View all the inserted records

for student in student_collection.find():
    print(student)

{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'), 'name': 'John', 'age': 10, 'class': 'VI', 'section': 'A'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe29'), 'name': 'Maria', 'age': 9, 'class': 'VI', 'section': 'B'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe2a'), 'name': 'Michel', 'age': 11, 'class': 'VII', 'section': 'A'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe2b'), 'name': 'Priyanka', 'age': 8, 'class': 'IV', 'section': 'B'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe2c'), 'name': 'Jeena', 'age': 12, 'class': 'X', 'section': 'A'}

Find the first document in the student collection:

find_one(filter=None, *args, **kwargs)

Get a single document from the database.

student = student_collection.find_one()
student

{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'),
 'name': 'John',
 'age': 10,
 'class': 'VI',
 'section': 'A'}
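find_one() also accepts a filter, and it returns None when nothing matches, which is easy to overlook. The sketch below shows one way to handle that case; `find_student` is a hypothetical helper name, and raising `LookupError` is just one possible design choice.

```python
def find_student(collection, name):
    """Return the first student document with the given name."""
    student = collection.find_one({"name": name})
    if student is None:  # find_one() returns None when no document matches
        raise LookupError(f"no student named {name!r}")
    return student
```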

Convert the collection into a Pandas DataFrame

import pandas as pd

students = student_collection.find()
students

<pymongo.cursor.Cursor at 0x2223a2fd040>

Convert the students Cursor to a list

list_students = list(students)
list_students

[{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'),
  'name': 'John',
  'age': 10,
  'class': 'VI',
  'section': 'A'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe29'),
  'name': 'Maria',
  'age': 9,
  'class': 'VI',
  'section': 'B'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe2a'),
  'name': 'Michel',
  'age': 11,
  'class': 'VII',
  'section': 'A'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe2b'),
  'name': 'Priyanka',
  'age': 8,
  'class': 'IV',
  'section': 'B'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe2c'),
  'name': 'Jeena',
  'age': 12,
  'class': 'X',
  'section': 'A'}]

Convert the list to a Pandas DataFrame

df = pd.DataFrame(list_students)
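df

                        _id      name  age class section
0  623df5c9a2f230bd4ff5fe28      John   10    VI       A
1  623df5cca2f230bd4ff5fe29     Maria    9    VI       B
2  623df5cca2f230bd4ff5fe2a    Michel   11   VII       A
3  623df5cca2f230bd4ff5fe2b  Priyanka    8    IV       B
4  623df5cca2f230bd4ff5fe2c     Jeena   12     X       A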
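Two things are worth keeping in mind here. A cursor can only be iterated once, so call find() again if you need the documents a second time. Also, the `_id` column holds ObjectId objects, which can get in the way when exporting the DataFrame. A minimal sketch that converts the ids to plain strings first (`to_records` is a hypothetical helper name):

```python
def to_records(cursor):
    """Materialise a cursor into a list of dicts whose _id is a plain string."""
    # {**doc, ...} builds a new dict, so the original documents are not modified
    return [{**doc, "_id": str(doc["_id"])} for doc in cursor]
```

With the collection from above, `pd.DataFrame(to_records(student_collection.find()))` gives the same table with string ids.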

Convert a collection with a query into a Pandas DataFrame

students1 = student_collection.find({ "class": "VI" })
students1

<pymongo.cursor.Cursor at 0x2223c6625e0>

list_students1 = list(students1)
list_students1

[{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'),
  'name': 'John',
  'age': 10,
  'class': 'VI',
  'section': 'A'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe29'),
  'name': 'Maria',
  'age': 9,
  'class': 'VI',
  'section': 'B'}]

df1 = pd.DataFrame(list_students1)
df1

                        _id   name  age class section
0  623df5c9a2f230bd4ff5fe28   John   10    VI       A
1  623df5cca2f230bd4ff5fe29  Maria    9    VI       B
