MongoDB with Python (pymongo)
The recommended Python driver for MongoDB is pymongo.
- Documentation: https://pymongo.readthedocs.io/en/stable/
- Tutorial: https://pymongo.readthedocs.io/en/stable/tutorial.html
Installation
bash
python3 -m pip install pymongo

Connecting to the Database
python
from pymongo import MongoClient
client = MongoClient('localhost', 27017)

Selecting a Database
python
# Both are equivalent
db = client.testdb
db = client['testdb']

Selecting a Collection
python
# Both are equivalent
collection = db.trips
collection = db['trips']

Working with ObjectId
Every document in MongoDB has a unique _id field, which is an ObjectId by default. In pymongo, you need to import the ObjectId class from bson to query or work with these IDs.
python
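from bson import ObjectId

bson
The bson module is included with pymongo, so no separate installation is needed. Do not install the separate bson package from PyPI; it is a different project and is incompatible with pymongo.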
Finding a Document by ID
python
user = collection.find_one({"_id": ObjectId("63e362b9415552734e896049")})
print(user)

Getting the ID of an Inserted Document
When you insert a document, insert_one() returns a result object whose inserted_id attribute holds the generated ObjectId:
python
result = collection.insert_one({"username": "newuser", "role": "normal"})
print(result.inserted_id)  # prints the hex string, e.g. 665a1f2b3c4d5e6f7a8b9c0d

Converting ObjectId to String and Back
python
# ObjectId to string
id_string = str(user['_id'])
# String back to ObjectId
object_id = ObjectId(id_string)

This is useful when passing IDs through URLs or APIs, where they need to be plain strings.
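As a rough sketch of the API case, the helper below copies a document, replaces _id with its string form, and serializes the result with the standard json module. The name document_to_json is made up for illustration (it is not part of pymongo), and the sketch assumes _id is the only value in the document that json cannot encode by itself.

```python
import json


def document_to_json(doc):
    """Return a JSON string for a MongoDB document (hypothetical helper).

    Assumes "_id" is present and is the only value json cannot encode.
    """
    safe = dict(doc)                # shallow copy; the original stays untouched
    safe["_id"] = str(safe["_id"])  # ObjectId -> 24-character hex string
    return json.dumps(safe, ensure_ascii=False)
```

With a document returned by find_one(), document_to_json(user) gives a string that can be sent in an HTTP response. The receiver can pass the _id value back to ObjectId(...) to query for the same document.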
Referencing Other Documents
MongoDB queries have no JOIN clause like SQL, but you can reference documents in other collections by storing their ObjectId. This is similar to a foreign key in SQL, except that MongoDB does not enforce the reference.
Storing a Reference
When inserting a document, include the _id of the related document as a field:
python
# Insert a user
user_result = db.users.insert_one({"username": "john.doe", "role": "admin"})
# Insert an order that references the user
db.orders.insert_one({
    "product": "Laptop",
    "quantity": 1,
    "user_id": user_result.inserted_id  # stores the user's ObjectId
})

Looking Up a Referenced Document
To fetch the related document, query using the stored ObjectId:
python
# Find the order
order = db.orders.find_one({"product": "Laptop"})
# Use the stored reference to find the user
user = db.users.find_one({"_id": order["user_id"]})
print(user["username"])  # john.doe

Manual lookups
Unlike an SQL join, following a reference with find_one() takes a separate query per lookup. This is by design: MongoDB favours embedding related data inside a single document when possible. Use references when the related data is a separate entity (e.g. a user and an order).
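When several orders need their users, one way to keep the number of queries down is to collect the stored ids and fetch them together with $in. This is a minimal sketch, assuming the users and orders collections from the examples above. The helper name users_by_id is made up for illustration.

```python
def users_by_id(users_collection, orders):
    """Fetch every user referenced by the given orders with a single query.

    `users_collection` is assumed to be a pymongo collection such as db.users;
    `orders` is an iterable of order documents that each have a "user_id".
    """
    # Deduplicate the references so each user is requested only once.
    user_ids = list({order["user_id"] for order in orders})
    cursor = users_collection.find({"_id": {"$in": user_ids}})
    # Index the result by _id for quick access while looping over orders.
    return {user["_id"]: user for user in cursor}
```

A caller could then write users = users_by_id(db.users, orders) and read users[order["user_id"]]["username"] for each order, instead of issuing one find_one() per order.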
CRUD Operations
Insert
python
trip = {
    "departureStationName": "Karhupuisto",
    "returnStationName": "Sörnäinen (M)",
    "coveredDistance": 712,
    "duration": 224
}
trips = db.trips
inserted_id = trips.insert_one(trip).inserted_id

find() vs find_one()
find() returns a cursor — an iterable object that you loop over. find_one() returns a single dictionary, or None if no document matches.
python
# find() returns a cursor; iterate with a for loop
for trip in collection.find():
    print(trip['departureStationName'])
# find_one() returns one document or None
trip = collection.find_one({"departureStationName": "Karhupuisto"})
if trip:
    print(trip['returnStationName'])

WARNING
Always check for None when using find_one(), since accessing a key on None raises a TypeError.
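One way to keep that check in a single place is a small wrapper that returns a fallback value when nothing matches. This is only a sketch: return_station_for is a hypothetical helper, and collection is assumed to be a pymongo collection holding the trip documents shown above.

```python
def return_station_for(collection, departure_name, default=None):
    """Return the return-station name of one matching trip, or `default`."""
    trip = collection.find_one({"departureStationName": departure_name})
    if trip is None:  # no document matched the filter
        return default
    return trip["returnStationName"]
```

Calling return_station_for(collection, "Karhupuisto", default="unknown") then never raises for a missing station; it returns "unknown" instead.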
Filtering
Filters work the same way as in the MongoDB shell. Pass a dictionary as the first argument.
python
# Trips longer than 1000 meters
for trip in collection.find({"coveredDistance": {"$gt": 1000}}):
    print(trip['departureStationName'], trip['coveredDistance'])
# Trips from Karhupuisto or Hietalahdentori
for trip in collection.find({"departureStationName": {"$in": ["Karhupuisto", "Hietalahdentori"]}}):
    print(trip['departureStationName'])

Projection
Pass a second dictionary to find() or find_one() to select which fields to return.
python
# Return only station names and distance
for trip in collection.find({}, {"departureStationName": 1, "returnStationName": 1, "coveredDistance": 1, "_id": 0}):
    print(trip)
# find_one with projection
trip = collection.find_one({"departureStationName": "Karhupuisto"}, {"coveredDistance": 1, "_id": 0})
print(trip)  # {'coveredDistance': 712}

Sorting
Sorting in pymongo uses a different syntax than the MongoDB shell.
For a single field, pass the field name and direction:
python
# Sort by distance, longest first
for trip in collection.find().sort("coveredDistance", -1):
    print(trip['coveredDistance'])

For multiple fields, pass a list of tuples:
python
# Sort by departure station (A-Z), then by distance (longest first)
for trip in collection.find().sort([("departureStationName", 1), ("coveredDistance", -1)]):
    print(trip['departureStationName'], trip['coveredDistance'])

Shell vs pymongo
MongoDB shell: .sort({ coveredDistance: -1 })
pymongo: .sort("coveredDistance", -1)
These are not interchangeable across pymongo versions: older releases raise a TypeError when .sort() is given a dictionary, so use the field-name or list-of-tuples form.
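When a sort specification arrives in shell style (for example, copied from a shell query), it can be converted to the list-of-tuples form before calling .sort(). The function below is a hypothetical helper, not part of pymongo, and it assumes every direction is either 1 or -1.

```python
def sort_spec_from_shell(shell_sort):
    """Convert a shell-style sort document into a list of (field, direction).

    Assumes every direction is 1 (ascending) or -1 (descending).
    """
    spec = []
    for field, direction in shell_sort.items():  # dicts keep insertion order
        if direction not in (1, -1):
            raise ValueError(
                f"unsupported sort direction for {field!r}: {direction!r}"
            )
        spec.append((field, direction))
    return spec
```

The result can be passed straight to a cursor, as in collection.find().sort(sort_spec_from_shell({"departureStationName": 1, "coveredDistance": -1})).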
Limit and Skip
python
# First 10 trips
for trip in collection.find().limit(10):
    print(trip['departureStationName'])
# Page 2 with 10 trips per page
for trip in collection.find().skip(10).limit(10):
    print(trip['departureStationName'])

Date Queries
In pymongo, use Python's datetime module instead of ISODate().
python
from datetime import datetime
# Trips departing after 2025-07-31 00:00 (midnight at the start of July 31)
for trip in collection.find({"departure": {"$gt": datetime(2025, 7, 31)}}):
    print(trip['departureStationName'])
# Trips during July 2025: from July 1 up to, but not including, August 1
for trip in collection.find({
    "departure": {
        "$gte": datetime(2025, 7, 1),
        "$lt": datetime(2025, 8, 1)
    }
}):
    print(trip['departureStationName'])

INFO
pymongo automatically converts Python datetime objects to MongoDB ISODate values and vice versa.
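Month ranges are easy to get slightly wrong at the boundaries, so it can help to build the filter in one place. The sketch below uses a half-open range ($gte the first instant of the month, $lt the first instant of the next month), which covers the whole month regardless of sub-second precision. The function name month_range_filter is an assumption for illustration only.

```python
from datetime import datetime


def month_range_filter(field, year, month):
    """Build a filter matching `field` values within one calendar month."""
    start = datetime(year, month, 1)
    # First instant of the following month; December rolls over to January.
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    return {field: {"$gte": start, "$lt": end}}
```

The returned dictionary is an ordinary filter, so collection.find(month_range_filter("departure", 2025, 7)) selects the trips that departed in July 2025.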
Update
python
# Named query_filter to avoid shadowing Python's built-in filter()
query_filter = {"departureStationName": "Sörnäinen (M)"}
new_values = {"$set": {"departureStationName": "Sörnäinen"}}
collection.update_one(query_filter, new_values)

Updating Array Elements
Use the positional $ operator to update a specific element inside an array. The filter must match both the document and the array element.
python
# Update a specific role's access level in a user's roles array
db.users.update_one(
    {"username": "john.doe", "roles.name": "editor"},
    {"$set": {"roles.$.accessLevel": 2}}
)

Delete
python
# Delete one trip
collection.delete_one({"departureStationName": "Karhupuisto"})
# Delete trips shorter than 500 meters — returns the count of deleted documents
deleted_count = collection.delete_many({"coveredDistance": {"$lt": 500}}).deleted_count
# Delete all documents
collection.delete_many({})

Aggregation
The aggregation pipeline works the same way in pymongo as in the MongoDB shell. Pass a list of stages to the aggregate() method.
python
# Count trips per departure station
results = collection.aggregate([
    {"$group": {"_id": "$departureStationName", "count": {"$sum": 1}}}
])
for doc in results:
    print(doc["_id"], doc["count"])

Multi-Stage Pipeline
python
# Average distance of trips from Karhupuisto
results = collection.aggregate([
    {"$match": {"departureStationName": "Karhupuisto"}},
    {"$group": {"_id": None, "avgDistance": {"$avg": "$coveredDistance"}}}
])
for doc in results:
    print("Average distance:", doc["avgDistance"])

Aggregation with Sorting and Limiting
python
# Top 3 longest trips
results = db.trips.aggregate([
    {"$sort": {"coveredDistance": -1}},
    {"$limit": 3},
    {"$project": {"departureStationName": 1, "returnStationName": 1, "coveredDistance": 1, "_id": 0}}
])
for doc in results:
    print(doc["departureStationName"], "->", doc["returnStationName"], doc["coveredDistance"], "m")

TIP
The aggregate() method returns a cursor, just like find(). Iterate over it with a for loop or convert it to a list with list(results).
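Because a pipeline is just a list of dictionaries, it can be built by an ordinary function and reused with different parameters. The sketch below combines the stages used in this section to rank departure stations by trip count. The function name busiest_stations_pipeline and its limit parameter are assumptions for illustration.

```python
def busiest_stations_pipeline(limit=3):
    """Return a pipeline that counts trips per departure station, busiest first."""
    return [
        # One output document per station, with the number of trips it started.
        {"$group": {"_id": "$departureStationName", "count": {"$sum": 1}}},
        # Highest counts first.
        {"$sort": {"count": -1}},
        # Keep only the top `limit` stations.
        {"$limit": limit},
    ]
```

With a collection of trips, list(collection.aggregate(busiest_stations_pipeline(5))) materializes the cursor into a list of up to five documents of the form {"_id": station name, "count": number of trips}.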
Importing CSV Files with Pandas
You can use pandas to read a CSV file and insert all rows into a MongoDB collection. Each row becomes a document.
Installation
bash
python3 -m pip install pandas

Example
Given a CSV file trips.csv:
Departure,Return,Departure station name,Return station name,Covered distance (m),Duration (sec.)
2025-07-31T23:59:58,2025-08-01T00:08:37,Isoisänsilta,Vilhonvuorenkatu,1002,516
2025-07-31T23:59:54,2025-08-01T00:03:42,Karhupuisto,Sörnäinen (M),712,224
2025-07-31T23:59:53,2025-08-01T00:05:21,Hietalahdentori,Apollonkatu,1805,323

python
import pandas as pd
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client.bikedata
collection = db.trips
# Read the CSV file into a DataFrame
df = pd.read_csv("trips.csv")
# Convert the DataFrame to a list of dictionaries and insert into MongoDB
collection.insert_many(df.to_dict(orient="records"))

The to_dict(orient="records") method converts each row into a dictionary:
python
[
    {"Departure": "2025-07-31T23:59:58", "Return": "2025-08-01T00:08:37", "Departure station name": "Isoisänsilta", "Return station name": "Vilhonvuorenkatu", "Covered distance (m)": 1002, "Duration (sec.)": 516},
    {"Departure": "2025-07-31T23:59:54", ...},
    {"Departure": "2025-07-31T23:59:53", ...}
]

Column names
The field names in MongoDB will match the CSV column headers exactly, including spaces and special characters. You can rename columns in pandas before inserting with df = df.rename(columns={"Covered distance (m)": "coveredDistance"}). rename() returns a new DataFrame, so assign the result.
Clearing old data first
If you want to replace the collection contents each time, call collection.delete_many({}) before inserting.
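The pandas import above stores Departure and Return as plain strings, so the datetime filters from the Date Queries section would not match those documents. As an alternative sketch, the standard csv module can rename the headers and parse the timestamps before inserting. The FIELD_NAMES mapping and the rows_to_documents helper are assumptions for illustration, and the code assumes every row has valid integer and ISO-format timestamp values, as in the sample file.

```python
import csv
from datetime import datetime

# Assumed mapping from CSV headers to the camelCase field names used earlier.
FIELD_NAMES = {
    "Departure": "departure",
    "Return": "return",
    "Departure station name": "departureStationName",
    "Return station name": "returnStationName",
    "Covered distance (m)": "coveredDistance",
    "Duration (sec.)": "duration",
}


def rows_to_documents(lines):
    """Turn CSV lines (an open file or any iterable of strings) into documents."""
    documents = []
    for row in csv.DictReader(lines):
        doc = {FIELD_NAMES[header]: value for header, value in row.items()}
        # Parse timestamps so date filters compare datetimes, not strings.
        doc["departure"] = datetime.fromisoformat(doc["departure"])
        doc["return"] = datetime.fromisoformat(doc["return"])
        # The csv module yields strings; convert the numeric columns.
        doc["coveredDistance"] = int(doc["coveredDistance"])
        doc["duration"] = int(doc["duration"])
        documents.append(doc)
    return documents


# Possible usage with the collection from the example above:
# with open("trips.csv", newline="", encoding="utf-8") as f:
#     collection.insert_many(rows_to_documents(f))
```

Documents inserted this way use the same field names as the CRUD examples, and the departure field is stored as a date, so filters such as {"departure": {"$gt": datetime(2025, 7, 31)}} apply to them.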
Full Example
python
from pymongo import MongoClient
from bson import ObjectId
client = MongoClient("mongodb://localhost:27017/")
db = client.bikedata
collection = db.trips
# Insert a new trip
result = collection.insert_one({
    "departureStationName": "Karhupuisto",
    "returnStationName": "Sörnäinen (M)",
    "coveredDistance": 712,
    "duration": 224
})
print("Inserted ID:", result.inserted_id)
# Find the trip by its ObjectId
trip = collection.find_one({"_id": result.inserted_id})
print("Found trip:", trip['departureStationName'], "->", trip['returnStationName'])
# List all trips
for trip in collection.find():
    print(trip['departureStationName'], "->", trip['returnStationName'], trip['coveredDistance'], "m")