Dynamo DB Python Boto3

How to use Python to work with Amazon Dynamo DB

In this tutorial, we will use the Boto3 module in Python to work with Amazon's NoSQL database, DynamoDB. The tutorial also covers setting up a local instance of DynamoDB.


NoSQL Databases

NoSQL databases are used to solve challenges faced by an RDBMS (Relational Database Management System), or simply put, a relational database. Some drawbacks of an RDBMS are listed below:

  • A schema has to be defined beforehand
  • The data to be stored has to be structured
  • It is difficult to change tables and relationships

On the other hand, NoSQL databases can handle unstructured data and do not need a schema to be defined.

In this tutorial, we will be working with Amazon DynamoDB, a NoSQL database that supports both key-value and document data models.


Table of Contents

  1. Pre-requisites
  2. Setting up Dynamo DB Locally
  3. Connecting to our DB using Python
  4. Create Table
  5. Insert Data
  6. Get Data
  7. Update Data
  8. Delete Data
  9. Query
  10. Conclusion
  11. Resources

Pre-requisites

  1. Basic Understanding of NoSQL Databases
  2. Experience with Python

Setting up DynamoDB Locally

Step 1

Download and Install Java SE. To run DynamoDB on your computer, you must have the Java Runtime Environment (JRE) version 8.x or newer. The application doesn’t run on earlier JRE versions.

Step 2

Download and Install AWS CLI Installer. Type the following command in the command prompt to verify the installation.

aws --version 

If you get an error, you might have to add a Path variable. Look at this article for more information

Step 3

Download and Extract Amazon Dynamo DB

Step 4

Navigate to the folder where you extracted Dynamo DB and type the following command in a command prompt.

java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb

Do not close this terminal until you are done working with the database.

Step 5

Configure credentials. Type the following command in a new command prompt

aws configure

Step 6

Type the following command

aws dynamodb list-tables --endpoint-url http://localhost:8000

This should return an empty list of tables unless you already have existing tables.

Alternatively, you can also set up Amazon DynamoDB as a web service.


Connecting to our DB using Python

Before we start, we will need to set up and activate a virtual environment

# Install virtualenv
pip install virtualenv
# Create a virtual environment
python -m virtualenv venv
# If the above doesn't work, try the built-in venv module
python -m venv venv
# Activate the virtual environment (Windows)
venv/Scripts/activate
# Activate the virtual environment (macOS/Linux)
source venv/bin/activate

We will use the boto3 module to interact with the local instance of Dynamo DB.

pip install boto3

Next, we will need to import the library and create a database object

import boto3

We will be creating a class and adding the CRUD operations as its methods.

class dtable:
    db = None
    tableName = None
    table = None
    table_created = False

    def __init__(self):
        # Connect to the local DynamoDB instance started earlier
        self.db = boto3.resource('dynamodb',
                                 endpoint_url="http://localhost:8000")
        print("Initialized")

Test your code by creating an instance of our class

if __name__ == '__main__':
    movies = dtable()

We will be using the instance of the class dtable we just created later on in the article.


Create Table

In DynamoDB, a table can have two types of primary keys: A single partition key or a composite primary key (partition key + sort key).

We will create a table called Movies. The year of the movie will be the partition key and the title will be the sort key. Below is the format to declare a key schema. Store it in a variable called primaryKey.

primaryKey = [
    {
        'AttributeName': 'year',
        'KeyType': 'HASH'  # Partition key
    },
    {
        'AttributeName': 'title',
        'KeyType': 'RANGE'  # Sort key
    }
]

We will also need to declare the data types of the above attributes.

attributeDataType = [
    {
        'AttributeName': 'year',
        'AttributeType': 'N'  # Number
    },
    {
        'AttributeName': 'title',
        'AttributeType': 'S'  # String
    },
]

We will also need to set the provisioned throughput, which limits the number of reads and writes per second on our table.

provisionedThroughput = {
    'ReadCapacityUnits': 10,
    'WriteCapacityUnits': 10
}

All the required parameters to create a table have now been defined. We can move on to using these parameters to create the table.

# Method of the dtable class
def createTable(self, tableName, KeySchema, AttributeDefinitions, ProvisionedThroughput):
    self.tableName = tableName
    table = self.db.create_table(
        TableName=tableName,
        KeySchema=KeySchema,
        AttributeDefinitions=AttributeDefinitions,
        ProvisionedThroughput=ProvisionedThroughput
    )
    self.table = table
    print(f'Created Table {self.table}')

The above method and our previously defined variables will be used to create the table.

movies.createTable(
    tableName="Movies",
    KeySchema=primaryKey,
    AttributeDefinitions=attributeDataType,
    ProvisionedThroughput=provisionedThroughput)

Insert Data

The format of the data to be inserted is below

{
'year' : 2020,
'title' : 'Some Title',
'info' : {
'key1' : 'value1',
'key2' : 'value2',
}
}

For each item, apart from the primary key (year and title), we are free to choose what goes inside 'info'. The data inside info doesn't need to follow a fixed structure.

Before inserting data, we will create a JSON file with a few movies. You can find the JSON file in my GitHub repo.

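# Method of the dtable class (requires "import json" at the top of the file)
def insert_data(self, path):
    with open(path) as f:
        data = json.load(f)
    for item in data:
        try:
            self.table.put_item(Item=item)
        except Exception as e:
            # Report the failure instead of silently skipping the item
            print(f'Could not insert {item}: {e}')
    print(f'Inserted Data into {self.tableName}')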
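
Two details are worth knowing when loading JSON into DynamoDB through boto3. Python floats are rejected by the resource API, so numbers with a decimal point need to be Decimal values, and batch_writer() can group many puts into fewer requests. The sketch below is one way to combine both. load_movies and insert_batch are hypothetical helpers, and table is assumed to be a boto3 Table object such as movies.table.

```python
import json
from decimal import Decimal


def load_movies(json_text):
    """Parse JSON, turning numbers with a decimal point into Decimal."""
    return json.loads(json_text, parse_float=Decimal)


def insert_batch(table, items):
    """Write all items through the table's batch writer; return the count written."""
    count = 0
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
            count += 1
    return count
```

Integers such as the year are left as plain int values, which boto3 accepts as they are.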

Get Item

We can access an item in the database if we know its primary key. In our case, it is year + title. We will try to access the item with the year 2020 and the title 'Title 1'.

Below is the method of our class which returns the item from the table

def getItem(self, key):
    try:
        response = self.table.get_item(Key=key)
        # 'Item' is missing from the response when no item matches the key
        return response['Item']
    except Exception as e:
        print('Item not found')
        return None

Note: the K in the Key parameter of the get_item function is uppercase.

This is how we would invoke the function

print(movies.getItem(key = {'year' : 2020 , 'title': 'Title 1'}))
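
get_item can also return only selected attributes through ProjectionExpression. Because year is a DynamoDB reserved word, it cannot appear directly in an expression and is usually aliased through ExpressionAttributeNames. A minimal sketch, where get_movie_summary is a hypothetical helper and table is assumed to be a boto3 Table object:

```python
def get_movie_summary(table, year, title):
    """Fetch only the year and title of one movie, or None if it is absent."""
    response = table.get_item(
        Key={"year": year, "title": title},
        ProjectionExpression="#yr, title",
        ExpressionAttributeNames={"#yr": "year"},  # alias for the reserved word
    )
    return response.get("Item")
```

Using response.get("Item") returns None for a missing item without relying on an exception.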

Before we move on to Update and Delete, it will be beneficial to familiarize yourself with a couple of expression parameters that can be passed to the update and delete functions.

The two expressions are UpdateExpression and ConditionExpression.

Below is an example of an UpdateExpression

UpdateExpression="set info.rating=:rating, info.Info=:info"

:rating and :info are the values we want to use while updating. They can be thought of as placeholders.

We will also need to pass an extra parameter ExpressionAttributeValues to pass values to these variables

ExpressionAttributeValues={
    # boto3 rejects Python floats; use Decimal ("from decimal import Decimal")
    ':rating': Decimal('5.0'),
    ':info': 'Updated Information'
}

In a way, this is similar to the format() function in Python
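
To make the placeholder idea concrete, the following sketch builds an UpdateExpression and its matching ExpressionAttributeValues from a plain dictionary. build_set_expression is a hypothetical helper, not part of boto3, and it assumes the attribute paths contain no reserved words.

```python
def build_set_expression(updates):
    """Turn {'path': value} into a SET expression plus its placeholder values."""
    clauses = []
    values = {}
    for index, (path, value) in enumerate(updates.items()):
        placeholder = f":v{index}"
        clauses.append(f"{path} = {placeholder}")
        values[placeholder] = value
    return "SET " + ", ".join(clauses), values


expression, values = build_set_expression(
    {"info.rating": 5, "info.Info": "Updated Information"}
)
print(expression)  # SET info.rating = :v0, info.Info = :v1
print(values)      # {':v0': 5, ':v1': 'Updated Information'}
```

The two results can then be passed as UpdateExpression and ExpressionAttributeValues respectively.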

You can find a list of common Update Operations (Add, Modify, Delete) over here

ConditionExpression is similar to a WHERE clause in SQL. If it evaluates to true, the command is executed; otherwise DynamoDB rejects the command with a ConditionalCheckFailedException and the item is left unchanged.

An example is below

ConditionExpression="info.Producer = :producer",
ExpressionAttributeValues={
    ':producer': 'Kevin Feige'
}

The ConditionExpression also follows the same format as the UpdateExpression

ConditionExpression can be used for conditional updates and conditional deletes. We will discuss them below.

You can find a list of Condition Expressions over here


Update

Below is the update method of our class

def updateItem(self, key, updateExpression, conditionExpression, expressionAttributes):
    try:
        response = self.table.update_item(
            Key=key,
            UpdateExpression=updateExpression,
            ConditionExpression=conditionExpression,
            # The boto3 parameter name is ExpressionAttributeValues
            ExpressionAttributeValues=expressionAttributes
        )
        return response
    except Exception as e:
        print(e)
        return None

We will update the movie produced by Kevin Feige. We will update the Info, add a rating of 5, and append a genre of 'Legendary' to the list of genres.

upExp = "SET info.Info = :info , info.rating = :rating, info.Genre = list_append(info.Genre, :genre)"
condExp = "info.Producer = :producer"
expAttr = {
  ":info" : "Updated Information",
  ":rating" : 5,
  ":genre" : ["Legendary"],
  ":producer" : "Kevin Feige"
}
print("After Update")
movies.updateItem({'year' : 2019 , 'title': 'Title 3'},upExp,condExp,expAttr)
print(movies.getItem(key = {'year' : 2019 , 'title': 'Title 3'}))
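
list_append fails if info.Genre does not exist on the item yet. A common workaround is to wrap the attribute in if_not_exists with an empty list as the fallback. The sketch below assumes table is a boto3 Table object and that the item already has an info map; append_genre is a hypothetical helper.

```python
def append_genre(table, year, title, genre):
    """Append one genre, creating the list first if the item has none."""
    response = table.update_item(
        Key={"year": year, "title": title},
        UpdateExpression=(
            "SET info.Genre = list_append(if_not_exists(info.Genre, :empty), :genre)"
        ),
        ExpressionAttributeValues={":empty": [], ":genre": [genre]},
        ReturnValues="UPDATED_NEW",  # ask DynamoDB to return the changed attributes
    )
    return response.get("Attributes")
```

With ReturnValues set to UPDATED_NEW, the response carries the updated attributes, so a follow-up getItem call is not needed just to inspect the change.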

Delete

The Delete operation is similar to the Update operation. Below is our method to delete an item. It accepts a Key, the condition Expression and Expression Attribute Values.

def deleteItem(self, key, conditionExpression, expressionAttributes):
    try:
        response = self.table.delete_item(
            Key=key,
            ConditionExpression=conditionExpression,
            ExpressionAttributeValues=expressionAttributes
        )
    except Exception as e:
        print(e)

We will delete the movie with producer = “ABC”

print("Before Delete")
print(movies.getItem(key = {'title':'Title 2' , 'year': 2019}))
print("After Delete")
condExp = "info.Producer = :producer"
expAttr = {':producer' : "ABC" }
movies.deleteItem({'title':'Title 2' , 'year': 2019},condExp,expAttr)
print(movies.getItem(key = {'title':'Title 2' , 'year': 2019}))
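
By default, delete_item returns nothing about the removed item. Passing ReturnValues="ALL_OLD" asks DynamoDB to include the item as it was before deletion, which can replace a separate lookup. A sketch, with delete_and_return as a hypothetical helper and table assumed to be a boto3 Table object:

```python
def delete_and_return(table, key):
    """Delete an item and return its previous contents, or None if it did not exist."""
    response = table.delete_item(Key=key, ReturnValues="ALL_OLD")
    return response.get("Attributes")
```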

Query

We can query the table using the partition key we provided while creating our table. In our case, it was the year. A partition key is required for the query operation; the sort key is optional.

We will be using the Key class. You can read more about it over here.

Import the Key Class

from boto3.dynamodb.conditions import Key

Below is the method for the query

# expressionAttributes is accepted for symmetry with the other methods but is not used here
def query(self, projectExpression, expressionAttributes, keyExpression):
    try:
        response = self.table.query(
            ProjectionExpression=projectExpression,
            KeyConditionExpression=keyExpression,
        )
        return response['Items']
    except Exception as e:
        print(e)
        return None

The parameter ProjectionExpression is a string listing the attributes we want the function to return. KeyConditionExpression is the key condition built with the Key class. The partition key must be present in the KeyConditionExpression. Additionally, you can pass a FilterExpression parameter, which is similar to ConditionExpression.

We will display the titles of all movies from 2020 whose title starts with 'M'.

print("Movies from 2020 with title starting with M")
projection = "title"
Keycondition = Key('year').eq(2020) & Key('title').begins_with('M')
print(movies.query(projection, expAttr, Keycondition))
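
A single query call returns at most 1 MB of data. When more items match, the response carries a LastEvaluatedKey that must be passed back as ExclusiveStartKey to fetch the next page. The loop below is a sketch of that pattern; query_all is a hypothetical helper and table is assumed to be a boto3 Table object.

```python
def query_all(table, key_condition, **extra):
    """Collect every item matching the key condition, following pagination."""
    items = []
    kwargs = {"KeyConditionExpression": key_condition, **extra}
    while True:
        response = table.query(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no more pages
        kwargs["ExclusiveStartKey"] = last_key  # resume after the last item seen
```

Extra keyword arguments such as ProjectionExpression are forwarded unchanged to every page request.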

Conclusion

I hope this article was able to help you out. If any of the code snippets above do not work, please refer to my GitHub repo mentioned in the Resources section, and do let me know if you find any errors 🙂

Happy Learning!

Resources

GitHub Repo

rahulbanerjee26/DynamoDB_Boto3