Thursday, March 11, 2021

How to Set Timeout and Max Retries in Python Request Module

Introduction

In this tutorial we will use the Python requests module with max retries and timeout values. It is strongly recommended to implement a retry and timeout mechanism in production code, because connections can close at any time, even in non-error conditions. HTTP applications have to be ready to handle unexpected closes properly. If a transport connection closes while the client is performing a transaction, the client should reopen the connection and retry once, unless the transaction has side effects.

1. Set Max Retries Values

We will create a Python requests session and define a retry strategy for it. Our retry strategy will consist of connect retries, read retries, a backoff factor (the delay between attempts after the second try), and a status_forcelist (the list of status codes we want to retry on).

Use the following code to create a requests session and set up the retry strategy.

# import required modules
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# define retry values. You can update them according to your requirements.
MAX_RETRIES = 3
BACKOFF = 1
STATUS_FORCELIST = [
    413,  # Payload Too Large
    429,  # Too Many Requests
    500,  # Internal Server Error
    502,  # Bad Gateway
    503,  # Service Unavailable
    504,  # Gateway Timeout
]

# set both connect and read timeouts. (A single value can represent both.)
REQUEST_TIMEOUT = (3.05, 10)  # (connect, read)

# create a requests session
session = requests.Session()

# set some default values in the session headers. These values will be merged
# with the request headers if we send additional headers at runtime.
session.headers.update({'Connection': 'close'})

# define the retry strategy and set the required values
retry = Retry(
    total=MAX_RETRIES,  # total number of retries to allow; takes precedence over the other counts
    read=MAX_RETRIES,  # how many times to retry on read errors
    connect=MAX_RETRIES,  # how many connection-related errors to retry on
    backoff_factor=BACKOFF,  # a backoff factor to apply between attempts after the second try
    status_forcelist=STATUS_FORCELIST,
)

adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

At this point, we have a session ready with the required retry count, backoff factor, and status forcelist. We will use this session to send API requests.
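Note that headers passed to an individual request are merged with the session defaults we set above. A small sketch to illustrate (the URL here is a hypothetical placeholder):

# per-request headers are merged with the session default headers;
# 'https://api.example.com/items' is a hypothetical URL used for illustration
response = session.get(
    'https://api.example.com/items',
    headers={'Accept': 'application/json'},  # sent along with 'Connection: close' from the session
    timeout=REQUEST_TIMEOUT,
)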

2. Set Timeout Value

Use the following code to send an API request with the request timeout value we set above:

response = session.get('https://httpbin.org/delay/10', timeout=REQUEST_TIMEOUT)

This request will retry on the status codes in the forcelist above before raising an exception.
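Retries can still be exhausted, or the server can fail to respond in time, so in production it is worth wrapping the call in a try/except block. A minimal sketch, assuming the session and REQUEST_TIMEOUT defined above:

import requests

try:
    response = session.get('https://httpbin.org/delay/10', timeout=REQUEST_TIMEOUT)
    response.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
except (requests.exceptions.ConnectionError,
        requests.exceptions.RetryError,
        requests.exceptions.Timeout) as exc:
    # retries were exhausted or the server did not respond within the timeout
    print(f'Request failed: {exc}')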

Conclusion

We have configured both the timeout and max retries of the Python requests module. Make sure to set an appropriate timeout value in production: the default timeout is None, which means a request never expires and can cause your production code to hang for too long.

Wednesday, March 10, 2021

How to Create and Upload an In Memory CSV File to Amazon S3 Bucket using Python

Introduction

In this tutorial we will create an in-memory CSV file and upload it to an Amazon S3 bucket using the Python package boto3. We will cover two scenarios: 1) create an in-memory file from nested lists (a list of lists), and 2) create an in-memory file from a list of dictionaries, and then upload it to Amazon S3.

1. Install dependencies

We need to install the required dependencies in order to complete this tutorial. Install the following package in your OS directly, or first create a virtual environment, activate it, and then install the package in that virtual environment.

Run the following command to create a virtual environment (on Windows):

virtualenv venv  # venv is the name of the virtual environment

Activate the virtual environment:

.\venv\Scripts\activate.bat
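On Linux or macOS, activate the virtual environment with:

source venv/bin/activate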

Install the following package(s) in the virtual environment:

pip install boto3

2. Create In memory File

Create a new file with the extension .py and import the following modules:

import csv
from io import StringIO

import boto3

Create a list of lists as follows; the first nested list represents the header of the file, and the other nested lists represent the data rows.

list_of_lists = [['name', 'age'], ['name 1', 25], ['name 2', 26], ['name 3', 27]]

If we have a list of dictionaries, we can first convert it to a list of lists and then proceed with the next steps. Use the following code to convert a list of dictionaries to a list of lists.

# input list
list_of_dicts = [{'name': 'name 1', 'age': 25}, {'name': 'name 2', 'age': 26}, {'name': 'name 3', 'age': 27}]

# convert the list of dictionaries to a list of lists
header = list(list_of_dicts[0].keys())
file_data = [[d[key] for key in header] for d in list_of_dicts]
file_data = [header] + file_data

At this point, we have converted the list of dictionaries to a list of lists. Use the following code to create an in-memory file and write the data to it.

# create an in-memory file and write the data to it
file_to_save = StringIO()
csv.writer(file_to_save).writerows(file_data)
file_to_save = bytes(file_to_save.getvalue(), encoding='utf-8')
file_name_on_s3 = 'my_data.csv'
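Alternatively, if you start from a list of dictionaries, you can skip the manual conversion and write the rows directly with csv.DictWriter. A minimal sketch, assuming the list_of_dicts defined above:

# write the dictionaries directly, without converting to a list of lists first
file_to_save = StringIO()
writer = csv.DictWriter(file_to_save, fieldnames=list(list_of_dicts[0].keys()))
writer.writeheader()  # writes the header row from fieldnames
writer.writerows(list_of_dicts)  # writes one row per dictionary
file_to_save = bytes(file_to_save.getvalue(), encoding='utf-8')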

3. Save In Memory File to Amazon S3

We have created an in-memory file. Now use the following code to save/upload that in-memory file to Amazon S3.

# create a boto3 client using your AWS access key id and secret access key
client = boto3.client(
    's3',
    aws_access_key_id='your access key',
    aws_secret_access_key='your secret key',
)

# save the in-memory file to S3
response = client.put_object(
    Body=file_to_save,
    Bucket='your bucket name',
    Key=file_name_on_s3,
)
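put_object returns a dictionary of response metadata, so a quick way to confirm the upload succeeded is to check the HTTP status code it reports:

# confirm the upload succeeded (200 means OK)
status_code = response['ResponseMetadata']['HTTPStatusCode']
if status_code == 200:
    print(f'Uploaded {file_name_on_s3} successfully')
else:
    print(f'Upload may have failed with status {status_code}')

If your AWS credentials are configured via environment variables or the shared credentials file, you can also create the client with just boto3.client('s3') instead of passing the keys explicitly.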

Conclusion

We have created an in-memory file from a list of lists and/or a list of dictionaries and uploaded it to an Amazon S3 bucket. Please let me know in the comments if you have a better approach to implementing this.

HAPPY CODING!!