looking for tips handling large file uploads (100MB < file < 1GB)
Hi there,
We're writing a Flask backend. One of the features is that users should be able to upload big files; 1+ GB is not uncommon.
I was wondering whether you have any tips for handling these properly?
*First question*: as we've implemented it now, it eats all the memory away. Basically, memory usage grows in step with how much of the file has been uploaded so far.
I think we're doing everything by the book (we followed the guide at http://flask.pocoo.org/docs/0.11/patterns/fileuploads/).
Here is how the service is implemented:
def store_upload(project_id):
    file = request.files['file']
    # boilerplate checks ...
    file.save(os.path.join(destinationbase, destination))  # save file to a decent location
    return _response_stuff()
From what I've seen, memory usage increases even before execution reaches "file = request.files['file']". How can I avoid this and make sure Flask saves each chunk of data to disk rather than keeping it in memory?
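One thing worth checking first: Werkzeug's form parser already spools multipart uploads past roughly 500 KB to a temporary file, so the growth you're seeing may come from the WSGI layer buffering the whole request rather than from Flask itself. If you control the client, a minimal sketch that sidesteps form parsing entirely is to send the file as the raw request body and stream it to disk in chunks (the route and paths here are hypothetical, not from the original post):

import os
from flask import Flask, request

app = Flask(__name__)
CHUNK_SIZE = 64 * 1024  # read 64 KiB of the body at a time

@app.route('/projects/<project_id>/upload', methods=['POST'])
def store_upload(project_id):
    # request.stream is the raw WSGI input; reading it in chunks keeps
    # memory usage flat regardless of the file size
    destination = os.path.join('/var/uploads', project_id)
    with open(destination, 'wb') as out:
        while True:
            chunk = request.stream.read(CHUNK_SIZE)
            if not chunk:
                break
            out.write(chunk)
    return 'stored', 201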
*Second question*: we use gunicorn as the WSGI server. What would sensible gunicorn settings look like? We're using gevent workers; async doesn't seem like a wrong choice, since (as far as I can tell) we haven't hit a bottleneck there. But if you think there are better options, I'd be glad to hear them. I'm also curious how gevent compares with e.g. gthread or gaiohttp.
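For what it's worth: gevent workers are a reasonable fit for long, I/O-bound uploads; gthread uses real OS threads instead of greenlets (no monkey-patching, useful if some library blocks), and gaiohttp was deprecated and eventually removed from gunicorn, so it's best avoided. A hedged starting point for a config file, with values that are assumptions to tune rather than settings from this thread:

# gunicorn.conf.py
worker_class = 'gevent'
workers = 4                # often ~2 x CPU cores; measure and adjust
worker_connections = 100   # concurrent connections handled per gevent worker
timeout = 600              # generous worker timeout so long uploads aren't killed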
thanks!
/r/flask
http://redd.it/5akwqs
Best strategies for incorporating data from a RESTful...
**tl;dr** When pulling data into my django database from an api, should I use forms, or a [django rest...
            if serializer.is_valid():
                serializer.save()  # save data to database
                add_result['flag'] = 1
                add_result['message'] = 'Successfully added {0}'.format(user_name)
            else:
                msg = "Invalid serializer. serializer.errors: {0}".format(serializer.errors)
                add_result['message'] = msg
        except Exception as e:
            msg = "Exception using serializer: {0}. Exception type: {1}.".format(e, e.__class__.__name__)
            add_result['message'] = msg
    else:
        msg = "'{0}' is not a Destiny2 player on PS4".format(user_name)
        add_result['message'] = msg
else:  # not valid
    msg = "save_user_form not valid. error: {0}".format(save_user_form.errors)
    add_result['message'] = msg
return add_result
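For context, the fragment assumes a Django REST framework serializer defined elsewhere; a minimal sketch of what that class might look like (the model and field names here are hypothetical, since the post doesn't show them):

from rest_framework import serializers
from .models import Player  # hypothetical model

class PlayerSerializer(serializers.ModelSerializer):
    class Meta:
        model = Player
        fields = ['user_name', 'membership_id']  # hypothetical fields

# typical usage, matching the pattern in the fragment above:
# serializer = PlayerSerializer(data=api_payload)
# if serializer.is_valid():
#     serializer.save()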
/r/django
https://redd.it/7eigs7
Way to loop np.save or np.savetxt?
I have very large data sets, so I need to save them by appending parts. The simplified version is below, and I'm not sure why it doesn't work:

import numpy as np

number = 10  # the number of iterations
thing = np.array([1, 2, 3])

f = open('a.npy', 'ab')  # open a file for appending
for i in range(number):
    np.save(f, thing)  # save the thing to the file
f.close()

with open('a.npy', 'rb') as f:  # open the file for reading
    a = np.load(f)
print(a)  # this just prints [1 2 3] (not ten copies, which is what I want)
I.e. I want to return [1,2,3,1,2,3,1,2,3,...,1,2,3]
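A hedged sketch of how to read everything back: np.save appends one complete, self-contained .npy record per call, and np.load consumes exactly one record per call, so reading the whole file means calling np.load in a loop until the file is exhausted:

import numpy as np

arrays = []
with open('a.npy', 'rb') as f:
    while True:
        try:
            arrays.append(np.load(f))  # one record per call
        except (EOFError, ValueError):  # raised once no data is left
            break

result = np.concatenate(arrays)
print(result)  # [1 2 3 1 2 3 ... 1 2 3]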
/r/Python
https://redd.it/mpxahi
Appending data to a CSV file within a flask app
Hi! As the title says, I can't seem to append data or text to the CSV that I uploaded with my Flask app, which is deployed on Google Cloud Run. When I try it locally in a Jupyter Notebook, it works just fine.
Here is the code:
import pandas as pd

# load the csv
features = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
url_data = pd.read_csv("RetrainDatabase.csv", low_memory=False)
# append the new data features
url_data.loc[len(url_data)] = features
# save the new csv
url_data.to_csv("RetrainDatabase_V2.csv", index=False)
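A likely cause, and a hedged sketch of a workaround: Cloud Run containers have an ephemeral, in-memory filesystem, so a CSV written (or rewritten) inside the container is not shared between instances and disappears when the instance is recycled, even though the same code works locally. A common pattern is to persist the file in a Cloud Storage bucket instead; the bucket name below is a hypothetical placeholder:

import io
import pandas as pd
from google.cloud import storage

BUCKET = 'my-retrain-data'  # hypothetical bucket name
client = storage.Client()
blob = client.bucket(BUCKET).blob('RetrainDatabase.csv')

# download the current csv, append the new row, and write it back
url_data = pd.read_csv(io.StringIO(blob.download_as_text()), low_memory=False)
url_data.loc[len(url_data)] = [0] * 15
blob.upload_from_string(url_data.to_csv(index=False), content_type='text/csv')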
/r/flask
https://redd.it/13fcehr