[D] Did anyone receive this from NIPS?
Your co-author, Reviewer has not submitted their reviews for one or more papers assigned to them for review (or they submitted insufficient reviews). Please kindly note the Review deadline was on the 2nd July 11.59pm AOE.
===
My co-author has graduated and no longer works in academia. How can I handle that? It is not fair to reject my paper!
/r/MachineLearning
https://redd.it/1lrr5yy
Generating Synthetic Data for Your ML Models
I prepared a simple tutorial to demonstrate how to use synthetic data with machine learning models in Python.
https://ryuru.com/generating-synthetic-data-for-your-ml-models/
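Not from the linked tutorial, but a minimal, generic sketch of the idea using scikit-learn's built-in synthetic data generator:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Generate a synthetic binary-classification dataset and train on it
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on held-out synthetic data
```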
/r/Python
https://redd.it/1lrkjvc
WebPath: Yes yet another another url library but hear me out
Yep, another URL library. But hear me out. Read on first.
# What my project does
Extending the pathlib concept to HTTP:
```python
# WebPath import path assumed; adjust to however the package exposes it
from webpath import WebPath
import requests

# before:
resp = requests.get("https://api.github.com/users/yamadashy")
data = resp.json()
name = data["name"]  # pray it exists
repos_url = data["repos_url"]
repos_resp = requests.get(repos_url)
repos = repos_resp.json()
first_repo = repos[0]["name"]  # more praying

# after:
user = WebPath("https://api.github.com/users/yamadashy").get()
name = user.find("name", default="Unknown")
first_repo = (user / "repos_url").get().find("0.name", default="No repos")
```
Other stuff:
- Request timing: `GET /users → 200 (247ms)`
- Rate limiting: `.with_rate_limit(2.0)`
- Pagination with cycle detection
- Debugging the API itself with `.inspect()`
- Caching that strips auth headers automatically
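A quick sketch of how those pieces might combine; only `.with_rate_limit(2.0)` and `.inspect()` are taken from the list above, and the chaining order is an assumption, not documented API:

```python
# Assumed usage based on the feature bullets; check WebPath's docs for the real API
user = (
    WebPath("https://api.github.com/users/yamadashy")
    .with_rate_limit(2.0)  # at most ~2 requests/second
    .get()
)
user.inspect()  # debug the request/response (timing, status, etc.)
```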
What makes it different vs existing libraries:
- requests + jmespath/jsonpath: need 2+ libraries
- httpx: similar base navigation, but no JSON navigation or debugging integration
- furl + requests: not sure if we're in the same boat, but that's more for URL building
# Target audience
For people who:
- Build scripts that consume APIs (stock prices, crypto prices, GitHub stats, etc.)
- Get frustrated debugging
/r/Python
https://redd.it/1lr8d7t
I benchmarked 4 Python text extraction libraries so you don't have to (2025 results)
TL;DR: Comprehensive benchmarks of Kreuzberg, Docling, MarkItDown, and Unstructured across 94 real-world documents. Results might surprise you.
## 📊 Live Results: https://goldziher.github.io/python-text-extraction-libs-benchmarks/
---
## Context
As the author of Kreuzberg, I wanted to create an honest, comprehensive benchmark of Python text extraction libraries. No cherry-picking, no marketing fluff - just real performance data across 94 documents (~210MB) ranging from tiny text files to 59MB academic papers.
Full disclosure: I built Kreuzberg, but these benchmarks are automated, reproducible, and the methodology is completely open-source.
---
## 🔬 What I Tested
### Libraries Benchmarked:
- Kreuzberg (71MB, 20 deps) - My library
- Docling (1,032MB, 88 deps) - IBM's ML-powered solution
- MarkItDown (251MB, 25 deps) - Microsoft's Markdown converter
- Unstructured (146MB, 54 deps) - Enterprise document processing
### Test Coverage:
- 94 real documents: PDFs, Word docs, HTML, images, spreadsheets
- 5 size categories: Tiny (<100KB) to Huge (>50MB)
- 6 languages: English, Hebrew, German, Chinese, Japanese, Korean
- CPU-only processing: No GPU acceleration for fair comparison
- Multiple metrics: Speed, memory usage, success rates, installation sizes
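For context on how the speed numbers below can be read, here is the shape of a per-file timing harness; this is not the benchmark's actual code, and `extract_text` is a stand-in for whichever library is under test:

```python
import time
from pathlib import Path

def files_per_second(extract_text, doc_dir):
    # Time each successful extraction; failures count against success rate instead
    times = []
    for path in Path(doc_dir).iterdir():
        start = time.perf_counter()
        try:
            extract_text(path)
        except Exception:
            continue
        times.append(time.perf_counter() - start)
    return len(times) / sum(times) if times else 0.0
```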
---
## 🏆 Results Summary
### Speed Champions 🚀
1. Kreuzberg: 35+ files/second, handles everything
2. Unstructured: Moderate speed, excellent reliability
3. MarkItDown: Good on simple docs, struggles with complex files
4. Docling: Often 60+ minutes per file (!!)
### Installation Footprint 📦
- Kreuzberg: 71MB, 20 dependencies ⚡
- Unstructured:
/r/Python
https://redd.it/1ls6hj5
How to record system audio from django website ?
Hi, I am working on a "real-time AI lecture/class note-taker".
For that I was trying to record system audio, but it doesn't seem to work. I am using the Django framework in Python. Can anyone help me?
/r/django
https://redd.it/1lrais1
Is this really the right way to pass parameters from React?
Making a simple application which is meant to send a list to Django as a parameter for a GET request. In short, I'm sending a list of names and want to retrieve any entry that uses one of these names.
The only way I was able to figure out how to do this was to first convert the list to a string and then parse that string back into a list in the view. So it looks like this:
React:
```js
api/myget/?names=${JSON.stringify(list_of_names)}
```
Django:
```python
# note: the key must match the query parameter name used on the client
list_of_names = json.loads(request.query_params['names'])
```
this feels very redundant to me. Is this the way people typically would pass a list?
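One common alternative is to skip the JSON round-trip entirely: send repeated query parameters (`?names=alice&names=bob`, which `URLSearchParams.append` produces on the client) and read them with `getlist`. A minimal sketch of the view side (`Entry` is illustrative):

```python
from django.http import JsonResponse

def my_view(request):
    # For a URL like api/myget/?names=alice&names=bob
    # (with DRF, request.query_params.getlist works the same way)
    names = request.GET.getlist("names")  # ["alice", "bob"]
    entries = Entry.objects.filter(name__in=names)
    return JsonResponse({"results": list(entries.values())})
```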
/r/djangolearning
https://redd.it/1lpw4xs
Robyn now supports Server Sent Events
For the unaware, Robyn is a super fast async Python web framework.
Server-Sent Events were one of the most requested features, and Robyn finally supports them :D
Let me know what you think and if you'd like to request any more features.
Release Notes - https://github.com/sparckles/Robyn/releases/tag/v0.71.0
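For anyone unfamiliar with SSE itself: it is just a long-lived HTTP response made of `data:` lines separated by blank lines, so any streaming HTTP client can consume it. A minimal consumer sketch (the endpoint URL is hypothetical; Robyn's server-side API is in the release notes above):

```python
import requests

with requests.get("http://localhost:8080/events", stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line.startswith("data:"):
            print(line[len("data:"):].strip())
```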
/r/Python
https://redd.it/1ls89sy
An analytic theory of creativity in convolutional diffusion models.
https://arxiv.org/abs/2412.20292
/r/MachineLearning
https://redd.it/1lsipgp
Sunday Daily Thread: What's everyone working on this week?
# Weekly Thread: What's Everyone Working On This Week? 🛠️
Hello /r/Python! It's time to share what you've been working on! Whether it's a work-in-progress, a completed masterpiece, or just a rough idea, let us know what you're up to!
## How it Works:
1. Show & Tell: Share your current projects, completed works, or future ideas.
2. Discuss: Get feedback, find collaborators, or just chat about your project.
3. Inspire: Your project might inspire someone else, just as you might get inspired here.
## Guidelines:
- Feel free to include as many details as you'd like. Code snippets, screenshots, and links are all welcome.
- Whether it's your job, your hobby, or your passion project, all Python-related work is welcome here.
## Example Shares:
1. Machine Learning Model: Working on a ML model to predict stock prices. Just cracked a 90% accuracy rate!
2. Web Scraping: Built a script to scrape and analyze news articles. It's helped me understand media bias better.
3. Automation: Automated my home lighting with Python and Raspberry Pi. My life has never been easier!
Let's build and grow together! Share your journey and learn from others. Happy coding! 🌟
/r/Python
https://redd.it/1lsnrbz
For running Python scripts on schedule or as APIs, what do you use?
Just curious, if you’ve written a Python script (say for scraping, data cleaning, sending reports, automating alerts, etc.), how do you usually go about:
1. Running it on a schedule (daily, hourly, etc.)?
2. Exposing it as an API (to trigger remotely or integrate with another tool/app)?
Do you:
- Use GitHub Actions or cron?
- Set up Flask/FastAPI + deploy somewhere like Render?
- Use Replit, AWS Lambda, or something else?
Also: would you ever consider paying (like $5–10/month) for a tool that lets you just upload your script and get:
- A private API endpoint
- Auth + input support
- Optional scheduling (like “run every morning at 7 AM”), all without needing to write YAML or do DevOps stuff?
I’m trying to understand what people prefer. Would love your thoughts! 🙏
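For reference, option 2 can be as small as this; a minimal sketch assuming FastAPI + uvicorn, with illustrative names throughout:

```python
from fastapi import FastAPI

app = FastAPI()

@app.post("/run-report")
def run_report():
    # call your existing script's entry point here
    return {"status": "ok"}

# Serve with:  uvicorn main:app
# Schedule remotely via cron, e.g.:  0 7 * * * curl -X POST https://yourhost/run-report
```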
/r/Python
https://redd.it/1lsgsqn
What's your take on Celery vs django-qstash for background tasks
Hello guys, I'm currently working on a personal project and would like to know your thoughts and advice on handling background tasks in django.
My use cases include:
1. Sending transactional emails in the background
2. A few periodic tasks
Celery is super powerful and flexible, but it requires running a persistent worker which can get tricky or expensive on some platforms like Render. On the other hand, QStash lets you queue tasks and have them POST to your app without a worker — great for simpler or cost-sensitive deployments.
Have you tried both? What are the trade-offs of adopting django-qstash?
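For use case 1, the Celery side is small; a minimal sketch assuming a standard Django + Celery setup, with illustrative names:

```python
from celery import shared_task
from django.core.mail import send_mail

@shared_task
def send_welcome_email(address):
    send_mail(
        "Welcome!",                # subject
        "Thanks for signing up.",  # body
        "noreply@example.com",     # from address
        [address],
    )

# In a view: send_welcome_email.delay(user.email)
```

The trade-off described above is exactly that `.delay()` needs a broker plus a persistent worker process, whereas a QStash-style service POSTs back to a normal HTTP endpoint instead.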
/r/django
https://redd.it/1lsneon
Web push notifications from Django. Here's the tutorial.
https://youtu.be/grSfBbYuJ0I?feature=shared
/r/django
https://redd.it/1lsrgvl
Python as essentially a cross-platform shell script?
I’m making an SSH server using OpenSSH, and a custom client interface. I’m using Python as the means of bringing it all together: handling generation of configs, authentication keys, and building the client interface. Basically a setup script to cover certain limitations and prevent a bunch of extra manual setup.
Those (to me) seem like tasks that shell scripts are commonly used for, but since those scripts can vary from system to system, I chose to use Python as a cross-platform solution. That sorta got me thinking, have any of you ever used Python this way? If so, what did you use it for?
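A sketch of the kind of glue described above; the paths and flags are illustrative, assuming OpenSSH is installed:

```python
import subprocess
from pathlib import Path

key_path = Path.home() / ".ssh" / "id_ed25519_myserver"
if not key_path.exists():
    # ssh-keygen works the same wherever OpenSSH is available
    subprocess.run(
        ["ssh-keygen", "-t", "ed25519", "-f", str(key_path), "-N", ""],
        check=True,
    )

# Append a host entry to the SSH config
config = Path.home() / ".ssh" / "config"
with config.open("a") as f:
    f.write(f"\nHost myserver\n    IdentityFile {key_path}\n")
```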
/r/Python
https://redd.it/1lss8mg
How many models should an app have?
Hello, I'm developing a simple online bookstore project. In my shop app, I have about 20 models. Is this OK, or is it bad practice?
/r/djangolearning
https://redd.it/1lscnh2
We built an AI-agent with a state machine instead of a giant prompt
/r/IPython
https://redd.it/1lsw3k3
Help with my understanding of Flask teardown logic
Hello, I need some clarification of my understanding of this issue. Do I really need the following teardown logic or not? Long story short, I've been struggling with password resets, and somewhere in the mess of Git commits I keep adding stuff just in case. It's usually some other issue, which I eventually solve. The question is: I want to know whether the teardown logic is really necessary.
I read somewhere that Flask does this automatically anyway (it has something to do with g and the request context), and that you don't need it even with app.app_context().push(). But I keep adding it, only to solve the problem with something else anyway. The reason I keep adding it back is that CSRF-related errors keep popping up between fixes. I want to remove it once and for all.
```python
@app.teardown_request
def teardown_request(response_or_exc):
    db.session.remove()

@app.teardown_appcontext
def teardown_appcontext(response_or_exc):
    db.session.remove()
```
/r/flask
https://redd.it/1lsv1dp
Solving Wordle using uv's dependency resolver
What this project does
Just a small weekend project I hacked together. This is a Wordle solver that generates a few thousand Python packages that encode a Wordle as a constraint satisfaction problem and then uses uv's dependency resolver to generate a lockfile, thus coming up with a potential solution.
The user tries it, gets a response from the Wordle website, the solver incorporates it into the package constraints and returns another potential solution and so on until the Wordle is solved or it discovers it doesn't know the word.
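To make the constraint-satisfaction framing concrete, here is the underlying logic as plain Python; this is not the author's package encoding, just the constraints the generated packages express as version bounds (and it ignores duplicate-letter subtleties in real Wordle feedback):

```python
words = ["crane", "slate", "adobe", "about"]  # illustrative candidate dictionary

def consistent(word, guess, feedback):
    """feedback per position: 'g' = green, 'y' = yellow, 'x' = gray."""
    for i, (g, f) in enumerate(zip(guess, feedback)):
        if f == "g" and word[i] != g:
            return False
        if f == "y" and (g not in word or word[i] == g):
            return False
        if f == "x" and g in word:
            return False
    return True

# Guess "crane"; suppose the site answers gray, gray, yellow, gray, green:
print([w for w in words if consistent(w, "crane", "xxyxg")])  # ['adobe']
```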
Blog post on how it works here
Target audience
This isn't really for production Wordle-solving use, although it did manage to solve today's Wordle, so perhaps it can become your daily driver.
Comparison
There are lots of other Wordle solvers, but to my knowledge, this is the first Wordle solver on the market that uses a package manager's dependency resolver.
/r/Python
https://redd.it/1lsuqis
We built an AI-agent with a state machine instead of a giant prompt
/r/IPython
https://redd.it/1lswmbe