Python Daily
Daily Python News
Question, Tips and Tricks, Best Practices on Python Programming Language
Find more reddit channels over at @r_channels
[D] Did anyone receive this from NIPS?

Your co-author, Reviewer has not submitted their reviews for one or more papers assigned to them for review (or they submitted insufficient reviews). Please kindly note the Review deadline was on the 2nd July 11.59pm AOE.

===
My co-author has graduated and no longer works in academia. How can I handle that? It is not fair to reject my paper!

/r/MachineLearning
https://redd.it/1lrr5yy
A google play clone database schema

/r/django
https://redd.it/1lrljdn
WebPath: Yes, yet another URL library, but hear me out

Yeaps another url library. But hear me out. Read on first. 

# What my project does

Extending the pathlib concept to HTTP:

# before:
resp = requests.get("https://api.github.com/users/yamadashy")
data = resp.json()
name = data["name"]  # pray it exists
repos_url = data["repos_url"]
repos_resp = requests.get(repos_url)
repos = repos_resp.json()
first_repo = repos[0]["name"]  # more praying

# after:
user = WebPath("https://api.github.com/users/yamadashy").get()
name = user.find("name", default="Unknown")
first_repo = (user / "repos_url").get().find("0.name", default="No repos")
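For readers wondering what a dotted lookup such as find("0.name", default="No repos") might do under the hood, here is a minimal sketch. It is not WebPath's actual implementation; find_path is a hypothetical helper that assumes the path is split on dots and that numeric segments index into lists.

```python
from typing import Any

_MISSING = object()  # sentinel so that a stored None is not mistaken for "absent"


def find_path(data: Any, path: str, default: Any = None) -> Any:
    """Walk nested dicts/lists along a dotted path (hypothetical helper)."""
    current = data
    for key in path.split("."):
        if isinstance(current, dict):
            current = current.get(key, _MISSING)
        elif isinstance(current, list):
            try:
                current = current[int(key)]  # numeric segment -> list index
            except (ValueError, IndexError):
                return default
        else:
            return default  # cannot descend into a scalar
        if current is _MISSING:
            return default
    return current


if __name__ == "__main__":
    repos = [{"name": "repomix"}, {"name": "dotfiles"}]
    print(find_path(repos, "0.name", default="No repos"))
    print(find_path([], "0.name", default="No repos"))
```

The point of the default is that a missing key or an empty list yields a fallback value instead of a KeyError or IndexError, which is the "praying" the before-snippet complains about.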
Other stuff:

Request timing: GET /users → 200 (247ms)
Rate limiting: .with_rate_limit(2.0)
Pagination with cycle detection
Debugging the api itself with .inspect()
Caching that strips auth headers automatically

What makes it different vs existing libraries:

requests + jmespath/jsonpath: Need 2+ libraries
httpx: Similar base nav but no json navigation or debugging integration
furl + requests: Not sure if we're in the same boat but this is more for url building .. 

# Target audience

For ppl who:

Build scripts that consume apis (stock prices, crypto prices, GitHub stats, etc etc.)
Get frustrated debugging

/r/Python
https://redd.it/1lr8d7t
I benchmarked 4 Python text extraction libraries so you don't have to (2025 results)

TL;DR: Comprehensive benchmarks of Kreuzberg, Docling, MarkItDown, and Unstructured across 94 real-world documents. Results might surprise you.

## 📊 Live Results: https://goldziher.github.io/python-text-extraction-libs-benchmarks/

---

## Context

As the author of Kreuzberg, I wanted to create an honest, comprehensive benchmark of Python text extraction libraries. No cherry-picking, no marketing fluff - just real performance data across 94 documents (~210MB) ranging from tiny text files to 59MB academic papers.

Full disclosure: I built Kreuzberg, but these benchmarks are automated, reproducible, and the methodology is completely open-source.

---

## 🔬 What I Tested

### Libraries Benchmarked:
- Kreuzberg (71MB, 20 deps) - My library
- Docling (1,032MB, 88 deps) - IBM's ML-powered solution
- MarkItDown (251MB, 25 deps) - Microsoft's Markdown converter
- Unstructured (146MB, 54 deps) - Enterprise document processing

### Test Coverage:
- 94 real documents: PDFs, Word docs, HTML, images, spreadsheets
- 5 size categories: Tiny (<100KB) to Huge (>50MB)
- 6 languages: English, Hebrew, German, Chinese, Japanese, Korean
- CPU-only processing: No GPU acceleration for fair comparison
- Multiple metrics: Speed, memory usage, success rates, installation sizes
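
To make the metrics above concrete, here is a rough sketch of how speed, success rate, and peak memory could be collected for one library. This is not the benchmark's actual harness; benchmark and the extract callable are hypothetical names, and tracemalloc only sees Python-level allocations, so memory used by native code would be missed.

```python
import time
import tracemalloc
from typing import Callable, Iterable


def benchmark(extract: Callable[[bytes], str], documents: Iterable[bytes]) -> dict:
    """Run a hypothetical `extract` callable over documents and collect basic metrics."""
    successes = 0
    failures = 0
    tracemalloc.start()  # tracks Python-level allocations only
    started = time.perf_counter()
    for doc in documents:
        try:
            extract(doc)
            successes += 1
        except Exception:
            failures += 1  # any exception counts as a failed extraction
    elapsed = time.perf_counter() - started
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    total = successes + failures
    return {
        "files": total,
        "success_rate": successes / total if total else 0.0,
        "files_per_second": total / elapsed if elapsed > 0 else float("inf"),
        "peak_mb": peak_bytes / (1024 * 1024),
    }


if __name__ == "__main__":
    # Stand-in extractor: decodes UTF-8 and fails on invalid bytes.
    print(benchmark(lambda raw: raw.decode("utf-8"), [b"hello", b"\xff"]))
```

A real comparison would also need per-library isolation (separate processes or environments), since installation size and import-time cost differ a lot between these libraries.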

---

## 🏆 Results Summary

### Speed Champions 🚀
1. Kreuzberg: 35+ files/second, handles everything
2. Unstructured: Moderate speed, excellent reliability
3. MarkItDown: Good on simple docs, struggles with complex files
4. Docling: Often 60+ minutes per file (!!)

### Installation Footprint 📦
- Kreuzberg: 71MB, 20 dependencies
- Unstructured:

/r/Python
https://redd.it/1ls6hj5
How to record system audio from a Django website?

Hi, I am working on a "real-time AI lecture/class note-taker".
For that I was trying to record system audio, but that doesn't seem to work. I am using the Django framework for Python. Can anyone help me?

/r/django
https://redd.it/1lrais1
Is this really the right way to pass parameters from React?

I'm making a simple application that sends a list to Django as a parameter of a GET request. In short, I'm sending a list of names and want to retrieve any entry that uses one of these names.

The only way I was able to figure out how to do this was to first convert the list to a string and then convert that string back into a JSON in the view. So it looks like this

React:

api/myget/?names=${JSON.stringify(list_of_names)}

Django:

list_of_names = json.loads(request.query_params['list_of_names'])

this feels very redundant to me. Is this the way people typically would pass a list?
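
One common alternative to the JSON round-trip is to repeat the query key once per value (names=a&names=b). The sketch below shows the idea with the standard library only; build_query and read_names are hypothetical helper names. In a Django view the reading side would be request.GET.getlist("names"), which is the part assumed here rather than shown.

```python
from urllib.parse import parse_qs, urlencode


def build_query(names):
    """Encode a list as repeated keys: names=a&names=b (what the client would send)."""
    return urlencode({"names": names}, doseq=True)


def read_names(query_string):
    """Decode repeated keys back into a list (stand-in for request.GET.getlist)."""
    return parse_qs(query_string).get("names", [])


if __name__ == "__main__":
    query = build_query(["alice", "bob smith"])
    print(query)
    print(read_names(query))
```

On the React side, URLSearchParams with one append call per name produces the same repeated-key format, so no JSON.stringify or json.loads is needed.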

/r/djangolearning
https://redd.it/1lpw4xs
My first flask app, feedback?
https://cyberinteractive.net/

/r/flask
https://redd.it/1ls244l
Robyn now supports Server Sent Events

For the unaware, Robyn is a super fast async Python web framework.

Server Sent Events were one of the most requested features and Robyn finally supports them :D

Let me know what you think and if you'd like to request any more features.

Release Notes - https://github.com/sparckles/Robyn/releases/tag/v0.71.0

/r/Python
https://redd.it/1ls89sy
Sunday Daily Thread: What's everyone working on this week?

# Weekly Thread: What's Everyone Working On This Week? 🛠️

Hello /r/Python! It's time to share what you've been working on! Whether it's a work-in-progress, a completed masterpiece, or just a rough idea, let us know what you're up to!

## How it Works:

1. Show & Tell: Share your current projects, completed works, or future ideas.
2. Discuss: Get feedback, find collaborators, or just chat about your project.
3. Inspire: Your project might inspire someone else, just as you might get inspired here.

## Guidelines:

Feel free to include as many details as you'd like. Code snippets, screenshots, and links are all welcome.
Whether it's your job, your hobby, or your passion project, all Python-related work is welcome here.

## Example Shares:

1. Machine Learning Model: Working on a ML model to predict stock prices. Just cracked a 90% accuracy rate!
2. Web Scraping: Built a script to scrape and analyze news articles. It's helped me understand media bias better.
3. Automation: Automated my home lighting with Python and Raspberry Pi. My life has never been easier!

Let's build and grow together! Share your journey and learn from others. Happy coding! 🌟

/r/Python
https://redd.it/1lsnrbz
For running Python scripts on schedule or as APIs, what do you use?

Just curious, if you’ve written a Python script (say for scraping, data cleaning, sending reports, automating alerts, etc.), how do you usually go about:

1. Running it on a schedule (daily, hourly, etc)?
2. Exposing it as an API (to trigger remotely or integrate with another tool/app)?

Do you:

Use GitHub Actions or cron?
Set up Flask/FastAPI + deploy somewhere like Render?
Use Replit, AWS Lambda, or something else?

Also: would you ever consider paying (like $5–10/month) for a tool that lets you just upload your script and get:

A private API endpoint
Auth + input support
Optional scheduling (like “run every morning at 7 AM”)
all without needing to write YAML or do DevOps stuff?

I’m trying to understand what people prefer. Would love your thoughts! 🙏
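
For a sense of what "run every morning at 7 AM" boils down to without cron or YAML, here is a minimal sketch. next_run is a hypothetical helper; it uses naive datetimes and so ignores time zones and daylight saving changes, which a real scheduler would have to handle.

```python
from datetime import datetime, timedelta


def next_run(now: datetime, hour: int = 7, minute: int = 0) -> datetime:
    """Return the next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot already passed
    return candidate


if __name__ == "__main__":
    now = datetime.now()
    wait = (next_run(now) - now).total_seconds()
    print(f"Next run in {wait:.0f} seconds")
    # A long-running script could time.sleep(wait), run the job, and loop.
```

The catch with this approach is that something still has to keep the process alive, which is the gap that cron, GitHub Actions, or a hosted scheduler fills.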

/r/Python
https://redd.it/1lsgsqn
What's your take on Celery vs django-qstash for background tasks

Hello guys, I'm currently working on a personal project and would like to know your thoughts and advice on handling background tasks in django.

My use cases include:

1. Sending transactional emails in the background

2. A few periodic tasks

Celery is super powerful and flexible, but it requires running a persistent worker which can get tricky or expensive on some platforms like Render. On the other hand, QStash lets you queue tasks and have them POST to your app without a worker — great for simpler or cost-sensitive deployments.

Have you tried both? What are the trade-offs of adopting django-qstash?

/r/django
https://redd.it/1lsneon
Python as essentially a cross-platform shell script?

I’m making an SSH server using OpenSSH, and a custom client interface. I’m using Python as the means of bringing it all together: handling generation of configs, authentication keys, and building the client interface. Basically a setup script to cover certain limitations and prevent a bunch of extra manual setup.

Those (to me) seem like tasks that shell scripts are commonly used for, but since those scripts can vary from system to system, I chose to use Python as a cross-platform solution. That sorta got me thinking, have any of you ever used Python this way? If so, what did you use it for?
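
As an illustration of the kind of setup script described above, here is a small sketch that builds an ssh-keygen command and writes a client config with pathlib, so the same code runs on Linux, macOS, and Windows. The function names, the host alias, and the file layout are assumptions for the example, not the poster's actual script.

```python
import shutil
import subprocess
from pathlib import Path


def keygen_command(key_path: Path, comment: str = "setup-script") -> list[str]:
    """Build the ssh-keygen call as an argument list (no shell quoting needed)."""
    return ["ssh-keygen", "-t", "ed25519", "-f", str(key_path), "-N", "", "-C", comment]


def generate_key(key_path: Path) -> bool:
    """Generate a key pair if ssh-keygen is on PATH; report whether it ran."""
    if shutil.which("ssh-keygen") is None:
        return False
    subprocess.run(keygen_command(key_path), check=True)
    return True


def write_client_config(config_path: Path, alias: str, host: str, user: str, key_path: Path) -> str:
    """Write a minimal OpenSSH client config entry and return its text."""
    text = (
        f"Host {alias}\n"
        f"    HostName {host}\n"
        f"    User {user}\n"
        f"    IdentityFile {key_path}\n"
    )
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(text, encoding="utf-8")
    return text
```

Passing the command as a list to subprocess.run avoids shell-specific quoting, which is most of what makes this portable where a bash or PowerShell script would not be.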

/r/Python
https://redd.it/1lss8mg
How many models should an app have?

Hello, I'm developing a simple online bookstore project. In my shop app, I have about 20 models. Is this OK, or bad practice?

/r/djangolearning
https://redd.it/1lscnh2
We built an AI-agent with a state machine instead of a giant prompt

/r/IPython
https://redd.it/1lsw3k3
Help with my understanding of Flask teardown logic

Hello, I need some clarification of my understanding of this issue. Do I really need the following teardown logic at all or not? Long story short, I've been struggling with password resets. And somewhere in the mess of Git commits, I keep adding stuff, just in case. It's usually some other issue, which I eventually solve. The question is: I really want to know if the teardown logic is necessary.


I read somewhere that Flask does this automatically anyway (it has something to do with g and the request context), and you don't need it even with app.app_context().push(). But I keep adding this, only to solve the problem anyway using something else. The reason why I keep adding this back is because CSRF-related errors keep popping up between fixes. I want to remove it once and for all.

@app.teardown_request
def teardown_request(response_or_exc):
    db.session.remove()

@app.teardown_appcontext
def teardown_appcontext(response_or_exc):
    db.session.remove()

/r/flask
https://redd.it/1lsv1dp
Solving Wordle using uv's dependency resolver

What this project does

Just a small weekend project I hacked together. This is a Wordle solver that generates a few thousand Python packages that encode a Wordle as a constraint satisfaction problem and then uses uv's dependency resolver to generate a lockfile, thus coming up with a potential solution.

The user tries it, gets a response from the Wordle website, the solver incorporates it into the package constraints and returns another potential solution and so on until the Wordle is solved or it discovers it doesn't know the word.
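
The guess, feedback, narrow-down loop described above can be sketched in plain Python. To be clear, this swaps in simple list filtering for the dependency resolver, so it shows only the constraint logic, not how the project encodes it as packages for uv. The g/y/b feedback letters and the function names are assumptions for the example.

```python
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Wordle-style feedback per letter: g = green, y = yellow, b = grey."""
    result = ["b"] * len(guess)
    remaining = Counter()  # answer letters not matched by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            remaining[a] += 1
    for i, g in enumerate(guess):
        if result[i] == "b" and remaining[g] > 0:
            result[i] = "y"
            remaining[g] -= 1  # each answer letter can justify one yellow
    return "".join(result)


def narrow(candidates, guess, observed):
    """Keep only words that would have produced the observed feedback."""
    return [word for word in candidates if feedback(guess, word) == observed]


def solve(candidates, answer):
    """Return (guesses, solved); solved is False if the word is not in the list."""
    pool = list(candidates)
    guesses = []
    while pool:
        guess = pool[0]
        guesses.append(guess)
        observed = feedback(guess, answer)
        if observed == "g" * len(guess):
            return guesses, True
        pool = narrow(pool, guess, observed)
    return guesses, False


if __name__ == "__main__":
    print(solve(["crane", "slate", "plate"], "plate"))
```

Each round of feedback plays the role of an added version constraint: candidates that conflict with it become unsatisfiable, and an empty pool corresponds to the resolver discovering it doesn't know the word.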

Blog post on how it works here

Target audience

This isn't really for production Wordle-solving use, although it did manage to solve today's Wordle, so perhaps it can become your daily driver.

Comparison

There are lots of other Wordle solvers, but to my knowledge, this is the first Wordle solver on the market that uses a package manager's dependency resolver.

/r/Python
https://redd.it/1lsuqis
We built an AI-agent with a state machine instead of a giant prompt

/r/IPython
https://redd.it/1lswmbe