The Hidden Bias of Alembic and Django Migrations (and when to consider alternatives)
Hey all,
My name is Rotem, I'm one of the creators of [Atlas](https://atlasgo.io), a database schema-as-code tool. You can find us on [GitHub](https://github.com/ariga/atlas).
I recently wrote a blog post covering cases where you might want to consider an alternative to Alembic or Django migrations for your schema changes.
Don't get me wrong - Alembic and Django migrations are great tools, among the best in the industry. If you are using them successfully, you should probably keep at it :-)
However, over the years, I've come to realize that these tools, having been built to fit the use case of serving an ORM, have biases that might hinder your project.
In case you are interested, you can [find the blog post here](https://atlasgo.io/blog/2025/02/10/the-hidden-bias-alembic-django-migrations?utm_source=reddit&utm_medium=post&utm_campaign=python).
Atlas has two capabilities that enable it to work very well inside ORM codebases: `external_schema` and `composite_schema`. It also has ORM integration plugins called "providers" that let it read the desired database schema from your ORM code. You can then use it like this:
data "external_schema" "sqlalchemy" {
program = [
"atlas-provider-sqlalchemy",
/r/Python
https://redd.it/1im66lu
atlasgo.io
Atlas is a language-agnostic tool for managing and migrating database schemas using modern DevOps principles. It enables developers to automate schema changes through both declarative (schema-as-code) and versioned migration workflows, supporting inputs like…
Someone talk me down from using Yamale
...or push me over the edge; whichever. So I've been looking into YAML schema validators that can handle complex YAML files like, for example, the `ci.yml` file that configures GitHub Actions.
The combined internet wisdom from searching Google and conferring with Gemini and Claude 3.5 is to use `jsonschema.validate`. But that seems, IDK, like just wrong to the core. Besides, aren't there a few things that you can do in .yml files that you can't in .json?
After some scrolling, I came across Yamale, which looks pretty awesome albeit underrated. I like the `includes` and 'recursions', but I have a few things about it that make me hesitate:
\- Is it really as popular as PyPI makes it seem (2M monthly downloads)? When I search specifically for use cases and questions about it on SO, 🦗. Same here on Reddit. Maybe everyone using it is so happy and it works so well as to be invisible. Or maybe that "2M monthly downloads" means nothing?
\- Is it going to be around and supported much longer? From the GH repo
/r/Python
https://redd.it/1imc4we
Controlling the mouse with hand gestures. What are your thoughts?
https://www.reddit.com/r/PythonProjects2/comments/1imvsya/lets\_talk\_about\_python\_opencv/
\#python #opencv
/r/Python
https://redd.it/1imwemv
Coursera Guided Project: Build a Data Science Web App with Streamlit and Python
Hi there, everyone! Does anyone have the Colab or Jupyter Python code for this Coursera guided project? If so, please share it in the comments or message me. Thanks in advance!
/r/Python
https://redd.it/1imh7nw
Boolean search query generator
I’m working on a project where I generate Boolean queries using an LLM (like ChatGPT), but I need to ensure that the generated queries are valid based on the data in my database. If certain terms in the query don’t exist in the database, I need to automatically remove or modify them.
For example:
LLM-Generated Query: ("iPhone 14" OR "Samsung Galaxy S22") AND ("128GB" OR "256GB") AND ("Red" OR "Blue")
Database Check:
My DB has entries for "iPhone 14" and "Samsung Galaxy S22".
It only has "128GB" as a storage option (no "256GB").
For colors, only "Red" is available (no "Blue").
Modified Query (after DB validation): ("iPhone 14" OR "Samsung Galaxy S22") AND "128GB" AND "Red"
How can I efficiently verify and modify these Boolean queries based on the DB contents? Are there existing libraries or tools that could help streamline this process?
Keep in mind that I can only use one LLM call for this purpose.
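To make the validation step above concrete, here is a minimal sketch that uses no extra LLM call: it parses the query into a small tree, drops quoted terms that are not in a set of known values, and prints the query back. The function names and the `known` set are hypothetical (in practice the set would come from a database lookup), and it assumes the query only uses quoted terms, `AND`, `OR`, and parentheses. It also assumes that a missing term inside an `AND` group should simply be dropped, which loosens the query; you may prefer to reject the query in that case.

```python
import re
from typing import List, Optional, Set, Tuple, Union

# A node is either a bare term (str) or an (operator, children) tuple.
Node = Union[str, Tuple[str, list]]

TOKEN_RE = re.compile(r'\s*("([^"]*)"|\(|\)|AND\b|OR\b)')


def tokenize(query: str) -> List[str]:
    """Split the query into quoted terms, parentheses, and AND/OR operators."""
    tokens, pos = [], 0
    query = query.strip()
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if not match:
            raise ValueError(f"Unexpected input at position {pos}: {query[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens: List[str]) -> Node:
    """Recursive-descent parser; AND binds tighter than OR."""
    pos = 0

    def atom() -> Node:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("Unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = either()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("Missing closing parenthesis")
            pos += 1
            return node
        if tok.startswith('"'):
            return tok[1:-1]  # strip the surrounding quotes
        raise ValueError(f"Unexpected token {tok!r}")

    def chain(op: str, operand) -> Node:
        nonlocal pos
        children = [operand()]
        while pos < len(tokens) and tokens[pos] == op:
            pos += 1
            children.append(operand())
        return children[0] if len(children) == 1 else (op, children)

    def both() -> Node:
        return chain("AND", atom)

    def either() -> Node:
        return chain("OR", both)

    tree = either()
    if pos != len(tokens):
        raise ValueError(f"Unexpected token {tokens[pos]!r}")
    return tree


def prune(node: Node, known: Set[str]) -> Optional[Node]:
    """Drop unknown terms; collapse groups left with zero or one child."""
    if isinstance(node, str):
        return node if node in known else None
    op, children = node
    kept = [c for c in (prune(child, known) for child in children) if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else (op, kept)


def render(node: Node, top: bool = True) -> str:
    """Turn the tree back into query text, parenthesizing nested groups."""
    if isinstance(node, str):
        return f'"{node}"'
    op, children = node
    text = f" {op} ".join(render(child, top=False) for child in children)
    return text if top else f"({text})"


def validate_query(query: str, known: Set[str]) -> Optional[str]:
    """Return the pruned query, or None if no term survives."""
    tree = prune(parse(tokenize(query)), known)
    return None if tree is None else render(tree)
```

Calling `validate_query` with the LLM-generated query from the example and `known = {"iPhone 14", "Samsung Galaxy S22", "128GB", "Red"}` returns the modified query shown above.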
/r/Python
https://redd.it/1in2fbu
Open-source AI influencer in Python
* **What My Project Does** – Open-source project to create a virtual AI influencer in Python
* **Target Audience** – Developers
* **Comparison** – Built on top of free to use open-source technologies
Link to project :- [https://github.com/SamurAIGPT/Ai-Influencer/](https://github.com/SamurAIGPT/Ai-Influencer/)
/r/Python
https://redd.it/1in0zhc
Dockerize a Django App
I need help. I want to deploy a project that I've been working on. It's fairly simple, here's the repo: https://github.com/gabrielpistore/SiGOS-UFCAT. I've been thinking about using Docker. Could anyone give me some advice on how I should do it?
/r/django
https://redd.it/1imvnfk
Plutus Is a Command Line Income and Expense Tracker
Hi,
Plutus helps you quickly analyze your income and expenses from the command line using a single CSV file as your data source.
Source code
[https://github.com/nickjj/plutus](https://github.com/nickjj/plutus)
Documentation / demo video
The repo has an extensive readme file
A demo video is on [YouTube](https://www.youtube.com/watch?v=mwVnKbne9v4) (no ads) and is also linked in the readme
Target audience / why
You can use this to help with budgeting or getting your numbers in order for filing taxes.
You just want to keep track of your income from a few sources, separate out personal / business expenses and keep tabs on how much you paid in taxes. You want to get your numbers and move on with life.
Features and benefits
A single CSV data source
Have peace of mind it won't get corrupted from a tool upgrade
Easily trackable in git
Pipe its output to other tools
View it in any spreadsheet tool if you want ad hoc visualizations
Friendly towards sending it to an accountant
Categories and subcategories are unrestricted along with being easy to change later
A category is just text in a specific CSV column
Flexible report generating with a
/r/Python
https://redd.it/1imydyo
[D] What happened to SSMs and linear attentions?
Can someone who is up to date with this area of research summarize the current state of SSMs and softmax attention alternatives? Are they used in customer-facing models yet, or are they still in research? Does their promise only appear in benchmarks on paper? Or have hardware accelerators been optimized so heavily for attention that SSMs or linear attention alternatives provide only marginal gains, which don't justify their level of complexity?
/r/MachineLearning
https://redd.it/1in9y30
Wednesday Daily Thread: Beginner questions
# Weekly Thread: Beginner Questions 🐍
Welcome to our Beginner Questions thread! Whether you're new to Python or just looking to clarify some basics, this is the thread for you.
## How it Works:
1. Ask Anything: Feel free to ask any Python-related question. There are no bad questions here!
2. Community Support: Get answers and advice from the community.
3. Resource Sharing: Discover tutorials, articles, and beginner-friendly resources.
## Guidelines:
This thread is specifically for beginner questions. For more advanced queries, check out our [Advanced Questions Thread](#advanced-questions-thread-link).
## Recommended Resources:
If you don't receive a response, consider exploring r/LearnPython or join the Python Discord Server for quicker assistance.
## Example Questions:
1. What is the difference between a list and a tuple?
2. How do I read a CSV file in Python?
3. What are Python decorators and how do I use them?
4. How do I install a Python package using pip?
5. What is a virtual environment and why should I use one?
Let's help each other learn Python! 🌟
/r/Python
https://redd.it/1indhda
Preswald: A full-stack Python SDK for building and deploying interactive data apps
Hi everyone,
Preswald is a lightweight, full-stack SDK that helps you build, deploy, and manage interactive data applications, all with minimal Python and SQL. It brings together data ingestion, storage, transformation, and visualization into one simple framework.
Source Code: https://github.com/StructuredLabs/preswald
Slack: Community
Features:
Build apps with minimal Python/SQL.
Handle ingestion, ETL, and visualization in one SDK.
Connect to CSV, JSON, Parquet, or SQL databases easily.
Customize your app’s look with simple tweaks in preswald.toml.
Deploy locally or to Google Cloud Run with one command.
Lightweight and simple, no need for a huge data stack.
Target Audience / Why Use It:
If you’re tired of juggling tools to get simple data apps up and running, this might make life easier. It’s good for quick internal tools, dashboards, or just experimenting with data.
/r/Python
https://redd.it/1ind8kn
What I learned about Django security from my hidden analytics module
I built a hidden statistics module in my Django portfolio and discovered something interesting about security
I added a secret stats endpoint to my Django site that tracks all attempts to access my site. After analyzing 2.2k unique visitors, the data tells an interesting story.
Legitimate traffic is exactly what you'd expect: homepage (2.6k visits), portfolio (911), blog (661). But here's where it gets fun - my stats module caught hundreds of automated attacks trying everything from .env file access (64 attempts) to WordPress admin panels.
The best part? I didn't build any special security - Django's default configurations handled everything. The stats module just silently recorded all these failed attempts while serving my actual visitors without a hitch.
My favorite discovery was seeing the persistence of some bots - one tried 50+ different variations of WordPress manifest files. On a Django site. I actually found myself admiring their determination.
TL;DR: Built a secret stats module in Django, watched it record thousands of failed hack attempts while Django's security didn't break a sweat.
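For anyone curious how this kind of silent recording can work, here is a minimal sketch of the general idea as a plain WSGI middleware. This is not the module described in this post; the class name, the in-memory `Counter`, and the demo app are all assumptions made for illustration. It counts each requested path together with the response status code, then passes the request through unchanged.

```python
from collections import Counter
from typing import Callable, Iterable


class PathStatsMiddleware:
    """Count (path, status code) pairs, then delegate to the wrapped WSGI app."""

    def __init__(self, app: Callable) -> None:
        self.app = app
        # In-memory only: counts are lost on restart and not shared across processes.
        self.hits: Counter = Counter()

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "/")

        def recording_start_response(status, headers, exc_info=None):
            # status looks like "404 Not Found"; keep only the numeric code.
            self.hits[(path, int(status.split()[0]))] += 1
            return start_response(status, headers, exc_info)

        return self.app(environ, recording_start_response)


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real site: only "/" exists."""
    if environ.get("PATH_INFO") == "/":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"home"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def probe(app, path):
    """Send one fake GET request through the app and return the status line."""
    seen = []
    body = app(
        {"PATH_INFO": path, "REQUEST_METHOD": "GET"},
        lambda status, headers, exc_info=None: seen.append(status),
    )
    list(body)  # consume the response body
    return seen[0]


stats = PathStatsMiddleware(demo_app)
for path in ["/", "/.env", "/wp-admin/", "/.env"]:
    probe(stats, path)

# Show the recorded paths, most frequent first.
for (path, status), count in stats.hits.most_common():
    print(f"{count:>3}  {status}  {path}")
```

In a Django project, the same wrapper could in principle go around the application object returned by `django.core.wsgi.get_wsgi_application()`, though a real deployment would want persistent storage rather than a per-process counter.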
https://preview.redd.it/0sleafxbxkie1.png?width=1905&format=png&auto=webp&s=e2f2912cdda1ac3940054f97e6346b68a2b8dc3b
/r/django
https://redd.it/1ina9i4
Any free hosting providers that allow me to install other apps?
I have a Flask web app that uses MuseScore to generate sheet music. Are there any free hosting providers that allow this? PythonAnywhere does allow me to compile other apps but has a 500 MB limit.
/r/flask
https://redd.it/1inecoj
ParScrape v0.5.1 Released
# What My Project Does:
Scrapes data from sites and uses AI to extract structured data from it.
# What's New:
* BREAKING CHANGE: --ai-provider Google renamed to Gemini.
* Now supports XAI, Deepseek, OpenRouter, LiteLLM
* Now has much better pricing data.
# Key Features:
* Uses Playwright / Selenium to bypass most simple bot checks.
* Uses AI to extract data from a page and save it in various formats such as CSV, XLSX, JSON, Markdown.
* Has rich console output to display data right in your terminal.
# GitHub and PyPI
* PAR Scrape is under active development and getting new features all the time.
* Check out the project on GitHub for full documentation, installation instructions, and to contribute: [https://github.com/paulrobello/par\_scrape](https://github.com/paulrobello/par_scrape)
* PyPI [https://pypi.org/project/par\_scrape/](https://pypi.org/project/par_scrape/)
# Comparison:
I have seen many command line and web applications for scraping but none that are as simple, flexible and fast as ParScrape
# Target Audience
AI enthusiasts and data-hungry hobbyists
/r/Python
https://redd.it/1inj8if
My talk has been accepted for DjangoCon EU 2025!
“What if I fail?”
That thought used to haunt me every time I took a step outside my comfort zone. Applying for my first open-source contribution, organizing Django meetups, even sharing my thoughts publicly—self-doubt was always lurking.
But today, I have proof that pushing past fear leads to something bigger. My talk on the Zango framework has been accepted for DjangoCon EU 2025! 🎉
This isn’t just about a conference talk. It’s about the journey—the long nights, the imposter syndrome, the relentless belief that if you keep showing up, opportunities will follow.
From casually exploring Django to building a thriving community in India, from contributing in silence to speaking on an international stage—it’s been a wild ride. And if there’s one thing I’ve learned, it’s this:
“Do it, even if you think you’re not ‘ready’ yet.”
Because the best things happen when you stop waiting for permission.
If you’re attending DjangoCon EU 2025, let’s meet, exchange ideas, and keep growing together. 🚀
Check out the official Zango website: zango.dev
You can star the framework here: https://github.com/Healthlane-Technologies/Zango
https://preview.redd.it/vab7feuiwnie1.jpg?width=800&format=pjpg&auto=webp&s=3315f71790fd94a209cd300d10a036606ff56a0e
/r/django
https://redd.it/1inlr5k
Linkedin
Zango | LinkedIn
Zango | 13 followers on LinkedIn. An open framework by Zelthy for rapid app development with built-in enterprise-grade security and scalability.
Pykomodo: A python chunker for LLMs
Hola! I recently built **Komodo**, a Python-based utility that splits large codebases into smaller, LLM-friendly chunks. It supports multi-threaded file reading, powerful ignore/unignore patterns, and optional “enhanced” features (e.g. metadata extraction and redundancy removal). Each chunk can include functions/classes/imports so that *any* individual chunk is self-contained, which is helpful for AI/LLM tasks.
If you’re dealing with a huge repo and need to slice it up for context windows or search, Komodo might save you a lot of hassle, or at least I hope it will. I'd love to hear any feedback/criticisms/suggestions! Please drop some ideas, and if you like it, do drop me a star on GitHub too.
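To illustrate what a "self-contained chunk" can mean, here is a minimal sketch that uses only the standard-library `ast` module. It is not Komodo's implementation or API; `chunk_source` is a hypothetical helper that emits one chunk per top-level function or class and prefixes each chunk with the module's top-level imports.

```python
import ast
from typing import List


def chunk_source(source: str) -> List[str]:
    """Split Python source into one chunk per top-level function or class.

    Every chunk is prefixed with the module's top-level import lines.
    Other top-level statements (constants, script code) are dropped.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    imports: List[str] = []
    chunks: List[str] = []

    for node in tree.body:
        # Decorators sit above the `def`/`class` line, so start from the earliest one.
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators])
        segment = "\n".join(lines[start - 1 : node.end_lineno])

        if isinstance(node, (ast.Import, ast.ImportFrom)):
            imports.append(segment)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(segment)

    header = "\n".join(imports)
    return [f"{header}\n\n{chunk}" if header else chunk for chunk in chunks]


sample = '''import os
from math import sqrt

X = 1

def f():
    return os.getcwd()

class C:
    def g(self):
        return sqrt(4)
'''

for i, chunk in enumerate(chunk_source(sample), start=1):
    print(f"--- chunk {i} ---")
    print(chunk)
```

A real chunker would also need to respect token limits and handle non-Python files, which this sketch ignores.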
**Source Code:** [https://github.com/duriantaco/pykomodo](https://github.com/duriantaco/pykomodo)
**Target Audience / Why Use It:**
* Anyone who needs to chunk their stuff
Thanks everyone for your time. Have a good week ahead.
/r/Python
https://redd.it/1inn3fl
How to prepare for live coding test
Okay, so I have a live coding test.
And tbh, idk what to do. How do you even prepare for it?
I can't even remember the non-obvious imports without searching, so how do they expect me to create a full-stack app in 1 hour? (That's what they asked me to do the first time.)
I cleared the conceptual interview, but now it's time for the 2nd coding test. I could not do the first one because of problems with Django on my system.
/r/django
https://redd.it/1inmut7
[R] LLMs as Few-Shot Data Annotators for Multilingual Text Detoxification
This paper introduces a method for using LLMs as few-shot learners to generate high-quality parallel datasets for text detoxification. The key innovation is using modern LLMs to create paired toxic/non-toxic text examples that maintain semantic meaning while reducing toxicity.
Main technical points:
- Uses few-shot prompting with carefully curated example pairs
- Implements multi-stage filtering to ensure quality
- Validates semantic preservation using automated metrics
- Achieves better toxicity reduction while maintaining meaning compared to existing methods
- Creates larger, higher-quality parallel datasets than previous approaches
Results:
- Outperforms existing detoxification models on standard benchmarks
- Shows strong cross-domain generalization
- Demonstrates effectiveness with just 3-5 examples
- Maintains semantic similarity scores >0.85
- Reduces toxicity scores by >60% on test sets
I think this could be particularly valuable for content moderation systems that need to preserve meaning while removing harmful content. The ability to generate high-quality parallel data could help train better downstream detoxification models.
I think the few-shot approach is especially promising because it reduces the need for large annotated datasets, which are expensive and time-consuming to create manually.
TLDR: Modern LLMs can generate high-quality parallel toxic/non-toxic text pairs using few-shot learning, enabling better training data for detoxification systems while maintaining semantic meaning.
Full summary is here. Paper here.
/r/MachineLearning
https://redd.it/1innuh3
www.aimodels.fyi
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators | AI Research Paper Details
Existing approaches to multilingual text detoxification are hampered by the scarcity of parallel multilingual datasets. In this work, we introduce a pipeline for the generation of multilingual parallel detoxification data. We also introduce SynthDetoxM, a…
jupad - Python Notepad
I've always used Python as a calculator but wanted something that feels more like a Soulver sketchpad.
* **Source code:** [jupad - Python Notepad](https://github.com/idanpa/jupad)
* **Target audience:** Developer tool
* **Comparison:** This is somewhere between a Python REPL and a Jupyter notebook. Inspired by notepad calculators ([Soulver](https://soulver.app/), [Numi](https://numi.app/), [Numbr](https://numbr.dev/)), reactive Jupyter notebooks ([marimo](https://github.com/marimo-team/marimo), [ipyflow](https://github.com/ipyflow)) and similar projects ([Hydrogen](https://github.com/nteract/hydrogen)). Based on [qtconsole](https://github.com/jupyter/qtconsole).
/r/Python
https://redd.it/1inlq7x
Getting told “PL/SQL is a better option compared to Python” for report automation
Background: I’ve recently been working on a report automation task using the Python pandas library, but the TI (tech infra) team told me they are currently having issues with pandas on the servers, so I’ve been asked to find alternatives and revise my finished program…
The problem is that while I’m looking for alternatives, I’m getting a lot of options and ideas, not just from my own team but from other teams as well.
One of the senior employees on my team asked me what my Python program does. After I explained the program logic, he basically told me, “You shouldn’t use Python for this task in the first place. You should just use PL/SQL,” because:
1. My team has used PL/SQL for a long time, so most people are more familiar with it.
2. Using PL/SQL avoids the Python library issue.
3. It’s already approved by the company, so there is no need to worry about “getting approvals”.
Maybe this option could work and he is trying to help, but I’m not convinced by his explanation of why PL/SQL is the better option specifically for this report automation task, which requires:
1. Iterating through each row of data, uses
/r/Python
https://redd.it/1inhere