AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

📝 Summary:
SVG-T2I enables high-quality text-to-image synthesis directly in the feature domain of a Visual Foundation Model (VFM). This scaled-up framework achieves competitive performance without a variational autoencoder, validating VFM representations for generative tasks.

🔹 Publication Date: Published on Dec 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11749
• PDF: https://arxiv.org/pdf/2512.11749
• Github: https://github.com/KlingTeam/SVG-T2I

🔹 Models citing this paper:
https://huggingface.co/KlingTeam/SVG-T2I

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#TextToImage #DiffusionModels #GenerativeAI #VisualFoundationModels #DeepLearning
Directional Textual Inversion for Personalized Text-to-Image Generation

📝 Summary:
Directional Textual Inversion (DTI) enhances text-to-image personalization by fixing the magnitudes of learned tokens and optimizing only their direction. This prevents the norm inflation of standard Textual Inversion, improving prompt conditioning and enabling smooth interpolation. DTI offers better te...
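
A minimal sketch of a direction-only update, assuming plain gradient descent on one token embedding written as a frozen magnitude times a unit direction. The helper names (`normalize`, `embedding`, `directional_step`), the toy loss, and the norm value 0.4 are hypothetical and do not come from the DTI code base. The gradient is projected onto the tangent space of the unit sphere, a step is taken, and the result is renormalized, so the embedding norm cannot inflate.

```python
import math

def normalize(v):
    """Return v rescaled to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def embedding(direction, fixed_norm):
    """Token embedding = frozen magnitude * unit direction (assumed parameterization)."""
    return [fixed_norm * d for d in direction]

def directional_step(direction, grad, lr, fixed_norm):
    """One direction-only gradient step (hypothetical helper, not the DTI code).

    grad is d(loss)/d(embedding). Since embedding = fixed_norm * direction,
    the gradient w.r.t. the direction is fixed_norm * grad. The radial part
    is removed so the step is tangent to the unit sphere, then the result is
    renormalized, so the embedding norm stays exactly fixed_norm.
    """
    g = [fixed_norm * x for x in grad]
    radial = sum(gi * di for gi, di in zip(g, direction))
    tangent = [gi - radial * di for gi, di in zip(g, direction)]
    return normalize([di - lr * ti for di, ti in zip(direction, tangent)])

if __name__ == "__main__":
    target = normalize([0.0, 1.0, 1.0])  # toy "concept" direction
    d = normalize([1.0, 0.0, 0.0])       # initial token direction
    fixed_norm = 0.4                     # assumed in-distribution token norm
    for _ in range(200):
        # Toy loss -<embedding, target>, whose gradient is -target
        d = directional_step(d, [-t for t in target], lr=0.5, fixed_norm=fixed_norm)
    e = embedding(d, fixed_norm)
    print(round(math.sqrt(sum(x * x for x in e)), 6))  # 0.4: the magnitude never changes
```

Renormalizing after every step, instead of adding a norm penalty, means the magnitude cannot drift regardless of the learning rate.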

🔹 Publication Date: Published on Dec 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13672
• PDF: https://arxiv.org/pdf/2512.13672
• Project Page: https://kunheek.github.io/dti
• Github: https://github.com/kunheek/dti

#TextualInversion #TextToImage #GenerativeAI #DeepLearning #AI
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

📝 Summary:
This paper proposes a framework using a semantic-pixel reconstruction objective to adapt encoder features for generation. It creates a compact, semantically rich latent space, leading to state-of-the-art image reconstruction and improved text-to-image generation and editing.

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17909
• PDF: https://arxiv.org/pdf/2512.17909
• Project Page: https://jshilong.github.io/PS-VAE-PAGE/

#TextToImage #ImageGeneration #DeepLearning #ComputerVision #AIResearch
MineTheGap: Automatic Mining of Biases in Text-to-Image Models

📝 Summary:
MineTheGap automatically finds prompts that cause Text-to-Image models to generate biased outputs. It uses a genetic algorithm and a novel bias score to identify and rank biases, aiming to reduce redundancy and improve output diversity.
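
The following is a toy sketch of a genetic-algorithm search loop over prompts. It assumes prompts are assembled from word slots and evolved by selection, uniform crossover, and mutation. The slot vocabulary and all function names are hypothetical. In particular, `toy_bias_score` is only a placeholder: the paper's bias score would require generating and analyzing images, so a fixed lookup table stands in for it here to keep the loop self-contained.

```python
import random

# Hypothetical prompt slots; a real search space would be much richer.
SLOTS = {
    "subject": ["a doctor", "a nurse", "a CEO", "a teacher"],
    "setting": ["in an office", "at a hospital", "outdoors", "in a classroom"],
    "style": ["photo", "painting", "sketch"],
}
KEYS = list(SLOTS)

def toy_bias_score(prompt):
    """Placeholder fitness. A real score would generate many images for the
    prompt and measure how skewed their attributes are."""
    table = {"a CEO": 0.9, "a nurse": 0.8, "a doctor": 0.6, "a teacher": 0.3}
    bonus = 0.1 if prompt["style"] == "photo" else 0.0
    return table[prompt["subject"]] + bonus

def random_prompt(rng):
    return {k: rng.choice(SLOTS[k]) for k in KEYS}

def crossover(a, b, rng):
    # Uniform crossover: each slot is copied from one parent at random
    return {k: (a if rng.random() < 0.5 else b)[k] for k in KEYS}

def mutate(p, rng, rate=0.2):
    # Each slot is resampled with probability `rate`
    return {k: (rng.choice(SLOTS[k]) if rng.random() < rate else v) for k, v in p.items()}

def render(p):
    return f"{p['style']} of {p['subject']} {p['setting']}"

def mine_biased_prompts(generations=20, pop_size=12, elite=4, seed=0):
    rng = random.Random(seed)
    pop = [random_prompt(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection with elitism: the top prompts survive unchanged
        pop.sort(key=toy_bias_score, reverse=True)
        parents = pop[:elite]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - elite)]
        pop = parents + children
    # Deduplicate by prompt text, then rank by score
    unique = {render(p): toy_bias_score(p) for p in pop}
    return sorted(unique.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for text, score in mine_biased_prompts()[:3]:
        print(f"{score:.2f}  {text}")
```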

🔹 Publication Date: Published on Dec 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13427
• PDF: https://arxiv.org/pdf/2512.13427

#AIbias #TextToImage #GenerativeAI #ResponsibleAI #MachineLearning
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models

📝 Summary:
Text-to-image models struggle with complex spatial reasoning due to sparse prompts. This paper introduces SpatialGenEval, a new benchmark with dense prompts, showing models struggle with higher-order spatial tasks. A new dataset, SpatialT2I, helps fine-tune models for significant performance gain...

🔹 Publication Date: Published on Jan 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.20354
• PDF: https://arxiv.org/pdf/2601.20354
• Github: https://github.com/AMAP-ML/SpatialGenEval

#TextToImage #SpatialReasoning #GenerativeAI #ComputerVision #AIResearch
DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment

📝 Summary:
DenseGRPO addresses sparse rewards in flow matching models by providing dense, step-wise rewards for intermediate denoising steps. It uses these rewards to adaptively calibrate exploration, improving alignment with human preferences in text-to-image generation.
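
A sketch contrasting sparse and dense credit assignment, under GRPO-style group normalization. It assumes, hypothetically, that a step-wise reward is the change in reward of a clean-image estimate between consecutive denoising steps, normalized across the group at each step. The summary does not say how DenseGRPO computes its step-wise rewards or how it calibrates exploration, so both functions and the toy numbers are illustrative only.

```python
from statistics import mean, pstdev

def group_normalize(values, eps=1e-8):
    """GRPO-style advantage: standardize values within a group of samples."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / (sd + eps) for v in values]

def sparse_advantages(final_rewards, num_steps):
    """Sparse setting: every denoising step of a trajectory inherits the
    same advantage, computed only from the final image reward."""
    adv = group_normalize(final_rewards)
    return [[a] * num_steps for a in adv]

def dense_advantages(step_rewards):
    """Dense setting (hypothetical): step_rewards[i][t] is the reward of the
    clean-image estimate of trajectory i before step 0 (t = 0) and after
    each step. The per-step gain is normalized across the group per step."""
    num_steps = len(step_rewards[0]) - 1
    gains = [[traj[t + 1] - traj[t] for t in range(num_steps)] for traj in step_rewards]
    per_step = [group_normalize([g[t] for g in gains]) for t in range(num_steps)]
    # Transpose back to [trajectory][step]
    return [[per_step[t][i] for t in range(num_steps)] for i in range(len(gains))]

if __name__ == "__main__":
    # Toy rewards: trajectories 0 and 1 end equally well but improve at different steps
    step_rewards = [[0.1, 0.5, 0.6], [0.1, 0.2, 0.6], [0.1, 0.3, 0.3]]
    print(sparse_advantages([r[-1] for r in step_rewards], num_steps=2))
    print(dense_advantages(step_rewards))
```

With sparse rewards, trajectories 0 and 1 receive identical per-step advantages. The dense variant credits the step at which each trajectory actually improved.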

🔹 Publication Date: Published on Jan 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.20218
• PDF: https://arxiv.org/pdf/2601.20218

#AI #MachineLearning #ReinforcementLearning #TextToImage #GenerativeAI
Enhancing Spatial Understanding in Image Generation via Reward Modeling

📝 Summary:
Text-to-image models struggle with complex spatial relationships. This paper introduces SpatialScore, a reward model trained on 80k preference pairs, to evaluate and improve spatial accuracy. Used as a reward for reinforcement learning, it significantly enhances spatial understanding in image generation.
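
A minimal sketch of fitting a reward model to preference pairs. It assumes the common Bradley-Terry pairwise loss and a linear scorer over two hand-made features. The features, toy pairs, and helper names are hypothetical, and the summary does not state which loss or architecture SpatialScore actually uses.

```python
import math

def score(w, x):
    """Linear reward: higher should mean better spatial accuracy (toy model)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def bt_loss_and_grad(w, preferred, rejected):
    """Bradley-Terry loss -log sigmoid(s(preferred) - s(rejected)) and its gradient."""
    margin = score(w, preferred) - score(w, rejected)
    p = 1.0 / (1.0 + math.exp(-margin))  # modeled P(preferred beats rejected)
    loss = -math.log(p)
    # d loss / d w = -(1 - p) * (x_preferred - x_rejected)
    grad = [-(1.0 - p) * (a - b) for a, b in zip(preferred, rejected)]
    return loss, grad

def train(pairs, dim, lr=0.5, epochs=100):
    """Plain SGD over (preferred, rejected) feature pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            _, g = bt_loss_and_grad(w, preferred, rejected)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

if __name__ == "__main__":
    # Hypothetical features: [spatial-relation accuracy, aesthetic score]
    pairs = [([0.9, 0.5], [0.2, 0.6]), ([0.8, 0.4], [0.3, 0.4]), ([0.7, 0.7], [0.1, 0.8])]
    w = train(pairs, dim=2)
    print(w)
```

In the toy data the preferred image is always the more spatially accurate one, so the learned weight concentrates on the first feature, even when the rejected image looks slightly better.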

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24233
• PDF: https://arxiv.org/pdf/2602.24233
• Project Page: https://dagroup-pku.github.io/SpatialT2I/
• Github: https://github.com/DAGroup-PKU/SpatialT2I

#ImageGeneration #TextToImage #SpatialAI #RewardModeling #DeepLearning
Conditioned Activation Transport for T2I Safety Steering

📝 Summary:
Current T2I models can generate unsafe content, and linear steering degrades image quality. This paper proposes Conditioned Activation Transport (CAT), which uses geometric conditioning and nonlinear transport maps so that steering activates only in unsafe regions. CAT significantly reduces unsafe content generation ...
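
The toy example below contrasts unconditional linear steering with steering that switches on only in an "unsafe" region. It assumes a hypothetical linear probe that defines that region. A soft gate then blends each activation with its image under a simple piecewise map, here a reflection across the probe's decision boundary. `unsafe_gate`, `transport`, the probe, and the threshold are illustrative stand-ins, not the paper's learned transport maps or geometric conditioning.

```python
import math

def linear_steer(h, v, alpha):
    """Baseline: add the same steering vector to every activation."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]

def unsafe_gate(h, probe, bias, sharpness=10.0):
    """Soft indicator in [0, 1] that h lies on the unsafe side of a
    hypothetical linear probe (<h, probe> > bias)."""
    z = sum(hi * pi for hi, pi in zip(h, probe)) - bias
    return 1.0 / (1.0 + math.exp(-sharpness * z))

def transport(h, probe, bias):
    """Hypothetical map: reflect h across the probe's decision boundary if it
    is on the unsafe side; leave it unchanged otherwise."""
    norm2 = sum(p * p for p in probe)
    z = sum(hi * pi for hi, pi in zip(h, probe)) - bias
    if z <= 0:
        return list(h)
    return [hi - 2.0 * z / norm2 * pi for hi, pi in zip(h, probe)]

def conditioned_steer(h, probe, bias):
    """Blend h toward its transported image only to the extent it is unsafe."""
    g = unsafe_gate(h, probe, bias)
    t = transport(h, probe, bias)
    return [(1 - g) * hi + g * ti for hi, ti in zip(h, t)]

if __name__ == "__main__":
    probe, bias = [1.0, 0.0], 0.5
    safe, unsafe = [0.0, 1.0], [2.0, 1.0]
    print(linear_steer(safe, [-1.0, 0.0], 1.0))   # safe activation is moved anyway
    print(conditioned_steer(safe, probe, bias))    # essentially unchanged
    print(conditioned_steer(unsafe, probe, bias))  # pushed to the safe side
```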

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03163
• PDF: https://arxiv.org/pdf/2603.03163
• Github: https://github.com/NASK-AISafety/conditional-activation-transport

🔹 Datasets citing this paper:
https://huggingface.co/datasets/NASK-PIB/SafeSteerDataset

#AISafety #TextToImage #GenerativeAI #DeepLearning #AIethics
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation

📝 Summary:
CoCo is a code-driven framework for text-to-image generation that uses executable code for precise spatial layout and structured image creation. It significantly outperforms natural-language chain-of-thought (CoT) methods, enabling more controllable and accurate image synthesis.

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08652
• PDF: https://arxiv.org/pdf/2603.08652

#TextToImage #GenerativeAI #AIResearch #CodeDrivenAI #ComputerVision
UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

📝 Summary:
UDM-GRPO integrates Uniform Discrete Diffusion Models with reinforcement learning and solves the resulting training-instability issues. It treats final samples as actions and optimizes over reconstructed trajectories. This achieves state-of-the-art performance on text-to-image generation and OCR tasks.
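
A sketch of the group-relative objective this line of work builds on. It assumes the standard GRPO recipe: rewards for a group of final samples from one prompt are standardized into advantages, which are then used in a PPO-style clipped ratio. Each final sample is treated as a single action. The log-probabilities are made-up numbers. How UDM-GRPO reconstructs trajectories to compute them for a uniform discrete diffusion model is not reproduced here.

```python
import math
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-8):
    """Standardize rewards within the group of samples for one prompt."""
    mu, sd = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]

def clipped_objective(new_logp, old_logp, advantages, clip=0.2):
    """PPO/GRPO clipped surrogate averaged over the group (to be maximized).

    Each entry is the log-probability of one whole final sample under the
    new or old policy, i.e. the final sample is treated as one action."""
    terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1 - clip), 1 + clip)
        terms.append(min(ratio * adv, clipped * adv))
    return mean(terms)

if __name__ == "__main__":
    rewards = [0.9, 0.4, 0.6, 0.1]      # toy rewards for four samples of one prompt
    adv = group_advantages(rewards)
    old = [-12.0, -11.5, -13.0, -12.2]  # made-up sample log-probs
    new = [-11.8, -11.6, -12.9, -12.5]
    print(clipped_objective(new, old, adv))
```

Clipping caps how much a single over-rewarded sample can move the policy in one update. This is the usual motivation for the clipped ratio in GRPO-style training.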

🔹 Publication Date: Published on Apr 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18518
• PDF: https://arxiv.org/pdf/2604.18518
• Project Page: https://yovecent.github.io/UDM-GRPO.github.io/
• Github: https://github.com/Yovecent/UDM-GRPO

🔹 Models citing this paper:
https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-GenEval
https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-PickScore

#DiffusionModels #ReinforcementLearning #GenerativeAI #TextToImage #DeepLearning