AI & ML Papers – Telegram

AI & ML Papers

33.4K subscribers

7.17K photos

556 videos

24 files

7.87K links

Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho

✨PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models

📝 Summary:
PoseDreamer uses diffusion models to generate large-scale, photorealistic synthetic 3D human mesh datasets with improved image quality. Models trained on this data achieve comparable or superior performance to those using real or traditional synthetic datasets, offering a scalable solution.

🔹 Publication Date: Published on Mar 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.28763
• PDF: https://arxiv.org/pdf/2603.28763
• Project Page: https://prosperolo.github.io/posedreamer

==================================

For more data science resources:
✓ https://xn--r1a.website/DataScienceT

#DiffusionModels #SyntheticData #3DGeneration #ComputerVision #AIResearch

✨VOID: Video Object and Interaction Deletion

📝 Summary:
VOID is a video object removal framework designed for complex scenarios involving significant object interactions. It uses vision-language and video diffusion models, leveraging causal reasoning to generate physically plausible counterfactual scenes. VOID better preserves consistent scene dynamic...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02296
• PDF: https://arxiv.org/pdf/2604.02296
• Project Page: https://void-model.github.io/
• Github: https://github.com/Netflix/void-model

🔹 Models citing this paper:
• https://huggingface.co/netflix/void-model

✨ Spaces citing this paper:
• https://huggingface.co/spaces/sam-motamed/VOID

==================================

#VideoEditing #DiffusionModels #ComputerVision #GenerativeAI #DeepLearning

✨RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

📝 Summary:
RefineAnything is a multimodal diffusion model for region-specific image refinement. It fixes local detail collapse while strictly preserving backgrounds using a Focus-and-Refine strategy and boundary-aware loss. This provides a practical solution for high-precision local editing.
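
The summary does not say how the boundary-aware loss or the background preservation are implemented. As a rough illustration only, the sketch below assumes a binary edit mask on a 2D grid. A hypothetical `boundary_band` helper marks pixels on either side of the mask edge, `boundary_weighted_mse` up-weights those pixels, and `composite` copies every pixel outside the mask straight from the original image, so the background cannot change. All names and the weighting scheme are assumptions, not the paper's code.

```python
def boundary_band(mask):
    """Mark pixels (0/1) that have a 4-neighbour with a different mask value,
    i.e. a thin band on both sides of the edit-region edge."""
    h, w = len(mask), len(mask[0])
    band = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and mask[ni][nj] != mask[i][j]:
                    band[i][j] = 1
                    break
    return band


def boundary_weighted_mse(pred, target, mask, boundary_weight=4.0):
    """Weighted MSE over the mask interior plus the boundary band;
    band pixels count boundary_weight times, interior pixels once."""
    band = boundary_band(mask)
    total, norm = 0.0, 0.0
    for i in range(len(mask)):
        for j in range(len(mask[0])):
            if mask[i][j] or band[i][j]:
                w = boundary_weight if band[i][j] else 1.0
                total += w * (pred[i][j] - target[i][j]) ** 2
                norm += w
    return total / norm if norm else 0.0


def composite(original, refined, mask):
    """Paste the refined region back; pixels outside the mask come from the original."""
    return [[r if m else o for o, r, m in zip(orow, rrow, mrow)]
            for orow, rrow, mrow in zip(original, refined, mask)]
```

Weighting the band more heavily is one simple way to penalise seams where the refined region meets the untouched background.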

🔹 Publication Date: Published on Apr 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.06870
• PDF: https://arxiv.org/pdf/2604.06870
• Project Page: https://limuloo.github.io/RefineAnything/
• Github: https://github.com/limuloo/RefineAnything

==================================

#DiffusionModels #ImageEditing #ComputerVision #DeepLearning #GenerativeAI

✨Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

📝 Summary:
Matrix-Game 3.0 is a memory-augmented diffusion model achieving real-time 720p interactive video generation with long-term temporal consistency. It uses an advanced data engine, a self-correction training framework with memory, and efficient inference strategies. This enables practical, industria...

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08995
• PDF: https://arxiv.org/pdf/2604.08995
• Project Page: https://matrix-game-v3.github.io/

==================================

#DiffusionModels #VideoGeneration #RealTimeAI #GenerativeAI #MachineLearning

✨CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation

📝 Summary:
CT-1 is a Vision-Language-Camera model that improves camera-controllable video generation. It uses a Diffusion Transformer and Wavelet Regularization Loss to accurately estimate camera trajectories, enabling precise video synthesis. This achieves 25.7% better accuracy than prior methods.
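
The summary names a Wavelet Regularization Loss but gives no formula. A minimal sketch of one plausible form, stated as an assumption: apply a one-level Haar transform to each camera-parameter channel over frames and penalise the L1 gap between predicted and ground-truth coefficients, so both the smooth path (approximation) and frame-to-frame jitter (detail) are constrained. `haar_1d`, `wavelet_reg_loss` and `detail_weight` are hypothetical names.

```python
import math


def haar_1d(signal):
    """One-level orthonormal Haar transform, returning (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    pairs = [(signal[k], signal[k + 1]) for k in range(0, len(signal), 2)]
    approx = [s * (a + b) for a, b in pairs]  # low-frequency trend
    detail = [s * (a - b) for a, b in pairs]  # high-frequency jitter
    return approx, detail


def wavelet_reg_loss(pred_channels, gt_channels, detail_weight=1.0):
    """Mean L1 gap between Haar coefficients of predicted and ground-truth
    trajectories; each channel is one camera parameter over frames."""
    total = 0.0
    for pred, gt in zip(pred_channels, gt_channels):
        pa, pd = haar_1d(pred)
        ga, gd = haar_1d(gt)
        total += sum(abs(p - g) for p, g in zip(pa, ga)) / len(pa)
        total += detail_weight * sum(abs(p - g) for p, g in zip(pd, gd)) / len(pd)
    return total / len(pred_channels)
```

Raising `detail_weight` would push the model harder toward smooth, jitter-free trajectories.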

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09201
• PDF: https://arxiv.org/pdf/2604.09201
• Project Page: https://gulucaptain.github.io/Camera-Transformer-1/
• Github: https://github.com/gulucaptain/Camera-Transformer-1

==================================

#AI #VideoGeneration #ComputerVision #DiffusionModels #VisionLanguageModels

✨MixFlow: Mixed Source Distributions Improve Rectified Flows

📝 Summary:
Rectified flows and diffusion models are improved through a κ-FC formulation that conditions the source distribution and a MixFlow training strategy that reduces generative path curvatures and enhances sa...

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09181
• PDF: https://arxiv.org/pdf/2604.09181
• Github: https://github.com/NazirNayal8/MixFlow

==================================

#RectifiedFlows #DiffusionModels #GenerativeAI #MachineLearning #AIResearch

✨Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

📝 Summary:
Uni-ViGU introduces a unified framework for video generation and understanding, uniquely building upon a video generator as its foundation. It uses unified flow matching and a bidirectional training mechanism to achieve competitive performance in both generation and understanding tasks.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08121
• PDF: https://arxiv.org/pdf/2604.08121
• Project Page: https://fr0zencrane.github.io/uni-vigu-page/
• Github: https://fr0zencrane.github.io/uni-vigu-page/

==================================

#VideoGeneration #VideoUnderstanding #DiffusionModels #AIResearch #DeepLearning

✨Domain-Specific Latent Representations Improve the Fidelity of Diffusion-Based Medical Image Super-Resolution

📝 Summary:
Domain-specific autoencoders significantly enhance medical image super-resolution. Replacing generic VAEs with them improves fidelity, showing that the choice of autoencoder, not the diffusion architecture, is the key factor. Autoencoder performance predicts overall SR quality.

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12152
• PDF: https://arxiv.org/pdf/2604.12152
• Github: https://github.com/sebasmos/latent-sr

==================================

#MedicalImaging #SuperResolution #DiffusionModels #DeepLearning #Autoencoders

✨Repurposing 3D Generative Model for Autoregressive Layout Generation

📝 Summary:
LaviGen is a 3D layout generation framework that repurposes 3D generative models. It uses an adapted 3D diffusion model for autoregressive generation, explicitly modeling geometric relations and physical constraints. This achieves superior, more plausible 3D layouts 65% faster than previous methods.

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.16299
• PDF: https://arxiv.org/pdf/2604.16299
• Project Page: https://fenghora.github.io/LaviGen-Page/
• Github: https://github.com/fenghora/LaviGen

==================================

#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning

✨Hierarchical Codec Diffusion for Video-to-Speech Generation

📝 Summary:
HiCoDiT generates speech from videos by leveraging the hierarchical structure of discrete speech tokens, achieving better audio-visual alignment through coarse-to-fine conditioning with dual-scale nor...

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.15923
• PDF: https://arxiv.org/pdf/2604.15923

==================================

#VideoToSpeech #DiffusionModels #GenerativeAI #SpeechSynthesis #DeepLearning

✨UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

📝 Summary:
UDM-GRPO integrates Uniform Discrete Diffusion Models with reinforcement learning, solving training instability issues. It optimizes using final samples as actions and reconstructed trajectories. This achieves state-of-the-art performance in text-to-image generation and OCR tasks.
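
UDM-GRPO's exact objective is not given in the summary. The sketch below shows only the standard GRPO ingredients it builds on: rewards for a group of samples from the same prompt are normalised by the group mean and standard deviation, and the resulting advantage drives a PPO-style clipped surrogate. It assumes one scalar log-probability per final sample (per the summary, UDM-GRPO treats the final sample as the action and works with reconstructed trajectories; how that log-probability is computed is not shown). Function names are illustrative.

```python
import math
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: each reward relative to its group's mean, scaled by the group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantages, clip=0.2):
    """PPO/GRPO clipped objective over per-sample log-probs, negated to give a loss."""
    total = 0.0
    for ln, lo, a in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # importance ratio between new and old policy
        clipped = max(min(ratio, 1.0 + clip), 1.0 - clip)
        total += min(ratio * a, clipped * a)
    return -total / len(advantages)
```

Because advantages are relative within a group, no learned value function is needed; a group where every sample scores the same contributes no gradient.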

🔹 Publication Date: Published on Apr 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18518
• PDF: https://arxiv.org/pdf/2604.18518
• Project Page: https://yovecent.github.io/UDM-GRPO.github.io/
• Github: https://github.com/Yovecent/UDM-GRPO

🔹 Models citing this paper:
• https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-GenEval
• https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-PickScore

==================================

#DiffusionModels #ReinforcementLearning #GenerativeAI #TextToImage #DeepLearning

✨dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

📝 Summary:
dWorldEval proposes a scalable robotics policy evaluation method using a discrete diffusion world model. It unifies diverse modalities into a token space, employing a transformer and progress token for success detection. This approach significantly outperforms prior methods, enabling large-scale ...

🔹 Publication Date: Published on Apr 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.22152
• PDF: https://arxiv.org/pdf/2604.22152

==================================

#Robotics #DiffusionModels #WorldModels #AI #MachineLearning

✨DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction

📝 Summary:
DiffNR enhances sparse-view CT reconstruction with neural representations by employing SliceFixer, a single-step diffusion model. It corrects artifacts via pseudo-reference volumes, offering 3D supervision for better accuracy and efficient optimization, with a 3.99 dB PSNR gain.
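
To put the 3.99 dB figure in perspective: PSNR is 10·log10(MAX²/MSE), so a fixed dB gain corresponds to a fixed multiplicative drop in MSE. The small helper below (names are illustrative) computes that factor for a single reconstruction; reported PSNR gains are usually averaged over volumes, so treat this as an approximation.

```python
import math


def psnr(mse, max_val=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_factor_for_gain(gain_db):
    """Factor by which the MSE must shrink to raise PSNR by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


factor = mse_factor_for_gain(3.99)
print(f"A 3.99 dB gain multiplies the MSE by about {factor:.2f}")
```

The factor is roughly 0.40, i.e. about a 60% reduction in mean squared error.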

🔹 Publication Date: Published on Apr 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.21518
• PDF: https://arxiv.org/pdf/2604.21518
• Project Page: https://ooonesevennn.github.io/DiffNR/
• Github: https://github.com/ooonesevennn/DiffNR

==================================

#3DReconstruction #DiffusionModels #NeuralNetworks #CTReconstruction #DeepLearning

🔥 SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models

💡 The paper introduces a new post-training method called SOAR for diffusion models, which addresses the gap between supervised fine-tuning and reinforcement learning. Currently, supervised fine-tuning optimizes the denoiser only on ground-truth states, but once inference deviates from these ideal states, it relies on out-of-distribution generalization rather than learned correction, leading to exposure bias. Reinforcement learning can address this mismatch, but its terminal reward signal is sparse and suffers from credit-assignment difficulty.

SOAR proposes a bias-correction post-training method that fills this gap by providing dense, reward-free supervision through self-correction mechanisms. The method starts from a real sample, performs a single stop-gradient rollout with the current model, re-noises the resulting off-trajectory state, and supervises the model to steer back toward the original clean target. This approach is on-policy, reward-free, and provides dense per-timestep supervision with no credit-assignment problem.
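
A minimal sketch of how such a self-correction target could be built, under assumptions that are not confirmed by the summary: a rectified-flow convention x_t = (1 − t)·x0 + t·ε with velocity target ε − x0, a one-step "rollout" that turns the model's velocity into a clean-sample estimate, and re-noising that estimate with fresh noise at the same level t. The supervision target is the velocity whose clean-sample prediction lands back on the original x0. `soar_target`, `soar_loss` and the `model(x, t)` callable are hypothetical; vectors are plain lists.

```python
def soar_target(x0, eps, eps_new, t, model):
    """Build an off-trajectory state and the velocity that steers it back to x0.

    Assumes x_t = (1 - t) * x0 + t * eps and velocity v = eps - x0, so a state x
    at level t with velocity v predicts the clean sample x - t * v.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("t must be in (0, 1]")
    # 1) Noise the real sample onto the ground-truth path.
    x_t = [(1 - t) * a + t * e for a, e in zip(x0, eps)]
    # 2) Single rollout with the current model (treated as a constant, i.e. stop-gradient).
    v_hat = model(x_t, t)
    x0_hat = [xt - t * v for xt, v in zip(x_t, v_hat)]
    # 3) Re-noise the model's own estimate: an off-trajectory state at level t.
    x_off = [(1 - t) * a + t * e for a, e in zip(x0_hat, eps_new)]
    # 4) Target velocity whose clean-sample prediction is the ORIGINAL x0.
    v_target = [(xo - a) / t for xo, a in zip(x_off, x0)]
    return x_off, v_target


def soar_loss(x0, eps, eps_new, t, model):
    """Mean squared error between the model's velocity at x_off and the corrective target."""
    x_off, v_target = soar_target(x0, eps, eps_new, t, model)
    v_pred = model(x_off, t)  # in real training only this call would carry gradients
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(x0)
```

Every timestep yields its own target, which is what makes the supervision dense and free of credit assignment; no reward model appears anywhere.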

The results show that SOAR improves diffusion models across several evaluations. On SD3.5-Medium, SOAR improves the GenEval score from 0.70 to 0.78 and the OCR score from 0.64 to 0.67 over supervised fine-tuning. Additionally, SOAR surpasses Flow-GRPO in final metric value on both aesthetic and text-image alignment tasks, despite having no access to a reward model. The paper concludes that SOAR can directly replace supervised fine-tuning as a stronger first post-training stage after pretraining, while remaining fully compatible with subsequent reinforcement learning alignment.

📅 Published on Apr 14

🔗 Links:
• arXiv: https://arxiv.org/abs/2604.12617
• PDF: https://arxiv.org/pdf/2604.12617
• Project Page: https://hy-soar.github.io/
• GitHub: https://github.com/Tencent-Hunyuan/HY-SOAR ⭐ 350

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#DiffusionModels #SelfCorrectionTechniques #OptimalAlignmentMethods #RefinementInAI #PostTrainingMethods

🔥 D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

💡 The paper introduces D-OPSD, a new training approach for diffusion models that enables efficient supervised fine-tuning while preserving few-step inference capabilities. High-performance image generation is shifting from inefficient multi-step models to efficient few-step models, but these few-step models are hard to fine-tune: traditional fine-tuning methods compromise their inherent few-step inference capability.

To address this issue, the authors propose D-OPSD, which leverages on-policy self-distillation with text and multimodal features. The method works by making the model act as both the teacher and the student, where the student is conditioned only on the text feature, and the teacher is conditioned on the multimodal feature of both the text prompt and the target image. The training process minimizes the difference between the predicted distributions over the student's own roll-outs, allowing the model to learn new concepts and styles without sacrificing its original few-step capacity.
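
A minimal sketch of the on-policy self-distillation loop described above, under stated assumptions: a generic `model(x, t, cond)` callable that returns a velocity, a mean-squared error standing in for the paper's "difference between predicted distributions", and plain Euler steps along the student's own trajectory. The same model plays student (text only) and teacher (text plus image). `d_opsd_loss` and its arguments are hypothetical.

```python
def d_opsd_loss(model, text, image, noise, timesteps):
    """Average student/teacher mismatch along the student's own few-step roll-out.

    model(x, t, cond) -> velocity list; cond is a tuple of conditioning features.
    timesteps runs from 1.0 (noise) down to 0.0 (clean), e.g. [1.0, 0.5, 0.0].
    """
    x = list(noise)
    total = 0.0
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        student = model(x, t, (text,))         # conditioned on the text feature only
        teacher = model(x, t, (text, image))   # same weights, multimodal condition (stop-gradient in training)
        total += sum((s - q) ** 2 for s, q in zip(student, teacher)) / len(x)
        # On-policy: advance along the student's own prediction (Euler step).
        x = [xi + (t_next - t) * s for xi, s in zip(x, student)]
    return total / (len(timesteps) - 1)
```

Because the states come from the student's own few-step roll-out, training stays on the distribution the model actually visits at inference time.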

The key contribution of D-OPSD is that it enables on-policy learning during supervised fine-tuning, which allows the model to learn from its own trajectory and under its own supervision. This approach enables the model to inherit the in-context capabilities of its encoder, making it possible to fine-tune the model continuously without compromising its few-step inference capability. The results show that D-OPSD enables efficient supervised fine-tuning for diffusion models, making it a promising approach for high-performance image generation models.

📅 Published on May 6

🔗 Links:
• arXiv: https://arxiv.org/abs/2605.05204
• PDF: https://arxiv.org/pdf/2605.05204
• Project Page: https://vvvvvjdy.github.io/d-opsd/
• GitHub: https://github.com/vvvvvjdy/D-OPSD ⭐ 24

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#DiffusionModels #SelfDistillation #FewShotLearning #ImageGeneration #MultimodalLearning

🔥 MARBLE: Multi-Aspect Reward Balance for Diffusion RL

💡 The paper introduces MARBLE, a novel gradient-space optimization framework for multi-reward reinforcement learning fine-tuning of diffusion models. The problem addressed is that existing methods for handling multiple rewards either train separate models for each reward or use a weighted-sum reward aggregation, which can lead to poor performance due to sample-level mismatch. This mismatch occurs because most rollouts are highly informative for certain reward dimensions but irrelevant for others, causing the weighted summation to dilute their supervision.

To address this issue, MARBLE maintains independent advantage estimators for each reward and computes per-reward policy gradients. These gradients are then harmonized into a single update direction without manual reward weighting, by solving a quadratic programming problem. This approach allows for a unified model that can be jointly trained on all rewards, eliminating the need for heavy manual tuning and sequential training.
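
The summary does not give MARBLE's exact quadratic program. As a stand-in, the sketch below uses the well-known min-norm problem from MGDA (a swapped-in technique, not MARBLE's confirmed formulation): find simplex weights α minimising ‖Σ αᵢ gᵢ‖², solved with Frank-Wolfe and exact line search. It assumes the per-reward gradients gᵢ have already been computed from independent per-reward advantages. All helper names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def min_norm_weights(grads, iters=100, tol=1e-12):
    """Frank-Wolfe solver for min ||sum_i alpha_i g_i||^2 over the probability simplex."""
    k = len(grads)
    gram = [[dot(gi, gj) for gj in grads] for gi in grads]
    alpha = [1.0 / k] * k
    for _ in range(iters):
        g_alpha = [dot(row, alpha) for row in gram]  # half the objective's gradient
        i = min(range(k), key=lambda j: g_alpha[j])  # best simplex vertex
        if dot(alpha, g_alpha) - g_alpha[i] < tol:   # Frank-Wolfe gap ~ 0: optimal
            break
        d = [(1.0 if j == i else 0.0) - alpha[j] for j in range(k)]
        g_d = [dot(row, d) for row in gram]
        curvature = dot(d, g_d)
        if curvature <= tol:
            break
        gamma = min(1.0, max(0.0, -dot(alpha, g_d) / curvature))  # exact line search
        alpha = [a + gamma * dj for a, dj in zip(alpha, d)]
    return alpha


def combine(grads, alpha):
    """Weighted sum of per-reward gradients: the single update direction."""
    return [sum(a * g[n] for a, g in zip(alpha, grads)) for n in range(len(grads[0]))]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

With conflicting gradients such as [1, 0] and [−2, 1], an equal-weight sum has a negative cosine with the first reward's gradient, whereas the min-norm combination has a positive cosine with both.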

The authors also propose an amortized formulation that reduces the computational cost of MARBLE, making it more efficient. Additionally, they use exponential moving average smoothing on the balancing coefficients to stabilize updates against transient fluctuations.
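
The exponential-moving-average smoothing can be sketched as follows, assuming the balancing coefficients are simplex weights produced at each step; `beta` and both helper names are illustrative rather than taken from the paper.

```python
def ema_update(prev, new, beta=0.9):
    """Blend the previous balancing coefficients with freshly solved ones."""
    mixed = [beta * p + (1.0 - beta) * n for p, n in zip(prev, new)]
    total = sum(mixed)
    return [m / total for m in mixed]  # renormalise so the weights stay on the simplex


def smooth_history(raw_coeffs, beta=0.9):
    """Apply ema_update across a sequence of per-step coefficient vectors."""
    current = list(raw_coeffs[0])
    history = [current]
    for raw in raw_coeffs[1:]:
        current = ema_update(current, raw, beta)
        history.append(current)
    return history
```

A one-step spike from [0.5, 0.5] to [1.0, 0.0] only moves the smoothed weights to [0.55, 0.45], so a transient fluctuation cannot swing the update direction.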

The results show that MARBLE improves all five reward dimensions simultaneously on SD3.5 Medium, outperforming the baseline method. Specifically, MARBLE turns the worst-aligned reward's gradient cosine from negative to consistently positive, meaning the shared update no longer works against that reward. Furthermore, MARBLE trains at nearly the same speed as the baseline, with only a 3% slowdown. Overall, MARBLE provides a more effective and efficient approach to multi-reward reinforcement learning fine-tuning of diffusion models.

📅 Published on May 7

🔗 Links:
• arXiv: https://arxiv.org/abs/2605.06507
• PDF: https://arxiv.org/pdf/2605.06507
• Project Page: https://aim-uofa.github.io/MARBLE/
• GitHub: https://github.com/aim-uofa/MARBLE ⭐ 24

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#MultiRewardReinforcementLearning #DiffusionModels #GradientSpaceOptimization #MultiAspectRewardBalance #ReinforcementLearningFineTuning

🔥 i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models

💡 The paper presents a comprehensive study of text-to-image diffusion models, aiming to identify key design choices and training insights that lead to strong model performance. The problem addressed is the lack of fully open models that match the performance of state-of-the-art models, which hinders further research in the field. To tackle this, the authors conducted over 300 controlled experiments, totaling 700K TPU v6e hours, to investigate modeling and data design choices in text-to-image diffusion training and inference.

The authors systematically investigated design decisions such as dataset mixing and text encoder adapters to identify simple yet effective ways to train strong models. Their empirical findings include using equal weighting when mixing curated datasets and the benefits of larger text encoder adapters.
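
To make "equal weighting" concrete, a minimal sketch, assuming it means each curated dataset is drawn with probability 1/k regardless of its size (the contrast is size-proportional sampling from the pooled data). Dataset names and both functions are hypothetical.

```python
import random


def equal_weight_mixture(datasets, n, seed=0):
    """Draw n examples, choosing each dataset with probability 1/len(datasets)."""
    rng = random.Random(seed)
    names = list(datasets)
    out = []
    for _ in range(n):
        name = rng.choice(names)  # dataset first, ignoring its size
        out.append((name, rng.choice(datasets[name])))
    return out


def size_proportional_mixture(datasets, n, seed=0):
    """Draw n examples uniformly from the pooled data, so large datasets dominate."""
    rng = random.Random(seed)
    pool = [(name, ex) for name, exs in datasets.items() for ex in exs]
    return [rng.choice(pool) for _ in range(n)]
```

With a 10-example and a 1,000-example dataset, equal weighting gives each roughly half the draws, while size-proportional sampling gives the small one about 1%.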

The results of the study led to the development of i1, a 3B-parameter text-to-image diffusion model trained using only publicly available datasets. The i1 model is competitive with leading models on five representative benchmarks and outperforms the best existing fully open model by 29.5 absolute percentage points on average. The authors provide the i1 checkpoints, training and inference code, and the data processing pipeline, making it a fully open model that can serve as a foundation for future research in text-to-image diffusion models.

Overall, the paper contributes to the field by providing a practical foundation for open research in text-to-image diffusion models, highlighting the importance of transparency and reproducibility in AI research. The release of the i1 model and its associated code and data processing pipeline enables the research community to build upon and improve the model, driving further progress in the field.

📅 Published on Jun 9

🔗 Links:
• arXiv: https://arxiv.org/abs/2606.11289
• PDF: https://arxiv.org/pdf/2606.11289
• Project Page: https://zlab-princeton.github.io/i1/

🤖 Models citing this paper:
• https://huggingface.co/zlab-princeton/i1-3B

📊 Datasets citing this paper:
• https://huggingface.co/datasets/zlab-princeton/i1-captions
• https://huggingface.co/datasets/zlab-princeton/i1-gptedit-tfrecord

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/multimodalart/i1-3B

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#TextToImageModels #DiffusionModels #TextEncoderAdapters #ImageSynthesis #DeepLearningModels

🔥 Terrain Diffusion: A Diffusion-Based Successor to Perlin Noise in Infinite, Real-Time Terrain Generation

💡 The paper introduces Terrain Diffusion, a new method for generating realistic, infinite procedural worlds in real time. The current standard, Perlin noise, is fast and infinite but lacks realism and large-scale coherence.

Terrain Diffusion uses diffusion models and a novel algorithm called InfiniteDiffusion to address these limitations. InfiniteDiffusion enables seamless, real-time synthesis of boundless landscapes by coupling planetary context with local detail through a hierarchical stack of diffusion models. The method also uses a compact Laplacian encoding to stabilize outputs across large dynamic ranges, an open-source infinite-tensor framework that supports constant-memory manipulation of unbounded tensors, and few-step consistency distillation for efficient generation.

The results show that Terrain Diffusion can synthesize entire planets coherently, controllably, and without limits. The method provides constant-time random access, seamless infinite extent, and seed consistency, making it a suitable successor to Perlin noise and a practical foundation for procedural world generation.
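
Constant-time random access and seed consistency are the properties Perlin-style generators get from hashing lattice coordinates. One common way to obtain them, sketched here as an assumption rather than as Terrain Diffusion's actual code, is to derive each tile's starting noise from a hash of the world seed and the tile coordinates, so any tile can be regenerated independently and in any order. `tile_seed` and `tile_noise` are hypothetical; the hierarchical denoising that would follow is not shown.

```python
import hashlib
import random


def tile_seed(world_seed, tx, ty):
    """Derive a 64-bit seed for tile (tx, ty), independent of visit order.

    hashlib is used instead of built-in hash(), which is randomised per process for str.
    """
    key = f"{world_seed}:{tx}:{ty}".encode()
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "little")


def tile_noise(world_seed, tx, ty, size=4):
    """Gaussian noise for one tile; the same (seed, tx, ty) always yields the same grid."""
    rng = random.Random(tile_seed(world_seed, tx, ty))
    return [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
```

Any tile, even one a billion tiles from the origin, costs the same to produce and never depends on which neighbours were generated first.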

📅 Published on Dec 9, 2025

🔗 Links:
• arXiv: https://arxiv.org/abs/2512.08309
• PDF: https://arxiv.org/pdf/2512.08309
• Project Page: https://xandergos.github.io/terrain-diffusion/

🤖 Models citing this paper:
• https://huggingface.co/xandergos/terrain-diffusion-30m
• https://huggingface.co/xandergos/terrain-diffusion-90m
• https://huggingface.co/xandergos/TerrainDiffusion-Consistency-Base-192x3

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#ProceduralTerrainGeneration #InfiniteWorlds #DiffusionModels #RealTimeTerrainSynthesis #PerlinNoiseAlternatives

🔥 Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling

💡 The paper proposes MrFlow, a training-free acceleration strategy for text-to-image diffusion models. Existing multi-resolution generation strategies can produce noticeable blurring or artifacts, caused by upsampling in the latent space and by selectively modifying only partial regions.

MrFlow addresses this with a staged low-to-high-resolution pipeline. It first generates the main structure at low resolution, then performs super-resolution in pixel space with a lightweight pretrained model, injects low-strength noise to enable high-frequency resampling, and finally refines the details at high resolution.

The results show that MrFlow achieves a 10x end-to-end acceleration while maintaining image quality, with only a 1 percent performance gap compared to the original model. MrFlow can also be combined with other acceleration strategies, such as timestep distillation, to reach up to 25x acceleration. Because it requires no training or runtime modifications, MrFlow is a hardware-agnostic and efficient way to accelerate text-to-image diffusion models.
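
A minimal sketch of the staged schedule described above, under stated assumptions: both stages are flow-matching samplers integrated with plain Euler steps, "low-strength noise" is the interpolation (1 − s)·x + s·ε at a hypothetical level s = 0.3, and `upsample` is any callable standing in for MrFlow's pixel-space super-resolution model (latent encoding and decoding are omitted). Step counts and every name here are illustrative.

```python
def euler_sample(velocity, x, t_start, t_end, steps):
    """Integrate dx/dt = velocity(x, t) from t_start to t_end with fixed Euler steps."""
    dt = (t_end - t_start) / steps
    for k in range(steps):
        t = t_start + k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def renoise(x_clean, noise, s):
    """Flow-matching interpolation: push a clean (upsampled) image back to noise level s."""
    return [(1 - s) * a + s * e for a, e in zip(x_clean, noise)]


def staged_sample(low_velocity, high_velocity, upsample, low_noise, high_noise,
                  s=0.3, low_steps=20, high_steps=6):
    """Low-res structure -> upsample -> light re-noising -> short high-res refinement."""
    x_low = euler_sample(low_velocity, low_noise, 1.0, 0.0, low_steps)
    x_up = upsample(x_low)
    x_s = renoise(x_up, high_noise, s)
    return euler_sample(high_velocity, x_s, s, 0.0, high_steps)
```

The saving comes from the high-resolution model only integrating the short interval from s down to 0, with the full trajectory handled at the cheaper low resolution.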

📅 Published on Jul 2

🔗 Links:
• arXiv: https://arxiv.org/abs/2607.01642
• PDF: https://arxiv.org/pdf/2607.01642

🤖 Models citing this paper:
• https://huggingface.co/Xingyu-Zheng/MrFlow

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/Xingyu-Zheng/mrflow-fast-diffusion

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#DiffusionModels #TextToImageSynthesis #MultiResolutionGeneration #StagedSampling #SuperResolutionTechniques

🔥 Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

💡 The paper introduces Hunyuan3D 2.0, a large-scale 3D synthesis system for generating high-resolution textured 3D assets. The system consists of two main components: Hunyuan3D-DiT, a shape generation model, and Hunyuan3D-Paint, a texture synthesis model. The shape generation model uses a scalable flow-based diffusion transformer to create geometry that aligns with a given condition image. The texture synthesis model uses strong geometric and diffusion priors to produce high-resolution and vibrant texture maps for generated or hand-crafted meshes.

The authors also introduce Hunyuan3D-Studio, a user-friendly production platform that simplifies the re-creation process of 3D assets, allowing both professional and amateur users to manipulate or animate their meshes efficiently. The system is evaluated and compared to previous state-of-the-art models, showing that Hunyuan3D 2.0 outperforms them in terms of geometry details, condition alignment, and texture quality.

The main contributions of the paper are the development of a scalable and efficient 3D synthesis system, the introduction of a user-friendly production platform, and the public release of the code and pre-trained weights of the models. The system aims to fill the gaps in the open-source 3D community for large-scale foundation generative models, providing a valuable resource for researchers and developers. Overall, the paper presents a significant advancement in the field of 3D synthesis, enabling the generation of high-quality textured 3D assets with improved geometry and texture details.

📅 Published on Jan 21, 2025

🔗 Links:
• arXiv: https://arxiv.org/abs/2501.12202
• PDF: https://arxiv.org/pdf/2501.12202
• Project Page: https://huggingface.co/alluriaiprojects

🤖 Models citing this paper:
• https://huggingface.co/tencent/Hunyuan3D-2
• https://huggingface.co/tencent/Hunyuan3D-2.1
• https://huggingface.co/tencent/Hunyuan3D-2mv

📊 Datasets citing this paper:
• https://huggingface.co/datasets/tencent/HY3D-Bench

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/tencent/Hunyuan3D-2
• https://huggingface.co/spaces/frogleo/Image-to-3D
• https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Image-to-3D

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#DiffusionModels #3DAssetGeneration #Textured3DModeling #GeometrySynthesis #3DSynthesisSystems
