AI & ML Papers
🔥 i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models
📅 Published on Jun 9
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.11289
• PDF: https://arxiv.org/pdf/2606.11289
• Project Page: https://zlab-princeton.github.io/i1/
🤖 Models citing this paper:
• https://huggingface.co/zlab-princeton/i1-3B
📊 Datasets citing this paper:
• https://huggingface.co/datasets/zlab-princeton/i1-captions
• https://huggingface.co/datasets/zlab-princeton/i1-gptedit-tfrecord
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/multimodalart/i1-3B
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#TextToImageModels #DiffusionModels #TextEncoderAdapters #ImageSynthesis #DeepLearningModels
💡 The paper presents a systematic study of text-to-image diffusion models, aiming to identify the design choices and training insights that lead to strong performance. It addresses the lack of fully open models that match state-of-the-art performance, a gap that hinders further research in the field. To that end, the authors ran over 300 controlled experiments, totaling 700K TPU v6e hours, covering modeling and data design choices in text-to-image diffusion training and inference.
The authors systematically examined design decisions such as dataset mixing and text encoder adapters, looking for simple yet effective ways to train strong models. Their empirical findings include the use of equal weighting when mixing curated datasets and the benefits of larger text encoder adapters.
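As a rough illustration of what equal weighting could mean in practice, the sketch below picks a source dataset uniformly at random for every sample, so a small curated set is drawn as often as a large one. This is one plausible reading and not the paper's actual pipeline; the function name `mix_equal` and the dataset names in the usage note are hypothetical.

```python
import random


def mix_equal(datasets, num_samples, seed=0):
    """Draw (dataset_name, example) pairs with equal weight per dataset.

    `datasets` maps a name to a non-empty list of examples. Each draw first
    picks a dataset uniformly at random, regardless of its size, and then
    picks one example uniformly from that dataset.
    This is an assumed interpretation of "equal weighting", not the
    authors' implementation.
    """
    rng = random.Random(seed)
    names = sorted(datasets)  # fixed order keeps the draw reproducible
    mixed = []
    for _ in range(num_samples):
        name = rng.choice(names)
        example = rng.choice(datasets[name])
        mixed.append((name, example))
    return mixed
```

For example, calling `mix_equal({"small_set": [...], "large_set": [...]}, 1000)` would return about as many samples from `small_set` as from `large_set`, even if `large_set` holds far more examples. Size-proportional mixing would instead let the larger dataset dominate.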
These findings led to i1, a 3B-parameter text-to-image diffusion model trained using only publicly available datasets. i1 is competitive with leading models on five representative benchmarks and outperforms the best existing fully open model by 29.5 absolute percentage points on average. The authors release the i1 checkpoints, the training and inference code, and the data processing pipeline, making it a fully open model that can serve as a foundation for future research in text-to-image diffusion models.
Overall, the paper provides a practical foundation for open research in text-to-image diffusion models and highlights the importance of transparency and reproducibility in AI research. With the i1 model, code, and data processing pipeline released, the research community can build on and improve the model.