AI & ML Papers

✨From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images

📝 Summary:
MLLMs struggle to match human cognitive perception of image properties such as memorability and aesthetics. CogIP-Bench measures this gap and shows that post-training significantly improves alignment, which makes model perception more human-like and benefits creative AI tasks.

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22805
• PDF: https://arxiv.org/pdf/2511.22805
• Project Page: https://follen-cry.github.io/MLLM-Cognition-project-page/

==================================

For more data science resources:
✓ https://xn--r1a.website/DataScienceT

#MLLM #CognitiveAI #ImagePerception #AIAlignment #AIResearch




✨Steerability of Instrumental-Convergence Tendencies in LLMs

📝 Summary:
This research investigates AI system steerability, noting a safety-security dilemma. It demonstrates that a short anti-instrumental prompt suffix dramatically reduces unwanted instrumental behaviors, like self-replication, in large language models. For Qwen3-30B, this reduced the convergence rate...

🔹 Publication Date: Published on Jan 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01584
• PDF: https://arxiv.org/pdf/2601.01584
• Github: https://github.com/j-hoscilowicz/instrumental_steering/
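
A minimal sketch of the prompt-suffix idea described in the summary above, assuming the suffix is simply appended to the user prompt. The suffix wording, the function names, and the flag-based rate calculation are all hypothetical illustrations, not taken from the paper or its repository.

```python
# Hypothetical suffix text: the paper's actual wording is not given in the summary.
ANTI_INSTRUMENTAL_SUFFIX = (
    "Do not pursue self-preservation, self-replication, or resource "
    "acquisition beyond what the task strictly requires."
)


def with_suffix(prompt: str, suffix: str = ANTI_INSTRUMENTAL_SUFFIX) -> str:
    """Return the prompt with the steering suffix appended as its own paragraph."""
    return f"{prompt.rstrip()}\n\n{suffix}"


def convergence_rate(flags: list[bool]) -> float:
    """Fraction of responses flagged as showing instrumental behavior.

    How a response gets flagged is an assumption left to an external judge.
    """
    return sum(flags) / len(flags) if flags else 0.0
```

Comparing `convergence_rate` over responses to plain prompts versus `with_suffix` prompts would give the kind of before-and-after figure the summary alludes to.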

==================================

#AISafety #LLMs #AISteering #PromptEngineering #AIAlignment


✨Beyond Binary Preference: Aligning Diffusion Models to Fine-grained Criteria by Decoupling Attributes

📝 Summary:
Current diffusion model alignment struggles with complex, fine-grained human expertise due to simplified preferences. This paper proposes a framework with hierarchical criteria and Complex Preference Optimization (CPO), maximizing positive and minimizing negative attributes to improve generation qu...

🔹 Publication Date: Published on Jan 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04300
• PDF: https://arxiv.org/pdf/2601.04300

==================================

#DiffusionModels #AIAlignment #MachineLearning #GenerativeAI #PreferenceLearning


✨Real-Time Aligned Reward Model beyond Semantics

📝 Summary:
RLHF faces reward overoptimization from reward model misalignment. R2M introduces a new framework that uses real-time policy feedback to dynamically adapt the reward model. This improves alignment by responding to continuous policy distribution shifts beyond just semantics.

🔹 Publication Date: Published on Jan 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.22664
• PDF: https://arxiv.org/pdf/2601.22664

==================================

#ReinforcementLearning #AI #MachineLearning #RewardModels #AIAlignment


✨THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

📝 Summary:
ThinkSafe is a self-aligned framework that enhances safety in large reasoning models. It uses lightweight refusal steering and fine-tuning on self-generated responses to preserve reasoning performance and reduce computational costs. ThinkSafe significantly improves safety without degrading native...

🔹 Publication Date: Published on Jan 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.23143
• PDF: https://arxiv.org/pdf/2601.23143

==================================

#AISafety #LLMs #AIAlignment #MachineLearning #DeepLearning


✨SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization

📝 Summary:
SLIME is a new objective for aligning large language models, addressing 'unlearning' and 'formatting collapse' issues in prior methods. It maximizes preferred response likelihood, stabilizes rejected token probabilities, and uses dual-margin constraints, achieving superior performance and stable ...

🔹 Publication Date: Published on Feb 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.02383
• PDF: https://arxiv.org/pdf/2602.02383

==================================

#LLM #AIAlignment #MachineLearning #NLP #DeepLearning


✨The Truthfulness Spectrum Hypothesis

📝 Summary:
This paper proposes the truthfulness spectrum hypothesis: LLMs contain truth directions ranging from domain-general to domain-specific. While general directions exist, domain-specific ones steer more effectively, with post-training reshaping this geometry to influence behaviors like sycophancy.

🔹 Publication Date: Published on Feb 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.20273
• PDF: https://arxiv.org/pdf/2602.20273
• Github: https://github.com/zfying/truth_spec

==================================

#LLMs #AIResearch #AIAlignment #NLP #Truthfulness


✨Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

📝 Summary:
MOSAIC is a framework aligning agentic models for safe multi-step tool use, employing explicit safety reasoning and refusal. It significantly reduces harmful actions, increases refusal for unsafe tasks, cuts privacy leakage, and preserves benign performance.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03205
• PDF: https://arxiv.org/pdf/2603.03205
• Project Page: https://aradhye2002.github.io/mosaic-agent-safety/

==================================

#AISafety #AIAgents #ResponsibleAI #LLMs #AIAlignment


✨Alignment Makes Language Models Normative, Not Descriptive

📝 Summary:
Aligned language models excel at normative, rule-based behavior prediction but struggle with complex descriptive human strategic interactions. Base models predict real human choices in these games better. This reveals a trade-off in model optimization.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17218
• PDF: https://arxiv.org/pdf/2603.17218

==================================

#LLM #AIAlignment #NormativeAI #GameTheory #AIBehavior


✨Internal Safety Collapse in Frontier Large Language Models

📝 Summary:
Frontier LLMs suffer Internal Safety Collapse, continuously generating harmful content under specific task conditions, even for benign tasks. A new framework triggers this vulnerability, yielding 95% safety failure rates and revealing inherent unsafe capabilities despite alignment efforts.

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23509
• PDF: https://arxiv.org/pdf/2603.23509
• Project Page: https://wuyoscar.github.io/ISC-Bench
• Github: https://github.com/wuyoscar/ISC-Bench

==================================

#AISafety #LLM #AIAlignment #MachineLearning #AIResearch
