AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images

📝 Summary:
MLLMs struggle to match human cognitive perception of image properties such as memorability and aesthetics. CogIP-Bench evaluates this gap and shows that post-training significantly improves alignment. This enhances human-like perception and improves creative AI tasks.

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22805
• PDF: https://arxiv.org/pdf/2511.22805
• Project Page: https://follen-cry.github.io/MLLM-Cognition-project-page/

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#MLLM #CognitiveAI #ImagePerception #AIAlignment #AIResearch
Steerability of Instrumental-Convergence Tendencies in LLMs

📝 Summary:
This research investigates AI system steerability, noting a safety-security dilemma. It demonstrates that a short anti-instrumental prompt suffix dramatically reduces unwanted instrumental behaviors, like self-replication, in large language models. For Qwen3-30B, this reduced the convergence rate...

🔹 Publication Date: Published on Jan 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01584
• PDF: https://arxiv.org/pdf/2601.01584
• Github: https://github.com/j-hoscilowicz/instrumental_steering/

#AISafety #LLMs #AISteering #PromptEngineering #AIAlignment
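
The prompt-suffix idea in the summary above can be sketched in a few lines. This is only an illustration: the suffix wording, the helper names `with_suffix` and `convergence_rate`, and the definition of the rate as "fraction of flagged responses" are assumptions, not taken from the paper or its repository.

```python
# Hypothetical suffix text; the wording used in the paper is not given in the summary.
ANTI_INSTRUMENTAL_SUFFIX = (
    "Do not acquire resources, copy yourself, or resist shutdown "
    "beyond what the task strictly requires."
)


def with_suffix(prompt: str, suffix: str = ANTI_INSTRUMENTAL_SUFFIX) -> str:
    """Return the prompt with the steering suffix appended as its own paragraph."""
    return f"{prompt.rstrip()}\n\n{suffix}"


def convergence_rate(flags: list[bool]) -> float:
    """Fraction of responses flagged as instrumental (assumed definition)."""
    return sum(flags) / len(flags) if flags else 0.0


# Usage: build the steered prompt, then compare flag rates with and without it.
steered = with_suffix("Plan the deployment of the service.  ")
baseline_rate = convergence_rate([True, True, False, True])
steered_rate = convergence_rate([False, False, False, True])
```

How responses get flagged (a judge model, keyword rules, human review) is left open here; the summary does not say.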
Beyond Binary Preference: Aligning Diffusion Models to Fine-grained Criteria by Decoupling Attributes

📝 Summary:
Current diffusion model alignment struggles with complex, fine-grained human expertise due to simplified preferences. This paper proposes a framework with hierarchical criteria and Complex Preference Optimization (CPO), maximizing positive and minimizing negative attributes to improve generation qu...

🔹 Publication Date: Published on Jan 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04300
• PDF: https://arxiv.org/pdf/2601.04300

#DiffusionModels #AIAlignment #MachineLearning #GenerativeAI #PreferenceLearning
Real-Time Aligned Reward Model beyond Semantics

📝 Summary:
RLHF faces reward overoptimization from reward model misalignment. R2M introduces a new framework that uses real-time policy feedback to dynamically adapt the reward model. This improves alignment by responding to continuous policy distribution shifts beyond just semantics.

🔹 Publication Date: Published on Jan 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.22664
• PDF: https://arxiv.org/pdf/2601.22664

#ReinforcementLearning #AI #MachineLearning #RewardModels #AIAlignment
THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

📝 Summary:
ThinkSafe is a self-aligned framework that enhances safety in large reasoning models. It uses lightweight refusal steering and fine-tuning on self-generated responses to preserve reasoning performance and reduce computational costs. ThinkSafe significantly improves safety without degrading native...

🔹 Publication Date: Published on Jan 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.23143
• PDF: https://arxiv.org/pdf/2601.23143

#AISafety #LLMs #AIAlignment #MachineLearning #DeepLearning
SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization

📝 Summary:
SLIME is a new objective for aligning large language models, addressing 'unlearning' and 'formatting collapse' issues in prior methods. It maximizes preferred response likelihood, stabilizes rejected token probabilities, and uses dual-margin constraints, achieving superior performance and stable ...

🔹 Publication Date: Published on Feb 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.02383
• PDF: https://arxiv.org/pdf/2602.02383

#LLM #AIAlignment #MachineLearning #NLP #DeepLearning
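
As background for the 'unlearning' issue mentioned in the summary above: the summary neither names the prior methods nor gives SLIME's formula, so the sketch below does not implement SLIME. It uses the standard DPO loss, one widely used preference objective, as an assumed example of a margin-only loss, and shows that the loss can stay unchanged while the preferred response becomes much less likely. All numbers are invented.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, from sequence log-probabilities."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


# Reference-model log-probabilities (made-up values).
ref_c, ref_r = -10.0, -12.0

# Case A: the chosen response became more likely, the rejected one less likely.
loss_a = dpo_loss(-8.0, -14.0, ref_c, ref_r)

# Case B: BOTH responses became less likely, but the margin is identical.
loss_b = dpo_loss(-20.0, -26.0, ref_c, ref_r)

print(loss_a, loss_b)  # the two losses are equal
```

Because only the margin enters the loss, nothing in this objective prevents case B. An objective that also rewards the preferred response's likelihood directly, as the summary says SLIME does, would distinguish the two cases.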
The Truthfulness Spectrum Hypothesis

📝 Summary:
This paper proposes the truthfulness spectrum hypothesis: LLMs contain truth directions ranging from domain-general to domain-specific. While general directions exist, domain-specific ones steer more effectively, with post-training reshaping this geometry to influence behaviors like sycophancy.

🔹 Publication Date: Published on Feb 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.20273
• PDF: https://arxiv.org/pdf/2602.20273
• Github: https://github.com/zfying/truth_spec

#LLMs #AIResearch #AIAlignment #NLP #Truthfulness
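
To make "truth directions" in the summary above concrete, here is a minimal sketch using a difference-of-means direction, a common generic way to extract a linear direction from activations. It is an assumption that this resembles the paper's procedure; the summary does not say how directions are found. The function names and the 2-D toy activations are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def truth_direction(true_acts, false_acts):
    """Unit vector pointing from the mean 'false' activation to the mean 'true' one."""
    diff = [t - f for t, f in zip(mean(true_acts), mean(false_acts))]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0.0:
        raise ValueError("class means coincide; no direction to extract")
    return [d / norm for d in diff]


def steer(activation, direction, alpha):
    """Shift an activation by alpha along the given direction."""
    return [a + alpha * d for a, d in zip(activation, direction)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / ((sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5))


# Toy activations for true/false statements in two domains (invented values).
dir_a = truth_direction([[1.0, 0.0], [3.0, 0.0]], [[-1.0, 0.0], [-3.0, 0.0]])
dir_b = truth_direction([[1.0, 1.0]], [[-1.0, -1.0]])

overlap = cosine(dir_a, dir_b)          # 1.0 would mean a fully shared direction
steered = steer([0.0, 0.0], dir_a, 2.0)  # push a neutral activation toward 'true'
```

Under this reading, a high cosine between directions fitted on different domains would suggest a domain-general direction, and a low one a domain-specific direction.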
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

📝 Summary:
MOSAIC is a framework aligning agentic models for safe multi-step tool use, employing explicit safety reasoning and refusal. It significantly reduces harmful actions, increases refusal for unsafe tasks, cuts privacy leakage, and preserves benign performance.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03205
• PDF: https://arxiv.org/pdf/2603.03205
• Project Page: https://aradhye2002.github.io/mosaic-agent-safety/

#AISafety #AIAgents #ResponsibleAI #LLMs #AIAlignment
Alignment Makes Language Models Normative, Not Descriptive

📝 Summary:
Aligned language models excel at normative, rule-based behavior prediction but struggle with complex descriptive human strategic interactions. Base models predict real human choices in these games better. This reveals a trade-off in model optimization.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17218
• PDF: https://arxiv.org/pdf/2603.17218

#LLM #AIAlignment #NormativeAI #GameTheory #AIBehavior
Internal Safety Collapse in Frontier Large Language Models

📝 Summary:
Frontier LLMs suffer Internal Safety Collapse, continuously generating harmful content under specific task conditions, even for benign tasks. A new framework triggers this vulnerability, yielding 95% safety failure rates and revealing inherent unsafe capabilities despite alignment efforts.

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23509
• PDF: https://arxiv.org/pdf/2603.23509
• Project Page: https://wuyoscar.github.io/ISC-Bench
• Github: https://github.com/wuyoscar/ISC-Bench

#AISafety #LLM #AIAlignment #MachineLearning #AIResearch