Mode-Conditioning Unlocks Superior Test-Time Scaling https://www.arxiv.org/abs/2512.01127
arXiv.org
Mode-Conditioning Unlocks Superior Test-Time Scaling
Parallel sampling promises substantial gains in test-time scaling, but its effectiveness is sharply limited by diversity collapse, where models concentrate on a few modes and repeated samples...
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience https://github.com/ByteDance-Seed/Seed-Prover/blob/main/SeedProver-1.5/SeedProver-1.5.pdf
GitHub
Seed-Prover/SeedProver-1.5/SeedProver-1.5.pdf at main · ByteDance-Seed/Seed-Prover
❤5👍1
HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering https://arxiv.org/abs/2512.14870
arXiv.org
HERBench: A Benchmark for Multi-Evidence Integration in Video...
Video Large Language Models (Video-LLMs) are rapidly improving, yet current Video Question Answering (VideoQA) benchmarks often allow questions to be answered from a single salient cue,...
Extracting Anyon Statistics from Neural Network Fractional Quantum Hall States https://arxiv.org/abs/2512.15872
arXiv.org
Extracting Anyon Statistics from Neural Network Fractional Quantum...
Fractional quantum Hall states host emergent anyons with exotic exchange statistics, but obtaining direct access to their topological properties in real systems remains a challenge. Neural-network...
IMProofBench Informal Mathematical Proof Benchmark https://improofbench.math.ethz.ch/
Experimental Quantum Error Correction below the Surface Code Threshold via All-Microwave Leakage Suppression https://journals.aps.org/prl/abstract/10.1103/rqkg-dw31
Physical Review Letters
Experimental Quantum Error Correction below the Surface Code Threshold via All-Microwave Leakage Suppression
A new strategy improves error correction in quantum computation by mitigating the effects of qubits escaping from their intended states.
👍1
Observation of disorder-induced superfluidity https://arxiv.org/abs/2512.21416
arXiv.org
Observation of disorder-induced superfluidity
The emergence of states with long-range correlations in a disordered landscape is rare, as disorder typically suppresses the particle mobility required for long-range coherence. But when more than...
❤1
LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories) https://arxiv.org/abs/2512.24796
arXiv.org
LeanCat: A Benchmark Suite for Formal Category Theory in Lean...
Large language models (LLMs) have made rapid progress in formal theorem proving, yet current benchmarks under-measure the kind of abstraction and library-mediated reasoning that organizes modern...
🔥3
Non-Abelian topological superconductivity from melting Abelian fractional Chern insulators https://arxiv.org/abs/2512.17996
arXiv.org
Non-Abelian topological superconductivity from melting Abelian...
Fractional Chern insulators (FCI) are exotic phases of matter realized at partial filling of a Chern band that host fractionally charged anyon excitations. Recent numerical studies in several...
❤1💩1
Fermi Sets: Universal and interpretable neural architectures for fermions https://arxiv.org/abs/2601.02508
arXiv.org
Fermi Sets: Universal and interpretable neural architectures for fermions
We introduce Fermi Sets, a universal and physically interpretable neural architecture for fermionic many-body wavefunctions. Building on a "parity-graded" representation [1], we prove that any...
👾1
Visualizing interaction-driven restructuring of quantum Hall edge states https://arxiv.org/abs/2511.00156
arXiv.org
Visualizing interaction-driven restructuring of quantum Hall edge states
Many topological phases host gapless boundary modes that can be dramatically modified by electronic interactions. Even for the long-studied edge modes of quantum Hall phases, forming at the...
Kitaev interactions in the van der Waals antiferromagnet VBr3 https://arxiv.org/abs/2601.05001
arXiv.org
Kitaev interactions in the van der Waals antiferromagnet VBr3
Van der Waals materials hosting Kitaev interactions are promising platforms for exploring exotic quantum phenomena. Here, we report inelastic neutron scattering investigations of the van der Waals...
🔥2❤1
Forwarded from Neural Shit
Came across an interesting paper. This is literally the dumbest (and at the same time most brilliant) prompt hack.
Researchers at Google Research found that when a model is struggling, you don't need to invent elaborate reasoning chains or pray to the machine spirits. You just repeat the prompt twice in a row. Literally CTRL+C -> CTRL+V.
Why? Almost all modern LLMs read left to right. Tokens at the start of the prompt "cannot see" tokens at the end. When you duplicate the request, the second copy of the prompt can attend to the entire first copy through the attention mechanism. As a result, the model sees the whole context at once and understands the task better.
They tested it on Gemini, GPT-4o, Claude 3, and DeepSeek. Numbers from the paper:
- The method won in 47 of 70 tests (0 losses, the rest were ties).
- On tasks that involve finding information in a text, accuracy jumped from a dismal 21% to 97%!
- Generation time does not increase
And yes, this only works on models with reasoning mode turned off, because models in reasoning mode already repeat the request to themselves as they go.
The prompt engineering we deserve
paper here
arXiv.org
Prompt Repetition Improves Non-Reasoning LLMs
When not using reasoning, repeating the input prompt improves performance for popular models (Gemini, GPT, Claude, and Deepseek) without increasing the number of generated tokens or latency.
👍28🗿11😁4💩3🤯2❤1
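The trick from the forwarded post above is simple enough to sketch in a few lines. This is only an illustration, not the paper's code: the helper name `repeat_prompt`, the blank-line separator, and the sample question are all assumptions, and the exact repetition format used in the paper may differ. The doubled string is what you would send to a non-reasoning model in place of the original prompt.

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Return `prompt` repeated `times` times, joined by `separator`.

    Hypothetical helper: the separator is an assumption, not something
    taken from the paper.
    """
    if times < 1:
        raise ValueError("times must be at least 1")
    return separator.join([prompt] * times)


# Made-up example query; the second copy can attend to the whole first copy.
question = "Which city is mentioned last in the text above?"
doubled = repeat_prompt(question)
print(doubled)
```

Only the input gets longer here; the number of generated tokens is untouched, which matches the post's point that generation time does not grow.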
On neural scaling and the quanta hypothesis
https://ericjmichaud.com/quanta/
❤1
Falcon-H1-Tiny: A series of extremely small, yet powerful language models redefining capabilities at small scale
https://huggingface.co/spaces/tiiuae/tiny-h1-blogpost
huggingface.co
Falcon-H1-Tiny: A series of extremely small, yet powerful language models redefining capabilities at small scale - a Hugging Face…
Graviton detection and the quantization of gravity https://arxiv.org/abs/2308.12988
arXiv.org
Graviton detection and the quantization of gravity
We revisit a question asked by Dyson: "Is a graviton detectable?" We demonstrate that in both Dyson's original sense and in a more modern measurement-theoretic sense, it is possible to construct a...
🐳5
BabyVision: Visual Reasoning Beyond Language https://arxiv.org/abs/2601.06521
arXiv.org
BabyVision: Visual Reasoning Beyond Language
While humans develop core visual skills long before acquiring language, contemporary Multimodal LLMs (MLLMs) still rely heavily on linguistic priors to compensate for their fragile visual...
👍2
GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers https://gptzero.me/news/neurips/
AI Detection Resources | GPTZero
GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers
GPTZero's analysis of 4841 papers accepted by NeurIPS 2025 shows there are at least 100 with confirmed hallucinations
🫡16🤣3🔥2🤯1