Forwarded from Axis of Ordinary
"What is the performance limit when scaling LLM inference? Sky's the limit.
We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient."
Paper: http://arxiv.org/abs/2402.12875
Mirages in the Energy Landscape of Soft Sphere Packings
Abstract:
The energy landscape is central to understanding low-temperature and athermal systems, like jammed soft spheres. The geometry of this high-dimensional energy surface is controlled by a plethora of minima and their associated basins of attraction that escape analytical treatment and are thus studied numerically. We show that the ODE solver with the best time-for-error for this problem, CVODE, is orders of magnitude faster than other steepest-descent solvers for such systems. Using this algorithm, we provide unequivocal evidence that optimizers widely used in computational studies destroy all semblance of the true landscape geometry, even in moderate dimensions. Using various geometric indicators, both low- and high-dimensional, we show that results on the fractality of basins of attraction originated from the use of inadequate mapping strategies, as basins are actually smooth structures with well-defined length scales. Thus, a vast number of past claims on energy landscapes need to be re-evaluated due to the use of inadequate numerical methods.
https://www.arxiv.org/abs/2409.12113
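A toy sketch of the abstract's point about mapping basins of attraction, under loud assumptions: the 1D double-well energy E(x) = (x^2 - 1)^2, the step sizes, and all function names are hypothetical and not taken from the paper. A small fixed-step Euler iteration stands in for an accurate steepest-descent ODE solve (it is not CVODE), and the same update with a large step stands in for a faster optimizer. The true basin boundary of this energy is at x = 0, so every start with x > 0 belongs to the minimum at +1.

```python
def grad(x):
    """Derivative of the toy double-well energy E(x) = (x**2 - 1)**2."""
    return 4.0 * x * (x * x - 1.0)


def descend(x0, step, n_steps):
    """Iterate x <- x - step * E'(x) and return the final position."""
    x = x0
    for _ in range(n_steps):
        x -= step * grad(x)
    return x


def basin_label(x_final):
    """Label the minimum that was reached: +1 for x = +1, -1 for x = -1."""
    return 1 if x_final > 0 else -1


def flow_label(x0):
    # Small step: approximates the continuous steepest-descent path,
    # which can never cross the basin boundary at x = 0.
    return basin_label(descend(x0, step=1e-3, n_steps=20_000))


def optimizer_label(x0):
    # Large step: still converges to a minimum, but a single step
    # can jump over the boundary into the neighbouring basin.
    return basin_label(descend(x0, step=0.1, n_steps=200))


if __name__ == "__main__":
    for x0 in (0.5, 1.0, 1.5, 2.0):
        print(x0, flow_label(x0), optimizer_label(x0))
```

With these values, the start at x = 2.0 is assigned to the minimum at +1 by the small-step flow and to the minimum at -1 by the large-step iteration, because its first step lands at x = -0.4. Both runs converge, so the mislabeling is invisible unless the two mappings are compared.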
🔥1
Forwarded from 🎓 TIL - Today I Learned but no 🐝
Forwarded from Homo Technicus
My big project came out today: the Ergodicity library for Python, plus a book on how and why to use it: https://www.ergodicitylibrary.com.
The topic is technical and specialized. It should mainly interest people in quant finance, but it can also be used in biology or physics.
Though of course I believe everyone should learn optimal behavior under radical unpredictability.
If you are interested, you can read on the site what exactly it is about.
One of my three big planned software projects is done. It gets harder from here.
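For readers new to the term, a minimal sketch of what non-ergodicity means in a quant-finance setting. It does not use the Ergodicity library or its API. The repeated coin-flip bet and its multipliers (wealth times 1.5 on heads, times 0.6 on tails) are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical multiplicative bet: each round, wealth is multiplied
# by UP or DOWN with equal probability.
UP, DOWN = 1.5, 0.6


def ensemble_factor(up, down):
    """Expected per-round wealth multiplier, averaged over many players."""
    return 0.5 * up + 0.5 * down


def time_average_factor(up, down):
    """Per-round multiplier one player experiences over a long sequence.

    This is the geometric mean, exp(E[log multiplier]).
    """
    return math.exp(0.5 * math.log(up) + 0.5 * math.log(down))


if __name__ == "__main__":
    print("ensemble average factor:", ensemble_factor(UP, DOWN))
    print("time average factor:   ", time_average_factor(UP, DOWN))
```

The ensemble average grows (factor above 1) while the time average shrinks (factor below 1), so the expected value looks attractive even though a single player who keeps betting loses wealth in the long run. That gap between the two averages is what makes the process non-ergodic.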
👍1