How Good Is PagedAttention at Memory Sharing?
#llms #pagedattention #memorysharing #parallelsampling #beamsharing #parallelsequences #orca #orcabaselines
https://hackernoon.com/how-good-is-pagedattention-at-memory-sharing
Hackernoon
We evaluate the effectiveness of memory sharing in PagedAttention with two popular sampling methods: parallel sampling and beam search.
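To make "memory sharing" concrete for the parallel sampling case, here is a minimal sketch of how a saving could be estimated in units of KV cache blocks. It is a toy model, not the measurement procedure used in the evaluation: the function names `blocks_needed` and `parallel_sampling_saving`, the block size of 16, and the assumption that only completely filled prompt blocks are shared across samples are all illustrative choices.

```python
import math


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Number of fixed-size KV blocks required to hold num_tokens tokens."""
    return math.ceil(num_tokens / block_size)


def parallel_sampling_saving(
    prompt_len: int,
    output_len: int,
    num_samples: int,
    block_size: int = 16,
) -> float:
    """Fraction of KV blocks saved when samples share the prompt's blocks.

    Toy assumption: only fully filled prompt blocks are shared; any partially
    filled last prompt block and all output blocks are private to each sample.
    """
    per_sequence = blocks_needed(prompt_len + output_len, block_size)

    # Without sharing, every sample keeps its own copy of prompt + output.
    unshared_total = num_samples * per_sequence

    # With sharing, full prompt blocks are stored once for all samples.
    shared_prompt_blocks = prompt_len // block_size
    private_per_sequence = per_sequence - shared_prompt_blocks
    shared_total = shared_prompt_blocks + num_samples * private_per_sequence

    return (unshared_total - shared_total) / unshared_total


if __name__ == "__main__":
    saving = parallel_sampling_saving(prompt_len=64, output_len=64, num_samples=4)
    print(f"estimated saving: {saving:.1%}")
```

In this model the saving grows with the number of samples and with the share of the sequence taken up by the prompt, and it is zero when only one sample is generated. Beam search is not covered by this sketch, since beams can also share blocks of generated tokens.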