Efficiency Breakthroughs
Novel techniques for making LLMs faster, smaller, and cheaper to run.
Overview
Research aimed at optimizing the computational resources required to train and serve large language models.
Key Areas
Inference Optimization
- KV cache compression
- Speculative decoding
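To make the second bullet concrete, here is a minimal sketch of greedy speculative decoding: a cheap draft model proposes K tokens, the target model verifies them, and the longest agreeing prefix is kept. The two toy "models" below are illustrative stand-ins (simple arithmetic rules over integer tokens), not any real library API.

```python
def draft_model(context):
    # Hypothetical cheap model: predicts (last_token + 1) % 10.
    return (context[-1] + 1) % 10

def target_model(context):
    # Hypothetical accurate model: same rule, except after token 5 it emits 0.
    return 0 if context[-1] == 5 else (context[-1] + 1) % 10

def speculative_step(context, k=4):
    # 1. Draft k tokens autoregressively with the cheap model.
    drafted, ctx = [], list(context)
    for _ in range(k):
        t = draft_model(ctx)
        drafted.append(t)
        ctx.append(t)
    # 2. Verify with the target model (in a real system this is a single
    #    batched forward pass over all drafted positions at once).
    accepted, ctx = [], list(context)
    for t in drafted:
        expected = target_model(ctx)
        if expected != t:
            accepted.append(expected)  # keep the correction, then stop
            break
        accepted.append(t)
        ctx.append(t)
    return accepted

print(speculative_step([3], k=4))  # → [4, 5, 0]: the draft diverges after 5
```

The speedup comes from the verification step being parallel across positions, so several tokens can be committed per expensive target-model pass.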
Model Compression
- Knowledge distillation
- Quantization advances
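As a baseline for the quantization bullet, a sketch of symmetric per-tensor int8 quantization: weights are mapped to integers in [-127, 127] with a single scale factor, then dequantized back. Function names are illustrative, not tied to any particular framework.

```python
def quantize_int8(weights):
    # One scale for the whole tensor, chosen so the largest weight maps to 127.
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    return [x * scale for x in q]

w = [0.1, -0.5, 0.25, 1.27]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q)  # → [10, -50, 25, 127]
```

Modern schemes improve on this baseline with per-channel or per-group scales and outlier handling, but the quantize/dequantize round trip is the same.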
Architecture Efficiency
- Sparse attention
- Efficient rotary embeddings
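One common form of sparse attention from the first bullet is a sliding-window (local) pattern: each query attends only to the previous `window` tokens, reducing attention cost from O(n²) to O(n·window). A pure-Python sketch of the mask, for illustration only:

```python
def sliding_window_mask(n, window):
    # mask[i][j] is True iff query position i may attend to key position j;
    # each row allows only the `window` most recent positions (causal).
    return [[max(0, i - window + 1) <= j <= i for j in range(n)]
            for i in range(n)]

for row in sliding_window_mask(5, 2):
    print("".join("x" if v else "." for v in row))
# x....
# xx...
# .xx..
# ..xx.
# ...xx
```

In practice this mask is fused into the attention kernel rather than materialized, but the allowed-position pattern is the same.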