Explorer
- 0701 Narayan - μnit Scaling- Simple and Scalable FP8 LLM Training.txt
- 0702 Tang - Pangu Pro MoE- Mixture of Grouped Experts for Efficient Sparsity.txt
- 0702 Vaswani - Attention Is All You Need.txt
- 0704 Roy - Fast and Simplex- 2-Simplicial Attention in Triton.txt
- 0704 Zhu - Establishing Best Practices for Building Rigorous Agentic Benchmarks.txt
- 0705 Gladstone - Energy-Based Transformers are Scalable Learners and Thinkers.txt
- 0708 Gelada - Scaling Context Requires Rethinking Attention.txt
- 0710 Liang - Drag-and-Drop LLMs- Zero-Shot Prompt-to-Weights.txt
- 0710 Qiu - Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks.txt
- 0715 Comanici - Gemini 2.5- Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.txt
- 0722 Prabhudesai - Diffusion Beats Autoregressive in Data-Constrained Settings.txt
- 0722 Zhou - Apple Intelligence Foundation Language Models- Tech Report 2025.txt
- 0723 Han - Deep Researcher with Test-Time Diffusion.txt
- 0724 Liu - The Serial Scaling Hypothesis.txt
- 0725 Liu - ProRL- Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models.txt
- 0727 Dherin - Learning without training- The implicit dynamics of in-context learning.txt
- 0728 Calian - DataRater- Meta-Learned Dataset Curation.txt
- 0729 Qin - Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved).txt
- 0730 Gunjal - Rubrics as Rewards- Reinforcement Learning Beyond Verifiable Domains.txt
- 0801 Chuang - Meta CLIP 2- A Worldwide Scaling Recipe.txt
- 0801 Dong - Reinforcement Pre-Training.txt
- 0801 Zhou - Solving Formal Math Problems by Decomposition and Iterative Reflection.txt
- 0809 Samragh - Your LLM Knows the Future- Uncovering Its Multi-Token Prediction Potential.txt
- 0812 Wu - On the Generalization of SFT- A Reinforcement Learning Perspective with Reward Rectification.txt
- 0813 Agarwal - On-Policy Distillation of Language Models- Learning from Self-Generated Mistakes.txt
- 0813 Hu - REINFORCE++- An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models.txt
- 0818 Radhakrishna - Apriel-Nemotron-15B-Thinker.txt
- 0826 Chen - Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models.txt
- 0829 Ethayarajh - KTO- Model Alignment as Prospect Theoretic Optimization.txt
- 0831 Weller - On the Theoretical Limitations of Embedding-Based Retrieval.txt
- 0908 Lin - REFRAG- Rethinking RAG based Decoding.txt
- 0916 Team - LongCat-Flash Technical Report.txt
- 0924 Li - Reinforcement Learning on Pre-Training Data.txt
- 0928 Geng - X-Omni- Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again.txt
- 0928 Tang - On a few pitfalls in KL divergence gradient estimation for RL.txt
- 0929 Kim - In Their Own Words- Reasoning Traces Tailored for Small Models Make Them Better Reasoners.txt
- 0929 Xu - Single-stream Policy Optimization.txt