Explorer
- 0701 ERNIE_Technical_Report.pdf
- 0701 Narayan - μnit Scaling- Simple and Scalable FP8 LLM Training.pdf
- 0702 Tang - Pangu Pro MoE- Mixture of Grouped Experts for Efficient Sparsity.pdf
- 0702 Vaswani - Attention Is All You Need.pdf
- 0704 Roy - Fast and Simplex- 2-Simplicial Attention in Triton.pdf
- 0704 Zhu - Establishing Best Practices for Building Rigorous Agentic Benchmarks.pdf
- 0705 Gladstone - Energy-Based Transformers are Scalable Learners and Thinkers.pdf
- 0708 Gelada - Scaling Context Requires Rethinking Attention.pdf
- 0710 Liang - Drag-and-Drop LLMs- Zero-Shot Prompt-to-Weights.pdf
- 0710 Qiu - Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks.pdf
- 0711 MiniMax - MiniMax-01- Scaling Foundation Models with Lightning Attention.pdf
- 0715 Comanici - Gemini 2.5- Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.pdf
- 0722 Gema - Inverse Scaling in Test-Time Compute.pdf
- 0722 Prabhudesai - Diffusion Beats Autoregressive in Data-Constrained Settings.pdf
- 0722 Zhou - Apple Intelligence Foundation Language Models- Tech Report 2025.pdf
- 0723 Han - Deep Researcher with Test-Time Diffusion.pdf
- 0724 Liu - The Serial Scaling Hypothesis.pdf
- 0725 Fujii - Rewriting Pre-Training Data Boosts LLM Performance in Math and Code.pdf
- 0725 Liu - ProRL- Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models.pdf
- 0727 Dherin - Learning without training- The implicit dynamics of in-context learning.pdf
- 0728 Agrawal - GEPA- Reflective Prompt Evolution Can Outperform Reinforcement Learning.pdf
- 0728 Calian - DataRater- Meta-Learned Dataset Curation.pdf
- 0728 Liu - AlphaGo Moment for Model Architecture Discovery.pdf
- 0729 Gloeckle - Better & Faster Large Language Models via Multi-token Prediction.pdf
- 0729 Qin - Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved).pdf
- 0730 Gunjal - Rubrics as Rewards- Reinforcement Learning Beyond Verifiable Domains.pdf
- 0801 Chuang - Meta CLIP 2- A Worldwide Scaling Recipe.pdf
- 0801 Dong - Reinforcement Pre-Training.pdf
- 0801 Zhou - Solving Formal Math Problems by Decomposition and Iterative Reflection.pdf
- 0809 Samragh - Your LLM Knows the Future- Uncovering Its Multi-Token Prediction Potential.pdf
- 0812 Team - GLM-4.5- Agentic, Reasoning, and Coding (ARC) Foundation Models.pdf
- 0812 Wu - On the Generalization of SFT- A Reinforcement Learning Perspective with Reward Rectification.pdf
- 0813 Agarwal - On-Policy Distillation of Language Models- Learning from Self-Generated Mistakes.pdf
- 0813 Hu - REINFORCE++- An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models.pdf
- 0818 Maini - BeyondWeb- Lessons from Scaling Synthetic Data for Trillion-scale Pretraining.pdf
- 0818 Radhakrishna - Apriel-Nemotron-15B-Thinker.pdf
- 0826 Chen - Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models.pdf
- 0829 Ethayarajh - KTO- Model Alignment as Prospect Theoretic Optimization.pdf
- 0831 Weller - On the Theoretical Limitations of Embedding-Based Retrieval.pdf
- 0908 Lin - REFRAG- Rethinking RAG based Decoding.pdf
- 0916 Team - LongCat-Flash Technical Report.pdf
- 0924 Li - Reinforcement Learning on Pre-Training Data.pdf
- 0928 Geng - X-Omni- Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again.pdf
- 0928 Tang - On a few pitfalls in KL divergence gradient estimation for RL.pdf
- 0929 Kim - In Their Own Words- Reasoning Traces Tailored for Small Models Make Them Better Reasoners.pdf
- 0929 Xu - Single-stream Policy Optimization.pdf
- Step3-Sys-Tech-Report.pdf
- The Era of Experience Paper.pdf
- kimi2_tech_report.pdf
- longcat_tech_report.pdf
- metadata.jsonl