Explorer
- 0701 ERNIE_Technical_Report.pdf
- 0701 Narayan - μnit Scaling- Simple and Scalable FP8 LLM Training.pdf
- 0702 Tang - Pangu Pro MoE- Mixture of Grouped Experts for Efficient Sparsity.pdf
- 0702 Vaswani - Attention Is All You Need.pdf
- 0704 Roy - Fast and Simplex- 2-Simplicial Attention in Triton.pdf
- 0704 Zhu - Establishing Best Practices for Building Rigorous Agentic Benchmarks.pdf
- 0705 Gladstone - Energy-Based Transformers are Scalable Learners and Thinkers.pdf
- 0708 Gelada - Scaling Context Requires Rethinking Attention.pdf
- 0710 Liang - Drag-and-Drop LLMs- Zero-Shot Prompt-to-Weights.pdf
- 0710 Qiu - Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks.pdf
- 0711 MiniMax - MiniMax-01- Scaling Foundation Models with Lightning Attention.pdf
- 0715 Comanici - Gemini 2.5- Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.pdf
- 0722 Gema - Inverse Scaling in Test-Time Compute.pdf
- 0722 Prabhudesai - Diffusion Beats Autoregressive in Data-Constrained Settings.pdf
- 0722 Zhou - Apple Intelligence Foundation Language Models- Tech Report 2025.pdf
- 0723 Han - Deep Researcher with Test-Time Diffusion.pdf
- 0724 Liu - The Serial Scaling Hypothesis.pdf
- 0725 Fujii - Rewriting Pre-Training Data Boosts LLM Performance in Math and Code.pdf
- 0725 Liu - ProRL- Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models.pdf
- 0727 Dherin - Learning without training- The implicit dynamics of in-context learning.pdf
- 0728 Agrawal - GEPA- Reflective Prompt Evolution Can Outperform Reinforcement Learning.pdf
- 0728 Calian - DataRater- Meta-Learned Dataset Curation.pdf
- 0728 Liu - AlphaGo Moment for Model Architecture Discovery.pdf
- 0729 Gloeckle - Better & Faster Large Language Models via Multi-token Prediction.pdf
- 0729 Qin - Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved).pdf
- 0730 Gunjal - Rubrics as Rewards- Reinforcement Learning Beyond Verifiable Domains.pdf
- 0801 Chuang - Meta CLIP 2- A Worldwide Scaling Recipe.pdf
- 0801 Dong - Reinforcement Pre-Training.pdf
- 0801 Zhou - Solving Formal Math Problems by Decomposition and Iterative Reflection.pdf
- 0809 Samragh - Your LLM Knows the Future- Uncovering Its Multi-Token Prediction Potential.pdf
- 0812 Team - GLM-4.5- Agentic, Reasoning, and Coding (ARC) Foundation Models.pdf
- 0812 Wu - On the Generalization of SFT- A Reinforcement Learning Perspective with Reward Rectification.pdf
- 0813 Agarwal - On-Policy Distillation of Language Models- Learning from Self-Generated Mistakes.pdf
- 0813 Hu - REINFORCE++- An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models.pdf
- 0818 Maini - BeyondWeb- Lessons from Scaling Synthetic Data for Trillion-scale Pretraining.pdf
- 0818 Radhakrishna - Apriel-Nemotron-15B-Thinker.pdf
- 0826 Chen - Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models.pdf
- 0829 Ethayarajh - KTO- Model Alignment as Prospect Theoretic Optimization.pdf
- 0831 Weller - On the Theoretical Limitations of Embedding-Based Retrieval.pdf
- 0908 Lin - REFRAG- Rethinking RAG based Decoding.pdf
- 0916 Team - LongCat-Flash Technical Report.pdf
- 0924 Li - Reinforcement Learning on Pre-Training Data.pdf
- 0928 Geng - X-Omni- Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again.pdf
- 0928 Tang - On a few pitfalls in KL divergence gradient estimation for RL.pdf
- 0929 Kim - In Their Own Words- Reasoning Traces Tailored for Small Models Make Them Better Reasoners.pdf
- 0929 Xu - Single-stream Policy Optimization.pdf
- Step3-Sys-Tech-Report.pdf
- The Era of Experience Paper.pdf
- kimi2_tech_report.pdf
- longcat_tech_report.pdf
- metadata.jsonl