Explorer
- .. (Parent Directory)
- 0106 Gao - Metadata Conditioning Accelerates Language Model Pre-training.pdf
- 0107 Sun - Scaling Laws for Floating Point Quantization Training.pdf
- 0109 Guan - rStar-Math- Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking.pdf
- 0113 Jacovi - The FACTS Grounding Leaderboard- Benchmarking LLMs' Ability to Ground Responses to Long-Form Input.pdf
- 0113 Xiang - Towards System 2 Reasoning in LLMs- Learning How to Think With Meta Chain-of-Thought.pdf
- 0124 Farquhar - MONA- Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking.pdf
- 0130 Ye - FlashInfer- Efficient and Customizable Attention Engine for LLM Inference Serving.pdf
- 0203 Muennighoff - s1- Simple test-time scaling.pdf
- 0204 Muennighoff - s1- Simple test-time scaling.pdf
- 0204 Zhang - Ladder-residual- parallelism-aware architecture for accelerating large model inference with communication overlapping.pdf
- 0206 Sharma - Constitutional Classifiers- Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.pdf
- 0206 Ye - LIMO- Less is More for Reasoning.pdf
- 0209 Ye - LIMO- Less is More for Reasoning.pdf
- 0210 Geiping - Scaling up Test-Time Compute with Latent Reasoning- A Recurrent Depth Approach.pdf
- 0216 Betley - Tell me about yourself- LLMs are aware of their learned behaviors.pdf
- 0216 Busbridge - Distillation Scaling Laws.pdf
- 0216 Lee - Evolving Deeper LLM Thinking.pdf
- 0216 Lv - Autonomy-of-Experts Models.pdf
- 0216 Ye - LIMO- Less is More for Reasoning.pdf
- 0218 Yuan - Native Sparse Attention- Hardware-Aligned and Natively Trainable Sparse Attention.pdf
- 0220 Bai - Qwen2.5-VL Technical Report.pdf
- 0224 Askari-Hemmat - Improving the Scaling Laws of Synthetic Data with Deliberate Practice.pdf
- 0306 Behrouz - Titans- Learning to Memorize at Test Time.pdf
- 0306 DeepSeek-AI - DeepSeek-V2- A Strong, Economical, and Efficient Mixture-of-Experts Language Model.pdf
- 0306 Yang - Qwen2 Technical Report.pdf
- 0307 Wei - SWE-RL- Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution.pdf
- 0307 Yin - Which Attention Heads Matter for In-Context Learning?.pdf
- 0312 Zhou - Inductive Moment Matching.pdf
- 0317 Ivison - Large-Scale Data Selection for Instruction Tuning.pdf
- 0317 Li - Predictable Scale- Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining.pdf
- 0317 Zhu - Transformers without Normalization.pdf
- 0321 Rodriguez - A Framework for Evaluating Emerging Cyberattack Capabilities of AI.pdf
- 0323 understand-r1-zero.pdf
- 0324 Baker - Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation.pdf
- 0324 Guan - Deliberative Alignment- Reasoning Enables Safer Language Models.pdf
- 0324 Zaremba - Trading Inference-Time Compute for Adversarial Robustness.pdf
- 0405 Cohere - Command A- An Enterprise-Ready Large Language Model.pdf
- 0405 Ivison - Unpacking DPO and PPO- Disentangling Best Practices for Learning from Preference Feedback.pdf
- 0405 Ji - The First Few Tokens Are All You Need- An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models.pdf
- 0405 Lin - Efficient Model Development through Fine-tuning Transfer.pdf
- 0405 Liu - Inference-Time Scaling for Generalist Reward Modeling.pdf
- 0405 Qin - Scaling Laws of Synthetic Data for Language Models.pdf
- 0405 Ruan - Reasoning to Learn from Latent Thoughts.pdf
- 0429 Xiao - Densing Law of LLMs.pdf
- 0430 Singh - The Leaderboard Illusion.pdf
- 0504 Bai - Qwen Technical Report.pdf
- 0504 Bai - Qwen-VL- A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond.pdf
- 0504 DeepSeek-AI - DeepSeek-Coder-V2- Breaking the Barrier of Closed-Source Models in Code Intelligence.pdf
- 0504 DeepSeek-AI - DeepSeek-R1- Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.pdf
- 0504 DeepSeek-AI - DeepSeek-V2- A Strong, Economical, and Efficient Mixture-of-Experts Language Model.pdf
- 0504 DeepSeek-AI - DeepSeek-V3 Technical Report.pdf
- 0504 Qwen - Qwen2.5 Technical Report.pdf
- 0514 AlphaEvolve.pdf
- 0516 Zhao - Insights into DeepSeek-V3- Scaling Challenges and Reflections on Hardware for AI Architectures.pdf
- 0517 Laban - LLMs Get Lost In Multi-Turn Conversation.pdf
- 0519 AI - Practical Efficiency of Muon for Pretraining.pdf
- 0519 AI - Rethinking Reflection in Pre-Training.pdf
- 0519 Chen - Parallel Scaling Law for Language Models.pdf
- 0519 Debenedetti - Defeating Prompt Injections by Design.pdf
- 0519 Engels - Scaling Laws For Scalable Oversight.pdf
- 0519 Faria - Sample, Don't Search- Rethinking Test-Time Alignment for Language Models.pdf
- 0519 Li - When Bad Data Leads to Good Models.pdf
- 0519 Lin - Sleep-time Compute- Beyond Inference Scaling at Test-time.pdf
- 0519 Nikolić - The Jailbreak Tax- How Useful are Your Jailbreak Outputs?.pdf
- 0519 Wang - Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning.pdf
- 0519 Wang - Think Deep, Think Fast- Investigating Efficiency of Verifier-free Inference-time-scaling Methods.pdf
- 0519 Wang - Tina- Tiny Reasoning Models via LoRA.pdf
- 0519 Yang - Qwen3 Technical Report.pdf
- 0519 Yue - Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?.pdf
- 0519 Zhang - LoRI- Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation.pdf
- 0519 Zhao - Absolute Zero- Reinforced Self-play Reasoning with Zero Data.pdf
- 0605 Defazio - Why Gradients Rapidly Increase Near the End of Training.pdf
- 0608 Łańcucki - Inference-Time Hyper-Scaling with KV Cache Compression.pdf
- 0613 Lee - Distillation Robustifies Unlearning.pdf
- 0613 Zweiger - Self-Adapting Language Models.pdf
- 0618 Wen - Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs.pdf
- 0618 gemini_v2_5_report.pdf
- 0620 Confidential_Inference_Paper.pdf
- DeepSeek_R1.pdf
- Kimi_k1.5.pdf