Explorer
- .. (Parent Directory)
- 0106 Gao - Metadata Conditioning Accelerates Language Model Pre-training.pdf
- 0107 Sun - Scaling Laws for Floating Point Quantization Training.pdf
- 0109 Guan - rStar-Math- Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking.pdf
- 0113 Jacovi - The FACTS Grounding Leaderboard- Benchmarking LLMs' Ability to Ground Responses to Long-Form Input.pdf
- 0113 Xiang - Towards System 2 Reasoning in LLMs- Learning How to Think With Meta Chain-of-Thought.pdf
- 0124 Farquhar - MONA- Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking.pdf
- 0130 Ye - FlashInfer- Efficient and Customizable Attention Engine for LLM Inference Serving.pdf
- 0203 Muennighoff - s1- Simple test-time scaling.pdf
- 0204 Muennighoff - s1- Simple test-time scaling.pdf
- 0204 Zhang - Ladder-residual- parallelism-aware architecture for accelerating large model inference with communication overlapping.pdf
- 0206 Sharma - Constitutional Classifiers- Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.pdf
- 0206 Ye - LIMO- Less is More for Reasoning.pdf
- 0209 Ye - LIMO- Less is More for Reasoning.pdf
- 0210 Geiping - Scaling up Test-Time Compute with Latent Reasoning- A Recurrent Depth Approach.pdf
- 0216 Betley - Tell me about yourself- LLMs are aware of their learned behaviors.pdf
- 0216 Busbridge - Distillation Scaling Laws.pdf
- 0216 Lee - Evolving Deeper LLM Thinking.pdf
- 0216 Lv - Autonomy-of-Experts Models.pdf
- 0216 Ye - LIMO- Less is More for Reasoning.pdf
- 0218 Yuan - Native Sparse Attention- Hardware-Aligned and Natively Trainable Sparse Attention.pdf
- 0220 Bai - Qwen2.5-VL Technical Report.pdf
- 0224 Askari-Hemmat - Improving the Scaling Laws of Synthetic Data with Deliberate Practice.pdf
- 0306 Behrouz - Titans- Learning to Memorize at Test Time.pdf
- 0306 DeepSeek-AI - DeepSeek-V2- A Strong, Economical, and Efficient Mixture-of-Experts Language Model.pdf
- 0306 Yang - Qwen2 Technical Report.pdf
- 0307 Wei - SWE-RL- Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution.pdf
- 0307 Yin - Which Attention Heads Matter for In-Context Learning?.pdf
- 0312 Zhou - Inductive Moment Matching.pdf
- 0317 Ivison - Large-Scale Data Selection for Instruction Tuning.pdf
- 0317 Li - Predictable Scale- Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining.pdf
- 0317 Zhu - Transformers without Normalization.pdf
- 0321 Rodriguez - A Framework for Evaluating Emerging Cyberattack Capabilities of AI.pdf
- 0323 understand-r1-zero.pdf
- 0324 Baker - Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation.pdf
- 0324 Guan - Deliberative Alignment- Reasoning Enables Safer Language Models.pdf
- 0324 Zaremba - Trading Inference-Time Compute for Adversarial Robustness.pdf
- 0405 Cohere - Command A- An Enterprise-Ready Large Language Model.pdf
- 0405 Ivison - Unpacking DPO and PPO- Disentangling Best Practices for Learning from Preference Feedback.pdf
- 0405 Ji - The First Few Tokens Are All You Need- An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models.pdf
- 0405 Lin - Efficient Model Development through Fine-tuning Transfer.pdf
- 0405 Liu - Inference-Time Scaling for Generalist Reward Modeling.pdf
- 0405 Qin - Scaling Laws of Synthetic Data for Language Models.pdf
- 0405 Ruan - Reasoning to Learn from Latent Thoughts.pdf
- 0429 Xiao - Densing Law of LLMs.pdf
- 0430 Singh - The Leaderboard Illusion.pdf
- 0504 Bai - Qwen Technical Report.pdf
- 0504 Bai - Qwen-VL- A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond.pdf
- 0504 DeepSeek-AI - DeepSeek-Coder-V2- Breaking the Barrier of Closed-Source Models in Code Intelligence.pdf
- 0504 DeepSeek-AI - DeepSeek-R1- Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.pdf
- 0504 DeepSeek-AI - DeepSeek-V2- A Strong, Economical, and Efficient Mixture-of-Experts Language Model.pdf
- 0504 DeepSeek-AI - DeepSeek-V3 Technical Report.pdf
- 0504 Qwen - Qwen2.5 Technical Report.pdf
- 0514 AlphaEvolve.pdf
- 0516 Zhao - Insights into DeepSeek-V3- Scaling Challenges and Reflections on Hardware for AI Architectures.pdf
- 0517 Laban - LLMs Get Lost In Multi-Turn Conversation.pdf
- 0519 AI - Practical Efficiency of Muon for Pretraining.pdf
- 0519 AI - Rethinking Reflection in Pre-Training.pdf
- 0519 Chen - Parallel Scaling Law for Language Models.pdf
- 0519 Debenedetti - Defeating Prompt Injections by Design.pdf
- 0519 Engels - Scaling Laws For Scalable Oversight.pdf
- 0519 Faria - Sample, Don't Search- Rethinking Test-Time Alignment for Language Models.pdf
- 0519 Li - When Bad Data Leads to Good Models.pdf
- 0519 Lin - Sleep-time Compute- Beyond Inference Scaling at Test-time.pdf
- 0519 Nikolić - The Jailbreak Tax- How Useful are Your Jailbreak Outputs?.pdf
- 0519 Wang - Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning.pdf
- 0519 Wang - Think Deep, Think Fast- Investigating Efficiency of Verifier-free Inference-time-scaling Methods.pdf
- 0519 Wang - Tina- Tiny Reasoning Models via LoRA.pdf
- 0519 Yang - Qwen3 Technical Report.pdf
- 0519 Yue - Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?.pdf
- 0519 Zhang - LoRI- Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation.pdf
- 0519 Zhao - Absolute Zero- Reinforced Self-play Reasoning with Zero Data.pdf
- 0605 Defazio - Why Gradients Rapidly Increase Near the End of Training.pdf
- 0608 Łańcucki - Inference-Time Hyper-Scaling with KV Cache Compression.pdf
- 0613 Lee - Distillation Robustifies Unlearning.pdf
- 0613 Zweiger - Self-Adapting Language Models.pdf
- 0618 Wen - Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs.pdf
- 0618 gemini_v2_5_report.pdf
- 0620 Confidential_Inference_Paper.pdf
- DeepSeek_R1.pdf
- Kimi_k1.5.pdf