Explorer
- 0306 Behrouz - Titans- Learning to Memorize at Test Time.txt
- 0306 Yang - Qwen2 Technical Report.txt
- 0307 Wei - SWE-RL- Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution.txt
- 0307 Yin - Which Attention Heads Matter for In-Context Learning?.txt
- 0312 Zhou - Inductive Moment Matching.txt
- 0317 Ivison - Large-Scale Data Selection for Instruction Tuning.txt
- 0317 Li - Predictable Scale- Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining.txt
- 0317 Zhu - Transformers without Normalization.txt
- 0321 Rodriguez - A Framework for Evaluating Emerging Cyberattack Capabilities of AI.txt
- 0324 Baker - Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation.txt
- 0324 Guan - Deliberative Alignment- Reasoning Enables Safer Language Models.txt
- 0324 Zaremba - Trading Inference-Time Compute for Adversarial Robustness.txt
- 0405 Cohere - Command A- An Enterprise-Ready Large Language Model.txt
- 0405 Ivison - Unpacking DPO and PPO- Disentangling Best Practices for Learning from Preference Feedback.txt
- 0405 Ji - The First Few Tokens Are All You Need- An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models.txt
- 0405 Lin - Efficient Model Development through Fine-tuning Transfer.txt
- 0405 Liu - Inference-Time Scaling for Generalist Reward Modeling.txt
- 0405 Qin - Scaling Laws of Synthetic Data for Language Models.txt
- 0405 Ruan - Reasoning to Learn from Latent Thoughts.txt
- 0429 Xiao - Densing Law of LLMs.txt
- 0430 Singh - The Leaderboard Illusion.txt
- 0504 Bai - Qwen Technical Report.txt
- 0504 Bai - Qwen-VL- A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond.txt
- 0504 DeepSeek-AI - DeepSeek-Coder-V2- Breaking the Barrier of Closed-Source Models in Code Intelligence.txt
- 0504 DeepSeek-AI - DeepSeek-R1- Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.txt
- 0504 DeepSeek-AI - DeepSeek-V2- A Strong, Economical, and Efficient Mixture-of-Experts Language Model.txt
- 0504 DeepSeek-AI - DeepSeek-V3 Technical Report.txt
- 0504 Qwen - Qwen2.5 Technical Report.txt
- 0516 Zhao - Insights into DeepSeek-V3- Scaling Challenges and Reflections on Hardware for AI Architectures.txt
- 0517 Laban - LLMs Get Lost In Multi-Turn Conversation.txt
- 0519 AI - Practical Efficiency of Muon for Pretraining.txt
- 0519 AI - Rethinking Reflection in Pre-Training.txt
- 0519 Chen - Parallel Scaling Law for Language Models.txt
- 0519 Debenedetti - Defeating Prompt Injections by Design.txt
- 0519 Engels - Scaling Laws For Scalable Oversight.txt
- 0519 Faria - Sample, Don't Search- Rethinking Test-Time Alignment for Language Models.txt
- 0519 Li - When Bad Data Leads to Good Models.txt
- 0519 Lin - Sleep-time Compute- Beyond Inference Scaling at Test-time.txt
- 0519 Wang - Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning.txt
- 0519 Wang - Think Deep, Think Fast- Investigating Efficiency of Verifier-free Inference-time-scaling Methods.txt
- 0519 Wang - Tina- Tiny Reasoning Models via LoRA.txt
- 0519 Yang - Qwen3 Technical Report.txt
- 0519 Yue - Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?.txt
- 0519 Zhang - LoRI- Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation.txt
- 0519 Zhao - Absolute Zero- Reinforced Self-play Reasoning with Zero Data.txt
- 0605 Defazio - Why Gradients Rapidly Increase Near the End of Training.txt
- 0613 Lee - Distillation Robustifies Unlearning.txt
- 0613 Zweiger - Self-Adapting Language Models.txt
- 0618 Wen - Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs.txt