Explorer
- 1001 Tian - Reinforcement Mid-Training.pdf
- 1003 Kang - Demystifying Synthetic Data in LLM Pre-training- A Systematic Study of Scaling Laws, Benefits, and Pitfalls.pdf
- 1003 Wu - Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.pdf
- 1004 Hatamizadeh - RLP- Reinforcement as a Pretraining Objective.pdf
- 1007 Wang - Critique Fine-Tuning- Learning to Critique is More Effective than Learning to Imitate.pdf
- 1009 Cen - Webscale-RL- Automated Data Pipeline for Scaling RL Data to Pretraining Levels.pdf
- 1010 NVIDIA - Pretraining Large Language Models with NVFP4.pdf
- 1016 Davis - What is the objective of reasoning with reinforcement learning?.pdf
- 1016 Khatri - The Art of Scaling Reinforcement Learning Compute for LLMs.pdf
- 1019 Ouyang - ReasoningBank- Scaling Agent Self-Evolving with Reasoning Memory.pdf
- 1030 Wu - Parallel Loop Transformer for Efficient Test-Time Computation Scaling.pdf
- 1030 Zhu - Scaling Latent Reasoning via Looped Language Models.pdf
- 1031 Momeni - In-context Continual Learning Assisted by an External Continual Learner.pdf
- 1031 Shi - Continual Learning of Large Language Models- A Comprehensive Survey.pdf
- 1031 Wu - Continual Learning for Large Language Models- A Survey.pdf
- 1103 Lin - Continual Learning via Sparse Memory Finetuning.pdf
- 1103 Qi - Defeating the Training-Inference Mismatch via FP16.pdf
- 1118 Shenfeld - RL's Razor- Why Online Reinforcement Learning Forgets Less.pdf
- 1201 Anthony - Training Foundation Models on a Full-Stack AMD Platform- Compute, Networking, and System Design.pdf
- 1201 Merrill - Critical Batch Size Revisited- A Simple Empirical Approach to Large-Batch Language Model Training.pdf
- 1201 Team - Kimi Linear- An Expressive, Efficient Attention Architecture.pdf
- 1201 Wu - HunyuanVideo 1.5 Technical Report.pdf
- 1203 Murphy - Reinforcement Learning- An Overview.pdf
- 1205 Lee - Feedback Descent- Open-Ended Text Optimization via Pairwise Comparison.pdf
- 1205 Zheng - Stabilizing Reinforcement Learning with LLMs- Formulation and Practices.pdf
- 1206 Cui - Homogeneous Keys, Heterogeneous Values- Exploiting Local KV Cache Asymmetry for Long-Context LLMs.pdf
- 1206 Eyring - Noise Hypernetworks- Amortizing Test-Time Compute in Diffusion Models.pdf
- 1206 McCandlish - An Empirical Model of Large-Batch Training.pdf
- 1206 Sarukkai - Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks.pdf
- 1206 Wang - Reinforcement Learning for Reasoning in Large Language Models with One Training Example.pdf
- 1206 Yan - Learning to Reason under Off-Policy Guidance.pdf
- 1206 Yang - Hyperbolic Fine-tuning for Large Language Models.pdf
- 1206 Zhao - SmallKV- Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference.pdf
- 1207 Gong - Scaling Diffusion Language Models via Adaptation from Autoregressive Models.pdf
- 1207 Hersche - Soft-Masked Diffusion Language Models.pdf
- 1207 Nie - Large Language Diffusion Models.pdf
- 1207 Wang - Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing.pdf
- 1207 Xie - Dream-Coder 7B- An Open Diffusion Language Model for Code.pdf
- 1207 Xu - Energy-Based Diffusion Language Models for Text Generation.pdf
- 1207 Ye - Dream 7B- Diffusion Large Language Models.pdf
- 1210 Zhang - On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models.pdf
- 1211 Liang - Gated Integration of Low-Rank Adaptation for Continual Learning of Large Language Models.pdf
- 1212 Cai - Escaping the Verifier- Learning to Reason via Demonstrations.pdf
- 1212 Zhang - On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning.pdf
- 1219 Guo - SonicMoE- Accelerating MoE with IO and Tile-aware Optimizations.pdf
- 1222 Jiang - Meta-RL Induces Exploration in Language Agents.pdf
- 1763662397-1763646865-olmo_3_technical_report-1.pdf
- 2510.21890v1.pdf
- DeepSeekMath_V2.pdf
- DeepSeek_OCR_paper.pdf
- deepseek-v3.2.pdf
- kevin_murphy_book1.pdf
- kevin_murphy_book2.pdf
- metadata.jsonl
- the-smol-training-playbook-the-secrets-to-building-world-class-llms.pdf