Explorer

Root / papers / 2025Q4

1001 Tian - Reinforcement Mid-Training.pdf
1003 Kang - Demystifying Synthetic Data in LLM Pre-training- A Systematic Study of Scaling Laws, Benefits, and Pitfalls.pdf
1003 Wu - Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.pdf
1004 Hatamizadeh - RLP- Reinforcement as a Pretraining Objective.pdf
1007 Wang - Critique Fine-Tuning- Learning to Critique is More Effective than Learning to Imitate.pdf
1009 Cen - Webscale-RL- Automated Data Pipeline for Scaling RL Data to Pretraining Levels.pdf
1010 NVIDIA - Pretraining Large Language Models with NVFP4.pdf
1016 Davis - What is the objective of reasoning with reinforcement learning?.pdf
1016 Khatri - The Art of Scaling Reinforcement Learning Compute for LLMs.pdf
1019 Ouyang - ReasoningBank- Scaling Agent Self-Evolving with Reasoning Memory.pdf
1030 Wu - Parallel Loop Transformer for Efficient Test-Time Computation Scaling.pdf
1030 Zhu - Scaling Latent Reasoning via Looped Language Models.pdf
1031 Momeni - In-context Continual Learning Assisted by an External Continual Learner.pdf
1031 Shi - Continual Learning of Large Language Models- A Comprehensive Survey.pdf
1031 Wu - Continual Learning for Large Language Models- A Survey.pdf
1103 Lin - Continual Learning via Sparse Memory Finetuning.pdf
1103 Qi - Defeating the Training-Inference Mismatch via FP16.pdf
1118 Shenfeld - RL's Razor- Why Online Reinforcement Learning Forgets Less.pdf
1201 Anthony - Training Foundation Models on a Full-Stack AMD Platform- Compute, Networking, and System Design.pdf
1201 Merrill - Critical Batch Size Revisited- A Simple Empirical Approach to Large-Batch Language Model Training.pdf
1201 Team - Kimi Linear- An Expressive, Efficient Attention Architecture.pdf
1201 Wu - HunyuanVideo 1.5 Technical Report.pdf
1203 Murphy - Reinforcement Learning- An Overview.pdf
1205 Lee - Feedback Descent- Open-Ended Text Optimization via Pairwise Comparison.pdf
1205 Zheng - Stabilizing Reinforcement Learning with LLMs- Formulation and Practices.pdf
1206 Cui - Homogeneous Keys, Heterogeneous Values- Exploiting Local KV Cache Asymmetry for Long-Context LLMs.pdf
1206 Eyring - Noise Hypernetworks- Amortizing Test-Time Compute in Diffusion Models.pdf
1206 McCandlish - An Empirical Model of Large-Batch Training.pdf
1206 Sarukkai - Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks.pdf
1206 Wang - Reinforcement Learning for Reasoning in Large Language Models with One Training Example.pdf
1206 Yan - Learning to Reason under Off-Policy Guidance.pdf
1206 Yang - Hyperbolic Fine-tuning for Large Language Models.pdf
1206 Zhao - SmallKV- Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference.pdf
1207 Gong - Scaling Diffusion Language Models via Adaptation from Autoregressive Models.pdf
1207 Hersche - Soft-Masked Diffusion Language Models.pdf
1207 Nie - Large Language Diffusion Models.pdf
1207 Wang - Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing.pdf
1207 Xie - Dream-Coder 7B- An Open Diffusion Language Model for Code.pdf
1207 Xu - Energy-Based Diffusion Language Models for Text Generation.pdf
1207 Ye - Dream 7B- Diffusion Large Language Models.pdf
1210 Zhang - On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models.pdf
1211 Liang - Gated Integration of Low-Rank Adaptation for Continual Learning of Large Language Models.pdf
1212 Cai - Escaping the Verifier- Learning to Reason via Demonstrations.pdf
1212 Zhang - On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning.pdf
1219 Guo - SonicMoE- Accelerating MoE with IO and Tile-aware Optimizations.pdf
1222 Jiang - Meta-RL Induces Exploration in Language Agents.pdf
1763662397-1763646865-olmo_3_technical_report-1.pdf
2510.21890v1.pdf
DeepSeekMath_V2.pdf
DeepSeek_OCR_paper.pdf
deepseek-v3.2.pdf
kevin_murphy_book1.pdf
kevin_murphy_book2.pdf
metadata.jsonl
the-smol-training-playbook-the-secrets-to-building-world-class-llms.pdf