Explorer
- 1003 Kang - Demystifying Synthetic Data in LLM Pre-training- A Systematic Study of Scaling Laws, Benefits, and Pitfalls.txt
- 1003 Wu - Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.txt
- 1004 Hatamizadeh - RLP- Reinforcement as a Pretraining Objective.txt
- 1007 Wang - Critique Fine-Tuning- Learning to Critique is More Effective than Learning to Imitate.txt
- 1009 Cen - Webscale-RL- Automated Data Pipeline for Scaling RL Data to Pretraining Levels.txt
- 1010 NVIDIA - Pretraining Large Language Models with NVFP4.txt
- 1016 Davis - What is the objective of reasoning with reinforcement learning?.txt
- 1016 Khatri - The Art of Scaling Reinforcement Learning Compute for LLMs.txt
- 1019 Ouyang - ReasoningBank- Scaling Agent Self-Evolving with Reasoning Memory.txt
- 1030 Wu - Parallel Loop Transformer for Efficient Test-Time Computation Scaling.txt
- 1030 Zhu - Scaling Latent Reasoning via Looped Language Models.txt
- 1031 Momeni - In-context Continual Learning Assisted by an External Continual Learner.txt
- 1031 Shi - Continual Learning of Large Language Models- A Comprehensive Survey.txt
- 1031 Wu - Continual Learning for Large Language Models- A Survey.txt
- 1103 Lin - Continual Learning via Sparse Memory Finetuning.txt
- 1103 Qi - Defeating the Training-Inference Mismatch via FP16.txt
- 1118 Shenfeld - RL's Razor- Why Online Reinforcement Learning Forgets Less.txt
- 1201 Anthony - Training Foundation Models on a Full-Stack AMD Platform- Compute, Networking, and System Design.txt
- 1201 Merrill - Critical Batch Size Revisited- A Simple Empirical Approach to Large-Batch Language Model Training.txt
- 1201 Wu - HunyuanVideo 1.5 Technical Report.txt
- 1205 Lee - Feedback Descent- Open-Ended Text Optimization via Pairwise Comparison.txt
- 1205 Zheng - Stabilizing Reinforcement Learning with LLMs- Formulation and Practices.txt
- 1206 Cui - Homogeneous Keys, Heterogeneous Values- Exploiting Local KV Cache Asymmetry for Long-Context LLMs.txt
- 1206 Eyring - Noise Hypernetworks- Amortizing Test-Time Compute in Diffusion Models.txt
- 1206 Wang - Reinforcement Learning for Reasoning in Large Language Models with One Training Example.txt
- 1206 Yang - Hyperbolic Fine-tuning for Large Language Models.txt
- 1206 Zhao - SmallKV- Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference.txt
- 1207 Gong - Scaling Diffusion Language Models via Adaptation from Autoregressive Models.txt
- 1207 Hersche - Soft-Masked Diffusion Language Models.txt
- 1207 Nie - Large Language Diffusion Models.txt
- 1207 Wang - Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing.txt
- 1207 Xie - Dream-Coder 7B- An Open Diffusion Language Model for Code.txt
- 1207 Xu - Energy-Based Diffusion Language Models for Text Generation.txt
- 1207 Ye - Dream 7B- Diffusion Large Language Models.txt
- 1210 Zhang - On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models.txt
- 1211 Liang - Gated Integration of Low-Rank Adaptation for Continual Learning of Large Language Models.txt
- 1212 Cai - Escaping the Verifier- Learning to Reason via Demonstrations.txt
- 1219 Guo - SonicMoE- Accelerating MoE with IO and Tile-aware Optimizations.txt
- 1222 Jiang - Meta-RL Induces Exploration in Language Agents.txt