Root / papers / text_metadata / 2025Q4

1003 Kang - Demystifying Synthetic Data in LLM Pre-training- A Systematic Study of Scaling Laws, Benefits, and Pitfalls.txt
1003 Wu - Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.txt
1004 Hatamizadeh - RLP- Reinforcement as a Pretraining Objective.txt
1007 Wang - Critique Fine-Tuning- Learning to Critique is More Effective than Learning to Imitate.txt
1009 Cen - Webscale-RL- Automated Data Pipeline for Scaling RL Data to Pretraining Levels.txt
1010 NVIDIA - Pretraining Large Language Models with NVFP4.txt
1016 Davis - What is the objective of reasoning with reinforcement learning?.txt
1016 Khatri - The Art of Scaling Reinforcement Learning Compute for LLMs.txt
1019 Ouyang - ReasoningBank- Scaling Agent Self-Evolving with Reasoning Memory.txt
1030 Wu - Parallel Loop Transformer for Efficient Test-Time Computation Scaling.txt
1030 Zhu - Scaling Latent Reasoning via Looped Language Models.txt
1031 Momeni - In-context Continual Learning Assisted by an External Continual Learner.txt
1031 Shi - Continual Learning of Large Language Models- A Comprehensive Survey.txt
1031 Wu - Continual Learning for Large Language Models- A Survey.txt
1103 Lin - Continual Learning via Sparse Memory Finetuning.txt
1103 Qi - Defeating the Training-Inference Mismatch via FP16.txt
1118 Shenfeld - RL's Razor- Why Online Reinforcement Learning Forgets Less.txt
1201 Anthony - Training Foundation Models on a Full-Stack AMD Platform- Compute, Networking, and System Design.txt
1201 Merrill - Critical Batch Size Revisited- A Simple Empirical Approach to Large-Batch Language Model Training.txt
1201 Wu - HunyuanVideo 1.5 Technical Report.txt
1205 Lee - Feedback Descent- Open-Ended Text Optimization via Pairwise Comparison.txt
1205 Zheng - Stabilizing Reinforcement Learning with LLMs- Formulation and Practices.txt
1206 Cui - Homogeneous Keys, Heterogeneous Values- Exploiting Local KV Cache Asymmetry for Long-Context LLMs.txt
1206 Eyring - Noise Hypernetworks- Amortizing Test-Time Compute in Diffusion Models.txt
1206 Wang - Reinforcement Learning for Reasoning in Large Language Models with One Training Example.txt
1206 Yang - Hyperbolic Fine-tuning for Large Language Models.txt
1206 Zhao - SmallKV- Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference.txt
1207 Gong - Scaling Diffusion Language Models via Adaptation from Autoregressive Models.txt
1207 Hersche - Soft-Masked Diffusion Language Models.txt
1207 Nie - Large Language Diffusion Models.txt
1207 Wang - Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing.txt
1207 Xie - Dream-Coder 7B- An Open Diffusion Language Model for Code.txt
1207 Xu - Energy-Based Diffusion Language Models for Text Generation.txt
1207 Ye - Dream 7B- Diffusion Large Language Models.txt
1210 Zhang - On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models.txt
1211 Liang - Gated Integration of Low-Rank Adaptation for Continual Learning of Large Language Models.txt
1212 Cai - Escaping the Verifier- Learning to Reason via Demonstrations.txt
1219 Guo - SonicMoE- Accelerating MoE with IO and Tile-aware Optimizations.txt
1222 Jiang - Meta-RL Induces Exploration in Language Agents.txt