Explorer
- 0701 Narayan - μnit Scaling- Simple and Scalable FP8 LLM Training.txt
- 0702 Tang - Pangu Pro MoE- Mixture of Grouped Experts for Efficient Sparsity.txt
- 0702 Vaswani - Attention Is All You Need.txt
- 0704 Roy - Fast and Simplex- 2-Simplicial Attention in Triton.txt
- 0704 Zhu - Establishing Best Practices for Building Rigorous Agentic Benchmarks.txt
- 0705 Gladstone - Energy-Based Transformers are Scalable Learners and Thinkers.txt
- 0708 Gelada - Scaling Context Requires Rethinking Attention.txt
- 0710 Liang - Drag-and-Drop LLMs- Zero-Shot Prompt-to-Weights.txt
- 0710 Qiu - Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks.txt
- 0715 Comanici - Gemini 2.5- Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.txt
- 0722 Prabhudesai - Diffusion Beats Autoregressive in Data-Constrained Settings.txt
- 0722 Zhou - Apple Intelligence Foundation Language Models- Tech Report 2025.txt
- 0723 Han - Deep Researcher with Test-Time Diffusion.txt
- 0724 Liu - The Serial Scaling Hypothesis.txt
- 0725 Liu - ProRL- Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models.txt
- 0727 Dherin - Learning without training- The implicit dynamics of in-context learning.txt
- 0728 Calian - DataRater- Meta-Learned Dataset Curation.txt
- 0729 Qin - Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved).txt
- 0730 Gunjal - Rubrics as Rewards- Reinforcement Learning Beyond Verifiable Domains.txt
- 0801 Chuang - Meta CLIP 2- A Worldwide Scaling Recipe.txt
- 0801 Dong - Reinforcement Pre-Training.txt
- 0801 Zhou - Solving Formal Math Problems by Decomposition and Iterative Reflection.txt
- 0809 Samragh - Your LLM Knows the Future- Uncovering Its Multi-Token Prediction Potential.txt
- 0812 Wu - On the Generalization of SFT- A Reinforcement Learning Perspective with Reward Rectification.txt
- 0813 Agarwal - On-Policy Distillation of Language Models- Learning from Self-Generated Mistakes.txt
- 0813 Hu - REINFORCE++- An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models.txt
- 0818 Radhakrishna - Apriel-Nemotron-15B-Thinker.txt
- 0826 Chen - Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models.txt
- 0829 Ethayarajh - KTO- Model Alignment as Prospect Theoretic Optimization.txt
- 0831 Weller - On the Theoretical Limitations of Embedding-Based Retrieval.txt
- 0908 Lin - REFRAG- Rethinking RAG based Decoding.txt
- 0916 Team - LongCat-Flash Technical Report.txt
- 0924 Li - Reinforcement Learning on Pre-Training Data.txt
- 0928 Geng - X-Omni- Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again.txt
- 0928 Tang - On a few pitfalls in KL divergence gradient estimation for RL.txt
- 0929 Kim - In Their Own Words- Reasoning Traces Tailored for Small Models Make Them Better Reasoners.txt
- 0929 Xu - Single-stream Policy Optimization.txt