# Paper Download Index

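This page organizes the papers referenced in this project into two kinds of entry points: `Reading page` and `PDF`. The arXiv PDF links have been verified as reachable with `HEAD` requests.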
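The sketch below shows one way such a link check could be reproduced. It is a minimal illustration and not the script actually used for this page: the function names `abs_to_pdf` and `is_reachable` are hypothetical, and it assumes every arXiv entry follows the `https://arxiv.org/abs/<id>` and `https://arxiv.org/pdf/<id>` pattern seen in the tables.

```python
import urllib.request
from urllib.parse import urlparse


def abs_to_pdf(abs_url: str) -> str:
    """Derive the PDF URL from an arXiv abstract URL (hypothetical helper)."""
    parsed = urlparse(abs_url)
    if parsed.netloc != "arxiv.org" or not parsed.path.startswith("/abs/"):
        raise ValueError(f"not an arXiv abstract URL: {abs_url}")
    paper_id = parsed.path[len("/abs/"):]
    return f"https://arxiv.org/pdf/{paper_id}"


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the server answers with 2xx/3xx.

    Assumption: a non-error status on HEAD is treated as "reachable".
    HTTP errors, DNS failures, and timeouts are all OSError subclasses
    in urllib, so they are reported as False.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        return False
```

For example, `is_reachable(abs_to_pdf("https://arxiv.org/abs/2103.10360"))` would check the first GLM entry. Non-arXiv rows, such as the MiniMax report links, are rejected by `abs_to_pdf` and would need to be passed to `is_reachable` directly.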

## GLM Series

| Paper | Year | Reading page | PDF |
| --- | --- | --- | --- |
| GLM: General Language Model Pre-training with Autoregressive Blank Infilling | 2021 | [abs](https://arxiv.org/abs/2103.10360) | [pdf](https://arxiv.org/pdf/2103.10360) |
| GLM-130B: An Open Bilingual Pre-trained Model | 2022 | [abs](https://arxiv.org/abs/2210.02414) | [pdf](https://arxiv.org/pdf/2210.02414) |
| WebGLM | 2023 | [abs](https://arxiv.org/abs/2306.07906) | [pdf](https://arxiv.org/pdf/2306.07906) |
| ChatGLM / GLM-4 All Tools | 2024 | [abs](https://arxiv.org/abs/2406.12793) | [pdf](https://arxiv.org/pdf/2406.12793) |
| AutoGLM: Autonomous Foundation Agents for GUIs | 2024 | [abs](https://arxiv.org/abs/2411.00820) | [pdf](https://arxiv.org/pdf/2411.00820) |
| GLM-4-Voice | 2024 | [abs](https://arxiv.org/abs/2412.02612) | [pdf](https://arxiv.org/pdf/2412.02612) |
| GLM-4.1V-Thinking & GLM-4.5V | 2025 | [abs](https://arxiv.org/abs/2507.01006) | [pdf](https://arxiv.org/pdf/2507.01006) |
| GLM-4.5: Agentic, Reasoning, and Coding Foundation Models | 2025 | [abs](https://arxiv.org/abs/2508.06471) | [pdf](https://arxiv.org/pdf/2508.06471) |
| GLM-TTS Technical Report | 2025 | [abs](https://arxiv.org/abs/2512.14291) | [pdf](https://arxiv.org/pdf/2512.14291) |
| GLM-5: From Vibe Coding to Agentic Engineering | 2026 | [abs](https://arxiv.org/abs/2602.15763) | [pdf](https://arxiv.org/pdf/2602.15763) |
| GLM-OCR Technical Report | 2026 | [abs](https://arxiv.org/abs/2603.10910) | [pdf](https://arxiv.org/pdf/2603.10910) |
| GLM-5V-Turbo | 2026 | [abs](https://arxiv.org/abs/2604.26752) | [pdf](https://arxiv.org/pdf/2604.26752) |

## Kimi Series

| Paper | Year | Reading page | PDF |
| --- | --- | --- | --- |
| Mooncake: A KVCache-Centric Disaggregated Architecture for LLM Serving | 2024 | [abs](https://arxiv.org/abs/2407.00079) | [pdf](https://arxiv.org/pdf/2407.00079) |
| Kimi k1.5: Scaling Reinforcement Learning with LLMs | 2025 | [abs](https://arxiv.org/abs/2501.12599) | [pdf](https://arxiv.org/pdf/2501.12599) |
| Muon is Scalable for LLM Training | 2025 | [abs](https://arxiv.org/abs/2502.16982) | [pdf](https://arxiv.org/pdf/2502.16982) |
| Kimi-VL Technical Report | 2025 | [abs](https://arxiv.org/abs/2504.07491) | [pdf](https://arxiv.org/pdf/2504.07491) |
| Kimi-Audio Technical Report | 2025 | [abs](https://arxiv.org/abs/2504.18425) | [pdf](https://arxiv.org/pdf/2504.18425) |
| Kimi K2: Open Agentic Intelligence | 2025/2026 | [abs](https://arxiv.org/abs/2507.20534) | [pdf](https://arxiv.org/pdf/2507.20534) |
| Kimi Linear: An Expressive, Efficient Attention Architecture | 2025 | [abs](https://arxiv.org/abs/2510.26692) | [pdf](https://arxiv.org/pdf/2510.26692) |
| Kimi K2.5: Visual Agentic Intelligence | 2026 | [abs](https://arxiv.org/abs/2602.02276) | [pdf](https://arxiv.org/pdf/2602.02276) |

## DeepSeek Series

| Paper | Year | Reading page | PDF |
| --- | --- | --- | --- |
| DeepSeek LLM: Scaling Open-Source Language Models with Longtermism | 2024 | [abs](https://arxiv.org/abs/2401.02954) | [pdf](https://arxiv.org/pdf/2401.02954) |
| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | 2024 | [abs](https://arxiv.org/abs/2401.06066) | [pdf](https://arxiv.org/pdf/2401.06066) |
| DeepSeek-Coder: When the Large Language Model Meets Programming | 2024 | [abs](https://arxiv.org/abs/2401.14196) | [pdf](https://arxiv.org/pdf/2401.14196) |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | 2024 | [abs](https://arxiv.org/abs/2402.03300) | [pdf](https://arxiv.org/pdf/2402.03300) |
| DeepSeek-VL: Towards Real-World Vision-Language Understanding | 2024 | [abs](https://arxiv.org/abs/2403.05525) | [pdf](https://arxiv.org/pdf/2403.05525) |
| DeepSeek-V2: A Strong, Economical and Efficient Mixture-of-Experts Language Model | 2024 | [abs](https://arxiv.org/abs/2405.04434) | [pdf](https://arxiv.org/pdf/2405.04434) |
| DeepSeek-Prover | 2024 | [abs](https://arxiv.org/abs/2405.14333) | [pdf](https://arxiv.org/pdf/2405.14333) |
| DeepSeek-Coder-V2 | 2024 | [abs](https://arxiv.org/abs/2406.11931) | [pdf](https://arxiv.org/pdf/2406.11931) |
| Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | 2024 | [abs](https://arxiv.org/abs/2407.01906) | [pdf](https://arxiv.org/pdf/2407.01906) |
| DeepSeek-Prover V1.5 | 2024 | [abs](https://arxiv.org/abs/2408.08152) | [pdf](https://arxiv.org/pdf/2408.08152) |
| Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts | 2024 | [abs](https://arxiv.org/abs/2408.15664) | [pdf](https://arxiv.org/pdf/2408.15664) |
| Janus | 2024 | [abs](https://arxiv.org/abs/2410.13848) | [pdf](https://arxiv.org/pdf/2410.13848) |
| JanusFlow | 2024 | [abs](https://arxiv.org/abs/2411.07975) | [pdf](https://arxiv.org/pdf/2411.07975) |
| DeepSeek-VL2 | 2024 | [abs](https://arxiv.org/abs/2412.10302) | [pdf](https://arxiv.org/pdf/2412.10302) |
| DeepSeek-V3 Technical Report | 2024 | [abs](https://arxiv.org/abs/2412.19437) | [pdf](https://arxiv.org/pdf/2412.19437) |
| DeepSeek-R1 | 2025 | [abs](https://arxiv.org/abs/2501.12948) | [pdf](https://arxiv.org/pdf/2501.12948) |
| Janus-Pro | 2025 | [abs](https://arxiv.org/abs/2501.17811) | [pdf](https://arxiv.org/pdf/2501.17811) |
| Native Sparse Attention | 2025 | [abs](https://arxiv.org/abs/2502.11089) | [pdf](https://arxiv.org/pdf/2502.11089) |
| Inference-Time Scaling for Generalist Reward Modeling | 2025 | [abs](https://arxiv.org/abs/2504.02495) | [pdf](https://arxiv.org/pdf/2504.02495) |
| DeepSeek-Prover V2 | 2025 | [abs](https://arxiv.org/abs/2504.21801) | [pdf](https://arxiv.org/pdf/2504.21801) |
| DeepSeek-OCR | 2025 | [abs](https://arxiv.org/abs/2510.18234) | [pdf](https://arxiv.org/pdf/2510.18234) |
| DeepSeekMath-V2 | 2025 | [abs](https://arxiv.org/abs/2511.22570) | [pdf](https://arxiv.org/pdf/2511.22570) |
| DeepSeek-V3.2 | 2025 | [abs](https://arxiv.org/abs/2512.02556) | [pdf](https://arxiv.org/pdf/2512.02556) |

## MiniMax Series

| Paper/Report | Year | Reading page | PDF/Report |
| --- | --- | --- | --- |
| MiniMax-01: Scaling Foundation Models with Lightning Attention | 2025 | [abs](https://arxiv.org/abs/2501.08313) | [pdf](https://arxiv.org/pdf/2501.08313) |
| MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | 2025 | [abs](https://arxiv.org/abs/2506.13585) | [pdf](https://arxiv.org/pdf/2506.13585) |
| MiniMax M2.5: Built for Real-World Productivity | 2026 | [official report](https://www.minimax.io/news/minimax-m25) | [model page](https://www.minimax.io/models/text) |