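Paper Summary for 2026-04-06
Topic: research on system-level issues in Agentic AI systems in which multiple agents collaborate, such as system latency and system architecture design.
Three papers were selected under this topic.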
TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing
Relevance to the Topic
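- Overlap of technical terminology
  - Description: how closely the specific technical terms, model names, datasets, and algorithm identifiers used in the paper match the topic keywords. Does the paper directly discuss core techniques related to the research topic?
  - Weight: 0.500
  - Score: 10/10
  - Rationale: the paper directly discusses multi-agent LLM applications, synchronized rounds, a central scheduler, the All-Gather communication pattern, and KV Cache redundancy. These terms closely match "Agentic AI systems in which multiple agents collaborate" and "system architecture design" in the topic, and the paper focuses on system-level efficiency.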
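- Fit of the experimental setup
  - Description: are the paper's experimental environment, datasets, and evaluation metrics accepted standards for this topic? Do the experiments validate effectiveness in the topic's scenarios?
  - Weight: 0.200
  - Score: 9/10
  - Rationale: the paper evaluates on GenerativeAgents and AgentSociety, which are typical multi-agent system benchmarks. Its metrics (number of concurrent agents, KV Cache storage reduction, and prefill speedup) directly target system latency and architectural efficiency, and so validate effectiveness in the topic's scenarios.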
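- Direct relevance of the methodology
  - Description: can the proposed method be applied directly to the core problems of this topic? Does it target the topic's known pain points?
  - Weight: 0.200
  - Score: 10/10
  - Rationale: TokenDance scales the number of concurrent agents through collective KV Cache sharing. It directly addresses the KV Cache redundancy caused by the All-Gather communication pattern in multi-agent LLM systems, a core pain point for system architecture design and performance optimization (e.g., latency).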
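- Availability of code and data
  - Description: does the paper provide open-source code, pretrained models, or public datasets? This is essential for quickly following up on and applying the work.
  - Weight: 0.100
  - Score: 0/10
  - Rationale: the paper does not explicitly mention open-source code, pretrained models, or public datasets. The abstract describes only the system design and evaluation results and says nothing about code availability.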
- Summary
  - Overall score: 88.00
The content above was generated by deepseek-chat.
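The overall scores on this page appear to be a weighted sum of the four criterion scores, rescaled from a 10-point to a 100-point scale. For TokenDance that gives 0.5·10 + 0.2·9 + 0.2·10 + 0.1·0 = 8.8, i.e. 88.00. This formula is inferred from the numbers shown here rather than stated by the page. The minimal Python sketch below checks that assumption against all three papers. `WEIGHTS`, `SCORES` and `overall_score` are illustrative names.

```python
# Criterion weights as listed on this page.
WEIGHTS = {
    "terminology": 0.5,
    "experiments": 0.2,
    "methodology": 0.2,
    "code_data": 0.1,
}

# Per-criterion scores (out of 10) copied from the three paper summaries.
SCORES = {
    "TokenDance": {"terminology": 10, "experiments": 9, "methodology": 10, "code_data": 0},
    "KAIJU": {"terminology": 9, "experiments": 7, "methodology": 9, "code_data": 10},
    "InfoSeeker": {"terminology": 9, "experiments": 7, "methodology": 9, "code_data": 10},
}


def overall_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of 0-10 scores, rescaled to 0-100 (assumed formula)."""
    weighted = sum(weights[name] * score for name, score in scores.items())
    return round(weighted * 10, 2)


for paper, scores in SCORES.items():
    print(f"{paper}: {overall_score(scores, WEIGHTS):.2f}")
```

Under this assumption the script prints 88.00, 87.00 and 87.00, which match the overall scores reported for the three papers. The weights sum to 1, so the overall score stays on the same 0-100 range as a single rescaled criterion score.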
Abstract
Multi-agent LLM applications organize execution in synchronized rounds where a central scheduler gathers outputs from all agents and redistributes the combined context. This All-Gather communication pattern creates massive KV Cache redundancy, because every agent's prompt contains the same shared output blocks, yet existing reuse methods fail to exploit it efficiently. We present TokenDance, a system that scales the number of concurrent agents by exploiting the All-Gather pattern for collective KV Cache sharing. TokenDance's KV Collector performs KV Cache reuse over the full round in one collective step, so the cost of reusing a shared block is paid once regardless of agent count. Its Diff-Aware Storage encodes sibling caches as block-sparse diffs against a single master copy, achieving 11-17x compression on representative workloads. Evaluation on GenerativeAgents and AgentSociety shows that TokenDance supports up to 2.7x more concurrent agents than vLLM with prefix caching under SLO requirement, reduces per-agent KV Cache storage by up to 17.5x, and achieves up to 1.9x prefill speedup over per-request position-independent caching.
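The Diff-Aware Storage idea can be illustrated with a toy model. Each agent's cache is split into fixed-size blocks and one master copy is kept. Every sibling then stores only the blocks that differ from the master, which is a block-sparse diff. The blocks that are shared are therefore stored once, however many siblings reference them. This is a minimal sketch under stated assumptions, not TokenDance's implementation:

- Plain lists of token IDs stand in for per-layer key/value tensors.
- Positional effects on KV values are ignored.
- The KV Collector is not modelled.
- `BLOCK_SIZE`, `BlockDiff`, `encode` and `decode` are hypothetical names.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical block granularity (tokens per block)


def split_blocks(seq: list[int], block_size: int = BLOCK_SIZE) -> list[list[int]]:
    """Split a sequence into fixed-size blocks (the last one may be shorter)."""
    return [seq[i:i + block_size] for i in range(0, len(seq), block_size)]


@dataclass
class BlockDiff:
    """Hypothetical encoding of a sibling cache as a block-sparse diff against the master."""
    length: int  # number of blocks in the sibling
    changed: dict[int, list[int]] = field(default_factory=dict)  # block index -> block


def encode(master: list[int], sibling: list[int]) -> BlockDiff:
    """Keep only sibling blocks that differ from, or extend past, the master copy."""
    m_blocks, s_blocks = split_blocks(master), split_blocks(sibling)
    diff = BlockDiff(length=len(s_blocks))
    for i, block in enumerate(s_blocks):
        if i >= len(m_blocks) or m_blocks[i] != block:
            diff.changed[i] = block
    return diff


def decode(master: list[int], diff: BlockDiff) -> list[int]:
    """Rebuild the full sibling sequence from the master copy plus its diff."""
    m_blocks = split_blocks(master)
    out: list[int] = []
    for i in range(diff.length):
        # Unchanged blocks are read from the single shared master copy.
        out.extend(diff.changed[i] if i in diff.changed else m_blocks[i])
    return out


if __name__ == "__main__":
    shared = list(range(12))                 # 3 blocks of shared round output
    master = shared + [100, 101, 102, 103]   # agent 0: shared blocks + private block
    sibling = shared + [200, 201, 202, 203]  # agent 1: same shared blocks, own private block
    diff = encode(master, sibling)
    assert decode(master, diff) == sibling
    print(f"stored {len(diff.changed)} of {diff.length} blocks")  # stored 1 of 4 blocks
```

The compression ratio in this toy depends only on how many blocks the siblings share. The 11-17x figure reported in the abstract comes from the paper's real workloads and is not reproduced here.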
KAIJU: An Executive Kernel for Intent-Gated Execution of LLM Agents
Relevance to the Topic
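- Overlap of technical terminology
  - Description: how closely the specific technical terms, model names, datasets, and algorithm identifiers used in the paper match the topic keywords. Does the paper directly discuss core techniques related to the research topic?
  - Weight: 0.500
  - Score: 9/10
  - Rationale: the paper directly discusses core techniques closely related to the research topic, such as LLM agents, system-level abstraction, execution mechanics, scheduling, tool dispatch, dependency resolution, and latency. These terms closely match Agentic AI systems, system latency, and system architecture design in the topic.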
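- Fit of the experimental setup
  - Description: are the paper's experimental environment, datasets, and evaluation metrics accepted standards for this topic? Do the experiments validate effectiveness in the topic's scenarios?
  - Weight: 0.200
  - Score: 7/10
  - Rationale: the experiments measure system latency (the latency penalty and the structural advantage on computational queries) against a ReAct baseline. This directly validates effectiveness for Agentic AI system performance, but the paper does not explicitly evaluate system architecture design in multi-agent collaboration scenarios.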
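- Direct relevance of the methodology
  - Description: can the proposed method be applied directly to the core problems of this topic? Does it target the topic's known pain points?
  - Weight: 0.200
  - Score: 9/10
  - Rationale: KAIJU (including the Executive Kernel and Intent-Gated Execution) directly targets system-level problems in Agentic AI systems. It reduces latency through parallel scheduling and dependency management (addressing serial latency), and its decoupled architecture improves security and control. These methods apply directly to the system latency and architecture design problems in the research topic.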
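- Availability of code and data
  - Description: does the paper provide open-source code, pretrained models, or public datasets? This is essential for quickly following up on and applying the work.
  - Weight: 0.100
  - Score: 10/10
  - Rationale: the paper explicitly provides open-source code ("Code available at this https URL"), which makes it easy to follow up on and apply.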
- Summary
  - Overall score: 87.00
The content above was generated by deepseek-chat.
Abstract
Tool-calling autonomous agents based on large language models using ReAct exhibit three limitations: serial latency, quadratic context growth, and vulnerability to prompt injection and hallucination. Recent work moves towards separating planning from execution but in each case the model remains coupled to the execution mechanics. We introduce a system-level abstraction for LLM agents which decouples the execution of agent workflows from the LLM reasoning layer. We define two first-class abstractions: (1) Intent-Gated Execution (IGX), a security paradigm that enforces intent at execution, and (2) an Executive Kernel that manages scheduling, tool dispatch, dependency resolution, failures and security. In KAIJU, the LLM plans upfront, optimistically scheduling tools in parallel with dependency-aware parameter injection. Tools are authorised via IGX based on four independent variables: scope, intent, impact, and clearance (external approval). KAIJU supports three adaptive execution modes (Reflect, nReflect, and Orchestrator), providing progressively finer-grained execution control apt for complex investigation and deep analysis or research. Empirical evaluation against a ReAct baseline shows that KAIJU has a latency penalty on simple queries due to planning overhead, convergence at moderate complexity, and a structural advantage on computational queries requiring parallel data gathering. Beyond latency, the separation enforces behavioural guarantees that ReAct cannot match through prompting alone. Code available at this https URL
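The abstract describes a plan-then-execute design. The LLM emits a plan up front, and the kernel runs independent tool calls in parallel after authorising them. Outputs of finished calls are injected into the parameters of dependent calls. The minimal sketch below shows that scheduling pattern only and is not KAIJU's API. The `Step` type, the `"$step_id"` parameter-reference syntax and the helpers `refs`, `inject` and `run_plan` are all hypothetical. The `gate` callback is a placeholder for IGX: its scope, intent, impact and clearance checks are not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    """One planned tool call; string params like "$<step_id>" are injected (hypothetical syntax)."""
    tool: Callable[..., Any]
    params: dict[str, Any]


def refs(step: Step) -> set[str]:
    """IDs of the steps whose outputs this step's parameters depend on."""
    return {v[1:] for v in step.params.values() if isinstance(v, str) and v.startswith("$")}


def inject(step: Step, results: dict[str, Any]) -> dict[str, Any]:
    """Substitute each "$<step_id>" parameter with that step's output."""
    return {k: results[v[1:]] if isinstance(v, str) and v.startswith("$") else v
            for k, v in step.params.items()}


def run_plan(plan: dict[str, Step], gate: Callable[[str, Step], bool],
             max_workers: int = 8) -> dict[str, Any]:
    """Execute a precomputed plan in dependency waves, running each wave in parallel."""
    results: dict[str, Any] = {}
    pending = dict(plan)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # A wave is every pending step whose referenced steps have all finished.
            wave = [sid for sid, step in pending.items()
                    if all(r in results for r in refs(step))]
            if not wave:
                raise ValueError(f"cyclic or missing dependencies among {sorted(pending)}")
            # Placeholder for an intent gate such as IGX: authorise before dispatch.
            for sid in wave:
                if not gate(sid, pending[sid]):
                    raise PermissionError(f"step {sid!r} was not authorised")
            futures = {sid: pool.submit(pending[sid].tool, **inject(pending[sid], results))
                       for sid in wave}
            for sid, future in futures.items():
                results[sid] = future.result()
                del pending[sid]
    return results


if __name__ == "__main__":
    import time

    def lookup(city: str) -> int:
        time.sleep(0.1)  # simulate a slow tool call
        return len(city)

    def add(a: int, b: int) -> int:
        return a + b

    plan = {
        "a": Step(lookup, {"city": "Paris"}),
        "b": Step(lookup, {"city": "Tokyo"}),
        "total": Step(add, {"a": "$a", "b": "$b"}),
    }
    # "a" and "b" run in parallel; "total" waits for both and receives their outputs.
    print(run_plan(plan, gate=lambda sid, step: True))  # {'a': 5, 'b': 5, 'total': 10}
```

Waves are a simple way to get the parallelism the abstract attributes to KAIJU. The cost is a planning step before any tool runs, which is consistent with the latency penalty the paper reports on simple queries.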
InfoSeeker: A Scalable Hierarchical Parallel Agent Framework for Web Information Seeking
Relevance to the Topic
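- Overlap of technical terminology
  - Description: how closely the specific technical terms, model names, datasets, and algorithm identifiers used in the paper match the topic keywords. Does the paper directly discuss core techniques related to the research topic?
  - Weight: 0.500
  - Score: 9/10
  - Rationale: the paper directly uses terms such as "agentic search systems", "large language model agent systems", "parallel Workers" and "hierarchical framework". It also discusses system-level problems such as context saturation, error propagation, and high latency, closely matching the core techniques of the research topic.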
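- Fit of the experimental setup
  - Description: are the paper's experimental environment, datasets, and evaluation metrics accepted standards for this topic? Do the experiments validate effectiveness in the topic's scenarios?
  - Weight: 0.200
  - Score: 7/10
  - Rationale: the experiments evaluate efficiency and effectiveness on the WideSearch-en and BrowseComp-zh benchmarks, using speedup, success rate (WideSearch-en) and accuracy (BrowseComp-zh). These results directly validate the latency and architecture design claims in information-seeking scenarios, but the paper does not explicitly use datasets or metrics recognized as standard for this topic.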
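- Direct relevance of the methodology
  - Description: can the proposed method be applied directly to the core problems of this topic? Does it target the topic's known pain points?
  - Weight: 0.200
  - Score: 9/10
  - Rationale: the proposed hierarchical parallel agent framework (a Host, Managers, and Workers) directly targets system latency, context saturation, and error propagation. Parallel Workers speed up task execution and the Manager layer isolates context. The approach can be applied to architecture design for multi-agent collaboration systems.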
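- Availability of code and data
  - Description: does the paper provide open-source code, pretrained models, or public datasets? This is essential for quickly following up on and applying the work.
  - Weight: 0.100
  - Score: 10/10
  - Rationale: the abstract explicitly states that the code is released at the provided URL, so open-source code is available. This makes it easier to follow up on the work and apply its architecture design to the research topic.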
- Summary
  - Overall score: 87.00
The content above was generated by deepseek-chat.
Abstract
Recent agentic search systems have made substantial progress by emphasising deep, multi-step reasoning. However, this focus often overlooks the challenges of wide-scale information synthesis, where agents must aggregate large volumes of heterogeneous evidence across many sources. As a result, most existing large language model agent systems face severe limitations in data-intensive settings, including context saturation, cascading error propagation, and high end-to-end latency. To address these challenges, we present InfoSeeker, a hierarchical framework based on the principle of near-decomposability, containing a strategic Host, multiple Managers and parallel Workers. By leveraging aggregation and reflection mechanisms at the Manager layer, our framework enforces strict context isolation to prevent saturation and error propagation. Simultaneously, the parallelism in the Worker layer accelerates overall task execution, mitigating the significant latency. Our evaluation on two complementary benchmarks demonstrates both efficiency (3-5x speed-up) and effectiveness, achieving an 8.4% success rate on WideSearch-en and 52.9% accuracy on BrowseComp-zh. The code is released at this https URL
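The Host/Manager/Worker split can be sketched as a two-level fan-out. The Host hands sub-tasks to Managers, and each Manager runs its Workers in parallel. A Manager passes only an aggregated summary upward, so raw Worker output never enters the Host's context. This is a minimal asyncio sketch under stated assumptions, not InfoSeeker's code:

- `worker`, `manager`, `host` and `SUMMARY_CHARS` are hypothetical names.
- The example plan is invented.
- Truncating each Worker's output stands in for the paper's LLM-based aggregation and reflection.

```python
import asyncio
import time

SUMMARY_CHARS = 40  # hypothetical cap on what each Worker contributes to the upward summary


async def worker(query: str) -> str:
    """Stub Worker: a real one would call search/browse tools or an LLM."""
    await asyncio.sleep(0.1)  # simulate tool latency
    return f"raw evidence for {query!r} " * 5


async def manager(subtask: str, queries: list[str]) -> str:
    """Run this Manager's Workers in parallel and return only a compact aggregate."""
    raw = await asyncio.gather(*(worker(q) for q in queries))
    # Context isolation: the full raw text stays at this layer; only snippets go up.
    return f"{subtask}: " + " | ".join(r[:SUMMARY_CHARS] for r in raw)


async def host(plan: dict[str, list[str]]) -> list[str]:
    """Dispatch all Managers concurrently and collect their summaries."""
    return list(await asyncio.gather(*(manager(t, qs) for t, qs in plan.items())))


if __name__ == "__main__":
    plan = {
        "prices": ["vendor A", "vendor B", "vendor C"],
        "reviews": ["site X", "site Y"],
    }
    start = time.perf_counter()
    for summary in asyncio.run(host(plan)):
        print(summary)
    # All five Workers sleep concurrently, so this takes about 0.1s rather than 0.5s.
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

The Host's context grows with the number of Managers times the summary size, not with the total volume of raw evidence. That is the property the abstract relies on to avoid context saturation.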
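Page Generation Statistics
This page was generated with the deepseek-chat model. Token usage is as follows:
| Type | Tokens |
|---|---|
| Prompt tokens (cache miss) | 332761 |
| Prompt tokens (cache hit) | 30976 |
| Completion tokens | 141987 |
| Reasoning (chain-of-thought) tokens | 0 |
| Total | 505724 |
Total page generation time: 8m 7s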