Global AI: The Most Important Events of May 26, 2026

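Today's Daily AI World Briefing gathers the latest artificial intelligence news from key regions around the world, with a focus on business applications, regulation, security, and AI model development.

Europe

From Multi-Agent Systems and the Semantic Web to Embodied AI: A Coherent Narrative of the Web of Agents
arXiv:2507.10644v4 Announce Type: replace
Abstract: The Web of Agents (WoA) transforms the document-centric Web into an environment of autonomous agents acting on users' behalf, a vision newly tractable as large language models (LLMs) mature.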

We argue that across three decades the WoA has undergone a semantic-effort migration in chronological order: from platform-side coordination (Multi-Agent Systems, Generation I), through data-side annotation (Semantic Web, Generation II), to model-side interpretation (LLM-era, Generation III).

The central Gen II → Gen III transition within this trajectory, which we call the semantics-in-data → semantics-in-models shift, is predictive: each generation's failure modes and current open problems follow from where that generation located its semantic effort.

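The survey makes five contributions: (i) a unified evolution ...

Why it matters: It is worth watching how this information affects markets, regulation, and AI users.
Source: arXiv AI (26.05.2026)

When Data Poisoning in AI Models Becomes a Security Challenge - Table.Briefings
Why it matters: It is worth watching how this information affects markets, regulation, and AI users.
Source: Google News AI Europe (26.05.2026)

Who Judges the Judges? Governance from Metrics: A Runtime Framework for Continuous LLM Compliance Monitoring
arXiv:2605.24737v1 Announce Type: cross
Abstract: Current approaches to AI compliance treat conformity as a binary, audit-time verdict rather than as a continuously measurable property of production systems. We argue that this "compliance fiction" is structurally ill-suited to the requirements of the EU AI Act, which demands continuous human oversight and the detection of emergent behavioral drift in deployed systems. We introduce "governance from metrics", the principle that regulatory compliance is not derived from static evaluation but arises as a continuous signal from runtime observability. Building on this, we present govllm, an open-source framework that implements a governance-driven routing architecture in which model selection is determined not by latency or cost alone but by accumulated compliance scores. At the core of our approach is a regulatory arbitration panel of LLM evaluators.
Why it matters: It is worth watching how this information affects markets, regulation, and AI users.
Source: arXiv AI (26.05.2026)

Infosecurity Europe 2024 Key Findings: AI-Driven Cyber Threats, MFA Bypass, and Supply Chain Vulnerabilities Impacting Microsoft 365, Google Workspace, and Okta - Rescana
Why it matters: It is worth watching how this information affects markets, regulation, and AI users.
Source: Google News AI Europe (26.05.2026)

North America

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security
arXiv:2605.23989v1 Announce Type: new
Abstract: Agentic AI systems -- Large Language Models (LLMs) augmented with planning, tool use, memory, and long-horizon interactions -- can execute complex tasks autonomously, but their multi-step trajectories introduce new failure modes that challenge trustworthiness.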

This survey provides a focused examination of trustworthy agentic AI through two core dimensions that are critical for high-risk deployments: Safety and Robustness, and Privacy and System Security.

For each dimension, we clarify key concepts, identify where risks emerge along the agent workflow, and summarize stage-targeted mitigation strategies.

Other trustworthiness aspects (value alignment, transparency, fairness, and accountability) are discussed as relevant context rather than parallel chapters.

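To support consistent comparison and deployment decisions, we consolidate evaluation ...

Why it matters: It is worth watching how this information affects markets, regulation, and AI users.
Source: arXiv AI (26.05.2026)

GlobalDentBench: A Multi-Country Benchmark for Evaluating LLM Clinical Reasoning in Dentistry with Expert Calibration
arXiv:2605.24636v1 Announce Type: new
Abstract: While large language models (LLMs) hold transformative potential in medicine, their reasoning robustness and safety in real clinical scenarios remain a critical unexplored area, particularly in dentistry. Here we introduce GlobalDentBench, the first multi-country dental benchmark, whose taxonomy covers 14 dental specialties across 88 countries and regions on six continents. The benchmark comprises 8,978 expert-validated questions in three formats (multiple-choice, short-answer, and case-based) and evaluates three progressive levels of reasoning: knowledge recall (L1), routine reasoning (L2), and individualized reasoning (L3). To ensure data quality, the automated construction framework was calibrated by six senior dentists, reaching a 99.98% expert agreement rate on multiple-choice and short-answer questions.
Why it matters: It is worth watching how this information affects markets, regulation, and AI users.
Source: arXiv AI (26.05.2026)

MDIA: A Multi-Agent Diagnostic Intelligence Pipeline on HealthBench Professional
arXiv:2605.24699v1 Announce Type: new
Abstract: Most reported gains on agentic-LLM clinical benchmarks are often attributed to prompt engineering, yet our results suggest that larger improvements can come from architectural and engine-level design.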