[晓理紫] Daily Paper Digest (with abstracts and source code or project links) -- Reinforcement Learning, Imitation Learning, Robotics

Topic-specific paper subscription

Follow {晓理紫|小李子} for daily paper updates. If you find this useful, please forward it to anyone who might need it. Thank you for your support!

If you find it helpful, please follow me and the latest papers will be delivered to you on time every day.

To thank readers for their support, starting today a free topic-based paper subscription is offered to 300 readers: follow the official account on WeChat and reply with {email + paper topics} (e.g., 123456@xx.com + chatgpt@large language model @LLM). The topics must belong to the same field, with at most three keywords. The blogger reserves the right of final interpretation.

Categories:

== Reinforcement Learning @ RL @ RLHF ==

Title: A Reinforcement Learning-Boosted Motion Planning Framework: Comprehensive Generalization Performance in Autonomous Driving

Authors: Rainer Trauth, Alexander Hobmeier, Johannes Betz

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2402.01465v1

GitHub: https://github.com/TUM-AVS/Frenetix-RL

Abstract: This study introduces a novel approach to autonomous motion planning, informing an analytical algorithm with a reinforcement learning (RL) agent within a Frenet coordinate system. The combination directly addresses the challenges of adaptability and safety in autonomous driving. Motion planning algorithms are essential for navigating dynamic and complex scenarios. Traditional methods, however, lack the flexibility required for unpredictable environments, whereas machine learning techniques, particularly reinforcement learning (RL), offer adaptability but suffer from instability and a lack of explainability. Our unique solution synergizes the predictability and stability of traditional motion planning algorithms with the dynamic adaptability of RL, resulting in a system that efficiently manages complex situations and adapts to changing environmental conditions. Evaluation of our integrated approach shows a significant reduction in collisions, improved risk management, and improved goal success rates across multiple scenarios. The code used in this research is publicly available as open-source software and can be accessed at the following link: https://github.com/TUM-AVS/Frenetix-RL.
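
To make the mechanism easier to picture, here is a minimal sketch of the paper's central idea: the RL action parameterizes an analytical planner (here, as cost weights over sampled candidate trajectories) instead of steering the vehicle directly. This is not the Frenetix-RL code; the cost terms and the `sample_frenet_trajectories` helper are illustrative placeholders.

```python
import numpy as np

def sample_frenet_trajectories(n=50):
    """Hypothetical stand-in for the analytical sampler: each candidate
    trajectory gets a cost per criterion (comfort, risk, progress)."""
    rng = np.random.default_rng(0)
    return rng.random((n, 3))

def select_trajectory(cost_terms, weights):
    """Analytical step: pick the candidate minimizing the weighted cost sum."""
    total_cost = cost_terms @ weights
    return int(np.argmin(total_cost))

# The RL agent's action is a weight vector, not a steering command: it biases
# which analytically generated (and therefore feasible) trajectory is chosen.
rl_action = np.array([0.2, 0.7, 0.1])  # e.g., output of a trained policy network
costs = sample_frenet_trajectories()
print("selected candidate:", select_trajectory(costs, rl_action))
```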


Title: ${\rm E}(3)$-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

Authors: Dingyang Chen, Qi Zhang

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2308.11842v2

GitHub: https://github.com/dchen48/E3AC

Abstract: Identification and analysis of symmetrical patterns in the natural world have led to significant discoveries across various scientific fields, such as the formulation of gravitational laws in physics and advancements in the study of chemical structures. In this paper, we focus on exploiting Euclidean symmetries inherent in certain cooperative multi-agent reinforcement learning (MARL) problems and prevalent in many applications. We begin by formally characterizing a subclass of Markov games with a general notion of symmetries that admits the existence of symmetric optimal values and policies. Motivated by these properties, we design neural network architectures with symmetric constraints embedded as an inductive bias for multi-agent actor-critic methods. This inductive bias results in superior performance in various cooperative MARL benchmarks and impressive generalization capabilities such as zero-shot learning and transfer learning in unseen scenarios with repeated symmetric patterns. The code is available at: https://github.com/dchen48/E3AC.
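
The equivariance constraint at the heart of the paper can be demonstrated with a toy layer that is equivariant to planar rotations by construction: actions are sums of relative position vectors scaled by rotation-invariant weights. This is a hand-rolled numpy sketch, not the E3AC architecture:

```python
import numpy as np

def equivariant_action(positions):
    """Toy E(2)-equivariant layer: each agent's 2-D action is a sum of relative
    position vectors, scaled by rotation-invariant weights (here: distances)."""
    rel = positions[None, :, :] - positions[:, None, :]   # (n, n, 2) offsets
    dist = np.linalg.norm(rel, axis=-1, keepdims=True)    # invariant features
    return (np.exp(-dist) * rel).sum(axis=1)              # equivariant output

rng = np.random.default_rng(1)
pos = rng.normal(size=(4, 2))                             # 4 agents in the plane
theta = 0.83
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Rotating the joint state rotates the actions identically.
print(np.allclose(equivariant_action(pos @ R.T),
                  equivariant_action(pos) @ R.T))         # True
```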


Title: Position Paper: Generalized grammar rules and structure-based generalization beyond classical equivariance for lexical tasks and transduction

Authors: Mircea Petrache, Shubhendu Trivedi

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2402.01629v1

Abstract: Compositional generalization is one of the main properties which differentiates lexical learning in humans from state-of-the-art neural networks. We propose a general framework for building models that can generalize compositionally using the concept of Generalized Grammar Rules (GGRs), a class of symmetry-based compositional constraints for transduction tasks, which we view as a transduction analogue of equivariance constraints in physics-inspired tasks. Besides formalizing generalized notions of symmetry for language transduction, our framework is general enough to contain many existing works as special cases. We present ideas on how GGRs might be implemented, and in the process draw connections to reinforcement learning and other areas of research.


Title: The Benefits of Being Categorical Distributional: Uncertainty-aware Regularized Exploration in Reinforcement Learning

Authors: Ke Sun, Yingnan Zhao, Enze Shi

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2110.03155v5

Abstract: The theoretical advantages of distributional reinforcement learning (RL) over classical RL remain elusive despite its remarkable empirical performance. Starting from Categorical Distributional RL (CDRL), we attribute the potential superiority of distributional RL to a derived distribution-matching regularization by applying a return density function decomposition technique. This unexplored regularization in the distributional RL context is aimed at capturing additional return distribution information beyond its expectation alone, contributing to an augmented reward signal in the policy optimization. Compared with the entropy regularization in MaxEnt RL that explicitly optimizes the policy to encourage exploration, the resulting regularization in CDRL implicitly optimizes policies guided by the new reward signal to align with the uncertainty of target return distributions, leading to an uncertainty-aware exploration effect. Finally, extensive experiments substantiate the importance of this uncertainty-aware regularization in distributional RL on the empirical benefits over classical RL.
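
For background, the categorical machinery that CDRL builds on can be sketched with the standard C51-style projection of the Bellman target onto a fixed atom support; the regularization discussed in the abstract emerges from decomposing the loss around this kind of categorical target. The support range and atom count below are the usual illustrative defaults, not the paper's experimental settings:

```python
import numpy as np

def categorical_projection(probs, reward, gamma, support):
    """C51-style projection of the Bellman target r + gamma*Z onto fixed atoms."""
    v_min, v_max = support[0], support[-1]
    dz = support[1] - support[0]
    tz = np.clip(reward + gamma * support, v_min, v_max)  # shifted atoms
    b = (tz - v_min) / dz                                 # fractional atom index
    target = np.zeros_like(probs)
    for j, p in enumerate(probs):
        lo, hi = int(np.floor(b[j])), int(np.ceil(b[j]))
        if lo == hi:                    # lands exactly on an atom
            target[lo] += p
        else:                           # split mass between the two neighbors
            target[lo] += p * (hi - b[j])
            target[hi] += p * (b[j] - lo)
    return target

support = np.linspace(-10.0, 10.0, 51)   # 51 atoms, as in C51
probs = np.full(51, 1.0 / 51)            # current categorical return distribution
target = categorical_projection(probs, reward=1.0, gamma=0.99, support=support)
print(target.sum())                      # 1.0: the projection preserves mass
```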


Title: Distributional Reinforcement Learning by Sinkhorn Divergence

Authors: Ke Sun, Yingnan Zhao, Wulong Liu

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2202.00769v4

Abstract: The empirical success of distributional reinforcement learning (RL) highly depends on the distribution representation and the choice of distribution divergence. In this paper, we propose Sinkhorn distributional RL (SinkhornDRL), which learns unrestricted statistics from return distributions and leverages Sinkhorn divergence to minimize the difference between current and target Bellman return distributions. Theoretically, we prove the contraction properties of SinkhornDRL, consistent with the interpolation nature of Sinkhorn divergence between the Wasserstein distance and Maximum Mean Discrepancy (MMD). We also establish the equivalence between Sinkhorn divergence and a regularized MMD with a regularized moment-matching behavior, contributing to explaining the superiority of SinkhornDRL. Empirically, we show that SinkhornDRL is consistently better than or comparable to existing algorithms on the Atari games suite.
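
A minimal numpy sketch of the Sinkhorn divergence between two empirical return distributions, using the standard debiased form S(x, y) = OT(x, y) - (OT(x, x) + OT(y, y)) / 2. The squared-distance cost, regularization strength, and sample sizes are illustrative choices, not the paper's settings:

```python
import numpy as np

def sinkhorn_cost(x, y, eps=0.1, iters=200):
    """Entropy-regularized OT cost between two uniform empirical distributions."""
    c = (x[:, None] - y[None, :]) ** 2            # pairwise cost matrix
    k = np.exp(-c / eps)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    v = np.ones(len(y))
    for _ in range(iters):                        # Sinkhorn fixed-point updates
        u = a / (k @ v)
        v = b / (k.T @ u)
    plan = u[:, None] * k * v[None, :]            # transport plan
    return (plan * c).sum()

def sinkhorn_divergence(x, y, eps=0.1):
    return sinkhorn_cost(x, y, eps) - 0.5 * (sinkhorn_cost(x, x, eps)
                                             + sinkhorn_cost(y, y, eps))

rng = np.random.default_rng(0)
current = rng.normal(0.0, 1.0, 64)   # samples from the current return distribution
target = rng.normal(1.0, 1.0, 64)    # samples from the target Bellman distribution
print(sinkhorn_divergence(current, target))
```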


Title: Improving Monte Carlo Evaluation with Offline Data

Authors: Shuze Liu, Shangtong Zhang

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2301.13734v3

Abstract: Most reinforcement learning practitioners evaluate their policies with online Monte Carlo estimators for either hyperparameter tuning or testing different algorithmic design choices, where the policy is repeatedly executed in the environment to get the average outcome. Such massive interactions with the environment are prohibitive in many scenarios. In this paper, we propose novel methods that improve the data efficiency of online Monte Carlo estimators while maintaining their unbiasedness. We first propose a tailored closed-form behavior policy that provably reduces the variance of an online Monte Carlo estimator. We then design efficient algorithms to learn this closed-form behavior policy from previously collected offline data. Theoretical analysis is provided to characterize how the behavior policy learning error affects the amount of reduced variance. Compared with previous works, our method achieves better empirical performance in a broader set of environments, with fewer requirements for offline data.
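
The underlying mechanism, estimating a target policy's value unbiasedly while sampling from a different behavior policy, can be shown on a toy bandit with ordinary importance weighting. The behavior policy below is hand-picked for illustration rather than the paper's closed-form variance-minimizing one:

```python
import numpy as np

rng = np.random.default_rng(0)
mean_reward = np.array([1.0, 0.0])   # hypothetical expected reward per arm
target_pi = np.array([0.5, 0.5])     # policy we want to evaluate (value = 0.5)
behavior = np.array([0.9, 0.1])      # alternative data-collection policy

def mc_estimate(sampling_pi, n=100_000):
    """Unbiased Monte Carlo estimate of the target policy's value."""
    arms = rng.choice(2, size=n, p=sampling_pi)
    r = mean_reward[arms] + rng.normal(0.0, 0.1, size=n)  # noisy returns
    w = target_pi[arms] / sampling_pi[arms]               # importance weights
    return (w * r).mean()

print("on-policy :", mc_estimate(target_pi))   # ~0.5
print("off-policy:", mc_estimate(behavior))    # ~0.5 too, different variance
```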


== Imitation Learning ==

Title: Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data

Authors: Hongkuan Zhou, Zhenshan Bing, Xiangtong Yao

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2305.19075v4

Project: https://hk-zh.github.io/spil/

Abstract: The growing interest in language-conditioned robot manipulation aims to develop robots capable of understanding and executing complex tasks, with the objective of enabling robots to interpret language commands and manipulate objects accordingly. While language-conditioned approaches demonstrate impressive capabilities for addressing tasks in familiar environments, they encounter limitations in adapting to unfamiliar environment settings. In this study, we propose a general-purpose, language-conditioned approach that combines base skill priors and imitation learning under unstructured data to enhance the algorithm’s generalization in adapting to unfamiliar environments. We assess our model’s performance in both simulated and real-world environments using a zero-shot setting. In the simulated environment, the proposed approach surpasses previously reported scores for the CALVIN benchmark, especially in the challenging Zero-Shot Multi-Environment setting. The average completed task length, indicating the average number of tasks the agent can continuously complete, improves more than 2.5 times compared to the state-of-the-art method HULC. In addition, we conduct a zero-shot evaluation of our policy in a real-world setting, following training exclusively in simulated environments without additional specific adaptations. In this evaluation, we set up ten tasks and achieved an average 30% improvement in our approach compared to the current state-of-the-art approach, demonstrating a high generalization capability in both simulated environments and the real world. For further details, including access to our code and videos, please refer to https://hk-zh.github.io/spil/


Title: LeTO: Learning Constrained Visuomotor Policy with Differentiable Trajectory Optimization

Authors: Zhengtong Xu, Yu She

PubTime: 2024-01-30

Downlink: http://arxiv.org/abs/2401.17500v1

GitHub: https://github.com/ZhengtongXu/LeTO

Abstract: This paper introduces LeTO, a method for learning constrained visuomotor policy via differentiable trajectory optimization. Our approach uniquely integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to end-to-end generate actions in a safe and controlled fashion without extra modules. Our method allows for the introduction of constraints information during the training process, thereby balancing the training objectives of satisfying constraints, smoothing the trajectories, and minimizing errors with demonstrations. This “gray box” method marries the optimization-based safety and interpretability with the powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and on the real robot. In simulation, LeTO achieves a success rate comparable to state-of-the-art imitation learning methods, but the generated trajectories are of less uncertainty, higher quality, and smoother. In real-world experiments, we deployed LeTO to handle constraints-critical tasks. The results show the effectiveness of LeTO compared with state-of-the-art imitation learning approaches. We release our code at https://github.com/ZhengtongXu/LeTO.
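
One common way to realize such a differentiable optimization layer is to unroll an inner optimizer so that gradients flow through the optimization itself; the PyTorch sketch below does this for a smoothness-plus-tracking objective. It is a simplification of the paper's formulation: LeTO solves a constrained trajectory optimization, whereas this sketch replaces hard constraints with soft penalties.

```python
import torch

def trajectory_opt_layer(waypoint_targets, n_steps=20, lr=0.2):
    """Differentiable 'optimization layer': unrolled gradient descent on a
    smoothness + tracking objective; gradients reach waypoint_targets."""
    traj = waypoint_targets.clone()
    for _ in range(n_steps):
        smooth = ((traj[1:] - traj[:-1]) ** 2).sum()    # penalize jerky motion
        track = ((traj - waypoint_targets) ** 2).sum()  # stay near the targets
        grad, = torch.autograd.grad(smooth + 0.1 * track, traj, create_graph=True)
        traj = traj - lr * grad
    return traj

# Targets would come from an upstream policy network; a leaf tensor suffices here.
targets = torch.randn(10, 2, requires_grad=True)
traj = trajectory_opt_layer(targets)
traj.sum().backward()        # end-to-end gradients flow through the inner loop
print(targets.grad.shape)    # torch.Size([10, 2])
```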


Title: Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning

Authors: Chia-Cheng Chiang, Li-Cheng Lan, Wei-Fang Sun

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2402.01057v1

Abstract: In this paper, we focus on single-demonstration imitation learning (IL), a practical approach for real-world applications where obtaining numerous expert demonstrations is costly or infeasible. In contrast to typical IL settings with multiple demonstrations, single-demonstration IL involves an agent having access to only one expert trajectory. We highlight the issue of sparse reward signals in this setting and propose to mitigate this issue through our proposed Transition Discriminator-based IL (TDIL) method. TDIL is an IRL method designed to address reward sparsity by introducing a denser surrogate reward function that considers environmental dynamics. This surrogate reward function encourages the agent to navigate towards states that are proximal to expert states. In practice, TDIL trains a transition discriminator to differentiate between valid and non-valid transitions in a given environment to compute the surrogate rewards. The experiments demonstrate that TDIL outperforms existing IL approaches and achieves expert-level performance in the single-demonstration IL setting across five widely adopted MuJoCo benchmarks as well as the “Adroit Door” environment.
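
A minimal sketch of the transition-discriminator idea: train a classifier to separate environment-consistent (s, s') pairs from invalid (shuffled) ones, then read its confidence as a dense surrogate reward. The data, network, and reward formula below are illustrative, not the paper's exact construction:

```python
import torch
import torch.nn as nn

state_dim = 4
disc = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

valid = torch.randn(256, 2 * state_dim)              # stand-in (s, s') pairs
invalid = torch.cat([valid[:, :state_dim],           # s paired with a shuffled s'
                     valid[torch.randperm(256), state_dim:]], dim=1)

for _ in range(200):                                 # train the discriminator
    logits = disc(torch.cat([valid, invalid]))
    labels = torch.cat([torch.ones(256, 1), torch.zeros(256, 1)])
    loss = bce(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

def surrogate_reward(s, s_next):
    """Dense reward: the discriminator's confidence that (s, s') is valid."""
    with torch.no_grad():
        return torch.sigmoid(disc(torch.cat([s, s_next], dim=-1)))

print(surrogate_reward(valid[:1, :state_dim], valid[:1, state_dim:]))
```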


Title: Robust Path Planning via Learning from Demonstrations for Robotic Catheters in Deformable Environments

Authors: Zhen Li, Chiara Lambranzi, Di Wu

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2402.00537v1

Abstract: Navigation through tortuous and deformable vessels using catheters with limited steering capability underscores the need for reliable path planning. State-of-the-art path planners do not fully account for the deformable nature of the environment. This work proposes a robust path planner via a learning from demonstrations method, named Curriculum Generative Adversarial Imitation Learning (C-GAIL). This path planning framework takes into account the interaction between steerable catheters and vessel walls and the deformable property of vessels. In-silico comparative experiments show that the proposed network achieves smaller targeting errors and a higher success rate compared to a state-of-the-art approach based on GAIL. The in-vitro validation experiments demonstrate that the path generated by the proposed C-GAIL path planner aligns better with the actual steering capability of the pneumatic artificial muscle-driven catheter utilized in this study. Therefore, the proposed approach can provide enhanced support to the user in navigating the catheter towards the target with greater precision, in contrast to the conventional centerline-following technique. The targeting and tracking errors are 1.26$\pm$0.55 mm and 5.18$\pm$3.48 mm, respectively. The proposed path planning framework exhibits superior performance in managing the uncertainty associated with vessel deformation, thereby resulting in lower tracking errors.


Title: Bi-ACT: Bilateral Control-Based Imitation Learning via Action Chunking with Transformer

Authors: Thanpimon Buamanee, Masato Kobayashi, Yuki Uranishi

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.17698v1

Abstract: Autonomous manipulation in robot arms is a complex and evolving field of study in robotics. The work proposed in this paper stands at the intersection of two innovative approaches in the fields of robotics and machine learning. Inspired by the Action Chunking with Transformer (ACT) model, which employs joint location and image data to predict future movements, our work integrates principles of Bilateral Control-Based Imitation Learning to enhance robotic control. Our objective is to synergize these techniques, thereby creating a more robust and efficient control mechanism. In our approach, the data collected from the environment are images from the gripper and overhead cameras, along with the joint angles, angular velocities, and forces of the follower robot using bilateral control. The model is designed to predict the subsequent steps for the joint angles, angular velocities, and forces of the leader robot. This predictive capability is crucial for implementing effective bilateral control in the follower robot, allowing for more nuanced and responsive maneuvering.


Title: Interpretable Imitation Learning with Dynamic Causal Relations

Authors: Tianxiang Zhao, Wenchao Yu, Suhang Wang

PubTime: 2024-01-30

Downlink: http://arxiv.org/abs/2310.00489v4

Abstract: Imitation learning, which learns agent policy by mimicking expert demonstration, has shown promising results in many applications such as medical treatment regimes and self-driving vehicles. However, it remains a difficult task to interpret control policies learned by the agent. Difficulties mainly come from two aspects: 1) agents in imitation learning are usually implemented as deep neural networks, which are black-box models and lack interpretability; 2) the latent causal mechanism behind agents’ decisions may vary along the trajectory, rather than staying static throughout time steps. To increase transparency and offer better interpretability of the neural agent, we propose to expose its captured knowledge in the form of a directed acyclic causal graph, with nodes being action and state variables and edges denoting the causal relations behind predictions. Furthermore, we design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs. Concretely, we conduct causal discovery from the perspective of Granger causality and propose a self-explainable imitation learning framework, {\method}. The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner. After the model is learned, we can obtain causal relations among states and action variables behind its decisions, exposing policies learned by it. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed {\method} in learning the dynamic causal graphs for understanding the decision-making of imitation learning while maintaining high prediction accuracy.
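
The Granger-causality viewpoint can be illustrated in a few lines: x is said to Granger-cause y if adding x's history reduces the error of predicting y from y's own history. The linear test below is a textbook sketch, far simpler than the paper's learned, state-dependent causal graphs:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):              # y is driven by x's past: x "Granger-causes" y
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()

def residual_var(target, predictors):
    """Least-squares fit of target[t] on predictors[t-1]; residual variance."""
    X = np.column_stack([p[:-1] for p in predictors])
    beta, *_ = np.linalg.lstsq(X, target[1:], rcond=None)
    return np.var(target[1:] - X @ beta)

restricted = residual_var(y, [y])      # y's own past only
full = residual_var(y, [y, x])         # y's and x's past
print(f"Granger score: {np.log(restricted / full):.2f}")  # large => x helps
```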


== Embodied Artificial Intelligence @ robotic agent @ human robot interaction ==

Title: Towards Unified Interactive Visual Grounding in The Wild

Authors: Jie Xu, Hanbo Zhang, Qingyi Si

PubTime: 2024-01-30

Downlink: http://arxiv.org/abs/2401.16699v1

GitHub: https://github.com/jxu124/TiO

Abstract: Interactive visual grounding in Human-Robot Interaction (HRI) is challenging yet practical due to the inevitable ambiguity in natural languages. It requires robots to disambiguate the user input by active information gathering. Previous approaches often rely on predefined templates to ask disambiguation questions, resulting in performance reduction in realistic interactive scenarios. In this paper, we propose TiO, an end-to-end system for interactive visual grounding in human-robot interaction. Benefiting from a unified formulation of visual dialogue and grounding, our method can be trained on a joint of extensive public data, and show superior generality to diversified and challenging open-world scenarios. In the experiments, we validate TiO on GuessWhat?! and InViG benchmarks, setting new state-of-the-art performance by a clear margin. Moreover, we conduct HRI experiments on 150 carefully selected challenging scenes as well as real-robot platforms. Results show that our method demonstrates superior generality to diversified visual and language inputs with a high success rate. Codes and demos are available at https://github.com/jxu124/TiO.


Title: SLYKLatent, a Learning Framework for Facial Features Estimation

Authors: Samuel Adebayo, Joost C. Dessing, Seán McLoone

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2402.01555v1

Abstract: In this research, we present SLYKLatent, a novel approach for enhancing gaze estimation by addressing appearance instability challenges in datasets due to aleatoric uncertainties, covariant shifts, and test domain generalization. SLYKLatent utilizes Self-Supervised Learning for initial training with facial expression datasets, followed by refinement with a patch-based tri-branch network and an inverse explained variance-weighted training loss function. Our evaluation on benchmark datasets achieves an 8.7% improvement on Gaze360, rivals top MPIIFaceGaze results, and leads on a subset of ETH-XGaze by 13%, surpassing existing methods by significant margins. Adaptability tests on RAF-DB and Affectnet show 86.4% and 60.9% accuracies, respectively. Ablation studies confirm the effectiveness of SLYKLatent’s novel components. This approach has strong potential in human-robot interaction.
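
The "inverse explained variance-weighted" loss is only named in the abstract; one plausible reading, weighting each output dimension's error by the inverse of its explained variance so that poorly explained dimensions dominate training, is sketched below. Treat this as a guess at the idea, not the paper's definition:

```python
import torch

def inverse_ev_weighted_mse(pred, target):
    """Hypothetical sketch: per-dimension MSE weighted by inverse explained
    variance. This interpretation is assumed, not taken from the paper."""
    err = pred - target
    ev = 1.0 - err.var(dim=0) / target.var(dim=0)    # explained variance per dim
    weights = 1.0 / ev.clamp(min=1e-3).detach()      # inverse, kept out of autograd
    return (weights * err.pow(2)).mean()

pred = torch.randn(128, 3, requires_grad=True)       # stand-in network outputs
target = torch.randn(128, 3)
loss = inverse_ev_weighted_mse(pred, target)
loss.backward()
print(loss.item())
```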


Title: Transferring human emotions to robot motions using Neural Policy Style Transfer

Authors: Raul Fernandez-Fernandez, Bartek Łukawski, Juan G. Victores

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2402.00663v1

Abstract: Neural Style Transfer (NST) was originally proposed to use feature extraction capabilities of Neural Networks as a way to perform Style Transfer with images. Pre-trained image classification architectures were selected for feature extraction, leading to new images showing the same content as the original but with a different style. In robotics, Style Transfer can be employed to transfer human motion styles to robot motions. The challenge lies in the lack of pre-trained classification architectures for robot motions that could be used for feature extraction. Neural Policy Style Transfer TD3 (NPST3) is proposed for the transfer of human motion styles to robot motions. This framework allows the same robot motion to be executed in different human-centered motion styles, such as in an angry, happy, calm, or sad fashion. The Twin Delayed Deep Deterministic Policy Gradient (TD3) network is introduced for the generation of control policies. An autoencoder network is in charge of feature extraction for the Style Transfer step. The Style Transfer step can be performed both offline and online: offline for the autonomous executions of human-style robot motions, and online for adapting at runtime the style of e.g., a teleoperated robot. The framework is tested using two different robotic platforms: a robotic manipulator designed for telemanipulation tasks, and a humanoid robot designed for social interaction. The proposed approach was evaluated for both platforms, performing a total of 147 questionnaires asking human subjects to recognize the human motion style transferred to the robot motion for a predefined set of actions.


Title: Artificial intelligence is algorithmic mimicry: why artificial "agents" are not (and won't be) proper agents

Authors: Johannes Jaeger

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2307.07515v3

Abstract: What is the prospect of developing artificial general intelligence (AGI)? I investigate this question by systematically comparing living and algorithmic systems, with a special focus on the notion of “agency.” There are three fundamental differences to consider: (1) Living systems are autopoietic, that is, self-manufacturing, and therefore able to set their own intrinsic goals, while algorithms exist in a computational environment with target functions that are both provided by an external agent. (2) Living systems are embodied in the sense that there is no separation between their symbolic and physical aspects, while algorithms run on computational architectures that maximally isolate software from hardware. (3) Living systems experience a large world, in which most problems are ill-defined (and not all definable), while algorithms exist in a small world, in which all problems are well-defined. These three differences imply that living and algorithmic systems have very different capabilities and limitations. In particular, it is extremely unlikely that true AGI (beyond mere mimicry) can be developed in the current algorithmic framework of AI research. Consequently, discussions about the proper development and deployment of algorithmic tools should be shaped around the dangers and opportunities of current narrow AI, not the extremely unlikely prospect of the emergence of true agency in artificial systems.


Title: REACT: Two Datasets for Analyzing Both Human Reactions and Evaluative Feedback to Robots Over Time

Authors: Kate Candon, Nicholas C. Georgiou, Helen Zhou

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2402.00190v1

Abstract: Recent work in Human-Robot Interaction (HRI) has shown that robots can leverage implicit communicative signals from users to understand how they are being perceived during interactions. For example, these signals can be gaze patterns, facial expressions, or body motions that reflect internal human states. To facilitate future research in this direction, we contribute the REACT database, a collection of two datasets of human-robot interactions that display users’ natural reactions to robots during a collaborative game and a photography scenario. Further, we analyze the datasets to show that interaction history is an important factor that can influence human reactions to robots. As a result, we believe that future models for interpreting implicit feedback in HRI should explicitly account for this history. REACT opens up doors to this possibility in the future.


Title: CARPE-ID: Continuously Adaptable Re-identification for Personalized Robot Assistance

Authors: Federico Rollo, Andrea Zunino, Nikolaos Tsagarakis

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2310.19413v2

Abstract: In today’s Human-Robot Interaction (HRI) scenarios, a prevailing tendency exists to assume that the robot shall cooperate with the closest individual or that the scene involves merely a singular human actor. However, in realistic scenarios, such as shop floor operations, such an assumption may not hold and personalized target recognition by the robot in crowded environments is required. To fulfil this requirement, in this work, we propose a person re-identification module based on continual visual adaptation techniques that ensure the robot’s seamless cooperation with the appropriate individual even subject to varying visual appearances or partial or complete occlusions. We test the framework singularly using recorded videos in a laboratory environment and an HRI scenario, i.e., a person-following task by a mobile robot. The targets are asked to change their appearance during tracking and to disappear from the camera field of view to test the challenging cases of occlusion and outfit variations. We compare our framework with one of the state-of-the-art Multi-Object Tracking (MOT) methods and the results show that CARPE-ID can accurately track each selected target throughout the experiments in all the cases (except two limit cases). At the same time, the s-o-t-a MOT has a mean of 4 tracking errors for each video.


== Object Detection @ Segmentation @ Open vocabulary detection @ SAM ==

Title: Convolution kernel adaptation to calibrated fisheye

Authors: Bruno Berenguel-Baeta, Maria Santos-Villafranca, Jesus Bermudez-Cameo

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2402.01456v1

Project: https://proceedings.bmvc2023.org/721/

Abstract: Convolution kernels are the basic structural component of convolutional neural networks (CNNs). In recent years there has been growing interest in fisheye cameras for many applications. However, the radially symmetric projection model of these cameras produces high distortions that affect the performance of CNNs, especially when the field of view is very large. In this work, we tackle this problem by proposing a method that leverages the calibration of cameras to deform the convolution kernel accordingly and adapt to the distortion. That way, the receptive field of the convolution is similar to standard convolutions in perspective images, allowing us to take advantage of pre-trained networks in large perspective datasets. We show how, with just a brief fine-tuning stage in a small dataset, we improve the performance of the network for the calibrated fisheye with respect to standard convolutions in depth estimation and semantic segmentation.
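
The kernel-adaptation idea can be sketched by pushing a regular perspective-pixel neighborhood through a calibrated fisheye projection and using the resulting pixel displacements as deformed sampling offsets. The equidistant model (r = f * theta) and all parameters below are illustrative, not the paper's calibration:

```python
import numpy as np

def fisheye_project(ray, f=300.0, cx=320.0, cy=240.0):
    """Equidistant fisheye model: radial distance r = f * theta."""
    x, y, z = ray
    theta = np.arctan2(np.hypot(x, y), z)     # angle from the optical axis
    phi = np.arctan2(y, x)
    return np.array([cx + f * theta * np.cos(phi), cy + f * theta * np.sin(phi)])

def deformed_kernel_offsets(center_ray, fp=300.0):
    """Project a 3x3 perspective-pixel neighborhood (rays with z = 1) through
    the fisheye model; pixel differences become the adapted kernel offsets."""
    center = fisheye_project(center_ray)
    offsets = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            neighbor = center_ray + np.array([dx / fp, dy / fp, 0.0])
            offsets.append(fisheye_project(neighbor) - center)
    return np.array(offsets)

# Near the optical axis the kernel stays ~regular; off-axis it deforms.
print(deformed_kernel_offsets(np.array([0.0, 0.0, 1.0])).round(2))
print(deformed_kernel_offsets(np.array([0.8, 0.0, 1.0])).round(2))
```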


Title: YOLO-World: Real-Time Open-Vocabulary Object Detection

Authors: Tianheng Cheng, Lin Song, Yixiao Ge

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2401.17270v2

GitHub: https://github.com/AILab-CVC/YOLO-World

Abstract: The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed. Furthermore, the fine-tuned YOLO-World achieves remarkable performance on several downstream tasks, including object detection and open-vocabulary instance segmentation.
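
The region-text matching that underlies open-vocabulary detection reduces to scoring region embeddings against text embeddings and training contrastively; adding a new category at inference is just adding a text embedding. This is a generic sketch with placeholder encoders and dimensions, not the actual YOLO-World / RepVL-PAN implementation:

```python
import torch
import torch.nn.functional as F

num_regions, num_classes, dim = 8, 4, 256
region_emb = F.normalize(torch.randn(num_regions, dim), dim=-1)  # from detector
text_emb = F.normalize(torch.randn(num_classes, dim), dim=-1)    # from text encoder

logits = region_emb @ text_emb.t() / 0.07     # temperature-scaled similarities
labels = torch.randint(num_classes, (num_regions,))  # matched class per region
loss = F.cross_entropy(logits, labels)        # region-text contrastive loss
print("training loss:", loss.item())

# Zero-shot inference: each region takes the vocabulary entry it matches best.
print("predictions:", logits.argmax(dim=-1))
```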


Title: Closing the Gap in Human Behavior Analysis: A Pipeline for Synthesizing Trimodal Data

Authors: Christian Stippel, Thomas Heitzinger, Rafael Sterzinger

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2402.01537v1

Abstract: In pervasive machine learning, especially in Human Behavior Analysis (HBA), RGB has been the primary modality due to its accessibility and richness of information. However, linked with its benefits are challenges, including sensitivity to lighting conditions and privacy concerns. One possibility to overcome these vulnerabilities is to resort to different modalities. For instance, thermal is particularly adept at accentuating human forms, while depth adds crucial contextual layers. Despite their known benefits, only a few HBA-specific datasets that integrate these modalities exist. To address this shortage, our research introduces a novel generative technique for creating trimodal, i.e., RGB, thermal, and depth, human-focused datasets. This technique capitalizes on human segmentation masks derived from RGB images, combined with thermal and depth backgrounds that are sourced automatically. With these two ingredients, we synthesize depth and thermal counterparts from existing RGB data utilizing conditional image-to-image translation. By employing this approach, we generate trimodal data that can be leveraged to train models for settings with limited data, bad lighting conditions, or privacy-sensitive areas.


Title: Advancing Brain Tumor Inpainting with Generative Models

Authors: Ruizhi Zhu, Xinru Zhang, Haowen Pang

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2402.01509v1

Abstract: Synthesizing healthy brain scans from diseased brain scans offers a potential solution to address the limitations of general-purpose algorithms, such as tissue segmentation and brain extraction algorithms, which may not effectively handle diseased images. We consider this a 3D inpainting task and investigate the adaptation of 2D inpainting methods to meet the requirements of 3D magnetic resonance imaging (MRI) data. Our contributions encompass potential modifications tailored to MRI-specific needs, and we conducted evaluations of multiple inpainting techniques using the BraTS2023 Inpainting datasets to assess their efficacy and limitations.


Title: Dynamic Occupancy Grids for Object Detection: A Radar-Centric Approach

Authors: Max Peter Ronecker, Markus Schratter, Lukas Kuschnig

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2402.01488v1

Abstract: Dynamic Occupancy Grid Mapping is a technique used to generate a local map of the environment containing both static and dynamic information. Typically, these maps are primarily generated using lidar measurements. However, with improvements in radar sensing, resulting in better accuracy and higher resolution, radar is emerging as a viable alternative to lidar as the primary sensor for mapping. In this paper, we propose a radar-centric dynamic occupancy grid mapping algorithm with adaptations to the state computation, inverse sensor model, and field-of-view computation tailored to the specifics of radar measurements. We extensively evaluate our approach using real data to demonstrate its effectiveness and establish the first benchmark for radar-based dynamic occupancy grid mapping using the publicly available Radarscenes dataset.
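
For background, an occupancy grid is typically maintained with log-odds updates driven by an inverse sensor model; the sketch below is the generic textbook update along a single ray (free space before the return, occupied at it), not the paper's radar-specific model or its dynamic state estimation:

```python
import numpy as np

L_FREE, L_OCC = -0.4, 0.85            # log-odds increments (illustrative values)

def update_ray(grid_logodds, cells_along_ray, hit_index):
    """Cells before the return become freer; the return cell more occupied."""
    for i, cell in enumerate(cells_along_ray):
        if i < hit_index:
            grid_logodds[cell] += L_FREE
        elif i == hit_index:
            grid_logodds[cell] += L_OCC
    return grid_logodds

grid = np.zeros((10, 10))             # log-odds 0 == occupancy probability 0.5
ray = [(5, j) for j in range(10)]     # straight ray through row 5
update_ray(grid, ray, hit_index=6)    # radar return in the 7th cell
prob = 1.0 / (1.0 + np.exp(-grid))    # back to probabilities
print(prob[5, :].round(2))
```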


Title: XAI for Skin Cancer Detection with Prototypes and Non-Expert Supervision

Authors: Miguel Correia, Alceu Bissoto, Carlos Santiago

PubTime: 2024-02-02

Downlink: http://arxiv.org/abs/2402.01410v1

Abstract: Skin cancer detection through dermoscopy image analysis is a critical task. However, existing models used for this purpose often lack interpretability and reliability, raising the concern of physicians due to their black-box nature. In this paper, we propose a novel approach for the diagnosis of melanoma using an interpretable prototypical-part model. We introduce a guided supervision based on non-expert feedback through the incorporation of: 1) binary masks, obtained automatically using a segmentation network; and 2) user-refined prototypes. These two distinct information pathways aim to ensure that the learned prototypes correspond to relevant areas within the skin lesion, excluding confounding factors beyond its boundaries. Experimental results demonstrate that, even without expert supervision, our approach achieves superior performance and generalization compared to non-interpretable models.

