
Information Systems: Latest Publications

Measuring the decentralisation of DeFi development: An empirical analysis of contributor distribution in Lido
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-29 | DOI: 10.1016/j.is.2026.102695
Giuseppe Destefanis , Jiahua Xu , Silvia Bartolucci
Decentralised finance (DeFi) protocols often claim to implement decentralised governance via mechanisms such as decentralised autonomous organisations (DAOs), yet the structure of their development processes is rarely examined in detail. This study presents an in-depth case analysis of the development activity distribution in Lido, a prominent DeFi liquid staking protocol. We analyse 6741 human-generated GitHub actions recorded from September 2020 to February 2025. Using standard inequality metrics – Gini coefficient and Herfindahl–Hirschman Index – alongside contributors’ interaction network and core–periphery modelling, we find that development activity is highly concentrated. Overall, the weighted Gini coefficient reaches 0.82 and the most active contributor alone accounts for 24% of the total activity. Despite an even split between core and peripheral contributors, the core group accounts for 98.1% of all weighted development actions. The temporal analysis shows an increase in concentration over time, with the Gini coefficient rising from 0.686 in the bootstrap phase to 0.817 in the maturity phase. The contributors’ interaction network analysis reveals a hub-and-spoke structure with high centralisation in communication flows. While this is a case study of a single protocol, Lido represents a critical test of decentralisation claims given its prominence, maturity, and DAO governance structure. These findings demonstrate that open-source DeFi development can exhibit highly concentrated control patterns despite decentralised governance mechanisms, revealing a persistent gap between governance and operational decentralisation.
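The two inequality metrics the abstract relies on are straightforward to compute. A minimal sketch, using hypothetical per-contributor activity counts (not the paper's data):

```python
# Hedged sketch: Gini coefficient and Herfindahl-Hirschman Index (HHI)
# over per-contributor activity counts. The counts below are illustrative.

def gini(counts):
    """Gini coefficient of a list of non-negative activity counts (0 = equal)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # Standard form: G = (2 * sum_i i*x_i) / (n * total) - (n + 1) / n
    cum = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * cum) / (n * total) - (n + 1) / n

def hhi(counts):
    """HHI: sum of squared activity shares (approaches 1 at full concentration)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

# Hypothetical GitHub actions per contributor: one dominant contributor
activity = [500, 30, 20, 10, 5, 5, 2, 1]
print(round(gini(activity), 3), round(hhi(activity), 3))
```

With a distribution this skewed, both metrics land near 0.8, in the same regime the study reports for Lido.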
Citations: 0
Automated decision-making for dynamic task assignment at scale
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-22 | DOI: 10.1016/j.is.2026.102694
Riccardo Lo Bianco , Willem van Jaarsveld , Jeroen Middelhuis , Luca Begnardi , Remco Dijkman
The Dynamic Task Assignment Problem (DTAP) concerns matching resources to tasks in real time while minimizing some objectives, like resource costs or task cycle time. In this work, we consider a DTAP variant where every task is a case composed of a stochastic sequence of activities. The DTAP, in this case, involves the decision of which employee to assign to which activity to process requests as quickly as possible. In recent years, Deep Reinforcement Learning (DRL) has emerged as a promising tool for tackling this DTAP variant, but most research is limited to solving small-scale, synthetic problems, neglecting the challenges posed by real-world use cases. To bridge this gap, this work proposes a DRL-based Decision Support System (DSS) for real-world scale DTAPs. To this end, we introduce a DRL agent with two novel elements: a graph structure for observations and actions that can effectively represent any DTAP and a reward function that is provably equivalent to the objective of minimizing the average cycle time of tasks. The combination of these two novelties allows the agent to learn effective and generalizable assignment policies for real-world scale DTAPs. The proposed DSS is evaluated on five DTAP instances whose parameters are extracted from real-world logs through process mining. The experimental evaluation shows how the proposed DRL agent matches or outperforms the best baseline in all DTAP instances and generalizes on different time horizons and across instances.
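The claim that a reward function can be "provably equivalent" to minimizing average cycle time has a classical intuition behind it: by Little's law, the total flow time of all cases equals the time-integral of work-in-progress (WIP), so a per-step reward of minus the current WIP accumulates to the negative total cycle time. A toy check of that identity on hypothetical case timestamps (this is an illustration of the general idea, not the paper's exact construction):

```python
# Hedged sketch: total cycle time of all cases equals the time-integral of WIP,
# so reward = -WIP per step sums to the negative total cycle time.
# The (arrival, completion) pairs below are hypothetical.

cases = [(0, 3), (1, 4), (2, 6)]

total_cycle_time = sum(done - arr for arr, done in cases)

# Accumulate reward = -WIP over the discrete horizon
horizon = max(done for _, done in cases)
reward = 0
for t in range(horizon):
    wip = sum(1 for arr, done in cases if arr <= t < done)
    reward += -wip

print(total_cycle_time, -reward)  # the two totals coincide
```

This equivalence is what lets an agent optimize average cycle time from purely local, per-step signals.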
Citations: 0
A decade of systems for human data interaction
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-21 | DOI: 10.1016/j.is.2026.102689
Eugene Wu , Yiru Chen , Haneen Mohammed , Zezhou Huang
Human–data interaction (HDI) presents fundamentally different challenges from traditional data management. HDI systems must meet latency, correctness, and consistency needs that stem from usability rather than query semantics; failing to meet these expectations breaks the user experience. Moreover, interfaces and systems are tightly coupled; neither can easily be optimized in isolation, and effective solutions demand their co-design. This dependence also presents a research opportunity: rather than adapt systems to interface demands, systems innovations and database theory can also inspire new interaction and visualization designs. We survey a decade of our lab’s work that embraces this coupling and argue that HDI systems are the foundation for reliable, interactive, AI-driven applications.
Citations: 0
Efficient data structures for fast and low-cost first-order logic rule mining
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-21 | DOI: 10.1016/j.is.2026.102690
Ruoyu Wang , Raymond Wong , Daniel Sun
Logic rule mining discovers association patterns in the form of logic rules from structured data. Logic rules are widely applied in information systems to assist decisions in an interpretable way. However, state-of-the-art systems require substantial computational resources, as most optimize rule mining from the algorithmic and architectural perspectives while overlooking data efficiency. Although some state-of-the-art systems implement customized data structures to improve mining speed, the space overhead of these data structures becomes unaffordable when processing large-scale knowledge bases. Therefore, in this article, we propose data structures to improve data efficiency and accelerate logic rule mining. Our techniques implicitly represent the Cartesian product of variable substitutions in logic rules and build compact indices for a logic entailment cache. Furthermore, we create a pool and a lookup table for the cache so that cache components will not be repeatedly created. The evaluation results show that over 95% of memory can be reduced by our techniques, and mining procedures have been accelerated by about 20x on average. Most importantly, mining on large-scale knowledge bases is practical on normal hardware, where a single thread and 20GB of memory suffice.
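The space saving from an "implicit" Cartesian product comes from storing only the per-variable substitution lists and materializing combinations lazily. A minimal sketch of that idea (illustrative only; not the paper's actual data structure):

```python
# Hedged sketch of an implicit Cartesian product: keep the factor lists
# (O(sum of sizes) space) instead of the full product (O(product of sizes)),
# and yield tuples one at a time on demand.

from itertools import product

class ImplicitProduct:
    def __init__(self, *factors):
        self.factors = factors  # only the factors are stored

    def __len__(self):
        n = 1
        for f in self.factors:
            n *= len(f)
        return n

    def __iter__(self):
        # Materialize one substitution tuple at a time
        return product(*self.factors)

# Hypothetical variable substitutions: 2 + 3 items stored, 6 combinations exposed
subs = ImplicitProduct(["alice", "bob"], ["paper1", "paper2", "paper3"])
print(len(subs))
print(next(iter(subs)))
```

With k variables each having m candidate substitutions, this keeps k*m items in memory while still exposing all m^k combinations to the miner.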
Citations: 0
MDU-Net: Multi-resolution learning and differential clustering fusion for multivariate electricity time series forecasting
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-19 | DOI: 10.1016/j.is.2026.102693
Yongming Guan , Chengdong Zheng , Yuliang Shi , Gang Wang , Linfeng Wu , Zhiyong Chen , Hui Li
Artificial intelligence (AI) has demonstrated transformative potential in diverse fields such as healthcare, drug discovery, and natural language processing by enabling advanced pattern recognition and predictive modeling of complex data. In power systems in particular, multivariate electricity time series forecasting tasks involving power load, electricity prices, and renewable energy are crucial for grid security and economic dispatch. Contemporary forecasting approaches primarily focus on two aspects: modeling multi-scale periodic characteristics within sequences and capturing complex collaborative dependencies among variables. However, existing techniques often fail to simultaneously disentangle multi-scale features and model the dynamically heterogeneous dependencies between variables. To overcome these limitations, this paper proposes MDU-Net, a novel forecasting framework. The framework comprises two core modules: a Multi-Resolution hierarchical Union learning (MRU) module and a Differential Channel Clustering Fusion (DCCF) module. The MRU module constructs multi-granularity temporal representations through downsampling and achieves effective cross-scale feature fusion by integrating channel-independent operations with seasonal-trend decomposition. The DCCF module adopts first- and second-order derivative approximations to generate soft clustering mask matrices, adaptively capturing asymmetric collaborative dependencies among different variables over time. Experimental results on multiple public datasets (ETT, Electricity) demonstrate that MDU-Net significantly outperforms state-of-the-art baselines in multivariate electricity time series prediction. It achieves 2.7% and 17.1% relative MSE reductions compared to TimeMixer and PatchTST, respectively, with 1.4% and 14.4% lower MAE. Notably, MDU-Net maintains strong generalization capabilities and computational efficiency. The framework also shows promising performance in cross-domain applications such as traffic forecasting.
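Two of the building blocks the abstract names, downsampling to coarser resolutions and seasonal-trend decomposition, can be sketched in a few lines. This is a generic illustration with a toy series and window sizes; it is not MDU-Net's actual configuration:

```python
# Hedged sketch: block-average downsampling and a simple centered moving-average
# seasonal-trend split. Series and window sizes below are illustrative.

def downsample(series, factor):
    """Average consecutive blocks of `factor` points (coarser resolution)."""
    return [sum(series[i:i + factor]) / factor
            for i in range(0, len(series) - factor + 1, factor)]

def trend_seasonal(series, window):
    """Centered moving-average trend (odd window); residual is the seasonal part."""
    half = window // 2
    trend, seasonal = [], []
    for i in range(half, len(series) - half):
        t = sum(series[i - half:i + half + 1]) / window
        trend.append(t)
        seasonal.append(series[i] - t)
    return trend, seasonal

load = [10, 14, 9, 15, 11, 13, 8, 16]  # hypothetical hourly load values
print(downsample(load, 2))
print(trend_seasonal(load, 3))
```

Running the decomposition on each downsampled resolution yields the kind of multi-granularity representations that a cross-scale fusion module can then combine.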
Citations: 0
A Generalized CALM Theorem for Non-Deterministic Computation in Asynchronous Distributed Systems
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-16 | DOI: 10.1016/j.is.2026.102691
Tim Baccaert, Bas Ketsman
In most asynchronous distributed systems, consistency is achieved by use of coordination protocols such as Paxos, Raft, and 2PC. In many settings such protocols are too slow, too difficult to implement, or practically infeasible. The CALM theorem, initially conjectured by Hellerstein, is one of the first results characterizing precisely which problems do not require such a coordination protocol. It states that a problem has a consistent, coordination-free distributed implementation if, and only if, the problem is monotone. This was proven for deterministic problems (i.e., queries) and extends slightly beyond monotone queries for systems in which nodes can consult the data partitioning strategy.
In this work, we generalize the CALM Theorem to non-deterministic problems such as leader election. Furthermore, we make the theorem applicable to a wider range of distributed systems. Prior variants of the theorem have only-if directions requiring that systems access only their own identifier in the network, the identifiers of other nodes, and the data partitioning strategy. Our generalization allows us to model systems with arbitrary shared information between the nodes (e.g., network topology, leader nodes, …). It additionally allows us to create a coordination spectrum that classifies how much coordination a problem requires based on how much shared information is needed to compute it. Lastly, we apply this generalized theorem to show that the classes of polynomial time problems and coordination-free problems are not equal.
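The monotonicity notion at the heart of CALM can be made concrete with two toy queries over a growing edge relation: a monotone select-project-join query, whose output only grows as facts arrive (so partial answers can be emitted without coordination), versus a query with negation, which may have to retract earlier answers. The relations below are illustrative only:

```python
# Hedged illustration of monotone vs. non-monotone queries over toy relations.

def monotone_query(edges):
    # "Pairs connected by a path of length 2": a select-project-join query
    return {(a, d) for (a, b) in edges for (c, d) in edges if b == c}

def non_monotone_query(nodes, edges):
    # "Nodes with no outgoing edge": uses negation, hence non-monotone
    return {n for n in nodes if not any(a == n for (a, _) in edges)}

e1 = {("x", "y"), ("y", "z")}
e2 = e1 | {("z", "x")}  # more facts arrive later

assert monotone_query(e1) <= monotone_query(e2)  # output only grew
print(non_monotone_query({"x", "y", "z"}, e1))   # 'z' qualifies for now...
print(non_monotone_query({"x", "y", "z"}, e2))   # ...but is retracted once ("z","x") arrives
```

The retraction in the non-monotone case is exactly why such queries need coordination: a node cannot safely answer "z has no outgoing edge" until it knows no further edge facts are in flight.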
Citations: 0
DynaHash: An efficient blocking structure for streaming record linkage
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-16 | DOI: 10.1016/j.is.2026.102692
Dimitrios Karapiperis , Christos Tjortjis , Vassilios S. Verykios
Record linkage holds a crucial position in data management and analysis by identifying and merging records from disparate data sets that pertain to the same real-world entity. As data volumes grow, the intricacies of record linkage amplify, presenting challenges, such as potential redundancies and computational complexities. This paper introduces DynaHash, a novel randomized record linkage mechanism that utilizes (a) the MinHash technique to generate compact representations of blocking keys and (b) Hamming Locality-Sensitive Hashing (LSH) to construct the blocking structure from these vectors. By employing these methods, DynaHash offers theoretical guarantees of accuracy and achieves sublinear runtime complexities, with appropriate parameter tuning. It comprises two key components: a persistent storage system for permanently storing the blocking structure to ensure complete results, and an in-memory component for generating very fast partial results by summarizing the persisted blocking structure. Additionally, DynaHash leverages Multi-Probe matching to scan multiple neighboring blocks, in terms of their Hamming distances, in order to find matches. Our theoretical work derives a decrease factor in the space requirements, which depends on the Hamming threshold, compared with the baseline LSH. Our experimental evaluation against three state-of-the-art methods on six real-world data sets demonstrates DynaHash’s exceptional recall rates and query times, which are at least 2× faster than its competitors and do not depend on the size of the underlying data sets.
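The two ingredients the abstract combines can be sketched generically: MinHash signatures as compact representations of a record's blocking tokens, and Hamming distance over binarized signatures for neighbor probing. The hash count, bit extraction, and records below are illustrative assumptions, not DynaHash's actual parameters:

```python
# Hedged sketch: MinHash signatures + Hamming distance over binarized signatures.
# 16 salted hash functions and the toy records are illustrative choices.

import hashlib

def minhash(tokens, num_hashes=16):
    """One minimum per salted hash function -> a compact, deterministic signature."""
    sig = []
    for i in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16)
            for t in tokens))
    return sig

def binarize(sig):
    """Keep the lowest bit of each component -> a bit vector for Hamming probing."""
    return [s & 1 for s in sig]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

rec1 = binarize(minhash({"john", "smith", "1980"}))
rec2 = binarize(minhash({"jon", "smith", "1980"}))   # likely near-duplicate
rec3 = binarize(minhash({"maria", "lopez", "1955"}))

# Similar records tend to fall within a small Hamming radius of each other,
# so probing neighboring blocks by Hamming distance recovers likely matches.
print(hamming(rec1, rec2), hamming(rec1, rec3))
```

Because each MinHash component agrees between two records with probability equal to their Jaccard similarity, near-duplicates produce mostly-agreeing bit vectors, which is what makes Multi-Probe scanning of nearby Hamming buckets effective.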
引用次数: 0
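The blocking idea behind DynaHash, compact MinHash signatures of blocking keys followed by locality-sensitive bucketing so that similar records land in shared candidate blocks, can be illustrated with a minimal sketch. Note that the bucketing here uses the classic banding scheme rather than the paper's Hamming LSH, and all names and parameters are illustrative, not the paper's implementation:

```python
import hashlib

def minhash_signature(tokens, num_hashes=32):
    """Compact signature: for each of k salted hash functions, keep the
    minimum hash value over the record's tokens."""
    def h(salt, token):
        digest = hashlib.md5(f"{salt}:{token}".encode()).digest()
        return int.from_bytes(digest[:4], "big")
    return tuple(min(h(salt, t) for t in tokens) for salt in range(num_hashes))

def lsh_blocks(records, bands=8, rows=4):
    """Bucket record ids whose signatures agree on at least one band;
    each multi-record bucket is a candidate block for pairwise comparison."""
    buckets = {}
    for rid, tokens in records.items():
        sig = minhash_signature(tokens, num_hashes=bands * rows)
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            buckets.setdefault(key, set()).add(rid)
    return [ids for ids in buckets.values() if len(ids) > 1]

records = {
    "r1": {"john", "smith", "london", "1980"},
    "r2": {"john", "smith", "londin", "1980"},  # near-duplicate of r1
    "r3": {"maria", "lopez", "madrid", "1975"},
}
blocks = lsh_blocks(records)  # near-duplicates tend to share a bucket
```

In a streaming setting each arriving record would be hashed into its buckets once, which is what keeps candidate generation independent of the total data set size.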
Example-driven semantic-similarity-aware query intent discovery: Empowering users to cross the SQL barrier through query by example
IF 3.4 CAS Q2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-01-12 DOI: 10.1016/j.is.2026.102687
Anna Fariha, Lucy Cousins, Narges Mahyar, Alexandra Meliou
Traditional relational data interfaces require precise structured queries over potentially complex schemas. These rigid data retrieval mechanisms pose hurdles for nonexpert users, who typically lack programming language expertise and are unfamiliar with the details of the schema. Existing tools assist in formulating queries through keyword search, query recommendation, and query auto-completion, but still require some technical expertise. An alternative method for accessing data is query by example (QBE), where users express their data exploration intent simply by providing examples of their intended data and the system infers the intended query. However, existing QBE approaches focus on the structural similarity of the examples and ignore the richer context present in the data. As a result, they typically produce queries that are too general, and fail to capture the user’s intent effectively. In this article, we present SQuID, a system that performs semantic-similarity-aware query intent discovery from user-provided example tuples.
Our work makes the following contributions: (1) We design SQuID: an end-to-end system that automatically formulates select-project-join queries with optional group-by aggregation and intersection operators – a much larger class than what prior QBE techniques support – from user-provided examples, in an open-world setting. (2) We express the problem of query intent discovery using a probabilistic abduction model that infers a query as the most likely explanation of the provided examples. (3) We introduce the notion of an abduction-ready database, which precomputes semantic properties and related statistics, allowing SQuID to achieve real-time performance. (4) We present an extensive empirical evaluation on three real-world datasets, including user intent case studies, demonstrating that SQuID is efficient and effective, and outperforms machine learning methods, as well as the state of the art in the related query reverse engineering problem. (5) We contrast SQuID with traditional SQL querying through a comparative user study, which demonstrates that users with varying expertise are significantly more effective and efficient with SQuID than SQL. We find that SQuID eliminates the barriers in studying the database schema, formalizing task semantics, and writing syntactically correct SQL queries, and, thus, substantially alleviates the need for technical expertise in data exploration.
Citations: 0
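The query-by-example abduction in SQuID can be caricatured in a few lines: treat a query as the most specific set of filters that explains every example tuple. This closed-world toy uses hypothetical helper names and equality predicates only, and omits the joins, aggregation, and probabilistic scoring of the actual system:

```python
def infer_filters(examples):
    """Abduce the most specific conjunctive equality filters consistent
    with all example rows: a column yields a filter only when every
    example agrees on its value."""
    filters = {}
    for col in examples[0]:
        values = {ex[col] for ex in examples}
        if len(values) == 1:
            filters[col] = next(iter(values))
    return filters

def run_query(table, filters):
    """Apply the abduced filters, returning every row they explain."""
    return [row for row in table if all(row[c] == v for c, v in filters.items())]

papers = [
    {"title": "A", "venue": "IS", "year": 2024},
    {"title": "B", "venue": "IS", "year": 2025},
    {"title": "C", "venue": "TKDE", "year": 2025},
]
examples = [papers[0], papers[1]]      # user points at two IS papers
filters = infer_filters(examples)      # {'venue': 'IS'}: titles and years differ
result = run_query(papers, filters)    # returns both IS papers
```

The abduction step is the interesting one: the system works backwards from the examples to the query that best explains them, rather than asking the user to state the query.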
HLR-SQL: Human-like reasoning for Text-to-SQL with the human in the loop
IF 3.4 CAS Q2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-01-02 DOI: 10.1016/j.is.2025.102670
Timo Eckmann, Matthias Urban, Jan-Micha Bodensohn, Carsten Binnig
Recent LLM-based approaches have achieved impressive results on Text-to-SQL benchmarks such as Spider and Bird. However, these benchmarks do not accurately reflect the complexity typically encountered in real-world enterprise scenarios, where queries often span multiple tables. In this paper, we introduce HLR-SQL, a new approach designed to handle such complex enterprise SQL queries. Unlike existing methods, HLR-SQL imitates Human-Like Reasoning with LLMs by incrementally composing queries through a sequence of intermediate steps, gradually building up to the full query. This is an extended version of Eckmann et al. (2025). The new contributions are centered around incorporating human feedback directly into the reasoning process of HLR-SQL. We evaluate HLR-SQL on a newly constructed benchmark, Spider-HJ, which systematically increases query complexity by splitting tables in the original Spider dataset to raise the average join count needed by queries. Our experiments show that state-of-the-art models experience up to a 70% drop in execution accuracy on Spider-HJ, while HLR-SQL achieves a 9.51% improvement over the best existing approaches on the Spider leaderboard. Finally, we extended HLR-SQL to incorporate human feedback directly into the reasoning process by allowing the LLM to selectively ask for human help when faced with ambiguity or execution errors. We demonstrate that including the human in the loop in this way yields significantly higher accuracy, particularly for complex queries.
Citations: 0
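The human-in-the-loop control flow described above can be sketched as a plain loop. Every callable here (propose_step, execute, ask_human) is a hypothetical stand-in for the paper's components, not its actual API:

```python
def compose_with_feedback(question, schema, propose_step, execute, ask_human,
                          max_steps=8):
    """Incrementally compose a SQL query. The model proposes one partial
    query per step; execution errors and requested clarifications are
    folded back into the context so the next proposal can react to them."""
    context = {"question": question, "schema": schema, "history": []}
    sql = ""
    for _ in range(max_steps):
        step = propose_step(context, sql)
        if step.get("needs_human"):                  # model flags ambiguity
            answer = ask_human(step["clarification"])
            context["history"].append(("human", answer))
            continue
        ok, result = execute(step["sql"])
        if not ok:                                   # surface error for self-correction
            context["history"].append(("error", result))
            continue
        sql = step["sql"]                            # keep last working partial query
        if step.get("final"):
            break
    return sql
```

The design point this sketch tries to capture is that the raw execution error goes back to the proposer so the model can self-correct, while the human is consulted only when ambiguity is explicitly flagged, keeping human effort low.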
A visualization-driven decision support system for selecting feature attribution methods
IF 3.4 CAS Q2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-01-02 DOI: 10.1016/j.is.2025.102661
Priscylla Silva, Evandro Ortigossa, Dishita Turakhia, Claudio Silva, Luis Gustavo Nonato
Feature attribution techniques are crucial for interpreting machine learning models, but practitioners often struggle to understand how different methods compare and which one best fits their analytical goals. This difficulty arises from inconsistent results across methods, evaluation metrics that emphasize distinct and sometimes conflicting properties, and subjective preferences that influence how explanation quality is interpreted. In this paper, we introduce Explainalytics, an open-source Python library that transforms this challenging decision-making process into an evidence-based visual analytics workflow. Explainalytics calculates a range of evaluation metrics and presents the results through five coordinated views spanning global to local analysis. Linked filtering, dynamic updates, and brushing allow users to pivot fluidly between global trends and local details, supporting exploratory sense-making rather than rigid pipelines. In a within-subject laboratory study with 10 machine learning practitioners, we compared Explainalytics against a baseline.
Citations: 0
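One concrete example of the kind of agreement metric such a tool can report is rank correlation between the scores two attribution methods assign to the same features. This sketch is illustrative only and is not drawn from the Explainalytics API; the method names are hypothetical:

```python
def ranks(values):
    """Position of each value in ascending order (ties not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for pos, i in enumerate(order):
        r[i] = float(pos)
    return r

def spearman(a, b):
    """Spearman rank correlation: +1 when two attribution methods rank
    features identically, -1 when they rank them in exact reverse."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

method_a = [0.9, 0.1, 0.4]   # hypothetical attribution scores, method A
method_b = [0.8, 0.2, 0.3]   # hypothetical attribution scores, method B
agreement = spearman(method_a, method_b)   # 1.0: same feature ordering
```

High rank agreement across methods is reassuring even when the raw magnitudes disagree, which is exactly the distinction that a coordinated-view comparison of methods helps a practitioner see.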