Interpretable Predictive Models for Healthcare via Rational Multi-Layer Perceptrons
Thiti Suttaket, Stanley Kok
ACM Transactions on Management Information Systems. https://doi.org/10.1145/3671150 (published 2024-06-06)

The healthcare sector has recently experienced an unprecedented surge in digital data accumulation, especially in the form of electronic health records (EHRs). These records constitute a precious resource that Information Systems (IS) researchers could utilize for various clinical applications, such as morbidity prediction and risk stratification. Recently, deep learning has demonstrated state-of-the-art predictive performance on EHRs. However, the black-box nature of deep learning models prevents both clinicians and patients from trusting the models, especially with regard to life-critical decision making. To mitigate this, attention mechanisms are commonly employed to improve the transparency of deep learning models. However, these mechanisms only highlight important inputs, offer little clarity on how those inputs relate to one another, and can still confuse end users. To address this drawback, we pioneer a novel model called Rational Multi-Layer Perceptrons (RMLP) that is constructed from weighted finite state automata. RMLP provides better interpretability by coherently linking relevant inputs at different timesteps into distinct sequences. RMLP can be shown to be a generalization of a multi-layer perceptron (which works only on static data) to sequential, dynamic data. With its theoretical roots in rational series, RMLP's ability to process longitudinal time-series data and extract interpretable patterns sets it apart. Using real-world EHRs, we substantiate the effectiveness of our RMLP model through empirical comparisons on six clinical tasks, all of which demonstrate its considerable efficacy.
Mining Multimorbidity Trajectories and Co-Medication Effects from Patient Data to Predict Post–Hip Fracture Outcomes
Jessica Qiuhua Sheng, Da Xu, Paul Jen-Hwa Hu, Liang Li, Ting-Shuo Huang
ACM Transactions on Management Information Systems. https://doi.org/10.1145/3665250 (published 2024-05-17)
Hip fractures have profound impacts on patients' conditions and quality of life, even when patients receive therapeutic treatments. Many patients face the risk of poor prognosis, physical impairment, and even mortality, especially older patients. Accurate estimates of patient outcomes after an initial fracture are critical to physicians' decision-making and patient management. Effective predictions might benefit from analyses of patients' multimorbidity trajectories and medication usage: if adequately modeled and analyzed, these could help identify patients at higher risk of recurrent fractures or mortality. Most analytics methods overlook the onset, co-occurrence, and temporal sequence of distinct chronic diseases in the trajectory, and seldom consider the combined effects of different medications. To support effective predictions, we develop a novel deep learning–based method that uses a cross-attention mechanism to model patient progression by obtaining "contextual information" from multimorbidity trajectories. The method also incorporates a nested self-attention network that captures the combined effects of distinct medications by learning the interactions among medications and how dosages might influence post-fracture outcomes. A real-world patient data set is used to evaluate the proposed method against six benchmark methods. The comparative results indicate that our method consistently outperforms all benchmarks in precision, recall, F-measure, and area under the curve. The proposed method is generalizable and can be implemented as a decision support system to identify patients at greater risk of recurrent hip fractures or mortality, supporting clinical decision-making and patient management.
{"title":"Mining Multimorbidity Trajectories and Co-Medication Effects from Patient Data to Predict Post–Hip Fracture Outcomes","authors":"Jessica Qiuhua Sheng, Da Xu, Paul Jen-Hwa Hu, Liang Li, Ting-Shuo Huang","doi":"10.1145/3665250","DOIUrl":"https://doi.org/10.1145/3665250","url":null,"abstract":"Hip fractures have profound impacts on patients’ conditions and quality of life, even when they receive therapeutic treatments. Many patients face the risk of poor prognosis, physical impairment, and even mortality, especially older patients. Accurate patient outcome estimates after an initial fracture are critical to physicians’ decision-making and patient management. Effective predictions might benefit from analyses of patients’ multimorbidity trajectories and medication usages. If adequately modeled and analyzed, they could help identify patients at higher risk of recurrent fractures or mortality. Most analytics methods overlook the onset, co-occurrence, and temporal sequence of distinct chronic diseases in the trajectory, and they also seldom consider the combined effects of different medications. To support effective predictions, we develop a novel deep learning–based method that uses a cross-attention mechanism to model patient progression by obtaining “contextual information” from multimorbidity trajectories. This method also incorporates a nested self-attention network that captures the combined effects of distinct medications by learning the interactions among medications and how dosages might influence post-fracture outcomes. A real-world patient data set is used to evaluate the proposed method, relative to six benchmark methods. The comparative results indicate that our method consistently outperforms all the benchmarks in precision, recall, F-measures, and area under the curve. The proposed method is generalizable and can be implemented as a decision support system to identify patients at greater risk of recurrent hip fractures or mortality, which should help clinical decision-making and patient management.","PeriodicalId":45274,"journal":{"name":"ACM Transactions on Management Information Systems","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140964914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ShennongMGS: An LLM-based Chinese Medication Guidance System
Yutao Dou, Yuwei Huang, Xiongjun Zhao, Haitao Zou, Jiandong Shang, Ying Lu, Xiaolin Yang, Jian Xiao, Shaoliang Peng
ACM Transactions on Management Information Systems. https://doi.org/10.1145/3658451 (published 2024-04-17)

The rapidly evolving field of Large Language Models (LLMs) holds immense promise for healthcare, particularly in medication guidance and adverse drug reaction prediction. Despite their potential, existing LLMs struggle with complex polypharmacy scenarios and often grapple with data lag. To address these limitations, we introduce ShennongMGS, an LLM-based Chinese medication guidance system tailored for robust medication guidance and adverse drug reaction prediction. Our system transforms multi-source heterogeneous medication information into a knowledge graph and employs a two-stage training strategy to construct a specialised LLM (ShennongGPT). This method simulates professional pharmacists' decision-making processes and incorporates the capability for knowledge self-updating, thereby significantly enhancing drug safety and the overall quality of medical services. Rigorously evaluated by medical professionals and artificial intelligence experts, our method outperforms existing general and specialised LLMs.
{"title":"ShennongMGS: An LLM-based Chinese Medication Guidance System","authors":"Yutao Dou, Yuwei Huang, Xiongjun Zhao, Haitao Zou, Jiandong Shang, Ying Lu, Xiaolin Yang, Jian Xiao, Shaoliang Peng","doi":"10.1145/3658451","DOIUrl":"https://doi.org/10.1145/3658451","url":null,"abstract":"The rapidly evolving field of Large Language Models (LLMs) holds immense promise for healthcare, particularly in medication guidance and adverse drug reaction prediction. Despite their potential, existing LLMs face challenges in dealing with complex polypharmacy scenarios and often grapple with data lag issues. To address these limitations, we introduce an LLM-based Chinese medication guidance system, called ShennongMGS, specifically tailored for robust medication guidance and adverse drug reaction predictions. Our system transforms multi-source heterogeneous medication information into a knowledge graph and employs a two-stage training strategy to construct a specialised LLM (ShennongGPT). This method enables the simulation of professional pharmacists’ decision-making processes and incorporates the capability for knowledge self-updating, thereby significantly enhancing drug safety and the overall quality of medical services. Rigorously evaluated by medical professionals and artificial intelligence experts, our method demonstrates superiority, outperforming existing general and specialised LLMs in performance.","PeriodicalId":45274,"journal":{"name":"ACM Transactions on Management Information Systems","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140691422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Co-occurrence order-preserving pattern mining with keypoint alignment for time series
Youxi Wu, Zhen Wang, Yan Li, Ying Guo, He Jiang, Xingquan Zhu, Xindong Wu
ACM Transactions on Management Information Systems. https://doi.org/10.1145/3658450 (published 2024-04-13)
Order-preserving pattern (OPP) mining has recently been proposed to discover patterns that can be seen as trend changes in a time series. Although existing OPP mining algorithms achieve satisfactory performance, they discover all frequent patterns, whereas in some cases users focus on a particular trend and its associated trends. To efficiently discover trend information related to a specific prefix pattern, this paper addresses the problem of co-occurrence OPP mining (COP) and proposes an algorithm named COP-Miner to discover COPs from historical time series. COP-Miner consists of three parts: extracting keypoints, a preparation stage, and iteratively calculating supports while mining frequent COPs. Keypoint extraction obtains the local extreme points of patterns and time series. The preparation stage sets up the first round of mining in four steps: obtaining the suffix OPP of the keypoint sub-time series, calculating the occurrences of the suffix OPP, verifying the occurrences of the keypoint sub-time series, and calculating the occurrences of all fusion patterns of the keypoint sub-time series. To further improve the efficiency of support calculation, we propose a support calculation method with an ending strategy that uses the occurrences of prefix and suffix patterns to calculate the occurrences of superpatterns. Experimental results indicate that COP-Miner outperforms competing algorithms in running time and scalability. Moreover, COPs with keypoint alignment yield better prediction performance.
Estimating Future Financial Development of Urban Areas for Deploying Bank Branches: A Local-Regional Interpretable Model
Pei-Xuan Li, Yu-En Chang, Ming-Chun Wei, Hsun-Ping Hsieh
ACM Transactions on Management Information Systems. https://doi.org/10.1145/3656479 (published 2024-04-08)

Financial forecasting is an important task for urban development. In this paper, we propose a novel deep learning framework to predict the future financial potential of urban spaces. More precisely, our target is to infer the future number of financial institutions at any arbitrary location from environmental and geographical data. We propose a novel local-regional model, the Local-Regional Interpretable Multi-Attention (LIMA) model, that considers multiple aspects of a location: the place itself and its surroundings. Our model also offers three kinds of interpretability, giving decision makers a superior way to understand how the model arrives at a prediction: critical rules learned from the tree-based module, surrounding locations that are highly correlated with the prediction, and critical regional features. Our model not only takes advantage of a tree-based module, which can effectively extract cross features, but also leverages convolutional neural networks to obtain more complex and inclusive features around the target location. Experimental results on real-world datasets demonstrate the superiority of our proposed LIMA model over existing state-of-the-art methods. Since 2020, the LIMA model has been deployed as a web system that assists one of the largest banks in Taiwan in selecting locations for new branches in major cities.
Exploring How UK Public Authorities Use Redaction to Protect Personal Information
Yijun Chen, Reuben Kirkham
ACM Transactions on Management Information Systems. https://doi.org/10.1145/3651989 (published 2024-03-12)

Document redaction has become increasingly important for individuals and organizations. This article investigates public-sector information redaction practices in order to determine whether they adequately protect personal information from accidental disclosure due to redaction errors. Despite the importance of this for data protection, 66.4% of the Public Authorities that responded did not hold formal policies or procedures at all. To assess the policies that did exist, we produced a 17-item checklist of minimum best practice. Even authorities with policies and procedures had substantial defects to some degree (median performance: 29.4% on our checklist), with policies frequently recommending high-risk redaction methods and overlooking essential practices. These existing practices therefore amount to widespread breaches of data protection law on the ground. To remedy this, we articulate a new set of document redaction standards that overcome the inadequacies of current guidance, and we make proposals for regulatory reform in this space.
{"title":"Exploring How UK Public Authorities Use Redaction to Protect Personal Information","authors":"Yijun Chen, Reuben Kirkham","doi":"10.1145/3651989","DOIUrl":"https://doi.org/10.1145/3651989","url":null,"abstract":"\u0000 Document redaction has become increasingly important for individuals and organizations. This article investigates public-sector information redaction practices in order to determine if they adequately protect personal information from accidental disclosure due to redaction errors. Despite the importance of this in respect of data protection, 66.4% of those Public Authorities that responded did not hold formal policies or procedures\u0000 at all\u0000 . To assess those policies that did exist, we produced a 17-item check list of minimum best practice. Even those with policies and procedures had substantial defects to some degree (with the median performance being 29.4% on our checklist), with policies frequently recommending the use of high-risk redaction methods and overlooking essential practices. This means that these existing practices amount to widespread breaches of data protection law on the ground. To remedy this, we articulate a new set of document redaction standards, which overcome the existing inadequacies in current guidance, as well as make proposals for regulatory reform in this space.\u0000","PeriodicalId":45274,"journal":{"name":"ACM Transactions on Management Information Systems","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140251151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Psycholinguistics-Inspired Method to Counter IP Theft using Fake Documents
Natalia Denisenko, Youzhi Zhang, Chiara Pulice, Shohini Bhattasali, Sushil Jajodia, Philip Resnik, V. S. Subrahmanian
ACM Transactions on Management Information Systems. https://doi.org/10.1145/3651313 (published 2024-03-06)
Intellectual property (IP) theft is a growing problem. We build on prior work to deter IP theft by generating n fake versions of a technical document, so that a thief has to expend time and effort to identify the correct document. Our new SbFAKE framework proposes, for the first time, a novel combination of language processing, optimization, and the psycholinguistic concept of surprisal to generate a set of such fakes. We start by combining psycholinguistic surprisal scores with optimization to formulate two bilevel surprisal optimization problems (an Explicit one and a simpler Implicit one) whose solutions correspond directly to the desired set of fakes. As bilevel problems are usually hard to solve, we then show that each of these problems can be reduced to an equivalent surprisal-based linear program. We performed detailed parameter-tuning experiments and identified the best parameters for each algorithm. We then tested the two variants of SbFAKE (with their best parameter settings) against the best-performing prior work in the field. Our experiments show that SbFAKE generates convincing fakes more effectively than past work. In addition, we show that replacing words in an original document with words having similar surprisal scores produces greater levels of deception.
The Data Product-Service Composition Frontier: a Hybrid Learning Approach
Giovanni Quattrocchi, Willem-jan Van Den Heuvel, D. Tamburri
ACM Transactions on Management Information Systems. https://doi.org/10.1145/3649319 (published 2024-02-28)
The service-dominant logic is a foundational concept behind modern economies and software products, with service composition being a well-known practice by which companies gain a competitive edge by joining differentiated services together, typically assembled according to a number of features. At the other end of the spectrum, product compositions are a marketing device for selling products together in bundles that often augment the value for the customer, e.g., with suggested product interactions, sharing, and so on. Unfortunately, each of these two streams, product composition and service composition, is currently carried out and delivered in splendid isolation: anything may be offered as a product and as a service, disjointly. We argue that the next wave of services computing features ever more fusion of services with their physical counterparts and the data around them; a need therefore emerges to investigate the interactive engagement of both (data) products and services. This manuscript offers a real-life implementation in support of this argument, using (1) genetic algorithms (GAs) to shape product-service clusters, (2) end-user feedback to make the GAs interactive in a data-driven fashion, and (3) a hybridized approach that incorporates into our solution an ensemble machine-learning method considering additional features. All of this research was conducted in an industrial environment. With such a cross-fertilized, data-driven, and multi-disciplinary approach, practitioners from both fields may benefit from their mutual state of the art and learn new strategies for product, service, and data product-service placement, increasing value for the customer as well as the service provider. The results show promise but also highlight plenty of avenues for further research.
{"title":"The Data Product-Service Composition Frontier: a Hybrid Learning Approach","authors":"Giovanni Quattrocchi, Willem-jan Van Den Heuvel, D. Tamburri","doi":"10.1145/3649319","DOIUrl":"https://doi.org/10.1145/3649319","url":null,"abstract":"\u0000 The service dominant logic is a base concept behind modern economies and software products, with service composition being a well-known practice for companies to gain a competitive edge over others by joining differentiated services together, typically assembled according to a number of features. At the other end of the spectrum, product compositions are a marketing device to sell products together in bundles that often augment the value for the customer, e.g., with suggested product interactions, sharing, etc. Unfortunately, currently each of these two streams—namely, product and service composition—are carried out and delivered individually in splendid isolation: anything is being offered as a product and as a service, disjointly. We argue that the next wave of services computing features more and more service fusion with physical counterparts as well as data around them. Therefore a need emerges to investigate the interactive engagement of both (data) products and services. This manuscript offers a real-life implementation in support of this argument, using (1) genetic algorithms (GA) to shape product-service clusters, (2) end-user feedback to make the GAs interactive with a data-driven fashion, and (3) a\u0000 hybridized\u0000 approach which factors into our solution an ensemble machine-learning method considering additional features. All this research was conducted in an industrial environment. With such a cross-fertilized, data-driven, and multi-disciplinary approach, practitioners from both fields may benefit from their mutual state of the art as well as learn new strategies for product, service, and data product-service placement for increased value to the customer as well as the service provider. Results show promise but also highlight plenty of avenues for further research.\u0000","PeriodicalId":45274,"journal":{"name":"ACM Transactions on Management Information Systems","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140423912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Agent-based Model of Initial Token Allocations: Simulating Distributions post Fair Launch
Joaquin Delgado Fernandez, Tom Josua Barbereau, Orestis Papageorgiou
ACM Transactions on Management Information Systems. https://doi.org/10.1145/3649318 (published 2024-02-23)
With advancements in distributed ledger technologies and smart contracts, tokenized voting rights have gained prominence within Decentralized Finance (DeFi). Voting rights tokens (a.k.a. governance tokens) are fungible tokens that grant individual holders the right to vote on the fate of a project. The motivation behind these tokens is to achieve decentralized control within a decentralized autonomous organization (DAO). Because the initial allocations of these tokens are often undemocratic, the DeFi project and DAO Yearn Finance experimented with a fair launch allocation, in which no tokens are pre-mined and all participants have an equal opportunity to receive them. Regardless, research on voting rights tokens highlights the formation of timocracies over time, the working assumption being that the tokens' tradability is the cause of concentration. To examine this proposition, this paper uses an agent-based model to simulate and analyze the concentration of voting rights tokens after three fair launch allocation scenarios under different trading modalities. The results show that, regardless of the allocation, concentration persistently occurs, confirming that the 'disease' is endogenous: the cause of concentration is the tokens' tradability. The findings inform theoretical understanding of, and practical implications for, on-chain governance mediated by tokens.
Design with Simon's Inner and Outer Environments: Theoretical Foundations for Design Science Research Methods for Digital Science
V. Storey, Richard Baskerville
ACM Transactions on Management Information Systems. https://doi.org/10.1145/3640819 (published 2024-01-16)

Design science research has traditionally been applied to complex real-world problems to produce an artifact that addresses such problems. Although design science research efforts have traditionally been applied to business or related problems, there is a large set of problems in the area of digital science that also require important digital artifacts. The digitalization of science has created the need to develop essential, specialized devices and software before scientists can feasibly carry out their work. This research examines digital science to identify its challenges and to demonstrate how design science research can advance digital science, thereby establishing digital science as an important area of transdisciplinary inquiry. These areas of research are examined for their synergies and explained by positioning artifact development challenges with respect to Simon's inner and outer environments and the interface between them.
{"title":"Design with Simon's Inner and Outer Environments: Theoretical Foundations for Design Science Research Methods for Digital Science","authors":"V. Storey, Richard Baskerville","doi":"10.1145/3640819","DOIUrl":"https://doi.org/10.1145/3640819","url":null,"abstract":"Design science research has traditionally been applied to complex real-world problems to produce an artifact to address such problems. Although design science research efforts have been applied traditionally to business or related problems, there is a large set of problems in the area of digital science that also require important, digital artifacts. The digitalization of science has resulted in the need to develop essential, specialized, devices and software before it is feasible for scientists to carry out their work. This research examines digital science to identify its challenges and demonstrate how it can be possible to progress digital science with design science research, thereby establishing digital science as an important area of transdisciplinary inquiry. These areas of research are examined for their synergies and explained by positioning artifact development challenges with respect to Simon's inner and outer environments, and the interface between them.","PeriodicalId":45274,"journal":{"name":"ACM Transactions on Management Information Systems","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139619044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}