2008 Third International Conference on Digital Information Management: Latest Papers

Enhancing interoperability between enterprise planning applications: An architectural framework
Pub Date : 2008-11-01 DOI: 10.1109/ICDIM.2008.4746725
K. Ishak, B. Archimède, P. Charbonnaud
This article presents a model of a service-oriented market that gives small and medium enterprises a more equitable opportunity to integrate into business markets. Multi-site planning is nevertheless a critical and difficult task because the planning applications of the various partners are heterogeneous. To address this, the proposed interoperable and distributed architecture for multi-site planning, SCEP-SOA, is based on SOA (service-oriented architecture) and integrates the concepts of SCEP (supervisor, customer, environment, and producer), a generic model of planning and scheduling. The architecture enables both applicative and semantic interoperability between the different planning applications used by the partners.
Citations: 2
Exploring the applicability of Web Architectural-Inducing Model (WA-IM) for Information Architecture in cultural context: A structural equation modeling approach
Pub Date : 2008-11-01 DOI: 10.1109/ICDIM.2008.4746826
W. Isa, N. Noor, Shafie Mehad
Website information architecture (IA) has grown into a discipline concerned with the design principles and architecture of information in the digital landscape. IA models for web-mediated environments, however, lack theoretical perspectives, empirical evidence and cultural context. To address these gaps, we propose the Web Architectural-Inducing Model (WA-IM) for IA. We conceptualize website IA as a multidimensional construct and explore the applicability of WA-IM for IA. As a cultural case study, we conducted a web-based survey of 427 Muslim online users and examined their expectations of IA in a culture-centred website genre, namely Islamic websites. The multifactor structure of website IA was validated via confirmatory factor analysis (CFA) using structural equation modeling (SEM). A five-factor goodness-of-fit model was evaluated, and the CFA verified that website IA is composed of five factors: 'content-information', 'content-trust', 'navigation-trait', 'navigation-wayfinding' and 'context-information design'.
Citations: 2
Integrating multimedia repositories into the PROBADO framework
Pub Date : 2008-11-01 DOI: 10.1109/ICDIM.2008.4746720
Ina Blümel, J. Diet, Harald Krottmaier
In this paper, we describe a digital library initiative for non-textual documents. The proposed framework will integrate different types of content repositories - each one specialized for a specific multimedia domain - into one seamless system and will add features such as automatic annotation, full-text retrieval and recommender services to non-textual documents. Two multimedia domains, 3D graphics and music, are introduced. The repositories can be searched using both textual (metadata-based) and non-textual retrieval mechanisms (e.g. a complex sketch-based interface for searching 3D models or a query-by-humming interface for music). Domain-specific metadata models are developed, and workflows for automated content-based data analysis and indexing are proposed.
Citations: 4
NIDS based on payload word frequencies and anomaly of transitions
Pub Date : 2008-11-01 DOI: 10.1109/ICDIM.2008.4746821
S. Mrdović, B. Perunicic-Drazenovic
This paper presents a novel payload analysis method. Consecutive bytes are separated by boundary symbols and defined as words. The frequencies of word appearance and of word-to-word transitions are used to build a model of normal behavior. A simple anomaly score calculation is designed for fast attack detection. The method was tested using real traffic and recent attacks to demonstrate that it can be used in an IDS. Tolerance to a small number of attacks in the training data is shown.
Citations: 1
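The payload analysis outlined in the NIDS abstract above (split bytes into words at boundary symbols, count words and word-to-word transitions, then score new payloads) could look roughly like the Python sketch below. This is an illustration under assumptions, not the authors' method: the boundary set `BOUNDARY`, the class name `PayloadModel`, and the scoring rule (fraction of words and transitions never seen in training) are all hypothetical, since the abstract gives neither the boundary symbols nor the score formula.

```python
import re
from collections import Counter

# Hypothetical boundary symbols; the abstract does not list the actual set.
BOUNDARY = re.compile(rb"[\s/?&=.:;,]+")


def split_words(payload: bytes) -> list:
    """Split a payload into 'words': byte runs between boundary symbols."""
    return [w for w in BOUNDARY.split(payload) if w]


class PayloadModel:
    """Counts of words and word-to-word transitions seen in normal traffic."""

    def __init__(self):
        self.word_counts = Counter()
        self.transition_counts = Counter()

    def train(self, payloads):
        for payload in payloads:
            words = split_words(payload)
            self.word_counts.update(words)
            # Consecutive word pairs are the transitions.
            self.transition_counts.update(zip(words, words[1:]))

    def score(self, payload: bytes) -> float:
        """Assumed score: share of words and transitions unseen in training."""
        words = split_words(payload)
        transitions = list(zip(words, words[1:]))
        total = len(words) + len(transitions)
        if total == 0:
            return 0.0
        unseen = sum(1 for w in words if self.word_counts[w] == 0)
        unseen += sum(1 for t in transitions if self.transition_counts[t] == 0)
        return unseen / total


model = PayloadModel()
model.train([b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1"])
normal_score = model.score(b"GET /index.html HTTP/1.1")
attack_score = model.score(b"\x90\x90\x90\x90 /bin/sh")
```

A payload would be flagged when its score exceeds a threshold tuned on held-out normal traffic; a frequency-weighted score would be needed to obtain the tolerance to attacks in the training data that the abstract mentions.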
Data collection system for link analysis
Pub Date : 2008-11-01 DOI: 10.1109/ICDIM.2008.4746781
Bo Yang, Jian Qin
The study presented in this paper exploited several possible ways to meet the needs of link analysis in Webometrics and developed a prototype (LinkDiscoverer) that collects data from both real-time links and search engines. The prototype consists of two parts: a crawling part for collecting real-time link data from a given domain or site and a search engine part for harvesting link data from search engines by using specific search commands. An experiment was conducted to evaluate the performance of LinkDiscoverer on link analysis. The results show that LinkDiscoverer's functions can well satisfy the needs for link analysis. This study contributes to data collection methods and selection strategy in Webometrics.
Citations: 15
Testing concept indexing in crosslingual medical text classification
Pub Date : 2008-11-01 DOI: 10.1109/ICDIM.2008.4746715
Francisco M. Carrero, José Carlos Cortizo, J. M. G. Hidalgo
MetaMap is an online application that maps text to UMLS Metathesaurus concepts, which is very useful for interoperability among different languages and systems within the biomedical domain. MetaMap Transfer (MMTx) is a Java program that makes MetaMap available to biomedical researchers in a controlled, configurable environment. Currently there is no Spanish version of MetaMap, which makes it difficult to use the UMLS Metathesaurus to extract concepts from Spanish biomedical texts. Developing a Spanish version of MetaMap would be a huge task, since a great deal of work has gone into supporting the English version over the last sixteen years. Our ongoing research is mainly focused on using biomedical concepts for crosslingual text classification. In this context, using concepts instead of a bag-of-words representation allows us to approach text classification tasks independently of the language. In this paper we show our experiments on combining automatic translation techniques with the use of biomedical ontologies to produce an English text that can be processed by MMTx in order to extract concepts for text classification.
Citations: 5
A storage scheme for multidimensional data alleviating dimension dependency
Pub Date : 2008-11-01 DOI: 10.1109/ICDIM.2008.4746713
Teppei Shimada, T. Tsuji, K. Higuchi
Multidimensional arrays storing multidimensional data in MOLAP are usually very sparse. They also suffer from the problem that the time consumed in sequential access to array elements heavily depends on the dimension along which the elements are accessed. This problem of "dimension dependency" would be alleviated by dividing the whole array into a set of smaller hypercube-shaped subarrays called "chunks". But the chunks are also sparse and should be compressed. However, further dimension dependency in accessing array elements would be caused, unless these compressed chunks are arranged judiciously in the page buffer. The difference among the dimension cardinalities could also cause dimension dependency; a slice operation along a dimension of large cardinality tends to consume much time. We will alleviate these two kinds of dimension dependency by introducing the notion of an "extended chunk". Extended chunks can adapt flexibly to the general situation where data densities in chunks are low and are not uniformly distributed. Employing extended chunks, we will propose some secondary storage schemes for a multidimensional array using a space-filling curve such as the Z-curve. The evaluation result shows that the proposed storage schemes exhibit good performance while alleviating the dimension dependency.
Citations: 20
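To make the chunking and Z-curve idea in the storage-scheme abstract above concrete, here is a minimal Python sketch of plain Z-order (Morton) keys over chunk coordinates. It covers only the generic space-filling-curve ordering of fixed-size chunks; it does not implement the paper's "extended chunk" or its compression, whose details the abstract does not give. The names `morton_encode` and `chunk_key` and the parameters `chunk_side` and `bits` are assumptions introduced for illustration.

```python
def morton_encode(coords, bits):
    """Interleave the low `bits` bits of each coordinate into one Z-order key."""
    key = 0
    ndim = len(coords)
    for bit in range(bits):
        for dim, c in enumerate(coords):
            # Bit `bit` of dimension `dim` lands at position bit * ndim + dim.
            key |= ((c >> bit) & 1) << (bit * ndim + dim)
    return key


def chunk_key(cell, chunk_side, bits):
    """Z-order key of the chunk that contains an array cell."""
    chunk_coords = [c // chunk_side for c in cell]
    return morton_encode(chunk_coords, bits)


# Order the chunks of a 4x4 chunk grid along the Z-curve.
chunks = [(x, y) for x in range(4) for y in range(4)]
z_ordered = sorted(chunks, key=lambda c: morton_encode(c, bits=2))
```

Because the key interleaves bits from every dimension, chunks that are close along any one dimension tend to be stored close together, so a scan along one dimension is not systematically favoured over the others the way it is in row-major layout.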
Identifying bioentity recognition errors of rule-based text-mining systems
Pub Date : 2008-11-01 DOI: 10.1109/ICDIM.2008.4746791
Francisco M. Couto, Tiago Grego, Hugo P. Bastos, Catia Pesquita, Rafael P. Torres Jiménez, Pablo Sánchez, Leandro Pascual, C. Blaschke
An important research topic in Bioinformatics involves the exploration of vast amounts of biological and biomedical scientific literature (BioLiterature). Over the last few decades, text-mining systems have exploited this BioLiterature to reduce the time spent by researchers in its analysis. However, state-of-the-art approaches are still far from reaching performance levels acceptable to curators, and remain below the performance obtained in other domains, such as personal name recognition or news text. To achieve high levels of performance, it is essential that text-mining tools effectively recognize bioentities present in BioLiterature. This paper presents FIBRE (Filtering Bioentity Recognition Errors), a system for automatically filtering misannotations generated by rule-based systems that automatically recognize bioentities in BioLiterature. FIBRE uses different sets of automatically generated annotations to identify the main features that characterize an annotation as being of a certain type. These features are then used to filter misannotations using a confidence threshold. The assessment of FIBRE was performed on a set of more than 17,000 documents, previously annotated by Text Detective, a state-of-the-art rule-based name bioentity recognition system. Curators evaluated the gene annotations given by Text Detective that FIBRE classified as non-gene annotations, and we found that FIBRE was able to filter more than 600 misannotations with a precision above 92%, requiring minimal human effort, which demonstrates the effectiveness of FIBRE in a realistic scenario.
Citations: 4
Using expert systems technology to increase agriculture production and water conservation
Pub Date : 2008-11-01 DOI: 10.1109/ICDIM.2008.4746802
M. Mahmoud, M. Rafea, A. Rafea
Although many expert systems (ESs) have been developed around the world, little consideration has been given to the impacts resulting from their use. There is a difference between expert systems developed in laboratories for research and demonstration and expert systems that can be applied in the field. Applied expert systems must cover the end users' requirements and meet some other evaluation criteria. ESs are evaluated both in the laboratory and in the field. In the laboratory evaluation, an evaluation methodology is used to guarantee that the ES can be used in the field. The field evaluation is achieved through field experiments. Those field experiments showed that fields managed by the ES perform better than the control fields. In this paper, the evaluation criteria that guarantee the success of ESs deployed in the field are presented. These evaluation criteria have been applied to three ES applications, namely: CITEX for citrus cultivation, CUPTEX for cucumber cultivation under plastic tunnels, and NEPER for wheat cultivation. Expert systems potentially have several different types or categories of impact relative to the applied domain. The field experiments are used to evaluate the economic and environmental impacts of the ES. The economic impact includes the cost, profit, and yield. The environmental impact includes the effect of using the ES on water and soil conservation, and also on decreasing the amount of pesticides used in the fields.
Citations: 1
An empirical analysis on progress of technology fusion
Pub Date : 2008-11-01 DOI: 10.1109/ICDIM.2008.4746787
Katsuhiro Suzuki, J. Sakata, J. Hosoya
In order to analyze the progress of research based on the integration of leading-edge technologies, this paper introduces three types of patent data categorization, namely the Mix, Only and Mono-IPC types, expanding the concept of IPC Co-Occurrence. Additionally, the concept of an innovation coordinate is introduced as a means to investigate the status and trends of R&D in the field of fuel cells for the years from 2000 to 2004. It is shown that the yearly positions determined by the sets of patent applications in Japan concerning fuel cells kept changing during the period; the occupation ratio of Mix type inventions decreased while those of the Only and Mono-IPC types increased. The declining trend of the Mix type, which is considered to be common in the convergence process of technology fusion toward the launch of new products based on cutting-edge technologies, is also observed in the field of micro electro mechanical systems (MEMS). Our future work includes both a trilateral comparison that adds European and US patent datasets and hybrid analyses that introduce external data such as R&D expenditures, in order to investigate the validity and/or the application limits of the present analysis in the quantitative study of the dynamics of innovation.
Citations: 3
Journal
2008 Third International Conference on Digital Information Management