
2009 Fourth International Conference on Digital Information Management: Latest Publications

Using tags for breaking news elicitation
Pub Date: 2009-12-18 DOI: 10.1109/ICDIM.2009.5356780
A. Chua, D. Goh
The reduction in news cycle time, coupled with high Internet penetration, has resulted in a phenomenon known as 'citizen journalism', where ordinary people who are not journalists collect, analyze and disseminate news pieces. This paper leverages tags drawn from iReport, an active citizen journalism website, to elicit breaking news. The goal is to examine the coverage and effectiveness of news elicitation in iReport vis-à-vis news reported in the mainstream media. The data collection procedure involved manually culling major news events reported in mainstream news sources between April 8 and June 6, 2008. In parallel, tags were extracted from iReport postings during the same study period. Tags were analyzed using correlational analysis and relative frequencies. Results show that of the 10 major news events reported in mainstream sources, five could be elicited from tags in iReport. Implications and suggestions for future research are also discussed.
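
To make the tag analysis concrete, here is a minimal Python sketch of eliciting candidate breaking-news topics from relative tag frequencies; the tag list and threshold are invented for illustration and are not taken from the paper:

from collections import Counter

# Hypothetical input: tags extracted from iReport postings in the study window.
tags = ["earthquake", "sichuan", "earthquake", "myanmar", "cyclone", "earthquake"]

counts = Counter(tags)
total = sum(counts.values())

# Relative frequency of each tag, used to surface candidate breaking-news topics.
rel_freq = {tag: n / total for tag, n in counts.items()}

# A tag is treated as eliciting an event once it crosses a chosen threshold.
THRESHOLD = 0.2  # illustrative value, not from the paper
breaking = [tag for tag, f in rel_freq.items() if f >= THRESHOLD]
print(sorted(breaking))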
Citations: 1
Analysis of news agencies' descriptive feature by using SVO structure
Pub Date: 2009-12-18 DOI: 10.1109/ICDIM.2009.5356776
Shin Ishida, Qiang Ma, Masatoshi Yoshikawa
In some sense, news is probably never free from agencies' subjective valuation and from external forces such as owners and advertisers. As a result, the perspective of news content may be biased. To clarify such a bias, we propose a novel method to extract characteristic descriptions of a certain entity (person, location, organization, etc.) in the articles of a news agency. For a given entity, a description is a tuple (called an SVO tuple) that consists of that entity and the other words or phrases appearing in the same sentence, on the basis of their SVO (Subject (S), Verb (V) and Object (O)) roles. By computing the frequency and inverse agency frequency of each description, we extract the characteristic descriptions of a certain entity. Intuitively, an SVO tuple that is often used by one news agency but not commonly used by the others has a high probability of being a characteristic description. To validate our method, we carried out an experiment to extract characteristic descriptions of persons using articles from three well-known Japanese newspaper agencies. The experimental results show that our method can elucidate the distinguishing features of each agency's writing style. We also discuss useful applications of our method and further work.
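
The frequency and inverse agency frequency weighting is analogous to TF-IDF computed across agencies. The sketch below shows one plausible reading of that scoring; the tuples, corpus, and exact weighting are assumptions, not the paper's data or formula:

import math
from collections import Counter

# Hypothetical input: SVO tuples extracted per news agency, each pairing
# the target entity with co-occurring subject/verb/object words.
agency_tuples = {
    "agencyA": [("entityX", "criticize", "policy"), ("entityX", "visit", "Osaka")],
    "agencyB": [("entityX", "criticize", "policy")],
    "agencyC": [("entityX", "announce", "reform")],
}

n_agencies = len(agency_tuples)

def characteristic_scores(agency):
    """Score = frequency within the agency x inverse agency frequency,
    by analogy with TF-IDF; the paper's exact weighting may differ."""
    freq = Counter(agency_tuples[agency])
    scores = {}
    for t, f in freq.items():
        df = sum(1 for a in agency_tuples if t in agency_tuples[a])
        scores[t] = f * math.log(n_agencies / df)
    return scores

print(characteristic_scores("agencyA"))  # tuples unique to agencyA score highest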
Citations: 1
From state-based to event-based contextual security policies
Pub Date: 2009-12-18 DOI: 10.1109/ICDIM.2009.5356768
Yehia El Rakaiby, F. Cuppens, N. Cuppens-Boulahia
In this paper, we present a formal contextual security model for pervasive computing applications. The main features of the model are support for authorization and obligation policies, monitoring and dynamic revocation of access rights, support for personalized security rule contexts, and support for collaborative applications. The model is also logic-based; therefore, it enables the use of formal policy-conflict analysis and dynamic system analysis techniques.
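
As a rough intuition for event-based contextual policies, the following plain-Python sketch lets events activate and deactivate contexts, which in turn grant and revoke authorizations; the paper's model is logic-based and far richer, and all rule, event, and context names here are invented:

from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    subject: str
    action: str
    resource: str
    context: str  # e.g. "working_hours"; contexts are toggled by events

active_contexts = set()

def on_event(event):
    """Events toggle contexts; an access right is dynamically revoked
    when the context that granted it is deactivated."""
    if event == "clock_9am":
        active_contexts.add("working_hours")
    elif event == "clock_6pm":
        active_contexts.discard("working_hours")

policy = [Rule("alice", "read", "patient_record", "working_hours")]

def is_permitted(subject, action, resource):
    return any(r.subject == subject and r.action == action
               and r.resource == resource and r.context in active_contexts
               for r in policy)

on_event("clock_9am")
print(is_permitted("alice", "read", "patient_record"))  # True
on_event("clock_6pm")
print(is_permitted("alice", "read", "patient_record"))  # False: right revoked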
Citations: 4
Towards the healthy nutritional dietary patterns
Pub Date: 2009-12-18 DOI: 10.1109/ICDIM.2009.5356791
Chendong Li
Association rule mining is a popular technique in data mining with an extremely wide application area. In this paper, we study the association rule mining problem and propose a cascaded approach to extract interesting healthy nutritional dietary patterns. Our approach is mainly based on the Apriori algorithm and rule deduction techniques. To test the feasibility and effectiveness of the new approach, we conduct a series of experiments with data obtained from the U.S. Department of Agriculture Food and Nutrient Database for Dietary Studies 3.0. Our experimental results demonstrate that the proposed approach can successfully extract many interesting healthy nutritional dietary patterns, some of which were previously unknown.
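
A compact sketch of the Apriori frequent-itemset mining the approach builds on, over hypothetical dietary transactions; the cascaded rule-deduction stage of the paper is not shown:

from itertools import combinations

# Hypothetical transactions: foods reported together in one dietary record.
transactions = [
    {"spinach", "beans", "rice"},
    {"spinach", "beans"},
    {"beans", "rice"},
    {"spinach", "beans", "fish"},
]
MIN_SUP = 0.5  # illustrative minimum support

def apriori(transactions, min_sup):
    n = len(transactions)

    def support(items):
        return sum(items <= t for t in transactions) / n

    # Frequent 1-itemsets first.
    items = {i for t in transactions for i in t}
    freq = [frozenset([i]) for i in items if support(frozenset([i])) >= min_sup]
    result, k = list(freq), 2
    while freq:
        # Candidate generation: join frequent (k-1)-itemsets, then prune by support.
        cands = {a | b for a in freq for b in freq if len(a | b) == k}
        freq = [c for c in cands if support(c) >= min_sup]
        result += freq
        k += 1
    return result

print(apriori(transactions, MIN_SUP))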
Citations: 3
Augmenting the exploration of digital libraries with web-based visualizations
Pub Date: 2009-12-18 DOI: 10.1109/ICDIM.2009.5356798
Peter Bergström, Darren C. Atkinson
Web-based digital libraries have sped up the process that scholars use to find new, important research papers. Unfortunately, current digital libraries are limited by their inadequate webpage-based paradigm, and it is easy for even the most experienced scholar to get lost. A paper and its immediate references are shown on a webpage, but it is not obvious where that paper belongs in the larger context of a field of research. The goal of our research was to develop and test the effectiveness of a web-based application, PaperCube, designed to augment a scholar's interaction with a digital library and to explore bibliographic metadata using a defined set of visualizations. These visualizations needed to provide different levels of visibility into a paper's citation network without losing focus on the currently viewed paper. PaperCube was validated through a user study, which showed that it was very useful for augmenting digital library search by reducing the 'cognitive load' placed on a scholar and aiding the 'discoverability' of new research material.
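
The idea of 'different levels of visibility' maps naturally onto grouping papers by citation distance from the focused paper. Below is a small breadth-first sketch over an invented citation graph; it illustrates the concept, not PaperCube's actual code:

from collections import deque

# Hypothetical citation graph: paper -> papers it references.
refs = {
    "focus": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": [],
    "d": [],
}

def levels(start, max_depth):
    """Group papers by citation distance from the focused paper;
    depth 0 is the focus itself."""
    seen, frontier, out = {start}, deque([(start, 0)]), {}
    while frontier:
        paper, d = frontier.popleft()
        out.setdefault(d, []).append(paper)
        if d < max_depth:
            for r in refs.get(paper, []):
                if r not in seen:
                    seen.add(r)
                    frontier.append((r, d + 1))
    return out

print(levels("focus", 2))  # {0: ['focus'], 1: ['a', 'b'], 2: ['c', 'd']}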
Citations: 41
Adaptive foreground segmentation using fuzzy approach
Pub Date: 2009-12-18 DOI: 10.1109/ICDIM.2009.5356792
Huajing Yao, Imran Ahmad
In this paper, we propose a simple and novel method of background modeling and foreground segmentation for visual surveillance applications. The method employs a histogram-based median approach in the HSV color space together with fuzzy k-means clustering. A histogram is first constructed for each pixel across the training frames; the highest bin of the histogram is then chosen, and the median value within this bin is selected as the estimated background-model value for that pixel. A background model is established after the above procedure is applied to all pixels. Fuzzy k-means clustering is then used to classify each pixel in the current frame as either a background pixel or a foreground pixel. Experimental results on a set of indoor videos show the effectiveness of the proposed method. Compared with two other contemporary methods, k-means clustering and Mixture of Gaussians (MoG), the proposed method is not only time-efficient but also provides better segmentation results.
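
A sketch of the histogram-based median estimate for a single pixel and a single channel, using NumPy; the fuzzy k-means classification step is stood in for by a simple distance threshold, so this is illustrative only:

import numpy as np

# Hypothetical training data: one channel of one pixel across 100 training
# frames (the paper works in HSV; a single channel is shown for brevity).
rng = np.random.default_rng(0)
pixel_series = rng.normal(loc=120, scale=5, size=100).clip(0, 255)

# Build a histogram for the pixel, pick the highest bin, and take the median
# of the samples falling in that bin as the background estimate.
counts, edges = np.histogram(pixel_series, bins=16, range=(0, 256))
b = counts.argmax()
in_bin = pixel_series[(pixel_series >= edges[b]) & (pixel_series < edges[b + 1])]
background = np.median(in_bin)

# A new frame's pixel is then assigned to background or foreground; the paper
# uses fuzzy k-means here, a fixed distance threshold stands in for it.
new_value = 180.0
is_foreground = abs(new_value - background) > 30  # illustrative threshold
print(background, is_foreground)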
Citations: 1
The mqr-tree: Improving upon a 2-dimensional spatial access method
Pub Date: 2009-12-18 DOI: 10.1109/ICDIM.2009.5356774
Marc Moreau, W. Osborn, B. Anderson
We propose the mqr-tree, a two-dimensional spatial access method that improves upon the 2DR-tree. The 2DR-tree uses two-dimensional nodes so that the relationships between all objects are maintained. The existing structure of the 2DR-tree has many advantages; however, its limitations include greater tree height, overcoverage and overlap than necessary. The mqr-tree improves on it with a different node organization, set of validity rules and insertion strategy. A comparison against the R-tree shows significant improvements in overlap and overcoverage, with comparable height and space utilization. In addition, zero overlap is achieved when the mqr-tree is used to index point data.
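
For intuition about the metrics being compared, the sketch below computes overlap and a simplified notion of overcoverage for node MBRs (minimum bounding rectangles); the rectangle format and formulas are illustrative assumptions, not the paper's definitions:

def area(r):
    (x1, y1, x2, y2) = r
    return max(0, x2 - x1) * max(0, y2 - y1)

def intersection(r, s):
    return (max(r[0], s[0]), max(r[1], s[1]), min(r[2], s[2]), min(r[3], s[3]))

def overlap(mbrs):
    """Total pairwise intersection area between sibling node MBRs."""
    return sum(area(intersection(mbrs[i], mbrs[j]))
               for i in range(len(mbrs)) for j in range(i + 1, len(mbrs)))

def overcoverage(parent, children):
    """Dead space: parent MBR area not covered by entries below it,
    approximated here as parent area minus summed child areas."""
    return area(parent) - sum(area(c) for c in children)

siblings = [(0, 0, 4, 4), (3, 3, 6, 6)]
print(overlap(siblings))                     # 1: the 1x1 region shared by the MBRs
print(overcoverage((0, 0, 6, 6), siblings))  # 11 under this approximation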
Citations: 4
Text classification based on limited bibliographic metadata
Pub Date: 2009-12-18 DOI: 10.1109/ICDIM.2009.5356767
K. Denecke, T. Risse, Thomas Baehr
In this paper, we introduce a method for categorizing digital items according to their topic, relying only on the document's metadata, such as author name and title information. The proposed approach is based on a set of lexical resources constructed for our purposes (e.g., journal titles, conference names) and on a traditional machine-learning classifier that assigns one category to each document based on identified core features. The system is evaluated on a real-world data set, and the influence of different feature combinations and settings is studied. Although the available information is limited, the results show that the approach can efficiently classify data items representing documents.
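
A minimal scikit-learn sketch of assigning a topic from title metadata alone; the titles and labels are invented, and the paper's curated lexical resources (journal titles, conference names) are omitted:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: titles and their topic labels.
titles = [
    "Deep learning for protein folding",
    "Graph algorithms on sparse matrices",
    "Clinical trials of a new vaccine",
    "Parallel sorting on GPUs",
]
topics = ["biology", "computer science", "biology", "computer science"]

# Vectorize the titles and fit a simple classifier; the paper uses a
# traditional machine-learning classifier, not necessarily this one.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(titles, topics)
print(clf.predict(["Vaccine response prediction with neural networks"]))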
Citations: 6
Learning to rank firms with annual reports
Pub Date: 2009-12-18 DOI: 10.1109/ICDIM.2009.5356781
Xin Ying Qiu
The textual content of company annual reports has been shown to contain predictive indicators of future company performance. This paper addresses the general research question of evaluating the effectiveness of applying machine learning and text mining techniques to building predictive models from annual reports. More specifically, we focus on two questions: 1) can the advantages of a ranking algorithm help achieve better predictive performance with annual reports? and 2) can we integrate meta semantic features to help support our prediction? We compare models built with different ranking algorithms and document models, and we evaluate our models with a simulated investment portfolio. Our results show significantly positive average returns over 5 years, with a power-law trend as we increase the ranking threshold. Adding meta features to the document model has been shown to improve ranking performance. The SVR & meta-augmented model outperforms the others and offers potential for explaining the textual factors behind the prediction.
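
A rough sketch of the evaluation idea: score reports, rank firms, and form a portfolio from the top of the ranking. Plain SVR on TF-IDF features stands in for the paper's ranking algorithms, and all data are hypothetical:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVR

# Hypothetical report snippets and the firms' subsequent returns.
reports = ["strong growth in revenue", "litigation risk and weak demand",
           "record profit and expansion", "declining sales and layoffs"]
future_returns = np.array([0.12, -0.08, 0.15, -0.10])

# Learn to predict returns from report text, then rank firms by the score.
X = TfidfVectorizer().fit_transform(reports)
model = SVR().fit(X, future_returns)

scores = model.predict(X)
ranking = np.argsort(-scores)  # best-ranked firms first
top_k = ranking[:2]            # ranking threshold k = 2
print("portfolio return:", future_returns[top_k].mean())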
Citations: 0
Context data-driven approach for ubiquitous computing applications
Pub Date: 2009-11-01 DOI: 10.1109/ICDIM.2009.5356771
Yazid Benazzouz, P. Beaune, F. Ramparany, Laure Chotard
The context data-driven approach refers to the process of collecting and storing context data from a wide range of context sources, such as sensors and web services. This approach differs from existing context-aware applications, in which context models and applications are closely coupled and the question of how context is derived from its sources and interpreted is ignored. In this paper, we propose a context data model based on semantic web technologies for ubiquitous computing applications. This model facilitates the specification of context data so that it can easily be interpreted by applications and services. In addition, the model is supported by existing communication protocols. Our context data model is therefore applied to promote the mining of context data for service adaptation in ubiquitous computing systems.
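
A minimal rdflib sketch of publishing sensor context as RDF triples, in the spirit of a semantic-web context data model; the vocabulary below is invented for illustration and is not the paper's ontology:

from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/context#")
g = Graph()
g.bind("ex", EX)

# A temperature reading from a living-room sensor, with its source recorded
# so applications can interpret where the context came from.
g.add((EX.reading42, EX.observedBy, EX.livingRoomSensor))
g.add((EX.reading42, EX.hasValue, Literal(21.5)))
g.add((EX.reading42, EX.unit, Literal("Celsius")))

print(g.serialize(format="turtle"))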
Citations: 3