
arXiv - CS - Digital Libraries: Latest Papers

Science for whom? The influence of the regional academic circuit on gender inequalities in Latin America
Pub Date : 2024-07-26 DOI: arxiv-2407.18783
Carolina Pradier, Diego Kozlowski, Natsumi S. Shokida, Vincent Larivière
The Latin-American scientific community has achieved significant progress towards gender parity, with nearly equal representation of women and men scientists. Nevertheless, women continue to be underrepresented in scholarly communication. Throughout the 20th century, Latin America established its academic circuit, focusing on research topics of regional significance. However, the community has since reoriented its research towards the global academic circuit. Through an analysis of scientific publications, this article explores the relationship between gender inequalities in science and the integration of Latin-American researchers into the regional and global academic circuits between 1993 and 2022. We find that women are more likely to engage in the regional circuit, while men are more active within the global circuit. This trend is attributed to a thematic alignment between women's research interests and issues specific to Latin America. Furthermore, our results reveal that the mechanisms contributing to gender differences in symbolic capital accumulation vary between circuits. Women's work achieves equal or greater recognition compared to men's within the regional circuit, but generally garners less attention in the global circuit. Our findings suggest that policies aimed at strengthening the regional academic circuit would encourage scientists to address locally relevant topics while simultaneously fostering gender equality in science.
Citations: 0
A Process for Reviewing Design Science Research Papers to Enhance Content Knowledge & Research Opportunities
Pub Date : 2024-07-24 DOI: arxiv-2408.07230
Kweku-Muata Osei-Bryson
Most published Information Systems (IS) research is of the behavioral science research (BSR) category rather than the design science research (DSR) category. This is due in part to the BSR orientation of many IS doctoral programs, which often do not involve many technical courses. This includes IS doctoral programs that train Information and Communication Technologies for Development (ICT4D) researchers. Without such technical knowledge, many doctoral and postdoctoral researchers will not feel confident engaging in DSR research. Given the importance of designing artifacts that are appropriate for a given context, an important question is how ICT4D and other IS researchers can increase their IS technical content knowledge and intimacy with the DSR process. In this paper we present a process for reviewing DSR papers that has as its objectives: enhancing technical content knowledge, increasing knowledge and understanding of approaches to designing and evaluating IS/IT artifacts, and facilitating the identification of new DSR opportunities. This process has been applied for more than a decade at a USA research university.
Citations: 0
A Survey Forest Diagram: Gain a Divergent Insight View on a Specific Research Topic
Pub Date : 2024-07-24 DOI: arxiv-2407.17081
Jinghong Li, Wen Gu, Koichi Ota, Shinobu Hasegawa
With the exponential growth in the number of papers and the trend of AI research, the use of Generative AI for information retrieval and question-answering has become popular for conducting research surveys. However, novice researchers unfamiliar with a particular field may not significantly improve their efficiency in interacting with Generative AI because they have not developed divergent thinking in that field. This study aims to develop an in-depth Survey Forest Diagram that guides novice researchers in divergent thinking about the research topic by indicating the citation clues among multiple papers, to help expand the survey perspective for novice researchers.
Citations: 0
Estimating global article processing charges paid to six publishers for open access between 2019 and 2023
Pub Date : 2024-07-23 DOI: arxiv-2407.16551
Stefanie Haustein, Eric Schares, Juan Pablo Alperin, Madelaine Hare, Leigh-Ann Butler, Nina Schönfelder
This study presents estimates of the global expenditure on article processing charges (APCs) paid to six publishers for open access between 2019 and 2023. APCs are fees charged for publishing in some fully open access journals (gold) and in subscription journals to make individual articles open access (hybrid). There is currently no way to systematically track institutional, national or global expenses for open access publishing due to a lack of transparency in APC prices, what articles they are paid for, or who pays them. We therefore curated and used an open dataset of annual APC list prices from Elsevier, Frontiers, MDPI, PLOS, Springer Nature, and Wiley in combination with the number of open access articles from these publishers indexed by OpenAlex to estimate that, globally, a total of $8.349 billion ($8.968 billion in 2023 US dollars) was spent on APCs between 2019 and 2023. We estimate that in 2023 MDPI ($681.6 million), Elsevier ($582.8 million) and Springer Nature ($546.6 million) generated the most revenue with APCs. After adjusting for inflation, we also show that annual spending almost tripled from $910.3 million in 2019 to $2.538 billion in 2023, that hybrid fees exceed gold fees, and that the median APCs paid are higher than the median listed fees for both gold and hybrid. Our approach addresses major limitations in previous efforts to estimate APCs paid and offers much-needed insight into an otherwise opaque aspect of the business of scholarly publishing. We call upon publishers to be more transparent about OA fees.
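The estimation idea in the abstract above (annual list price multiplied by the number of open access articles, summed per year) can be illustrated with a minimal Python sketch. This is not the authors' code or dataset: the `JournalYear` record, the publisher names, and every price and article count below are hypothetical placeholders. Only the two inflation-adjusted totals passed to `growth_ratio` ($910.3 million and $2.538 billion) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalYear:
    """One hypothetical (publisher, year) record; all values are made up."""
    publisher: str
    year: int
    list_price_usd: float  # annual APC list price (hypothetical)
    oa_articles: int       # open access articles that year (hypothetical)


def estimate_apc_spend(rows):
    """Sum list price x article count per year."""
    totals = {}
    for row in rows:
        spend = row.list_price_usd * row.oa_articles
        totals[row.year] = totals.get(row.year, 0.0) + spend
    return totals


def growth_ratio(start, end):
    """How many times larger `end` is than `start`."""
    return end / start


rows = [
    JournalYear("Publisher A", 2019, 2000.0, 100),
    JournalYear("Publisher B", 2019, 1500.0, 200),
    JournalYear("Publisher A", 2023, 2500.0, 300),
]
totals = estimate_apc_spend(rows)

# Figures quoted in the abstract, in millions of 2023 US dollars.
ratio = growth_ratio(910.3, 2538.0)
print(totals, round(ratio, 2))
```

With the abstract's figures the ratio comes out a little under 2.8, which matches the "almost tripled" wording. A list-price estimate of this kind ignores discounts and waivers, so it is best read as an approximation.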
Citations: 0
ILiAD: An Interactive Corpus for Linguistic Annotated Data from Twitter Posts
Pub Date : 2024-07-22 DOI: arxiv-2407.15374
Simon Gonzalez
Social Media platforms have offered invaluable opportunities for linguistic research. The availability of up-to-date data, coming from any part of the world, and coming from natural contexts, has allowed researchers to study language in real time. One of the fields that has made great use of social media platforms is Corpus Linguistics. There is currently a wide range of projects which have been able to successfully create corpora from social media. In this paper, we present the development and deployment of a linguistic corpus from Twitter posts in English, coming from 26 news agencies and 27 individuals. The main goal was to create a fully annotated English corpus for linguistic analysis. We include information on morphology and syntax, as well as NLP features such as tokenization, lemmas, and n-grams. The information is presented through a range of powerful visualisations for users to explore linguistic patterns in the corpus. With this tool, we aim to contribute to the area of language technologies applied to linguistic research.
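To make the tokenization and n-gram features mentioned above concrete, here is a minimal Python sketch using only the standard library. It is not ILiAD's pipeline: the regex tokenizer, the `tokenize` and `ngrams` helpers, and the sample post are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous n-token window as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical post text, not taken from the corpus.
post = "Breaking news: the news cycle never stops, the news never sleeps."
tokens = tokenize(post)
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(3))
```

A production corpus would normally rely on a dedicated NLP library for tokenization and lemmatization; the regex here only keeps the example self-contained.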
Citations: 0
A Network Analysis Approach to Conlang Research Literature
Pub Date : 2024-07-22 DOI: arxiv-2407.15370
Simon Gonzalez
The field of conlang has evidenced an important growth in the last decades. This has been the product of a wide interest in the use and study of conlangs for artistic purposes. However, one important question is what is happening with conlang in the academic world. This paper aims to have an overall understanding of the literature on conlang research. With this we aim to give a realistic picture of the field at present. We have implemented a computational linguistic approach, combining bibliometrics and network analysis to examine all publications available in the Scopus database. Analysing over 2300 academic publications from 1927 to 2022, we have found that Esperanto is by far the most documented conlang. Three main authors have contributed to this: Garvía R., Fiedler S., and Blanke D. The 1970s and 1980s have been the decades where the foundations of current research have been built. In terms of methodologies, language learning and experimental linguistics are the ones contributing most to the preferred approaches of study in the field. We present the results and discuss our limitations and future work.
Citations: 0
Improving Citation Text Generation: Overcoming Limitations in Length Control
Pub Date : 2024-07-20 DOI: arxiv-2407.14997
Biswadip Mandal, Xiangci Li, Jessica Ouyang
A key challenge in citation text generation is that the length of generated text often differs from the length of the target, lowering the quality of the generation. While prior works have investigated length-controlled generation, their effectiveness depends on knowing the appropriate generation length. In this work, we present an in-depth study of the limitations of predicting scientific citation text length and explore the use of heuristic estimates of desired length.
Citations: 0
LLAssist: Simple Tools for Automating Literature Review Using Large Language Models
Pub Date : 2024-07-19 DOI: arxiv-2407.13993
Christoforus Yoga Haryanto
This paper introduces LLAssist, an open-source tool designed to streamline literature reviews in academic research. In an era of exponential growth in scientific publications, researchers face mounting challenges in efficiently processing vast volumes of literature. LLAssist addresses this issue by leveraging Large Language Models (LLMs) and Natural Language Processing (NLP) techniques to automate key aspects of the review process. Specifically, it extracts important information from research articles and evaluates their relevance to user-defined research questions. The goal of LLAssist is to significantly reduce the time and effort required for comprehensive literature reviews, allowing researchers to focus more on analyzing and synthesizing information rather than on initial screening tasks. By automating parts of the literature review workflow, LLAssist aims to help researchers manage the growing volume of academic publications more efficiently.
Citations: 0
Productivity profile of CNPq scholarship researchers in computer science from 2017 to 2021
Pub Date : 2024-07-19 DOI: arxiv-2407.14690
Marcelo Keese Albertini, André Ricardo Backes
Productivity in Research (PQ) is a scholarship granted by CNPq (Brazilian National Council for Scientific and Technological Development). This scholarship aims to recognize a few selected faculty researchers for their scientific production, outstanding technology and innovation in their respective areas of knowledge. In the present study, we evaluated the scientific production of the 185 researchers in the Computer Science area granted a PQ scholarship in the last PQ selection notice. To evaluate the productivity of each professor, we considered papers published in scientific journals and conferences (complete works) in a five-year period (from 2017 to 2021). We analyzed the productivity in terms of both quantity and quality. We also evaluated its distribution over the country, universities and research facilities, as well as the co-authorship network produced.
Citations: 0
Decoding AI and Human Authorship: Nuances Revealed Through NLP and Statistical Analysis
Pub Date : 2024-07-15 DOI: arxiv-2408.00769
Mayowa Akinwande, Oluwaseyi Adeliyi, Toyyibat Yussuph
This research explores the nuanced differences in texts produced by AI and those written by humans, aiming to elucidate how language is expressed differently by AI and humans. Through comprehensive statistical data analysis, the study investigates various linguistic traits, patterns of creativity, and potential biases inherent in human-written and AI-generated texts. The significance of this research lies in its contribution to understanding AI's creative capabilities and its impact on literature, communication, and societal frameworks. By examining a meticulously curated dataset comprising 500K essays spanning diverse topics and genres, generated by LLMs or written by humans, the study uncovers the deeper layers of linguistic expression and provides insights into the cognitive processes underlying both AI and human-driven textual compositions. The analysis revealed that human-authored essays tend to have a higher total word count on average than AI-generated essays but have a shorter average word length compared to AI-generated essays, and while both groups exhibit high levels of fluency, the vocabulary diversity of human-authored content is higher than that of AI-generated content. However, AI-generated essays show a slightly higher level of novelty, suggesting the potential for generating more original content through AI systems. The paper addresses challenges in assessing the language generation capabilities of AI models and emphasizes the importance of datasets that reflect the complexities of human-AI collaborative writing. Through systematic preprocessing and rigorous statistical analysis, this study offers valuable insights into the evolving landscape of AI-generated content and informs future developments in natural language processing (NLP).
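Three of the measures compared above (total word count, average word length, and vocabulary diversity) can be sketched in a few lines of Python. The abstract does not say how the authors defined these measures, so this sketch assumes a simple regex word split and uses the type-token ratio (unique words divided by total words) as a stand-in for vocabulary diversity. The `text_stats` helper and the sample sentence are hypothetical.

```python
import re


def text_stats(text):
    """Return word count, mean word length, and type-token ratio for a text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        # Avoid division by zero on empty input.
        return {"word_count": 0, "avg_word_length": 0.0, "type_token_ratio": 0.0}
    return {
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Assumed proxy for vocabulary diversity: unique words / total words.
        "type_token_ratio": len(set(words)) / len(words),
    }


sample = "The cat sat on the mat"
stats = text_stats(sample)
print(stats)
```

Type-token ratio falls as texts get longer, so comparing essays of very different lengths would likely need a length-normalized variant.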