Bharathi Raja Chakravarthi, R. Priyadharshini, V. Muralidaran, Shardul Suryawanshi, Navya Jose, E. Sherly, John P. McCrae
Sentiment analysis of Dravidian languages has received attention in recent years. However, most social media text is code-mixed, and there is no research available on sentiment analysis of code-mixed Dravidian languages. The Dravidian-CodeMix-FIRE 2020 track on Sentiment Analysis for Dravidian Languages in Code-Mixed Text aimed to create a platform for researchers to come together and investigate this problem. The track covered two languages: (i) Tamil and (ii) Malayalam. Participants were given a dataset of YouTube comments, and the goal of the shared task was to identify the sentiment of each comment by classifying it as positive, negative, neutral or mixed feelings, or by recognising that the comment is not in the intended language. System performance was evaluated using the weighted F1 score.
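Weighted F1, as commonly defined, averages the per-class F1 scores weighted by each class's support (its number of gold instances), so frequent classes count more than rare ones. The sketch below is a minimal, dependency-free illustration under that standard definition; the label strings are illustrative placeholders, not necessarily the exact labels in the released data, and the organisers' own scorer (or an equivalent such as scikit-learn's `f1_score(..., average="weighted")`) should be treated as authoritative.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Support-weighted mean of per-class F1 over the classes in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Each class contributes in proportion to how often it occurs in the gold labels.
        score += f1 * n / total
    return score


# Illustrative labels only (hypothetical, not the official label set).
gold = ["Positive", "Positive", "Negative", "Mixed_feelings"]
pred = ["Positive", "Negative", "Negative", "Positive"]
print(weighted_f1(gold, pred))
```

Here Positive has F1 0.5 (weight 2/4), Negative has F1 2/3 (weight 1/4) and Mixed_feelings has F1 0 (weight 1/4), giving 5/12 ≈ 0.417.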
{"title":"Overview of the track on Sentiment Analysis for Dravidian Languages in Code-Mixed Text","authors":"Bharathi Raja Chakravarthi, R. Priyadharshini, V. Muralidaran, Shardul Suryawanshi, Navya Jose, E. Sherly, John P. McCrae","doi":"10.1145/3441501.3441515","DOIUrl":"https://doi.org/10.1145/3441501.3441515","url":null,"abstract":"Sentiment analysis of Dravidian languages has received attention in recent years. However, most social media text is code-mixed and there is no research available on sentiment analysis of code-mixed Dravidian languages. The Dravidian-CodeMix-FIRE 2020, a track on Sentiment Analysis for Dravidian Languages in Code-Mixed Text, focused on creating a platform for researchers to come together and investigate the problem. There were two languages for this track: (i) Tamil, and (ii) Malayalam. The participants were given a dataset of YouTube comments and the goal of the shared task submissions was to recognise the sentiment of each comment by classifying them into positive, negative, neutral, mixed-feeling classes or by recognising whether the comment is not in the intended language. The performance of the systems was evaluated by weighted-F1 score.","PeriodicalId":415985,"journal":{"name":"Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":" 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133120579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Debasis Ganguly, Dipasree Pal, Manisha Verma, Procheta Sen
This paper presents an overview of the track ‘Retrieval from Conversational Dialogues’ (RCD), organized as part of the Forum for Information Retrieval Evaluation (FIRE) 2020. The motivation of the track is to develop a dataset for a controlled and reproducible laboratory-based experimental setup for investigating the effectiveness of conversational assistance systems. Specifically, the form of conversational assistance this track addresses is the contextualization of concepts within content written (e.g. in a chat system) or uttered (e.g. in an audio or video call) by a user, with which the other participants in the communication are not well versed. To study the problem in a reproducible laboratory setting, we took a collection of four movie scripts and manually annotated spans of text that may require contextualization. The RCD track involved two tasks: a) Task-1, in which participants were required to predict, from a given sequence of dialogue-based interactions in a script, the annotated span of text likely to benefit from contextualization; and b) Task-2, which involved retrieving a ranked list of documents corresponding to the concepts requiring contextualization. To evaluate Task-1, we used i) a character n-gram based variant of the BLEU score and ii) a bag-of-words based Jaccard coefficient to measure the overlap between the manually annotated ground truth and the automatically extracted text spans at the character and word levels, respectively. To evaluate the documents retrieved for Task-2, we employed two standard precision-oriented information retrieval (IR) metrics, namely precision at the top 5 ranks (P@5) and mean reciprocal rank (MRR), along with mean average precision (MAP), which reflects both precision and recall.
We received a total of 5 submissions from a single participating team across both tasks. A general trend in the submitted runs is that unsupervised statistical approaches to term extraction and summarization from movie scripts turned out to be more effective for both tasks (i.e. query identification and retrieval) than supervised approaches, such as those based on pre-trained transformers (BERT).
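The Task-1 and Task-2 measures can be sketched as follows. The abstract does not specify which character n-gram BLEU variant was used, so `char_bleu` assumes standard BLEU (clipped n-gram precisions for n = 1..4, geometric mean, brevity penalty) applied to characters instead of words; `jaccard` treats each span as a set of lowercased whitespace tokens. The IR functions compute per-query P@5, reciprocal rank and average precision; MRR and MAP are their means over all queries. All function names are hypothetical, and this is not the track's official evaluation script.

```python
import math
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Bag-of-words Jaccard coefficient between two text spans (token sets)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def char_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """BLEU over character n-grams (assumed variant; see lead-in)."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(candidate[i:i + n] for i in range(len(candidate) - n + 1))
        ref = Counter(reference[i:i + n] for i in range(len(reference) - n + 1))
        total = sum(cand.values())
        # Clipped counts: a candidate n-gram is credited at most as often as it occurs in the reference.
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_sum += math.log(clipped / total) / max_n
    # Brevity penalty: penalise candidate spans shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)


def precision_at_k(ranked: list, relevant: set, k: int = 5) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def reciprocal_rank(ranked: list, relevant: set) -> float:
    """1 / rank of the first relevant document (0 if none is retrieved)."""
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0


def average_precision(ranked: list, relevant: set) -> float:
    """Mean of precision@i over the ranks i at which relevant documents appear."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0
```

For example, a predicted span "the Deep Blue chess computer" against a gold span "Deep Blue" shares 2 of 5 distinct tokens, so its Jaccard coefficient is 0.4. Averaging `reciprocal_rank` and `average_precision` over all queries gives MRR and MAP respectively.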
{"title":"Overview of RCD-2020, the FIRE-2020 track on Retrieval from Conversational Dialogues","authors":"Debasis Ganguly, Dipasree Pal, Manisha Verma, Procheta Sen","doi":"10.1145/3441501.3441518","DOIUrl":"https://doi.org/10.1145/3441501.3441518","url":null,"abstract":"This paper describes an overview of the track - ’Retrieval from Conversational Dialogues’ (RCD) organized as a part of Forum of Information Retrieval and Evaluation (FIRE), 2020. The motivation of the track is to develop a dataset towards a controlled and reproducible laboratory based experimental setup for investigating the effectiveness if conversational assistance systems. Specifically, the manner of conversational assistance which this track addresses is contextualization of certain concepts within the content either written (e.g. a chat system) or uttered (e.g. in an audio or video communication) by a user about which the other users participating in the communication are not well versed. To study the problem under a laboratory-based reproducible setting, we took a collection of four movie scripts and manually annotated spans of text that may require contextualization. The two tasks involved in RCD track are: a) Task-1:, where participants were required to estimate the annotated span of text likely to be benefited by contextualization from a given sequence of dialogue based interactions from the script; and b) Task-2:, which involved retrieving a ranked list of documents corresponding to the concepts requiring contextualization. To evaluate the effectiveness of Task-1, we used i) a character n-gram based variant of the BLEU score, and ii) bag-of-words based Jaccard coefficient to measure the overlap between the manually annotated ground-truth and the automatically extracted text spans at two different granularity levels of character and word matches, respectively. To evaluate the effectiveness of the retrieved documents for Task-2, we employed two standard precision-oriented information retrieval (IR) metrics, namely precision at top-5 ranks (P@5) and mean reciprocal rank (MRR), along with a both precision and recall oriented metric, namely the mean average precision (MAP). We received a total of 5 submissions from a single participating team for both the tasks. A general trend from the submitted runs is that statistical-based unsupervised approaches of term extraction and summarization from movie scripts turned out to be more effective for both the tasks (i.e. query identification and retrieval) than supervised approaches, such as pre-trained transformer (BERT) based ones.","PeriodicalId":415985,"journal":{"name":"Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121225133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Naina Yadav, Anil Kumar Singh
A recommendation system is a set of programs that use different methodologies to select items relevant to the user. In recent years, deep neural networks have been used heavily to improve recommendation quality across domains. We describe a music recommendation model that uses BERT (Bidirectional Encoder Representations from Transformers). Other deep neural networks previously used for music recommendation capture the unidirectional, sequential nature of a user’s data. Unlike these sequential recommendation techniques, BERT is trained bidirectionally on a user’s sequence to produce better recommendations. BERT uses the encoder part of the Transformer model, whose attention mechanism learns contextual relations among a user’s past interactions. The proposed model uses a user’s previous interactions to compute a bidirectional encoding that considers both the left and the right contexts. We evaluated our model against a baseline deep sequential model on two different datasets, and the comparative results show that it outperforms other sequential models.
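The contrast between unidirectional sequential models and BERT's bidirectional encoding comes down to which positions each item may attend to. A minimal sketch, assuming a BERT4Rec-style cloze objective (the abstract does not spell out the training objective, so this is an assumption): a causal mask lets position i attend only to positions up to i, a bidirectional mask lets every position attend to the whole listening history, and randomly hidden items must be recovered from both left and right context. `MASK`, `attention_mask` and `cloze_mask` are hypothetical names; a real model would feed these into a Transformer encoder.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # hypothetical mask token standing in for a hidden item


def attention_mask(seq_len: int, bidirectional: bool) -> List[List[int]]:
    """Entry [i][j] is 1 if position i may attend to position j."""
    # Causal (left-to-right) models only see positions j <= i; BERT sees all positions.
    return [[1 if bidirectional or j <= i else 0 for j in range(seq_len)]
            for i in range(seq_len)]


def cloze_mask(items: List[str], mask_prob: float = 0.2,
               rng: Optional[random.Random] = None) -> Tuple[List[str], List[Optional[str]]]:
    """Hide random items; the model is trained to predict them from both sides."""
    if rng is None:
        rng = random.Random(0)
    inputs: List[str] = []
    targets: List[Optional[str]] = []
    for item in items:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(item)   # loss is computed only at masked positions
        else:
            inputs.append(item)
            targets.append(None)   # unmasked positions are not scored
    return inputs, targets


history = ["song_a", "song_b", "song_c", "song_d"]
print(attention_mask(3, bidirectional=False))
print(cloze_mask(history, mask_prob=0.5))
```

The causal mask for three positions is lower-triangular, `[[1, 0, 0], [1, 1, 0], [1, 1, 1]]`, while the bidirectional mask is all ones; that extra right-hand context is what the abstract credits for the improvement over sequential baselines.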
{"title":"Bi-directional Encoder Representation of Transformer model for Sequential Music Recommender System","authors":"Naina Yadav, Anil Kumar Singh","doi":"10.1145/3441501.3441503","DOIUrl":"https://doi.org/10.1145/3441501.3441503","url":null,"abstract":"A recommendation system is a set of programs that utilize different methodologies for relevant item selection for the user. In recent years deep neural networks have been used heavily for improving recommendation quality in every domain. We describe a model for music recommendation system that uses the BERT (Bidirectional Encoder Representations from Transformers) model. In the past, other deep neural networks have been used for music recommendation, which capture the the unidirectional sequential nature of a user’s data. Unlike other sequential techniques of recommendation, BERT uses bidirectional training of a user’s sequence for better recommendation. BERT uses the encoder part of the Transformer model, which uses an attention mechanism to learn contextual relations between a user’s past interactions. The proposed model relies on a user’s previous interaction to determine the bidirectional encoding for the model, which considers both the left and the right contexts. We evaluated our model with a baseline deep sequential model using two different datasets, and comparative results show that the model outperforms other sequential models.","PeriodicalId":415985,"journal":{"name":"Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115075552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Datta, Debasis Ganguly, Dwaipayan Roy, Derek Greene, Charles Jochim, Francesca Bonin
This paper presents an overview of the findings of the track ‘Causality-driven Ad hoc Information Retrieval’ (CAIR) at the Forum for Information Retrieval Evaluation (FIRE) 2020. The purpose of the track was to investigate how effectively search systems can retrieve documents that are causally related to a specified query event. Unlike in standard information retrieval (IR), the relevance criterion in this search scenario is stricter: the documents retrieved at the top ranks should provide information on the potential causes of a given query event, e.g. documents on the political situations that might have led to ‘Brexit’. We released a dataset of 25 queries split into training and test sets. We received submissions from two participating groups. Two main observations emerge from the best-performing runs of these groups. First, the results of the first group show a general trend for longer queries to yield more causally relevant documents at the top ranks. Second, sequence-based text representations for semantically matching documents with queries did not yield effective retrieval, leaving scope for developing supervised or semi-supervised methods for causality-based retrieval.
{"title":"Overview of the Causality-driven Adhoc Information Retrieval (CAIR) task at FIRE-2020","authors":"S. Datta, Debasis Ganguly, Dwaipayan Roy, Derek Greene, Charles Jochim, Francesca Bonin","doi":"10.1145/3441501.3441513","DOIUrl":"https://doi.org/10.1145/3441501.3441513","url":null,"abstract":"This paper describes an overview of the findings of the track named ‘Causality-driven Ad hoc Information Retrieval’ (abbv. CAIR) at the Forum for Information Retrieval Evaluation (FIRE) 2020. The purpose of the track was to investigate how effectively can search systems retrieve documents that are causally related to a specified query event. Different from standard information retrieval (IR), the criteria of relevance in this search scenario is stricter in the sense that the retrieved documents at the top ranks should provide information on the potentially relevant causes that might have caused a given query event, e.g. retrieve documents on political situations that might have led to ‘Brexit’. We released a dataset comprised of a set of 25 queries split into train and test sets. We received submissions from two participating groups. The two main observations from the best performing runs from the two participating groups are that longer queries showed a general trend to yield more causally relevant documents towards top ranks as seen from the results obtained from the first participating group, whereas it turned out that sequence-based text representation for semantically matching the documents with queries did not yield effective retrieval results, thus leaving the scope to develop supervised or semi-supervised methods to address causality-based retrieval.","PeriodicalId":415985,"journal":{"name":"Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130097659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation","authors":"","doi":"10.1145/3441501","DOIUrl":"https://doi.org/10.1145/3441501","url":null,"abstract":"","PeriodicalId":415985,"journal":{"name":"Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122920417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bhargav Dave, Surupendu Gangopadhyay, Prasenjit Majumder, P. Bhattacharya, S. Sarkar, S. L. Devi
The goal of the FIRE 2020 EDNIL track was to create a framework for detecting events in news articles in English, Hindi, Bengali, Marathi and Tamil. The track consisted of two tasks: (i) identifying a piece of text in a news article that contains an event (Event Identification), and (ii) creating an event frame from the news article (Event Frame Extraction). The event classes targeted in the Event Identification task were Man-made Disaster and Natural Disaster. In the Event Frame Extraction task, the event frame consists of Event type, Casualties, Time, Place and Reason.
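The event frame produced by the Event Frame Extraction task can be represented as a small record type with one slot per argument listed above. The sketch assumes each slot holds the text span extracted from the article, or `None` when the article does not mention it; the class name, field names and the `missing_slots` helper are hypothetical and not the track's official submission format.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class EventFrame:
    """One extracted event: its type plus the argument slots named in the task."""
    event_type: str
    casualties: Optional[str] = None  # e.g. the span stating the number of people affected
    time: Optional[str] = None
    place: Optional[str] = None
    reason: Optional[str] = None

    def missing_slots(self) -> List[str]:
        """Names of argument slots the extractor could not fill."""
        return [name for name in ("casualties", "time", "place", "reason")
                if getattr(self, name) is None]


# Hypothetical example values for illustration only.
frame = EventFrame(event_type="Flood", casualties="12 people",
                   time="Tuesday", place="Assam")
print(asdict(frame))
print(frame.missing_slots())
```

Keeping unfilled slots as `None` rather than empty strings makes it easy to tell "not mentioned in the article" apart from an extracted but empty span when scoring slot-level recall.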
{"title":"FIRE 2020 EDNIL Track: Event Detection from News in Indian Languages","authors":"Bhargav Dave, Surupendu Gangopadhyay, Prasenjit Majumder, P. Bhattacharya, S. Sarkar, S. L. Devi","doi":"10.1145/3441501.3441516","DOIUrl":"https://doi.org/10.1145/3441501.3441516","url":null,"abstract":"The goal of FIRE 2020 EDNIL track was to create a framework which could be used to detect events from news articles in English, Hindi, Bengali, Marathi and Tamil. The track consisted of two tasks: (i) Identifying a piece of text from news articles that contains an event (Event Identification). (ii) Creating an event frame from the news article (Event Frame Extraction). The events that were identified in Event Identification task were Man-made Disaster and Natural Disaster. In Event Frame Extraction task the event frame consists of Event type, Casualties, Time, Place, Reason.","PeriodicalId":415985,"journal":{"name":"Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124833223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}