Huixin Zhan, Kun Zhang, Chenyi Hu, Victor S. Sheng
For decades, research in natural language processing (NLP) has focused on summarization. Sequence-to-sequence models for abstractive summarization have been studied extensively, yet generated summaries commonly suffer from fabricated content and are often found to be near-extractive. We argue that, to address these issues, summarizers need to capture the co-references that form multiple types of relations over input sentences, e.g., 1-to-N, N-to-1, and N-to-N relations, since the structured knowledge of a text usually appears in these relations. Structured graph representations, which allow the decoder to attend differently to the input sentences mentioning the same entity at different generation states, yield more informative summaries. In this paper, we propose hierarchical graph attention networks (HGATs) for abstractive summarization with a topic-sensitive PageRank augmented graph. Specifically, we use dual decoders, a sequential sentence decoder and a graph-structured decoder, built hierarchically so that they complement each other in maintaining the global context and local characteristics of entities. We further design a greedy heuristic that extracts salient user comments while avoiding redundancy, helping the model better capture entity interactions. Our experimental results show that our models produce significantly higher ROUGE scores than variants without graph-based attention on both the SSECIF and CNN/Daily Mail (CNN/DM) datasets.
{"title":"HGATs: hierarchical graph attention networks for multiple comments integration","authors":"Huixin Zhan, Kun Zhang, Chenyi Hu, Victor S. Sheng","doi":"10.1145/3487351.3488322","DOIUrl":"https://doi.org/10.1145/3487351.3488322","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining"},"publicationDate":"2021-11-08"}
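The abstract does not spell out the greedy comment-selection heuristic. The sketch below is a stand-in in the spirit of maximal marginal relevance: at each step it picks the comment whose salience minus a redundancy penalty is highest. The salience definition (average document frequency of a comment's words), the Jaccard redundancy measure, and the `redundancy_weight=0.7` default are assumptions for illustration, not the authors' choices.

```python
from collections import Counter


def tokens(text: str) -> set[str]:
    """Lower-cased word set; a deliberately crude whitespace tokenizer."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_select(comments: list[str], k: int, redundancy_weight: float = 0.7) -> list[int]:
    """Return indices of up to k comments, trading salience against redundancy.

    Salience (assumed): average document frequency of a comment's words,
    normalised by the number of comments, so comments echoing widely shared
    content score higher. Redundancy: max Jaccard overlap with any comment
    already selected.
    """
    token_sets = [tokens(c) for c in comments]
    doc_freq = Counter(w for s in token_sets for w in s)
    n = len(comments)
    salience = [
        sum(doc_freq[w] for w in s) / (len(s) * n) if s else 0.0
        for s in token_sets
    ]

    selected: list[int] = []
    remaining = set(range(n))
    while remaining and len(selected) < k:
        def gain(i: int) -> float:
            redundancy = max(
                (jaccard(token_sets[i], token_sets[j]) for j in selected),
                default=0.0,
            )
            return salience[i] - redundancy_weight * redundancy

        # sorted() makes ties resolve to the lowest index deterministically
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


comments = [
    "battery life is great",
    "battery life is great",
    "screen cracked quickly",
]
print(greedy_select(comments, 2))  # [0, 2]
```

With two identical comments and one distinct comment, the second duplicate has higher salience than the distinct comment but is skipped, because its redundancy penalty against the first pick outweighs that advantage.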
Techniques to hide a community from community detection algorithms are emerging as a new way to protect the privacy of users. Existing techniques either adapt optimization criteria derived from community detection (e.g., minimizing instead of maximizing modularity) or define new ones (e.g., community safeness) to identify a set of updates (e.g., edge additions/deletions) that prevent community detection algorithms from recovering the original structure of a target community C. However, none of the existing approaches account for the fact that a network's edges can be weighted to capture node similarity or relation strength. The goal of this paper is to present SECRETORUM, a novel approach to community deception in weighted networks.
{"title":"Community deception in weighted networks","authors":"Valeria Fionda, G. Pirrò","doi":"10.1145/3487351.3488337","DOIUrl":"https://doi.org/10.1145/3487351.3488337","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining"},"publicationDate":"2021-11-08"}
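To see why edge weights matter for modularity-based deception, here is a minimal sketch of weighted modularity, Q = sum over communities c of (W_c / m) - (D_c / (2m))^2, where m is the total edge weight, W_c the total weight of edges inside community c, and D_c the summed weighted degree of its nodes. This is the standard definition, not SECRETORUM's own objective, which the abstract does not give; the toy graph and labels are hypothetical.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Modularity of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight) tuples, each undirected edge listed once,
    no self-loops assumed. community: dict mapping node -> community label.
    """
    m = 0.0                          # total edge weight
    intra = defaultdict(float)       # W_c: weight of edges inside community c
    degree_sum = defaultdict(float)  # D_c: summed weighted degree of c's nodes
    for u, v, w in edges:
        m += w
        degree_sum[community[u]] += w
        degree_sum[community[v]] += w
        if community[u] == community[v]:
            intra[community[u]] += w
    if m == 0:
        return 0.0
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a bridge edge (2, 3); each triangle is a community.
triangles = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

q_unit = weighted_modularity(triangles + [(2, 3, 1.0)], labels)         # 5/14
q_heavy = weighted_modularity(triangles + [(2, 3, 5.0)], labels)        # 1/22
q_deleted = weighted_modularity(triangles[1:] + [(2, 3, 1.0)], labels)  # 23/72
print(q_unit, q_heavy, q_deleted)
```

The same topology gives very different scores once weights change: raising the bridge weight from 1 to 5 drops Q from 5/14 to 1/22, while deleting the intra-community edge (0, 1) lowers it to 23/72. An unweighted deception method would treat the first two graphs identically.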
Johan Fernquist, Björn Pelzer, Lukas Lundmark, Lisa Kaati, F. Johansson
In this paper we introduce a new type of handcrafted textual feature, called stylometric traits, used to create a writeprint of an author's writing style. The traits fall into four categories: (i) word variations, (ii) abbreviations, (iii) internet jargon, and (iv) numbers. We develop a similarity ranking method that ranks users' social media accounts by how similar their writeprints are, and we experiment with both vector distance metrics and machine learning-based class probabilities to measure similarity. The best performance is achieved by combining stylometric traits with the Jensen-Shannon distance metric, which outperforms the traditional stylometric features used in previous research.
{"title":"Similarity ranking using handcrafted stylometric traits in a swedish context","authors":"Johan Fernquist, Björn Pelzer, Lukas Lundmark, Lisa Kaati, F. Johansson","doi":"10.1145/3487351.3492719","DOIUrl":"https://doi.org/10.1145/3487351.3492719","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining"},"publicationDate":"2021-11-08"}
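A minimal sketch of the ranking step, assuming each account's writeprint is reduced to counts per trait category (the category keys below are hypothetical simplifications; the paper's individual traits are finer-grained). It uses the Jensen-Shannon distance, the square root of the base-2 Jensen-Shannon divergence, so values lie in [0, 1]; SciPy's `scipy.spatial.distance.jensenshannon` with `base=2` computes the same quantity.

```python
import math


def normalize(counts: dict[str, float], keys: list[str]) -> list[float]:
    """Turn raw trait counts into a probability vector over `keys`."""
    total = sum(counts.get(k, 0.0) for k in keys)
    if total == 0:
        return [1.0 / len(keys)] * len(keys)  # uniform fallback for empty profiles
    return [counts.get(k, 0.0) / total for k in keys]


def kl_bits(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon distance (sqrt of base-2 JS divergence), in [0, 1]."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = 0.5 * kl_bits(p, mid) + 0.5 * kl_bits(q, mid)
    return math.sqrt(max(divergence, 0.0))  # clamp tiny negative rounding error


def rank_accounts(query: dict[str, float], candidates: dict[str, dict[str, float]]) -> list[str]:
    """Order candidate accounts from most to least similar writeprint."""
    keys = sorted(set(query).union(*candidates.values()))
    p = normalize(query, keys)
    scored = sorted((js_distance(p, normalize(c, keys)), name) for name, c in candidates.items())
    return [name for _, name in scored]


# Hypothetical category-level writeprints (counts per trait category).
query = {"abbreviations": 5, "internet_jargon": 3, "numbers": 2}
candidates = {
    "acct_a": {"abbreviations": 10, "internet_jargon": 6, "numbers": 4},
    "acct_b": {"word_variations": 7, "abbreviations": 1},
}
print(rank_accounts(query, candidates))  # ['acct_a', 'acct_b']
```

Because the profiles are normalised first, `acct_a` matches the query exactly despite having twice as many observations, which is the behaviour one wants when accounts differ in posting volume.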
Modeling and analysis of affective and inner states is gaining prominence in research. Spanning the entire spectrum, from recipes for long-term happiness to factors leading to depression, we frame a model of people's happiness comprising three states: G (lasting happiness), P (flickering), and I (frustration). The definitions of these states are based on the psychology literature. We used an XGBoost classifier to categorize 54,066 Twitter users based on their tweets and analysed the results, including what kinds of friends each category of users has (for 120 users obtained after thresholding 213 manually labelled users). Analysing the XGBoost classification, we re-confirmed the characteristics stated in the definitions of the three states (G, P, I) and found further traits beyond those definitions. We observed that G users are more people-oriented, that G and P users are more work-oriented than I users, and that G users are older than P or I users. I users were found to be more religious than P users, owing to shelter-seeking traits. Qualitative analysis shows that, as expected, the G group exhibits long-term vision, selflessness and positive qualities, a religious mindset, and a positive demeanour, while the I group exhibits negative feelings and activities and sensual words. The P group shows traces of both G and I, and contains dominating, strong words and extreme negative reactions. We found 21,115 users with both Twitter and Goodreads handles and used them to study what kinds of books users in each category read. The reading patterns of G users comprise academic/technical, religion, inspirational/self-help, and romance; those of P users comprise fantasy/fiction, sports, LGBT/BDSM/Erotica, and horror/violence/betrayal. I users tend to read fantasy/fiction, death, and, indiscriminately, any arbitrary topic. G and P users make friends within their own category, whereas I users tend to befriend P users but not each other.
{"title":"Which acts model happiness?: an exploratory analysis on Twitter and Goodreads","authors":"Mayank Bhasin, Harshit, Pawan Goyal","doi":"10.1145/3487351.3489475","DOIUrl":"https://doi.org/10.1145/3487351.3489475","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining"},"publicationDate":"2021-11-08"}
The available data on drugs and their targets has grown substantially in recent years. Departing from the traditional way of studying drug-target interactions, we propose a network-based computational method to identify new targets for known drugs. In this study, we use the Stanford Biomedical Network Dataset Collection (BIOSNAP Datasets). A network graph is constructed and analyzed to study the relationships between drugs and their targets. Several centrality and similarity measures are applied to predict new potential metabolism pathways, through the liver enzyme Cytochrome P450 3A4, for five drugs: Wortmannin, Voacamine, Vancomycin, Dactinomycin, and Arundic acid. Applying network theory to this dataset yields a significant new approach. Finally, molecular docking is performed to confirm the results, and the importance of the presented method for drug discovery is highlighted.
{"title":"Predictions of drug metabolism pathways through CYP 3A4 enzyme by analysing drug-target interactions network graph","authors":"M. T. Albrijawi, Amrou Haj Ibrahim, R. Alhajj","doi":"10.1145/3487351.3490959","DOIUrl":"https://doi.org/10.1145/3487351.3490959","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining"},"publicationDate":"2021-11-08"}
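The abstract does not say which similarity measures were used. One common network-based heuristic, shown here as an assumption rather than the authors' method, scores a candidate target for a drug by summing the Jaccard similarity between that drug's known target set and the target sets of other drugs that already hit the candidate. Drug and target IDs are hypothetical placeholders; in the paper's setting, one would check whether the CYP3A4 node surfaces among a drug's top candidates.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two target sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_targets(drug_targets: dict[str, set[str]], drug: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Score targets not yet linked to `drug` by similarity-weighted votes.

    Each other drug votes for its targets with weight equal to the Jaccard
    similarity between its target set and that of `drug`.
    """
    known = drug_targets[drug]
    scores: dict[str, float] = {}
    for other, targets in drug_targets.items():
        if other == drug:
            continue
        sim = jaccard(known, targets)
        if sim == 0.0:
            continue
        for target in targets - known:
            scores[target] = scores.get(target, 0.0) + sim
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical toy bipartite graph (not BIOSNAP data).
toy = {
    "D1": {"T1", "T2"},
    "D2": {"T1", "T2", "T3"},
    "D3": {"T2", "T4"},
    "D4": {"T5"},
}
print(candidate_targets(toy, "D1"))
```

For `D1`, target `T3` scores 2/3 (via `D2`) and `T4` scores 1/3 (via `D3`), while `T5` gets no vote because `D4` shares no targets with `D1`.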
Clique counting is considered a challenging problem in graph mining because of combinatorial explosion: even moderate graphs with a few million edges can have clique counts on the order of many billions. In this paper, we propose a fast and scalable algorithm for approximating 4-clique counts in a single-pass streaming model. By combining several sampling approaches, we estimate the 4-clique count with high accuracy. Our algorithm performs well on massive graphs containing several billion 4-cliques and terminates within a reasonable amount of time.
{"title":"Approximating 4-cliques in streaming graphs: the power of dual sampling","authors":"Anmol Mann, Venkatesh Srinivasan, Alex Thomo","doi":"10.1145/3487351.3489471","DOIUrl":"https://doi.org/10.1145/3487351.3489471","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining"},"publicationDate":"2021-11-08"}
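The paper's dual-sampling scheme is not described in the abstract. As a simpler baseline for the same single-pass setting, this sketch keeps each streamed edge independently with probability p, counts 4-cliques exactly in the sampled graph, and scales the count by 1/p^6: a 4-clique has six edges, so it survives with probability p^6, which makes the estimate unbiased. It assumes each undirected edge appears once in the stream and that node IDs are comparable (e.g. integers).

```python
import itertools
import random
from collections import defaultdict
from typing import Optional


def count_4cliques(adj: dict[int, set[int]]) -> int:
    """Exact 4-clique count; each clique u < v < w < x is counted once."""
    count = 0
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            # Common neighbours of u and v that come after v in node order.
            common = sorted(w for w in adj[u] & adj[v] if w > v)
            for w, x in itertools.combinations(common, 2):
                if x in adj[w]:
                    count += 1
    return count


def estimate_4cliques(edge_stream, p: float, seed: Optional[int] = None) -> float:
    """Single pass: keep each edge with probability p, count, rescale by 1/p^6."""
    rng = random.Random(seed)
    adj: dict[int, set[int]] = defaultdict(set)
    for u, v in edge_stream:
        if u != v and rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return count_4cliques(adj) / p ** 6


k6 = list(itertools.combinations(range(6), 2))  # complete graph K6: C(6,4) = 15 4-cliques
print(estimate_4cliques(k6, 1.0))  # 15.0
```

Only sampled edges are stored, so memory shrinks roughly by a factor of p; the price is variance, since cliques sharing edges survive or vanish together. Averaging many independent runs with p < 1 on K6 converges to 15, and more refined schemes exist precisely to reduce that variance.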
The unprecedented growth in technology has increased the importance of information security, which remains hard to achieve. Recently, network and web application attacks have become frequent, with confidential data stolen through vulnerabilities in systems, most prominently open ports. Such attacks break the CIA (Confidentiality, Integrity, and Availability) Triad Model. Penetration testing is one of the key techniques used in practice to detect possible threats and potential attacks against a system, and the first step attackers take is information gathering. In this paper, we present a schema for the active information-gathering phase that can be used during penetration testing and by system administrators; it will be the first feature of a planned security engine. The work comprises an automated API-based IP and port scanner, a service-version enumerator, and a vulnerability detection system. The scheme builds on the Network Mapper (Nmap) to collect information with high accuracy according to the rules provided in our schema. In addition, the work is implemented as a RESTful API server, aiming at easy integration into real-life settings and allowing administrators to scan and secure their networks more quickly and easily. The effectiveness and efficiency of this technique were demonstrated through various test cases covering different real-world scenarios. The average time to scan a server and detect its vulnerabilities is 2.2 minutes; regardless of the number of vulnerabilities, each additional open port adds only about 12 seconds.
{"title":"Automation of active reconnaissance phase: an automated API-based port and vulnerability scanner","authors":"Malek Malkawi, Tansel Özyer, R. Alhajj","doi":"10.1145/3487351.3492720","DOIUrl":"https://doi.org/10.1145/3487351.3492720","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining"},"publicationDate":"2021-11-08"}
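The abstract states that the scanner builds on Nmap and is exposed as a RESTful API. One plausible piece of such a pipeline, sketched here as an assumption about the design rather than the authors' code, is converting Nmap's XML report (as produced by `nmap -sV -oX <file> <target>`) into JSON-ready records of open ports and detected service versions. The sample report is hand-written for illustration, using a documentation address (192.0.2.10), not real scan output.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative fragment in the shape of an `nmap -sV -oX` report.
SAMPLE_XML = """<nmaprun scanner="nmap">
  <host>
    <status state="up"/>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="8.2p1"/>
      </port>
      <port protocol="tcp" portid="80">
        <state state="closed"/>
        <service name="http"/>
      </port>
      <port protocol="tcp" portid="443">
        <state state="open"/>
        <service name="https"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def open_services(nmap_xml: str) -> list[dict]:
    """Extract open ports and detected service details from an Nmap XML report."""
    root = ET.fromstring(nmap_xml)
    rows = []
    for host in root.iter("host"):
        addr_el = host.find("address")
        addr = addr_el.get("addr") if addr_el is not None else None
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue
            service = port.find("service")
            info = service.attrib if service is not None else {}
            rows.append({
                "host": addr,
                "protocol": port.get("protocol"),
                "port": int(port.get("portid")),
                "service": info.get("name"),
                "product": info.get("product"),
                "version": info.get("version"),
            })
    return rows


print(open_services(SAMPLE_XML))
```

Records like these map naturally onto a JSON response body, and the product/version fields are what a vulnerability-lookup stage would key on; ports without version detection simply carry `None`.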
Lauren Hudson, R. Whitaker, S. M. Allen, Liam D. Turner, Diane H Felmlee
The prevalence of induced triads plays an important role in characterising complex networks, supporting approaches for assessing dynamic and partially obfuscated scenarios. In this paper we introduce a new, highly scalable local edge-centrality measure designed for this context. It captures the importance of an edge within the induced triads of a directed network. We observe that an edge can play one of two roles in providing connectivity within any particular triad, depending on whether or not the edge supports connectivity to the third node. We call these alternative states overt and covert. Because an edge may play different roles in different induced triads, this allows us to assess the local importance of an edge across multiple induced substructures. We introduce theory to count the number of induced triads in which an edge is overt and in which it is covert. Using 34 data sets derived from public sources, we show how the presence of overt and covert edges can be used to profile diverse real-world networks, and we examine the relationship with global network analysis metrics. We observe that overt and covert edge centrality helps further differentiate classes of networks when combined with conventional global network analysis metrics.
{"title":"The centrality of edges based on their role in induced triads","authors":"Lauren Hudson, R. Whitaker, S. M. Allen, Liam D. Turner, Diane H Felmlee","doi":"10.1145/3487351.3493825","DOIUrl":"https://doi.org/10.1145/3487351.3493825","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining"},"publicationDate":"2021-11-08"}
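The exact overt/covert rules, which take edge direction into account, are not given in the abstract. As a hypothetical, direction-agnostic proxy, this sketch counts for each directed edge (u, v) the induced connected triads {u, v, w} it belongs to, split by whether the third node w is tied to only one endpoint (so the triad is connected only through this edge) or to both (so the triad stays connected without it). The labels `edge_required` and `edge_bypassed` are illustrative names, not the paper's terms.

```python
from collections import defaultdict


def edge_triad_profile(edges: list[tuple[int, int]]) -> dict[tuple[int, int], dict[str, int]]:
    """Per directed edge, count induced connected triads by the third node's ties.

    Connectivity is judged on the underlying undirected graph (an assumption);
    each edge only inspects its endpoints' neighbourhoods, so the cost is local.
    """
    neighbours: dict[int, set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)

    profile = {}
    for u, v in edges:
        if u == v:
            continue
        nu = neighbours[u] - {v}
        nv = neighbours[v] - {u}
        profile[(u, v)] = {
            # w adjacent to exactly one endpoint: triad is connected only via (u, v)
            "edge_required": len(nu ^ nv),
            # w adjacent to both endpoints: triad stays connected without (u, v)
            "edge_bypassed": len(nu & nv),
        }
    return profile


# A directed triangle 1 -> 2 -> 3 -> 1 with a pendant edge 3 -> 4.
print(edge_triad_profile([(1, 2), (2, 3), (3, 1), (3, 4)]))
```

Here the pendant edge (3, 4) is the only link to node 4 in both of its triads, whereas (1, 2) sits in a single closed triad where node 3 already connects its endpoints.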
L. Ng, John Yeh Han Tan, Darryl Jing Heng Tan, R. Lee
TikTok is a popular new social media platform where users express themselves through short video clips. A common form of interaction on the platform is participating in "challenges", which are songs and dances that users iterate upon. Challenge contagion can be measured through replication reach, i.e., users uploading videos of their participation in the challenges. Because both challenge content and user preferences evolve on TikTok, modelling participation requires combining challenge and user representations. This paper investigates the social contagion of TikTok challenges by predicting whether a user will participate. We propose a novel deep learning model, deepChallenger, that learns and combines latent user and challenge representations from past videos to perform this user-challenge prediction task. We collect a dataset of over 7,000 videos from 12 trending challenges on the ForYouPage, the app's landing page, and over 10,000 videos from 1,303 users. Extensive experiments show that our proposed deepChallenger (F1=0.494) outperforms the baselines (F1=0.188) on the prediction task.
{"title":"Will you dance to the challenge?: predicting user participation of TikTok challenges","authors":"L. Ng, John Yeh Han Tan, Darryl Jing Heng Tan, R. Lee","doi":"10.1145/3487351.3488276","DOIUrl":"https://doi.org/10.1145/3487351.3488276","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining"},"publicationDate":"2021-11-08"}
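deepChallenger is a deep model whose architecture the abstract does not detail. The sketch below only illustrates the general idea of combining user and challenge representations: each is taken as the mean of its past videos' embedding vectors (assumed to come from some upstream video encoder), the two are joined with their element-wise product, and a logistic layer scores participation. The embeddings and weights are hypothetical stand-ins for learned parameters.

```python
import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Average a non-empty list of equal-length embedding vectors."""
    if not vectors:
        raise ValueError("need at least one video embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def participation_probability(user_videos, challenge_videos, weights, bias=0.0) -> float:
    """Score P(user joins challenge) from pooled user and challenge representations."""
    u = mean_vector(user_videos)
    c = mean_vector(challenge_videos)
    # Combined representation: user, challenge, and their element-wise interaction.
    features = u + c + [a * b for a, b in zip(u, c)]
    if len(weights) != len(features):
        raise ValueError("expected one weight per combined feature")
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)


# Hypothetical 2-d embeddings and weights that reward user/challenge agreement.
weights = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
print(participation_probability([[1.0, 0.0]], [[1.0, 0.0]], weights))
print(participation_probability([[1.0, 0.0]], [[0.0, 1.0]], weights))
```

The interaction term is what lets the score depend on how well a user's history matches a challenge, rather than on user or challenge popularity alone; a deep model would replace the mean pooling and the single linear layer with learned encoders.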
Automated generation of commercial tweets has become a useful tool for marketing and advertising on social media. In this context, paraphrase generation has emerged as an important problem. Paraphrasing commercial tweets uniquely requires that certain elements, such as the product name or the promotion details, be preserved in the output. To address this need, we propose a Constraint-Embedded Language Modeling (CELM) framework, in which hard constraints are embedded in the text content and learned through a language model. This embedding helps the model learn not only to generate paraphrases but also to respect the constraints specific to commercial tweets. In addition, we transfer knowledge learned from a general domain to the generation of commercial tweets. Our model outperforms general paraphrase generation models, as well as the state-of-the-art CopyNet model, in terms of paraphrase similarity, diversity, and conformance to hard constraints.
{"title":"Constraint-embedded paraphrase generation for commercial tweets","authors":"Renhao Cui, G. Agrawal, R. Ramnath","doi":"10.1145/3487351.3490974","DOIUrl":"https://doi.org/10.1145/3487351.3490974","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining"},"publicationDate":"2021-11-08"}
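CELM learns hard constraints through a language model, and the abstract does not give its exact encoding. A related and much simpler trick, shown here as an illustration rather than CELM itself, replaces each constrained span (product name, promotion details) with a placeholder token before paraphrasing, then restores the spans and reports any placeholder the generator dropped. The product name and the paraphrase strings are hypothetical, and constraints are assumed not to overlap.

```python
def mask_constraints(text: str, constraints: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each hard-constraint span with a placeholder token like <C0>."""
    mapping: dict[str, str] = {}
    for i, span in enumerate(constraints):
        if span not in text:
            raise ValueError(f"constraint {span!r} not found in input")
        token = f"<C{i}>"
        text = text.replace(span, token)
        mapping[token] = span
    return text, mapping


def unmask_and_check(paraphrase: str, mapping: dict[str, str]) -> tuple[str, list[str]]:
    """Restore constrained spans; also return placeholders the paraphrase dropped."""
    missing = [token for token in mapping if token not in paraphrase]
    for token, span in mapping.items():
        paraphrase = paraphrase.replace(token, span)
    return paraphrase, missing


tweet = "Get 20% off the AcmePhone X this weekend only!"  # hypothetical product
masked, mapping = mask_constraints(tweet, ["AcmePhone X", "20% off"])
print(masked)  # Get <C1> the <C0> this weekend only!

# Stand-ins for generator outputs on the masked text.
print(unmask_and_check("This weekend only: <C0> is <C1>!", mapping))
print(unmask_and_check("Big savings on <C0> this weekend!", mapping))
```

The second paraphrase drops the promotion, and the check flags `<C1>` as missing; a constraint-aware model aims to make such violations rare rather than merely detectable after the fact.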