Pub Date : 2022-08-26 DOI: 10.35377/saucis...1121388
F. Üçkardeş
The aim of this study was to use the Hierarchical Loglinear Model (HLLM) to analyze multiway frequency tables and to interpret the model's main and interaction effects on suicide cases. The data set was taken from the Turkish Republic State Statistical Institute (TUIK). A total of 6479 cases from the years 2016 and 2018 were analyzed with respect to the gender, year, and age variables. The HLLM analysis found the main effects Year, Gender, and Age, as well as the Year × Gender and Gender × Age interactions, to be significant (P
{"title":"Using of Hierarchical Loglinear Model in Multiway Frequency Tables and an Application on Suicide Cases","authors":"F. Üçkardeş","doi":"10.35377/saucis...1121388","DOIUrl":"https://doi.org/10.35377/saucis...1121388","url":null,"abstract":"The aim of this study was to use Hierarchical Loglinear Model (HLLM) in the analysis of multiway frequency tables and to interpret the main and interaction effects of this model on suicide cases. \u0000The data set used in this study was taken from the Turkish Republic State Statistical Institute (TUIK). A total of 6479 cases in 2016 and 2018 years were used in this analysis and the analyzes were made by considering gender, year and age variables. \u0000As a result of HDLM analysis, Year, Gender and Age, which are the main effects in suicide cases, and the interactions of Year × Gender and Gender × Age were found significantly (P","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"12 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127747869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
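For orientation, a hierarchical loglinear model containing only the terms this abstract names as significant could be written as below. The notation is an assumption made for illustration and is not taken from the paper: m_{ijk} is the expected count in the cell for year i, gender j, and age group k, and each λ is an effect parameter.

```latex
\log m_{ijk} = \lambda + \lambda^{Y}_{i} + \lambda^{G}_{j} + \lambda^{A}_{k}
             + \lambda^{YG}_{ij} + \lambda^{GA}_{jk}
```

The saturated model would add the terms \lambda^{YA}_{ik} and \lambda^{YGA}_{ijk}. "Hierarchical" means that whenever an interaction term is in the model, all lower-order terms it contains are included as well, which is why the three main effects accompany the two interactions here.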
Pub Date : 2022-08-23 DOI: 10.35377/saucis...879855
Ibrahim A. Fadel, Cemil Öz
In this paper, a dataset of real incidents that occurred in Turkey between 2013 and 2017, and that are regarded as acts of terrorism without any doubt according to the Global Terrorism Database (GTD), is used to predict the names of the groups responsible for unknown attacks. Principal Component Analysis (PCA) was used for feature selection. A novel voting method among five classification algorithms (Random Forests, Logistic Regression, AdaBoost, Neural Network, and Support Vector Machine) was used to predict the names. Terrorist group names are predicted both with PCA-based feature reduction and with the original features, and the results are compared. The results show that the classification accuracy of every classifier studied in this paper improved when PCA was used to select features.
{"title":"Prediction of Unknown Terrorist Group Names Responsible for Attacks in Turkey","authors":"Ibrahim A. Fadel, Cemil Öz","doi":"10.35377/saucis...879855","DOIUrl":"https://doi.org/10.35377/saucis...879855","url":null,"abstract":"In this paper, the dataset of real incidents that occurred in Turkey between 2013 and 2017 and are regarded as acts of terrorism without any doubt according to Global Terrorism Database (GTD) are used to predict the group names responsible for unknown attacks. Principal Component Analysis (PCA) technique was used for feature selection. A novel voting method between five classification algorithms such as Random Forests, Logistic Regression, AdaBoost, Neural Network, and Support Vector Machine was used to predict the names. The results clearly demonstrate that the classification accuracy of all classifiers studied in this paper improved when PCA was used to select features as compared to selecting features without using PCA. The prediction of terrorist group names with PCA based feature reduction and the original features is carried out and the results are compared.","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127405680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
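The abstract does not describe how its novel voting method works, so the sketch below shows only plain majority voting, swapped in as a simple stand-in for combining the outputs of five classifiers. The function name, the tie-breaking rule, and the group labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    `predictions` holds one label per classifier. Ties are broken in
    favour of the tied label that appears first in the list.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of five classifiers for a single attack record.
votes = ["Group A", "Group B", "Group A", "Group C", "Group A"]
winner = majority_vote(votes)
```

Here three of the five classifiers agree, so `winner` is the label they share. A weighted scheme would replace the raw counts with per-classifier weights.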
Pub Date : 2022-08-07 DOI: 10.35377/saucis...1147919
Berat Erdemkilic, Mehmet Akif Yazici
Flying Ad-Hoc Networks (FANETs) are wireless ad-hoc networks in which unmanned aerial vehicles (UAVs) serve as communication nodes in environments where it is difficult to establish a proper communication infrastructure. FANET systems are highly dynamic because the nodes move at very high speeds and UAVs can follow various mobility models. Because the devices are aerial vehicles, they have a line of sight between them, and FANET systems must operate over large topologies with low node density. For these reasons, the topology changes rapidly and link disconnections between UAVs become more frequent. Traditional topology-based and position-based routing algorithms do not cope well with this problem. Therefore, an SDN-Based Routing Protocol that can work in both a proactive and a reactive manner has been developed to improve the communication performance of highly dynamic FANET systems. Software Defined Networking (SDN) is used as the network management architecture, and the OpenFlow protocol is used for communication between the UAVs in the data layer and the control layer. Protocol adaptation studies are carried out to optimize OpenFlow packets for FANETs. To make the system more manageable at different levels of dynamism, topology control, timer control, and node configuration services are designed for the SDN controller units. Non-delay-tolerant position-based protocols and topology-based reactive and proactive protocols are studied and compared with the SDN-based protocol in terms of throughput, end-to-end delay, and control packet overhead. In comparison scenarios with different levels of topology dynamism, the SDN-Based Routing Protocol performs better than the traditional protocols.
{"title":"A Software Defined Networking-based Routing Algorithm for Flying Ad Hoc Networks","authors":"Berat Erdemkilic, Mehmet Akif Yazici","doi":"10.35377/saucis...1147919","DOIUrl":"https://doi.org/10.35377/saucis...1147919","url":null,"abstract":"Flying Ad-Hoc Networks are wireless ad-hoc networks that unmanned aerial vehicles are used as communication nodes in environments where it is difficult to establish a proper communication infrastructure. FANET systems have high dynamism levels because the nodes move at very high speeds and UAVs can have various mobility models to use. The fact that the devices are aerial vehicles ensures that they have a line of sight between them, and it requires FANET systems operating on large topologies with low node density. For these reasons, the structure of the topology changes rapidly, and the frequency of link disconnections between UAVs increases. Topology-based and position-based traditional routing algorithms do not work well in the face of this problem. Therefore, SDN Based Routing Protocol which can work on both proactive and reactive manner has been developed in order to improve the communication performance of highly dynamic FANET systems. Software Defined Networking technology is used as network management architecture, and the Openflow protocol is used to establish communication between UAVs in the data layer and control layer. To optimize Openflow packets for FANETs, protocol adaption studies are carried out. To make the system more manageable for different dynamism levels, topology control services, timer control services, and node configuration services for SDN controller units are designed. Non delay-tolerant position-based protocols and topology-based reactive and proactive protocols are studied and compared with the protocol designed based on SDN architecture in terms of throughput, end-to-end delay, and control packet overhead parameters. 
In the comparison studies performed by creating scenarios where the topology has different levels of dynamism, it has been revealed that SDN Based Routing Protocol performs better than traditional protocols.","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122550478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
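To make the idea of controller-driven routing concrete, the sketch below shows how a controller with a global view of the topology might derive next-hop entries toward a destination. It is not the authors' protocol and exchanges no OpenFlow messages; the breadth-first search, the function name, and the UAV identifiers are all assumptions chosen for illustration.

```python
from collections import deque


def next_hops(links, destination):
    """Map each reachable node to its next hop toward `destination`.

    `links` is an adjacency dict describing bidirectional links. A
    breadth-first search started at the destination gives minimum-hop
    paths, so each node forwards via the node that discovered it.
    """
    table = {}
    visited = {destination}
    queue = deque([destination])
    while queue:
        node = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                table[neighbour] = node  # neighbour forwards via node
                queue.append(neighbour)
    return table


# Hypothetical four-UAV chain topology known to the controller.
topology = {
    "uav1": ["uav2"],
    "uav2": ["uav1", "uav3"],
    "uav3": ["uav2", "uav4"],
    "uav4": ["uav3"],
}
flows = next_hops(topology, "uav4")

# After the uav2-uav3 link breaks, the controller recomputes the table.
broken = {"uav1": ["uav2"], "uav2": ["uav1"], "uav3": ["uav4"], "uav4": ["uav3"]}
flows_after_break = next_hops(broken, "uav4")
```

Installing such entries ahead of time corresponds to the proactive mode mentioned in the abstract, while computing them when a node first asks for a route corresponds to the reactive mode.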
Pub Date : 2022-08-06 DOI: 10.35377/saucis...1138577
Özkan Yılmaz, Mehmet Akif Yazici
Physical layer authentication is an important technique for cybersecurity, especially in military scenarios. Device classification using radio frequency fingerprinting, which is based on recognizing device-unique characteristics of the transient waveform observed at the beginning of a transmission from a radio device, is a promising method in this context. In this study, the effect of the ambient temperature on the performance of radio device classification based on RF fingerprinting is investigated. The radio devices used in the study belong to the same brand, model, and production date, making the problem more difficult than classifying radio devices of different brands or models. Our results show that high levels of accuracy can be attained using convolutional neural network models such as ResNet50 when the test data and the training data are collected at the same temperature, whereas performance suffers when the test data and the training data belong to different temperature values. We also provide the performance figures of a blended training model that uses training data taken at various temperature values.
{"title":"The Effect of Ambient Temperature On Device Classification Based On Radio Frequency Fingerprint Recognition","authors":"Özkan Yılmaz, Mehmet Akif Yazici","doi":"10.35377/saucis...1138577","DOIUrl":"https://doi.org/10.35377/saucis...1138577","url":null,"abstract":"Physical layer authentication is an important technique for cybersecurity, especially in military scenarios. Device classification using radio frequency fingerprinting, which is based on recognizing device-unique characteristics of the transient waveform observed at the beginning of a transmission from a radio device, is a promising method in this context. In this study, the effect of the ambient temperature on the performance of radio device classification based on RF fingerprinting is investigated. The radio devices used in the study belong to the same brand, model, and production date, making the problem more difficult than classifying radio devices of different brands or models. Our results show that high levels of accuracy can be attained using convolutional neural network models such as ResNet50 when the test data and the training data are collected at the same temperature, whereas performance suffers when the test data and the training data belong to different temperature values. 
We also provide the performance figures of a blended training model that uses training data taken at various temperature values.","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127320694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-08-05 DOI: 10.35377/saucis...1153071
Zeynep Oktay, Ç. Erol, N. Arda
Breast cancer is one of the most important global health problems, affecting both developed and developing countries. Identifying anticancer compounds that are effective on breast cancer cells is of key importance in chemoprevention investigations and drug development studies. Numerous compounds have been analyzed in the literature for their cytotoxic effects on breast cancer cells, but there is no database in which researchers designing a new breast cancer study can find these compounds all at once. This paper presents a relational database that stores data on natural and synthetic compounds that are cytotoxically active on breast cancer cells. The database contains 381 cytotoxicity results and data on 159 compounds, compiled from 80 selected studies. Querying the database showed that quercetin, a dietary flavonoid, is the most analyzed compound and that MCF-7 is the most used breast cancer cell line.
{"title":"A Relational Database Design for The Compounds Cytotoxically Active on Breast Cancer Cells","authors":"Zeynep Oktay, Ç. Erol, N. Arda","doi":"10.35377/saucis...1153071","DOIUrl":"https://doi.org/10.35377/saucis...1153071","url":null,"abstract":"Breast cancer is one of the most important global health problems affecting both developed and developing countries. The identification of anticancer compounds, effective on breast cancer cells, is of key importance in chemoprevention investigations and drug development studies. In the literature, there are numerous compounds that have been analyzed for their cytotoxic effects on breast cancer cells, but there is no database where the researchers who want to design a new study on breast cancer can find these compounds all at once. This paper presents a relational database that stores the data of natural and synthetic compounds cytotoxically active on breast cancer cells. The database contains 381 cytotoxicity results and data of 159 compounds, compiled from selected 80 studies. When all this data in our database was queried, it was found out that quercetin, which is a dietary flavonoid, is the most analyzed compound, and MCF-7 cell line is the most used breast cancer cell line.","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121762171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
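As a rough illustration of the kind of relational design and query the abstract describes, the sketch below builds a small in-memory SQLite database. The table names, column names, and the `ic50_um` measure are hypothetical and not taken from the paper, and the inserted rows are placeholders rather than real results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE cell_line (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE study (id INTEGER PRIMARY KEY, citation TEXT NOT NULL);
CREATE TABLE cytotoxicity_result (
    id INTEGER PRIMARY KEY,
    compound_id INTEGER NOT NULL REFERENCES compound(id),
    cell_line_id INTEGER NOT NULL REFERENCES cell_line(id),
    study_id INTEGER NOT NULL REFERENCES study(id),
    ic50_um REAL  -- hypothetical measure column, left NULL below
);
""")

# Placeholder rows only; no measured values are invented here.
conn.executemany("INSERT INTO compound (name) VALUES (?)",
                 [("quercetin",), ("compound X",)])
conn.executemany("INSERT INTO cell_line (name) VALUES (?)",
                 [("MCF-7",), ("cell line Y",)])
conn.executemany("INSERT INTO study (citation) VALUES (?)",
                 [("study 1",), ("study 2",)])
conn.executemany(
    "INSERT INTO cytotoxicity_result (compound_id, cell_line_id, study_id) "
    "VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (2, 2, 1)],
)

# Which compound has the most recorded cytotoxicity results?
most_analyzed = conn.execute("""
    SELECT c.name, COUNT(*) AS n
    FROM cytotoxicity_result AS r
    JOIN compound AS c ON c.id = r.compound_id
    GROUP BY c.id
    ORDER BY n DESC, c.name
    LIMIT 1
""").fetchone()
```

Keeping results in their own table, with foreign keys to compounds, cell lines, and studies, lets one compound appear in many studies without duplicating its data. The same query joined to `cell_line` instead of `compound` would give the most used cell line.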
Pub Date : 2022-07-22 DOI: 10.35377/saucis...1073355
E. Güney, C. Bayilmis
Traffic sign and road object detection is a significant issue for driver safety. It has become popular with the development of autonomous vehicles and driver-assistance systems. This study presents a real-time system that uses a camera to detect traffic signs and various objects in the driving environment. The Faster R-CNN architecture, a well-known two-stage approach to object detection, was used as the detection method. A dataset was created by collecting various images for training and testing the model. The dataset consists of 1880 images containing traffic signs and objects, collected from Turkey together with the GTSRB dataset. These images were combined and divided into a training set and a testing set at a ratio of 80/20. The model was trained on a computer for 8.5 hours and approximately 10000 iterations. Experimental results show the real-time performance of Faster R-CNN for robust detection of traffic signs and objects.
{"title":"An Implementation of Traffic Signs and Road Objects Detection Using Faster R-CNN","authors":"E. Güney, C. Bayilmis","doi":"10.35377/saucis...1073355","DOIUrl":"https://doi.org/10.35377/saucis...1073355","url":null,"abstract":"Traffic signs and road objects detection is significant issue for driver safety. It has become popular with the development of autonomous vehicles and driver-assistant systems. This study presents a real-time system that detects traffic signs and various objects in the driving environment with a camera. Faster R-CNN architecture was used as a detection method in this study. This architecture is a well-known two-stage approach for object detection. Dataset was created by collecting various images for training and testing of the model. The dataset consists of 1880 images containing traffic signs and objects collected from Turkey with the GTSRB dataset. These images were combined and divided into the training set and testing set with the ratio of 80/20. The model's training was carried out in the computer environment for 8.5 hours and approximately 10000 iterations. Experimental results show the real-time performance of Faster R-CNN for robustly traffic signs and objects detection.","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127395900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
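The 80/20 split described above works out to 1504 training images and 376 test images for a dataset of 1880. The sketch below shows one way to make such a split reproducibly; the function name, the fixed seed, and the use of integer ids in place of image files are assumptions made for illustration.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle `items` reproducibly and split them into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Integer ids stand in for the 1880 image files.
image_ids = range(1880)
train, test = split_dataset(image_ids)
```

Shuffling before cutting keeps images from the two sources mixed across both sets rather than leaving one source concentrated in the test set.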
Pub Date : 2022-07-08 DOI: 10.35377/saucis...1133435
Halit Öztekin, Ali Gülbağ
VHDL is one of the languages available for describing a digital system on an FPGA. Because describing hardware requires a different way of thinking than developing software, students who bring long experience of imperative software programming face difficulties when they try to design in VHDL. The main difficulties are VHDL's concurrency and its parallel and sequential execution models. Without a sufficient understanding of these topics, it is difficult for students to master the language. Analogies change the conceptual system of existing knowledge by linking the known to the unknown and by changing and strengthening the relationships between them. This study helps students overcome the problems they encounter when coding these topics in VHDL by drawing, through analogies, on their experience with traditional programming languages. Analogies were used in an undergraduate embedded systems course to explain complex concepts, such as those related to signals and concurrent/parallel processes, and to encourage comprehensive projects in digital circuit design. According to student feedback, discussing and negotiating the analogies seems to minimize confusion and the use of inappropriate expressions in VHDL.
{"title":"Transfer of Analogies in Traditional Programming Languages to Teaching VHDL","authors":"Halit Öztekin, Ali Gülbağ","doi":"10.35377/saucis...1133435","DOIUrl":"https://doi.org/10.35377/saucis...1133435","url":null,"abstract":"One of the languages available to describe a digital system in FPGA is the VHDL language. Since programming in hardware requires a different way of thinking than developing software, the students face some difficulties when trying to design in VHDL language with the previous and long experiences kept in mind in the learning of software imperative programming. These are its concurrency, parallel and sequential model. Due to the insufficient understanding of these topics, it is difficult for students to master the VHDL language. Analogies change the conceptual system of existing knowledge by linking the known to the unknown and by changing and strengthening their relationships. This study contributes to overcoming the problems that students encounter in the coding of the above-mentioned topics in VHDL language by using their experiences in traditional programming languages through analogies. Analogies were used in an undergraduate embedded systems course to explain complex concepts such as those related to signals, concurrent/parallel process; and to encourage comprehensive projects in digital circuit design. 
In feedback from students, the discussion and negotiation of analogies seems to minimize confusion and from using inappropriate expressions in using VHDL language.","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129768538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
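One analogy of this kind, chosen here for illustration and not taken from the paper, concerns signal assignment. Inside a VHDL process, signal assignments read the current signal values and only take effect once the process suspends, so `a <= b; b <= a;` swaps the two signals. The same two statements in an imperative language do not. The Python sketch below contrasts the two behaviours; both function names are hypothetical.

```python
def sequential_style(a, b):
    """Imperative semantics: each assignment takes effect immediately."""
    a = b
    b = a  # reads the value that was just written to a
    return a, b


def signal_style(a, b):
    """Signal-like semantics: every right-hand side reads the current
    values, and all updates are committed together afterwards."""
    scheduled = {"a": b, "b": a}  # evaluate all right-hand sides first
    return scheduled["a"], scheduled["b"]  # then commit the updates


after_sequential = sequential_style(1, 2)
after_signal = signal_style(1, 2)
```

With inputs 1 and 2, the sequential version ends with both values equal to 2, while the signal-style version ends with the values swapped. VHDL variables, assigned with `:=`, behave like the sequential version.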
Pub Date : 2022-07-06 DOI: 10.35377/saucis...1139765
Emre Beray Boztepe, Bedirhan Karakaya, B. Karasulu, İsmet Ünlü
This study presents an approach that recognizes the sound environment class of a video in order to understand its spoken content and sentimental context, by processing audio-visual content with a multimodal deep learning methodology. The approach begins by using deep learning to cut out the parts of a given video in which the most action happens; these parts are concatenated into a new video clip. A previously trained deep learning network model then performs sound prediction. The model was trained on sound clips from ten different categories to predict sound classes. These categories were selected according to where the action was most likely to happen. To strengthen the sound recognition result, any speech in the new video is extracted. Using Natural Language Processing (NLP) and Named Entity Recognition (NER), this speech is categorized according to whether its words carry a connotation of any of the ten categories. Sentiment analysis and the Apriori algorithm from Association Rule Mining (ARM) proceed by identifying the frequent categories in the concatenated video and help to define the relationships between those categories. According to the highest performance evaluation values from our experiments, the accuracy of sound environment recognition for a given video's processed scene is 70%, and the average Bilingual Evaluation Understudy (BLEU) score for speech-to-text with the VOSK speech recognition toolkit is 90% for the English language model and 81% for the Turkish language model. Discussion and conclusions based on scientific findings are included in our study.
{"title":"An Approach for Audio-Visual Content Understanding of Video using Multimodal Deep Learning Methodology","authors":"Emre Beray Boztepe, Bedirhan Karakaya, B. Karasulu, İsmet Ünlü","doi":"10.35377/saucis...1139765","DOIUrl":"https://doi.org/10.35377/saucis...1139765","url":null,"abstract":"This study contains an approach for recognizing the sound environment class from a video to understand the spoken content with its sentimental context via some sort of analysis that is achieved by the processing of audio-visual content using multimodal deep learning methodology. This approach begins with cutting the parts of a given video which the most action happened by using deep learning and this cutted parts get concanarated as a new video clip. With the help of a deep learning network model which was trained before for sound recognition, a sound prediction process takes place. The model was trained by using different sound clips of ten different categories to predict sound classes. These categories have been selected by where the action could have happened the most. Then, to strengthen the result of sound recognition if there is a speech in the new video, this speech has been taken. By using Natural Language Processing (NLP) and Named Entity Recognition (NER) this speech has been categorized according to if the word of a speech has connotation of any of the ten categories. Sentiment analysis and Apriori Algorithm from Association Rule Mining (ARM) processes are preceded by identifying the frequent categories in the concanarated video and helps us to define the relationship between the categories owned. According to the highest performance evaluation values from our experiments, the accuracy for sound environment recognition for a given video's processed scene is 70%, average Bilingual Evaluation Understudy (BLEU) score for speech to text with VOSK speech recognition toolkit's English language model is 90% on average and for Turkish language model is 81% on average. 
Discussion and conclusion based on scientific findings are included in our study.","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128286390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
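The Apriori step mentioned above can be sketched as a level-wise search for category sets that occur together often enough. This is a generic textbook version, not the authors' implementation. The abstract does not name its ten categories, so the labels below are hypothetical, as is the support threshold.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting `min_support`."""
    transactions = [frozenset(t) for t in transactions]
    if not transactions:
        return {}
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain the whole itemset.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {}
        for itemset in current:
            value = support(itemset)
            if value >= min_support:
                level[itemset] = value
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical category labels detected in four scenes of one clip.
scenes = [
    {"traffic", "crowd"},
    {"traffic", "crowd", "rain"},
    {"traffic"},
    {"crowd", "rain"},
]
frequent = apriori(scenes, min_support=0.5)
```

The prune step is what makes the search tractable: a set can only be frequent if all of its subsets are, so candidates with an infrequent subset are discarded without being counted.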
Pub Date : 2022-06-19 DOI: 10.35377/saucis...1085625
Fuat Türk, Yunus Kökver
COVID-19 is a deadly disease that first appeared in late 2019 and spread rapidly around the world. Understanding and classifying computed tomography (CT) images is extremely important for the diagnosis of COVID-19. Many case classification studies face problems, especially unbalanced and insufficient data. For this reason, deep learning methods are of great importance for the diagnosis of COVID-19. We therefore studied the NasNet-Mobile, DenseNet, and NasNet-Mobile+DenseNet architectures on a dataset that we merged. The merged COVID-19 dataset is divided into 3 separate classes: Normal, COVID-19, and Pneumonia. We obtained classification accuracies of 87.16%, 93.38%, and 93.72% for the NasNet-Mobile, DenseNet, and NasNet-Mobile+DenseNet architectures, respectively. The results once again demonstrate the importance of deep learning methods for the diagnosis of COVID-19.
{"title":"Application with deep learning models for COVID-19 diagnosis","authors":"Fuat Türk, Yunus Kökver","doi":"10.35377/saucis...1085625","DOIUrl":"https://doi.org/10.35377/saucis...1085625","url":null,"abstract":"COVID-19 is a deadly virus that first appeared in late 2019 and spread rapidly around the world. Understanding and classifying computed tomography images (CT) is extremely important for the diagnosis of COVID-19. Many case classification studies face many problems, especially unbalanced and insufficient data. For this reason, deep learning methods have a great importance for the diagnosis of COVID-19. Therefore, we had the opportunity to study the architectures of NasNet-Mobile, DenseNet and Nasnet-Mobile+DenseNet with the dataset we have merged. \u0000The dataset we have merged for COVID-19 is divided into 3 separate classes: Normal, COVID-19, and Pneumonia. We obtained the accuracy 87.16%, 93.38% and 93.72% for the NasNet-Mobile, DenseNet and NasNet-Mobile+DenseNet architectures for the classification, respectively. The results once again demonstrate the importance of Deep Learning methods for the diagnosis of COVID-19.","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127507381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-06-15 DOI: 10.35377/saucis...1065794
Emre Yalcin
Collaborative filtering algorithms are efficient tools for providing individuals with recommendations of reasonable accuracy. However, previous research has found that these algorithms are undesirably biased towards blockbuster items, i.e., items that are both popular and highly liked, resulting in recommendation lists dominated by such items. As one of the most prominent types of collaborative filtering approaches, neighborhood-based algorithms produce recommendations from neighborhoods constructed according to similarities between users or items. The similarity function used and the size of the neighborhoods are therefore critical parameters for their recommendation performance. This study considers three well-known similarity functions, i.e., Pearson, Cosine, and Mean Squared Difference, together with varying neighborhood sizes, and observes how they affect the algorithms' blockbuster bias and accuracy. Extensive experiments conducted on two benchmark data collections conclude that, as the size of the neighborhoods decreases, these algorithms generally become more vulnerable to blockbuster bias while their accuracy increases. The experiments also show that the Cosine metric is superior to the other similarity functions at producing recommendations in which blockbuster bias is better handled; however, it leads to lower-quality recommendations in terms of predictive accuracy, since these are usually conflicting goals.
{"title":"Effects of neighborhood-based collaborative filtering parameters on their blockbuster bias performances","authors":"Emre Yalcin","doi":"10.35377/saucis...1065794","DOIUrl":"https://doi.org/10.35377/saucis...1065794","url":null,"abstract":"Collaborative filtering algorithms are efficient tools for providing recommendations with reasonable accuracy performances to individuals. However, the previous research has realized that these algorithms are undesirably biased towards blockbuster items. i.e., both popular and highly-liked items, in their recommendations, resulting in recommendation lists dominated by such blockbuster items. As one most prominent types of collaborative filtering approaches, neighborhood-based algorithms aim to produce recommendations based on neighborhoods constructed based on similarities between users or items. Therefore, the utilized similarity function and the size of the neighborhoods are critical parameters on their recommendation performances. This study considers three well-known similarity functions, i.e., Pearson, Cosine, and Mean Squared Difference, and varying neighborhood sizes and observes how they affect the algorithms’ blockbuster bias and accuracy performances. The extensive experiments conducted on two benchmark data collections conclude that as the size of neighborhoods decreases, these algorithms generally become more vulnerable to blockbuster bias while their accuracy increases. 
The experimental works also show that using the Cosine metric is superior to other similarity functions in producing recommendations where blockbuster bias is treated more; however, it leads to having unqualified recommendations in terms of predictive accuracy as they are usually conflicting goals.","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126790434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
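The three similarity functions named in the abstract can be sketched as follows for two users' ratings on the items both have rated. The paper's exact formulations are not given in the abstract, so these are common textbook forms. In particular, turning the mean squared difference into a similarity as 1 / (1 + MSD) is one common convention and is an assumption here, as are the function names and sample ratings.

```python
from math import sqrt


def cosine(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson(u, v):
    """Pearson correlation: cosine similarity of mean-centred ratings."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


def msd_similarity(u, v):
    """Similarity derived from the mean squared difference of ratings."""
    msd = sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)
    return 1.0 / (1.0 + msd)


# Hypothetical ratings by two users on three co-rated items.
user_a = [5, 3, 4]
user_b = [4, 2, 3]
sims = (pearson(user_a, user_b), cosine(user_a, user_b),
        msd_similarity(user_a, user_b))
```

The two users here rate with the same pattern but one point apart. Pearson ignores that offset because it centres each user's ratings, whereas the MSD-based similarity penalizes it, which is one reason the choice of function can change which neighbors are selected.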