Error-Robust Indoor Augmented Reality Navigation: Evaluation Criteria and a New Approach
Oliver Scheibert, Jannis Möller, S. Grogorick, M. Eisemann
Computer Science Research Notes, July 2023. DOI: 10.24132/csrn.3301.17

Tracking errors severely impact the effectiveness of augmented reality display techniques for indoor navigation. In this work we examine the sources of error and the accuracy of existing tracking technologies. From these we derive important design criteria for robust display techniques and present objective criteria that allow indoor navigation techniques to be evaluated without, or in preparation of, quantitative user studies. Based on these criteria we propose a new error-tolerant display technique called Bending Words, in which words move along the navigation path to guide the user. Bending Words outperforms the other evaluated display techniques on many of the tested criteria and provides a robust, error-tolerant alternative to established augmented reality indoor navigation display techniques.

Semi-Supervised Learning Approach for Fine Grained Human Hand Action Recognition in Industrial Assembly
Fabian Sturm, Rahul Sathiyababu, E. Hergenroether, M. Siegel
Computer Science Research Notes, July 2023. DOI: 10.24132/csrn.3301.58

Until now, it has been impossible to imagine industrial manual assembly without humans, owing to their flexibility and adaptability. But the assembly process does not always benefit from human intervention: the assembler's error-proneness due to disturbance, distraction, or inattention calls for intelligent support of the employee, and the permanently occurring, repetitive data patterns make the task ideally suited for deep learning approaches. However, labels for these data are not always sufficiently available. In this work, a spatio-temporal transformer model is used to address settings with few labels in an industrial environment. A pseudo-labeling method from the field of semi-supervised transfer learning is applied for model training, and the entire architecture is adapted to the fine-grained recognition of human hand actions in assembly. This approach significantly improves the generalization of the model during training across different variations of strong and weak classes from the ground truth and demonstrates that it is possible to work with deep learning technologies in an industrial setting, even with few labels. In addition to the main goal of improving the generalization capabilities of the model by using less data during training and exploring different variations of the ground truth and new classes, the recognition capabilities of the model are improved by adding convolution to the temporal embedding layer, which increases test accuracy by over 5% compared to a similar predecessor model.

Designing a Lightweight Edge-Guided Convolutional Neural Network for Segmenting Mirrors and Reflective Surfaces
Mark Edward M. Gonzales, Lorene C. Uy, J. Ilao
Computer Science Research Notes, July 2023. DOI: 10.24132/csrn.3301.14

The detection of mirrors is a challenging task due to their lack of a distinctive appearance and the visual similarity of reflections with their surroundings. While existing systems have achieved some success in mirror segmentation, the design of lightweight models remains unexplored, and datasets are mostly limited to clear mirrors in indoor scenes. In this paper, we propose a new dataset consisting of 454 images of outdoor mirrors and reflective surfaces. We also present a lightweight edge-guided convolutional neural network based on PMDNet. Our model uses EfficientNetV2-Medium as its backbone and employs parallel convolutional layers and a lightweight convolutional block attention module to capture both low-level and high-level features for edge extraction. It registered maximum F-measure scores of 0.8483, 0.8117, and 0.8388 on the Mirror Segmentation Dataset (MSD), the Progressive Mirror Detection (PMD) dataset, and our proposed dataset, respectively. Applying filter pruning via geometric median resulted in maximum F-measure scores of 0.8498, 0.7902, and 0.8456, respectively, performing competitively with the state-of-the-art PMDNet but with 78.20x fewer floating-point operations and 238.16x fewer parameters. The code and dataset are available at https://github.com/memgonzales/mirror-segmentation.

Real-Time Visual Analytics for Remote Monitoring of Patient's Health
Maryam Boumrah, S. Garbaya, A. Radgui
Computer Science Research Notes, July 2023. DOI: 10.24132/csrn.3301.61

The recent proliferation of advanced data collection technologies for Patient Generated Health Data (PGHD) has made remote health monitoring more accessible. However, the complex nature of the large volume of medically generated data presents a significant challenge for traditional patient monitoring approaches, impeding the effective extraction of useful information. In this context, it is imperative to develop a robust and cost-effective framework that provides scalability and deals with the heterogeneity of PGHD in real time. Such a system could serve as a reference and guide future research on monitoring patients undergoing treatment at home. This study presents a real-time visual analytics framework offering insightful visual representations of multimodal big data. The proposed system was designed following the principles of User Centered Design (UCD) to ensure that it meets the needs and expectations of medical practitioners. The usability of the framework was evaluated by applying it to the visualization of kinematic data of patients' upper-limb movements during neuromotor rehabilitation exercises.

Operational theater generation by a descriptive language
Matis Ghiotto, B. Desbenoit, Romain Raffin
Computer Science Research Notes, July 2023. DOI: 10.24132/csrn.3301.19

3D landscape generation is an interdisciplinary field that requires expertise in both computer graphics and geographic information systems (GIS), and it is a complex and time-consuming process. In this paper, we present a new approach to simplify the 3D environment generation process by creating a go-between data model containing a list of available source data and the steps to use them. To feed the data model, we introduce a formal language that describes the process's sequence. We propose an adapted format, designed to be both human-readable and machine-readable, allowing for easy creation and modification of the scenery. We demonstrate the utility of our approach by implementing a prototype system that generates 3D landscapes, with a use case fit for multipurpose simulation. Our system takes a description as input and outputs a complete 3D environment, including terrain and feature elements such as buildings created by a chosen geometrical process. Experiments show that our approach reduces the time and effort required to generate a 3D environment, making it accessible to a wider range of users without extensive knowledge of GIS. In conclusion, our custom language and implementation provide a simple and effective solution to the complexity of 3D terrain generation, making it a valuable tool for users in the field.

Raytracing Renaissance: An elegant framework for modeling light at Multiple Scales
S. Semwal
Computer Science Research Notes, July 2023. DOI: 10.24132/csrn.3301.2

Ray tracing remains of interest to the Computer Graphics community for its elegant framing of how light interacts with objects, its ability to easily support multiple light sources, and its simple framework for merging synthetic and real cameras. Recent trends toward chip-level implementations mean that ray tracing's constant quest for realism will propel its use in real-time applications. AR/VR, animation, the 3D games industry, large-scale 3D simulations, and future social computing platforms are just a few examples of possible major impact. Ray tracing is also appealing to the HCI community because it extends well along 3D space and time, seamlessly blending synthetic and real cameras at multiple scales to support storytelling. This presentation will include a few milestones from my work, such as the Slicing Extent technique and Directed Safe Zones. Our recent work applying machine learning techniques to create novel synthetic views, which could also provide a future doorway to handling dynamic scenes as more compute power becomes available, will also be presented. It is once again a renaissance for ray tracing, which for the last 50+ years has remained the most elegant technique for modeling light phenomena in virtual worlds at whatever scale compute power could support.

Sex Classification of Face Images using Embedded Prototype Subspace Classifiers
A. Hast
Computer Science Research Notes, July 2023. DOI: 10.24132/csrn.3301.7

In recent academic literature, Sex and Gender have come to be used as synonyms, even though distinct definitions exist. This gives rise to the question: which of the two are face image classifiers actually identifying? It will be argued and explained why CNN-based classifiers generally identify gender, while feeding face recognition feature vectors into a neural network tends to verify sex rather than gender. It is shown for the first time how state-of-the-art sex classification can be performed using Embedded Prototype Subspace Classifiers (EPSC), and how the projection depth can be learned efficiently. The automatic gender classification produced by the InsightFace project is used as a baseline and compared to the results of the EPSC, which takes the feature vectors produced by InsightFace as input. It turns out that the projection depth needed is much larger for these face feature vectors than, for example, when classifying MNIST or similar datasets. Therefore, one important contribution is a simple method to determine the optimal depth for any kind of data. Furthermore, it is shown how the weights in the final layer can be set to make the choice of depth stable and independent of the kind of learning data. The resulting EPSC is extremely lightweight and yet very accurate, reaching over 98% accuracy on several datasets.

First Considerations in Computing and Using Hypersurface Curvature for Energy Efficiency
Jacob D. Hauenstein, Timothy S. Newman
Computer Science Research Notes, July 2023. DOI: 10.24132/csrn.3301.22

Energy consumption for computing and using hypersurface curvature in volume dataset analysis and visualization is studied here. Baseline usage, as well as usage when certain optimization steps are applied, including compiler optimizations and variant memory layout strategies, is considered for both analysis and volume visualization tasks. The focus is on x86, which is popular and has power measurement capabilities. The work aims to advance understanding of computing's energy footprint and to provide guidance for energy-responsible volume data analysis.

Polychromatism of all light waves: new approach to the analysis of the physical and perceptive color aspects
Justyna Niewiadomska-Kaplar
Computer Science Research Notes, July 2023. DOI: 10.24132/csrn.3301.43

Research on light vision mechanisms in biosystems and on the mechanisms behind deficits in color discrimination [1] reveals that not only white light is polychromatic: all light waves are. The spectrum of white light is composed of aggregations of only four monochromatic waves, magenta UV 384 nm, cyan 432 nm, yellow 576 nm, and magenta IR 768 nm, grouped into five bi-chromatic waves: cinnabar red (magenta IR + yellow), green (yellow + cyan), indigo (cyan + magenta UV), and two semi-bright bi-chromatic waves, porphyry IR (a semi-infrared wave composed of the magenta IR 768 nm wave and the colorless infrared 864 nm wave) and porphyry UV (a semi-ultraviolet wave composed of the magenta UV 384 nm wave and the colorless ultraviolet 288 nm wave). The light waves thus composed create light sensations through the mechanism of additive synthesis. The method allows a new approach to interpreting the composition of bright waves, the phenomenon of color decomposition, and the additive synthesis that constitutes the principle of color production in computers. The new, elaborated models of color physics also constitute a basis for interpreting the mechanisms of color vision.

Detection of Dangerous Situations Near Pedestrian Crossings using In-Car Camera
M. Kubanek, Lukasz Karbowiak, J. Bobulski
Computer Science Research Notes, July 2023. DOI: 10.24132/csrn.3301.41

The paper presents a method for detecting dangerous situations near pedestrian crossings using an in-car camera system. The approach utilizes deep learning-based object detection to identify pedestrians and vehicles, analyzing their behavior to identify potential hazards. The system incorporates vehicle sensor data for enhanced accuracy. Evaluation results show high accuracy in detecting dangerous situations. The proposed system can potentially enhance pedestrian and driver safety in urban transportation.
