Proceedings of the 5th International Conference on Bioinformatics Research and Applications: Latest Papers

The Optimal Number and Distribution of Channels in Mental Fatigue Classification Based on GA-SVM
Yinhe Sheng, Kang Huang, Liping Wang, Pengfei Wei
Mental fatigue is closely related to daily life and work, and a considerable number of studies have achieved good results in quantifying and predicting it. Some studies have achieved high accuracy using only a single channel, and a few have explored optimal solutions for feature and channel selection. However, detailed research on optimally positioning the electrodes and determining the number of channels is rarely seen. In this study, by designing a novel genetic operator and applying the GA-SVM model, we compared different maximum numbers of optimal channels and their distributions. The results suggest that the classification accuracy almost reaches its optimum (94.0 ± 5.3%) when the maximum number of channels reaches 5, and that it is not affected by the epoch length. Whole-brain topographic map analysis of the optimal channels shows that they are mainly distributed in the prefrontal, occipital, and temporal lobes, while hardly any are located in the parietal lobe, which indicates that mental fatigue induced by a visual search task is characterized similarly across individuals and is highly task-related.
DOI: https://doi.org/10.1145/3309129.3309140
Published: 2018-12-27
Citations: 0
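The abstract above does not describe its genetic operator, so the following is only a minimal sketch of one way a GA mutation step could respect a cap on the number of selected channels. The bitmask encoding, the names `repair` and `capped_mutation`, the 32-channel montage, and the mutation rate are all assumptions made for illustration, not the authors' method.

```python
import random

def repair(mask, max_channels, rng):
    """Switch off randomly chosen channels until at most max_channels stay on."""
    on = [i for i, bit in enumerate(mask) if bit]
    excess = len(on) - max_channels
    fixed = list(mask)
    if excess > 0:
        for i in rng.sample(on, excess):
            fixed[i] = 0
    return fixed

def capped_mutation(mask, max_channels, rate, rng):
    """Flip each bit with probability `rate`, then enforce the channel cap."""
    flipped = [bit ^ 1 if rng.random() < rate else bit for bit in mask]
    return repair(flipped, max_channels, rng)

rng = random.Random(0)
parent = [1] * 8 + [0] * 24  # hypothetical 32-channel montage, 8 channels on
child = capped_mutation(parent, max_channels=5, rate=0.1, rng=rng)
```

In a full GA-SVM loop, each such mask would select the EEG channels fed to an SVM, and the cross-validated accuracy would serve as the fitness; that part is omitted here.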
Proposed Battlefield Simulator Using GPU
S. Chaware, Omkar Udawant, Kiran Joshi, Tejas Deshpande
A battlefield is an area where the opposition's attack cannot be predicted. The situation may become worse when enemy tankers attack from various positions, leaving little chance to think about our own security. If by any means we can analyze the battle situation, we can more easily decide on a strategy against any attack. This entire environment can be simulated, letting us decide how to attack and defend ourselves. In this paper, we propose a battlefield simulator that helps eliminate the manual effort of artillery testing and the associated demonstration cost. The simulator takes parameters such as the type of artillery to be tested, environmental conditions, and strategic planning. Damage caused by the artillery is calculated using physics formulae designed to yield realistic results. We compared CPU and GPU execution and found that the GPU is much faster than the CPU and gives greater accuracy.
DOI: https://doi.org/10.1145/3309129.3309131
Published: 2018-12-27
Citations: 0
The Effect of Machine Learning Algorithms on Metagenomics Gene Prediction
Amani A. Al-Ajlan, Achraf El Allali
The development of next-generation sequencing facilitates the study of metagenomics. Computational gene prediction aims to find the location of genes in a given DNA sequence. Gene prediction in metagenomics is a challenging task because of the short and fragmented nature of the data. Our previous framework, minimum redundancy maximum relevance - support vector machines (mRMR-SVM), produced promising results in metagenomics gene prediction. In this paper, we review available metagenomics gene prediction programs and study the effect of the machine learning approach on gene prediction by altering the underlying machine learning algorithm in our previous framework. Overall, SVM produces the highest accuracy based on tests performed on a simulated dataset.
DOI: https://doi.org/10.1145/3309129.3309136
Published: 2018-12-27
Citations: 2
DNA Computing Sequence Design Based on Bacterial Foraging Algorithm
Jiankang Ren, Yao Yao
Since the quantity and quality of DNA sequences directly affect the accuracy and efficiency of computation, DNA sequence design is critical to DNA computing. To improve the reliability of DNA computing, a rich literature aims to make DNA sequences hybridize specifically at a lower melting temperature, with no non-complementary base pairs or mismatch hybridization in the re-formed double helix. However, most of these methods do not control the melting temperature well enough, because the DNA sequence design problem under Hamming distance, secondary structure, and molecular thermodynamic constraints is known to be NP-hard. To achieve a lower and similar melting temperature for each DNA sequence, we propose a DNA sequence coding method based on the Bacterial Foraging Algorithm (BFA). An evaluation criterion is proposed to assess the quality of DNA sequences during the optimization process. With BFA, high-quality DNA strands are replicated so that inferior strands do not participate in the operation. Experiments show that our approach significantly outperforms existing methods in terms of continuity and melting temperature.
DOI: https://doi.org/10.1145/3309129.3309147
Published: 2018-12-27
Citations: 3
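Two of the constraints named in the abstract above, Hamming distance and melting temperature, can be illustrated with a short sketch. The abstract does not say which thermodynamic model the authors use; the Wallace rule below is a well-known rough estimate for short oligonucleotides and is swapped in here purely as an assumption. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def wallace_tm(seq):
    """Wallace rule estimate in degrees C: 2 per A/T base plus 4 per G/C base."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def tm_spread(seqs):
    """Gap between the highest and lowest estimated melting temperature."""
    tms = [wallace_tm(s) for s in seqs]
    return max(tms) - min(tms)
```

A sequence-design fitness function could, for example, reward a large minimum pairwise `hamming` distance and a small `tm_spread`; how the paper actually combines its criteria is not stated in the abstract.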
Microrna-1/206 Target both Monocarboxylate Transporter(MCT)-4 and Vascular Endothelial Growth Factor(VEGF)Genes Leading to Inhibition of Tumor Growth
Anas Khaleel, A. Elbakkoush, Amneh H. Tarkhan, Aiman Mahdi
Colorectal cancer (CRC) is one of the most common types of cancer in the world, and its incidence is mostly influenced by lifestyle factors. Despite having a much smaller role, genetics also affects the susceptibility to and development of colorectal cancer. The aim of the present study is to investigate the regulatory functions of candidate microRNAs (miRs) 1 and 206 in the context of solute carrier family 16 member 3 (SLC16A3) and vascular endothelial growth factor (VEGF) expression. To achieve this, 24 oncogenes targeted by miR-1 and miR-206 were analyzed via GeneMANIA. The miRTarBase database was then employed to ascertain the nature of the miR-oncogene relationship. Our findings illustrate that miR-1/206 indirectly reduce CRC growth and infiltration by targeting both the SLC16A3 and VEGF genes. Moreover, miR-1/206 target the VEGF gene to reduce tumor angiogenesis and vasculature. In conclusion, the results of the current study illustrate a novel regulation pathway in CRC cells, suggesting new potential lines of CRC therapy.
DOI: https://doi.org/10.1145/3309129.3309144
Published: 2018-12-27
Citations: 0
Detection of Central Sleep Apnea Based on a Single-Lead ECG
P. D. Hung
Central sleep apnea (CSA) is a sleep-related disorder in which breathing is either diminished or absent, typically for 10 to 30 seconds, intermittently or in cycles. CSA is usually due to an instability in the body's feedback mechanisms that control respiration. CSA can also be an indicator of Arnold-Chiari malformation. Therefore, various attempts have been made to produce a monitoring system for automatic CSA scoring to reduce clinical effort. This paper describes a system that can identify CSA by means of a single-lead ECG and a Multilayer Perceptron (MLP) network. Results show that a minute-by-minute classification accuracy of over 83% is achievable.
DOI: https://doi.org/10.1145/3309129.3309132
Published: 2018-12-27
Citations: 16
Mental Stress Evaluation of Car Driver in Different Road Complexity Using Heart Rate Variability (HRV) Analysis
S. Sugiono, Denny Widhayanuriyawan, Debrina P. Andriyani
Controlling driver stress levels is a popular research topic and an important factor in reducing the risk of road accidents. The aim of this paper is to analyze the impact of road complexity on driver stress level based on the physiological factor of heart rate variability (HRV). The first step of the research is a literature study on human stress, HRV, electrocardiography (ECG), and NASA TLX mental workload. Each driver was monitored by ECG, and every heart rate change was recorded under three road conditions: city roads, rural roads, and motorways. The sample consists of 26 male drivers with an average age of 21 years and average driving experience of 4.08 years. Driver mental stress was assessed by the frustration level (F) in the NASA TLX questionnaire (subjective measurement) and by the time-domain HRV measure mRR (objective measurement). The statistical test demonstrated no significant difference in driver mental stress level between mRR and F - NASA TLX. City roads produced an average F - NASA TLX of 3.92 and mRR of 612.40 ms, rural roads an average F - NASA TLX of 3.46 and mRR of 621.26 ms, and motorways an average F - NASA TLX of 2.50 and mRR of 820.20 ms. In short, the mRR of HRV data can be used to monitor driver mental stress in real time, so it could be beneficially implemented in car alert safety systems.
DOI: https://doi.org/10.1145/3309129.3309145
Published: 2018-12-27
Citations: 7
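To make the mRR figures above easier to read, the sketch below converts them into beats per minute. It assumes that mRR denotes the mean RR interval in milliseconds, which the abstract implies but does not define; the function names are hypothetical, and only the three reported averages are taken from the abstract.

```python
def mean_rr(rr_ms):
    """Mean RR interval (ms) from a list of beat-to-beat intervals."""
    return sum(rr_ms) / len(rr_ms)

def heart_rate_bpm(mrr_ms):
    """Average heart rate implied by a mean RR interval given in ms."""
    return 60000.0 / mrr_ms

# Average mRR values as reported in the abstract (ms).
reported_mrr = {"city": 612.40, "rural": 621.26, "motorway": 820.20}
rates = {road: round(heart_rate_bpm(m), 1) for road, m in reported_mrr.items()}
```

Under that assumption, the shorter mean RR interval on city roads corresponds to a faster average heart rate than on motorways, which matches the direction of the reported frustration scores.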
Biotable: A Tool to Extract Semantic Structure of Table in Biology Literature
Daipeng Luo, Jing Peng, Yuhua Fu
The volume of published biological literature is increasing year by year, and important information in biomedical articles may appear only in tables. However, research on information extraction from tables is rare. Currently there are two approaches to table mining. In the first, researchers convert the document to HTML, but the conversion quality is poor. In the second, researchers use documents in XML format directly, but the number of XML documents is limited. To solve this problem, we propose Biotable, a tool for mining biological tables in PDF documents. After converting each page of the PDF into an image, we use the concept of Connected Value to locate the table boundary and each cell. When analyzing the table header, we convert all heterogeneous table headers into one row, which gives a better understanding of the semantics of each column. Based on Biotable and the pipeline proposed by QTLMiner, we performed a table mining experiment on QTLMiner's dataset. Table detection achieves a precision of 98.12% and a recall of 93.14%. The recall of QTL statements is 86.53%.
DOI: https://doi.org/10.1145/3309129.3309139
Published: 2018-12-27
Citations: 1
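The header step described above, collapsing heterogeneous multi-row headers into one row, might be sketched as follows. This is not Biotable's implementation: the function name, the separator, the example header, and the rule that an empty cell in a non-final header row continues the span to its left are all assumptions chosen for illustration.

```python
def flatten_headers(header_rows, sep=" / "):
    """Collapse multi-row table headers into a single row of column labels."""
    width = max(len(row) for row in header_rows)
    filled = []
    for idx, row in enumerate(header_rows):
        padded = list(row) + [""] * (width - len(row))
        if idx < len(header_rows) - 1:
            # Assumption: a blank cell in an upper row continues the span on its left.
            last = ""
            for col, cell in enumerate(padded):
                if cell:
                    last = cell
                padded[col] = last
        filled.append(padded)
    flat = []
    for col in range(width):
        parts = []
        for row in filled:
            label = row[col]
            if label and (not parts or parts[-1] != label):
                parts.append(label)
        flat.append(sep.join(parts))
    return flat

# Hypothetical two-row header in which "QTL" spans two sub-columns.
rows = [["Trait", "QTL", "", "LOD"],
        ["", "Chr", "Position", ""]]
labels = flatten_headers(rows)
```

Each resulting label carries its parent header, so a column such as "Position" keeps the context that it belongs to the "QTL" group.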
A Study on Optimizing MarkDuplicate in Genome Sequencing Pipeline 基因组测序流水线中MarkDuplicate优化研究
Qi Zhao
MarkDuplicate is typically one of the most time-consuming operations in the whole genome sequencing pipeline. Picard tool, which is widely used by biologists to sort reads in genome data and mark duplicate reads in sorted genome data, has relatively low performance on MarkDuplicate due to its single-thread sequential Java implementation, which has caused serious impact on nowadays bioinformatic researches. To accelerate MarkDuplicate in Picard, we present our two-stage optimization solution as a preliminary study on next generation bioinformatic software tools to better serve bioinformatic researches. In the first stage, we improve the original algorithm of tracking optical duplicate reads by eliminating large redundant operations. As a consequence, we achieve up to 50X speedup for the second step only and 9.57X overall process speedup. At the next stage, we redesign the I/O processing mechanism of MarkDuplicate as transforming between on-disk genome file and in-memory genome data by using ADAM format instead of previous SAM format, and implement cloud-scale MarkDuplicate application by Scala. Our evaluation is performed on top of Spark cluster with 25 worker nodes and Hadoop distributed file system. According to the evaluation results, our cloudscale MarkDuplicate can provide not only the same output but also better performance compared with the original Picard tool and other existing similar tools. Specifically, among the 13 sets of real whole genome data we used for evaluation at both stages, the best improvement we gain is reducing runtime by 92 hours in total. Average improvement reaches 48.69 decreasing hours.
MarkDuplicate通常是全基因组测序管道中最耗时的操作之一。生物学家广泛使用Picard工具对基因组数据中的reads进行排序,并在排序后的基因组数据中标记重复的reads,但由于其单线程顺序Java实现,使得其在MarkDuplicate上的性能相对较低,严重影响了当今的生物信息学研究。为了加速Picard中的MarkDuplicate,我们提出了两阶段优化方案,作为下一代生物信息学软件工具的初步研究,以更好地服务于生物信息学研究。在第一阶段,我们通过消除大冗余操作来改进原有的光学重复读取跟踪算法。因此,我们仅在第二步就实现了高达50倍的加速,而整个过程的加速则达到了9.57倍。下一步,我们将把MarkDuplicate的I/O处理机制重新设计为磁盘基因组文件和内存基因组数据之间的转换,使用ADAM格式代替之前的SAM格式,并通过Scala实现云规模的MarkDuplicate应用。我们的评估是在具有25个工作节点和Hadoop分布式文件系统的Spark集群上执行的。根据评估结果,我们的云规模MarkDuplicate不仅可以提供相同的输出,而且与原有的Picard工具和其他现有的类似工具相比,性能更好。具体而言,在我们用于两个阶段评估的13组真实全基因组数据中,我们获得的最佳改进是总共减少了92小时的运行时间。平均改善时间达到48.69小时。
DOI: 10.1145/3309129.3309134 (https://doi.org/10.1145/3309129.3309134). Published: 2018-12-27.
Citations: 2
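As a rough illustration of what a duplicate-marking step does, the sketch below groups reads by alignment coordinates and flags all but the best-scoring read in each group. It is a simplified sketch under stated assumptions, not Picard's implementation: the `Read` class, its field names, and the "keep the highest base-quality read" rule are hypothetical choices made for this example, and it does not model paired-end reads or the optical-duplicate tracking that the MarkDuplicate abstract mentions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical minimal read record, for illustration only."""
    name: str
    chrom: str
    pos: int            # alignment start position
    reverse: bool       # True if aligned to the reverse strand
    base_quality: int   # summed base quality, used to pick the read to keep
    duplicate: bool = False


def mark_duplicates(reads):
    """Flag all but the highest-quality read in each (chrom, pos, strand) group."""
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.reverse)].append(read)

    for group in groups.values():
        # max() returns the first read with the highest score, so ties keep the earliest read.
        best = max(group, key=lambda r: r.base_quality)
        for read in group:
            read.duplicate = read is not best
    return reads
```

Calling `mark_duplicates` on a list of `Read` objects sets the `duplicate` flag in place. Reads that share a chromosome, position, and strand form one group, and a read that is alone in its group is never flagged.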
Breast Cancer Prediction Using Spark MLlib and ML Packages
P. D. Hung, Tran Duc Hanh, V. Diep
Machine learning is now applied in many aspects of life, especially in health care. Classification with machine learning has improved greatly, making it possible to generate predictions and to support doctors in making diagnoses. Furthermore, human lives are changing with Big Data, which covers a wide array of scientific knowledge, and with Data Mining, which solves problems by analyzing data and discovering patterns in existing databases. The prediction process is heavily data-driven, and therefore advanced machine learning techniques are often used. In this paper, we look at what types of experimental data are typically used, perform preliminary analysis on them, and generate breast cancer prediction models, all with PySpark and its machine learning frameworks. Using a database with more than a hundred sets of data gathered in routine blood analysis, the accuracy rates of detection and classification are about 72% and 83%, respectively.
DOI: 10.1145/3309129.3309133 (https://doi.org/10.1145/3309129.3309133). Published: 2018-12-27.
Citations: 23