Advancing forensic research: An examination of compositional data analysis with an application on petrol fraud detection

IF 1.9 | Zone 4 (Medicine) | JCR Q2 (Medicine, Legal) | Science & Justice | Published: 2023-11-24 | DOI: 10.1016/j.scijus.2023.11.003
M. Templ, J. Gonzalez-Rodriguez
Citations: 0

Abstract

In recent years, numerous studies have examined the chemical compounds in petrol, and data derived from petrol, for forensic research. Standard quantitative methods often assume that the variables (compounds) are free of compositional constraints, that is, that they are not parts of a constrained whole, and therefore operate in a Euclidean vector space. However, chemical compounds are typically parts of a whole, and the appropriate vector space for their analysis is the simplex. Applying statistical analyses to such data without proper preprocessing leads to biased and arbitrary results. Compositional data analysis has not yet been considered in forensic science. We therefore compare the classical statistical analyses used in forensic research with the compositional data analysis (CoDa) paradigm that we propose for this field, and demonstrate how CoDa improves analyses in petrol and forensic science. Our study shows how principal component analysis (PCA) and classification results are affected by the preprocessing steps applied to the raw data.
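For readers new to CoDa, the standard definitions behind "the simplex" and "log ratio" are summarized below. This is textbook background, not the authors' stated pipeline; the abstract does not say which log-ratio coordinates were used. A composition with D strictly positive parts lies on the simplex S^D with an arbitrary constant total kappa (for example 1 or 100). The closure operation C rescales any positive vector to that total, and the centered log-ratio (clr) transform expresses each part relative to the geometric mean g(x):

```latex
\begin{aligned}
\mathcal{S}^D &= \Big\{ \mathbf{x} = (x_1, \dots, x_D) : x_i > 0,\ \textstyle\sum_{i=1}^{D} x_i = \kappa \Big\} \\
\mathcal{C}(\mathbf{x}) &= \left( \frac{\kappa\, x_1}{\sum_{j=1}^{D} x_j}, \dots, \frac{\kappa\, x_D}{\sum_{j=1}^{D} x_j} \right) \\
\operatorname{clr}(\mathbf{x}) &= \left( \ln\frac{x_1}{g(\mathbf{x})}, \dots, \ln\frac{x_D}{g(\mathbf{x})} \right),
\qquad g(\mathbf{x}) = \Big( \prod_{i=1}^{D} x_i \Big)^{1/D} \\
\sum_{i=1}^{D} \operatorname{clr}_i(\mathbf{x}) &= 0
\end{aligned}
```

Because clr coordinates always sum to zero, their covariance matrix is singular, so a PCA computed on them always has one zero eigenvalue. Log-ratio PCA is nevertheless commonly run on clr coordinates, or on isometric log-ratio (ilr) coordinates that give the same nonzero eigenvalues, rather than on raw concentrations.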

Our results indicate that a log-ratio analysis provides better separation between subgroups of the data and makes the results easier to interpret. In addition, compositional analysis yields higher classification accuracy. Even a non-linear classification method (in our case, a random forest) performed poorly when applied without compositional methods. Moreover, normalizing samples to remove laboratory or unit-of-measurement effects is no longer necessary: in compositional thinking, an observation is equivalent to any positive multiple of itself, because the log ratios computed from the raw data and from rescaled (normalized) data are identical.
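The scale-invariance argument can be checked numerically. The sketch below is a minimal illustration, not the authors' code. It assumes the centered log-ratio (clr) transform as the log-ratio representation, and the four concentration values are made up. It expresses one hypothetical sample in three ways (raw units, closed to 100 %, and multiplied by 1000 as a stand-in for a different unit) and checks that the clr coordinates and a pairwise log ratio come out the same each time.

```python
import math

def close(x, total=1.0):
    """Closure: rescale a vector of positive parts so that it sums to `total`."""
    s = sum(x)
    return [total * v / s for v in x]

def clr(x):
    """Centered log-ratio transform: ln(x_i / g(x)), with g(x) the geometric mean."""
    if any(v <= 0 for v in x):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)  # ln of the geometric mean g(x)
    return [lv - mean_log for lv in logs]

def log_ratio(x, i, j):
    """Pairwise log ratio ln(x_i / x_j)."""
    return math.log(x[i] / x[j])

def same(a, b, tol=1e-12):
    """Element-wise comparison up to floating-point rounding."""
    return len(a) == len(b) and all(
        math.isclose(p, q, abs_tol=tol) for p, q in zip(a, b)
    )

# Hypothetical concentrations of four compounds in one petrol sample (made-up values).
raw = [12.0, 3.5, 0.8, 40.0]
as_percent = close(raw, 100.0)          # same sample normalized to percentages
other_unit = [1000.0 * v for v in raw]  # same sample in a unit 1000 times smaller (e.g. g to mg)

print(same(clr(raw), clr(as_percent)))  # True
print(same(clr(raw), clr(other_unit)))  # True
print(math.isclose(log_ratio(raw, 0, 1), log_ratio(as_percent, 0, 1)))  # True
print(abs(sum(clr(raw))) < 1e-12)  # True: clr coordinates sum to zero
```

The check compares whole clr vectors rather than a single ratio because clr coordinates are what a log-ratio PCA would operate on. The invariance follows directly from the definition: multiplying every part by c > 0 adds ln c to each log and to their mean, so the difference is unchanged.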

Petrol data from different petrol stations in Brazil are used for the demonstration. Such petrol is highly susceptible to counterfeiting, and detecting fraud through forensic analysis of its chemical elements requires unbiased statistical methods designed for compositional data.

Based on these results, we recommend compositional data methods in forensic science for the chemical element analysis of gasoline (petrol) and for gasoline product characterization, authentication, and fraud detection.

Source journal: Science & Justice
Subject category: Medicine, Pathology
CiteScore: 4.20
Self-citation rate: 15.80%
Publication volume: 98
Review time: 81 days
About the journal: Science & Justice provides a forum to promote communication and the publication of original articles, reviews and correspondence on subjects that spark debate within the Forensic Science Community and the criminal justice sector. The journal provides a medium whereby all aspects of applying science to legal proceedings can be debated and progressed. Science & Justice is published six times a year and will be of interest primarily to practising forensic scientists and their colleagues in related fields. It is chiefly concerned with the publication of formal scientific papers, in keeping with its international learned status, but will not accept any article describing experimentation on animals that does not meet strict ethical standards. Its aims are to promote communication and informed debate within the Forensic Science Community and the criminal justice sector; to promote the publication of learned and original research findings from all areas of the forensic sciences and thereby advance the profession; to promote the publication of case-based material by way of case reviews; to promote the publication of conference proceedings of interest to the forensic science community; and to appeal to all those with an interest in the forensic sciences.