Pub Date: 2023-09-01. DOI: 10.1177/1536867x231196496
Kristoffer Bjärkefur, Luíza Cardoso de Andrade, Benjamin Daniels
Data collection and cleaning workflows implement highly repetitive but extremely important processes. In this article, we describe an update to iefieldkit, a package developed to standardize and simplify best practices for high-quality primary data collection across the World Bank’s Development Impact Evaluation department. The first release of iefieldkit provided workflows to automate error checking for Open Data Kit-based survey modules, duplicate management, data cleaning, and codebook creation. This update to the package includes improved commands to document and implement data point corrections, verify the structure or contents of data using codebooks, and create replication-ready data through automated variable subsetting.
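The iefieldkit commands themselves run in Stata; outside Stata, the two core ideas in the abstract — flag duplicate IDs and apply a documented list of data-point corrections — can be sketched roughly as below. All names and records here are illustrative assumptions, not iefieldkit's actual interface.

```python
# Sketch of duplicate flagging and documented data-point corrections,
# loosely inspired by the workflow described above (not iefieldkit itself).

def flag_duplicates(records, key):
    """Return the set of key values that appear more than once."""
    counts = {}
    for rec in records:
        counts[rec[key]] = counts.get(rec[key], 0) + 1
    return {k for k, n in counts.items() if n > 1}

def apply_corrections(records, corrections, key):
    """Apply documented corrections given as (id, field, old, new) tuples.
    A correction is applied only when the current value matches `old`,
    so every change is both guarded and self-documenting."""
    applied = []
    for cid, field, old, new in corrections:
        for rec in records:
            if rec[key] == cid and rec[field] == old:
                rec[field] = new
                applied.append((cid, field, old, new))
    return applied

records = [
    {"id": "101", "age": 42},
    {"id": "102", "age": 35},
    {"id": "102", "age": 35},   # duplicate submission
    {"id": "103", "age": 250},  # data-entry error
]
dups = flag_duplicates(records, "id")
log = apply_corrections(records, [("103", "age", 250, 25)], "id")
```

Keeping the correction list as data (rather than ad hoc edits) is what makes the cleaning step documentable and replayable.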
Title: iefieldkit: Commands for primary data collection and cleaning (update). Stata Journal.
Pub Date: 2023-09-01. DOI: 10.1177/1536867x231196295
Babak Choodari-Oskooei, Daniel J. Bratton, Mahesh K. B. Parmar
We introduce two commands, nstagebin and nstagebinopt, that can be used to facilitate the design of multiarm multistage (MAMS) trials with binary outcomes. MAMS designs are a class of efficient and adaptive randomized clinical trials that have successfully been used in many disease areas, including cancer, tuberculosis, maternal health, COVID-19, and surgery. The nstagebinopt command finds a class of efficient “admissible” designs based on an optimality criterion using a systematic search procedure. The nstagebin command calculates the stagewise sample sizes, trial timelines, and overall operating characteristics of MAMS designs with binary outcomes. Both commands allow the use of Dunnett’s correction to account for multiple testing. We also use the ROSSINI 2 MAMS design, an ongoing MAMS trial in surgical wound infection, to illustrate the capabilities of both commands. The new commands facilitate the design of MAMS trials with binary outcomes where more than one research question can be addressed under one protocol.
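The abstract notes that both commands support Dunnett's correction for multiple testing. The reason such a correction matters in a MAMS design — several arms tested against one shared control, inducing correlation 0.5 between test statistics under equal allocation — can be illustrated with a small Monte Carlo sketch (this is not nstagebin; the simulation layout and numbers are assumptions for illustration):

```python
import numpy as np

# Monte Carlo sketch of why a Dunnett-type correction is needed when
# several arms share one control arm (illustrative, not nstagebin itself).
rng = np.random.default_rng(12345)
k, nsim = 3, 200_000                          # 3 research arms vs 1 control

u = rng.standard_normal((nsim, k + 1))        # column 0 is the control arm
z = (u[:, 1:] - u[:, :1]) / np.sqrt(2)        # corr(z_i, z_j) = 0.5
zmax = z.max(axis=1)

fwer_naive = (zmax > 1.645).mean()            # unadjusted one-sided 5% tests
crit = np.quantile(zmax, 0.95)                # Dunnett-style simultaneous critical value
fwer_adj = (zmax > crit).mean()               # familywise error restored to ~5%
```

With three comparisons the unadjusted familywise error rate roughly doubles, while testing each arm against the simulated simultaneous critical value holds it at the nominal 5%.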
Title: Facilities for optimizing and designing multiarm multistage (MAMS) randomized controlled trials with binary outcomes. Stata Journal.
Pub Date: 2023-09-01. DOI: 10.1177/1536867x231196288
John Tazare, Liam Smeeth, Stephen J. W. Evans, Ian J. Douglas, Elizabeth J. Williamson
Large healthcare databases are increasingly used for research investigating the effects of medications. However, a key challenge is capturing hard-to-measure concepts (often relating to frailty and disease severity) that can be crucial for successful confounder adjustment. The high-dimensional propensity score has been proposed as a data-driven method to improve confounder adjustment within healthcare databases and was developed in the context of administrative claims databases. We present hdps, a suite of commands implementing this approach in Stata that assesses the prevalence of codes, generates high-dimensional propensity-score covariates, performs variable selection, and provides investigators with graphical tools for inspecting the properties of selected covariates.
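The hdps command itself runs in Stata; the covariate-generation step the abstract mentions follows the general high-dimensional propensity-score recipe of expanding each code's patient-level count into binary recurrence indicators (ever, at or above the median, at or above the 75th percentile). A simplified sketch of that one step, with illustrative names and data:

```python
# Rough sketch of hdPS-style covariate generation: each code's count per
# patient becomes binary recurrence indicators. Simplified; thresholds are
# computed over patients with at least one occurrence, as in the usual recipe.
from statistics import median, quantiles

def hdps_covariates(counts_by_patient, code):
    """counts_by_patient: {patient_id: number of occurrences of `code`}."""
    positive = [c for c in counts_by_patient.values() if c > 0]
    med = median(positive)
    q75 = quantiles(positive, n=4)[2]   # 75th percentile of positive counts
    covs = {}
    for pid, c in counts_by_patient.items():
        covs[pid] = {
            f"{code}_once": int(c >= 1),
            f"{code}_sporadic": int(c >= med),
            f"{code}_frequent": int(c >= q75),
        }
    return covs

counts = {"p1": 0, "p2": 1, "p3": 3, "p4": 8}
covs = hdps_covariates(counts, "dx250")
```

Variable selection (e.g., ranking these indicators by potential for confounding) would then run on top of the generated covariates.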
Title: hdps: A suite of commands for applying high-dimensional propensity-score approaches. Stata Journal.
Pub Date: 2023-09-01. DOI: 10.1177/1536867x231196497
Ryan P. Thombs
In this article, I review Environmental Econometrics Using Stata, by Christopher F. Baum and Stan Hurn (2021, Stata Press).
Title: Review of Environmental Econometrics Using Stata, by Christopher F. Baum and Stan Hurn. Stata Journal.
Pub Date: 2023-09-01. DOI: 10.1177/1536867x231196292
Xueren Zhang, Yuan Xue, Chuntao Li
In this article, we introduce the command cntraveltime, which can calculate both the travel distance and the travel time between two locations in China with respect to different modes of transportation (driving, public transport, and cycling). Existing commands such as georoute, traveltime, and mqtime have difficulties in parsing Chinese locations. cntraveltime solves this outstanding challenge via a feature that enables it to call route-planning services from the Baidu Maps Open Platform. The results of rigorous testing on the features of the command show that, relative to similar existing commands, cntraveltime has the highest capacity in terms of functionality and precision. This suggests that it can be regarded as a useful complement to other existing commands, especially when calculating travel distance and time for locations within China.
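Road distance and travel time from a routing service depend on the network, so they cannot be reproduced offline; one sanity check any such result should satisfy, however, is that reported road distance is at least the great-circle distance between the two points, which can be computed directly. A self-contained haversine sketch (coordinates are illustrative, approximately Beijing and Shanghai):

```python
import math

# Great-circle (haversine) distance: a lower bound on any road-network
# distance a routing service such as the one cntraveltime queries can return.
def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

d = haversine_km(39.9042, 116.4074, 31.2304, 121.4737)  # Beijing -> Shanghai
```

The straight-line Beijing–Shanghai distance is roughly 1,070 km; a plausible driving distance returned by a routing service should exceed this.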
Title: cntraveltime: Travel distance and travel time in China. Stata Journal.
Pub Date: 2023-09-01. DOI: 10.1177/1536867x231196519
Nicholas J. Cox
Missing values are common in real datasets, and what to do about them is a large and challenging question. This column focuses on the easiest problems, in which a researcher is clear, or at least highly confident, about what the missing values should be instead, implying a deterministic replacement. The main tricks are copying values from observation to observation and using the ipolate command. Both may often be extended simply to panel or longitudinal datasets or to other datasets with a group structure, such as data on individuals within families or households. The column also shows how to impose constraints so that interpolation is confined to filling gaps between values known to be equal, or to observations moderately close to a known value in time or in some other sequence or position variable.
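In Stata this is the territory of ipolate and by-group replace tricks; the constrained-interpolation idea — fill only interior gaps, and only when the gap is short enough — can be sketched language-agnostically as follows (the function name and gap rule are illustrative assumptions, not the column's code):

```python
# Sketch of deterministic gap filling in the spirit described above:
# linear interpolation within a series, restricted to interior gaps of at
# most `maxgap` consecutive missing values (None). Leading and trailing
# missings, and gaps longer than `maxgap`, are deliberately left alone.
def ipolate_limited(values, maxgap=None):
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for lo, hi in zip(known, known[1:]):
        gap = hi - lo - 1
        if gap == 0 or (maxgap is not None and gap > maxgap):
            continue
        step = (out[hi] - out[lo]) / (hi - lo)
        for i in range(lo + 1, hi):
            out[i] = out[lo] + step * (i - lo)
    return out

series = [None, 10.0, None, None, 16.0, None, None, None, 30.0, None]
filled = ipolate_limited(series, maxgap=2)
```

For grouped data the same function would simply be applied within each group, mirroring Stata's by-group idiom.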
Title: Speaking Stata: Replacing missing values: The easiest problems. Stata Journal.
Pub Date: 2023-06-01. DOI: 10.1177/1536867X231175264
Ben Jann
This article is an update to Jann (2018a, Stata Journal 18: 765–785). It contains a comprehensive discussion of the colorpalette command, including various changes and additions that have been made to the software since its first publication. Command colorpalette provides colors for use in Stata graphics. In addition to Stata’s default colors, colorpalette supports a variety of named colors, a selection of palettes that have been proposed by users, numerous collections of palettes and colormaps from sources such as ColorBrewer, Carto, D3.js, or Matplotlib, as well as color generators in different color spaces. Furthermore, a new command called colorcheck is presented that can be used to evaluate whether colors are distinguishable by people suffering from color vision deficiency.
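colorpalette and colorcheck run in Stata; one simple distinguishability check in the spirit of colorcheck is to flag palette pairs whose grayscale luminances are close, since such colors are hard to tell apart for viewers with total color vision deficiency and in grayscale print. A toy sketch (the Rec. 601 luma weights are standard; the threshold and the example colors are assumptions):

```python
# Toy distinguishability check: flag color pairs with similar grayscale
# luminance. Uses standard Rec. 601 luma coefficients; the threshold of 30
# (on a 0-255 scale) is an illustrative assumption.
def luma(rgb):
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def indistinct_pairs(palette, threshold=30):
    pairs = []
    for i in range(len(palette)):
        for j in range(i + 1, len(palette)):
            if abs(luma(palette[i]) - luma(palette[j])) < threshold:
                pairs.append((i, j))
    return pairs

# Illustrative navy-ish, gold-ish, and maroon-ish colors
palette = [(26, 71, 111), (255, 200, 14), (144, 53, 59)]
flagged = indistinct_pairs(palette)
```

Here the dark blue and dark red are flagged as a risky pair, while the gold separates clearly from both; tools like colorcheck apply more refined color-vision-deficiency models to the same end.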
Title: Color palettes for Stata graphics: An update. Stata Journal 23: 336–385.
Pub Date: 2023-06-01. DOI: 10.1177/1536867X231175314
Maren Eckert, W. Vach
Recently, Eckert and Vach (2020, Biometrical Journal 62: 598–609) pointed out that both confidence and comparison regions are useful tools to visualize uncertainty in a two-dimensional estimate. Both types of regions can be based on inverting Wald tests or likelihood-ratio tests. confcomptwo enables Stata users to draw both types of regions following one of the two principles for various two-dimensional estimation problems. The use of confcomptwo is illustrated by several examples.
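A two-dimensional confidence region obtained by inverting Wald tests is the set of parameter values whose Wald statistic falls below the chi-squared critical value with 2 degrees of freedom. A minimal membership-test sketch (the estimate and covariance matrix are illustrative numbers, not confcomptwo's implementation):

```python
# Minimal sketch of a two-dimensional Wald confidence region: theta0 lies
# inside the 95% region when (theta_hat - theta0)' V^{-1} (theta_hat - theta0)
# is at most the chi-squared(2) 95% quantile, 5.991.
def wald_stat(theta_hat, theta0, vcov):
    d0 = theta_hat[0] - theta0[0]
    d1 = theta_hat[1] - theta0[1]
    (a, b), (c, e) = vcov
    det = a * e - b * c
    # quadratic form with the explicit 2x2 inverse of vcov
    return (e * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det

CRIT_95_DF2 = 5.991  # chi-squared, 2 df, 95% quantile

theta_hat = (1.0, 2.0)
vcov = ((0.25, 0.10), (0.10, 0.50))
inside = wald_stat(theta_hat, (1.2, 1.8), vcov) <= CRIT_95_DF2
outside = wald_stat(theta_hat, (4.0, 5.0), vcov) <= CRIT_95_DF2
```

Drawing the region amounts to tracing the boundary where this statistic equals the critical value; a likelihood-ratio-based region replaces the quadratic form with the log-likelihood difference.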
Title: Visualizing uncertainty in a two-dimensional estimate using confidence and comparison regions. Stata Journal 23: 455–490.
Pub Date: 2023-06-01. DOI: 10.1177/1536867X231175345
Marcos Demetry, P. Hjertstrand
The Houtman–Maks index is a measure of the size of a violation of utility-maximizing (that is, rational) behavior. In this article, we introduce the command hmindex, which calculates the Houtman–Maks index for a dataset of prices and observed choices of a consumer. The command is illustrated with an empirical application.
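The idea behind the index can be sketched by brute force on a toy dataset: find the smallest number of observations to drop so that the remaining price–bundle pairs contain no revealed-preference violation. The sketch below checks only pairwise (WARP) consistency for simplicity, whereas the hmindex command works with full transitive (GARP) consistency and realistic data sizes; prices and quantities are illustrative.

```python
from itertools import combinations

# Brute-force sketch of the Houtman-Maks idea: the minimal number of
# observations to drop so the rest are revealed-preference consistent.
# Pairwise (WARP) check only; illustrative data.
def dot(p, x):
    return sum(a * b for a, b in zip(p, x))

def warp_violation(pi, xi, pj, xj):
    # i weakly revealed preferred to j, and j strictly revealed preferred to i
    return dot(pi, xj) <= dot(pi, xi) and dot(pj, xi) < dot(pj, xj)

def hm_index(prices, bundles):
    n = len(prices)
    def consistent(keep):
        return not any(
            warp_violation(prices[i], bundles[i], prices[j], bundles[j])
            or warp_violation(prices[j], bundles[j], prices[i], bundles[i])
            for i, j in combinations(keep, 2)
        )
    for k in range(n + 1):
        for keep in combinations(range(n), n - k):
            if consistent(keep):
                return k  # minimal number of dropped observations
    return n

prices  = [(1, 1), (1, 2), (2, 1)]
bundles = [(4, 0), (0, 2), (1, 1)]
k = hm_index(prices, bundles)
```

In this toy dataset, dropping the single middle observation removes every violation, so the index is 1; exhaustive search over subsets is exponential, which is why practical implementations need more care.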
Title: Consistent subsets: Computing the Houtman–Maks index in Stata. Stata Journal 23: 578–588.
Pub Date: 2023-06-01. DOI: 10.1177/1536867X231175333
D. Drukker
Stata estimation commands that implement frequentist methods produce an output table that contains multiple tests and multiple confidence intervals. Presumably, the multiple tests and multiple confidence intervals are designed to help determine which parameters are responsible for a possible rejection of the overall null hypothesis of no effect. When taken by itself, each test and each confidence interval provides valid inference about the null hypothesis of no effect for each parameter at the specified error rate. However, simultaneously using two or more of these tests or confidence intervals provides inference at an error rate that exceeds the one specified. In this article, I discuss the sotable command, which provides p-values that are adjusted for the multiple tests and a confidence band that can be used to simultaneously test multiple parameters for no effect after almost all frequentist estimation commands. I also provide an introduction to the literature on simultaneous inference.
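One standard way to build such a simultaneous band — not necessarily sotable's method, which the article itself describes — is the max-|z| approach: simulate from the estimated joint distribution of the coefficients and replace 1.96 with the 95% quantile of the maximum absolute standardized deviate, widening every interval so that all of them cover jointly. A sketch with illustrative numbers:

```python
import numpy as np

# Sketch of a max-|z| simultaneous confidence band. The critical value is
# the 95% quantile of the maximum absolute standardized deviate under the
# estimated joint normal distribution; it exceeds 1.96, so each interval
# widens until all intervals cover simultaneously. Numbers are illustrative.
rng = np.random.default_rng(7)

beta = np.array([0.8, -0.3, 1.5])
vcov = np.array([[0.04, 0.01, 0.00],
                 [0.01, 0.09, 0.02],
                 [0.00, 0.02, 0.16]])
se = np.sqrt(np.diag(vcov))

draws = rng.multivariate_normal(np.zeros(3), vcov, size=100_000)
maxabs = np.abs(draws / se).max(axis=1)
crit = np.quantile(maxabs, 0.95)   # simultaneous critical value, > 1.96

bands = [(b - crit * s, b + crit * s) for b, s in zip(beta, se)]
```

Because the simulated critical value accounts for the correlation between coefficients, it is less conservative than a Bonferroni adjustment while still controlling the joint error rate.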
Title: Simultaneous tests and confidence bands for Stata estimation commands. Stata Journal 23: 518–544.