# NHANES is a nationally representative survey

NHANES is a nationally representative survey of US non-institutionalized civilians. It is conducted in two-year cycles, with approximately 10,000 individuals in each cycle. Interviews elicit information on demographic characteristics (e.g., age, gender, race/ethnicity), smoking habits, and whether a health professional had ever diagnosed the participant with certain medical conditions. Cycles of the NHANES can be combined or analyzed individually. Because NHANES employs a complex, multistage sampling strategy, survey statistics must be used to analyze the data and to generalize findings to the US population. We used the SURVEYLOGISTIC procedure of SAS/STAT© version 9.4 to perform logistic regression that accounted for the complex sampling design: we specified the masked variance pseudo-primary sampling unit (SDMVPSU) and masked variance pseudo-stratum (SDMVSTRA) variables, used the 2-year interview weight (WTINT2YR), adjusted as described below, and estimated the covariance matrix by Taylor series linearization. To account for the pooling of multiple cycles [2], we divided WTINT2YR by the number of cycles used in each analysis. We additionally ran all models within subpopulations defined by age, race/ethnicity, and gender, using the SAS DOMAIN statement to specify these subpopulations so that variances and standard errors were calculated correctly. The associated file SAS CODE.DOCX contains the code used to combine NHANES cycles on their common variables and an example of the PROC SURVEYLOGISTIC code used for the analysis. Following both Vozoris and Rostron, we defined current smokers as participants who were ≥20 years old at the time of the interview and had smoked on ≥1 of the last 30 days. Table 1 shows the variables used in these analyses. 
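
The weighting and domain steps described above can be illustrated with a minimal Python sketch. It is not the SAS code in SAS CODE.DOCX, and it estimates a weighted proportion rather than fitting a logistic regression, but it uses the same ingredients: pooled cycles with WTINT2YR divided by the number of cycles, the SDMVSTRA/SDMVPSU design variables, and a with-replacement Taylor-linearized variance in which records outside a domain stay in the design with zero contribution. Records are assumed to be plain dictionaries; the keys `age` and `days_smoked_30` are hypothetical stand-ins for the corresponding NHANES items, and single-PSU strata are simply skipped.

```python
from collections import defaultdict
from math import sqrt


def pool_cycles(cycles):
    """Stack per-cycle lists of records and divide WTINT2YR by the number of cycles."""
    n_cycles = len(cycles)
    pooled = []
    for records in cycles:
        for record in records:
            row = dict(record)  # copy so the per-cycle data are left unchanged
            row["weight"] = row["WTINT2YR"] / n_cycles
            pooled.append(row)
    return pooled


def is_current_smoker(row):
    # Hypothetical keys: "age" = age in years at interview,
    # "days_smoked_30" = number of the past 30 days on which the person smoked.
    return row["age"] >= 20 and row["days_smoked_30"] >= 1


def domain_proportion(rows, outcome, in_domain):
    """Weighted proportion of `outcome` within a domain and its Taylor-linearized SE.

    Rows outside the domain stay in the design with a zero linearized value,
    mirroring a DOMAIN analysis rather than subsetting the data first.
    """
    w_dom = sum(r["weight"] for r in rows if in_domain(r))
    p = sum(r["weight"] * outcome(r) for r in rows if in_domain(r)) / w_dom

    # Sum linearized values to PSU totals within each stratum.
    totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        z = r["weight"] * (outcome(r) - p) / w_dom if in_domain(r) else 0.0
        totals[r["SDMVSTRA"]][r["SDMVPSU"]] += z

    # With-replacement variance: between-PSU spread within each stratum.
    var = 0.0
    for psu_totals in totals.values():
        n_h = len(psu_totals)
        if n_h < 2:
            continue  # this sketch skips single-PSU strata
        mean_h = sum(psu_totals.values()) / n_h
        var += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in psu_totals.values())
    return p, sqrt(var)
```

Keeping out-of-domain records in the variance calculation, instead of deleting them before the analysis, preserves the full stratum and PSU structure that the variance estimator relies on; this is the reason for using a domain specification rather than subsetting the pooled file.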
We identified cases by self-reported diagnosis, based on the question “Has a doctor or other health professional ever told you that you had [high blood pressure, a heart attack, congestive heart failure, a stroke, or COPD (emphysema or chronic bronchitis)]?” (yes/no). We treated all other responses as non-response and set them to missing. Because stroke was the subject of Van Landingham et al. [4], stroke results are not presented here. For each outcome, we ran three sets of models using data from NHANES 2007–2008 (as used by Vozoris), 1999–2010 (as used by Rostron), and 1999–2012 (all cycles available when we undertook the project) to determine whether the choice of covariates or NHANES cycles influenced the results. First, we implemented the model described by Vozoris (Tables 2–4); second, we implemented the model described by Rostron (Tables 5–7); last, we developed a new model for each outcome using purposeful selection of covariates (Table 8). Purposeful selection of covariates was conducted as follows. The preliminary model contained cigarette type (menthol or non-menthol) and all relevant potential covariates (Table 1), with cigarette type forced to remain in every model. We then identified the covariates, other than cigarette type, with p-values greater than 0.05, dropped the one with the largest p-value, and refit the model, repeating this until only cigarette type and covariates with p-values of 0.05 or less remained. Each covariate that had been dropped was then added back individually, and we calculated the relative percent change in the cigarette-type regression coefficient for the larger model compared with the model containing only statistically significant covariates (Eq. (1)). If including a given covariate changed the cigarette-type coefficient by more than 15%, that covariate was retained in the model. 
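
Assuming Eq. (1) takes the reduced model (cigarette type plus statistically significant covariates only) as the reference and compares magnitudes, the relative percent change in the cigarette-type coefficient $\hat{\beta}$ can be written as

$$\Delta\hat{\beta}_{\%} = 100 \times \frac{\lvert \hat{\beta}_{\text{larger}} - \hat{\beta}_{\text{reduced}} \rvert}{\lvert \hat{\beta}_{\text{reduced}} \rvert}$$

The Python sketch below walks through the backward-elimination and add-back steps under that assumption. It does not fit any model itself: `fit` is a hypothetical callable standing in for a design-based logistic regression (such as the SURVEYLOGISTIC runs), and the thresholds default to the 0.05 and 15% values used here.

```python
def relative_change_pct(beta_reduced, beta_larger):
    """Relative percent change in the cigarette-type coefficient, taking the
    reduced (significant-covariates-only) model as the reference."""
    return 100.0 * abs(beta_larger - beta_reduced) / abs(beta_reduced)


def purposeful_selection(fit, exposure, candidates, alpha=0.05, delta_pct=15.0):
    """Return the covariates kept alongside `exposure`.

    `fit(terms)` is a hypothetical stand-in for a design-based logistic
    regression; it must return (coefs, pvalues), two dicts keyed by term name.
    """
    kept, dropped = list(candidates), []

    # Backward elimination: drop the covariate with the largest p-value > alpha.
    while kept:
        _, pvalues = fit([exposure] + kept)
        worst = max(kept, key=lambda term: pvalues[term])
        if pvalues[worst] <= alpha:
            break
        kept.remove(worst)
        dropped.append(worst)

    # Confounding check: add each dropped covariate back one at a time.
    base_coefs, _ = fit([exposure] + kept)
    confounders = []
    for term in dropped:
        coefs, _ = fit([exposure] + kept + [term])
        if relative_change_pct(base_coefs[exposure], coefs[exposure]) > delta_pct:
            confounders.append(term)
    return kept + confounders
```

Passing the fitting routine in as a parameter keeps the selection logic separate from the survey-design details, so the same steps can be checked with a toy `fit` before being applied to real model output.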
Once we had determined the main-effect covariates to include in the model, we explored all possible interactions among them (excluding cigarette type). We added each interaction term individually to the model containing the main effects; terms with p-values of 0.1 or less were retained if the relevant coefficients were statistically significant (p ≤ 0.05) in the fully adjusted model. We retained statistically significant interaction terms only if one or both of the corresponding main effects were also statistically significant. We used domain variables to define subpopulations by race/ethnicity, gender, and age group, but did not repeat the model-building process within them. Finally, we re-ran each model for individual NHANES cycles to determine whether there were anomalous or secular patterns in the risk of any outcome that might be overlooked in the combined analysis (Figs. 1–4).
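
The interaction step can be sketched in the same style. This Python sketch rests on interpretations rather than details given in the text: "all possible interactions" is taken to mean all two-way products of the selected covariates, the "fully adjusted model" is taken to contain the main effects plus every interaction that passed the p ≤ 0.1 screen, and the hierarchy rule (at least one significant main effect) is checked in that same model. As before, `fit` is a hypothetical stand-in for the survey-weighted regression.

```python
from itertools import combinations


def select_interactions(fit, exposure, covariates, screen_alpha=0.1, alpha=0.05):
    """Return the interaction terms retained under the screening and hierarchy rules.

    `fit(terms)` is a hypothetical stand-in for a design-based logistic
    regression returning (coefs, pvalues), two dicts keyed by term name.
    Interactions are written "a*b"; cigarette type (`exposure`) is never
    part of an interaction.
    """
    main = [exposure] + list(covariates)
    candidates = [f"{a}*{b}" for a, b in combinations(covariates, 2)]

    # Screening: add each interaction individually to the main-effects model.
    screened = []
    for term in candidates:
        _, pvalues = fit(main + [term])
        if pvalues[term] <= screen_alpha:
            screened.append(term)

    # Fully adjusted model: main effects plus all screened interactions.
    _, p_full = fit(main + screened)
    kept = []
    for term in screened:
        a, b = term.split("*")
        main_effect_ok = p_full[a] <= alpha or p_full[b] <= alpha  # hierarchy rule
        if p_full[term] <= alpha and main_effect_ok:
            kept.append(term)
    return kept
```

Because the screening threshold (0.1) is looser than the retention threshold (0.05), an interaction can pass screening and still be dropped, either because it is not significant in the fully adjusted model or because neither of its main effects is.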