***************************************************************************; ** PROGRAM: control_2MRCV.sas **; ** AUTHOR: Christopher R. Bilder **; ** Department of Statistics **; ** University of Nebraska-Lincoln **; ** chris@chrisbilder.com **; ** DATE: 3-23-06 **; ** PURPOSE: Control program for other programs that do the following: **; ** Fit a generalized loglinear model via the marginal modeling **; ** method and use the bootstrap and Rao-Scott approximations **; ** to find a p-value for testing the model of interest vs. the **; ** saturated model. Modifications can be made to the program **; ** to use an alternative hypothesis model other than the **; ** saturated model. **; ** NOTES: **; ** 1) Do not distribute copies of this program without the permission of**; ** the author. **; ** 2) Copyright 2006 Christopher R. Bilder **; ** 3) For more information on these procedures, see **; ** Bilder, C. R. and Loughin, T. M. (2006). Modeling Association **; ** between Two or More Categorical Variables that Allow for **; ** Multiple Category Choices. Submitted to Communications in Stat.**; ** 4) This is the control program for the analysis. The program **; ** reads in code from Gange_2MRCV.sas, data_gen_resample.sas, and **; ** boot_2MRCV.sas to do the analysis. **; ** 5) The data set needs to be in a multinomial format and each row **; ** represents a subject sampled. **; ** 6) In order to account for the lagoon and salt item pair (see Section**; ** 4.1 of the paper), two DATA STEPs are used in the boot_MRCV.sas **; ** program to add: **; ** if w=3 and y=1 then w3y1=1; **; ** else w3y1 = 0; **; ** This may need to be removed for other data set analyses. To find **; ** the code, do a search for the if-then statement . **; ** 7) The statements: **; ** options nomprint nosymbolgen nomlogic; **; ** options nosource nonotes; **; ** may be helpful to prevent the LOG window from having too much **; ** information displayed to it. Of course, this should only be used **; ** when one is confident the user is running the program correctly! **; ** 8) A form of conditional resampling is done here to make sure the **; ** marginal table has at least one positive response per item. This **; ** is done to keep the resampled marginal table the same size as what**; ** was observed. See the &totsim and &numbsim variables to control **; ** this. **; ** 9) This program can take some time to run. I recommend running a **; ** test case with a small number of resamples and a small number of **; ** iterations for the Gange algorithm. **; ** 10) With respect to specifying the model, the data set is transformed **; ** in the program to the following format: **; ** **; ** W Y wi yj COUNT **; ** 1 1 0 0 123 **; ** 1 1 0 1 116 **; ** 1 1 1 0 13 **; ** 1 1 1 1 27 **; ** 1 2 0 0 175 **; ** 1 2 0 1 64 **; ** ... **; ** **; **; ** The W and Y denote the item numbers. The wi and yj denote the **; ** the 0 or 1 item responses. Thus, the first row count above **; ** denotes the W_1 = 0 and Y_1 = 0 sub-table cell of Table 2 in the **; ** paper. Models can be specified as follows in a PROC GENMOD model **; ** statement: **; ** W*Y account for the gamma_ij terms **; ** wi(W*Y) account for the eta^W_a(ij) terms **; ** yj(W*Y) account for the eta^Y_b(ij) terms **; ** wi*yj account for the lambda_ab terms **; ** wi*yj(Y) account for the lambda^Y_ab(j) terms **; ** wi*yj(W) account for the lambda^W_ab(i) terms **; ** PROC GENMOD MODEL statements are given in the Settings section **; ** below to model the data in this format. **; ** 11) These programs have only been tested in SAS version 8.02 **; ***************************************************************************; dm 'log;clear;output;clear;'; options ps=70 ls=100 pageno=1; goptions reset=all border ftext=swiss gunit=cm htext=0.4 htitle=0.5; options source notes; options mprint symbolgen mlogic; *options nomprint nosymbolgen nomlogic nonotes; *Suppress information sent to the RESULTS window (the program will be slower without it); ODS noresults; title1 'Chris Bilder''s marginal modeling of 2 MRCVs program'; ************************************************************************************; *Read in other programs containing the code; * Do not forget to change the folder from where I stored the programs on my computer; * to where you are storing the programs on your computer!; *Part of Gange algorithm which finds a set of multinomial probabilities satisfying the * null hypothesis model; %include "C:\chris\UNL\statweb\mrcv\Comm\Gange_2MRCV.sas"; *Uses the multinomial probabilities from the Gange program to take the resamples; %include "C:\chris\UNL\statweb\mrcv\Comm\data_gen_resample.sas"; *Performs the bootstrap and Rao-Scott (RS) methods, also finds the standardized Pearson residuals; %include "C:\chris\UNL\statweb\mrcv\Comm\boot_2MRCV_final.sas"; ***************************************************************************************; *Settings; *Number of resamples to take. This should be greater than or equal to the number actually desired.; * Resamples are not used if there are no positive responses for an item. This is done to keep; * the marginal table the same size as what was observed; * I recommend using 6,000 with this data set.; %let totsim=6; *Set to 6 initially to test program; *Number of resamples actually to use. For example, B = 5,000 are used for the 2 MRCV example in; * the paper. I recommend trying a low number (like 5) first in order to estimate how long * it will take for 5,000.; %let numbsim=5; *Set to 5 initially to test program; *Number of row items for W; %let nrow2=3; *Number of column items for Y; %let ncol2=4; *Convergence criterion for Gange program - max(abs(tau*_i - tau*_(i+1))) <= tol where; * tau*_i is a vector of multinomial probabilities for the ith iteration; %let tol = 0.0000001; *Maximum Gange iterations for iterative proportional fitting; %let gange_iter = 250; *Constant to add to 0 cells in the item response table; %let constant_zero = 0.5; *Level of significance used with bootstrap methods and (1-alpha)100% C.I.s for odds ratios; %let alpha = 0.05; *Class statement to use in PROCs GENMOD and GLMMOD (2nd PROC is used to find the X matrix); * The w3y1 term is needed for the lagoon and salt item pair (see Section 4.1 of the paper); * otherwise, the extra w3y1 is not needed; %let class = w y wi yj w3y1; *%let class = w y wi yj; *Ho model; %let model = W*Y wi(W*Y) yj(W*Y) wi*yj wi*yj(Y) wi*yj(w3y1); *Final model; *%let model = W*Y wi(W*Y) yj(W*Y) wi*yj wi*yj(W) wi*yj(Y); *W and Y main effects; *%let model = W*Y wi(W*Y) yj(W*Y) wi*yj wi*yj(Y); *Y main effects; *%let model = W*Y wi(W*Y) yj(W*Y) wi*yj wi*yj(W); *W main effects; *%let model = W*Y wi(W*Y) yj(W*Y) wi*yj; *Homogenous association; *%let model = W*Y wi(W*Y) yj(W*Y); *SPMI; *Number of sub-tables - no changes need to be made here; %let ncol=%eval(&nrow2+&ncol2); *Total number of multinomial probabilities - no changes need to be made here; %let numbp=%eval(2**&ncol); ***************************************************************************************; *Read in the data set and get it into the correct format; *W is test waste and Y is waste storage method; * Again, change the folder name to where you have the data set stored.; data sasdata_one; infile 'C:\chris\UNL\research\NSF_grant_SES-0207212\paper_communications\revision\programs\2_MRCV_example_data.txt' firstobs=2; input w1-w3 y1-y4; run; *Add a dummy variable to the data set; * The programs are partially set up for a simulation study. By using just one iteration and a few other; * modifications (already done here), the programs can be used for an example data set; data use_it; set sasdata_one; iter = 1; run; ***************************************************************************************; *Call the MACROs to perform the calculations; *%DO_IT_ALL(data set name, seed for boot. resamples, sample size) - perform bootstrap methods for X^2_M; %DO_IT_ALL(use_it, 636265496, 279); *%STAND_RES_CALC(sample size) - finds the standardized Pearson residuals, C.I.s, and RS adjustment; %STAND_RES_CALC(279); ***************************************************************************************; *Sound to let user know the program is done; data _null_; call sound(300, 1000); run; quit;