***************************************************************************;
** PROGRAM: control_2MRCV.sas **;
** AUTHOR: Christopher R. Bilder **;
** Department of Statistics **;
** University of Nebraska-Lincoln **;
** chris@chrisbilder.com **;
** DATE: 3-23-06 **;
** PURPOSE: Control program for other programs that do the following: **;
** Fit a generalized loglinear model via the marginal modeling **;
** method and use the bootstrap and Rao-Scott approximations **;
** to find a p-value for testing the model of interest vs. the **;
** saturated model. Modifications can be made to the program **;
** to use an alternative hypothesis model other than the **;
** saturated model. **;
** NOTES: **;
** 1) Do not distribute copies of this program without the permission of**;
** the author. **;
** 2) Copyright 2006 Christopher R. Bilder **;
** 3) For more information on these procedures, see **;
** Bilder, C. R. and Loughin, T. M. (2006). Modeling Association **;
** between Two or More Categorical Variables that Allow for **;
** Multiple Category Choices. Submitted to Communications in Stat.**;
** 4) This is the control program for the analysis. The program **;
** reads in code from Gange_2MRCV.sas, data_gen_resample.sas, and **;
** boot_2MRCV.sas to do the analysis. **;
** 5) The data set needs to be in a multinomial format and each row **;
** represents a subject sampled. **;
** 6) In order to account for the lagoon and salt item pair (see Section**;
** 4.1 of the paper), two DATA STEPs are used in the boot_MRCV.sas **;
** program to add: **;
** if w=3 and y=1 then w3y1=1; **;
** else w3y1 = 0; **;
** This may need to be removed for other data set analyses. To find **;
** the code, do a search for the if-then statement . **;
** 7) The statements: **;
** options nomprint nosymbolgen nomlogic; **;
** options nosource nonotes; **;
** may be helpful to prevent the LOG window from having too much **;
** information displayed to it. Of course, this should only be used **;
** when one is confident the user is running the program correctly! **;
** 8) A form of conditional resampling is done here to make sure the **;
** marginal table has at least one positive response per item. This **;
** is done to keep the resampled marginal table the same size as what**;
** was observed. See the &totsim and &numbsim variables to control **;
** this. **;
** 9) This program can take some time to run. I recommend running a **;
** test case with a small number of resamples and a small number of **;
** iterations for the Gange algorithm. **;
** 10) With respect to specifying the model, the data set is transformed **;
** in the program to the following format: **;
** **;
** W Y wi yj COUNT **;
** 1 1 0 0 123 **;
** 1 1 0 1 116 **;
** 1 1 1 0 13 **;
** 1 1 1 1 27 **;
** 1 2 0 0 175 **;
** 1 2 0 1 64 **;
** ... **;
** **; **;
** The W and Y denote the item numbers. The wi and yj denote the **;
** the 0 or 1 item responses. Thus, the first row count above **;
** denotes the W_1 = 0 and Y_1 = 0 sub-table cell of Table 2 in the **;
** paper. Models can be specified as follows in a PROC GENMOD model **;
** statement: **;
** W*Y account for the gamma_ij terms **;
** wi(W*Y) account for the eta^W_a(ij) terms **;
** yj(W*Y) account for the eta^Y_b(ij) terms **;
** wi*yj account for the lambda_ab terms **;
** wi*yj(Y) account for the lambda^Y_ab(j) terms **;
** wi*yj(W) account for the lambda^W_ab(i) terms **;
** PROC GENMOD MODEL statements are given in the Settings section **;
** below to model the data in this format. **;
** 11) These programs have only been tested in SAS version 8.02 **;
***************************************************************************;
dm 'log;clear;output;clear;';
options ps=70 ls=100 pageno=1;
goptions reset=all border ftext=swiss gunit=cm htext=0.4 htitle=0.5;
options source notes;
options mprint symbolgen mlogic;
*options nomprint nosymbolgen nomlogic nonotes;
*Suppress information sent to the RESULTS window (the program will be slower without it);
ODS noresults;
title1 'Chris Bilder''s marginal modeling of 2 MRCVs program';
************************************************************************************;
*Read in other programs containing the code;
* Do not forget to change the folder from where I stored the programs on my computer;
* to where you are storing the programs on your computer!;
*Part of Gange algorithm which finds a set of multinomial probabilities satisfying the
* null hypothesis model;
%include "C:\chris\UNL\statweb\mrcv\Comm\Gange_2MRCV.sas";
*Uses the multinomial probabilities from the Gange program to take the resamples;
%include "C:\chris\UNL\statweb\mrcv\Comm\data_gen_resample.sas";
*Performs the bootstrap and Rao-Scott (RS) methods, also finds the standardized Pearson residuals;
%include "C:\chris\UNL\statweb\mrcv\Comm\boot_2MRCV_final.sas";
***************************************************************************************;
*Settings;
*Number of resamples to take. This should be greater than or equal to the number actually desired.;
* Resamples are not used if there are no positive responses for an item. This is done to keep;
* the marginal table the same size as what was observed;
* I recommend using 6,000 with this data set.;
%let totsim=6; *Set to 6 initially to test program;
*Number of resamples actually to use. For example, B = 5,000 are used for the 2 MRCV example in;
* the paper. I recommend trying a low number (like 5) first in order to estimate how long
* it will take for 5,000.;
%let numbsim=5; *Set to 5 initially to test program;
*Number of row items for W;
%let nrow2=3;
*Number of column items for Y;
%let ncol2=4;
*Convergence criterion for Gange program - max(abs(tau*_i - tau*_(i+1))) <= tol where;
* tau*_i is a vector of multinomial probabilities for the ith iteration;
%let tol = 0.0000001;
*Maximum Gange iterations for iterative proportional fitting;
%let gange_iter = 250;
*Constant to add to 0 cells in the item response table;
%let constant_zero = 0.5;
*Level of significance used with bootstrap methods and (1-alpha)100% C.I.s for odds ratios;
%let alpha = 0.05;
*Class statement to use in PROCs GENMOD and GLMMOD (2nd PROC is used to find the X matrix);
* The w3y1 term is needed for the lagoon and salt item pair (see Section 4.1 of the paper);
* otherwise, the extra w3y1 is not needed;
%let class = w y wi yj w3y1;
*%let class = w y wi yj;
*Ho model;
%let model = W*Y wi(W*Y) yj(W*Y) wi*yj wi*yj(Y) wi*yj(w3y1); *Final model;
*%let model = W*Y wi(W*Y) yj(W*Y) wi*yj wi*yj(W) wi*yj(Y); *W and Y main effects;
*%let model = W*Y wi(W*Y) yj(W*Y) wi*yj wi*yj(Y); *Y main effects;
*%let model = W*Y wi(W*Y) yj(W*Y) wi*yj wi*yj(W); *W main effects;
*%let model = W*Y wi(W*Y) yj(W*Y) wi*yj; *Homogenous association;
*%let model = W*Y wi(W*Y) yj(W*Y); *SPMI;
*Number of sub-tables - no changes need to be made here;
%let ncol=%eval(&nrow2+&ncol2);
*Total number of multinomial probabilities - no changes need to be made here;
%let numbp=%eval(2**&ncol);
***************************************************************************************;
*Read in the data set and get it into the correct format;
*W is test waste and Y is waste storage method;
* Again, change the folder name to where you have the data set stored.;
data sasdata_one;
infile 'C:\chris\UNL\research\NSF_grant_SES-0207212\paper_communications\revision\programs\2_MRCV_example_data.txt'
firstobs=2;
input w1-w3 y1-y4;
run;
*Add a dummy variable to the data set;
* The programs are partially set up for a simulation study. By using just one iteration and a few other;
* modifications (already done here), the programs can be used for an example data set;
data use_it;
set sasdata_one;
iter = 1;
run;
***************************************************************************************;
*Call the MACROs to perform the calculations;
*%DO_IT_ALL(data set name, seed for boot. resamples, sample size) - perform bootstrap methods for X^2_M;
%DO_IT_ALL(use_it, 636265496, 279);
*%STAND_RES_CALC(sample size) - finds the standardized Pearson residuals, C.I.s, and RS adjustment;
%STAND_RES_CALC(279);
***************************************************************************************;
*Sound to let user know the program is done;
data _null_;
call sound(300, 1000);
run;
quit;