***************************************************************************;
** PROGRAM: control_3MRCV.sas                                            **;
** AUTHOR:  Christopher R. Bilder                                        **;
**          Department of Statistics                                     **;
**          University of Nebraska-Lincoln                               **;
**          chris@chrisbilder.com                                        **;
** DATE:    3-23-06                                                      **;
** PURPOSE: Control program for other programs that do the following:    **;
**          Fit a generalized loglinear model via the marginal modeling  **;
**          method and use the bootstrap and Rao-Scott approximations    **;
**          to find a p-value for testing the model of interest vs. the  **;
**          saturated model.  This program is for only 3 MRCVs.          **;
** NOTES:                                                                **;
**  1) Do not distribute copies of this program without the permission of**;
**     the author.                                                       **;
**  2) Copyright 2006 Christopher R. Bilder                              **;
**  3) For more information on these procedures, see                     **;
**       Bilder, C. R. and Loughin, T. M. (2006).  Modeling Association  **;
**       between Two or More Categorical Variables that Allow for        **;
**       Multiple Category Choices.  Submitted to Communications in Stat.**;
**  4) This is the control program for the analysis.  The program        **;
**     reads in code from Gange_3MRCV.sas, data_gen_resample.sas, and    **;
**     boot_3MRCV.sas to do the analysis.                                **;
**  5) The data set needs to be in a multinomial format and each row     **;
**     represents a subject sampled.                                     **;
**  7) The statements:                                                   **;
**        options nomprint nosymbolgen nomlogic;                         **; 
**        options nosource nonotes;                                      **;
**     may be helpful to prevent the LOG window from having too much     **;
**     information displayed to it.  Of course, this should only be used **;
**     when one is confident the user is running the program correctly!  **; 
**  8) A form of conditional resampling is done here to make sure the    **;
**     marginal table has at least one positive response per item. This  **; 
**     is done to keep the resampled marginal table the same size as what**;
**     was observed.  See the &totsim and &numbsim variables to control  **;
**     this.                                                             **;
**  9) This program can take some time to run.  I STRONGLY recommend     **;
**     running a test case with a small number of resamples and a small  **;
**     number of iterations for the Gange algorithm.  The program takes  **;
**     about 6 hours on a 2.4GHZ processor with 1GB memory computer.     **;
** 10) With respect to specifying the model, the data set is transformed **;
**     in the program to the following format:                           **;
**                                                                       **;
**  W	Y	Z	wi	yj	zk	COUNT                                        **;
**  1	1	1	0	0	0	115                                          **;
**  1	1	1	0	0	1	8                                            **;
**  1	1	1	0	1	0	88                                           **;
**  1	1	1	0	1	1	28                                           **;
**  1	1	1	1	0	0	11                                           **;
**  1	1	1	1	0	1	2                                            **;
**  1	1	1	1	1	0	21                                           **;
**  1	1	1	1	1	1	6                                            **;
**  1	1	2	0	0	0	93                                           **;
**  1	1	2	0	0	1	30                                           **;
**  ...                                                                  **;
**                                                                       **;                                             **;
**     The W, Y, and Z denote the item numbers.  The wi, yj, and zk      **;
**     denote the 0 or 1 item responses.  Thus, the first row count above**;
**     denotes the W_1 = 0, Y_1 = 0, Z_1 = 0 sub-table cell count of 115.**;
**     Models can be specified in a similar manner as shown in           **;
**     control_2MRCV.sas for the MODEL statement of PROC GENMOD.  For    **;
**     example, W*Y*Z accounts for the gamma_ijk terms.                  **;
**     PROC GENMOD Model statements are given in the Settings section    **;
**     below to model the data in this format.                           **;
** 11) These programs have only been tested in SAS version 8.02          **;
** 12) The code in boot_3MRCV.sas is not quite general enough to use with**;
**     any data set.  Code in the DATA STEP to form pred_sort2 and the   **;
**     DATA STEP to form sasdata_one3 would need to be changed to        **;
**     correspond to the correct number of W, Y, and Z items.            **;
** 13) The RS MACRO calculates the V matrix of Appendix A.  The program  **;
**     is currently set up to read the matrix in from v_mat_m.txt which   **;
**     contains the matrix from the Section 4.2 data set.  This is done  **;
**     since it can take 20 minutes for the matrix to be calculated      **;
**     in PROC IML and it does not change for each model.  When a        **;
**     different data set is used, one should changed some of the        **;
**     code in the RS MACRO (this part is commented - see macro for more **;
**     information).                                                     **; 
***************************************************************************;


dm 'log;clear;output;clear;';
options ps=70 ls=100 pageno=1;
goptions reset=all border ftext=swiss gunit=cm htext=0.4 htitle=0.5;
options source notes;
options mprint symbolgen mlogic;

*Suppress information sent to the RESULTS window (the program will be slower without it);
ODS noresults;                           


title1 'Chris Bilder''s marginal modeling of 3 MRCVs program';

************************************************************************************;
*Read in other programs containing the code;
*  Do not forget to change the folder from where I stored the programs on my computer;
*  to where you are storing the programs on your computer!;  

  *Part of Gange algorithm which finds a set of multinomial probabilities satisfying the 
  *  null hypothesis model;
  %include "C:\chris\UNL\research\NSF_grant_SES-0207212\paper_communications\revision\programs\Gange_3MRCV.sas";  

  *Uses the multinomial probabilities from the Gange program to take the resamples;
  %include "C:\chris\UNL\research\NSF_grant_SES-0207212\paper_communications\revision\programs\data_gen_resample.sas";

  *Performs the bootstrap and Rao-Scott (RS) methods, also finds the standardized Pearson residuals;
  %include "C:\chris\UNL\research\NSF_grant_SES-0207212\paper_communications\revision\programs\boot_3MRCV_final.sas";
 


***************************************************************************************;
*Settings;

  *Number of resamples to take.  This should be greater than or equal to the number actually desired.;
  *  Resamples are not used if there are no positive responses for an item.  This is done to keep;
  *  the marginal table the same size as what was observed.;  
  *  I recommend using 6,000 with this data set.;
  %let totsim=6;                   

  *Number of resamples actually to use.  For example, B = 5,000 are used for the 2 MRCV example in;
  *  the paper.  I recommend trying a low number (like 5) first in order to estimate how long 
  *  it will take for 5,000.;
  %let numbsim=5;                  

  *Number of row items for W; 
  %let nrow2=3;        

  *Number of column items for Y;
  %let ncol2=4;  

  *Number of strata items for Z;
  %let nstrat2=5;
 
  *Convergence criterion for Gange program - max(abs(tau*_i - tau*_(i+1))) <= tol where;
  *  tau*_i is a vector of multinomial probabilities for the ith iteration;
  %let tol = 0.00000001;   

  *Maximum Gange iterations for iterative proportional fitting;
  *  I recommend using at least 250, but test (time it) first with a smaller number.;
  %let gange_iter = 5; 

  *Constant to add to 0 cells in the item response table;
  %let constant_zero = 0.5; 

  *Level of significance used with bootstrap methods;
  %let alpha = 0.05;                

  *Class statement to use in PROCs GENMOD and GLMMOD (2nd PROC is used to find the X matrix);
  %let class = w y z wi yj zk;      

  *Complete independence model;
  %let comp_ind_model = w*y*z wi(w*y*z) yj(w*y*z) zk(w*y*z); 

  *Ho model - everything after complete independence model;
  %let model = wi*yj wi*yj(y) wi*yj(w) wi*yj(y*w)  yj*zk yj*zk(z) yj*zk(y) yj*zk(z*y);  *Final model;
  *%let model = wi*yj wi*yj(y) wi*yj(w) wi*yj(y*w)  yj*zk yj*zk(z) yj*zk(y);  
  *%let model = wi*yj wi*yj(y) wi*yj(w) wi*yj(y*w)  yj*zk yj*zk(z); 
  *%let model = wi*yj wi*yj(y) wi*yj(w) wi*yj(y*w)  yj*zk yj*zk(y); 
  *%let model = wi*yj wi*yj(y) wi*yj(w) wi*yj(y*w)  yj*zk; 
  *%let model =  ;   *For complete independence model;   

  *NOTE: The final model above with constant_zero = 0.5 will result in a few standardized Pearson ;
  *      residuals which are greater than 2.576 in absolute value.  These residuals are all ;
  *      associated with the 0 cell count sub-tables. When this constant is lowered (say to 0.001),;
  *      there are no more standardized Pearson residuals greater than 2.576 in absolute value;
  *      The other models do have standardized Pearson residuals greater than 2.576 when adding;
  *      this constant and these residuals are not associated with the 0 cell count sub-tables.;


  *Number of sub-tables - no changes need to be made here;
  %let ncol=%eval(&nrow2 + &ncol2 + &nstrat2);   

  *Total number of multinomial probabilities - no changes need to be made here;
  %let numbp=%eval(2**&ncol);       



***************************************************************************************;
*Read in the data set and get it into the correct format;

  *W is test waste, Y is waste storage method, and Z is sources of veterinary information;
  data sasdata_one;
    infile 'C:\chris\UNL\research\NSF_grant_SES-0207212\paper_communications\revision\programs\3_MRCV_example_data.txt'
           firstobs=2;
    input w1-w3 y1-y4 z1-z5;
   run;


 *Add a dummy variable to the data set;
  *  The programs are actually partially set up for a simulation study.  By using just one iteration ;
  *  and a few other modifications (already done here), the programs can be used for an example data set;
  data sasdata_two;
    set sasdata_one;
    iter = 1;
  run;


  *This DATA STEP reads in the V matrix (cov(m)) given in Appendix A.  The boot_3MRCV.sas contains code in its 
  *  RS macro to calculate it, but this can take some time.  Since the matrix does not 
  *  change for each model, I calculated it once and then just read it in for whatever model under
  *  investigation.  ;
  PROC IMPORT OUT= WORK.COV_M_SET 
            DATAFILE= "C:\chris\UNL\research\NSF_grant_SES-0207212\paper_communications\revision\programs\v_mat_m.csv" 
            DBMS=CSV REPLACE;
     GETNAMES=YES;
     DATAROW=2; 
  RUN;


***************************************************************;
* Bootstrap;

  *%SIM_LOOP(data set, seed number for resamples, sample size);
  *                 - Performs the bootstrap methods;
  *                 - This is located in the boot_3MRCV.sas program;
  %SIM_LOOP(sasdata_two, 636265496, 279);

  *%RS(sample size) - Finds the Rao-Scott (RS) adjusted Pearson statistic and corresponding p-value;
  *                 - Also finds the standardized Pearson residuals;
  *                 - This is located in the boot_3MRCV.sas program;
  %RS(279);



***************************************************************;
*Sound to let user know the program is done;

  data _null_;
    call sound(300, 1000);
  run;


quit;