It is somewhat painful to produce a table of summary statistics for categorical variables in Stata. The issue is that the popular user-written package estout/esttab does not accept factor-variable notation (such as i.x) when posting summary statistics.

In other words, you won’t get a neat table by simply typing the following in Stata:

.	eststo: estpost sum i.x

An obvious solution is to create dummy variables before you summarize them. Stata’s built-in tabulate command can do this through its gen() option:

.	tab x, gen(x_)

However, such an approach is cumbersome when you have multiple categorical variables, since you probably need to write a loop:

.	foreach var of varlist x y z {
.		tab(`var'), gen(`var'_)
.	}

Moreover, the generated dummy variables do not have clean labels that you can use directly in a table. If you want a descriptive table with appropriate variable names, you need to modify the labels of all the dummy variables or use esttab’s “varlabels()” option. To my mind, both are quite troublesome. The example below shows the labels that tab(), gen() produces:

.	sysuse auto, clear
.	tab(foreign), gen(foreign_)
.	describe foreign_*


	      storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------
foreign_1       byte    %8.0g                 foreign==Domestic
foreign_2       byte    %8.0g                 foreign==Foreign
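
To get labels suitable for a table, each dummy would have to be relabeled by hand. A minimal sketch for the auto data above (the label text is simply the category name taken from the output):

.	label variable foreign_1 "Domestic"
.	label variable foreign_2 "Foreign"

With only two categories this is manageable, but the number of such lines grows with every category of every variable.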

This label problem is also why the xi prefix is not an ideal workaround:

.	eststo: xi: estpost sum i.x i.y

In short, various minor but annoying issues motivated me to write a simple program, “dummyout”, to avoid this inconvenience. I introduce it in the next section.

The “dummyout” command

The “dummyout” command improves Stata’s “tab(), gen()” in the following ways:

  • It accepts multiple variables and does not require a loop
  • It uses actual values to label generated dummies, instead of using the sequence in which dummies are generated
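
The idea behind these two improvements can be sketched with a few built-in commands. This is not the code in the ado file, only an illustration. It assumes that x, y and z take non-negative integer values, so that each value can serve as a variable-name suffix, and it uses levelsof together with the "label" extended macro function to look up value-label text.

.	foreach var of varlist x y z {
.		quietly levelsof `var', local(levels)
.		foreach l of local levels {
.			* suffix is the actual value, not its position in the tabulation
.			generate byte `var'_`l' = (`var' == `l') if !missing(`var')
.			* label is the value-label text (or the value itself if unlabeled)
.			label variable `var'_`l' "`: label (`var') `l''"
.		}
.	}

The actual program may differ in details such as storage type and error handling.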

The ado file for “dummyout” can be downloaded here. You are ready to go once you put it in your Stata’s personal ado-file path (type adopath in Stata to see your path).

.	dummyout x y z

An example

The following example illustrates these points using a dataset from CPS ASEC 2021.

. 	* import data
. 	use CPS2021_union_good, clear

. 
. 	* describe the categorical variable: firmsize
. 	d    firmsize race

	      storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------------
firmsize        byte    %10.0g     firmsize_lbl
					      Number of employees
race            float   %9.0g      newrace    Race (4 categories)


. 
. 	* add numbers to the value labels & tabulate
. 	numlabel firmsize_lbl newrace, add

. 	tab1     firmsize race

-> tabulation of firmsize  

    Number of |
    employees |      Freq.     Percent        Cum.
--------------+-----------------------------------
  1. Under 10 |      1,014       11.45       11.45
  2. 10 to 24 |      1,136       12.82       24.27
  5. 25 to 99 |        665        7.51       31.78
7. 100 to 499 |      1,125       12.70       44.48
8. 500 to 999 |        543        6.13       50.61
     9. 1000+ |      4,375       49.39      100.00
--------------+-----------------------------------
	Total |      8,858      100.00

-> tabulation of race  

    Race (4 |
categories) |      Freq.     Percent        Cum.
------------+-----------------------------------
   1. White |      7,127       80.46       80.46
   2. Black |        893       10.08       90.54
   3. Asian |        558        6.30       96.84
   4. Other |        280        3.16      100.00
------------+-----------------------------------
      Total |      8,858      100.00

As you can see below, with dummyout the suffix of each dummy matches the corresponding value of firmsize (1, 2, 5, 7, 8 and 9):

. 	dummyout firmsize race

dummy variable(s) created for: firmsize
dummy variable(s) created for: race

. 	d firmsize_* race_*

	      storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------
firmsize_1      float   %9.0g                 1. Under 10
firmsize_2      float   %9.0g                 2. 10 to 24
firmsize_5      float   %9.0g                 5. 25 to 99
firmsize_7      float   %9.0g                 7. 100 to 499
firmsize_8      float   %9.0g                 8. 500 to 999
firmsize_9      float   %9.0g                 9. 1000+
race_1          float   %9.0g                 1. White
race_2          float   %9.0g                 2. Black
race_3          float   %9.0g                 3. Asian
race_4          float   %9.0g                 4. Other

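By contrast, the suffix does not match the values of firmsize when using tab(), gen(): the dummies are numbered 1 to 6 in the order they are generated, so firmsize_3 corresponds to value 5.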

. 	drop firmsize_* race_*

. 
.	  foreach var of varlist firmsize race {
.         	qui tab(`var'), gen(`var'_)
.	  }

. 
. 	d firmsize_* race_*

	      storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------
firmsize_1      byte    %8.0g                 firmsize==1. Under 10
firmsize_2      byte    %8.0g                 firmsize==2. 10 to 24
firmsize_3      byte    %8.0g                 firmsize==5. 25 to 99
firmsize_4      byte    %8.0g                 firmsize==7. 100 to 499
firmsize_5      byte    %8.0g                 firmsize==8. 500 to 999
firmsize_6      byte    %8.0g                 firmsize==9. 1000+
race_1          byte    %8.0g                 race==1. White
race_2          byte    %8.0g                 race==2. Black
race_3          byte    %8.0g                 race==3. Asian
race_4          byte    %8.0g                 race==4. Other
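
With the dummies recreated by dummyout, the summary table this post set out to build could be produced along the following lines. This is a sketch rather than a tested recipe: it assumes estout/esttab is installed, and the choice of statistics in cells() is only an example.

.	drop firmsize_* race_*
.	dummyout firmsize race
.	eststo clear
.	eststo: estpost summarize firmsize_* race_*
.	esttab, cells("mean sd") label nonumber noobs

The label option tells esttab to print variable labels instead of variable names, which is where the clean labels from dummyout pay off.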