Generating random numbers

         Sample size calculation



Aim: to introduce students to the principles of data analysis in a clinical trial using a professional statistical package.




1.     STATA as a statistics/data analysis program


1.1.  Widely used

1.2.  Powerful, e.g. adjustment for cluster sampling in surveys, sample size calculation

1.3.  Self-explanatory through the help command

1.4.  Flexible – easy to customise

1.5.  Small size - may run on old computers

1.6.  Programmable, scriptable (ado and do files)

1.7.  Less expensive than many other statistical software packages – pay only once, not required to renew licences

1.8.  Available for DOS, Windows, Mac, Unix and Linux operating systems


2.     Windows


2.1.  Command window – where commands are typed, one command per line. Each command starts with the command name, followed by the variables. The Command window can also be used as a calculator by typing display followed by an expression, for example:
display 500+120 
display sqrt(25)
display log(5)
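A few more calculator examples, as a sketch: log() is the natural logarithm, log10() the base-10 logarithm, and invnormal() returns quantiles of the standard normal distribution, such as the 1.96 used in the sample size formulas later on. Lines starting with * are comments and are ignored.

```stata
* natural logarithm (base e)
display log(5)
* base-10 logarithm
display log10(100)
* exp() reverses log(), so this displays 5
display exp(log(5))
* powers are written with ^
display 2^10
* 97.5th percentile of the standard normal distribution (about 1.96)
display invnormal(0.975)
```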


Typing commands correctly is crucial and may seem tricky at first. Pay special attention to capital and small letters (STATA is case sensitive!), spaces, brackets and symbols (such as commas, quotation marks and *). You do not need to retype a command every time: a previous command can be recalled with the PgUp and PgDn keys or by selecting it in the Review window.


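Abbreviations – commands and variable names can be abbreviated. For example: display (dis), tabulate (tab), summarize (sum), regress (reg). A variable name can be abbreviated as long as the abbreviation is unambiguous. Note that logit is not an abbreviation of logistic: both commands fit logistic regression models, but logit reports coefficients while logistic reports odds ratios.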
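To try the abbreviations, the sketch below uses the auto dataset that ships with STATA (loaded with sysuse auto); the dataset and its variables price, foreign and weight are only an illustration. Each abbreviated command gives the same output as the full command.

```stata
* load the example dataset that ships with STATA (replaces any data in memory)
sysuse auto, clear
* full command names
summarize price
tabulate foreign
regress price weight
* the same commands, abbreviated
sum price
tab foreign
reg price weight
* variable names can be abbreviated when unambiguous: wei means weight
sum wei
```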


Error messages – are displayed in red when a command is typed incorrectly. Typical messages report an unrecognized command, an ambiguous abbreviation, something that is required or not allowed, data that are not sorted, or invalid syntax.
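The sketch below provokes an error on purpose by typing Display with a capital D. capture noisily shows the red error message but lets a do-file continue, and the built-in _rc holds the return code of the captured command (199 means an unrecognized command).

```stata
* Display with a capital D is not a STATA command, so this line fails;
* capture noisily shows the red error message but does not stop a do-file
capture noisily Display 500+120
* _rc is the return code of the captured command (199 = unrecognized command)
display _rc
* typed in lower case, the command works
display 500+120
```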


2.2.  Review window – lists all commands issued since the program was opened


2.3.  Results window – displays the results of each command. If the output is longer than the Results window, --more-- appears at the bottom of the screen; press the space bar to see the next screen.

To interrupt a command at any time, hold down Ctrl and press Pause/Break, or click the red Break button.


2.4.  Variables window – lists all variables in the dataset currently in memory; clicking on a variable name copies it into the Command window

2.5.  Do-file window (Do-file Editor) – essentially a text editor in which you can save your commands and run them later. It can also be used to write programs within STATA; this is well beyond the scope of these exercises, but we will show some examples.
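A minimal sketch of what a do-file might contain; the file name example.do is hypothetical. It shows the three comment styles allowed in do-files (* at the start of a line, // after a command, and /* */ around several lines) and stores a result in a scalar.

```stata
* example.do : a hypothetical do-file typed in the Do-file Editor
* lines starting with * are comments, and so is text after //
/* comments can also
   span several lines */
scalar total = 500 + 120   // store the result in a scalar
display scalar(total)
display sqrt(25)           // square root
* save the file, then run it from the Command window with:  do example.do
```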






Generating random numbers



Create a variable called “id”

                        generate id=1

Enter the serial numbers 1 to 10 in the variable id, for example in the Data Editor (type edit in the Command window to open it).


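As an alternative to typing the numbers in the Data Editor, a sketch that creates the 10 serial numbers with commands. It assumes you are happy to discard the data currently in memory.

```stata
* start from an empty dataset (this discards any data in memory)
clear
* create 10 observations; _n is the observation number, so id runs from 1 to 10
set obs 10
generate id = _n
list id
```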
Randomly assign each serial number to group 0 or group 1:


                        generate treat=round(uniform(),1)


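Here uniform() generates a random number between 0 and 1, and round(x,1) rounds x to the nearest whole number, so treat becomes 0 (random number below 0.5) or 1 (0.5 or above). Newer versions of STATA also provide this function under the name runiform().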



Then type list to display the data.



Note that each serial number has been assigned a 0 or a 1. How many are there in each group?
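One way to count the groups, as a sketch: the first lines simply repeat the exercise so the example runs on its own (using runiform(), the newer name for uniform()), then tabulate and count report how many participants are in each group.

```stata
* setup repeating the exercise: 10 participants, random 0/1 allocation
clear
set obs 10
generate id = _n
generate treat = round(runiform(), 1)
* number and percentage of participants in each group
tabulate treat
* the same counts, one group at a time
count if treat == 0
count if treat == 1
```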


Delete the variable you have created by typing:

                         drop treat    

      Type   generate treat=round(uniform(),1)


Repeat this a few times and note that the numbers of 0s and 1s vary each time you do this exercise.

                        drop treat

      Type  set seed 1000 (or any other number)

      Then  generate treat=round(uniform(),1)


How many 0s and 1s are there now?


The computer does not store a table of random numbers; it uses a pseudorandom number generator, a deterministic algorithm that produces a long sequence of numbers that behave like random numbers. Each time you request random numbers, the computer continues from where it stopped in this sequence. With set seed # you specify where in the sequence the computer should start. When the same seed is used on different computers (with the same version of STATA), the same list of random numbers is generated.
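A short sketch of this reproducibility: the allocation is generated twice, each time after setting the same seed (1000, as in the exercise), and assert confirms that the two lists are identical. runiform() is the newer name for uniform().

```stata
* 10 participants numbered 1 to 10
clear
set obs 10
generate id = _n
* allocate once after setting the seed ...
set seed 1000
generate treat1 = round(runiform(), 1)
* ... and again after setting the same seed
set seed 1000
generate treat2 = round(runiform(), 1)
* both runs start from the same point in the sequence, so they are identical
assert treat1 == treat2
list id treat1 treat2
```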




Program for making a block randomized list with two or more treatments





Copy the program below into a do-file in STATA and run it; this defines a new command called randomize. Note that running randomize replaces the data in memory.



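capture program drop randomize
program define randomize
    * usage: randomize #participants blocksize #treatments
    * start from an empty dataset with one row per participant
    clear
    set obs `1'
    generate id = _n
    * a random number for each participant, used to shuffle the order within blocks
    generate rand = uniform()
    * consecutive participants form blocks of size `2'
    generate block = floor((id-1)/`2') + 1
    sort block rand
    * split each shuffled block into `3' equal groups coded 0, 1, 2, ...
    * (one expression, so no rounding error from storing (_n-1)/_N as a float)
    by block: generate treat = floor((_n-1)*`3'/_N)
    sort id
end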





To make a randomized list, type:




randomize [# of participants] [block size] [# of treatments]


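For example, randomize 20 4 2 makes a list of 20 participants in blocks of 4, allocated to two treatments coded 0 and 1. The block size should be a multiple of the number of treatments; otherwise the treatments cannot be balanced within each block.

Then try the commands: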





tabulate treat

tabulate block


To drop the random-number variable, which is no longer needed, type:


drop rand




Calculating Sample Sizes using STATA


sample size determination



Sample sizes can be determined with the sampsi command, followed by information about the expected means and standard deviations (SD) or proportions, and optionally the significance level and power. The sampsi command can be used to calculate group sizes or power. The first two numbers after the command are either proportions or means. If no SD is specified and the numbers are between 0 and 1, STATA assumes that they are proportions. For means, at least one SD has to be specified; when only one SD is given, STATA assumes the same SD in the two groups. If power and significance level are not specified, STATA calculates the group sizes for a significance level of 0.05 and a power of 90%, provided that enough information on the means and SDs or on the proportions is given. All information other than the means or proportions is given as options after a comma.

Options for the sampsi command:

alpha(#) specifies the significance level of the test; the default is
    alpha(.05).  (More correctly, the default is 1-level/100 from set level;
    see help level.)
power(#) is the power of the test.  The default is power(.90).
n1(#) specifies the size of the first (or only) sample and n2(#) specifies the
    size of the second sample.  If specified, sampsi reports the power
    calculation.  If not specified, sampsi computes sample size.
ratio(#) is an alternative way to specify n2() in two-sample tests.  In a two-
    sample test, if n2() is not specified, n2() is assumed to be n1()*ratio().
    That is, ratio() = n2()/n1().  The default is ratio(1).
sd1(#) and sd2(#) are the standard deviations for comparison of means.  If not
    specified, comparison of proportions is assumed.  In two-sample cases, if
    only sd1() is specified, sd2() is assumed to equal sd1().
onesample indicates a one-sample test.  The default is a two-sample test.
onesided indicates a one-sided test.  The default is a two-sided test.
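A sketch combining the options listed above; the numbers are taken from the examples that follow. It assumes a STATA version in which sampsi is available (from STATA 13 onwards sampsi has been superseded by the power command).

```stata
* two proportions with the defaults: 5% two-sided significance level, 90% power
sampsi 0.35 0.5
* same proportions, 80% power and twice as many participants in group 2 (n2 = 2*n1)
sampsi 0.35 0.5, power(0.8) ratio(2)
* power achieved with 100 participants in each group
sampsi 0.35 0.5, n1(100) n2(100)
* two means with different SDs, one-sided test at the 2.5% level
sampsi 132.86 127.44, sd1(15.34) sd2(18.23) alpha(0.025) onesided
```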




Two-sample comparison of mean1 to mean2 with 80% power (p(0.8) is short for power(0.8)):


sampsi 132.86 127.44, p(0.8) sd1(15.34)


Estimated sample size for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2

         alpha =   0.0500  (two-sided)
         power =   0.8000
            m1 =   132.86
            m2 =   127.44
           sd1 =    15.34
           sd2 =    15.34
         n2/n1 =     1.00

Estimated required sample sizes:

            n1 =      126
            n2 =      126

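To show where 126 comes from, a worked sketch assuming sampsi uses the usual normal-approximation formula for two means with equal group sizes (σ1 = sd1, σ2 = sd2, z the standard normal quantile); plugging in the numbers above reproduces the output after rounding up:

```latex
n_1 = n_2 = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \left(\sigma_1^2 + \sigma_2^2\right)}{(m_1 - m_2)^2}
          = \frac{(1.960 + 0.842)^2 \, (15.34^2 + 15.34^2)}{(132.86 - 127.44)^2}
          \approx \frac{7.849 \times 470.63}{29.376} \approx 125.7 \;\rightarrow\; 126
```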


If the expected SDs differ between the two groups, you can specify both SDs this way:


sampsi 132.86 127.44, p(0.8)  sd1(15.34) sd2(18.23)




Estimated sample size for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2

         alpha =   0.0500  (two-sided)
         power =   0.8000
            m1 =   132.86
            m2 =   127.44
           sd1 =    15.34
           sd2 =    18.23
         n2/n1 =     1.00

Estimated required sample sizes:

            n1 =      152
            n2 =      152
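As a cross-check, a minimal sketch assuming sampsi uses the normal-approximation formula n = (z(1-alpha/2) + z(1-beta))^2 (sd1^2 + sd2^2) / (m1 - m2)^2 per group. The display commands compute it with invnormal() and reproduce 152 for the unequal SDs and 126 for equal SDs; the scalar names za, zb and delta are just illustrative.

```stata
* z value for a two-sided test at the 5% level
scalar za = invnormal(1 - 0.05/2)
* z value for 80% power
scalar zb = invnormal(0.80)
* expected difference between the two means
scalar delta = 132.86 - 127.44
* unequal SDs (15.34 and 18.23), rounded up to whole participants
display ceil((scalar(za) + scalar(zb))^2 * (15.34^2 + 18.23^2) / scalar(delta)^2)
* equal SDs (15.34 in both groups), for comparison
display ceil((scalar(za) + scalar(zb))^2 * (15.34^2 + 15.34^2) / scalar(delta)^2)
```

Because the SD of the second group enters the numerator squared, a larger SD in either group increases the required sample size.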


Two-sample comparison of proportions. Compute sample size with n1 = n2 (i.e., ratio = 1, the default) and power = 0.9 (the default):


 sampsi 0.35 0.5


Estimated sample size for two-sample comparison of proportions

Test Ho: p1 = p2, where p1 is the proportion in population 1
                    and p2 is the proportion in population 2

         alpha =   0.0500  (two-sided)
         power =   0.9000
            p1 =   0.3500
            p2 =   0.5000
         n2/n1 =   1.00

Estimated required sample sizes:

            n1 =      240
            n2 =      240
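A worked sketch of where 240 may come from, assuming sampsi applies the standard normal-approximation formula for two proportions followed by a continuity correction (q = 1 − p, p̄ the average of p1 and p2). These assumptions reproduce the output; without the continuity correction the result would be about 226.2, i.e. 227 per group.

```latex
\bar p = \frac{p_1 + p_2}{2} = 0.425, \qquad \bar q = 1 - \bar p = 0.575

n' = \frac{\left[z_{1-\alpha/2}\sqrt{2\bar p\bar q} + z_{1-\beta}\sqrt{p_1 q_1 + p_2 q_2}\right]^2}{(p_1 - p_2)^2}
   = \frac{\left[1.960\sqrt{0.48875} + 1.282\sqrt{0.4775}\right]^2}{0.15^2} \approx 226.2

n_1 = n_2 = \frac{n'}{4}\left[1 + \sqrt{1 + \frac{4}{n'\,|p_1 - p_2|}}\right]^2
          \approx 56.54 \times 4.233 \approx 239.3 \;\rightarrow\; 240
```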




Compute the power of a one-sample comparison of a proportion with a hypothesized value (200 participants):


 sampsi 0.5 0.6, n1(200) onesample


Estimated power for one-sample comparison of proportion
  to hypothesized value

Test Ho: p = 0.5000, where p is the proportion in the population

         alpha =   0.0500  (two-sided)
 alternative p =   0.6000
 sample size n =      200

Estimated power:

         power =   0.8123
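A worked sketch of the power calculation, assuming sampsi uses the usual normal approximation for one proportion (p0 the hypothesized value, p the alternative, Φ the standard normal distribution function); it reproduces the 0.812 shown above:

```latex
\text{power} = \Phi\!\left(\frac{\sqrt{n}\,|p - p_0| - z_{1-\alpha/2}\sqrt{p_0(1-p_0)}}{\sqrt{p(1-p)}}\right)
             = \Phi\!\left(\frac{\sqrt{200}\times 0.1 - 1.960 \times 0.5}{\sqrt{0.24}}\right)
             \approx \Phi(0.886) \approx 0.812
```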







Introduction to Stata by Jeroen Weesie, Utrecht University.

An introduction to Stata in PDF format.

STATA manuals at the library of Centre for International Health.


To get help for a specific command, type:


help [command name]


For example, to view the help file for the command regress, type:


help regress


For a general introduction to STATA, type:

tutorial intro


If you have never used STATA before, we strongly recommend working through the built-in tutorial after the computer exercises on the first day.


For a list of other tutorials, type:


tutorial contents


Tutorials




To run a tutorial, type


       . tutorial ________


filling in the blank with a name from the list on the next page.



We recommend you type


        . tutorial intro


first.  After that, run the tutorials in whatever order you desire.



Tutorial        Description



intro           An introduction to Stata


graphics        How to make graphs

tables          How to make tables


regress         Estimating regression models, including 2SLS

anova           Estimating one-, two- and N-way ANOVA and ANCOVA models

logit           Estimating maximum-likelihood logit and probit models

survival        Estimating maximum-likelihood survival models

factor          Estimating factor and principal component models


ourdata         Description of the data we provide

yourdata        How to input your own data into Stata