Introduce the student to the principles of data analysis in a clinical trial using a professional statistical package.
1. STATA as a statistics/data analysis program
1.1. Widely used
1.2. Powerful - e.g. adjustment for cluster survey designs, sample size calculation
1.3. Self-explanatory through the help command
1.4. Flexible – easy to customise
1.5. Small size - may run on old computers
1.6. Programmable, scriptable (ado and do files)
1.7. Less expensive than many other statistical software packages – pay only once, with no need to renew licences
1.8. Available for DOS, Windows, Mac, Unix and Linux operating systems
2. STATA windows
2.1. Command window – where we type commands, one per line. Each command line starts with the command name, followed by the variables it applies to. The Command window can also be used as a calculator by typing the command display followed by an arithmetic expression, e.g. display 2+3.
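The calculator use of display can be sketched as follows. The numbers are arbitrary illustrations; lines starting with * are comments and can be typed in the Command window as they are.

```stata
* simple arithmetic: shows 5
display 2 + 3
* powers and square roots: show 1024 and 4
display 2^10
display sqrt(16)
* text and a calculation on the same line
display "Mean age: " (34 + 29 + 41)/3
```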
Typing commands correctly in STATA is crucial and may seem tricky at first. Pay special attention to capital and small letters (STATA is case sensitive!), spaces, brackets and symbols (such as commas, quotation marks, * etc). You don’t need to retype a command every time: a previous command can be recalled with the Page Up and Page Down keys or by selecting it in the Review window.
Abbreviations in STATA commands – both commands and variable names can be abbreviated. For example display (dis), tabulate (tab), summarize (sum) and regress (reg). Some commands already have short names, such as logit for logistic regression.
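A minimal sketch of abbreviations in use. It assumes the example dataset auto that ships with STATA and the sysuse command of newer versions; neither is part of these exercises, and any dataset with a few variables would do.

```stata
* load the example dataset (assumption: auto is installed with STATA)
sysuse auto, clear
* full command names
summarize price
tabulate foreign
* the same commands, abbreviated
sum price
tab foreign
* variable names can be abbreviated too, as long as the abbreviation is unambiguous
sum pri
```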
Error messages – are displayed in red when a command is typed incorrectly. Common messages include: unrecognized command, ambiguous abbreviation, varlist required, option not allowed, not sorted, and invalid syntax.
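As a small illustration of an error caused by case sensitivity, the sketch below deliberately misspells display with a capital D. The prefix capture noisily is used here only so that the error is shown without stopping a do-file; _rc holds the return code of the captured command.

```stata
* STATA is case sensitive: Display is not the same command as display
capture noisily Display 2 + 2
* return code of the failed command (199 means the command was not recognised)
display _rc
```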
2.2. Review window – where all commands typed since the program was opened are listed
2.3. Results window – where the results of an analysis are displayed. If the output is longer than the Results window, --more-- appears at the bottom of the screen; press the space bar to see the next screen.
To interrupt a process at any time, hold down Ctrl and press the Pause/Break key, or simply click the red Break button.
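If the pauses in long output become a nuisance, they can be switched off for the current session. This is an optional setting, not something the exercises require.

```stata
* let long output scroll past without pausing
set more off
* restore the pauses
set more on
```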
2.4. Variables window – shows all variables of the file currently in use; clicking on a variable enters its name in the Command window
2.5. Do-file window – basically a text editor where you can save your commands and run them later. It can also be used to write programs within STATA, which is way beyond the scope of these exercises, but we will show some examples.
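A do-file is simply a list of commands, one per line. The sketch below is a hypothetical example: the file name first_steps.do and the use of the auto example dataset are assumptions made for illustration.

```stata
* first_steps.do  (hypothetical file name)
* lines starting with * are comments and are ignored by STATA
sysuse auto, clear
summarize price mpg
tabulate foreign
```

After saving the file, all its commands can be run in one go by typing do first_steps.do in the Command window.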
Create a variable called “id”
Input serial numbers from 1 to 10
Assign a random number to each of the serial numbers
To generate random numbers between 0 and 1, use the function uniform()
To round to the nearest whole number, use the function round(x,1)
Then type list
Note that each serial number has been assigned a 0 or a 1. How many are there in each group?
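One possible way of carrying out the steps above is sketched here. The variable names rand and treat are assumptions, and the serial numbers are created with set obs and _n instead of being typed in by hand. Newer versions of STATA document the function as runiform(), but uniform() still works.

```stata
* start with an empty dataset of 10 observations
clear
set obs 10
* serial numbers 1 to 10 (_n is the observation number)
generate id = _n
* a random number between 0 and 1 for each serial number
generate rand = uniform()
* round to the nearest whole number: 0 or 1
generate treat = round(rand, 1)
list
* count how many there are in each group
tabulate treat
```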
Delete the variable you have created by typing drop followed by the variable name.
Type generate treat=round(uniform(),1)
Repeat this and note that the number of 0s and 1s varies every time you do this exercise.
Type set seed 1000 (or any other number)
Then generate treat=round(uniform(),1)
How many 0s and 1s are there now?
You can think of the computer as holding a long table of random numbers: it starts at one place and works its way through the table every time you request a sample of random numbers. With “set seed #” you specify where in the table the computer should start. When the same seed is set on different computers, the random list generated will be the same.
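The effect of the seed can be checked directly, as in this sketch. The variable names treat1 and treat2 are made up for the illustration; the seed 1000 is the one used above.

```stata
clear
set obs 10
generate id = _n
* first draw, starting from seed 1000
set seed 1000
generate treat1 = round(uniform(), 1)
* second draw, starting from the same seed
set seed 1000
generate treat2 = round(uniform(), 1)
* assert stops with an error if the two variables differ in any observation
assert treat1 == treat2
list id treat1 treat2
```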
NOTE: A DATASET WITH AT LEAST ONE VARIABLE AND ONE OBSERVATION MUST BE LOADED FOR THIS SCRIPT TO WORK.
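capture program drop randomize
program define randomize
    * arguments: `1' = number of participants, `2' = block size, `3' = number of treatments
    capture generate id=1
    drop if _n>1
    * remove variables left over from an earlier run
    capture drop rand
    capture drop block
    capture drop group
    capture drop treat
    * assumed steps, required by the sort below: one observation per participant,
    * a random sort key, and a block number
    set obs `1'
    replace id=_n
    generate rand=uniform()
    generate block=floor((_n-1)/`2')
    * random order within each block
    sort block rand
    * position within the block, scaled to treatment codes 0, 1, ..., `3'-1
    by block: generate treat=(_n-1)/_N
    by block: replace treat=floor(treat*`3')
end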
To make a randomized list, type:
randomize [# of participants] [block size] [# of treatments]
Then try the commands:
To drop the superfluous variables, type:
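A self-contained variant of the same idea is sketched below. It is not the program from the listing: the name blockrand, the named arguments and the choice of follow-up commands are assumptions made for illustration. Because it starts from an empty dataset, no data need to be loaded first.

```stata
capture program drop blockrand
program define blockrand
    * named arguments: participants, block size, number of treatments
    args n blocksize ntreat
    clear
    set obs `n'
    generate id = _n
    * random sort key and block number
    generate rand = uniform()
    generate block = floor((_n - 1) / `blocksize')
    * random order within each block, then treatment codes 0, 1, ...
    sort block rand
    by block: generate treat = floor((_n - 1) * `ntreat' / _N)
    * back to serial-number order, without the sort key
    sort id
    drop rand
end

* 20 participants, blocks of 4, 2 treatments
blockrand 20 4 2
list id block treat
* each block should contain every treatment equally often
tabulate block treat
```

Naming the arguments with args makes the body easier to read than the positional `1', `2' and `3', at the cost of one extra line.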
Sample sizes can be determined with the “sampsi” command, followed by information about the expected means or proportions, the standard deviation (SD), the significance level, and the power or sample size. The sampsi command can be used to calculate group sizes, power, etc. The first two numbers after the command are either proportions or means. If no SD is specified and both numbers are between 0 and 1, STATA assumes that it is dealing with proportions. For means, at least one SD has to be specified; when only one SD is specified, STATA assumes the same SD in both groups. If power and significance level are not specified, and sufficient information about the means and SDs or the proportions is given, STATA calculates the group size for a significance level of 0.05 and a power of 90%. Any information beyond the two means or proportions is specified as options after a comma.
Two-sample comparison of mean1 to mean2.
sampsi 132.86 127.44, p(0.8) sd1(15.34)
If the expected SD differs between the two groups, you can specify two SDs this way:
sampsi 132.86 127.44, p(0.8) sd1(15.34) sd2(18.23)
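As a rough hand check of the two examples above, the group size can be computed with display. This sketch assumes the usual normal-approximation formula n = (sd1^2 + sd2^2) * (z(1-alpha/2) + z(power))^2 / (mean1 - mean2)^2 with a two-sided significance level of 0.05. The result should be rounded up and should be close to what sampsi reports, although sampsi's exact method may differ.

```stata
* equal SDs in both groups (first example), power 0.8
display 2 * 15.34^2 * (invnormal(0.975) + invnormal(0.80))^2 / (132.86 - 127.44)^2
* different SDs in the two groups (second example), power 0.8
display (15.34^2 + 18.23^2) * (invnormal(0.975) + invnormal(0.80))^2 / (132.86 - 127.44)^2
```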
Two-sample comparison of proportions. Compute sample size with n1 = n2
(i.e., ratio = 1, the default) and power = 0.9 (the default):
sampsi 0.35 0.5
Compute the power of a one-sample comparison of proportions with a sample size of 200:
sampsi 0.5 0.6, n1(200) onesample
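Other settings are given as options after the comma in the same way. The values below are arbitrary illustrations, not recommendations for any particular trial.

```stata
* 1% significance level, 80% power, twice as many participants in group 2
sampsi 0.35 0.5, alpha(0.01) power(0.8) ratio(2)
* power obtained with 100 participants in each group
sampsi 0.35 0.5, n1(100) n2(100)
* one-sided instead of two-sided test
sampsi 0.35 0.5, onesided
```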
Introduction to Stata by Jeroen Weesie, Utrecht University.
An introduction to Stata in PDF format.
STATA manuals at the library of the Centre for International Health.
To get help for a specific command, type:
help [command name]
For example, to view the help file for the command regress, type:
help regress
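A few more ways of using the help system, as a sketch. The search command looks for a keyword when the name of the command is not known; the keyword chosen here is only an example.

```stata
* help files for commands used in these exercises
help sampsi
help tabulate
* keyword search when the command name is unknown
search randomization
```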
For a general introduction to STATA, type:
For those who have never used STATA before, it is highly recommended to work through the internal tutorial after the computer exercises on the first day.
Alternatively, for other tutorials type:
Stata Tutorials
To run a tutorial, type
. tutorial ________
filling in the blank with a name from the list on the next page.
We recommend you type
. tutorial intro
first. After that, run the tutorials in whatever order you desire.
intro      An introduction to Stata
graphics   How to make graphs
tables     How to make tables
regress    Estimating regression models, including 2SLS
anova      Estimating one-, two- and N-way ANOVA and ANCOVA models
logit      Estimating maximum-likelihood logit and probit models
survival   Estimating maximum-likelihood survival models
factor     Estimating factor and principal component models
ourdata    Description of the data we provide
yourdata   How to input your own data into Stata