
**JANUARY 2004**

Day 1

Introduce the student to the principles of data analysis in a clinical trial using a professional statistical package.

1. STATA as a statistics/data analysis program

1.1. Widely used

1.2. Powerful - e.g. adjustment for cluster surveys, sample size calculation

1.3. Self-explanatory through the help command

1.4. Flexible - easy to customise

1.5. Small size - may run on old computers

1.6. Programmable, scriptable (ado and do files)

1.7. Less expensive than many other statistical software packages - pay only once, no licence renewal required

1.8. Available for DOS, Windows, Mac, Unix and Linux operating systems

2. Windows

2.1. STATA Command window - where we type commands, one per line. Each command line starts with a command name, followed by the variables. The Command window may be used as a calculator by typing the command *display* followed by an expression. For example:

display 500+120

display sqrt(25)

display log(5)

Typing commands correctly in STATA is crucial and may seem tricky. Pay special attention to capital and small letters (STATA is case sensitive!), spaces, brackets and symbols (like commas, ", * etc.). You don't need to retype a command every time. A previous command can be recalled using the PgUp and PgDn keys or by selecting it from the Review window.

Abbreviations in STATA commands - you can use abbreviations for commands and for variables. For example display (dis), tabulate (tab), summarize (sum), regress (reg), logistic regression (logit) etc.
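The abbreviations above can be tried out with a short sketch like the one below. It assumes the example dataset auto that is shipped with STATA and the sysuse command (STATA 8 or later); the dataset and its variable names are not part of this course material.

```stata
* auto is an example dataset installed with STATA (assumption: sysuse is available)
sysuse auto, clear

* full command names
summarize price
tabulate foreign

* the same commands, abbreviated
sum price
tab foreign

* variable names can be abbreviated too, as long as the abbreviation is unambiguous
sum pri
```

Each abbreviated command should give the same output as its full version.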

Error messages - are displayed in red when you type a command incorrectly. Typical messages include: unrecognised command, ambiguous variable name, required, not allowed, not sorted, invalid syntax, invalid, etc.
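A small sketch of what case sensitivity means in practice. The use of capture noisily and _rc here is only for illustration; typing the wrong command directly simply shows the red error message.

```stata
* display is a valid command
display 500+120

* Display (capital D) is not: STATA reports an unrecognised command
* capture noisily shows the error message but lets the session carry on
capture noisily Display 500+120

* _rc holds the return code of the captured command (0 means no error)
display _rc
```

The return code shown by the last line should be non-zero, because the capitalised command failed.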

2.2. Review window - lists all commands entered since the program was opened

2.3. Results window - displays the results of an analysis. If the output is longer than the STATA Results window, --more-- appears at the bottom of the screen.

To interrupt the process at any time, hold down Ctrl and press Pause/Break, or simply press the red Break button.

2.4. Variables window - shows all variables in the file currently in use; clicking on a variable copies its name into the Command window

2.5. Do-file window. Basically a text editor where you can save your commands and run them later. It can also be used to write programs within STATA - way beyond the scope of these exercises, but we will show some examples.
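As a minimal sketch, a do file could simply contain the calculator commands from section 2.1. The file name first_session.do is hypothetical.

```stata
* first_session.do (hypothetical file name)
* lines starting with * are comments and are ignored by STATA
display 500+120
display sqrt(25)
display log(5)
```

Assuming the file is saved in the current working directory, typing do first_session.do in the Command window runs all three commands in order.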

Create a variable called "id"

generate id=1

Input serial
numbers from 1 to 10

input

1

2

3

.

.

10

end
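Typing serial numbers by hand becomes tedious for longer lists. One possible shortcut, sketched here on the assumption that nothing you still need is held in memory, is to let STATA create the observations and number them with _n (the observation number):

```stata
* clear removes any data currently in memory, so save your work first
clear

* create 10 empty observations
set obs 10

* _n is the number of the current observation: 1, 2, ..., 10
generate id = _n
list
```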

Assign a random number to each of the serial numbers

generate treat=round(uniform(),1)

To generate random numbers between 0 and 1

uniform()

To round to the nearest whole number

round(a,1)

Then type list

list

Note that each serial number has been assigned a 0 or a 1. How many are there in each group?

Delete the variable you have created by typing:

drop treat

Type generate treat=round(uniform(),1)

Repeat this and note that the number of 0s and 1s varies every time you do this exercise.

drop treat

Type set seed 1000 (or any other number)

Then generate treat=round(uniform(),1)

How many 0s and 1s are there now?

Inside the computer there is, in effect, a long list of random numbers. The computer starts at one place and works its way through the list every time you request a sample of random numbers. When you type "set seed #" you specify where in the list the computer should start. When the seed is the same on different computers, the random list generated will be the same.
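The effect of the seed can be checked with a short sketch like this one. The variable names treat1 and treat2 are made up for the illustration; newer STATA versions document the uniform() function under the name runiform().

```stata
* start from an empty dataset with 10 serial numbers
clear
set obs 10
generate id = _n

* first list, starting from seed 1000
set seed 1000
generate treat1 = round(uniform(),1)

* second list, starting from the same seed
set seed 1000
generate treat2 = round(uniform(),1)

* assert stops with an error if the two lists differ anywhere
assert treat1 == treat2
list id treat1 treat2
tabulate treat1
```

If the assert line passes without a message, the two lists are identical, which is what resetting the seed is meant to achieve.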

NOTE: A DATASET WITH AT LEAST ONE VARIABLE AND ONE OBSERVATION MUST BE LOADED FOR THIS SCRIPT TO WORK.

Copy the following program (everything up to and including the line "end") into a "do" file in STATA and run it:

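capture program drop randomize
program define randomize
    * start from a single observation that has an id variable
    capture generate id=1
    drop if _n>1
    * remove variables left over from an earlier run
    capture drop rand
    capture drop block
    capture drop group
    capture drop treat
    * one observation per participant (first argument)
    expand `1'
    replace id=_n
    * random order within blocks of consecutive participants (second argument = block size)
    generate rand=uniform()
    generate block=floor(((id-1)/`2'))+1
    sort block rand
    * split each shuffled block evenly over the treatments (third argument), coded from 0
    * done in one step so that no rounded fraction is stored in between
    by block: generate treat=floor((_n-1)*`3'/_N)
    sort id
end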

To make a randomized list, type:

randomize [# of participants] [block size] [# of treatments]

Then try the commands:

list

edit

tabulate treat

tabulate block

To drop the superfluous variables type:

drop rand
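As an illustration, the sketch below randomizes 24 participants in blocks of 6 to 3 treatments. The numbers and the seed are arbitrary choices, and the sketch assumes that the randomize program above has already been run from a do file.

```stata
* a dataset with at least one variable and one observation must be loaded
clear
set obs 1
generate id = 1

* fix the seed so that the list can be reproduced
set seed 1000

* 24 participants, block size 6, 3 treatments
randomize 24 6 3

* cross-tabulate blocks against treatments
tabulate block treat
```

With these numbers the table should show 4 blocks, each containing every treatment (coded 0, 1 and 2) exactly twice.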

Sample sizes can be determined with the "sampsi" command followed by information about the expected variance (SD), significance level, means or proportions, and power. The sampsi command can be used to calculate group sizes, power, etc. The first two numbers after the command are either proportions or means. If no standard deviation (SD) is specified and the numbers are between 0 and 1, STATA understands that it is dealing with proportions. For means, at least one SD has to be specified; when only one SD is specified, STATA assumes the same SD in both groups. If power and significance level are not specified and sufficient information about means and SDs or proportions is given, STATA calculates the group size using a significance level of 0.05 and a power of 90%. Information beyond the means or proportions is specified after a comma, for example p(0.8) for a power of 80%.

Examples:

Two-sample comparison of mean1 to mean2.

sampsi 132.86 127.44, p(0.8) sd1(15.34)

Estimated sample size for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2

Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.8000
            m1 =   132.86
            m2 =   127.44
           sd1 =    15.34
           sd2 =    15.34
         n2/n1 =     1.00

Estimated required sample sizes:

            n1 =      126
            n2 =      126
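The group size of 126 can be checked by hand. The sketch below assumes the usual normal-approximation formula for comparing two means, n per group = (z for alpha/2 + z for power)^2 x (sd1^2 + sd2^2) / (m1 - m2)^2; that sampsi uses exactly this formula is an assumption here, although the result agrees with the output above. invnormal() is the inverse of the standard normal distribution (called invnorm() in older STATA versions).

```stata
* unrounded group size: alpha = 0.05 two-sided gives 0.975, power = 0.8
display (invnormal(0.975) + invnormal(0.8))^2 * (15.34^2 + 15.34^2) / (132.86 - 127.44)^2

* round up to the next whole participant
display ceil((invnormal(0.975) + invnormal(0.8))^2 * (15.34^2 + 15.34^2) / (132.86 - 127.44)^2)
```

The first line gives a value just under 126, and rounding up gives the 126 per group reported by sampsi.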

If the expected SD differs between the two groups, you can specify two SDs this way:

sampsi 132.86 127.44, p(0.8) sd1(15.34) sd2(18.23)

Estimated sample size for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2

Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.8000
            m1 =   132.86
            m2 =   127.44
           sd1 =    15.34
           sd2 =    18.23
         n2/n1 =     1.00

Estimated required sample sizes:

            n1 =      152
            n2 =      152

Two-sample comparison of proportions. Compute sample size with n1 = n2 (i.e., ratio = 1, the default) and power = 0.9 (the default):

sampsi 0.35 0.5

Estimated sample size for two-sample comparison of proportions

Test Ho: p1 = p2, where p1 is the proportion in population 1
                    and p2 is the proportion in population 2

Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.9000
            p1 =   0.3500
            p2 =   0.5000
         n2/n1 =     1.00

Estimated required sample sizes:

            n1 =      240
            n2 =      240

One-sample comparison of a proportion to a hypothesized value. Compute power (the onesample option asks for a one-sample test):

sampsi 0.5 0.6, n(200) onesample

Estimated power for one-sample comparison of proportion
  to hypothesized value

Test Ho: p = 0.5000, where p is the proportion in the population

Assumptions:

         alpha =   0.0500  (two-sided)
 alternative p =   0.6000
 sample size n =      200

Estimated power:

         power =   0.8123
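The power above can also be approximated by hand. The sketch assumes the standard normal-approximation formula for a one-sample test of a proportion, power = Phi((|p1 - p0| x sqrt(n) - z x sqrt(p0(1 - p0))) / sqrt(p1(1 - p1))); whether sampsi computes it in exactly this way is an assumption. normal() is the standard normal distribution function (called norm() in older STATA versions).

```stata
* p0 = 0.5 (hypothesized), p1 = 0.6 (alternative), n = 200, alpha = 0.05 two-sided
display normal((abs(0.6 - 0.5)*sqrt(200) - invnormal(0.975)*sqrt(0.5*0.5)) / sqrt(0.6*0.4))
```

The result should be about 0.81, in line with the estimated power in the output.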

http://www.stata.com/

Introduction to Stata by Jeroen Weesie, Utrecht University.

An introduction to Stata in PDF format.

STATA manuals at the library of the Centre for International Health.

To get help for a specific command, type:

help [command name]

For example, to view the help file for the command regress, type:

help regress

For a general introduction to STATA, type:

tutorial intro

For those who have never used STATA before, it is highly recommended to do the built-in tutorial after the computer exercises on the first day.

Alternatively, for other tutorials type:

tutorial contents

  ___  ____  ____  ____  ____ tm
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/    Tutorials
-----------------------------------------

To run a tutorial, type

    . tutorial ________

filling in the blank with a name from the list on the next page.

We recommend you type

    . tutorial intro

first. After that, run the tutorials in whatever order you desire.

Tutorial    Description
-------------------------------------------------------------------------------
intro       An introduction to Stata
graphics    How to make graphs
tables      How to make tables
regress     Estimating regression models, including 2SLS
anova       Estimating one-, two- and N-way ANOVA and ANCOVA models
logit       Estimating maximum-likelihood logit and probit models
survival    Estimating maximum-likelihood survival models
factor      Estimating factor and principal component models
ourdata     Description of the data we provide
yourdata    How to input your own data into Stata