STATA

Availability

Linux: Stata/MP 4 version 12 is currently available on Kellogg's Linux server, skew5, for up to six (6) concurrent users. Type stata or xstata to start Stata version 12. Stata/SE version 13 is available on the Social Science Computing Cluster for up to 36 concurrent users. Type stata to start it.

MS Windows: Stata/IC version 13 is installed in the "special software" workstations at the Jacobs Center (ten workstations) and Chicago campus (six workstations) computer laboratories. For more information about the Kellogg computer labs, point your web browser to the KIS page on kiosks and computer labs.

Personal copies: Stata 13 can be purchased for individual Kellogg users' machines through Stata's GradPlan. Other members of the Northwestern community should use the Stata GradPlan link on the Weinberg IT web page. Through the GradPlan, Intercooled Stata costs $189, while Stata/SE costs $425. The GradPlan also includes licensing for the multi-processor version, Stata/MP, for 2 and 4 cores, to take advantage of dual-core and/or dual-CPU workstations.

Students enrolled in the school's MBA programs are eligible for a copy of Stata/IC through the school license. Support information for MBA students, including office hours and scheduled training sessions, can be found here.

Description

Stata/IC is a general-purpose statistical package with good graphics capabilities and a graphics editor. Stata covers a wide range of statistical techniques and is programmable, allowing the user to add new commands. One of its main strengths is that it is relatively easy for beginners to learn. It includes a variety of routines to analyze complex survey data ("svy" commands), panel data ("xt" commands), and survival data.

Stata/SE is a version of Stata that can handle up to 32,766 variables in a dataset (versus 2,047 for Stata/IC), strings of up to 244 characters (versus 80 characters), and matrices of up to 11,000 by 11,000 elements (versus 800 by 800). This upgrade is therefore suitable for users who have run into these restrictions.

In general, Stata keeps a large portion of the data in memory; hence, available RAM is a typical constraint for Stata users (versus SAS, for example, which uses very little RAM and more hard disk space).

Vendor information

StataCorp LP
4905 Lakeway Drive
College Station, TX 77845
Phone: (800) 782-8272
Fax: (979) 696-4601

Support

In addition to Stata's FAQ section, the Stata listserv is very active and a good source of reference. Users may also contact Stata directly, by sending e-mail to tech-support@stata.com and including the serial number for the copy of Stata being used. The serial number appears on the screen when a Stata session starts.

Running Stata

UNIX

Important information when generating Stata graphics in X Windows: When you create a graph with Stata do NOT close the window that Stata creates for this graph with the control box in the upper-left corner of the window. If you do, your X session may lock up. Instead, return to your working Stata window and continue working. You can issue the command "window define graphics" if you wish to set up a window for any type of graph you use.

To ...                                            The command is
Start a Stata interactive (ASCII) session         stata
Start a Stata GUI session (requires X Windows)    xstata
See the command line switches                     stata -h
Set the data memory to X kilobytes                stata -kX
Run Stata in batch mode                           stata -b do filename
End a Stata session                               exit

Example of setting the data memory: stata -k2000

Examples of batch mode: stata -b do bonds, or nohup stata -b do bonds &. Stata assumes that the file's extension is ".do" (e.g., bonds.do). Refer to our page on running multiple or large jobs in skew3 for more information.

Examples and solutions

Sample programs at Kellogg

Creating a log file: Stata does not create a log file unless told to. To create one, include the following command as the first line of your code:

log using filename, replace

where "filename" is the log file name and path. For example, to save the commands and results to a log file called "reg-tests" (Stata adds the extension ".smcl" by default, or ".log" for a text log), the necessary command is:

log using reg-tests, replace

At the end of the program include the following command:

log close

This command closes the log file, which contains the commands and results of the session.

Logs in version 7 and higher (SMCL): The default format for log files in versions 7 and up is "Stata Mark-up and Control Language" (SMCL). SMCL log files can be translated to ASCII with the "translate" command:

translate file.smcl file.log

If you are using the GUI, you may select "Log" | "Translate" from the File menu.

To create an ASCII log instead of an SMCL log, use the "text" option in the "log" command:

log using filename, text
log using filename, text replace
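
The two approaches above can be sketched in a short do-file. This is only an illustration: the log names "session" and "session_text" are hypothetical, and the auto dataset that ships with Stata stands in for your own data.

```stata
* Close any log left open by an earlier run (capture ignores the error if none is open)
capture log close

* Approach 1: write an SMCL log, then translate it to plain text
log using session, replace
sysuse auto, clear
summarize mpg weight
log close
translate session.smcl session.log, replace

* Approach 2: write a plain-text log directly
log using session_text, text replace
regress mpg weight
log close
```

Starting with "capture log close" is a common convention that lets the do-file be rerun after a run that stopped before reaching "log close".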

Dealing with long commands - changing the command delimiter: By default, Stata uses a carriage return to delimit one command from the next. If you need more than one line for a command, you may change the command delimiter to a semi-colon (";") with the following command:

#delimit ;

After this statement, each Stata command has to end with a semi-colon, but it can take more than one line. To reset the command delimiter to a carriage return:

#delimit cr

Note that the "#delimit" command can only be used in do- or ado-files.
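
As a minimal sketch of switching delimiters inside a do-file (the regression itself is only an illustration using the auto dataset that ships with Stata):

```stata
* Run this from a do-file; #delimit is not available at the interactive prompt
sysuse auto, clear

#delimit ;
regress mpg weight length
    displacement foreign ;
summarize mpg
    weight ;
#delimit cr

* Back to one command per line
describe mpg weight
```

While the semi-colon delimiter is active, every command must end with a semi-colon, including comments that begin with an asterisk. This is why the sketch keeps its comments outside that block.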

Changing the memory allocated to the data area: For large datasets, the default memory allocation of 1024KB (1MB) may not be enough. In UNIX, Stata can be started with a larger memory allocation with the "-k" command line option (see example above). In both the UNIX and MS Windows versions, memory allocation can also be modified within an existing Stata session with the "set memory" command:

set memory X

where X is the desired memory allocation, in kilobytes by default. You may also specify the memory in megabytes. The following three examples each set the data memory to about 4MB:

set memory 4000
set memory 4000k
set memory 4m

Moving Stata data files (.dta) between MS Windows and UNIX: Stata data files can be read by the UNIX and MS Windows versions of Stata regardless of where the file was created, as long as the files are transferred (FTP) in binary mode.

Moving Stata data files (.dta) between version 7 and version 6: Stata 6 cannot read Stata 7 data files. However, Stata 7 allows you to save files in Stata 6 format. Use the "old" option in the "save" command:

save filename, old

If you have variables with names longer than 8 characters (supported in version 7, but not in version 6), Stata will refuse to save the file.
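
In later Stata releases, saving in an older file format is done with the separate "saveold" command rather than the "old" option. The sketch below assumes a release that provides saveold. The output name "auto_previous" is hypothetical, and which older format is written depends on the release in use.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* Save a copy in an older .dta format so that an earlier release can read it
saveold auto_previous, replace
```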

Saving estimation results to a spreadsheet: The statsby command (new in version 7) allows the user to select results saved internally by Stata and place them into a dataset. The dataset can then be saved in a format readable in Excel using the outsheet command. The command allows estimation on the entire dataset or by groups. For example, to get the coefficients, standard errors, R2, adjusted-R2, and F statistic for regressions estimated separately by a categorical variable (catvar), the necessary command would be:

statsby "regress y x1 x2" _b _se rsq=e(r2) adjrsq=e(r2_a) fstat=e(F), by(catvar)

Stata will replace the dataset in memory (unless it has been changed) with a new dataset containing the following variables (columns): b_x1, b_x2, b_cons, se_x1, se_x2, se_cons, rsq, adjrsq, fstat. Each line gives the results of the regression for one value of "catvar". The "by(variable_name)" portion of the command is optional. If it is omitted, the "regress" command is executed on the entire dataset.

Note that if the dataset was modified and not saved before issuing the statsby command, Stata will not execute the command and will print the following message: "no; data in memory would be lost".
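
The command above uses the version 7 syntax. Newer releases write statsby as a prefix, with the estimation command after a colon. The sketch below shows the same idea in that form. It assumes a release that accepts the prefix syntax, uses the auto dataset that ships with Stata in place of y, x1, x2 and catvar, and writes to a hypothetical file "regresults.txt". The names of the created variables may differ from those listed above.

```stata
sysuse auto, clear

* One regression per value of foreign; the results replace the data in memory
statsby _b _se rsq=e(r2) adjrsq=e(r2_a) fstat=e(F), by(foreign) clear: ///
    regress mpg weight length

* Inspect the new dataset: one row per group
list

* Write the results to a tab-delimited text file that Excel can open
outsheet using regresults.txt, replace
```

The "clear" option tells statsby that it may discard the data in memory, which avoids the "no; data in memory would be lost" message when the data have been modified.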

For more information, refer to the Stata "User's Guide" (for version 7, sections 16.6, 21.8 and 21.9), as well as to the section on "Saved Results" included in the reference chapter for any estimation command. A quick list of saved results can be obtained after an estimation by typing "estimates list" or "return list" depending on the command used. The "statsby" command is described in volume 3 of the Stata Reference manuals.

Creating publishable tables: "outreg", an ado file written by John L. Gallup, reduces the work needed to create publishable tables from regular Stata output. To download and install it, type "net search outreg" at the Stata (version 7) prompt. "outreg" creates formatted ASCII tables, using tabs as column delimiters. It was described in STB-46 and reprinted in Stata Technical Bulletin Reprints, volume 8, pp. 200-202 (available in Research Computing). The outreg help file has been converted into a PDF file for printing (provided with the author's consent). Refer to our page on outreg for some examples. The last update for outreg was announced on October 11, 2001, and is available from the SSC-IDEAS site: within Stata, type "ssc install outreg".

For tables of summary statistics, there are various user-contributed programs. For example, statsmat will produce a matrix with the requested summary statistics; it can be output to LaTeX with the outtable command. Another available command is fsum. These packages are also available on the SSC-IDEAS site.

Adding personal ADO files in skew3: Unlike personal workstations, users are not allowed to write to the Stata directory in skew3 or any other Unix server. If you need to use ado files written by you or some other researcher, you may accomplish this by placing these ado files in your "personal ado directory". In skew3, this directory is a subdirectory of the user's home directory: ~/ado/personal. The location may vary in different systems. Use the "sysdir" command to produce a listing of Stata's system directories.

Adding personal or "Plus" (from the Stata Technical Bulletins or Stata Journal) ado files on public lab computers: Users of Kellogg's lab computers cannot install plus or personal ADO files in the directory where Stata is installed. Instead, users can install the needed ADO files to their Windows home directory (drive H). To do this:

  • Typing "sysdir list" in the Stata command window will list the current assignment of directories.
  • Create a folder called "ado" in your H drive. In this folder, create a sub-folder called "plus" or "personal".
  • In the Stata command window, type "sysdir set PLUS h:\ado\plus" or "sysdir set PERSONAL h:\ado\personal" to reset the directories where Stata looks for plus or personal ado additions. The path must match the folder you created in the previous step.
  • Install the required ADO files -- they will be written to your home directory in the directory you specified.
  • The assignment made with the "sysdir" command is valid only for the current session. If you close and open Stata, you will have to issue the command again. You can include the sysdir command in the do files you create. Since the additional ADO files are in your home directory, you need to install them only once and you may use them in different lab workstations just by setting the appropriate directory with the "sysdir" command.
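
The steps above can be collected at the top of a do-file, as in the sketch below. It assumes that the folders h:\ado\plus and h:\ado\personal already exist on the home drive and that the workstation can reach the SSC archive. The package installed is only an example.

```stata
* Point Stata at the ado folders on the home drive (drive H, as described above)
sysdir set PLUS "h:\ado\plus"
sysdir set PERSONAL "h:\ado\personal"

* Confirm the new assignments
sysdir list

* Install a user-written package; it is written under the PLUS directory
ssc install outreg
```

On later visits to the lab only the two "sysdir set" lines are needed, since the installed files remain in the home directory.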

Searching and adding user-written additions (ADO files): Within Stata, use the net search command. For example, to find the "suest" ado and help files:

net search suest

The available commands will be listed in reverse chronological order.

Variable labels are not displayed in the variables window: In Stata 7, the variables window reserves 32 characters for variable names by default. As a result, variable labels often cannot be seen unless the variables window is made wider. To reduce the space reserved for the variable names in the variables window, use the "varlabelpos" option:

set varlabelpos #

where "#" is a number between 8 and 32.

Executing commands every time Stata is started: To execute commands immediately after Stata starts, create a file called "profile.do" in one of the directories searched by Stata (see the "Getting Started" manual, section A.7). For example, in Windows, "profile.do" could be created in "c:\ado\personal", while in Unix the equivalent directory would be "~/ado/personal". To see a list of Stata's system directories, use the sysdir command. A sample "profile.do" could contain the following commands:

set memory 4m
set logtype text
set varlabelpos 10

Performance issues with graphics in Stata 8: Kellogg users of Windows 98 have reported performance problems with graphics and dialog boxes. For example, a simple scatter plot using the new "scatter" command may take more than a minute to be displayed. Similarly, Stata may take a long time to display some dialog boxes. This delay does not seem to be affected by the number of observations in the data set, and it is most noticeable when generating the first graphic during a Stata session. Subsequent graphics are produced faster. There are several actions users can take to improve performance:

  • Users who have McAfee VirusScan should make sure that their current scan engine is up to date. As of April 2003, it should be version 4.2.40. To verify the version of the scan engine on your computer, right-click on the McAfee icon on the system tray and select "About". If the scan engine is not up to date, update McAfee and reboot the machine. This will improve performance significantly since a previous release of the McAfee engine caused delays in all applications.
  • Use smaller versions of Stata's dialog boxes and menus. You may try this option by issuing the "set smalldlg on" command in Stata 8. If satisfactory, you can make this option the default: "set smalldlg on, permanently". Since the memory area assigned by Windows 98 for storing interface controls is shared by all running applications, performance can also be increased by closing other applications.
  • Version 7 style graphics are still much faster than version 8 graphics. For a quick scatter plot, for example, using the "graph7" or "gr7" command (e.g., "gr7 mpg weight" instead of "scatter mpg weight") will produce a graphic without delay.

Manuals available at Research Computing

Starting with Stata 11, there is an electronic version of Stata manuals in .pdf format, linked below. In addition, there is an electronic version of the help files, which you may search. The result of your search will open in a new window.

Search for a Stata command:

Stata 11:

Stata 8:

  • Stata 8: User's Guide
  • Stata 8: Graphics Reference Manual
  • Stata 8: Base Reference Manual (4 volumes)
  • Stata 8: Programming Reference Manual
  • Stata 8: Getting Started manual for Unix
  • Stata 8: Cluster Analysis Reference Manual
  • Stata 8: Cross-Sectional Time-Series Reference Manual
  • Stata 8: Survey Data Reference Manual
  • Stata 8: Survival Analysis & Epidemiological Tables Reference Manual
  • Stata 8: Time-Series Reference Manual

Stata 7:

  • Stata, release 7: User's Guide
  • Stata, release 7: Graphics Manual
  • Stata, release 7: Reference (volumes 1-4)
  • Stata, release 7: Getting Started with Stata for Unix
  • Stata, release 7: Programming Manual

Other books about Stata (available at Kellogg)

Useful links

© 2001-2010 Kellogg School of Management, Northwestern University