Stata

Description 
Stata/IC is a general-purpose statistical package with good graphics capabilities and a graphics editor. Stata covers a wide range of statistical techniques and is programmable, allowing the user to add new commands. One of the highlights of Stata is that it is relatively easy for beginners to learn. It includes a variety of routines to analyze complex survey data ("svy" commands), panel data ("xt" commands), and survival data.

Stata/SE is a version of Stata that can handle up to 32,766 (versus 2,047 for Stata/IC) variables in a dataset, strings of up to 244 characters (versus 80 characters), matrices of up to 11,000 by 11,000 elements (versus 800 by 800). Thus, this upgrade is suitable for users who have run into these restrictions.

Note that Stata keeps the data in memory; hence, available RAM is a typical constraint for Stata users (as opposed to SAS, which is optimized to work with very little RAM).

Availability

Linux: Stata/MP4 is currently available on Kellogg's Linux server, Skew, for up to six concurrent users. In addition, the Social Science Computing Cluster offers Stata/SE for up to 36 concurrent users.

MS Windows: Stata/IC is installed on the "special software" workstations at the Jacobs Center (ten workstations) and Chicago campus (six workstations) computer laboratories. For more information about the Kellogg computer labs, see the KIS page on Kiosks and Computer Labs. For current Kellogg faculty, a three-user license for Stata MP/32 is also available on KDC.

Personal Copies: MBA and MSMS students, LLMK participants, and faculty and TAs teaching with Stata in the current quarter are eligible to obtain Stata for their personal computers at no charge (see the KIS Stata page for details). Support information for MBA students, including office hours and scheduled training sessions, can be found here.

Other Kellogg users, including faculty and graduate students who use Stata for research, can purchase licenses through Stata GradPlan. Other members of the Northwestern community should refer to the Stata GradPlan link on the Weinberg IT software purchasing page.

Vendor Information 
StataCorp LP
4905 Lakeway Drive
College Station, TX 77845 
Phone: (800) 782-8272
Fax: (979) 696-4601

Support
In addition to Stata's FAQ section, the Stata listserv is very active and a good source of reference. Users may also contact Stata directly, by sending an email to tech-support@stata.com and including the serial number for the copy of Stata being used. The serial number appears on the screen when a Stata session starts.

Running Stata

UNIX

To...                                            The Command Is
Start a Stata Interactive (ASCII) Session        stata
Start a Stata GUI Session (Requires X Window)    xstata
See the Command Line Switches                    stata -h
Run Stata in Batch Mode                          stata -b do filename
Run Stata in Batch Mode, in the Background       nohup stata -b do bonds &
End a Stata Session                              exit

In batch mode, Stata assumes that the file's extension is ".do" (e.g., bonds.do). For more information on background jobs, refer to our page on Running Multiple or Large Jobs in Skew.

Examples and Solutions

Creating a Log File: Stata does not create a log file unless specified. To do so, include the following command in the first line of code:

log using filename, replace

where "filename" is the log file name and path. For example, to save the commands and results to a log file called "reg-tests.smcl" (Stata adds the ".smcl" extension by default), the necessary command is

log using reg-tests, replace

At the end of the program, include the following command:

log close

This command closes the log file, which by then contains the commands and results of the session.
Since Stata version 7, the default format for log files is "Stata Markup and Control Language" (SMCL). SMCL log files can be translated to ASCII with the "translate" command:

translate file.smcl file.log

If you are using the GUI, you may select "Log" -> "Translate" from the File menu. To create an ASCII log instead of an SMCL log, use the "text" option in the "log" command:

log using filename, text
log using filename, text replace 

Alternatively, it is possible to switch the default log type to text:

set logtype text
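
The logging commands above can be combined in one short do-file. The following is a minimal sketch, not a required layout: the log name "reg-tests" comes from the example above, while the auto dataset and the regression are placeholders chosen only because they ship with Stata.

```stata
* Minimal do-file sketch: open a text log, run commands, close the log.
* "reg-tests" is the example log name from the text; the analysis is a placeholder.
capture log close              // close any log left open by an earlier run
log using reg-tests, text replace

sysuse auto, clear             // example dataset bundled with Stata
regress price mpg weight       // placeholder analysis

log close                      // the log is now complete in reg-tests.log
```

Because the "text" option is used, the log should be written as reg-tests.log in the working directory, so no "translate" step is needed.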


Dealing with Long Commands (Changing the Command Delimiter): By default, Stata uses a carriage return to delimit one command from the next. If you need more than one line for a command, you may change the command delimiter to a semicolon (";") with the following command:

#delimit ;

After this statement, each Stata command must end with a semicolon, but it can take more than one line. To reset the command delimiter to a carriage return, use the following command: 

#delimit cr

Note that the "#delimit" command can only be used in do- or ado-files.
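
As an illustration, a do-file fragment that spreads a single command over several lines might look like the sketch below. The dataset and the regression are placeholders (the auto dataset ships with Stata); only the "#delimit" lines come from the text above.

```stata
* Sketch of a do-file fragment using a semicolon delimiter.
sysuse auto, clear

#delimit ;
regress price
    mpg
    weight
    foreign ;
#delimit cr

* From here on, each line is again one command.
summarize price
```

Forgetting the final "#delimit cr" is a common source of confusing errors, since every later command would then also need a semicolon.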

Changing the Memory Allocated to the Data Area: On some systems, the default memory limit may be set to less than the available physical memory. This can be changed by using the "set max_memory" command:

set max_memory X

where X is the desired memory allocation. For example, to set the data memory limit to 4 GB, use "set max_memory 4g". In addition, by default Stata limits the number of variables in a dataset to 5,000. This can be changed with "set maxvar". For more information, please refer to the Stata documentation, available through the "help memory" command.
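
A short sketch of these settings, as they might appear at the top of a do-file before any data are loaded, follows. It assumes Stata 12 or later (where "set max_memory" is available) and Stata/SE or Stata/MP for "set maxvar"; the value 10000 is a hypothetical choice for illustration.

```stata
* Sketch: memory-related settings, issued before any data are loaded.
* Assumes Stata 12+ and Stata/SE or Stata/MP; 10000 is a hypothetical value.
query memory               // display the current memory settings
set max_memory 4g          // limit data memory to 4 GB (example from the text)
set maxvar 10000           // raise the limit on the number of variables
query memory               // confirm the new settings
```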

Moving Stata Data Files (.dta) between MS Windows and UNIX: Stata data files can be read by the UNIX and MS Windows versions of Stata regardless of where the file was created, as long as the files are transferred (SFTP) in binary mode.

Moving Stata Data Files (.dta) between Different Versions of Stata: Later versions of Stata can read data files created in earlier versions; however, the reverse is not true. Stata 13 has a "saveold" command to save data in Stata 12 format; to transfer data for use in a still older version of Stata, it may be necessary to export it into another format, such as .csv.
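
Both routes can be sketched as follows, assuming Stata 13 or later (where "export delimited" is available). The file names auto_old and auto.csv are hypothetical, and the auto dataset is used only as a stand-in for your own data.

```stata
* Sketch: two ways to hand data to an older Stata release.
* File names are hypothetical; auto.dta is a stand-in for real data.
sysuse auto, clear

saveold auto_old, replace                  // write an older .dta format
export delimited using auto.csv, replace   // plain-text fallback (Stata 13+)
```

The .csv route loses Stata-specific metadata such as value labels and variable labels, so "saveold" is usually preferable when the target version can read its output.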

Moving Stata Code (.do, .ado) between Different Versions of Stata: Sometimes, later Stata versions introduce changes that prevent older programs from working. To help address that, Stata offers a "version" command, which instructs a later version of Stata to emulate the behavior of an earlier one.
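
In practice, the "version" command is usually placed on the first line of a do-file. The sketch below assumes a program originally written for Stata 12; the version number and the commands after it are illustrative only.

```stata
* Sketch: pin a do-file to the Stata release it was written for.
* Version 12 is a hypothetical choice; use the release you developed under.
version 12

sysuse auto, clear
regress price mpg weight
```

The same command can also be used as a prefix for a single command, as in "version 12: regress price mpg weight", which leaves the rest of the session unaffected.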

Saving Estimation Results and Formatting for Publication: The statsby command (new in version 7) allows the user to select results saved internally by Stata and place them into a dataset. The dataset can then be saved in a format readable in Excel using the outsheet command. The statsby command allows estimation on the entire dataset or by groups. For example, to get the coefficients, standard errors, R-squared, adjusted R-squared, and F statistic for regressions estimated separately by a categorical variable (catvar), the necessary command would be

statsby "regress y x1 x2" _b _se rsq=e(r2) adjrsq=e(r2_a) fstat=e(F), by(catvar)

Stata will replace the dataset in memory (unless it has been changed) with a new dataset containing the following variables (columns): b_x1, b_x2, b_cons, se_x1, se_x2, se_cons, rsq, adjrsq, fstat. Each line gives the results of the regression for one value of "catvar". The "by(variable_name)" portion of the command is optional. If it is excluded, the "regress" command is executed on the entire dataset.

Note that if the dataset is modified and not saved before the statsby command is issued, Stata will not execute the command and will print the following message: "no; data in memory would be lost".
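
The command shown above uses the original statsby syntax. More recent Stata releases document statsby as a prefix command, with the estimation command placed after a colon. The sketch below shows that form under the assumption that a recent release is in use; the auto dataset, the variables, and the output file name are placeholders, and the names of the resulting variables may differ from those listed above, so "describe" is used to inspect them.

```stata
* Sketch: statsby in prefix form (assumes a recent Stata release).
* Dataset, variables, and output file name are placeholders.
sysuse auto, clear

statsby _b _se rsq=e(r2) adjrsq=e(r2_a) fstat=e(F), by(foreign) clear: ///
    regress price mpg weight

describe                   // check the names of the collected variables
list                       // one row per value of the by() variable

export delimited using regs_by_foreign.csv, replace   // readable in Excel
```

The "clear" option tells statsby to replace the data in memory even if they have unsaved changes, which avoids the "no; data in memory would be lost" message.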

For more information, refer to the Stata "User's Guide," as well as to the section on "Saved Results" included in the reference chapter for any estimation command. A quick list of saved results can be obtained after an estimation by typing ereturn list or return list, depending on the command used.
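
As a brief illustration, estimation commands such as regress store their results in e(), while general commands such as summarize store theirs in r(). The dataset and variables below are placeholders.

```stata
* Sketch: inspecting saved results after a command.
sysuse auto, clear

regress price mpg weight
ereturn list               // e() results left by the estimation command
display e(r2)              // a single saved result, here R-squared

summarize price
return list                // r() results left by a general command
display r(mean)
```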

This handout demonstrates how to save Stata output and how to use contributed commands such as matsave, estout, and a few others to format and export tables of summary statistics and estimation results for publication. Another commonly used tool for saving and exporting estimation results in Stata is outreg2.

Adding Personal ADO Files in Skew: Unlike on personal workstations, users are not allowed to write to the Stata directory in Skew or on any other Unix server. If you need to use ado files written by you or another researcher, you may do so by placing them in your "personal ado directory." In Skew, this directory is a subdirectory of the user's home directory: ~/ado/personal. The location may vary on other systems. Use the sysdir command to produce a listing of Stata's system directories.

Adding Personal or "Plus" (from the Stata Technical Bulletins or Stata Journal) ADO Files on a Public Lab Computer: Users of Kellogg's lab computers cannot install plus or personal ADO files in the directory where Stata is installed. Instead, users can install the needed ADO files to their Windows home directory (drive H). To do this:

  1. Type sysdir list in the Stata command window to list the current assignment of directories.
  2. Create a folder called "ado" in your H drive. In this folder, create a subfolder called "plus" or "personal."
  3. In the Stata command window, type sysdir set PLUS h:\ado\plus or sysdir set PERSONAL h:\ado\personal to reset the directories where Stata looks for plus or personal ado additions.
  4. Install the required ADO files—they will be written to your home directory in the directory you specified.
  5. The assignment made with the "sysdir" command is valid only for the current session. If you close and open Stata, you will need to reissue the command. You can include the sysdir command in the do files you create. Since the additional ADO files are in your home directory, you need to install them only once and you may use them in different lab workstations by simply setting the appropriate directory with the "sysdir" command.
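
The steps above can be collected at the top of a do-file, so that the directories are reset each time the file runs. This is a sketch that assumes the folders h:\ado\plus and h:\ado\personal have already been created as described in step 2.

```stata
* Sketch: point Stata at ado folders on the H: drive for this session.
* Assumes h:\ado\plus and h:\ado\personal already exist.
sysdir set PLUS "h:\ado\plus"
sysdir set PERSONAL "h:\ado\personal"

sysdir list                // confirm the new directory assignments
```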

Searching and Adding User-Written Additions (ADO Files): Within Stata, use the net search command. For example, to find the "suest" ado and help files, use

net search suest

The available commands will be listed in reverse chronological order.
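
Once a package has been located, it still has to be installed. The sketch below uses estout, mentioned earlier on this page, as the example, and assumes it is installed from the SSC archive with the "ssc install" command; it requires an internet connection and a writable PLUS directory.

```stata
* Sketch: find, install, and verify a user-written package.
* Requires internet access; estout is used only as an example package.
net search estout          // list packages matching the keyword

ssc install estout, replace    // install from the SSC archive
which estout               // show where the installed ado-file lives
```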

Variable Labels Are Not Displayed in the Variables Window: In Stata 7, variable names are shown with a default of 32 characters in the variables window. This often results in variable labels that cannot be seen unless the variables window is made wider. To reduce the space reserved for the variable names in the variables window, use the "varlabelpos" option:

set varlabelpos #

where "#" is a number between 8 and 32.

Executing Commands Every Time Stata Is Started: To execute commands immediately after Stata starts, create a file called "profile.do" in one of the directories searched by Stata (see the "Getting Started" manual, section A.7). For example, in Windows, "profile.do" could be created in "c:\ado\personal", while in Unix, the equivalent directory would be "~/ado/personal". To see a list of Stata's system directories, use the sysdir command. A sample "profile.do" could contain the following commands:

set max_memory 4g
set logtype text
set varlabelpos 10

Useful Links