Download and unzip data files from Stata (Linux/Windows)

Recently, I’ve been using Stata’s -shp2dta- command to convert some shapefiles to Stata format, grabbing Lat/Lon data and merging it into another dataset. There were several compressed shapefiles I wanted to download from a directory on the web. I could manually download and uncompress each file, but that would be time-consuming. Also, when the maps are updated, I’d have to do the download and uncompress steps all over again. I’ve found that the process can be automated from within Stata by combining -shell- with some handy terminal commands. If you are using Windows, you’ll need an additional set of command-line utilities called Unix Utils.

Steps to download and start using compressed data are outlined below.

Step one (skip if you are a Linux user):

  1. Download Unix Utilities for Windows
  2. Unzip Unix tools to a directory of your choice. I put them in my “Program Files” directory.
  3. Browse through the directory you just created to find the folder that holds the executable files. In my case, after extracting the files, the executables were in:
    C:\Program Files\usr\local\wbin\
    
  4. Add this directory to your system PATH variable so that you can call these commands from the terminal in any folder on your hard drive. In my case, I right-clicked My Computer, selected Properties > Advanced settings > Environment Variables, then edited the “path” variable by appending the text “;C:\Program Files\usr\local\wbin\” (without the quotation marks). Be careful not to add a space between the semicolon and the directory name.

Step two:

Open Stata and run the following commands or add them to a do-file.

Change to a working directory (e.g. cd "C:\Temp"). Use straight quotation marks here; Stata will not accept curly ones.

Download the file to a file called download.zip by issuing the command:

shell wget -Odownload.zip "http://example.com/download.zip"

Stata’s -shell- command sends the text that follows it to your operating system’s shell. In the command above, that text calls the Unix utility wget, which downloads the compressed file and saves it as download.zip.

Now that the file is downloaded, we need to unzip the compressed archive. On Windows I prefer to use 7zip, but you could also use gzip from the Unix Utils package, or unzip if you’re on Linux. Note that gzip is designed for .gz files: it can only read a zip archive that contains a single file, and it needs the -S .zip option to accept the .zip suffix. I’ve included all three options below, so choose one. Again, we prefix each command with -shell- so that Stata sends it to the OS shell.

shell 7z x -o.\unZipped download.zip
shell gzip -d -S .zip download.zip
shell unzip download.zip
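
Since only one of these commands is needed on any given machine, a do-file that has to run on both Windows and Linux can pick the extractor from Stata's c(os) system value. This is only a minimal sketch: it assumes 7z (on Windows) or unzip (elsewhere) is on the PATH, and it reuses the unZipped folder name from the 7z example above. The -y and -o flags, which overwrite existing files without prompting, are my addition.

```stata
* Choose the extraction command from the operating system Stata reports.
* c(os) is "Windows", "Unix", or "MacOSX".
if "`c(os)'" == "Windows" {
    * 7-Zip: extract into .\unZipped, answering yes to overwrite prompts
    local unzipcmd `"7z x -y -o.\unZipped download.zip"'
}
else {
    * Info-ZIP unzip: extract into unZipped, overwriting without prompting
    local unzipcmd `"unzip -o download.zip -d unZipped"'
}
shell `unzipcmd'
```

Both branches extract into the same folder, so the rest of the do-file does not need to know which tool ran.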

Including these commands in a do-file is extra useful, as it automates the download and unzipping process. Be careful using wget in a do-file though, especially if you are trying to debug your code. Each run of the do-file downloads the file again; some webmasters won’t like you using up their bandwidth this way, and it can get you blocked.
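
One way to heed that caution is to have the do-file check whether the archive is already on disk before calling wget. The sketch below uses Stata's -confirm file- command for the check; the file name and URL are the placeholders used earlier in this post.

```stata
* Skip the download when download.zip is already in the working directory.
* -confirm file- sets _rc to a nonzero value when the file is not found.
capture confirm file "download.zip"
if _rc {
    shell wget -O download.zip "http://example.com/download.zip"
}
else {
    display as text "download.zip already exists; skipping wget"
}
```

When the maps are updated, delete the old archive first (erase download.zip) to force a fresh download. The same applies after a failed download, because wget -O can leave an empty or partial file behind.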

I haven’t tried it yet, but I imagine that -shell- could be used to call other handy command-line tools, like Python scripts, or maybe even to leverage some useful R packages. I’ll post again if I have any luck with these options.

Now it’s time to dive into the data contained in those compressed archives and unlock their secrets. Good luck!!
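
For the shapefile case that motivated this post, a starting point might look like the sketch below. It assumes the archives were extracted into a folder named unZipped and that the user-written -shp2dta- command is installed (ssc install shp2dta). The output file names, the id variable, and the centroid stub c are placeholders of my choosing.

```stata
* List every .shp file in the (assumed) extraction folder.
local shapes : dir "unZipped" files "*.shp"

foreach shp of local shapes {
    * Strip the extension to build the output file names.
    local stub = subinstr("`shp'", ".shp", "", .)

    * Write a database file and a coordinates file for each shapefile;
    * gencentroids(c) adds x_c and y_c, the centroid of each shape.
    shp2dta using "unZipped/`shp'", ///
        database("`stub'_db") coordinates("`stub'_coord") ///
        genid(id) gencentroids(c) replace
}
```

The /// line continuations only work inside a do-file, so run this from a do-file rather than pasting it into the command window.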

Comments on this article can be sent to me by e-mail.

6 Responses to “Download and unzip data files from Stata (Linux/Windows)”

  1. The Stata Blog » Automating web downloads and file unzipping writes:

    [...] J. Dyck wrote a nice post on his blog on how to Download and unzip data files from Stata. He writes Recently, I’ve been using Stata’s -shp2dta- command to convert some shapefiles to [...]

  2. Bachir writes:

    Hi,
    I’ve followed exactly the steps for downloading datasets from the DHS website but nothing is loaded.
    What am I doing wrong?
    Regards.
    Bachir.

  3. Andrew writes:

    Hmm, I think there are likely to be a couple options for why nothing is loaded. Pay attention to whatever the error message in the Stata console says and that should give you some idea of where to start looking. I’d check 1) did the files download properly? 2) any error unzipping? Are the files password protected? 3) Any error in the Stata console in importing the data? Not enough memory, strangely delimited, etc.?

    If you cannot find the answer, try e-mailing statalist if it’s a general problem or try Stack Overflow, which is getting a good following of Stata users to answer questions.

  4. linux writes:

    Very good blog. I reali like it! Good 4 you;)

  5. Steve writes:

    how do we do this with a password protected site, when we have the password?

  6. Andrew writes:

    If you are using the suggestions here with the Unix Utils on Windows, or the native wget applications on Mac or Linux, then you can download from a password-protected site by using wget’s user and password options. Your command will look something like:

    wget -O download.zip --user=MyUserName --password=MySecretPassword "http://example.com/download.zip"

    Note the double dashes and the straight quotation marks. Older wget builds use --http-user and --http-passwd instead.

    If it is the zip file itself that’s password protected, you are looking for the -P option in your unzip utility. If using Unix Utils, you should be able to type unzip -h for a list of options for this application.

    Good luck!