Download many

From PANGAEA Wiki

For technical reasons, the PANGAEA web interface currently offers no single button for downloading multiple files from one dataset. The workarounds presented here can be used until the "single click" feature is enabled.

Using browser extensions

Multiple files can be downloaded from the dataset website using a browser extension, such as Simple mass downloader for Chrome or Simple mass downloader for Firefox. The extension can extract links from the active tab, from multiple open tabs, or from selected text on a web page (as well as from the clipboard, a local file, or a pattern URL).


[Step 1] Install the browser extension (example Simple mass downloader for Chrome)

This example uses Simple mass downloader for Chrome. If you are using it for the first time, go to https://chrome.google.com/webstore/detail/simple-mass-downloader/abdkkegmcbiomijcbdaodaflgehfffed and install the browser extension ("Add to Chrome").

Install the browser extension with "Add to Chrome" (screenshot)


Before using the extension, read the Quick Start Tutorial. It urges you to verify that the "Ask where to save each file before downloading" option in the Chrome download settings is NOT checked. Otherwise, a "Save As" dialog box will pop up for every file, defeating the main purpose of the extension.
The browser settings can be opened via the icon with three vertical dots in the upper right corner.

Simple mass downloader Quick start tutorial (screenshot)


In the menu on the right, select "Download" and make sure that "Ask where to save each file before downloading" is deactivated.

Download Settings in Chrome (screenshot)


Restart the browser.

[Step 2] Select and download all files

The example dataset, Belter, Hans Jakob; Krumpen, Thomas; Herber, Andreas (2020): Electromagnetic induction raw data (EM Bird) of POLAR 6 during 2020 IceBird MOSAiC Summer campaign. PANGAEA, doi:10.1594/PANGAEA.924916, contains 15 raw data files, all of which are to be downloaded in one go.


In the lower part of the page, select "View dataset as HTML".

Go to "View dataset as HTML" (screenshot)


A table containing links will be loaded in the lower part of the page. Right-click anywhere on the page and choose "Add page links to list" to start the file selection.

A table containing links; right-click to start the file selection (screenshot)


Select the data type (file extension) you wish to download (.dat in this example); the number of matching items will be indicated. Enter a folder name for the download (the folder will be created automatically under your default download folder) and press "Download now". This starts the mass download.

Select data type, destination folder and start download (screenshot)


Further recommendations
  • If the dataset matrix contains more than 2000 rows (e.g., more than 2000 file links in a column), the HTML view initially shows only the first 2000 rows. To view the complete dataset, scroll to the bottom of the page, where a link lets you view the full matrix in your browser. Be aware that this may crash the browser because of the large size of the page.
  • When the dataset is protected (under moratorium) and your PANGAEA user has been granted access rights, make sure to log in before downloading the data. If you are already logged in, log out and log in again to be sure that the extension gets the token.
  • Select "Keep logged in on this computer" when logging in before starting a large download. If the download takes a long time, this may prevent access problems that would otherwise occur if your user were logged off automatically in the middle of the process.
  • It is also possible to select just a part of the dataset webpage (e.g. a subset of the table) before initiating the download with the right click in [Step 2] (relevant to Simple mass downloader).
  • At the end of [Step 2], it is also possible to select "Add items to list" instead of "Download now". This gives access to further settings and functionality of the browser extension (relevant to Simple mass downloader).

Using a script

The table with all links can be downloaded as tab-delimited text.
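Once the tab-delimited text is available, the file names or links can be pulled out of one column with a few lines of Python. The following is only a minimal sketch under stated assumptions: it assumes that the text may start with a metadata block that ends with a line containing only "*/", followed by one header row and the data rows. The function name read_column and the sample content are hypothetical and used for illustration only.

```python
def read_column(tab_text, column_prefix="Binary file"):
    """Return the non-empty values of the first column whose header
    starts with column_prefix (hypothetical helper, not a PANGAEA API)."""
    lines = tab_text.splitlines()

    # Assumption: an optional metadata block ends with a line "*/".
    for i, line in enumerate(lines):
        if line.strip() == "*/":
            lines = lines[i + 1:]
            break

    if not lines:
        raise ValueError("no header row found")

    # The first remaining line is treated as the column header row.
    header = lines[0].split("\t")
    for index, name in enumerate(header):
        if name.startswith(column_prefix):
            break
    else:
        raise ValueError("no column starting with %r" % column_prefix)

    # Collect the non-empty cells of that column.
    values = []
    for line in lines[1:]:
        cells = line.split("\t")
        if len(cells) > index and cells[index].strip():
            values.append(cells[index].strip())
    return values


# Invented sample content, only to illustrate the assumed layout.
sample = (
    "/* DATA DESCRIPTION:\n"
    "Citation:\tExample\n"
    "*/\n"
    "Date/Time\tBinary file\n"
    "2020-09-01\tflight_01.dat\n"
    "2020-09-02\tflight_02.dat\n"
)
print(read_column(sample))  # ['flight_01.dat', 'flight_02.dat']
```

For a real dataset, the text would be read from the saved file first, and the column header should be checked against the actual table, since it may differ from "Binary file".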

If the columns contain file names only, not paths (this is typically the case if the column header reads "Binary file"), build each download URL by appending the file name from the data matrix to "https://download.pangaea.de/dataset/XXXXXX/files/", where XXXXXX is the dataset ID (the last 6 digits of the DOI).

Here we will provide a Python script template in the near future.
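In the meantime, the URL scheme described above might be sketched as follows. This is not the announced template, only an illustration under stated assumptions: the helper names (dataset_id_from_doi, file_url, download_all) and the file name "flight_01.dat" are hypothetical, the dataset is assumed to be publicly accessible (no login handling is included), and the dataset ID is taken as the trailing digits of the DOI, which for the example DOI are the 6 digits 924916.

```python
import os
import re
import urllib.request
from urllib.parse import quote


def dataset_id_from_doi(doi):
    """Return the trailing digits of a DOI such as 10.1594/PANGAEA.924916."""
    match = re.search(r"(\d+)$", doi.strip())
    if match is None:
        raise ValueError("DOI does not end in digits: %r" % doi)
    return match.group(1)


def file_url(dataset_id, filename):
    """Append the file name to the download base URL given in the text."""
    base = "https://download.pangaea.de/dataset/" + dataset_id + "/files/"
    # quote() escapes characters such as spaces in the file name.
    return base + quote(filename)


def download_all(dataset_id, filenames, target_dir):
    """Download every listed file into target_dir (public datasets only)."""
    os.makedirs(target_dir, exist_ok=True)
    for name in filenames:
        destination = os.path.join(target_dir, os.path.basename(name))
        urllib.request.urlretrieve(file_url(dataset_id, name), destination)


# Only the URL is built here; nothing is downloaded at this point.
dataset_id = dataset_id_from_doi("doi:10.1594/PANGAEA.924916")
print(file_url(dataset_id, "flight_01.dat"))
# https://download.pangaea.de/dataset/924916/files/flight_01.dat
```

Calling download_all(dataset_id, names, "downloads"), with names being the list of file names taken from the data matrix, would then fetch the files one by one. For protected datasets an authenticated request would be needed, which this sketch does not cover.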