Technology

From PANGAEA Wiki

PANGAEA's technology is based on a three-tiered client/server architecture with a data set cache, a data warehouse and a storage system.

  1. A relational database serves as the backend and central archiving system. It uses PostgreSQL as the relational database management software and runs on a multiprocessor computer.
  2. As middleware, a small server component called "Distributor" or "Background Services" is maintained. It marshals the contents of the relational database into flat XML files and flat data files, which are served by the website. The search interface "PANGAEA Search" is backed by Elasticsearch, an enterprise search server based on Apache Lucene. Uwe Schindler is a member of the Apache Lucene development team and supports the Elasticsearch developers; he has made significant contributions to both projects in support of numerical search and devotes a large part of his time to them. The underlying metadata is indexed by panFMP (version 2.0-dev, not yet released) in near-real time (NRT), so metadata becomes searchable as soon as a dataset has been processed by the background services. All components are encapsulated and communicate through standard interfaces.
  3. On the frontend, different clients provide access to the system. The graphical user interface (GUI) for data upload and metadata definition uses 4th Dimension software (ACI) running on a Windows server (4D clients for Mac OS X and Windows). A web server hosts the various domains and web services for data retrieval, download and harvesting. Middleware and frontend components follow a generic model so that functionality stays flexible and modifications are easy. The system has a 1 Gbit/s Internet connection.
  4. Data sets are stored in their original configuration in the database cache in XML format. The cache is updated by the background services.
  5. Once a day, data are mirrored from the relational system to the data warehouse.
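
The marshalling step described for the middleware (relational rows written out as flat XML) can be illustrated with a minimal Python sketch. This is not the actual Distributor code: the function name `marshal_dataset`, the element names and the example row are all hypothetical, and the real XML schema used by PANGAEA is not described on this page.

```python
import xml.etree.ElementTree as ET


def marshal_dataset(row: dict) -> str:
    """Serialize one metadata row to a flat XML string.

    The column names used here are assumptions for illustration only.
    """
    root = ET.Element("dataset", id=str(row["id"]))
    for key in ("title", "author", "latitude", "longitude"):
        # One child element per column keeps the document flat (no nesting).
        child = ET.SubElement(root, key)
        child.text = str(row[key])
    return ET.tostring(root, encoding="unicode")


# Hypothetical row, as it might be read from the relational backend.
example_row = {
    "id": 1,
    "title": "Example sediment core",
    "author": "Doe, J",
    "latitude": 54.1,
    "longitude": 7.9,
}
xml_text = marshal_dataset(example_row)
print(xml_text)
```

A flat file of this kind can be served directly by a web server and handed to an indexer without querying the database again, which is the role the page ascribes to the cache and the background services.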

  • Small to medium-sized data sets (roughly a few million items) are stored in two tables of the relational system, one for numeric values and one for string values; both are organized through an index tree.
  • Larger data sets and binary objects (e.g. photos, seismic data, models) are stored as files, linked to their metadata description and georeference in PANGAEA. Two hierarchical storage systems (HS), each a combination of hard disks and tape drive silos, are located in different buildings. The two HS mirror each other and have a growing capacity of several TB. An incremental backup runs every day and a full backup once per week. The system is operated technically by the AWI Computer Center.
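
The two-table layout for small to medium-sized data sets can be sketched as follows. This is only an illustration under stated assumptions: SQLite (from the Python standard library) stands in for PostgreSQL, and the table names, column names and the helper `store_value` are hypothetical, since the actual PANGAEA schema is not given here.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL backend.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE data_numeric (dataset_id INTEGER, parameter TEXT, row_no INTEGER, value REAL);
    CREATE TABLE data_string  (dataset_id INTEGER, parameter TEXT, row_no INTEGER, value TEXT);
    -- SQLite indexes are B-trees, mirroring the "index tree" in the text.
    CREATE INDEX idx_numeric ON data_numeric (dataset_id, row_no);
    CREATE INDEX idx_string  ON data_string  (dataset_id, row_no);
    """
)


def store_value(dataset_id: int, parameter: str, row_no: int, value) -> str:
    """Route a value to the numeric or the string table; return the table name."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    table = "data_numeric" if is_number else "data_string"
    stored = value if is_number else str(value)
    conn.execute(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?)",
        (dataset_id, parameter, row_no, stored),
    )
    return table


# Hypothetical measurements belonging to one data set.
store_value(1, "depth_m", 1, 0.05)
store_value(1, "temperature_degC", 1, 3.2)
store_value(1, "lithology", 1, "silty clay")
```

Splitting values by type lets each table use a single, fixed value column, and the composite index on data set and row number keeps the retrieval of one data set's values cheap.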

More detailed documentation of the technical internals can be found in the technical documentation namespace of the PANGAEA Wiki: Techdoc:Main Page