Technology

The technology of Pangaea is based on a three-tiered client/server architecture with a data set cache, a data warehouse and a storage system.


 * 1) A relational database serves as the backend and central archiving system, using Sybase as the relational database management system (RDBMS) on a multiprocessor computer.
 * 2) As middleware, a small server component called "Distributor" or "Background Services" is maintained. This software marshals the contents of the relational database into flat XML files and flat data files that are served by the website. The search interface "PANGAEA Search" is currently operated by the open-source software panFMP v1.1, which is based on the full-text search engine Apache Lucene. The whole search engine is currently being migrated to Elasticsearch, which is also based on Apache Lucene. In the new setup, the underlying metadata is indexed through panFMP (version 2.0-dev, not yet released) in near-realtime (NRT), so metadata becomes searchable as soon as the dataset has been processed by the background services; the current system has an indexing delay of 20 minutes. All components are encapsulated and communicate through standard interfaces.
 * 3) On the frontend side, different clients provide access to the system. The graphical user interface (GUI) for data upload and metadata definition uses the 4th Dimension software (ACI) running on a Windows server (4D clients for Mac OS X and Windows). A web server hosts the various domains and web services for data retrieval, download and harvesting. Middleware and frontend components follow a generic model to ensure flexible functionality and easy modification. The system has an Internet connection of 1 Gbit/s.
 * 4) Data sets are stored in their original configuration in the database cache in XML format. The cache is updated by the background services.
 * 5) Once a day, data are mirrored from the relational system to the data warehouse.
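The marshalling step performed by the "Distributor" (item 2) can be illustrated with a minimal Python sketch. It assumes an in-memory SQLite database as a stand-in for the Sybase backend; the tables `dataset` and `parameter`, their columns, and the functions `build_demo_db` and `marshal_dataset` are all hypothetical, since the actual Pangaea schema and XML format are not described here.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the relational backend (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE parameter (dataset_id INTEGER, name TEXT, unit TEXT);
        """
    )
    conn.execute("INSERT INTO dataset VALUES (1, 'Sea surface temperature')")
    conn.executemany(
        "INSERT INTO parameter VALUES (?, ?, ?)",
        [(1, "Depth", "m"), (1, "Temperature", "deg C")],
    )
    return conn


def marshal_dataset(conn: sqlite3.Connection, dataset_id: int) -> str:
    """Serialize one dataset's metadata from the relational tables into flat XML."""
    row = conn.execute(
        "SELECT id, title FROM dataset WHERE id = ?", (dataset_id,)
    ).fetchone()
    if row is None:
        raise KeyError(dataset_id)

    root = ET.Element("dataset", id=str(row[0]))
    ET.SubElement(root, "title").text = row[1]
    params = ET.SubElement(root, "parameters")
    # One <parameter> element per row, in insertion order.
    for name, unit in conn.execute(
        "SELECT name, unit FROM parameter WHERE dataset_id = ? ORDER BY rowid",
        (dataset_id,),
    ):
        ET.SubElement(params, "parameter", unit=unit).text = name
    return ET.tostring(root, encoding="unicode")
```

For example, `marshal_dataset(build_demo_db(), 1)` returns a single `<dataset>` element with its title and two `<parameter>` children. One plausible motivation for generating such flat files in the background is that the web server can then deliver them without querying the relational database on every request.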


 * Small to medium-sized data sets (approx. several million items) are stored in two tables of the relational system, one for numeric values and one for string values; both are organized through an index tree.
 * Larger data sets or binary objects (e.g. photos, seismic data, models) are stored as files, linked to their metadata description and georeference in Pangaea. Two hierarchical storage systems (HS), each consisting of a combination of hard disks and tape drive silos, are located in different buildings. Both HS mirror each other and have a growing capacity of several TB. An incremental backup runs every day, a full backup once per week. The system is technically operated by the AWI Computer Center.
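The split of small to medium-sized data sets into a numeric and a string table can be sketched as follows. This is an illustration under assumptions, not the actual Pangaea schema: SQLite stands in for Sybase, its B-tree indexes play the role of the "index tree", and the table names, the `(dataset_id, row_no, param_id)` key and the helper functions `store_value` and `load_dataset` are hypothetical.

```python
import sqlite3
from numbers import Real

# Hypothetical two-table layout: one table for numeric values, one for strings.
SCHEMA = """
CREATE TABLE data_numeric (dataset_id INTEGER, row_no INTEGER, param_id INTEGER, value REAL);
CREATE TABLE data_string  (dataset_id INTEGER, row_no INTEGER, param_id INTEGER, value TEXT);
-- SQLite implements these indexes as B-trees.
CREATE INDEX ix_numeric ON data_numeric (dataset_id, param_id, row_no);
CREATE INDEX ix_string  ON data_string  (dataset_id, param_id, row_no);
"""

TABLES = ("data_numeric", "data_string")


def store_value(conn, dataset_id, row_no, param_id, value):
    """Route a single value to the numeric or the string table depending on its type."""
    if isinstance(value, Real) and not isinstance(value, bool):
        table, value = "data_numeric", float(value)
    else:
        table, value = "data_string", str(value)
    # The table name comes from a fixed set above, so formatting it in is safe.
    conn.execute(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?)",
        (dataset_id, row_no, param_id, value),
    )


def load_dataset(conn, dataset_id):
    """Reassemble a dataset as {(row_no, param_id): value} from both tables."""
    cells = {}
    for table in TABLES:
        for row_no, param_id, value in conn.execute(
            f"SELECT row_no, param_id, value FROM {table} WHERE dataset_id = ?",
            (dataset_id,),
        ):
            cells[(row_no, param_id)] = value
    return cells
```

Storing `12.5` and `"Station PS-01"` for the same row places one record in each table; `load_dataset` merges them back into a single grid of cells keyed by row and parameter.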

More detailed documentation about the technical internals can be found in the technical documentation namespace of PangaWiki: Techdoc:Main Page