Ideas for Submission Tool

  • automatic metadata maturity assessment based on (or derived from) developments by DataONE (https://github.com/NCEAS/metadig-checks)
  • derived maturity metrics score, with potential for an incentive system that encourages users to improve the provided metadata on their own (a minimal scoring sketch follows this list)
  • automatic flagging and "hint-waving" of missing or incomplete essential metadata
  • similar systems for data content (mostly tabular data) based on the Frictionless Data Framework (https://framework.frictionlessdata.io); see the validation sketch after this list
  • AI-assisted flagging of implausible numerical values (e.g. outliers) to be discussed with data authors → data quality assessment feature (see the outlier sketch after this list)
  • interactive tool for tabular data to iron out minor issues (variable descriptions or semantic annotations, missing units, removing data matrix clutter, etc.) without the need to re-upload everything
  • discipline-specific adaptations of templates and workflows based on user input or publication history
  • guided suggestion system to check for existing event info (campaigns, stations, etc.), methods, and instrumentation (RDA PIDINST), with automatic annotation using controlled vocabulary terms
  • data & metadata complexity checks for submissions (possibly combining (meta)data maturity scores, data size, and the number of data tables) as an internal measure of the estimated time investment for editorial processing ("is this submission worth starting today, or tomorrow when I have more time?"); a rough estimator sketch follows this list
  • automatic extraction of metadata from data formats that are considered "binary" at PANGAEA, if they follow, for example, the CF conventions for NetCDF (see the NetCDF sketch after this list)
  • checks whether a 2D data matrix can be extracted from uploaded NetCDF files
  • transformation routine converting 2D matrices in NetCDF files into ingestible data formats
  • extraction of specific, e.g. event-related, information such as temporal and spatial coverage, coordinate reference system, quality flags, etc. from structured file formats like NetCDF
  • blockchain implementation (upload assessment and change tracking) to document data provenance (a minimal hash-chain sketch follows this list)
  • REST API for scripted (mass) uploads (see the upload sketch after this list)
  • Interface for (user) data management systems like LinkAhead → import data and metadata directly, including provided provenance information (ingest and continue provenance records → see the blockchain idea above)
  • Importer for a multitude of metadata formats where records already exist (EML, ISO 19115, etc.) → mappings are available in the GFBio/NFDI context
  • Suggestion of suitable/matching parameters via a drop-down field with a matching score (possibly AI-assisted; see the matching sketch after this list)
  • Suggestion of suitable/matching events via a drop-down field with a matching score (possibly AI-assisted)
  • GPT-3-based chatbot for answering questions during the submission process
  • A tool for finding Award labels (once some kind of external web service is available, perhaps when award DOIs become a "thing")
  • Users can store their submissions as templates and reuse them for future submissions, avoiding repetitive typing of metadata
  • A separate submission form for Expedition archiving: some ideas already here
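
The maturity score idea above could start as a plain completeness check. The sketch below is a minimal illustration in Python; the field names and weights are invented placeholders, not PANGAEA's actual metadata schema, and a real implementation would build on the metadig-style checks referenced above.

    # Hypothetical completeness-based maturity score (0-100); fields and
    # weights are placeholders, not the real PANGAEA metadata model.
    ESSENTIAL_FIELDS = {
        "title": 3, "abstract": 2, "authors": 3,
        "temporal_coverage": 2, "spatial_coverage": 2, "license": 1,
    }

    def maturity_score(metadata: dict) -> float:
        """Return a 0-100 score based on which essential fields are filled."""
        total = sum(ESSENTIAL_FIELDS.values())
        filled = sum(w for f, w in ESSENTIAL_FIELDS.items() if metadata.get(f))
        return round(100 * filled / total, 1)

    def missing_fields(metadata: dict) -> list:
        """Fields to flag ("hint-wave") back to the submitter."""
        return [f for f in ESSENTIAL_FIELDS if not metadata.get(f)]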
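
For the Frictionless Data item, validation of an uploaded table can be driven directly by the frictionless Python package; the file name is a placeholder and the exact report fields differ slightly between framework versions.

    # pip install frictionless
    from frictionless import validate

    report = validate("submission_table.csv")      # placeholder path
    if report.valid:
        print("Table passes structural checks")
    else:
        for task in report.tasks:
            for error in task.errors:
                print(error.message)               # e.g. blank rows, ragged cells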
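
The outlier-flagging idea could begin with a simple statistical rule before any AI component is added; the sketch below flags values outside 1.5 * IQR of a column, purely as a baseline.

    # Baseline plausibility check: flag values outside 1.5 * IQR.
    from statistics import quantiles

    def flag_outliers(values):
        """Return (index, value) pairs that look implausible for this column."""
        q1, _, q3 = quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        return [(i, v) for i, v in enumerate(values) if v < low or v > high]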
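
The complexity check for editorial triage could combine the maturity score with simple size measures; the weights below are arbitrary illustrations, not a calibrated effort model.

    # Rough "effort points" estimate for editorial triage; weights are invented.
    def complexity_estimate(maturity: float, n_tables: int, size_mb: float) -> float:
        return (100 - maturity) * 0.5 + n_tables * 10 + size_mb * 0.1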
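
For the NetCDF items, a first pass could read CF-style global attributes and check which variables reduce to a 2D matrix suitable for tabular ingest. The sketch assumes xarray with a NetCDF backend installed; the attribute names follow common CF usage and may not be present in every file.

    # pip install xarray netcdf4
    import xarray as xr

    def inspect_netcdf(path: str) -> dict:
        ds = xr.open_dataset(path)
        info = {
            "title": ds.attrs.get("title"),
            "conventions": ds.attrs.get("Conventions"),
            "variables": list(ds.data_vars),
            # variables with at most two dimensions -> candidate 2D data matrices
            "matrix_candidates": [v for v in ds.data_vars if ds[v].ndim <= 2],
        }
        if "time" in ds.coords:
            info["temporal_coverage"] = (str(ds["time"].min().values),
                                         str(ds["time"].max().values))
        ds.close()
        return info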
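
The blockchain idea does not necessarily require a distributed ledger; an append-only hash chain over upload and change events already makes the provenance record tamper-evident. The sketch below is such a minimal chain, not a full blockchain.

    import hashlib, json, time

    def add_event(chain: list, event: dict) -> list:
        """Append an upload/change event, linked to the previous record's hash."""
        prev_hash = chain[-1]["hash"] if chain else "0" * 64
        record = {"event": event, "timestamp": time.time(), "prev": prev_hash}
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        chain.append(record)
        return chain

    def verify(chain: list) -> bool:
        """Recompute hashes to detect tampering with the provenance record."""
        for i, rec in enumerate(chain):
            body = {k: rec[k] for k in ("event", "timestamp", "prev")}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != rec["hash"] or (i and rec["prev"] != chain[i - 1]["hash"]):
                return False
        return True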
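
A scripted upload against the envisioned REST API could look like the sketch below; the endpoint URL, authentication scheme, and payload fields are all assumptions, since no such PANGAEA API is specified here.

    import json
    import requests

    def upload(file_path: str, metadata: dict, token: str) -> dict:
        """Push one data file plus metadata to a hypothetical submission endpoint."""
        with open(file_path, "rb") as fh:
            resp = requests.post(
                "https://example.org/api/submissions",        # placeholder URL
                headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
                files={"data": fh},
                data={"metadata": json.dumps(metadata)},
            )
        resp.raise_for_status()
        return resp.json()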
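
The parameter/event matching score behind the drop-down suggestions could start as plain fuzzy string matching before any AI assistance is added; the sketch below uses the standard library's SequenceMatcher and an invented example list of terms.

    from difflib import SequenceMatcher

    def suggest(query: str, known_terms: list, limit: int = 5):
        """Rank existing terms against free-text user input with a 0-1 score."""
        scored = [(term, SequenceMatcher(None, query.lower(), term.lower()).ratio())
                  for term in known_terms]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]

    # e.g. suggest("water temp", ["Temperature, water", "Salinity", "Depth, water"])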


An additional list can be found here: https://docs.google.com/document/d/1sp-SrmHcvrGck2W7Fm6yW-DOcIYdVQCYjpuqVWjwUuU/edit