Talk:Project data management/HERMES

Dear all, As you know, HERMES has an annual review which is carried out by the EC with the aid of external reviewers. I attended a session in Brussels in June where I had to answer numerous questions from the independent reviewers. The report of the reviewers was passed to the EC and I now have the formal response from the EC on which we need to act. Overall the project has been well received - it is an extremely complex project with a very large number of partners and we have achieved a great deal in the first year. However, no project of this scale could expect to be perfect and we have been asked to address 4 areas. The EC asks us to

...

4. WP9 has the huge task of putting together (a) HERMES data management system. It relies largely on the existing PANGAEA data bank and data access system and has developed an interactive HERMES data portal. The layout and interactivity of PANGAEA are, however, disappointing because HERMES, a large European scale initiative, appears as a one line project together with single cruises, most of which (are) national, with no hierarchy between single-cruise national projects and multi-cruise international initiatives. The PANGAEA portal should be redesigned to ensure visibility for large international programs such as HERMES, to help users' progression with a hierarchy of entries: project to cruise to type of data (sampling location, chemical data, bathymetry, seismic or gravity etc., to site, to individual data set), and to improve the visual appearance and informativeness of these entries.

...

Phil Weaver, coordinator, 2006-07-14

Dear Phil Weaver,

Thank you for distributing the reviewers' comments. Because they contain strong criticism of the data management, we need to discuss within our group how to respond. Perhaps you could help us to understand the reviewers' comments better. Whenever there are original written comments from the EU, we would appreciate receiving them. The field of data management is predestined for misunderstandings; I would like to avoid those.

(1) In particular, we do not understand: ... HERMES appears as a one line project together with single cruises, most of which (are) national ...

(2) If we understand the following correctly, it would require highly complex functionality on the web pages. Do we really want this? ... hierarchy of entries: project to cruise to type of data ...

(3) I would be very interested to know whether the problem of data availability was addressed during the discussion with the reviewers and the EU (no data - no functionality). Are the reviewers and the EU aware of the general problem of a nearly 'non-existent data flow' from investigators to archives?

With best regards Hannes Grobe (2006-07-17)

Hello Hannes

Phil apologises for not having replied to your email concerning the criticism of the Pangaea interface in the HERMES review - he has been very busy with meetings for most of the last week and has not had time to catch up. The full text relating to the EC's comments on WP9 was given in Phil's email to you.

I was present at the review meeting and can fill you in on what was discussed regarding data management. The reviewers were concerned that when they went to PangaVista and typed in 'HERMES', they got a long list of all the 'hits' for HERMES, with no hierarchical division of results. Now, I understand that this is because Pangaea is a relational database and simply typing in 'HERMES' will bring up all the results for that particular search term, but I suspect that the reviewers did not know or understand this, which is why they were so critical. However, we have to bear in mind that this is what any scientist will get if they go to Pangaea and want to see what data has been collected within the HERMES project.

I think that we do need to improve the way that the search results are presented, OR explain clearly on the search page that simply typing in a project name will bring up all data entries containing that keyword. It should be explained that for more specific searches within HERMES, the user should enter more specific search terms, e.g. 'HERMES Gulf of Lions bathymetry' or whatever.

I guess it was not clear to the reviewers that the visualisation tool for HERMES metadata is actually the GIS system, and that Pangaea is more for data archiving. The problem is that the reviewers had only a short time to read the Annual Report and the DOW and perhaps did not fully grasp the relationship between these two facilities. In fact, I suspect that a proportion of HERMES partners probably do not understand it either!

Therefore, I have a couple of suggestions since clearly we must take some action to make the project metadata/data more accessible for those who are not involved in the project:

1) Is it possible to amend the PangaVista search page to prompt people to type in more specific search terms, or at least explain that a one-word search will bring up all hits containing that word? The full list of results is truly baffling to someone who doesn't understand how the database works, and in fact I have to agree with the reviewers that the current results format (without explanation) is very user-unfriendly.

2) If someone types 'HERMES' into the search box, is it possible to link this to an intermediate page which explains that metadata (i.e. sampling points etc.) can be viewed via the HERMES GIS facility, or that they can continue searching the HERMES datasets using specific search terms?

We definitely need to find a way to make the HERMES material more easily accessible, and the EC will be looking for evidence that we are addressing this in the new DoW which I am about to finalise. Therefore, we need to add a line to the WP table to describe our steps towards this. It could be something quite vague like "improve visibility of project metadata and data on the Pangaea web interface".

I do need to get the DoW finished and submitted next week, so I would appreciate a quick reply. With best regards Vikki Gunn (2006-07-21)

Discussion about the review of Pangaea within HERMES
From: 	Hannes Grobe  Subject: 	Re: HERMES review - Pangaea Date: 	2006-07-24 15:24:12 CEST To: 	Vikki Gunn  Cc: 	ppew@mercury.noc.soton.ac.uk, vhh@mercury.noc.soton.ac.uk, Michael Diepenbroek 

Hello Vikki,

''I was present at the review meeting and can fill you in on what was discussed regarding data management. The reviewers were concerned that when they went to PangaVista and typed in 'HERMES', they got a long list of all the 'hits' for HERMES, with no hierarchical division of results. Now, I understand that this is because Pangaea is a relational database and simply typing in 'HERMES' will bring up all the results for that particular search term, but I suspect that the reviewers did not know or understand this, which is why they were so critical. However, we have to bear in mind that this is what any scientist will get if they go to Pangaea and want to see what data has been collected within the HERMES project.''

The general view: PANGAEA has been used by a hundred projects and is currently used by a dozen running projects for data archiving and publication. There is indeed a relational database in the background, but from the technical point of view the system is somewhat more complex. As a service for science, it is a library of by now nearly half a million 'data books': reliably accessible in the long term in the sense of a library, and citable in the sense of a publication. HERMES may use Pangaea to archive its data, but Pangaea is not a database specific to HERMES. It was my fault to suggest using PangaVista for a 'Hermes' query. It was only meant as a query to check how many data sets have been archived by HERMES so far, nothing more. PangaVista is the query tool for the 'library catalog' of the system, not a dedicated search for a specific project. Searching Google for the term 'hermes' returns 37 million hits. It is up to the user to choose keywords that find what she is looking for, and PangaVista works in a similar way.

''I think that we do need to improve the way that the search results are presented, OR explain clearly on the search page that simply typing in a project name will bring up all data entries containing that keyword. It should be explained that for more specific searches within HERMES, the user should enter more specific search terms, e.g. 'HERMES Gulf of Lions bathymetry' or whatever.''

The search engine: help for PangaVista is provided, and it works similarly to Google. At this level we can hardly do more.

''I guess it was not clear to the reviewers that the visualisation tool for HERMES metadata is actually the GIS system, and that Pangaea is more for data archiving. The problem is that the reviewers had only a short time to read the Annual Report and the DOW and perhaps did not fully grasp the relationship between these two facilities. In fact, I suspect that a proportion of HERMES partners probably do not understand it either!''

The general problem: you are pointing out three typical problems of data management. Scientists and reviewers (1) do not have the time; (2) because internet and computer technology are developing at extreme speed, most of them do not have a clue about what is going on; and (3) (maybe as a result of 1 and 2) the motivation to really support this important step in the scientific workflow is close to zero. It is hard for data management to deal with and compensate for those deficiencies.

''Therefore, I have a couple of suggestions since clearly we must take some action to make the project metadata/data more accessible for those who are not involved in the project:

1) Is it possible to amend the PangaVista search page to prompt people to type in more specific search terms, or at least explain that a one-word search will bring up all hits containing that word? The full list of results is truly baffling to someone who doesn't understand how the database works, and in fact I have to agree with the reviewers that the current results format (without explanation) is very user-unfriendly.

2) If someone types 'HERMES' into the search box, is it possible to link this to an intermediate page which explains that metadata (i.e. sampling points etc.) can be viewed via the HERMES GIS facility, or that they can continue searching the HERMES datasets using specific search terms?''

The specific view: Pangaea implements the most recent technology (web service, ISO standard, Dublin Core, XML, OAI-PMH, DOI ...). This means that nearly any individually defined part of its content can be distributed to web pages, portals, GIS or library catalogs; the query for 'all data sets of HERMES' is just one example. HERMES may produce some tens of thousands of data sets, comprising several hundred to thousands of parameters. As already discussed: how should those be structured? One view into the content is organized by GIS, specific to the various study areas. Other views might be organized by cruise, by WP, by parameter group (give me all sites with coral data), or by publication (any reference on the HERMES publications page should provide a link to the related data in Pangaea instead of providing 'supplementary material'). The 'view' on the HERMES data is for the investigators to decide, and as far as I understand it is the responsibility of the GIS work package and the specific GIS groups, not of the archive itself, to provide this project-specific view, with various scientifically useful definitions, into the HERMES data archive. The archive can certainly provide the necessary technology to do so.

''We definitely need to find a way to make the HERMES material more easily accessible, and the EC will be looking for evidence that we are addressing this in the new DoW which I am about to finalise. Therefore, we need to add a line to the WP table to describe our steps towards this. It could be something quite vague like "improve visibility of project metadata and data on the Pangaea web interface".''

The specific problem: Pangaea is willing and able to archive any amount of data of any complexity. Without further technical additions, Pangaea can provide individual parts of the imported data to the HERMES web pages or GIS on request (as demonstrated for the Black Sea GIS). But I am not sure whether we should set up any functionality without having anything behind it. In my opinion, collecting the content is the first step; installing views and functionality is the second. The 'visibility of project data' can only be improved if there are any data, and it cannot be improved on the Pangaea web interface, only on the HERMES pages.
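To make the idea of project-specific 'views' concrete (by cruise, by work package, or by parameter group such as 'all sites with coral data'): once records exist in the archive, each view is just a selection or grouping over them. The following is a minimal sketch under stated assumptions; the record fields, the sample values and the helper names `view_by` and `sites_with_parameter` are hypothetical and are not part of any Pangaea interface.

```python
# Hypothetical catalogue records; fields and values are illustrative only.
records = [
    {"dataset": "ds-001", "cruise": "Cruise A", "work_package": "WP1",
     "site": "Site 1", "parameters": {"coral cover", "depth"}},
    {"dataset": "ds-002", "cruise": "Cruise A", "work_package": "WP2",
     "site": "Site 2", "parameters": {"bathymetry"}},
    {"dataset": "ds-003", "cruise": "Cruise B", "work_package": "WP1",
     "site": "Site 3", "parameters": {"coral species", "temperature"}},
]


def view_by(records, field):
    """Group data set identifiers by one field, e.g. 'cruise' or 'work_package'."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[field], []).append(rec["dataset"])
    return groups


def sites_with_parameter(records, keyword):
    """Return sorted sites having at least one parameter name containing keyword."""
    return sorted(
        {rec["site"] for rec in records
         if any(keyword in name for name in rec["parameters"])}
    )


if __name__ == "__main__":
    print(view_by(records, "cruise"))
    print(sites_with_parameter(records, "coral"))
```

The sketch also shows the dependency described above: with an empty `records` list every view is empty, however elaborate the functionality built on top of it.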

At this time I am concerned about the words 'disappointing' and 'user-unfriendly'; that is why I asked about the discussion with the reviewers and the EU on the availability of the project's data (which is the major problem in data management in general). During the kick-off in Rhodes, Phil Weaver said to me that 'I should remind the community all the time'. I do not really see my role as that of a reminder. Still, I can remind each author and chief scientist to submit the cruise- or paper-related data. Do we want this? Is it my responsibility to spend my time on this? People are swamped with emails and will not answer. I have had 'interesting' experiences collecting at least some of the OMARC cluster data ...

First we need to gain access to the HERMES data; then we may provide differentiated access; then the reviewers may criticise. Best regards Hannes Grobe