SOCIB / Data Centre

The Data Centre is the core of SOCIB. Through it, SOCIB is developing and implementing a general data management system that guarantees international standards, quality assurance and interoperability. Combining different sources and types of information (time series, profiles, trajectories, grids/meshes, images, acoustic data, etc.) requires appropriate methods to ingest, catalogue, display and distribute this information. The general goal of the SOCIB Data Centre is to provide users with a system to locate and download the data of interest (in near real time and delayed mode) and to visualize and manage the information. Following SOCIB principles, data must be: 1) discoverable and accessible; 2) freely available; and 3) interoperable and standardized (Tintoré et al. 2012, 2013). These principles are in line with the challenges and opportunities of Open Data (European Commission 2010; Reichman et al. 2011; Urban et al. 2012).

To cover the full data lifecycle (from modelling and observing systems to the informed user), the Data Centre has defined a seven-step Data Management Process: (1) Platform Management and Communication; (2) Quality Control and Assurance; (3) Metadata Aggregation and Standardization; (4) Data Archive; (5) Data Search and Discovery; (6) Data Policy and Distribution; (7) Data Viewing.
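
Step (2) is where each measurement is checked and flagged. The text does not say which tests or flag scheme SOCIB applies, so the following is only a minimal sketch, assuming a simple gross range test and flag values modelled on the 0 to 9 scale used by Argo (1 = good, 4 = bad, 9 = missing). The function name `gross_range_qc` and the temperature limits are hypothetical.

```python
import math
from typing import List, Optional

# Flag values modelled on the Argo-style 0..9 scale (an assumption, not SOCIB's documented scheme)
GOOD = 1
BAD = 4
MISSING = 9


def gross_range_qc(values: List[Optional[float]], valid_min: float, valid_max: float) -> List[int]:
    """Flag each value: GOOD inside [valid_min, valid_max], BAD outside, MISSING for None or NaN."""
    flags = []
    for v in values:
        if v is None or math.isnan(v):
            flags.append(MISSING)
        elif valid_min <= v <= valid_max:
            flags.append(GOOD)
        else:
            flags.append(BAD)
    return flags


# Hypothetical sea temperature series (degrees C) with one spike and one gap;
# the -2..35 limits are illustrative, not SOCIB thresholds.
temperature = [18.2, 18.4, 45.0, None, 18.1]
print(gross_range_qc(temperature, valid_min=-2.0, valid_max=35.0))  # [1, 1, 4, 9, 1]
```

Returning flags alongside the values, rather than deleting suspect data, leaves the decision of how strict to be with the data user.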

The SOCIB Data Centre is therefore responsible for directing the different stages of data management, from data acquisition to distribution and visualization through web applications. The system implemented in the Data Centre relies on open-source solutions, following architectures adopted in other marine spatial data infrastructures (Cinnirella et al. 2012).

The majority of data managed by SOCIB come from its own observation platforms (e.g., HF radar, gliders, drifters, buoys), numerical models or information generated by the SIAS Division. The Data Centre also manages data from external providers through various collaborations, for example with harbour authorities (e.g., Puertos del Estado) or research groups (e.g., CSIC).

Data processing covers standardization, data conversion and data validation, as well as data ingestion, quality control, generation of new products and data archiving. Metadata are generated according to interoperable international standards to facilitate data discovery, in line with the European INSPIRE Directive (European Commission 2007). The Data Centre uses a range of tools for data processing, including Java, Matlab, R, Python and Geographic Information System (GIS) software. Data from observation platforms and numerical models are stored in netCDF repositories, while vector data are stored in spatial databases implemented with PostGIS. Metadata are managed through two main routes: metadata from the SOCIB observation platforms are handled by an internal application, while all other metadata are edited and stored using GeoNetwork.
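
Because metadata drive data discovery, a processing chain can check that each netCDF file carries the attributes a catalogue needs before it is archived. A minimal sketch, assuming the file's global attributes have already been read into a Python dict and taking the "highly recommended" attributes of the ACDD (Attribute Convention for Data Discovery) as the required set; neither the convention nor this check is stated in the text as SOCIB's, and `missing_attributes` is a hypothetical helper.

```python
# ACDD "highly recommended" global attributes, used here as an assumed minimum;
# the rules of SOCIB's internal metadata application are not described in the text.
REQUIRED_GLOBAL_ATTRS = ("title", "summary", "keywords", "Conventions")


def missing_attributes(global_attrs: dict, required=REQUIRED_GLOBAL_ATTRS) -> list:
    """Return required attribute names that are absent or blank, in `required` order."""
    return [name for name in required if not str(global_attrs.get(name, "")).strip()]


# Hypothetical global attributes, as a netCDF library would expose them
attrs = {
    "title": "Hypothetical glider deployment",
    "summary": "",
    "Conventions": "CF-1.6, ACDD-1.3",
}
print(missing_attributes(attrs))  # ['summary', 'keywords']
```

A blank string is treated the same as a missing attribute, since an empty summary is no more useful to a catalogue search than no summary.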

Data are distributed and accessed through web services (OGC and REST). THREDDS and GeoServer generate OGC services from the netCDF repository and the PostGIS databases, respectively. In addition, the Data Centre has implemented a REST web service called data discovery. These services allow SOCIB data to be integrated into applications developed by the Data Centre itself or by third parties, providing system interoperability. The OGC Catalogue Service for the Web (CSW) is currently implemented with the GeoNetwork catalogue. Work is under way for GeoNetwork to harvest the THREDDS catalogue so that all metadata are integrated in a single CSW service.
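
THREDDS, GeoServer and GeoNetwork all accept OGC requests as key-value-pair (KVP) HTTP GET queries, so a client mostly needs to build the right URL. A minimal sketch using only the standard library; the endpoint URLs are hypothetical placeholders, not actual SOCIB addresses, and `ogc_request_url` is an illustrative helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def ogc_request_url(endpoint: str, service: str, request: str, version: str, **params: str) -> str:
    """Build an OGC KVP GET request URL (hypothetical helper)."""
    query = {"service": service, "version": version, "request": request, **params}
    return f"{endpoint}?{urlencode(query)}"


# Hypothetical endpoints: placeholders for a THREDDS WMS and a GeoNetwork CSW
THREDDS_WMS = "https://thredds.example.org/thredds/wms/path/to/dataset.nc"
CATALOGUE_CSW = "https://catalogue.example.org/geonetwork/srv/eng/csw"

# Ask the WMS which layers (variables) it offers
wms_caps = ogc_request_url(THREDDS_WMS, "WMS", "GetCapabilities", "1.3.0")

# Full-text search of the CSW catalogue for records mentioning "glider"
csw_search = ogc_request_url(
    CATALOGUE_CSW, "CSW", "GetRecords", "2.0.2",
    typeNames="csw:Record",
    resultType="results",
    elementSetName="brief",
    constraintLanguage="CQL_TEXT",
    constraint_language_version="1.1.0",
    constraint="AnyText LIKE '%glider%'",
)

print(wms_caps)
# Decoding the query string shows the parameters the server receives
print(parse_qs(urlsplit(csw_search).query)["constraint"])  # ["AnyText LIKE '%glider%'"]
```

Because these are standard OGC operations, the same URL pattern would work against any compliant external server, which is what makes third-party integration straightforward.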

Finally, the SOCIB Data Centre also develops specific tools for the different facilities when required. As a result, several web applications have been implemented for a wide range of users. For example, some applications are designed for researchers to manage the instrumentation platforms, while others give stakeholders and the general public an overview of the data produced at SOCIB. These applications have been built with different technologies (e.g., OpenLayers, KML, iOS). All of them use the web services described above, and some can also incorporate OGC services provided by external organizations.
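
An application that incorporates an external OGC service usually starts by reading the service's capabilities document to find out which layers it offers. A minimal sketch with Python's standard XML parser, run on a small made-up WMS 1.3.0 capabilities response; the layer names and the helper `list_wms_layers` are hypothetical.

```python
import xml.etree.ElementTree as ET

WMS_NS_URI = "http://www.opengis.net/wms"  # WMS 1.3.0 namespace
WMS_NS = {"wms": WMS_NS_URI}


def list_wms_layers(capabilities_xml: str) -> list:
    """Return the names of all named layers in a WMS 1.3.0 capabilities document."""
    root = ET.fromstring(capabilities_xml)
    names = []
    # iter() walks nested layers too; group layers without a <Name> are skipped
    for layer in root.iter(f"{{{WMS_NS_URI}}}Layer"):
        name = layer.find("wms:Name", WMS_NS)
        if name is not None and name.text:
            names.append(name.text.strip())
    return names


# Made-up capabilities response from a hypothetical external WMS
SAMPLE = """<WMS_Capabilities version="1.3.0" xmlns="http://www.opengis.net/wms">
  <Service><Name>WMS</Name><Title>Hypothetical external WMS</Title></Service>
  <Capability>
    <Layer>
      <Title>Root layer</Title>
      <Layer><Name>sea_surface_temperature</Name><Title>SST</Title></Layer>
      <Layer><Name>coastline_sensitivity</Name><Title>Coastline</Title></Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""

print(list_wms_layers(SAMPLE))  # ['sea_surface_temperature', 'coastline_sensitivity']
```

Only layers with a `<Name>` can be requested with GetMap, which is why the unnamed root layer is skipped.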

Specific examples of these developments include:

  • SACOSTA: web-based map viewer for cartographic data such as the environmental sensitivity of the coastline (http://gis.socib.es/sacosta).
  • LW4NC2: web application for multidimensional data from netCDF files, usually numerical model output (http://thredds.socib.es/lw4nc2).
  • BEACH DATA VIEWER: web-based map viewer to display historical and beach survey data (http://gis.socib.es/viewer, currently being redeveloped).
  • DAPP: web application to display trajectory information from mobile platforms (e.g., gliders, drifter buoys, ARGO profilers; http://apps.socib.es/dapp).
  • SOCIB app: mobile application for real-time data from fixed stations (oceanographic buoys, sea level stations, coastal weather stations, etc.), glider trajectories and numerical models (hydrodynamics and waves). The app is available for iPhone, iPad and Android.

In addition, the Data Centre has developed a Glider Toolbox in Matlab/Octave to streamline the processing of the native files produced by a glider fleet. The toolbox is openly available at https://github.com/socib/glider_toolbox.