sxmerge

PURPOSE

SXMERGE Merge data from combined SeaExplorer glider and payload data sets into a single data set.

SYNOPSIS

function [meta, data] = sxmerge(meta_gli, data_gli, meta_pld, data_pld, varargin)

DESCRIPTION

SXMERGE  Merge data from combined SeaExplorer glider and payload data sets into a single data set.

  Syntax:
    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD)
    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD, OPTIONS)
    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD, OPT1, VAL1, ...)

  Description:
    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD) merges the
    glider and payload data sets described by metadata structs META_GLI and
    META_PLD, and data arrays DATA_GLI and DATA_PLD into a single data set
    described by metadata struct META and data array or struct DATA
    (see format option described below). Input metadata and data should be
    in the format returned by the function SXCAT. Data rows from both
    data sets are merged based on the order of the respective timestamps.
    See note on merging process.

    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD, OPTIONS) and
    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD, OPT1, VAL1, ...)
    accept the following options given in key-value pairs OPT1, VAL1...
    or in a struct OPTIONS with field names as option keys and field values
    as option values:
      FORMAT: data output format.
        String setting the format of the output DATA. Valid values are:
          'array': DATA is a matrix with variable readings in the column order
            specified by the VARIABLES metadata field.
          'struct': DATA is a struct with variable names as field names
            and column vectors of variable readings as field values.
        Default value: 'array'
      TIMEGLI: glider timestamp.
        String setting the name of the time variable for merging and sorting
        data row readings from SeaExplorer .gli data set.
        Default value: 'Timestamp'
      TIMEPLD: payload timestamp.
        String setting the name of the time variable for merging and sorting
        data row readings from SeaExplorer payload data set.
        Default value: 'PLD_REALTIMECLOCK'
      VARIABLES: variable filtering list.
        String cell array with the names of the variables of interest.
        If given, only variables present both in the input data sets and in
        this list will be present in output. The string 'all' may also be
        given, in which case variable filtering is not performed and all
        variables in the input data sets will be present in output.
        Default value: 'all' (do not perform variable filtering).
      PERIOD: time filtering boundaries.
        Two-element numeric array with the start and the end of the period
        of interest (seconds since 1970-01-01 00:00:00.00 UTC). If given,
        only row readings with timestamps within this period (boundaries
        included) will be present in output. The string 'all' may also be
        given, in which case time filtering is not performed and all row
        readings in the input data sets will be present in output.
        Default value: 'all' (do not perform time filtering).
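The PERIOD, VARIABLES and FORMAT options are applied after the data sets have been merged. The following MATLAB sketch mirrors those three steps of the source code on a toy merged array. The variable names DEPTH, TEMP and SALT and all values are hypothetical; only Timestamp is the documented default glider time variable.

```matlab
% Toy merged output (hypothetical variable names and values).
vars = {'DEPTH'; 'TEMP'; 'Timestamp'};
data = [1.0   NaN 100;
        2.0  14.1 200;
        3.0  14.3 300;
        4.0   NaN 400];

% PERIOD: drop rows strictly before the start or strictly after the end,
% so both boundaries are included.
period = [200 300];
stamp = data(:, strcmp(vars, 'Timestamp'));
data = data(~(stamp < period(1) | stamp > period(2)), :);

% VARIABLES: keep the listed names present in the data set, in data set
% order; names not in the data set (here 'SALT') are silently ignored.
wanted = {'Timestamp', 'TEMP', 'SALT'};
col_select = ismember(vars, wanted);
vars = vars(col_select);
data = data(:, col_select);

% FORMAT 'struct': one field per variable holding a column vector.
s = cell2struct(num2cell(data, 1), vars, 2);
```

In this sketch the output columns keep the order of the merged data set, not the order of the VARIABLES list, because the selection is computed with ISMEMBER over the data set variable names, as in the source code.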

  Notes:
    This function should be used to merge data from SeaExplorer glider and
    payload data sets, not from data sets coming from the same type of files
    (use SXCAT instead).

    The merging process sorts the data rows from the glider and payload
    data sets by their respective timestamp values. Glider and payload rows
    with equal timestamp values are merged into a single row; in any other
    row, the variables missing from one of the data sets are filled with
    invalid values (NaN). Variables in glider and payload data sets are
    expected to be all different, but if there were duplicated variables,
    the values from each data set would be merged in a common column
    according to the timestamp, and an error would be raised if there were
    inconsistent valid data entries (not NaN) for the same timestamp value.
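A minimal sketch of this merging process on toy data, following the same steps as the source code: sorted unions of variable names and timestamps with UNIQUE, an overlap check with INTERSECT, and NaN fill. For brevity it assumes the timestamp is the first column of each data set, whereas SXMERGE locates it through the TIMEGLI and TIMEPLD options. The variable names DEPTH and TEMP and all values are hypothetical, and DEPTH is duplicated on purpose to exercise the common-column case.

```matlab
% Toy inputs (hypothetical names and values); timestamps in column 1.
vars_gli = {'Timestamp'; 'DEPTH'};
data_gli = [10 1.0; 20 2.0; 30 3.0];
vars_pld = {'PLD_REALTIMECLOCK'; 'DEPTH'; 'TEMP'};
data_pld = [20 NaN 14.1; 25 2.5 14.3];

% Sorted union of variable names and merged column of each input column.
n_var_gli = numel(vars_gli);
[vars_merged, ~, col_to] = unique([vars_gli; vars_pld]);
col_gli = col_to(1:n_var_gli);
col_pld = col_to(n_var_gli + (1:numel(vars_pld)));

% Sorted union of timestamps and merged row of each input row.
n_row_gli = size(data_gli, 1);
[stamps, ~, row_to] = unique([data_gli(:, 1); data_pld(:, 1)]);
row_gli = row_to(1:n_row_gli);
row_pld = row_to(n_row_gli + (1:size(data_pld, 1)));

% Cells filled by both data sets must agree wherever both are valid.
[row_both, ia, ib] = intersect(row_gli, row_pld);
[col_both, ja, jb] = intersect(col_gli, col_pld);
both_gli = data_gli(ia, ja);
both_pld = data_pld(ib, jb);
if any(both_gli(:) ~= both_pld(:) & ~isnan(both_gli(:)) & ~isnan(both_pld(:)))
  error('Inconsistent glider and payload data.');
end

% Place each data set in its rows and columns; everything else stays NaN.
merged = nan(numel(stamps), numel(vars_merged));
merged(row_gli, col_gli) = data_gli;
merged(row_pld, col_pld) = data_pld;
% Shared cells: keep valid payload values, otherwise the glider values.
shared = both_gli;
shared(~isnan(both_pld)) = both_pld(~isnan(both_pld));
merged(row_both, col_both) = shared;
```

The last step matters: the payload DEPTH reading at timestamp 20 is NaN, and without it that NaN would overwrite the valid glider value 2.0. In the payload-only row (timestamp 25) the glider Timestamp column is still NaN at this point.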

    All values in timestamp columns should be valid (not NaN).
    In output, the glider timestamp column (option TIMEGLI) contains the
    merged glider and payload timestamps, to provide a consistent and
    comprehensive timestamp variable for the merged data set. The payload
    timestamp column (option TIMEPLD) contains only the timestamps of the
    payload data set.
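The timestamp handling can be sketched with toy glider and payload timestamps (hypothetical values). Rows are placed according to the sorted union of both timestamp sets, and the payload timestamps are then also written into the glider timestamp column, as the source code does with the column selected by TIMEGLI.

```matlab
% Toy timestamps (hypothetical values, seconds since 1970-01-01).
stamp_gli = [10; 20; 30];
stamp_pld = [20; 25];

% Sorted union of timestamps and merged row of each input row.
[stamp_merged, ~, idx] = unique([stamp_gli; stamp_pld]);
rows_gli = idx(1:numel(stamp_gli));
rows_pld = idx(numel(stamp_gli) + (1:numel(stamp_pld)));

% Each timestamp column initially holds only its own data set's stamps.
merged_gli_time = nan(numel(stamp_merged), 1);
merged_pld_time = nan(numel(stamp_merged), 1);
merged_gli_time(rows_gli) = stamp_gli;
merged_pld_time(rows_pld) = stamp_pld;

% Copy payload stamps into the glider column so it covers every row.
merged_gli_time(rows_pld) = stamp_pld;
```

After the copy, the glider timestamp column holds every merged timestamp, while the payload column is NaN in the glider-only rows.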

  Examples:
    [meta, data] = sxmerge(meta_gli, data_gli, meta_pld, data_pld)

  See also:
    SX2MAT
    SXCAT

  Authors:
    Frederic Cyr  <Frederic.Cyr@mio.osupytheas.fr>
    Joan Pau Beltran  <joanpau.beltran@socib.cat>

SOURCE CODE

function [meta, data] = sxmerge(meta_gli, data_gli, meta_pld, data_pld, varargin)
%SXMERGE  Merge data from combined SeaExplorer glider and payload data sets into a single data set.
%
%  Syntax:
%    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD)
%    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD, OPTIONS)
%    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD, OPT1, VAL1, ...)
%
%  Description:
%    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD) merges the
%    glider and payload data sets described by metadata structs META_GLI and
%    META_PLD, and data arrays DATA_GLI and DATA_PLD into a single data set
%    described by metadata struct META and data array or struct DATA
%    (see format option described below). Input metadata and data should be
%    in the format returned by the function SXCAT. Data rows from both
%    data sets are merged based on the order of the respective timestamps.
%    See note on merging process.
%
%    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD, OPTIONS) and
%    [META, DATA] = SXMERGE(META_GLI, DATA_GLI, META_PLD, DATA_PLD, OPT1, VAL1, ...)
%    accept the following options given in key-value pairs OPT1, VAL1...
%    or in a struct OPTIONS with field names as option keys and field values
%    as option values:
%      FORMAT: data output format.
%        String setting the format of the output DATA. Valid values are:
%          'array': DATA is a matrix with variable readings in the column order
%            specified by the VARIABLES metadata field.
%          'struct': DATA is a struct with variable names as field names
%            and column vectors of variable readings as field values.
%        Default value: 'array'
%      TIMEGLI: glider timestamp.
%        String setting the name of the time variable for merging and sorting
%        data row readings from SeaExplorer .gli data set.
%        Default value: 'Timestamp'
%      TIMEPLD: payload timestamp.
%        String setting the name of the time variable for merging and sorting
%        data row readings from SeaExplorer payload data set.
%        Default value: 'PLD_REALTIMECLOCK'
%      VARIABLES: variable filtering list.
%        String cell array with the names of the variables of interest.
%        If given, only variables present both in the input data sets and in
%        this list will be present in output. The string 'all' may also be
%        given, in which case variable filtering is not performed and all
%        variables in the input data sets will be present in output.
%        Default value: 'all' (do not perform variable filtering).
%      PERIOD: time filtering boundaries.
%        Two-element numeric array with the start and the end of the period
%        of interest (seconds since 1970-01-01 00:00:00.00 UTC). If given,
%        only row readings with timestamps within this period (boundaries
%        included) will be present in output. The string 'all' may also be
%        given, in which case time filtering is not performed and all row
%        readings in the input data sets will be present in output.
%        Default value: 'all' (do not perform time filtering).
%
%  Notes:
%    This function should be used to merge data from SeaExplorer glider and
%    payload data sets, not from data sets coming from the same type of files
%    (use SXCAT instead).
%
%    The merging process sorts the data rows from the glider and payload
%    data sets by their respective timestamp values. Glider and payload rows
%    with equal timestamp values are merged into a single row; in any other
%    row, the variables missing from one of the data sets are filled with
%    invalid values (NaN). Variables in glider and payload data sets are
%    expected to be all different, but if there were duplicated variables,
%    the values from each data set would be merged in a common column
%    according to the timestamp, and an error would be raised if there were
%    inconsistent valid data entries (not NaN) for the same timestamp value.
%
%    All values in timestamp columns should be valid (not NaN).
%    In output, the glider timestamp column (option TIMEGLI) contains the
%    merged glider and payload timestamps, to provide a consistent and
%    comprehensive timestamp variable for the merged data set. The payload
%    timestamp column (option TIMEPLD) contains only the timestamps of the
%    payload data set.
%
%  Examples:
%    [meta, data] = sxmerge(meta_gli, data_gli, meta_pld, data_pld)
%
%  See also:
%    SX2MAT
%    SXCAT
%
%  Authors:
%    Frederic Cyr  <Frederic.Cyr@mio.osupytheas.fr>
%    Joan Pau Beltran  <joanpau.beltran@socib.cat>

%  Copyright (C) 2016
%  ICTS SOCIB - Servei d'observacio i prediccio costaner de les Illes Balears
%  <http://www.socib.es>
%
%  This program is free software: you can redistribute it and/or modify
%  it under the terms of the GNU General Public License as published by
%  the Free Software Foundation, either version 3 of the License, or
%  (at your option) any later version.
%
%  This program is distributed in the hope that it will be useful,
%  but WITHOUT ANY WARRANTY; without even the implied warranty of
%  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
%  GNU General Public License for more details.
%
%  You should have received a copy of the GNU General Public License
%  along with this program.  If not, see <http://www.gnu.org/licenses/>.

  % Require the 4 data set arguments plus at most 5 option key-value pairs.
  narginchk(4, 14);


  %% Set options and default values.
  options.format = 'array';
  options.timegli = 'Timestamp';
  options.timepld = 'PLD_REALTIMECLOCK';
  options.variables = 'all';
  options.period = 'all';


  %% Parse optional arguments.
  % Get option key-value pairs in any accepted call signature.
  argopts = varargin;
  if isscalar(argopts) && isstruct(argopts{1})
    % Options passed as a single option struct argument:
    % field names are option keys and field values are option values.
    opt_key_list = fieldnames(argopts{1});
    opt_val_list = struct2cell(argopts{1});
  elseif mod(numel(argopts), 2) == 0
    % Options passed as key-value argument pairs.
    opt_key_list = argopts(1:2:end);
    opt_val_list = argopts(2:2:end);
  else
    error('glider_toolbox:sxmerge:InvalidOptions', ...
          'Invalid optional arguments (neither key-value pairs nor struct).');
  end
  % Overwrite default options with values given in extra arguments.
  for opt_idx = 1:numel(opt_key_list)
    opt = lower(opt_key_list{opt_idx});
    val = opt_val_list{opt_idx};
    if isfield(options, opt)
      options.(opt) = val;
    else
      error('glider_toolbox:sxmerge:InvalidOption', ...
            'Invalid option: %s.', opt);
    end
  end


  %% Set option flags and values.
  output_format = lower(options.format);
  time_variable_gli = options.timegli;
  time_variable_pld = options.timepld;
  variable_filtering = true;
  variable_list = cellstr(options.variables);
  time_filtering = true;
  time_range = options.period;
  if ischar(options.variables) && strcmp(options.variables, 'all')
    variable_filtering = false;
  end
  if ischar(options.period) && strcmp(options.period, 'all')
    time_filtering = false;
  end


  %% Merge data and metadata checking for empty input cases.
  if isempty(meta_gli.sources) && isempty(meta_pld.sources)
    % No input data.
    % Both META_GLI and DATA_GLI, and META_PLD and DATA_PLD
    % are equal to the trivial output of SXCAT.
    % Disable filtering.
    meta = meta_gli;
    data = data_gli;
    variable_filtering = false;
    time_filtering = false;
  elseif isempty(meta_pld.sources)
    % Only glider data.
    meta = meta_gli;
    data = data_gli;
    time_variable_merged = time_variable_gli; % Time variable for filtering.
  elseif isempty(meta_gli.sources)
    % Only payload data.
    meta = meta_pld;
    data = data_pld;
    time_variable_merged = time_variable_pld; % Time variable for filtering.
  else
    % Build list of sources and variables for merged data and metadata.
    sources_gli = meta_gli.sources;
    sources_pld = meta_pld.sources;
    sources_merged = vertcat(sources_gli, sources_pld);
    variables_gli = meta_gli.variables;
    variables_pld = meta_pld.variables;
    [variables_merged, ~, variables_merged_indices_to] = ...
      unique(vertcat(variables_gli, variables_pld));

    % Check that both data sets have their own timestamp variable.
    [time_variable_gli_present, time_variable_gli_col] = ...
      ismember(time_variable_gli, variables_gli);
    if ~time_variable_gli_present
      error('glider_toolbox:sxmerge:MissingTimestamp', ...
            'Missing timestamp variable in glider data set: %s.', ...
            time_variable_gli);
    end
    [time_variable_pld_present, time_variable_pld_col] = ...
      ismember(time_variable_pld, variables_pld);
    if ~time_variable_pld_present
      error('glider_toolbox:sxmerge:MissingTimestamp', ...
            'Missing timestamp variable in payload data set: %s.', ...
            time_variable_pld);
    end

    % Build list of unique timestamps and the output index of each data row.
    stamp_gli = data_gli(:, time_variable_gli_col);
    stamp_pld = data_pld(:, time_variable_pld_col);
    [stamp_merged, ~, stamp_merged_indices_to] = ...
      unique(vertcat(stamp_gli, stamp_pld));

    % Build indices of glider and payload entries in merged data output.
    row_num_gli = numel(stamp_gli);
    row_range_gli = 1:row_num_gli;
    row_indices_gli = stamp_merged_indices_to(row_range_gli);
    row_num_pld = numel(stamp_pld);
    row_range_pld = row_num_gli + (1:row_num_pld);
    row_indices_pld = stamp_merged_indices_to(row_range_pld);
    row_num_merged = numel(stamp_merged);
    col_num_gli = numel(variables_gli);
    col_range_gli = 1:col_num_gli;
    col_indices_gli = variables_merged_indices_to(col_range_gli);
    col_num_pld = numel(variables_pld);
    col_range_pld = col_num_gli + (1:col_num_pld);
    col_indices_pld = variables_merged_indices_to(col_range_pld);
    col_num_merged = numel(variables_merged);

    % Check for consistency of overlapped glider and payload data.
    [row_overlap_merged, row_overlap_gli, row_overlap_pld] = ...
      intersect(row_indices_gli, row_indices_pld);
    [col_overlap_merged, col_overlap_gli, col_overlap_pld] = ...
      intersect(col_indices_gli, col_indices_pld);
    data_overlap_gli = data_gli(row_overlap_gli, col_overlap_gli);
    data_overlap_pld = data_pld(row_overlap_pld, col_overlap_pld);
    data_overlap_gli_valid = ~isnan(data_overlap_gli);
    data_overlap_pld_valid = ~isnan(data_overlap_pld);
    data_overlap_inconsistent = (data_overlap_gli ~= data_overlap_pld) ...
                              & data_overlap_gli_valid ...
                              & data_overlap_pld_valid;
    if any(data_overlap_inconsistent(:))
      [row_inconsistent, col_inconsistent] = find(data_overlap_inconsistent);
      err_msg_arg_list = cell(4, numel(row_inconsistent));
      err_msg_arg_list(1, :) = ...
        variables_merged(col_overlap_merged(col_inconsistent));
      err_msg_arg_list(2, :) = cellstr( ...
        datestr(posixtime2utc(stamp_merged(row_overlap_merged(row_inconsistent))), ...
                'dd/mm/yyyy HH:MM:SS.FFF'));
      err_msg_arg_list(3, :) = ...
        num2cell(data_overlap_gli(data_overlap_inconsistent));
      err_msg_arg_list(4, :) = ...
        num2cell(data_overlap_pld(data_overlap_inconsistent));
      err_msg_fmt = '\nInconsistent glider and payload value of %s at %s: %12f %12f';
      error('glider_toolbox:sxmerge:InconsistentData', ...
            'Inconsistent data:%s', sprintf(err_msg_fmt, err_msg_arg_list{:}));
    end

    % Set output merged data.
    data = nan(row_num_merged, col_num_merged);
    data(row_indices_gli, col_indices_gli) = data_gli;
    data(row_indices_pld, col_indices_pld) = data_pld;
    data_overlap_merged = data_overlap_gli;
    data_overlap_merged(data_overlap_pld_valid) = ...
      data_overlap_pld(data_overlap_pld_valid);
    data(row_overlap_merged, col_overlap_merged) = data_overlap_merged;

    % Copy payload timestamp entries to glider timestamp entries.
    data(row_indices_pld, col_indices_gli(time_variable_gli_col)) = stamp_pld;
    time_variable_merged = time_variable_gli;

    % Set metadata fields.
    meta.sources = sources_merged;
    meta.variables = variables_merged;
  end


  %% Perform time filtering if needed.
  if time_filtering
    [time_variable_merged_present, time_variable_merged_col] = ...
      ismember(time_variable_merged, meta.variables);
    if ~time_variable_merged_present
      error('glider_toolbox:sxmerge:MissingTimestamp', ...
            'Missing timestamp variable in merged data set: %s.', ...
            time_variable_merged);
    end
    stamp_merged = data(:, time_variable_merged_col);
    stamp_select = ...
      ~(stamp_merged < time_range(1) | stamp_merged > time_range(2));
    data = data(stamp_select, :);
  end


  %% Perform variable filtering if needed.
  if variable_filtering
    [variable_select, ~] = ismember(meta.variables, variable_list);
    meta.variables = meta.variables(variable_select);
    data = data(:, variable_select);
  end


  %% Convert output data to struct format if needed.
  switch output_format
    case 'array'
    case 'struct'
      data = cell2struct(num2cell(data, 1), meta.variables, 2);
    otherwise
      error('glider_toolbox:sxmerge:InvalidFormat', ...
            'Invalid output format: %s.', output_format);
  end

end

Generated on Fri 06-Oct-2017 10:47:42 by m2html © 2005