sxcat

PURPOSE

SXCAT Combine data from several SeaExplorer data sets of the same type into a single data set.

SYNOPSIS

function [meta, data] = sxcat(meta_list, data_list, timestamp, varargin)

DESCRIPTION

SXCAT  Combine data from several SeaExplorer data sets of the same type into a single data set.

  Syntax:
    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP)
    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPTIONS)
    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPT1, VAL1, ...)

  Description:
    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP) combines data from 
    arrays in cell array DATA_LIST and metadata from structs in cell array 
    META_LIST into a single data set with data in array DATA and metadata in 
    struct array META. Elements in META_LIST and DATA_LIST should have the
    format returned by function SX2MAT, but they do not need to have the same
    set of variables. Outputs META and DATA have the same format, too.
    META is a struct array combining the information in elements of META_LIST.
    It has the following fields:
      VARIABLES: string cell array with the names of the variables present
        in the returned data array (in the same column order), built by
        merging the VARIABLES field of all elements in META_LIST.
      SOURCES: string cell array built by concatenating the SOURCES field
        of all elements in META_LIST.
    DATA is a numeric array combining the rows of arrays in DATA_LIST,
    reordering the variable columns if needed, and sorting the resulting rows
    by the timestamp taken from the variable named by string TIMESTAMP.
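
    The merging described above can be illustrated with a minimal,
    self-contained sketch. It is not the function itself: the variable names
    and values are hypothetical, and it skips the consistency check and the
    NaN handling that SXCAT applies to overlapping readings.

```matlab
% Two hypothetical data sets: one column per variable, one row per reading.
vars_a = {'time'; 'temp'};
data_a = [10 1.5; 20 1.6];
vars_b = {'time'; 'depth'};
data_b = [20 5.0; 30 6.0];

% Merged variable list: unique names in sorted order.
% col_to maps each input column to its column in the output.
[vars_out, ~, col_to] = unique([vars_a; vars_b]);
col_a = col_to(1:numel(vars_a));
col_b = col_to(numel(vars_a)+1:end);

% Merged timestamps: unique values in ascending order.
% row_to maps each input row to its row in the output.
stamps = [data_a(:, strcmp('time', vars_a)); ...
          data_b(:, strcmp('time', vars_b))];
[stamps_out, ~, row_to] = unique(stamps);
row_a = row_to(1:size(data_a, 1));
row_b = row_to(size(data_a, 1)+1:end);

% Fill the output, leaving NaN where a data set lacks a variable.
data_out = nan(numel(stamps_out), numel(vars_out));
data_out(row_a, col_a) = data_a;
data_out(row_b, col_b) = data_b;
```

    The reading at timestamp 20 appears in both inputs and ends up in a
    single output row, with 'temp' taken from the first data set and 'depth'
    from the second.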

    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPTIONS) and
    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPT1, VAL1, ...)
    accept the following options given in key-value pairs OPT1, VAL1...
    or in a struct OPTIONS with field names as option keys and field values
    as option values:
      FORMAT: data output format.
        String setting the format of the output DATA. Valid values are:
          'array': DATA is a matrix with variable readings in the column order
            specified by the VARIABLES metadata field.
          'struct': DATA is a struct with variable names as field names
            and column vectors of variable readings as field values.
        Default value: 'array'
      VARIABLES: variable filtering list.
        String cell array with the names of the variables of interest.
        If given, only variables present in both the input data sets and this
        list will be present in output. The string 'all' may also be given,
        in which case variable filtering is not performed and all variables
        in the input list will be present in output.
        Default value: 'all' (do not perform variable filtering).
      PERIOD: time filtering boundaries.
        Two element numeric array with the start and the end of the period
        of interest (seconds since 1970-01-01 00:00:00.00 UTC). If given,
        only row readings with timestamps within this period will be present
        in output. The string 'all' may also be given, in which case time 
        filtering is not performed and all row readings in the input list
        will be present in output.
        Default value: 'all' (do not perform time filtering).
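
    As a rough illustration of what the three options do, the sketch below
    applies them by hand to a small merged data set. The data, the variable
    names and the option values are hypothetical, and the sketch assumes all
    timestamps are valid (not NaN).

```matlab
% Hypothetical merged output in the 'array' format.
variables = {'depth'; 'temp'; 'time'};
data = [NaN 1.5 10; 5.0 1.6 20; 6.0 NaN 30];

% PERIOD: keep rows whose timestamp lies within [start, end], bounds included.
period = [15 30];
stamps = data(:, strcmp('time', variables));
data = data(stamps >= period(1) & stamps <= period(2), :);

% VARIABLES: keep only the listed variables that are actually present.
% Names missing from the data ('salinity' here) are silently ignored,
% and the columns keep the order they already had in the data.
wanted = {'time', 'temp', 'salinity'};
keep = ismember(variables, wanted);
variables = variables(keep);
data = data(:, keep);

% FORMAT 'struct': one field per variable, each holding a column vector.
data_struct = cell2struct(num2cell(data, 1), variables, 2);
```

    After these steps data_struct has the fields 'temp' and 'time', each with
    the two readings that fall inside the period.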

  Notes:
    This function should be used to combine data from several glider files,
    or from several payload files, but not from both glider and payload files
    (use SXMERGE instead).

    If data rows with the same timestamp are present in several data sets,
    the function checks that data in those row readings is consistent.
    If the same variable is present in row readings from different data sets
    with the same timestamp and different valid values (not NaN), an error is
    thrown. Otherwise the values are merged into a single data row.
    However, note that in the odd case of data rows with the same timestamp
    in the same data set, they would not be merged and the values
    in the latest one would be used.
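
    The consistency rule can be sketched on a single variable as follows.
    The values are hypothetical, and where the function would throw an error
    the sketch only prints the conflicting readings.

```matlab
% Hypothetical readings of the same variable at the same timestamps,
% as stored so far (old) and as found in another data set (new).
data_old = [1.5; NaN; 2.0; 3.0];
data_new = [1.5; 4.0; NaN; 3.5];

% A conflict needs both values to be valid (not NaN) and different.
conflict = (data_old ~= data_new) & ~isnan(data_old) & ~isnan(data_new);

if any(conflict)
  % The function raises an error at this point; the sketch only reports.
  fprintf('Conflict in row %d: %g vs %g\n', ...
          [find(conflict), data_old(conflict), data_new(conflict)]');
end

% Where there is no conflict, valid new values fill the merged row.
merged = data_old;
fill = ~conflict & ~isnan(data_new);
merged(fill) = data_new(fill);
```

    Only the last row conflicts (3.0 against 3.5). The NaN in either input is
    treated as "no reading", so rows 2 and 3 merge without complaint.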

    All values in timestamp columns should be valid (not NaN).

  Examples:
    [meta, data] = sxcat(meta_list, data_list, timestamp)

  See also:
    SX2MAT
    SXMERGE

  Authors:
    Frederic Cyr  <Frederic.Cyr@mio.osupytheas.fr>
    Joan Pau Beltran  <joanpau.beltran@socib.cat>

SOURCE CODE

function [meta, data] = sxcat(meta_list, data_list, timestamp, varargin)
%SXCAT  Combine data from several SeaExplorer data sets of the same type into a single data set.
%
%  Syntax:
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP)
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPTIONS)
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPT1, VAL1, ...)
%
%  Description:
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP) combines data from
%    arrays in cell array DATA_LIST and metadata from structs in cell array
%    META_LIST into a single data set with data in array DATA and metadata in
%    struct array META. Elements in META_LIST and DATA_LIST should have the
%    format returned by function SX2MAT, but they do not need to have the same
%    set of variables. Outputs META and DATA have the same format, too.
%    META is a struct array combining the information in elements of META_LIST.
%    It has the following fields:
%      VARIABLES: string cell array with the names of the variables present
%        in the returned data array (in the same column order), built by
%        merging the VARIABLES field of all elements in META_LIST.
%      SOURCES: string cell array built by concatenating the SOURCES field
%        of all elements in META_LIST.
%    DATA is a numeric array combining the rows of arrays in DATA_LIST,
%    reordering the variable columns if needed, and sorting the resulting rows
%    by the timestamp taken from the variable named by string TIMESTAMP.
%
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPTIONS) and
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPT1, VAL1, ...)
%    accept the following options given in key-value pairs OPT1, VAL1...
%    or in a struct OPTIONS with field names as option keys and field values
%    as option values:
%      FORMAT: data output format.
%        String setting the format of the output DATA. Valid values are:
%          'array': DATA is a matrix with variable readings in the column order
%            specified by the VARIABLES metadata field.
%          'struct': DATA is a struct with variable names as field names
%            and column vectors of variable readings as field values.
%        Default value: 'array'
%      VARIABLES: variable filtering list.
%        String cell array with the names of the variables of interest.
%        If given, only variables present in both the input data sets and this
%        list will be present in output. The string 'all' may also be given,
%        in which case variable filtering is not performed and all variables
%        in the input list will be present in output.
%        Default value: 'all' (do not perform variable filtering).
%      PERIOD: time filtering boundaries.
%        Two element numeric array with the start and the end of the period
%        of interest (seconds since 1970-01-01 00:00:00.00 UTC). If given,
%        only row readings with timestamps within this period will be present
%        in output. The string 'all' may also be given, in which case time
%        filtering is not performed and all row readings in the input list
%        will be present in output.
%        Default value: 'all' (do not perform time filtering).
%
%  Notes:
%    This function should be used to combine data from several glider files,
%    or from several payload files, but not from both glider and payload files
%    (use SXMERGE instead).
%
%    If data rows with the same timestamp are present in several data sets,
%    the function checks that data in those row readings is consistent.
%    If the same variable is present in row readings from different data sets
%    with the same timestamp and different valid values (not NaN), an error is
%    thrown. Otherwise the values are merged into a single data row.
%    However, note that in the odd case of data rows with the same timestamp
%    in the same data set, they would not be merged and the values
%    in the latest one would be used.
%
%    All values in timestamp columns should be valid (not NaN).
%
%  Examples:
%    [meta, data] = sxcat(meta_list, data_list, timestamp)
%
%  See also:
%    SX2MAT
%    SXMERGE
%
%  Authors:
%    Frederic Cyr  <Frederic.Cyr@mio.osupytheas.fr>
%    Joan Pau Beltran  <joanpau.beltran@socib.cat>

%  Copyright (C) 2016
%  ICTS SOCIB - Servei d'observacio i prediccio costaner de les Illes Balears
%  <http://www.socib.es>
%
%  This program is free software: you can redistribute it and/or modify
%  it under the terms of the GNU General Public License as published by
%  the Free Software Foundation, either version 3 of the License, or
%  (at your option) any later version.
%
%  This program is distributed in the hope that it will be useful,
%  but WITHOUT ANY WARRANTY; without even the implied warranty of
%  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
%  GNU General Public License for more details.
%
%  You should have received a copy of the GNU General Public License
%  along with this program.  If not, see <http://www.gnu.org/licenses/>.

  % Three required arguments plus up to three option key-value pairs.
  % (NARGINCHK replaces the deprecated ERROR(NARGCHK(...)) idiom.)
  narginchk(3, 9);


  %% Set options and default values.
  options.format = 'array';
  options.variables = 'all';
  options.period = 'all';


  %% Parse optional arguments.
  % Get option key-value pairs in any accepted call signature.
  argopts = varargin;
  if isscalar(argopts) && isstruct(argopts{1})
    % Options passed as a single option struct argument:
    % field names are option keys and field values are option values.
    opt_key_list = fieldnames(argopts{1});
    opt_val_list = struct2cell(argopts{1});
  elseif mod(numel(argopts), 2) == 0
    % Options passed as key-value argument pairs.
    opt_key_list = argopts(1:2:end);
    opt_val_list = argopts(2:2:end);
  else
    error('glider_toolbox:sxcat:InvalidOptions', ...
          'Invalid optional arguments (neither key-value pairs nor struct).');
  end
  % Overwrite default options with values given in extra arguments.
  for opt_idx = 1:numel(opt_key_list)
    opt = lower(opt_key_list{opt_idx});
    val = opt_val_list{opt_idx};
    if isfield(options, opt)
      options.(opt) = val;
    else
      error('glider_toolbox:sxcat:InvalidOption', ...
            'Invalid option: %s.', opt);
    end
  end


  %% Set option flags and values.
  output_format = lower(options.format);
  variable_filtering = true;
  variable_list = cellstr(options.variables);
  time_filtering = true;
  time_range = options.period;
  if ischar(options.variables) && strcmp(options.variables, 'all')
    variable_filtering = false;
  end
  if ischar(options.period) && strcmp(options.period, 'all')
    time_filtering = false;
  end


  %% Cat data and metadata checking for trivial empty input.
  % Check for trivial empty input.
  if isempty(meta_list)
    sources_cat = cell(0, 1);
    variables_cat_list = cell(0, 1);
  else
    meta_struct = [meta_list{:}];
    sources_cat = vertcat(meta_struct.sources);
    variables_cat_list = {meta_struct.variables}';
  end

  % Build list of sources and variables for concatenated data and metadata.
  [~, ~, variables_cat_indices_to] = unique(vertcat(variables_cat_list{:}));
  variables_cat = cell(0, 1);
  variables_cat(variables_cat_indices_to) = vertcat(variables_cat_list{:});

  % Build list of unique timestamps and the output index of each data row.
  stamp_cat_list = cellfun(@(d, m) d(:, strcmp(timestamp, m.variables)), ...
                           data_list(:), meta_list(:), 'UniformOutput', false);
  [~, ~, stamp_cat_indices_to] = unique(vertcat(stamp_cat_list{:}));
  stamp_cat = zeros(0, 1);
  stamp_cat(stamp_cat_indices_to) = vertcat(stamp_cat_list{:});

  % Build list of indices of input data entries in concatenated data output.
  total_rows = numel(stamp_cat);
  row_num_list = cellfun(@numel, stamp_cat_list(:));
  row_end_list = cumsum(row_num_list);
  row_start_list = 1 + [0; row_end_list(1:end-1)];
  total_cols = numel(variables_cat);
  col_num_list = cellfun(@numel, variables_cat_list(:));
  col_end_list = cumsum(col_num_list);
  col_start_list = 1 + [0; col_end_list(1:end-1)];

  % Set output concatenated data checking for consistency of overlapped data.
  data = nan(total_rows, total_cols);
  for data_idx = 1:numel(data_list)
    row_range = row_start_list(data_idx):row_end_list(data_idx);
    row_indices = stamp_cat_indices_to(row_range);
    col_range = col_start_list(data_idx):col_end_list(data_idx);
    col_indices = variables_cat_indices_to(col_range);
    data_old = data(row_indices, col_indices);
    data_new = data_list{data_idx};
    data_old_valid = ~isnan(data_old);
    data_new_valid = ~isnan(data_new);
    % Inconsistent: both values valid (not NaN) but different.
    data_inconsistent = ...
      (data_old ~= data_new) & data_old_valid & data_new_valid;
    if any(data_inconsistent(:))
      [row_inconsistent, col_inconsistent] = find(data_inconsistent);
      err_msg_arg_list = cell(4, numel(row_inconsistent));
      err_msg_arg_list(1, :) = variables_cat(col_indices(col_inconsistent));
      err_msg_arg_list(2, :) = cellstr( ...
        datestr(posixtime2utc(stamp_cat(row_indices(row_inconsistent))), ...
                'dd/mm/yyyy HH:MM:SS.FFF'));
      err_msg_arg_list(3, :) = num2cell(data_old(data_inconsistent));
      err_msg_arg_list(4, :) = num2cell(data_new(data_inconsistent));
      err_msg_fmt = '\nInconsistent value of %s at %s: %12f %12f';
      error('glider_toolbox:sxcat:InconsistentData', ...
            'Inconsistent data:%s', sprintf(err_msg_fmt, err_msg_arg_list{:}));
    end
    % Only valid new values overwrite what is already stored.
    data_old(data_new_valid) = data_new(data_new_valid);
    data(row_indices, col_indices) = data_old;
  end

  % Set metadata fields.
  meta.sources = sources_cat;
  meta.variables = variables_cat;


  %% Perform time filtering if needed.
  if time_filtering
    stamp_select = ~(stamp_cat < time_range(1) | stamp_cat > time_range(2));
    data = data(stamp_select, :);
  end


  %% Perform variable filtering if needed.
  if variable_filtering
    [variable_select, ~] = ismember(meta.variables, variable_list);
    meta.variables = meta.variables(variable_select);
    data = data(:, variable_select);
  end


  %% Convert output data to struct format if needed.
  switch output_format
    case 'array'
      % Nothing to do: data is already a matrix.
    case 'struct'
      data = cell2struct(num2cell(data, 1), meta.variables, 2);
    otherwise
      error('glider_toolbox:sxcat:InvalidFormat', ...
            'Invalid output format: %s.', output_format)
  end

end

Generated on Fri 06-Oct-2017 10:47:42 by m2html © 2005