SXCAT Combine data from several SeaExplorer data sets of the same type into a single data set.

  Syntax:
    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP)
    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPTIONS)
    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPT1, VAL1, ...)

  Description:
    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP) combines data from
    arrays in cell array DATA_LIST and metadata from structs in cell array
    META_LIST into a single data set with data in array DATA and metadata in
    struct META. Elements in META_LIST and DATA_LIST should have the
    format returned by function SX2MAT, but they do not need to have the same
    set of variables. Outputs META and DATA have that format, too.
    META is a struct combining the information in elements of META_LIST.
    It has the following fields:
      VARIABLES: string cell array with the names of the variables present
        in the returned data array (in the same column order), built by
        merging the VARIABLES field of all elements in META_LIST into a
        sorted list without duplicates.
      SOURCES: string cell array built by concatenating the SOURCES field
        of all elements in META_LIST.
    DATA is a numeric array combining the rows of arrays in DATA_LIST,
    reordering the variable columns if needed, and sorting the resulting rows
    according to the timestamp in the variable named by string TIMESTAMP.

    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPTIONS) and
    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPT1, VAL1, ...)
    accept the following options, given in key-value pairs OPT1, VAL1...
    or in a struct OPTIONS with field names as option keys and field values
    as option values:
      FORMAT: data output format.
        String setting the format of the output DATA. Valid values are:
          'array': DATA is a matrix with variable readings in the column order
            specified by the VARIABLES metadata field.
          'struct': DATA is a struct with variable names as field names
            and column vectors of variable readings as field values.
        Default value: 'array'
      VARIABLES: variable filtering list.
        String cell array with the names of the variables of interest.
        If given, only variables present in both the input data sets and this
        list will be present in output. The string 'all' may also be given,
        in which case variable filtering is not performed and all variables
        in the input data sets will be present in output.
        Default value: 'all' (do not perform variable filtering).
      PERIOD: time filtering boundaries.
        Two-element numeric array with the start and the end of the period
        of interest (seconds since 1970-01-01 00:00:00.00 UTC). If given,
        only row readings with timestamps within this period (boundaries
        included) will be present in output. The string 'all' may also be
        given, in which case time filtering is not performed and all row
        readings in the input data sets will be present in output.
        Default value: 'all' (do not perform time filtering).

  Notes:
    This function should be used to combine data from several glider files,
    or from several payload files, but not from both glider and payload files
    (use SXMERGE instead).

    If data rows with the same timestamp are present in several data sets,
    the function checks that data in those row readings is consistent.
    If the same variable is present in row readings from different data sets
    with the same timestamp and different valid values (not NaN), an error is
    thrown. Otherwise the values are merged into a single data row.
    However, note that in the unusual case of data rows with the same
    timestamp in the same data set, they are not merged and the values
    in the last one are used.

    All values in timestamp columns should be valid (not NaN).

  Examples:
    [meta, data] = sxcat(meta_list, data_list, timestamp)

  See also:
    SX2MAT
    SXMERGE

  Authors:
    Frederic Cyr <Frederic.Cyr@mio.osupytheas.fr>
    Joan Pau Beltran <joanpau.beltran@socib.cat>
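The merging rules above can be illustrated with a short, self-contained MATLAB sketch. These rules are: a union of the variables, one output row per distinct timestamp, and an error only when two valid values disagree. This sketch is a simplified illustration written for this page, not the SXCAT source. The toy data sets, the variable names `time`, `temp` and `cond`, and the names `vars_list`, `ts_name` and `merged` are hypothetical, and small numbers stand in for POSIX timestamps. The sketch relies on the third output of `unique`, which maps every input element to its position in the sorted list of distinct values.

```matlab
% Hypothetical toy inputs: per data set, a cell array of variable names and
% a numeric matrix whose columns follow that order (SX2MAT-like layout).
vars_list = {{'time'; 'temp'}, {'cond'; 'time'}};
data_list = {[10 1.5; 20 NaN], [0.30 20; 0.40 30]};
ts_name = 'time';

% Sorted union of variable names; col_to maps each stacked input column
% to its output column.
all_vars = vertcat(vars_list{:});
[variables, ~, col_to] = unique(all_vars);

% Sorted union of timestamps; row_to maps each stacked input row
% to its output row.
stamp_list = cellfun(@(d, v) d(:, strcmp(ts_name, v)), ...
                     data_list, vars_list, 'UniformOutput', false);
[stamps, ~, row_to] = unique(vertcat(stamp_list{:}));

% Position of each data set within the stacked rows and columns.
row_len = cellfun(@numel, stamp_list);
row_end = cumsum(row_len);
row_start = row_end - row_len + 1;
col_len = cellfun(@numel, vars_list);
col_end = cumsum(col_len);
col_start = col_end - col_len + 1;

% Fill a NaN matrix data set by data set, checking overlapping values.
merged = nan(numel(stamps), numel(variables));
for k = 1:numel(data_list)
  rows = row_to(row_start(k):row_end(k));
  cols = col_to(col_start(k):col_end(k));
  old = merged(rows, cols);
  new = data_list{k};
  % Two valid (non-NaN) values that differ at the same timestamp and
  % variable are a conflict.
  conflict = ~isnan(old) & ~isnan(new) & (old ~= new);
  if any(conflict(:))
    error('sketch:InconsistentData', 'Inconsistent overlapping data.');
  end
  % Keep previously merged values where the new data set has NaN.
  old(~isnan(new)) = new(~isnan(new));
  merged(rows, cols) = old;
end
disp(merged);
```

The timestamp 20 appears in both toy data sets, so those two rows collapse into one output row. `cond` comes from the second set, while `temp` stays NaN because the first set has no valid value there.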
function [meta, data] = sxcat(meta_list, data_list, timestamp, varargin)
%SXCAT  Combine data from several SeaExplorer data sets of the same type into a single data set.
%
%  Syntax:
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP)
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPTIONS)
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPT1, VAL1, ...)
%
%  Description:
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP) combines data from
%    arrays in cell array DATA_LIST and metadata from structs in cell array
%    META_LIST into a single data set with data in array DATA and metadata in
%    struct META. Elements in META_LIST and DATA_LIST should have the
%    format returned by function SX2MAT, but they do not need to have the same
%    set of variables. Outputs META and DATA have that format, too.
%    META is a struct combining the information in elements of META_LIST.
%    It has the following fields:
%      VARIABLES: string cell array with the names of the variables present
%        in the returned data array (in the same column order), built by
%        merging the VARIABLES field of all elements in META_LIST into a
%        sorted list without duplicates.
%      SOURCES: string cell array built by concatenating the SOURCES field
%        of all elements in META_LIST.
%    DATA is a numeric array combining the rows of arrays in DATA_LIST,
%    reordering the variable columns if needed, and sorting the resulting rows
%    according to the timestamp in the variable named by string TIMESTAMP.
%
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPTIONS) and
%    [META, DATA] = SXCAT(META_LIST, DATA_LIST, TIMESTAMP, OPT1, VAL1, ...)
%    accept the following options, given in key-value pairs OPT1, VAL1...
%    or in a struct OPTIONS with field names as option keys and field values
%    as option values:
%      FORMAT: data output format.
%        String setting the format of the output DATA. Valid values are:
%          'array': DATA is a matrix with variable readings in the column order
%            specified by the VARIABLES metadata field.
%          'struct': DATA is a struct with variable names as field names
%            and column vectors of variable readings as field values.
%        Default value: 'array'
%      VARIABLES: variable filtering list.
%        String cell array with the names of the variables of interest.
%        If given, only variables present in both the input data sets and this
%        list will be present in output. The string 'all' may also be given,
%        in which case variable filtering is not performed and all variables
%        in the input data sets will be present in output.
%        Default value: 'all' (do not perform variable filtering).
%      PERIOD: time filtering boundaries.
%        Two-element numeric array with the start and the end of the period
%        of interest (seconds since 1970-01-01 00:00:00.00 UTC). If given,
%        only row readings with timestamps within this period (boundaries
%        included) will be present in output. The string 'all' may also be
%        given, in which case time filtering is not performed and all row
%        readings in the input data sets will be present in output.
%        Default value: 'all' (do not perform time filtering).
%
%  Notes:
%    This function should be used to combine data from several glider files,
%    or from several payload files, but not from both glider and payload files
%    (use SXMERGE instead).
%
%    If data rows with the same timestamp are present in several data sets,
%    the function checks that data in those row readings is consistent.
%    If the same variable is present in row readings from different data sets
%    with the same timestamp and different valid values (not NaN), an error is
%    thrown. Otherwise the values are merged into a single data row.
%    However, note that in the unusual case of data rows with the same
%    timestamp in the same data set, they are not merged and the values
%    in the last one are used.
%
%    All values in timestamp columns should be valid (not NaN).
%
%  Examples:
%    [meta, data] = sxcat(meta_list, data_list, timestamp)
%
%  See also:
%    SX2MAT
%    SXMERGE
%
%  Authors:
%    Frederic Cyr <Frederic.Cyr@mio.osupytheas.fr>
%    Joan Pau Beltran <joanpau.beltran@socib.cat>

%  Copyright (C) 2016
%  ICTS SOCIB - Servei d'observacio i prediccio costaner de les Illes Balears
%  <http://www.socib.es>
%
%  This program is free software: you can redistribute it and/or modify
%  it under the terms of the GNU General Public License as published by
%  the Free Software Foundation, either version 3 of the License, or
%  (at your option) any later version.
%
%  This program is distributed in the hope that it will be useful,
%  but WITHOUT ANY WARRANTY; without even the implied warranty of
%  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
%  GNU General Public License for more details.
%
%  You should have received a copy of the GNU General Public License
%  along with this program. If not, see <http://www.gnu.org/licenses/>.

  error(nargchk(3, 9, nargin, 'struct'));


  %% Set options and default values.
  options.format = 'array';
  options.variables = 'all';
  options.period = 'all';


  %% Parse optional arguments.
  % Get option key-value pairs in any accepted call signature.
  argopts = varargin;
  if isscalar(argopts) && isstruct(argopts{1})
    % Options passed as a single option struct argument:
    % field names are option keys and field values are option values.
    opt_key_list = fieldnames(argopts{1});
    opt_val_list = struct2cell(argopts{1});
  elseif mod(numel(argopts), 2) == 0
    % Options passed as key-value argument pairs.
    opt_key_list = argopts(1:2:end);
    opt_val_list = argopts(2:2:end);
  else
    error('glider_toolbox:sxcat:InvalidOptions', ...
          'Invalid optional arguments (neither key-value pairs nor struct).');
  end
  % Overwrite default options with values given in extra arguments.
  for opt_idx = 1:numel(opt_key_list)
    opt = lower(opt_key_list{opt_idx});
    val = opt_val_list{opt_idx};
    if isfield(options, opt)
      options.(opt) = val;
    else
      error('glider_toolbox:sxcat:InvalidOption', ...
            'Invalid option: %s.', opt);
    end
  end


  %% Set option flags and values.
  output_format = lower(options.format);
  variable_filtering = true;
  variable_list = cellstr(options.variables);
  time_filtering = true;
  time_range = options.period;
  if ischar(options.variables) && strcmp(options.variables, 'all')
    variable_filtering = false;
  end
  if ischar(options.period) && strcmp(options.period, 'all')
    time_filtering = false;
  end


  %% Cat data and metadata checking for trivial empty input.
  % Check for trivial empty input.
  if isempty(meta_list)
    sources_cat = cell(0, 1);
    variables_cat_list = cell(0, 1);
  else
    meta_struct = [meta_list{:}];
    sources_cat = vertcat(meta_struct.sources);
    variables_cat_list = {meta_struct.variables}';
  end

  % Build list of sources and variables for concatenated data and metadata.
  [~, ~, variables_cat_indices_to] = unique(vertcat(variables_cat_list{:}));
  variables_cat = cell(0, 1);
  variables_cat(variables_cat_indices_to) = vertcat(variables_cat_list{:});

  % Build list of unique timestamps and the output index of each data row.
  stamp_cat_list = cellfun(@(d, m) d(:, strcmp(timestamp, m.variables)), ...
                           data_list(:), meta_list(:), 'UniformOutput', false);
  [~, ~, stamp_cat_indices_to] = unique(vertcat(stamp_cat_list{:}));
  stamp_cat = zeros(0, 1);
  stamp_cat(stamp_cat_indices_to) = vertcat(stamp_cat_list{:});

  % Build list of indices of input data entries in concatenated data output.
  total_rows = numel(stamp_cat);
  row_num_list = cellfun(@numel, stamp_cat_list(:));
  row_end_list = cumsum(row_num_list);
  row_start_list = 1 + [0; row_end_list(1:end-1)];
  total_cols = numel(variables_cat);
  col_num_list = cellfun(@numel, variables_cat_list(:));
  col_end_list = cumsum(col_num_list);
  col_start_list = 1 + [0; col_end_list(1:end-1)];

  % Set output concatenated data checking for consistency of overlapped data.
  data = nan(total_rows, total_cols);
  for data_idx = 1:numel(data_list)
    row_range = row_start_list(data_idx):row_end_list(data_idx);
    row_indices = stamp_cat_indices_to(row_range);
    col_range = col_start_list(data_idx):col_end_list(data_idx);
    col_indices = variables_cat_indices_to(col_range);
    data_old = data(row_indices, col_indices);
    data_new = data_list{data_idx};
    data_old_valid = ~isnan(data_old);
    data_new_valid = ~isnan(data_new);
    data_inconsistent = ...
      (data_old ~= data_new) & data_old_valid & data_new_valid;
    if any(data_inconsistent(:))
      [row_inconsistent, col_inconsistent] = find(data_inconsistent);
      err_msg_arg_list = cell(4, numel(row_inconsistent));
      err_msg_arg_list(1, :) = variables_cat(col_indices(col_inconsistent));
      err_msg_arg_list(2, :) = cellstr( ...
        datestr(posixtime2utc(stamp_cat(row_indices(row_inconsistent))), ...
                'dd/mm/yyyy HH:MM:SS.FFF'));
      err_msg_arg_list(3, :) = num2cell(data_old(data_inconsistent));
      err_msg_arg_list(4, :) = num2cell(data_new(data_inconsistent));
      err_msg_fmt = '\nInconsistent value of %s at %s: %12f %12f';
      error('glider_toolbox:sxcat:InconsistentData', ...
            'Inconsistent data:%s', sprintf(err_msg_fmt, err_msg_arg_list{:}));
    end
    data_old(data_new_valid) = data_new(data_new_valid);
    data(row_indices, col_indices) = data_old;
  end

  % Set metadata fields.
  meta.sources = sources_cat;
  meta.variables = variables_cat;


  %% Perform time filtering if needed.
  if time_filtering
    stamp_select = ~(stamp_cat < time_range(1) | stamp_cat > time_range(2));
    data = data(stamp_select, :);
  end


  %% Perform variable filtering if needed.
  if variable_filtering
    [variable_select, ~] = ismember(meta.variables, variable_list);
    meta.variables = meta.variables(variable_select);
    data = data(:, variable_select);
  end


  %% Convert output data to struct format if needed.
  switch output_format
    case 'array'
    case 'struct'
      data = cell2struct(num2cell(data, 1), meta.variables, 2);
    otherwise
      error('glider_toolbox:sxcat:InvalidFormat', ...
            'Invalid output format: %s.', output_format)
  end

end
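The option handling and the post-processing at the end of the listing can also be exercised on their own. The MATLAB sketch below is an illustration under stated assumptions. The already merged `variables`, `data` and `stamps` and the argument list `argopts` are hypothetical stand-ins for SXCAT's internal state and its VARARGIN.

The sketch follows the same sequence as the listing:
- Option keys are lower-cased and checked against the defaults.
- PERIOD keeps rows whose timestamps fall within the boundaries (both included).
- VARIABLES keeps the listed columns in their existing sorted order.
- FORMAT 'struct' turns each column into a struct field with `cell2struct(num2cell(data, 1), variables, 2)`.

```matlab
% Hypothetical merged result: sorted variable names, data and timestamps.
variables = {'cond'; 'temp'; 'time'};
data = [NaN 1.5 10; 0.30 NaN 20; 0.40 NaN 30];
stamps = data(:, 3);

% Hypothetical optional arguments given as key-value pairs.
argopts = {'PERIOD', [15 30], 'variables', {'temp', 'time'}, 'format', 'struct'};

% Defaults, overwritten by the given options (keys are case-insensitive).
options = struct('format', 'array', 'variables', 'all', 'period', 'all');
if isscalar(argopts) && isstruct(argopts{1})
  opt_keys = fieldnames(argopts{1});
  opt_vals = struct2cell(argopts{1});
elseif mod(numel(argopts), 2) == 0
  opt_keys = argopts(1:2:end);
  opt_vals = argopts(2:2:end);
else
  error('sketch:InvalidOptions', 'Neither key-value pairs nor struct.');
end
for k = 1:numel(opt_keys)
  key = lower(opt_keys{k});
  if ~isfield(options, key)
    error('sketch:InvalidOption', 'Invalid option: %s.', key);
  end
  options.(key) = opt_vals{k};
end

% PERIOD: keep rows with start <= timestamp <= end.
if ~(ischar(options.period) && strcmp(options.period, 'all'))
  keep_rows = ~(stamps < options.period(1) | stamps > options.period(2));
  data = data(keep_rows, :);
end

% VARIABLES: keep only listed variables, preserving the column order.
if ~(ischar(options.variables) && strcmp(options.variables, 'all'))
  keep_cols = ismember(variables, cellstr(options.variables));
  variables = variables(keep_cols);
  data = data(:, keep_cols);
end

% FORMAT 'struct': one field per variable holding its column of readings.
if strcmpi(options.format, 'struct')
  data = cell2struct(num2cell(data, 1), variables, 2);
end
disp(data);
```

With these inputs, only the rows stamped 20 and 30 and the columns `temp` and `time` survive. As a result, `data.time` is `[20; 30]` and `data.temp` contains two NaN values. Passing a single struct such as `struct('period', [15 30])` would take the `fieldnames`/`struct2cell` branch instead of the key-value branch.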