CIC data processing tutorial
Use the cic.processor.make_cic_config_template() function to create a configuration file template, then fill in the necessary information. The configuration file is used when processing the data files.
from cic.processor import make_cic_config_template
make_cic_config_template("/home/user/viikki.yml")
Running the above commands creates a configuration file template at /home/user/viikki.yml. After the information is filled in, the configuration file may look like this:
measurement_location: Viikki, Helsinki, Finland
id: viikki
description: Agricultural site
instrument_model: CIC-1-1
longitude: 25.02
latitude: 60.23
data_folder:
- /home/user/data/2021
- /home/user/data/2022
processed_folder: /home/user/viikki
database_file: /home/user/viikki.json
start_date: 2022-09-28
end_date: 2022-09-30
inlet_length: 1.0
do_inlet_loss_correction: true
convert_to_standard_conditions: true
allow_reprocess: false
redo_database: false
file_format: block
resolution: 10min
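Before running the processor, it can help to sanity-check the filled-in values. The sketch below is not part of the cic package: `check_cic_config` is a hypothetical helper, and it assumes the configuration has already been loaded into a plain dictionary (for example with a YAML parser). The key names are those of the example above; the individual checks are assumptions about what a sensible configuration looks like.

```python
from datetime import date

# Keys taken from the example configuration above.
REQUIRED_KEYS = [
    "measurement_location", "id", "description", "instrument_model",
    "longitude", "latitude", "data_folder", "processed_folder",
    "database_file", "start_date", "end_date", "inlet_length",
    "do_inlet_loss_correction", "convert_to_standard_conditions",
    "allow_reprocess", "redo_database", "file_format", "resolution",
]

BOOLEAN_KEYS = [
    "do_inlet_loss_correction", "convert_to_standard_conditions",
    "allow_reprocess", "redo_database",
]


def check_cic_config(config):
    """Return a list of problems found in a config dictionary (hypothetical helper)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if problems:
        return problems

    if not -180.0 <= config["longitude"] <= 180.0:
        problems.append("longitude must be within [-180, 180]")
    if not -90.0 <= config["latitude"] <= 90.0:
        problems.append("latitude must be within [-90, 90]")

    # Dates may arrive as strings or as date objects, depending on the parser.
    start = date.fromisoformat(str(config["start_date"]))
    end = date.fromisoformat(str(config["end_date"]))
    if start > end:
        problems.append("start_date is after end_date")

    for key in BOOLEAN_KEYS:
        if not isinstance(config[key], bool):
            problems.append(f"{key} must be true or false")
    return problems


# The example configuration from above, written as a dictionary.
example_config = {
    "measurement_location": "Viikki, Helsinki, Finland",
    "id": "viikki",
    "description": "Agricultural site",
    "instrument_model": "CIC-1-1",
    "longitude": 25.02,
    "latitude": 60.23,
    "data_folder": ["/home/user/data/2021", "/home/user/data/2022"],
    "processed_folder": "/home/user/viikki",
    "database_file": "/home/user/viikki.json",
    "start_date": "2022-09-28",
    "end_date": "2022-09-30",
    "inlet_length": 1.0,
    "do_inlet_loss_correction": True,
    "convert_to_standard_conditions": True,
    "allow_reprocess": False,
    "redo_database": False,
    "file_format": "block",
    "resolution": "10min",
}

for problem in check_cic_config(example_config):
    print(problem)
```

An empty result means that none of these assumed checks failed; it does not guarantee that the processor will accept the file.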
Then process the data files by running cic_processor():
from cic.processor import cic_processor
cic_processor("/home/user/viikki.yml")
Added 20220928 to database (Viikki, Helsinki, Finland) ...
Added 20220928 to database (Viikki, Helsinki, Finland) ...
Added 20220928 to database (Viikki, Helsinki, Finland) ...
Processing 20220928 (Viikki, Helsinki, Finland) ...
Processing 20220929 (Viikki, Helsinki, Finland) ...
Processing 20220930 (Viikki, Helsinki, Finland) ...
Done!
The processing produces one netCDF file per day, named CIC_yyyymmdd.nc. These files are saved in the processed_folder given in the configuration file.
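Because the output files follow the CIC_yyyymmdd.nc pattern, the expected file paths for a date range can be listed in advance. The helper below, `expected_output_files`, is a hypothetical illustration and not part of the cic package. It assumes that the files sit directly inside the processed folder.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def expected_output_files(processed_folder, start_date, end_date):
    """List CIC_yyyymmdd.nc paths for each day from start_date to end_date, inclusive."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    folder = PurePosixPath(processed_folder)
    n_days = (end - start).days + 1
    return [
        folder / f"CIC_{(start + timedelta(days=i)):%Y%m%d}.nc"
        for i in range(n_days)
    ]


files = expected_output_files("/home/user/viikki", "2022-09-28", "2022-09-30")
for f in files:
    print(f)
```

Comparing this list against the files actually present is one way to spot days for which no processed file was written.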
The locations of raw and processed files for each day are written in the JSON formatted database_file. This database keeps track of the files and prevents reprocessing in a continuous measurement setting.
- If allow_reprocess: false, only files newer than the newest file in the database are processed.
- If allow_reprocess: true, the processor attempts to process any unprocessed files in the time range.
- If you want everything to be reprocessed, use redo_database: true; otherwise keep redo_database: false.
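The rules above can be sketched as a small selection function. This only illustrates the described behaviour and is not the actual cic implementation: `select_days_to_process` is a hypothetical name, and modelling the database as a collection of yyyymmdd strings for days that already have a processed file is an assumption.

```python
def select_days_to_process(available_days, processed_days,
                           allow_reprocess=False, redo_database=False):
    """Pick which days to process, following the rules described above.

    available_days: yyyymmdd strings for raw data found in the time range
    processed_days: yyyymmdd strings already recorded in the database
    """
    available = sorted(set(available_days))
    processed = set(processed_days)

    if redo_database:
        # Start from scratch: every day in the range is processed again.
        return available
    if allow_reprocess:
        # Retry any day in the range that has not been processed yet.
        return [day for day in available if day not in processed]
    if not processed:
        # Empty database: nothing is newer or older, so take everything.
        return available
    # Default: only days newer than the newest day in the database.
    # yyyymmdd strings sort chronologically, so string comparison is enough.
    newest = max(processed)
    return [day for day in available if day > newest]


available = ["20220928", "20220929", "20220930"]
processed = ["20220928", "20220930"]

print(select_days_to_process(available, processed))
print(select_days_to_process(available, processed, allow_reprocess=True))
print(select_days_to_process(available, processed, redo_database=True))
```

In this example the gap on 20220929 is only picked up with allow_reprocess=True, because the default mode looks only past the newest entry.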
The netCDF files have the following structure:

| Fields | Dimensions | Data type | Units | Comments |
|---|---|---|---|---|
| Coordinates | | | | |
| time | time | datetime64[ns] | timezone: utc | |
| flag | flag | string | | |
| Data variables | | | | |
| neg_conc_1 | time | float | cm-3 | Negative ion number concentration in channel 1 |
| neg_conc_2 | time | float | cm-3 | Negative ion number concentration in channel 2 |
| neg_conc_3 | time | float | cm-3 | Negative ion number concentration in channel 3 |
| pos_conc_1 | time | float | cm-3 | Positive ion number concentration in channel 1 |
| pos_conc_2 | time | float | cm-3 | Positive ion number concentration in channel 2 |
| pos_conc_3 | time | float | cm-3 | Positive ion number concentration in channel 3 |
| neg_temperature | time | float | K | |
| pos_temperature | time | float | K | |
| neg_pressure | time | float | Pa | |
| pos_pressure | time | float | Pa | |
| neg_sampleflow | time | float | lpm | |
| pos_sampleflow | time | float | lpm | |
| neg_ion_flags | time,flag | int | flag=1, no flag=0 | |
| pos_ion_flags | time,flag | int | flag=1, no flag=0 | |
| Attributes | | | | |
| Measurement info | | dictionary | | |
Below is an example of how to access data in a netCDF file.
import xarray as xr
import pandas as pd
# load the dataset
ds = xr.open_dataset("/home/user/viikki/CIC_20220928.nc")
# Get negative ion number concentration in channel 1
neg_conc_1 = ds.neg_conc_1.to_pandas()
# Close the file
ds.close()
We can combine the previously created files into a single continuous dataset and save the result as a netCDF file.
from cic.utils import combine_data
import pandas as pd
import xarray as xr
from pathlib import Path
data_source = Path("/home/user/viikki")
# Collect the daily files (CIC_yyyymmdd.nc) in chronological order
data_files = sorted(data_source.glob("CIC_*.nc"))
date_start = "2022-09-28"
date_end = "2022-09-30"
# Combine the data into a single dataset with 30 min time
# resolution and flag a data line only if 50% or more
# of the data inside the 30 min time window contain the flag.
ds = combine_data(data_files, date_start, date_end, "30min",
                  flag_sensitivity=0.5)
ds.to_netcdf("combined_cic_dataset.nc")
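The flag_sensitivity rule described in the comment above can be illustrated without the library. The sketch below follows the rule as worded (a window is flagged when the flagged fraction is at least the threshold). `window_is_flagged` is a hypothetical helper and may differ from what combine_data does internally, for example in how missing data are handled.

```python
def window_is_flagged(flags, flag_sensitivity=0.5):
    """Return 1 if at least `flag_sensitivity` of the 0/1 flags in a window are set."""
    if not flags:
        return 0  # assumption: an empty window carries no flag
    fraction = sum(flags) / len(flags)
    return int(fraction >= flag_sensitivity)


# Three 10-minute records inside one 30-minute window.
one_of_three = window_is_flagged([1, 0, 0])   # 1/3 of the records are flagged
two_of_three = window_is_flagged([1, 1, 0])   # 2/3 of the records are flagged
print(one_of_three, two_of_three)
```

Under this reading, a lower flag_sensitivity makes the combined dataset more conservative, since fewer flagged records are needed to flag the whole window.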
