CIC data processing tutorial

Use the cic.processor.make_cic_config_template() function to create a configuration file template, then fill in the necessary information. The configuration file is used when processing the data files.

from cic.processor import make_cic_config_template
make_cic_config_template("/home/user/viikki.yml")

Running the commands above creates a configuration file template at /home/user/viikki.yml. After you fill in the information, the configuration file may look like this:

measurement_location: Viikki, Helsinki, Finland
id: viikki
description: Agricultural site
instrument_model: CIC-1-1
longitude: 25.02
latitude: 60.23
data_folder:
- /home/user/data/2021
- /home/user/data/2022
processed_folder: /home/user/viikki
database_file: /home/user/viikki.json
start_date: 2022-09-28
end_date: 2022-09-30
inlet_length: 1.0
do_inlet_loss_correction: true
convert_to_standard_conditions: true
allow_reprocess: false
redo_database: false
file_format: block
resolution: 10min
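
Before processing, it can be useful to sanity-check the filled-in values. The following is a minimal sketch, assuming the configuration has already been parsed into a Python dict (for example with a YAML parser). The check_config helper is hypothetical and is not part of the cic package; the checks it performs are illustrative only.

```python
from datetime import date


def check_config(config):
    """Return a list of problems found in a parsed configuration dict.

    Hypothetical helper for illustration; not part of the cic package.
    """
    problems = []

    # Geographic coordinates must lie in their valid ranges
    if not -180.0 <= config["longitude"] <= 180.0:
        problems.append("longitude out of range")
    if not -90.0 <= config["latitude"] <= 90.0:
        problems.append("latitude out of range")

    # str() lets this accept both date objects and ISO date strings
    start = date.fromisoformat(str(config["start_date"]))
    end = date.fromisoformat(str(config["end_date"]))
    if start > end:
        problems.append("start_date is after end_date")

    # At least one raw data folder is needed
    if not config["data_folder"]:
        problems.append("data_folder is empty")

    return problems


# A subset of the example configuration shown above
example_config = {
    "longitude": 25.02,
    "latitude": 60.23,
    "data_folder": ["/home/user/data/2021", "/home/user/data/2022"],
    "start_date": "2022-09-28",
    "end_date": "2022-09-30",
}

print(check_config(example_config))
```

An empty list means that none of these checks failed.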

Then process the data files by running cic_processor():

from cic.processor import cic_processor
cic_processor("/home/user/viikki.yml")
Added 20220928 to database (Viikki, Helsinki, Finland) ...
Added 20220928 to database (Viikki, Helsinki, Finland) ...
Added 20220928 to database (Viikki, Helsinki, Finland) ...
Processing 20220928 (Viikki, Helsinki, Finland) ...
Processing 20220929 (Viikki, Helsinki, Finland) ...
Processing 20220930 (Viikki, Helsinki, Finland) ...
Done!

The code produces daily processed data files named CIC_yyyymmdd.nc (netCDF format). These files are saved in the processed_folder given in the configuration file.
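
To make the naming scheme concrete, the sketch below lists the file paths one would expect for the example configuration. It assumes that end_date is inclusive (as the processing log above suggests) and that the files are written directly into the processed folder. The expected_daily_files helper is hypothetical and is not part of the cic package.

```python
from datetime import date, timedelta
from pathlib import Path


def expected_daily_files(processed_folder, start_date, end_date):
    """List the CIC_yyyymmdd.nc paths for every day from start_date to end_date.

    Hypothetical helper for illustration; end_date is treated as inclusive.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    n_days = (end - start).days + 1
    return [
        Path(processed_folder) / f"CIC_{(start + timedelta(days=i)):%Y%m%d}.nc"
        for i in range(n_days)
    ]


files = expected_daily_files("/home/user/viikki", "2022-09-28", "2022-09-30")
for f in files:
    print(f)
```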

The locations of the raw and processed files for each day are written to the JSON-formatted database_file. This database keeps track of the files and prevents reprocessing in a continuous measurement setting.

  • If allow_reprocess: false, only files newer than the newest file in the database are processed.

  • If allow_reprocess: true, the processor attempts to process any unprocessed files in the time range.

  • If you want everything to be reprocessed, use redo_database: true; otherwise keep redo_database: false.
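
The three rules above can be sketched as a small selection function. This is only an illustration of the described behaviour, not the processor's actual implementation: the layout of the database (a dict mapping each day to its raw and processed file paths) and the select_days helper are assumptions, and the database is assumed to contain only days inside the configured time range.

```python
def select_days(database, allow_reprocess=False, redo_database=False):
    """Return the days (yyyymmdd strings) that would be processed.

    database is a hypothetical dict: day -> {"raw": path, "processed": path or None}.
    """
    # yyyymmdd strings sort chronologically
    days = sorted(database)

    # redo_database: true -> everything is processed again
    if redo_database:
        return days

    # allow_reprocess: true -> retry every day that has no processed file
    if allow_reprocess:
        return [d for d in days if database[d]["processed"] is None]

    # allow_reprocess: false -> only days newer than the newest processed day
    processed = [d for d in days if database[d]["processed"] is not None]
    if not processed:
        return days
    newest = processed[-1]
    return [d for d in days if d > newest]


example_db = {
    "20220928": {"raw": "raw_0928.dat", "processed": "CIC_20220928.nc"},
    "20220929": {"raw": "raw_0929.dat", "processed": None},  # failed earlier
    "20220930": {"raw": "raw_0930.dat", "processed": "CIC_20220930.nc"},
    "20221001": {"raw": "raw_1001.dat", "processed": None},  # new day
}

print(select_days(example_db))
print(select_days(example_db, allow_reprocess=True))
print(select_days(example_db, redo_database=True))
```

In this sketch, the day that failed earlier (20220929) is picked up again only when allow_reprocess is true, which is the practical difference between the first two settings.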

The netCDF files have the following structure:

Fields            Dimensions  Data type       Units  Comments
----------------  ----------  --------------  -----  ----------------------------------------------
Coordinates
time              time        datetime64[ns]         timezone: utc
flag              flag        string
Data variables
neg_conc_1        time        float           cm-3   Negative ion number concentration in channel 1
neg_conc_2        time        float           cm-3   Negative ion number concentration in channel 2
neg_conc_3        time        float           cm-3   Negative ion number concentration in channel 3
pos_conc_1        time        float           cm-3   Positive ion number concentration in channel 1
pos_conc_2        time        float           cm-3   Positive ion number concentration in channel 2
pos_conc_3        time        float           cm-3   Positive ion number concentration in channel 3
neg_temperature   time        float           K
pos_temperature   time        float           K
neg_pressure      time        float           Pa
pos_pressure      time        float           Pa
neg_sampleflow    time        float           lpm
pos_sampleflow    time        float           lpm
neg_ion_flags     time,flag   int                    flag=1, no flag=0
pos_ion_flags     time,flag   int                    flag=1, no flag=0
Attributes
Measurement info              dictionary

Below is an example of how to access data in a netCDF file.

import xarray as xr
import pandas as pd

# load the dataset
ds = xr.open_dataset("/home/user/viikki/CIC_20220928.nc")

# Get negative ion number concentration in channel 1
neg_conc_1 = ds.neg_conc_1.to_pandas()

# Close the file
ds.close()

We can combine the previously created files into a single continuous dataset and save the result as a netCDF file.

from cic.utils import combine_data
from pathlib import Path

data_source = Path("/home/user/viikki")
# Collect the daily netCDF files in chronological order
data_files = sorted(data_source.glob("*.nc"))
date_start = "2022-09-28"
date_end = "2022-09-30"

# Combine the data into a single dataset with 30 min time 
# resolution and flag a data line only if 50% or more 
# of the data inside the 30 min time window contain the flag.
ds = combine_data(data_files, date_start, date_end, "30min",
    flag_sensitivity=0.5)

ds.to_netcdf("combined_cic_dataset.nc")
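
The flagging rule described in the comment above can be illustrated with a few lines of plain Python. This is a sketch of the rule as stated (a window is flagged when the flagged fraction is at least flag_sensitivity), not the actual code inside combine_data; the window_flag helper is hypothetical.

```python
def window_flag(flags, flag_sensitivity=0.5):
    """Return 1 if the fraction of flagged points in a window reaches the threshold.

    flags holds the 0/1 flag values of the data points that fall inside
    one averaging window. Hypothetical helper for illustration only.
    """
    if not flags:
        return 0
    fraction = sum(flags) / len(flags)
    return 1 if fraction >= flag_sensitivity else 0


# Three 10 min points per 30 min window
print(window_flag([1, 1, 0]))  # 2/3 of the points are flagged
print(window_flag([1, 0, 0]))  # 1/3 of the points are flagged
```

With flag_sensitivity=0.5, the first window is flagged and the second is not. A lower value makes the combined dataset flag more readily, while a value of 1.0 flags a window only if every point in it carries the flag.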