# `nais-processor` tutorial

Use the `nais.processor.make_config_template()` function to create a configuration file template, then fill in the necessary information. The configuration file is used when processing the data files.

```python
from nais.processor import make_config_template

make_config_template("/home/user/viikki.yml")
```

Running the above commands creates a configuration file template at `/home/user/viikki.yml`. After filling in the information, the configuration file may look like this:

```yaml
measurement_location: Viikki, Helsinki, Finland
id: viikki
description: Agricultural site
instrument_model: NAIS-5-27
longitude: 25.02
latitude: 60.23
data_folder:
- /home/user/data/2021
- /home/user/data/2022
processed_folder: /home/user/viikki
database_file: /home/user/viikki.json
start_date: 2022-09-28
end_date: 2022-09-30
inlet_length: 1.0
do_inlet_loss_correction: true
convert_to_standard_conditions: true
do_wagner_ion_mode_correction: true
remove_corona_ions: true
allow_reprocess: false
redo_database: false
fill_temperature: null
fill_pressure: null
fill_flowrate: null
dilution_on: false
file_format: block
resolution: 5min
```

The following corrections are available:

* Inlet loss correction (Gormley and Kennedy, 1948)
* Ion mode correction (Wagner et al. 2016)
* Conversion to standard conditions (293.15 K, 101325 Pa)
* Removal of the charger ion band from total particle data
* Use of fill values when environmental sensor data are missing

Then process the data files by running `nais_processor()`:

```python
from nais.processor import nais_processor

nais_processor("/home/user/viikki.yml")
```

```
Added 20220928 to database (Viikki, Helsinki, Finland) ...
Added 20220928 to database (Viikki, Helsinki, Finland) ...
Added 20220928 to database (Viikki, Helsinki, Finland) ...
Processing 20220928 (Viikki, Helsinki, Finland) ...
Processing 20220929 (Viikki, Helsinki, Finland) ...
Processing 20220930 (Viikki, Helsinki, Finland) ...
Done!
```

The code produces daily processed data files named `NAIS_yyyymmdd.nc` (netCDF format). These files are saved in the destination given in the configuration file.

The locations of the raw and processed files for each day are written to the JSON-formatted `database_file`. This database keeps track of the files and prevents reprocessing in a continuous measurement setting.

* If `allow_reprocess: false`, only files newer than the newest file in the database are processed.
* If `allow_reprocess: true`, the processor attempts to process any unprocessed files in the time range.
* To reprocess everything, set `redo_database: true`; otherwise keep `redo_database: false`.

The netCDF files have the following structure:

| Fields             | Dimensions    | Data type      | Units | Comments          |
|--------------------|---------------|----------------|-------|-------------------|
| **Coordinates**    |               |                |       |                   |
| time               | time          | datetime64[ns] |       | timezone: utc     |
| diameter           | diameter      | float          | m     | particle diameter |
| flag               | flag          | string         |       |                   |
| **Data variables** |               |                |       |                   |
| neg_ions           | time,diameter | float          | cm-3  | dN/dlogDp         |
| pos_ions           | time,diameter | float          | cm-3  | dN/dlogDp         |
| neg_particles      | time,diameter | float          | cm-3  | dN/dlogDp         |
| pos_particles      | time,diameter | float          | cm-3  | dN/dlogDp         |
| neg_ion_flags      | time,flag     | int            |       | flag=1, no flag=0 |
| pos_ion_flags      | time,flag     | int            |       | flag=1, no flag=0 |
| neg_particle_flags | time,flag     | int            |       | flag=1, no flag=0 |
| pos_particle_flags | time,flag     | int            |       | flag=1, no flag=0 |
| temperature        | time          | float          | K     |                   |
| pressure           | time          | float          | Pa    |                   |
| relhum             | time          | float          | %     |                   |
| sample_flow        | time          | float          | lpm   |                   |
| **Attributes**     |               |                |       |                   |
| Measurement info   |               | dictionary     |       |                   |

Below are some examples of how to access the different variables in the netCDF file.
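If no processed file is at hand, the layout described in the table can be mimicked with a small synthetic dataset. The sketch below is an illustration only, not output of `nais-processor`: the values, the flag names, and the variable names `ds_demo`, `df_demo`, `flagged`, and `clean` are made up, and it relies only on standard `numpy`, `pandas`, and `xarray` calls. It shows how a two-dimensional variable turns into a time-by-diameter table and how a flag variable might be used to mask flagged rows.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a processed file: 3 time steps, 4 size bins, 2 flags.
time = pd.date_range("2022-09-28", periods=3, freq="5min")
diameter = np.array([1e-9, 2e-9, 4e-9, 8e-9])  # particle diameter in metres
flag = ["example_flag_a", "example_flag_b"]    # hypothetical flag names

ds_demo = xr.Dataset(
    data_vars={
        # dN/dlogDp values (made up), dimensions (time, diameter)
        "neg_ions": (("time", "diameter"), np.ones((3, 4))),
        # 1 = flag raised, 0 = no flag, dimensions (time, flag)
        "neg_ion_flags": (("time", "flag"), np.array([[0, 0], [1, 0], [0, 0]])),
        "temperature": (("time",), np.full(3, 293.15)),
    },
    coords={"time": time, "diameter": diameter, "flag": flag},
)

# A 2-D variable becomes a DataFrame with time as index and diameter as columns.
df_demo = ds_demo.neg_ions.to_pandas()

# True for every time step where at least one flag is raised.
flagged = ds_demo.neg_ion_flags.any(dim="flag")

# Replace the size distribution with NaN at the flagged time steps.
clean = ds_demo.neg_ions.where(~flagged)
```

The same `to_pandas()` and `where()` calls should apply to a dataset opened from a real `NAIS_yyyymmdd.nc` file, provided it follows the structure in the table.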
```python
import xarray as xr
import pandas as pd

# Load the dataset
ds = xr.open_dataset("/home/user/viikki/NAIS_20220928.nc")

# Get negative ion number size distribution
df_neg_ions = ds.neg_ions.to_pandas()

# Get total particle number size distribution (positive polarity)
df_pos_particles = ds.pos_particles.to_pandas()

# Get temperature
df_temperature = ds.temperature.to_pandas()

# Close the file
ds.close()
```

The next example shows how to calculate the number concentration in a given size range:

```python
import aerosol.functions as af

# Size range limits in metres (2.5 nm to 5 nm)
dp_1 = 2.5e-9
dp_2 = 5e-9

conc = af.calc_conc_interp(df_pos_particles, dp_1, dp_2)
```

Next we combine the previously created files into a single continuous dataset with a 1 hour time resolution. A flag is raised only if at least 50% of the data points inside each one-hour window contain the flag. We save the result as a netCDF file.

```python
from nais.utils import combine_data
import pandas as pd
import xarray as xr
import os

data_source = "/home/user/viikki"
data_files = [os.path.join(data_source, f) for f in os.listdir(data_source) if ".nc" in f]

date_start = "2022-09-28"
date_end = "2022-09-30"

# "1h" sets the time resolution; flag_sensitivity=0.5 is the 50% threshold
ds = combine_data(data_source, date_start, date_end, "1h", flag_sensitivity=0.5)

ds.to_netcdf("combined_nais_dataset.nc")
```

Then we launch the data checker with the combined data file in order to identify bad data.

1. Bounding boxes can be drawn around bad data in the size distributions. Double left click to start an adjustable box; to remove a box, right click on it and choose remove.
2. Clicking the save boundaries button saves the box coordinates to a netCDF file (the filename given in the second argument).
3. Saved bounding boxes are reloaded when the checker is reopened with the same arguments, so save your work regularly in case the program crashes.
```python
from nais.checker import startNaisChecker

startNaisChecker("combined_nais_dataset.nc", "bad_data_bounds.nc")
```

We can set the bad data to `NaN` in our combined file and use the resulting dataset as the starting point for further analysis.

```python
import xarray as xr
from nais.utils import remove_bad_data

# Combined dataset and the bounding boxes saved from the checker
ds = xr.open_dataset("combined_nais_dataset.nc")
bad_data = xr.open_dataset("bad_data_bounds.nc")

ds = remove_bad_data(ds, bad_data)
```

__References__

_Gormley P. G. and Kennedy M., Diffusion from a Stream Flowing through a Cylindrical Tube, Proceedings of the Royal Irish Academy. Section A: Mathematical and Physical Sciences, 52, (1948-1950), pp. 163-169._

_Wagner R., Manninen H.E., Franchin A., Lehtipalo K., Mirme S., Steiner G., Petäjä T. and Kulmala M., On the accuracy of ion measurements using a Neutral cluster and Air Ion Spectrometer, Boreal Environment Research, 21, (2016), pp. 230–241._