Quick Start

Assuming you’re already familiar with xarray, it’s easy to dive right in to begin reading bpch data. If you don’t have any GEOS-Chem data handy to test with, I’ve archived a sample dataset here here consisting of 14 days of hourly, ND49 output - good for diagnosing surface air quality statistics.

Download the data and extract it to some directory:

$ wget https://ndownloader.figshare.com/files/8251094
$ tar -xvzf sample_nd49.tar.gz

You should now see 14 bpch files in your directory, and two .dat files.

The whole point of xbpch is to read these data files natively into an xarray.Dataset. You can do this with the xbpch.open_bpchdataset() method:

In [1]: import xbpch

In [2]: fn = "ND49_20060102_ref_e2006_m2010.bpch"

In [3]: ds = xbpch.open_bpchdataset(fn)

If we print the dataset back out, we’ll get a familiar representation:

<xarray.Dataset>
Dimensions:          (lat: 91, lev: 47, lon: 144, nv: 2, time: 24)
Coordinates:
  * lev              (lev) float64 0.9925 0.9775 0.9624 0.9473 0.9322 0.9171 ...
  * lon              (lon) float64 -180.0 -177.5 -175.0 -172.5 -170.0 -167.5 ...
  * lat              (lat) float64 -89.5 -88.0 -86.0 -84.0 -82.0 -80.0 -78.0 ...
  * time             (time) datetime64[ns] 2006-01-01T01:00:00 ...
  * nv               (nv) int64 0 1
Data variables:
    IJ_AVG_S_NO      (time, lon, lat) float32 1.16601e-12 1.1599e-12 ...
    time_bnds        (time, nv) datetime64[ns] 2006-01-01T01:00:00 ...
    IJ_AVG_S_O3      (time, lon, lat) float32 9.25816e-09 9.25042e-09 ...
    IJ_AVG_S_SO4     (time, lon, lat) float32 1.41706e-10 1.4142e-10 ...
    IJ_AVG_S_NH4     (time, lon, lat) float32 1.16908e-11 1.16658e-11 ...
    IJ_AVG_S_NIT     (time, lon, lat) float32 9.99837e-31 9.99897e-31 ...
    IJ_AVG_S_BCPI    (time, lon, lat) float32 2.46206e-12 2.45698e-12 ...
    IJ_AVG_S_OCPI    (time, lon, lat) float32 2.65303e-11 2.6476e-11 ...
    IJ_AVG_S_BCPO    (time, lon, lat) float32 4.19881e-19 4.18213e-19 ...
    IJ_AVG_S_OCPO    (time, lon, lat) float32 2.49109e-22 2.53752e-22 ...
    IJ_AVG_S_DST1    (time, lon, lat) float32 7.11484e-12 7.10209e-12 ...
    IJ_AVG_S_DST2    (time, lon, lat) float32 1.55181e-11 1.54779e-11 ...
    IJ_AVG_S_SALA    (time, lon, lat) float32 3.70387e-11 3.69923e-11 ...
    OD_MAP_S_AOD     (time, lon, lat) float32 0.292372 0.325568 0.358368 ...
    OD_MAP_S_DSTAOD  (time, lon, lat) float32 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
Attributes:
    modelname:    GEOS5_47L
    halfpolar:    1
    res:          (2.5, 2.0)
    center180:    1
    tracerinfo:   tracerinfo.dat
    diaginfo:     diaginfo.dat
    filetitle:    b'GEOS-CHEM DIAG49 instantaneous timeseries'
    source:       ND49_20060101_ref_e2006_m2010.bpch
    filetype:     b'CTM bin 02'
    Conventions:  CF1.6

You can then proceed to process the data using the conventional routines you’d use in any xarray-powered workflow.

In the sample dataset highlighted here, the 14 days of hourly output are split across 14 files - one for each day’s worth of data. xbpch provides a second method, xbpch.open_mfbpchdataset(), for reading in multiple-file datasets like these, and automatically concatenating them on the time record dimension:

In [4]: import xbpch

In [5]: from glob import glob

# List all the bpch files in the current directory
In [6]: fns = glob("ND49_*.bpch")

# Helper function to extract spatial mean O3 from each file
In [7]: def _preprocess(ds):
   ...:     return ds[['IJ_AVG_S_O3', ]].mean(['lon', 'lat'])
   ...: 

In [8]: ds = xbpch.open_mfbpchdataset(
   ...:     fns, preprocess=_preprocess, dask=True, memmap=True
   ...: )
   ...: 

Again, printing yields the expected results:

<xarray.Dataset>
Dimensions:      (time: 336)
Coordinates:
  * time         (time) datetime64[ns] 2006-01-01T01:00:00 ...
Data variables:
    IJ_AVG_S_O3  (time) float32 2.5524e-08 2.55541e-08 2.55588e-08 ...

Finally, if you don’t want to drop into a Python interpreter but just want to quickly convert your binary data to NetCDF, you can run the utility script bpch_to_nc which is shipped with this library:

$ bpch_to_nc /path/to/my/data.bpch /path/to/my/output.nc

Reading in file(s)...

Decoding variables...

Writing to /path/to/my/output.nc ...
syncing
[####################################] | 100% Completed | 52.1s