Reading BPCH Files

xbpch provides three main utilities for reading bpch files, all of which are available as top-level package imports. For most purposes you should use open_bpchdataset(); a lower-level interface, BPCHFile(), is also provided in case you prefer to process the bpch contents manually.

See Usage and Examples for more details.

xbpch.open_bpchdataset(filename, fields=[], categories=[], tracerinfo_file='tracerinfo.dat', diaginfo_file='diaginfo.dat', endian='>', decode_cf=True, memmap=True, dask=True, return_store=False)

Open a GEOS-Chem BPCH file output as an xarray Dataset.

Parameters:

filename : string

Path to the output file to read in.

{tracerinfo,diaginfo}_file : string, optional

Path to the metadata “info” .dat files which are used to decipher the metadata corresponding to each variable in the output dataset. If not provided, xbpch will look for them in the current directory or fall back on a generic set.

fields : list, optional

List of a subset of variable names to return. This can substantially improve read performance. Note that the field here is just the tracer name - not the category, e.g. ‘O3’ instead of ‘IJ-AVG-$_O3’.

categories : list, optional

List of a subset of variable categories to look through. This can substantially improve read performance.

endian : {‘=’, ‘>’, ‘<’}, optional

Endianness of file on disk. By default, “big endian” (“>”) is assumed.

decode_cf : bool

Enforce CF conventions for variable names, units, and other metadata

default_dtype : numpy.dtype, optional

Default datatype for variables encoded in file on disk (single-precision float by default).

memmap : bool

Flag indicating that data should be memory-mapped from disk instead of eagerly loaded into memory

dask : bool

Flag indicating that data reading should be deferred (delayed) to construct a task-graph for later execution

return_store : bool

Also return the underlying DataStore to the user

Returns:

ds : xarray.Dataset

Dataset containing the requested fields (or the entire file), with data contained in proxy containers for access later.

store : xarray.AbstractDataStore

Underlying DataStore which handles the loading and processing of bpch files on disk
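As a minimal sketch of how the parameters above fit together, the helper below opens a single file and restricts the read to one tracer in one category. The helper name open_ozone is hypothetical, and the category name 'IJ-AVG-$' is taken from the example in the fields description; whether a given file contains that tracer depends on the simulation that wrote it.

```python
def open_ozone(filename, tracerinfo_file="tracerinfo.dat",
               diaginfo_file="diaginfo.dat"):
    """Open one bpch file, keeping only the O3 tracer (hypothetical helper)."""
    # Imported lazily so that this sketch can be defined without xbpch installed.
    import xbpch

    return xbpch.open_bpchdataset(
        filename,
        fields=["O3"],            # tracer name only, not 'IJ-AVG-$_O3'
        categories=["IJ-AVG-$"],  # limit the scan to this category
        tracerinfo_file=tracerinfo_file,
        diaginfo_file=diaginfo_file,
        endian=">",               # documented default: big endian
        memmap=True,              # memory-map data instead of loading eagerly
        dask=True,                # defer reads into a dask task graph
    )
```

Calling open_ozone() with the path to a bpch file should return an xarray Dataset whose data are only read when they are actually needed.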

xbpch.open_mfbpchdataset(paths, concat_dim='time', compat='no_conflicts', preprocess=None, lock=None, **kwargs)

Open multiple bpch files as a single dataset.

You must have dask installed for this to work, as this greatly simplifies issues relating to multi-file I/O.

Please note that this routine is not very performant. I/O is limited by the need to scan through each bpch file to determine its contents, since that metadata isn’t saved anywhere. The routine therefore loads a Dataset for each bpch file sequentially and then concatenates them along the “time” axis. You may prefer to process each file individually, convert it to NetCDF, and then ingest the results through xarray as normal.

Parameters:

paths : list of strs

Filenames to load; order doesn’t matter as they will be lexicographically sorted before we read in the data

concat_dim : str, default=’time’

Dimension to concatenate Datasets over. We default to “time” since this is how GEOS-Chem splits output files

compat : str (optional)

String indicating how to compare variables of the same name for potential conflicts when merging:

  • ‘broadcast_equals’: all values must be equal when variables are broadcast against each other to ensure common dimensions.
  • ‘equals’: all values and dimensions must be the same.
  • ‘identical’: all values, dimensions and attributes must be the same.
  • ‘no_conflicts’: only values which are not null in both datasets must be equal. The returned dataset then contains the combination of all non-null values.

preprocess : callable (optional)

A pre-processing function to apply to each Dataset prior to concatenation

lock : False, True, or threading.Lock (optional)

Passed to dask.array.from_array(). By default, xarray employs a per-variable lock when reading data from NetCDF files, but this model has not yet been extended or implemented for bpch files, so this argument is not actually used. However, it will likely be necessary before dask’s multi-threaded backend can be used.

**kwargs : optional

Additional arguments to pass to xbpch.open_bpchdataset().
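The alternative workflow suggested above (process each file individually and convert it to NetCDF) could be sketched as follows. The helper names netcdf_name and convert_to_netcdf are hypothetical, as is the choice of output file name; the sketch assumes that the Dataset returned by open_bpchdataset() can be written directly with xarray's to_netcdf(), which may require extra attribute clean-up depending on the file and library versions.

```python
import os


def netcdf_name(path, out_dir):
    """Build an output path such as out_dir/ctm.bpch.nc (hypothetical naming)."""
    return os.path.join(out_dir, os.path.basename(path) + ".nc")


def convert_to_netcdf(paths, out_dir, **kwargs):
    """Convert each bpch file to NetCDF and return the new paths."""
    # Imported lazily so that this sketch can be defined without xbpch installed.
    import xbpch

    nc_paths = []
    # Sort lexicographically, mirroring what open_mfbpchdataset() does.
    for path in sorted(paths):
        ds = xbpch.open_bpchdataset(path, **kwargs)
        nc_path = netcdf_name(path, out_dir)
        ds.to_netcdf(nc_path)  # standard xarray.Dataset method
        ds.close()
        nc_paths.append(nc_path)
    return nc_paths
```

The returned NetCDF files can then be opened with xarray's own multi-file reader. Because the sort is lexicographic, a name such as ctm.10.bpch sorts before ctm.2.bpch; zero-padded date stamps in the file names keep the order chronological.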

class xbpch.BPCHFile(filename, mode='rb', endian='>', diaginfo_file='', tracerinfo_file='', eager=False, use_mmap=False, dask_delayed=False)

A file object for representing BPCH data on disk

Attributes

fp : FortranFile

A pointer to the open unformatted Fortran binary output (the original bpch file)

var_data, var_attrs : dict

Containers of `BPCHDataBundle`s and dicts, respectively, holding the accessor functions to the raw bpch data and their associated metadata

__init__(filename, mode='rb', endian='>', diaginfo_file='', tracerinfo_file='', eager=False, use_mmap=False, dask_delayed=False)

Load a BPCHFile

Parameters:

filename : str

Path to the bpch file on disk

mode : str

Mode string to pass to the file opener; this is currently fixed to “rb” and all other values will be rejected

endian : str {“>”, “<”, “=”}

Endian-ness of the Fortran output file

{tracerinfo, diaginfo}_file : str

Path to the tracerinfo.dat and diaginfo.dat files containing metadata pertaining to the output in the bpch file being read.

eager : bool

Flag to immediately read variable data; if “False”, then nothing will be read from the file and you’ll need to do so manually

use_mmap : bool

Use memory-mapping to read data from file

dask_delayed : bool

Use dask to create delayed references to the data-reading functions
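To illustrate why the endian argument matters, the sketch below uses only the standard library to decode a 4-byte record-length marker of the kind that commonly brackets records in unformatted Fortran output. That 4-byte layout is an assumption made for illustration (it depends on the compiler that wrote the file), and read_record_length is a hypothetical helper, not part of xbpch.

```python
import struct


def read_record_length(raw, endian=">"):
    """Interpret the first 4 bytes of raw as a signed 32-bit record length."""
    (length,) = struct.unpack(endian + "i", raw[:4])
    return length


# A big-endian marker announcing a 40-byte record.
marker = struct.pack(">i", 40)

print(read_record_length(marker, ">"))  # 40
print(read_record_length(marker, "<"))  # 671088640 (wrong byte order)
```

Reading with the wrong byte order does not fail outright; it yields a nonsensical record length, which is why the byte order must match the file on disk.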

_read()

Parse the entire bpch file on disk and set up easy access to meta- and data blocks.

_read_header()

Process the header information (data model / grid spec)

_read_metadata()

Read the main metadata packaged within a bpch file, indicating the output filetype and its title.

_read_var_data()

Iterate over the blocks of this bpch file and return handlers in the form of `BPCHDataBundle`s for access to the data contained therein.

close()

Close this bpch file.
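A minimal sketch of the lower-level interface, using only the constructor arguments, attributes, and close() method documented above. The helper name summarize_bpch is hypothetical, and the sketch assumes that passing eager=True populates var_attrs during construction, as the description of the eager flag suggests.

```python
def summarize_bpch(filename, tracerinfo_file="tracerinfo.dat",
                   diaginfo_file="diaginfo.dat"):
    """Return a copy of the per-variable metadata in a bpch file."""
    # Imported lazily so that this sketch can be defined without xbpch installed.
    from xbpch import BPCHFile

    bpch = BPCHFile(
        filename,
        endian=">",
        tracerinfo_file=tracerinfo_file,
        diaginfo_file=diaginfo_file,
        eager=True,  # read the file contents immediately
    )
    try:
        # var_attrs maps each variable name to a dict of metadata.
        return {name: dict(attrs) for name, attrs in bpch.var_attrs.items()}
    finally:
        bpch.close()  # always release the underlying file handle
```

Wrapping the work in try/finally ensures the file is closed even if reading the metadata raises an exception.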