Reading BPCH Files
xbpch provides three main utilities for reading bpch files, all of which are available as top-level package imports. For most purposes, you should use open_bpchdataset() (or open_mfbpchdataset() for multiple files); however, a lower-level interface, BPCHFile(), is also provided in case you prefer to process the bpch contents manually.
See Usage and Examples for more details.
xbpch.open_bpchdataset(filename, fields=[], categories=[], tracerinfo_file='tracerinfo.dat', diaginfo_file='diaginfo.dat', endian='>', decode_cf=True, memmap=True, dask=True, return_store=False)
Open a GEOS-Chem BPCH output file as an xarray Dataset.
Parameters:
filename : string
Path to the output file to read in.
{tracerinfo,diaginfo}_file : string, optional
Path to the metadata “info” .dat files, which are used to decipher the metadata corresponding to each variable in the output dataset. If not provided, xbpch will look for them in the current directory or fall back on a generic set.
fields : list, optional
List of a subset of variable names to return. This can substantially improve read performance. Note that each field here is just the tracer name, not the category; e.g., ‘O3’ instead of ‘IJ-AVG-$_O3’.
categories : list, optional
List of a subset of variable categories to look through. This can substantially improve read performance.
endian : {‘=’, ‘>’, ‘<’}, optional
Endianness of file on disk. By default, “big endian” (“>”) is assumed.
decode_cf : bool
Enforce CF conventions for variable names, units, and other metadata
default_dtype : numpy.dtype, optional
Default datatype for variables encoded in file on disk (single-precision float by default).
memmap : bool
Flag indicating that data should be memory-mapped from disk instead of eagerly loaded into memory
dask : bool
Flag indicating that data reading should be deferred (delayed) to construct a task-graph for later execution
return_store : bool
Also return the underlying DataStore to the user
Returns:
ds : xarray.Dataset
Dataset containing the requested fields (or the entire file), with data contained in proxy containers for access later.
store : xarray.AbstractDataStore
Underlying DataStore which handles the loading and processing of bpch files on disk
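To illustrate how `fields` and `categories` relate to a full variable name, here is a minimal sketch that filters plain name strings. `split_name` and `select_names` are hypothetical helpers, not part of xbpch, and the sketch assumes the category and tracer name are joined by the first underscore, as in ‘IJ-AVG-$_O3’. An empty selector keeps everything, matching the empty-list defaults above.

```python
from typing import Iterable, List, Sequence, Tuple


def split_name(full_name: str) -> Tuple[str, str]:
    """Split a full variable name into (category, tracer).

    Assumption: the two parts are joined by the first underscore,
    e.g. 'IJ-AVG-$_O3' -> ('IJ-AVG-$', 'O3').
    """
    category, _, tracer = full_name.partition("_")
    return category, tracer


def select_names(full_names: Iterable[str],
                 fields: Sequence[str] = (),
                 categories: Sequence[str] = ()) -> List[str]:
    """Hypothetical filter: an empty selector means 'keep everything'."""
    selected = []
    for name in full_names:
        category, tracer = split_name(name)
        if fields and tracer not in fields:          # match on tracer name only
            continue
        if categories and category not in categories:
            continue
        selected.append(name)
    return selected


# Illustrative names only.
names = ["IJ-AVG-$_O3", "IJ-AVG-$_CO", "CHEM-L=$_OH"]
print(select_names(names, fields=["O3"]))            # ['IJ-AVG-$_O3']
print(select_names(names, categories=["IJ-AVG-$"]))  # ['IJ-AVG-$_O3', 'IJ-AVG-$_CO']
```

With the real function you would simply pass, e.g., fields=['O3'] to open_bpchdataset().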
xbpch.open_mfbpchdataset(paths, concat_dim='time', compat='no_conflicts', preprocess=None, lock=None, **kwargs)
Open multiple bpch files as a single dataset.
You must have dask installed for this to work, as this greatly simplifies issues relating to multi-file I/O.
Note also that this routine is not very performant. I/O is still limited by the need to manually scan through each bpch file to determine its contents, since that metadata isn’t saved anywhere. As a result, this routine sequentially loads a Dataset for each bpch file and then concatenates them along concat_dim (“time” by default). You may prefer to process each file individually, convert it to NetCDF, and then ingest the results through xarray as normal.
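The alternative workflow suggested above (convert each file on its own, then read the NetCDF files with xarray) might look like the following sketch. `bpch_to_netcdf` is a hypothetical helper, not part of xbpch; it assumes the Dataset returned by `open_bpchdataset()` can be written directly with xarray's `Dataset.to_netcdf()`, and it sorts its inputs lexicographically to mirror open_mfbpchdataset. The `convert` hook exists only so the helper can be exercised without real bpch files.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def _default_convert(src: str, dst: str) -> None:
    """Read one bpch file with xbpch and write it out as NetCDF."""
    import xbpch  # imported lazily so the helper can be tested without xbpch

    ds = xbpch.open_bpchdataset(src)
    try:
        ds.to_netcdf(dst)
    finally:
        ds.close()


def bpch_to_netcdf(paths: Iterable[str], out_dir: str,
                   convert: Optional[Callable[[str, str], None]] = None) -> List[str]:
    """Hypothetical helper: convert each bpch file to its own NetCDF file.

    Inputs are processed in lexicographic order, and the NetCDF paths are
    returned in that same order.
    """
    convert = convert or _default_convert
    out = Path(out_dir)
    written = []
    for src in sorted(paths):
        # Append ".nc" to the full file name so dotted bpch names stay unique.
        dst = out / (Path(src).name + ".nc")
        convert(src, str(dst))
        written.append(str(dst))
    return written
```

The resulting files can then be combined with, for example, `xarray.open_mfdataset(written, combine="nested", concat_dim="time")`.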
Parameters: paths : list of strs
Filenames to load; order doesn’t matter as they will be lexicographically sorted before we read in the data
concat_dim : str, default=’time’
Dimension to concatenate Datasets over. We default to “time” since this is how GEOS-Chem splits output files
compat : str (optional)
String indicating how to compare variables of the same name for potential conflicts when merging:
- ‘broadcast_equals’: all values must be equal when variables are broadcast against each other to ensure common dimensions.
- ‘equals’: all values and dimensions must be the same.
- ‘identical’: all values, dimensions and attributes must be the same.
- ‘no_conflicts’: only values which are not null in both datasets must be equal. The returned dataset then contains the combination of all non-null values.
preprocess : callable (optional)
A pre-processing function to apply to each Dataset prior to concatenation
lock : False, True, or threading.Lock (optional)
Passed to dask.array.from_array(). By default, xarray employs a per-variable lock when reading data from NetCDF files, but this model has not yet been implemented for bpch files, so this argument is not actually used. However, it will likely be necessary before dask’s multi-threaded backend can be used.
**kwargs : optional
Additional arguments to pass to xbpch.open_bpchdataset().
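The ‘no_conflicts’ rule is easiest to see on plain arrays. The sketch below reproduces the rule described above with NumPy, using NaN as “null”; `combine_no_conflicts` is a hypothetical helper for illustration only and says nothing about how xarray actually implements `compat`. In practice you just pass compat='no_conflicts' (the default).

```python
import numpy as np


def combine_no_conflicts(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Merge two same-shaped float arrays under the 'no_conflicts' rule.

    NaN plays the role of "null". Positions that are non-null in both
    inputs must be equal; the result keeps every non-null value from either.
    """
    both = ~np.isnan(a) & ~np.isnan(b)
    if not np.array_equal(a[both], b[both]):
        raise ValueError("conflicting non-null values")
    # Take a where it is non-null, otherwise fall back on b (which may be NaN too).
    return np.where(np.isnan(a), b, a)


a = np.array([1.0, np.nan, 3.0])
b = np.array([1.0, 2.0, np.nan])
print(combine_no_conflicts(a, b))  # [1. 2. 3.]
```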
-
class
xbpch.
BPCHFile
(filename, mode='rb', endian='>', diaginfo_file='', tracerinfo_file='', eager=False, use_mmap=False, dask_delayed=False)¶ A file object for representing BPCH data on disk
Attributes:
fp : FortranFile
A pointer to the open unformatted Fortran binary output (the original bpch file)
var_data, var_attrs : dict
Containers of `BPCHDataBundle`s and dicts, respectively, holding the accessor functions to the raw bpch data and their associated metadata
__init__(filename, mode='rb', endian='>', diaginfo_file='', tracerinfo_file='', eager=False, use_mmap=False, dask_delayed=False)
Load a BPCHFile.
Parameters:
filename : str
Path to the bpch file on disk
mode : str
Mode string to pass to the file opener; this is currently fixed to “rb” and all other values will be rejected
endian : str {“>”, “<”, “=”}
Endianness of the Fortran output file
{tracerinfo, diaginfo}_file : str
Path to the tracerinfo.dat and diaginfo.dat files containing metadata pertaining to the output in the bpch file being read.
eager : bool
Flag to read variable data immediately; if False, nothing will be read from the file and you’ll need to do so manually
use_mmap : bool
Use memory-mapping to read data from file
dask_delayed : bool
Use dask to create delayed references to the data-reading functions
_read()
Parse the entire bpch file on disk and set up easy access to meta- and data blocks.
_read_header()
Process the header information (data model / grid spec).
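As an illustration of what a header record and the `endian` argument involve, the sketch below unpacks a synthetic header payload with Python's struct module. The layout (a 20-character model name, two float32 grid resolutions, and two int32 flags, here called halfpolar and center180) is an assumption made for this example, not a description of xbpch's internals, and `parse_header` is hypothetical. Note how reading the same bytes with the wrong byte order silently yields nonsense numbers while the text field is unaffected.

```python
import struct

# Assumed layout of the header payload (no record markers):
# 20-char model name, float32 lon/lat resolution, int32 halfpolar, int32 center180.
HEADER_FMT = "20sffii"


def parse_header(payload: bytes, endian: str = ">") -> dict:
    """Unpack a header payload using the given struct byte-order prefix."""
    modelname, lonres, latres, halfpolar, center180 = struct.unpack(endian + HEADER_FMT, payload)
    return {
        "modelname": modelname.decode("ascii").strip(),
        "lonres": lonres,
        "latres": latres,
        "halfpolar": bool(halfpolar),
        "center180": bool(center180),
    }


# Synthetic big-endian payload with illustrative values.
payload = struct.pack(">" + HEADER_FMT, b"GEOS5_47L".ljust(20), 5.0, 4.0, 1, 0)

good = parse_header(payload, endian=">")
bad = parse_header(payload, endian="<")   # wrong byte order
print(good["lonres"], good["latres"])     # 5.0 4.0
print(bad["lonres"] == good["lonres"])    # False
```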
_read_metadata()
Read the main metadata packaged within a bpch file, indicating the output filetype and its title.
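Since a bpch file is unformatted Fortran binary output, each record is typically framed by a byte-count marker before and after its payload. The sketch below writes two such records to memory and reads them back, assuming 4-byte markers and that the filetype and title are stored as 40- and 80-character strings; the title text is made up, and `write_record`/`read_record` are hypothetical helpers, not xbpch's FortranFile. Reading with the wrong `endian` produces an impossible record length, which the marker check catches.

```python
import io
import struct
from typing import BinaryIO


def write_record(fp: BinaryIO, payload: bytes, endian: str = ">") -> None:
    """Write one Fortran-style record: length marker, payload, length marker."""
    marker = struct.pack(endian + "i", len(payload))  # assumed 4-byte marker
    fp.write(marker + payload + marker)


def read_record(fp: BinaryIO, endian: str = ">") -> bytes:
    """Read one record, checking that both length markers agree."""
    head = fp.read(4)
    if len(head) < 4:
        raise EOFError("no more records")
    (n,) = struct.unpack(endian + "i", head)
    payload = fp.read(n) if n >= 0 else b""
    tail = fp.read(4)
    if len(payload) != n or len(tail) != 4 or struct.unpack(endian + "i", tail)[0] != n:
        raise ValueError("inconsistent record markers (wrong endian?)")
    return payload


buf = io.BytesIO()
write_record(buf, b"CTM bin 02".ljust(40))                # assumed 40-char filetype
write_record(buf, b"Example GEOS-Chem output".ljust(80))  # made-up 80-char title
buf.seek(0)

filetype = read_record(buf).decode("ascii").strip()
title = read_record(buf).decode("ascii").strip()
print(filetype)  # CTM bin 02
print(title)     # Example GEOS-Chem output
```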
_read_var_data()
Iterate over the blocks of this bpch file and return handlers in the form of `BPCHDataBundle`s for access to the data contained therein.
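Returning handlers instead of arrays lets the actual read be deferred. The hypothetical `LazyBlock` class below illustrates the idea; it is not xbpch's BPCHDataBundle. It only records a byte offset, shape, and dtype, and reads either eagerly or through `numpy.memmap`, loosely mirroring the `use_mmap` flag. It assumes the array is stored as contiguous big-endian float32 values in Fortran (column-major) order; in a real bpch file the offset would also need to skip the record marker preceding the payload.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Tuple

import numpy as np


@dataclass
class LazyBlock:
    """Hypothetical handle to one array in a binary file.

    It stores where the array lives and only touches the disk in read().
    """
    path: str
    offset: int              # byte offset of the array payload in the file
    shape: Tuple[int, ...]   # e.g. (nlon, nlat, nlev)
    dtype: str = ">f4"       # assumed: big-endian float32

    def read(self, use_mmap: bool = False) -> np.ndarray:
        if use_mmap:
            # Memory-map the payload; bytes are paged in only when accessed.
            return np.memmap(self.path, dtype=self.dtype, mode="r",
                             offset=self.offset, shape=self.shape, order="F")
        # Eager read: copy exactly the payload bytes into memory.
        count = int(np.prod(self.shape))
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            raw = f.read(count * np.dtype(self.dtype).itemsize)
        flat = np.frombuffer(raw, dtype=self.dtype, count=count)
        return flat.reshape(self.shape, order="F")  # Fortran (column-major) order


def demo() -> bool:
    """Write a small synthetic file, then read it back both ways."""
    data = np.arange(24, dtype="f4").astype(">f4").reshape((2, 3, 4), order="F")
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 16)             # stand-in for preceding header bytes
        tmp.write(data.tobytes(order="F"))  # column-major payload
    try:
        block = LazyBlock(tmp.name, offset=16, shape=(2, 3, 4))
        eager = block.read()
        mapped = block.read(use_mmap=True)
        same = bool(np.array_equal(eager, data) and np.array_equal(mapped, data))
        del mapped  # release the memory map before deleting the file
    finally:
        os.remove(tmp.name)
    return same


print(demo())  # True
```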
close()
Close this bpch file.