Catalogs

PyCSEP provides routines for working with and manipulating earthquake catalogs for the purposes of evaluating earthquake forecasting models.

If you are able to make use of these tools for other reasons, please let us know. We are especially interested in including basic catalog statistics into this package. If you are interested in helping implement routines like b-value estimation and catalog completeness that would be much appreciated.

Introduction

PyCSEP catalog basics

An earthquake catalog contains a collection of seismic events each defined by a set of attributes. PyCSEP implements a simple event description that is suitable for evaluating earthquake forecasts. In this format, every seismic event is defined by its location in space (longitude, latitude, and depth), magnitude, and origin time. In addition, each event can have an optional event_id as a unique identifier.

PyCSEP provides csep.core.catalogs.CSEPCatalog to represent an earthquake catalog. The essential event data are stored in a structured NumPy array with the following data type.

dtype = numpy.dtype([('id', 'S256'),
                     ('origin_time', '<i8'),
                     ('latitude', '<f4'),
                     ('longitude', '<f4'),
                     ('depth', '<f4'),
                     ('magnitude', '<f4')])

Additional information can be associated with an event using the id field in the structured array in a class member called metadata. The essential event data must be complete meaning that each event should have these attributes defined. The metadata is more freeform and PyCSEP does not impose any restrictions on the way that event metadata is stored. Only that the metadata for an event should be accessible using the event id. An example of this could be

catalog = csep.load_catalog('catalog_file.csv')
event_metadata = catalog.metadata[event_id]

This would load a catalog stored in the PyCSEP .csv format. PyCSEP contains catalog readers for the following formats (We are also looking to support other catalog formats. Please suggest some or better yet help us write the readers!):

  1. CSEP ascii format

  2. NDK format (used by the gCMT catalog)

  3. INGV gCMT catalog

  4. ZMAP format

  5. pre-processed JMA format

PyCSEP supports the ability to easily define a custom reader function for a catalog format type that we don’t currently support. If you happen to implement a reader for a new catalog format please check out the contribution guidelines and make a pull request so we can include this in the next release.

Catalog as Pandas dataframes

You might be comfortable using Pandas dataframes to manipulate tabular data. PyCSEP provides some routines for accessing catalogs as a pandas.DataFrame. You can use df = catalog.to_dataframe(with_datetimes=True) to return the DataFrame representation of the catalog. Using the catalog = CSEPCatalog.from_dataframe(df) you can return back to the PyCSEP data model.

Note

Going between a DataFrame and CSEPCatalog is a lossy transformation. It essentially only retains the essential event attributes that are defined by the dtype of the class.

Loading catalogs

Load catalogs from files

You can easily load catalogs in the supported format above using csep.load_catalog(). This function provides a top-level function to load catalogs that are currently supported by PyCSEP. You must specify the type of the catalog and the format you want it to be loaded. The type of the catalog can be:

catalog_type = ('ucerf3', 'csep-csv', 'zmap', 'jma-csv', 'ndk')
catalog_format = ('csep', 'native')

The catalog type determines which reader csep.utils.readers will be used to load in the file. The default is the csep-csv type and the native format. The jma-csv format can be created using the ./bin/deck2csv.pl Perl script.

Note

The format is important for ucerf3 catalogs, because those are stored as big endian binary numbers by default. If you are working with ucerf3-etas catalogs and would like to convert them into the CSEPCatalog format you can use the format='csep' option when loading in a catalog or catalogs.

Load catalogs from ComCat

PyCSEP provides top-level functions to load catalogs using ComCat. We incorporated the work done by Mike Hearne and others from the U.S. Geological Survey into PyCSEP in an effort to reduce the dependencies of this project. The top-level access to ComCat catalogs can be accessed from csep.query_comcat(). Some lower level functionality can be accessed through the csep.utils.comcat module. All credit for this code goes to the U.S. Geological Survey.

Here is complete example of accessing the ComCat catalog.

Writing custom loader functions

You can easily add custom loader functions to import data formats that are not currently included with the PyCSEP tools. Both csep.core.catalogs.CSEPCatalog.load_catalog() and csep.load_catalog() support an optional argument called loader to support these custom data formats.

In the simplest form the function should have the following stub:

def my_custom_loader_function(filename):
    """ Custom loader function for catalog data.

    Args:
        filename (str): path to the file containing the path to the forecast

    Returns:
        eventlist: iterable of event data with the order:
                (event_id, origin_time, latitude, longitude, depth, magnitude)
    """

    # imagine there is some logic to read in data from filename

    return eventlist

This function can then be passed to csep.load_catalog() or CSEPCatalog.load_catalog with the loader keyword argument. The function should be passed as a first-class object like this:

import csep
my_custom_catalog = csep.load_catalog(filename, loader=my_custom_loader_function)

Note

The origin_time is actually an integer time. We recommend to parse the timing information as a datetime.datetime object and use the datetime_to_utc_epoch function to convert this to an integer time.

Notice, we did not actually call the function but we just passed it as a reference. These functions can also access web-based catalogs like we implement with the csep.query_comcat() function. This function doesn’t work with either csep.load_catalog() or CSEPCatalog.load_catalog, because these are intended for file-based catalogs. Instead, we can create the catalog object directly. We would do that like this

def my_custom_web_loader(...):
    """ Accesses catalog from online data source.

    There are no requirements on the arguments if you are creating the catalog directly from the class.

    Returns:
        eventlist: iterable of event data with the order:
            (event_id, origin_time, latitude, longitude, depth, magnitude)
    """

    # custom logic to access online catalog repository

    return eventlist

As you might notice, all loader functions are required to return an event-list. This event-list must be iterable and contain the required event data.

Note

The events in the eventlist should follow the form

eventlist = my_custom_loader_function(...)

event = eventlist[0]

event[0] = event_id
# see note above about using integer times
event[1] = origin_time
event[2] = latitude
event[3] = longitude
event[4] = depth
event[5] = magnitude

Once you have a function that returns an eventlist, you can create the catalog object directly. This uses the csep.core.catalogs.CSEPCatalog as an example.

import csep

eventlist = my_custom_web_loader(...)
catalog = csep.catalogs.CSEPCatalog(data=eventlist, **kwargs)

The **kwargs represents any other keyword argument that can be passed to CSEPCatalog. This could be the catalog_id or the CartesianGrid2D.

Including custom event metadata

Catalogs can include additional metadata associated with each event. Right now, there are no direct applications for event metadata. Nonetheless, it can be included with a catalog object.

The event metadata should be a dictionary where the keys are the event_id of the individual events. For example,

event_id = 'my_dummy_id'
metadata_dict = catalog.metadata[event_id]

Each event meta_data should be a JSON-serializable dictionary or a class that implements the to_dict() and from_dict() methods. This is required to properly save the catalog files into JSON format and verify whether two catalogs are the same. You can see the to_dict and from_dict methods for an example of how these would work.

Accessing Event Information

In order to utilize the low-level acceleration from Numpy, most catalog operations are vectorized. The catalog classes provide some getter methods to access the essential catalog data. These return arrays of numpy.ndarray with the dtype defined by the class.

The following functions return numpy.ndarrays of the catalog information.

CSEPCatalog.event_count

Number of events in catalog

CSEPCatalog.get_magnitudes()

Returns magnitudes of all events in catalog

CSEPCatalog.get_longitudes()

Returns longitudes of all events in catalog

CSEPCatalog.get_latitudes()

Returns latitudes of all events in catalog

CSEPCatalog.get_depths()

Returns depths of all events in catalog

CSEPCatalog.get_epoch_times()

Returns the datetime of the event as the UTC epoch time (aka unix timestamp)

CSEPCatalog.get_datetimes()

Returns datetime object from timestamp representation in catalog

CSEPCatalog.get_cumulative_number_of_events()

Returns the cumulative number of events in the catalog.

The catalog data can be iterated through event-by-event using a standard for-loop. For example, we can do something like

for event in catalog.data:
    print(
        event['id'],
        event['origin_time'],
        event['latitude'],
        event['longitude'],
        event['depth'],
        event['magnitude']
    )

The keyword for the event tuple are defined by the dtype of the class. The keywords for CSEPCatalog are shown in the snippet directly above. For example, a quick and dirty plot of the cumulative events over time can be made using the matplotlib.pyplot interface

import csep
import matplotlib.pyplot as plt

# lets assume we already loaded in some catalog
catalog = csep.load_catalog("my_catalog_path.csv")

# quick and dirty plot
fig, ax = plt.subplots()
ax.plot(catalog.get_epoch_times(), catalog.get_cumulative_number_of_events())
plt.show()

Filtering events

Most of the catalog files (or catalogs accessed via the web) contain more events that are desired for a given use case. PyCSEP provides a few routines to help filter events out of the catalog. The following methods help to filter out unwanted events from the catalog.

CSEPCatalog.filter([statements, in_place])

Filters the catalog based on statements.

CSEPCatalog.filter_spatial([region, …])

Removes events outside of the region.

CSEPCatalog.apply_mct(m_main, event_epoch[, mc])

Applies time-dependent magnitude of completeness following a mainshock.

Filtering events by attribute

The function CSEPCatalog.filter provides the ability to filter events based on their essential attributes. This function works by parsing filtering strings and applying them using a logical and operation. The catalog strings have the following format filter_string = f"{attribute} {operator} {value}". The filter strings represent a statement that would evaluate as True after they are applied. For example, the statement catalog.filter('magnitude >= 2.5') would retain all events in the catalog greater-than-or-equal to magnitude 2.5.

The attributes are determined by the dtype of the catalog, therefore you can filter based on the (origin_time, latitude, longitude, depth, and magnitude). Additionally, you can use the attribute datetime and provide a datetime.datetime object to filter events using the data type.

The filter function can accept a string or a list of filter statements. If the function is called without any arguments the function looks to use the catalog.filters member. This can be provided during class instantiation or bound to the class afterward. Here is complete example of how to filter a catalog using the filtering strings.

Filtering events in space

You might want to supply a non-rectangular polygon that can be used to filter events in space. This is commonly done to prepare an observed catalog for forecast evaluation. Right now, this can be accomplished by supplying a region to the catalog or filter_spatial. There will be more information about using regions in the user-guide page page. The catalog filtering contains a complete example of how to filter a catalog using a user defined aftershock region based on the M7.1 Ridgecrest mainshock.

Time-dependent magnitude of completeness

Seismic networks have difficulty recording events immediately after a large event occurs, because the passing seismic waves from the larger event become mixed with any potential smaller events. Usually when we evaluate an aftershock forecast, we should account for this time-dependent magnitude of completeness. PyCSEP provides the Helmstetter et al., [2006] implementation of the time-dependent magnitude completeness model.

This requires information about an event which can be supplied directly to apply_mct. Additionally, PyCSEP provides access to the ComCat API using get_event_by_id. An exmaple of this can be seen in the filtering catalog tutorial.

Binning Events

Another common task requires binning earthquakes by their spatial locations and magnitudes. This is routinely done when evaluating earthquake forecasts. Like filtering a catalog in space, you need to provide some information about the region that will be used for the binning. Please see the user-guide page for more information about regions.

Note

We would like to make this functionality more user friendly. If you have suggestions or struggles, please open an issue on the GitHub page and we’d be happy to incorporate these ideas into the toolkit.

The following functions allow binning of catalogs using space-magnitude regions.

CSEPCatalog.spatial_counts()

Returns counts of events within discrete spatial region

CSEPCatalog.magnitude_counts([mag_bins, retbins])

Computes the count of events within mag_bins

CSEPCatalog.spatial_magnitude_counts([…])

Return counts of events in space-magnitude region.

These functions return numpy.ndarrays containing the count of the events determined from the catalogs. This example shows how to obtain magnitude counts from a catalog. The index of the ndarray corresponds to the index of the associated space-magnitude region. For example,

import csep
import numpy

catalog = csep.load_catalog("my_catalog_file")

# returns bin edges [2.5, 2.6, ... , 7.5]
bin_edges = numpy.arange(2.5, 7.55, 0.1)

magnitude_counts = catalog.magnitude_counts(mag_bins=bin_edges)

In this example, magnitude_counts[0] is the number of events with 2.5 ≤ M < 2.6. All of the magnitude binning assumes that the final bin extends to infinity, therefore magnitude_counts[-1] contains the number of events with 7.5 ≤ M < ∞.