Example use-case: One-zone chemical evolution model.

This use-case example intends to show-case the use of persistent_data and accessing previous convolution results.

In some cases it could be useful to keep track of some data in a dictionary that the user can continually update along the convolution. The most relevant thing that comes to mind is a simple one-zone galactic-chemical evolution model where starformation in a closed box can change the general composition of the gas in that box. In that case, one would want to track the composition of stars that form, and how much of that is locked away in compact objects or un-evolving stars, and what the composition is of the material that is returned to the reservoir. Over time, the overall abundance of the reservoir changes.

To be able to do this, we need to keep track of data.

We can do this by utilising the persistent_data dictionary that is available in the post_convolution_hook when we perform a sequential convolution (as opposed to multiprocessing).

do note: this use-case is really just a closed-box one-zone (ch1.3 of https://edoc.ub.uni-muenchen.de/17111/ Rob Yates’ thesis) toy model to provide some rudimentary understanding and build intuition. More sophisticated modeling is can be done with FlexCE

For

In this notebook we show how to use chemical yield output from population-synthesis, in this case from binary_c, and build a simple GCE model. We compare intermediate results to Yates, Hendriks, et al 2023 as that study uses similar datasets.

Let us first load some relevant data. We’ll use a dataset containing chemical yield of isotopes of a population of stars.

[1]:

import copy
import json
import os
import h5py
import pkg_resources

import numpy as np
import pandas as pd
import astropy.units as u
import matplotlib.pyplot as plt

from syntheticstellarpopconvolve.ensemble_utils import convert_ensemble_to_dataframe
from syntheticstellarpopconvolve.SFR_dict_plotting_routines import load_mpl_rc
from syntheticstellarpopconvolve import convolve, default_convolution_config
from syntheticstellarpopconvolve.general_functions import temp_dir


# load the data
example_ensemble_filename = pkg_resources.resource_filename(
    "syntheticstellarpopconvolve", "example_data/example_ensemble.json"
)
example_ensemble_filename = '/home/david/Desktop/lgalaxies_paper_scripts/binary_c_yields/default/ensembles/binaryStars/ensemble_output-3083cc1f.json'

with open(example_ensemble_filename, "r") as f_ensemble:
    ensemble = json.loads(f_ensemble.read())

This data is stored in the ensemble format, but we will inflate it, convert the isotopes to elements, and select the top 10 of elements.

First lets load the data and make sure it represents what we want it to: the data should be in the unit of mass per unit time (i.e. a normalized yield rate) in this case.

[2]:

print(ensemble['metadata']['total_probability_weighted_mass'])

0.7689220857287028

[3]:

inflated_ensemble = convert_ensemble_to_dataframe(
    ensemble_data=ensemble["ensemble"]['Xyield'],
    verbose=False,
    contains_named_layers=True,
)
inflated_ensemble = inflated_ensemble.astype({'time': 'float', 'probability': 'float'})

inflated_ensemble['normalized_yield'] = inflated_ensemble['probability'] / ensemble['metadata']['total_probability_weighted_mass']
inflated_ensemble = inflated_ensemble[['time', 'source', 'isotope', 'normalized_yield']]

Since the data in this case stored as a number per log10(time), we need to modify the data first a bit and express it in linear time.

[43]:

# divide by binsizes
from syntheticstellarpopconvolve.general_functions import calculate_bin_edges
import astropy.units as u

# sorted_unique_time_centers = np.sort(inflated_ensemble['time'].unique())
# print(sorted_time_bin_centers)
# time_bin_edges = calculate_bin_edges(sorted_unique_time_centers)

# # turn to linear
# linear_time_bin_edges = 10**time_bin_edges * u.Myr
# linear_time_bin_sizes = np.diff(linear_time_bin_edges).to(u.yr)
# print(linear_time_bin_sizes)

# indices = np.digitize(inflated_ensemble['probability'], time_bin_edges) - 1

# # inflated_ensemble['normalized_yield'] = inflated_ensemble['probability']/linear_time_bin_sizes.value
# inflated_ensemble['normalized_yield'] = inflated_ensemble['probability'] / linear_time_bin_sizes.value[indices]

# print(inflated_ensemble[['normalized_yield', 'probability']])
# inflated_ensemble = inflated_ensemble[['time', 'source', 'isotope', 'normalized_yield']]

[-0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1  0.   0.1  0.2  0.3  0.4
  0.5  0.6  0.7  0.8  0.9  1.   1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8
  1.9  2.   2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8  2.9  3.   3.1  3.2
  3.3  3.4  3.5  3.6  3.7  3.8  3.9  4.   4.1  4.2]
[2.90519090e+04 3.65741865e+04 4.60441729e+04 5.79661793e+04
 7.29750961e+04 9.18702029e+04 1.15657733e+05 1.45604459e+05
 1.83305154e+05 2.30767516e+05 2.90519090e+05 3.65741865e+05
 4.60441729e+05 5.79661793e+05 7.29750961e+05 9.18702029e+05
 1.15657733e+06 1.45604459e+06 1.83305154e+06 2.30767516e+06
 2.90519090e+06 3.65741865e+06 4.60441729e+06 5.79661793e+06
 7.29750961e+06 9.18702029e+06 1.15657733e+07 1.45604459e+07
 1.83305154e+07 2.30767516e+07 2.90519090e+07 3.65741865e+07
 4.60441729e+07 5.79661793e+07 7.29750961e+07 9.18702029e+07
 1.15657733e+08 1.45604459e+08 1.83305154e+08 2.30767516e+08
 2.90519090e+08 3.65741865e+08 4.60441729e+08 5.79661793e+08
 7.29750961e+08 9.18702029e+08 1.15657733e+09 1.45604459e+09
 1.83305154e+09 2.30767516e+09 2.90519090e+09 3.65741865e+09] yr

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/.pyenv/versions/3.9.9/envs/sspc3.9.9/lib/python3.9/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'probability'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[43], line 14
     11 linear_time_bin_sizes = np.diff(linear_time_bin_edges).to(u.yr)
     12 print(linear_time_bin_sizes)
---> 14 indices = np.digitize(inflated_ensemble['probability'], time_bin_edges) - 1
     16 # inflated_ensemble['normalized_yield'] = inflated_ensemble['probability']/linear_time_bin_sizes.value
     17 inflated_ensemble['normalized_yield'] = inflated_ensemble['probability'].to_numpy() / linear_time_bin_sizes.value[indices]

File ~/.pyenv/versions/3.9.9/envs/sspc3.9.9/lib/python3.9/site-packages/pandas/core/frame.py:4102, in DataFrame.__getitem__(self, key)
   4100 if self.columns.nlevels > 1:
   4101     return self._getitem_multilevel(key)
-> 4102 indexer = self.columns.get_loc(key)
   4103 if is_integer(indexer):
   4104     indexer = [indexer]

File ~/.pyenv/versions/3.9.9/envs/sspc3.9.9/lib/python3.9/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'probability'

Now lets select the unique isotopes and group them by their elements.

[5]:

unique_isotopes = inflated_ensemble['isotope'].unique()
# print(unique_isotopes)

# remove numbers
cleaned_isotopes = [''.join([i for i in isotope if not i.isdigit()]) for isotope in unique_isotopes]
unique_elements = np.unique(np.array(cleaned_isotopes))
# print(unique_elements)

# sort on order and reverse to perform groupby
unique_elements = sorted(unique_elements, key=len)[::-1]
print(unique_elements)



#
reduced_inflated_ensemble = inflated_ensemble.copy(deep=True)
element_df = pd.DataFrame()

# Loop over all elements
for unique_element_i, unique_element in enumerate(unique_elements):
    query = 'isotope.str.startswith("{}")'.format(unique_element)
    # print(query)

    #
    elements = reduced_inflated_ensemble.query(query)
    elements_mask = reduced_inflated_ensemble.eval(query).to_numpy()
    # print(len(reduced_inflated_ensemble.index))

    #
    reduced_inflated_ensemble = reduced_inflated_ensemble[np.invert(elements_mask)]

    # print(len(reduced_inflated_ensemble.index))
    elements = elements[['time', 'source', 'normalized_yield']]
    grouped = elements.groupby(['time', 'source']).agg({'normalized_yield': 'sum'}).reset_index()
    grouped['element'] = unique_element

    # Combine dataframe again
    element_df = pd.concat([element_df, grouped])






# select elements like in Yates, Hendriks, et al 2023
selected_element_list = ['H', 'He', 'C', 'N', 'O', 'Ne', 'Mg', 'Si', 'S', 'Ca', 'Fe']

# # select which elements have the largest yield (regardless of channel or delay time)
# total_yield_per_element = element_df.groupby(['element']).agg({'probability': 'sum'}).reset_index()
# top_yield_elements = total_yield_per_element.sort_values(by='probability', ascending=False).head(n=10)
# selected_element_list = top_yield_elements['element'].to_list()

# query the main element-df based on the top 10 elements
element_df = element_df.query('element.isin(@selected_element_list)')
print(element_df)

['Zr', 'Zn', 'Yb', 'Xe', 'Tm', 'Tl', 'Ti', 'Te', 'Tc', 'Tb', 'Ta', 'Sr', 'Sn', 'Sm', 'Si', 'Se', 'Sc', 'Sb', 'Ru', 'Rh', 'Re', 'Rb', 'Pt', 'Pr', 'Po', 'Pm', 'Pd', 'Pb', 'Os', 'Ni', 'Ne', 'Nd', 'Nb', 'Na', 'Mo', 'Mn', 'Mg', 'Lu', 'Li', 'La', 'Kr', 'Ir', 'In', 'Ho', 'Hg', 'Hf', 'He', 'Ge', 'Gd', 'Ga', 'Fe', 'Eu', 'Er', 'Dy', 'Cu', 'Cs', 'Cr', 'Co', 'Cl', 'Ce', 'Cd', 'Ca', 'Br', 'Bi', 'Be', 'Ba', 'Au', 'As', 'Ar', 'Al', 'Ag', 'Y', 'W', 'V', 'S', 'P', 'O', 'N', 'K', 'I', 'H', 'F', 'C', 'B']
     time           source  normalized_yield element
0    -0.9             Wind      2.824626e-08      Si
1    -0.8             Wind      3.585835e-08      Si
2    -0.7             Wind      4.555696e-08      Si
3    -0.6             Wind      5.803942e-08      Si
4    -0.5             Wind      7.417973e-08      Si
..    ...              ...               ...     ...
379   4.2             RLOF      6.739562e-12       C
380   4.2            SNAIC      4.012069e-07       C
381   4.2  SNIa_CHAND_Coal      4.033410e-07       C
382   4.2               WR      3.435218e-05       C
383   4.2             Wind      2.909188e-06       C

[4254 rows x 4 columns]

[6]:

# lets plot some data and compare it to yates 2023
sum_per_timebin = element_df.groupby(['time']).agg({'normalized_yield': 'sum'}).reset_index()

# turn into linear time.
sum_per_timebin['normalized_yield'] = sum_per_timebin['normalized_yield'] / (10**sum_per_timebin['time'].to_numpy()) * np.log(10)
# print(sum_per_timebin)

#
load_mpl_rc()
fig, ax = plt.subplots()

ax.plot(sum_per_timebin['time'], sum_per_timebin['normalized_yield'])
ax.set_yscale('log')
ax.set_xlabel('log10(t/Myr)')

print(sum_per_timebin)

    time  normalized_yield
0   -0.9          0.003399
1   -0.8          0.003427
2   -0.7          0.003459
3   -0.6          0.003500
4   -0.5          0.003553
5   -0.4          0.003624
6   -0.3          0.003717
7   -0.2          0.003842
8   -0.1          0.004014
9    0.0          0.004254
10   0.1          0.004598
11   0.2          0.005110
12   0.3          0.005903
13   0.4          0.007137
14   0.5          0.008746
15   0.6          0.055880
16   0.7          0.090917
17   0.8          0.085617
18   0.9          0.077823
19   1.0          0.062652
20   1.1          0.042677
21   1.2          0.033583
22   1.3          0.024392
23   1.4          0.019279
24   1.5          0.014415
25   1.6          0.010820
26   1.7          0.008164
27   1.8          0.006366
28   1.9          0.004960
29   2.0          0.003853
30   2.1          0.003005
31   2.2          0.002370
32   2.3          0.001826
33   2.4          0.001404
34   2.5          0.001090
35   2.6          0.000843
36   2.7          0.000665
37   2.8          0.000526
38   2.9          0.000414
39   3.0          0.000325
40   3.1          0.000267
41   3.2          0.000282
42   3.3          0.000162
43   3.4          0.000116
44   3.5          0.000086
45   3.6          0.000065
46   3.7          0.000046
47   3.8          0.000037
48   3.9          0.000026
49   4.0          0.000019
50   4.1          0.000014
51   4.2          0.000010

../_images/examples_notebook_example_GCE_11_1.png

[29]:

sum_per_timebin['linear_time'] = 10**sum_per_timebin['time']

#
current_time = 0.2 * u.Myr
linear_times = sum_per_timebin['linear_time'].to_numpy() * u.Myr
rates = sum_per_timebin['normalized_yield'].to_numpy() * (u.Msun/u.Myr)
total_yield = 0 * u.Msun
print('linear_times', linear_times)

timestep = 0.1 * u.Myr
while current_time < 14 * u.Gyr:

    current_timebin = np.digitize(np.array([current_time.to(u.yr).value]), linear_times.to(u.yr).value)-1
    current_rate = rates[current_timebin[0]]

    current_yield = current_rate * timestep
    total_yield += current_yield
    current_time += timestep

print(total_yield)

linear_times [1.25892541e-01 1.58489319e-01 1.99526231e-01 2.51188643e-01
 3.16227766e-01 3.98107171e-01 5.01187234e-01 6.30957344e-01
 7.94328235e-01 1.00000000e+00 1.25892541e+00 1.58489319e+00
 1.99526231e+00 2.51188643e+00 3.16227766e+00 3.98107171e+00
 5.01187234e+00 6.30957344e+00 7.94328235e+00 1.00000000e+01
 1.25892541e+01 1.58489319e+01 1.99526231e+01 2.51188643e+01
 3.16227766e+01 3.98107171e+01 5.01187234e+01 6.30957344e+01
 7.94328235e+01 1.00000000e+02 1.25892541e+02 1.58489319e+02
 1.99526231e+02 2.51188643e+02 3.16227766e+02 3.98107171e+02
 5.01187234e+02 6.30957344e+02 7.94328235e+02 1.00000000e+03
 1.25892541e+03 1.58489319e+03 1.99526231e+03 2.51188643e+03
 3.16227766e+03 3.98107171e+03 5.01187234e+03 6.30957344e+03
 7.94328235e+03 1.00000000e+04 1.25892541e+04 1.58489319e+04] Myr
3.47227468074716 solMass

[2]:

records = [{"a": 2, "b": 3, "normalized_yield": 1, "time": 1}]
dummy_df = pd.DataFrame.from_records(records)
TMP_DIR = temp_dir("code", "persistent_data", clean_path=True)


def post_convolution_function(
    config,
    sfr_dict,
    data_dict,
    convolution_results,
    convolution_instruction,
    persistent_data,
    previous_convolution_results,
):
    """
    Post-convolution function to handle integrating the systems forward in time and finding those that end up in the LISA waveband.

    using local_indices to select everything and using Alexey's distance sampler to handle sampling the distances
    """

    print(persistent_data)
    print(previous_convolution_results)

    if "test" not in persistent_data:
        persistent_data["test"] = 0
    persistent_data["test"] += 1

    return convolution_results


##################
#

# create file
output_hdf5_filename = os.path.join(TMP_DIR, "output_hdf5.h5")
generate_boilerplate_outputfile(output_hdf5_filename)

# store the data frame in the hdf5file
dummy_df.to_hdf(input_hdf5_filename, key="input_data/events/example")


#
convolution_config = copy.copy(default_convolution_config)
convolution_config["output_filename"] = output_hdf5_filename
convolution_config["tmp_dir"] = TMP_DIR
convolution_config["multiprocessing"] = False


###
# convolution instructions
convolution_config["convolution_instructions"] = [
    {
        "convolution_type": "integrate",
        "input_data_name": "example",
        "output_data_name": "example",
        "filter_future_events": False,
        "post_convolution_function": post_convolution_function,
        "data_column_dict": {
            # required
            "normalized_yield": "normalized_yield",
            "delay_time": {"column_name": "time", "unit": u.Myr},
        },
    },
]

#
convolution_config["convolution_lookback_time_bin_edges"] = np.arange(0, 6, 1) * u.Gyr

# construct the sfr-dict (NOTE: this uses absolute SFR, not metallicity dependent)
scale = 1e-5
convolution_config["SFR_info"] = {
    'lookback_time_bin_edges': (np.arange(0, 10, 1) * u.Gyr).to(u.yr),
    'starformation_rate_array': scale * np.ones(9) * u.Msun / u.yr
}

[convolve_events.py:39 - convolve_events_by_integration_post_convolution_hook_wrapper ] 2025-01-15 17:06:34,123: Handling post-convolution function hook call for convolve-events by integration

{}
None

[convolve_populations.py:113 - store_convolution_result_entries ] 2025-01-15 17:06:34,133: Storing yield
[convolve_populations.py:113 - store_convolution_result_entries ] 2025-01-15 17:06:34,135: Storing test
[convolve_events.py:39 - convolve_events_by_integration_post_convolution_hook_wrapper ] 2025-01-15 17:06:34,138: Handling post-convolution function hook call for convolve-events by integration

{'test': 1}
{'convolution_results': {'yield': <Quantity [1.e-05] 1 / yr>}}

[convolve_populations.py:113 - store_convolution_result_entries ] 2025-01-15 17:06:34,170: Storing yield
[convolve_populations.py:113 - store_convolution_result_entries ] 2025-01-15 17:06:34,175: Storing test
[convolve_events.py:39 - convolve_events_by_integration_post_convolution_hook_wrapper ] 2025-01-15 17:06:34,183: Handling post-convolution function hook call for convolve-events by integration

{'test': 2}
{'convolution_results': {'yield': <Quantity [1.e-05] 1 / yr>}}

[convolve_populations.py:113 - store_convolution_result_entries ] 2025-01-15 17:06:34,213: Storing yield
[convolve_populations.py:113 - store_convolution_result_entries ] 2025-01-15 17:06:34,217: Storing test
[convolve_events.py:39 - convolve_events_by_integration_post_convolution_hook_wrapper ] 2025-01-15 17:06:34,223: Handling post-convolution function hook call for convolve-events by integration

{'test': 3}
{'convolution_results': {'yield': <Quantity [1.e-05] 1 / yr>}}

[convolve_populations.py:113 - store_convolution_result_entries ] 2025-01-15 17:06:34,247: Storing yield
[convolve_populations.py:113 - store_convolution_result_entries ] 2025-01-15 17:06:34,251: Storing test
[convolve_events.py:39 - convolve_events_by_integration_post_convolution_hook_wrapper ] 2025-01-15 17:06:34,260: Handling post-convolution function hook call for convolve-events by integration

{'test': 4}
{'convolution_results': {'yield': <Quantity [1.e-05] 1 / yr>}}

[convolve_populations.py:113 - store_convolution_result_entries ] 2025-01-15 17:06:34,280: Storing yield
[convolve_populations.py:113 - store_convolution_result_entries ] 2025-01-15 17:06:34,282: Storing test

finished convolution
<KeysViewHDF5 ['config', 'input_data', 'output_data']>
<KeysViewHDF5 ['event']>
<KeysViewHDF5 ['example']>
<KeysViewHDF5 ['example']>
<KeysViewHDF5 ['convolution_results']>
<KeysViewHDF5 ['0.5 Gyr', '1.5 Gyr', '2.5 Gyr', '3.5 Gyr', '4.5 Gyr']>
<KeysViewHDF5 ['test', 'yield']>

[ ]:

# convolve
convolve(config=convolution_config)

print("finished convolution")

[ ]:

# read out content and integrate until today
with h5py.File(convolution_config["output_filename"], "r") as output_hdf5_file:

    print(output_hdf5_file.keys())
    print(output_hdf5_file["output_data"].keys())
    print(output_hdf5_file["output_data/event"].keys())
    print(output_hdf5_file["output_data/event/example"].keys())
    print(output_hdf5_file["output_data/event/example/example"].keys())
    print(
        output_hdf5_file["output_data/event/example/example/convolution_results"].keys()
    )
    print(
        output_hdf5_file[
            "output_data/event/example/example/convolution_results/0.5 Gyr"
        ].keys()
    )

Further ideas

This example is a rather simple setup. Its a single box without any inflow and outflow, stars form at a fixed metallicity and they form following a prescribed star formation rate.

Real life, of course, is somewhat more complicated. Among other processes, the ejecta and outflows of the stars will impact the metallicity and rate at which the next set of stars form. Inflow and outflow will change the reservoir abundance as well.