Support to Convert .fit Results to CSV (or any format) #393
The converter directory name now reflects that data converters other than CSV may be added in the future.
The parameters are now saved with their errors. The verbose flag now controls the amount of output during processing.
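As a rough illustration of saving parameters together with their errors, the sketch below flattens one fit's values into a CSV header and row. It assumes a hypothetical `ParamMap` (parameter name -> (value, error)) and a hypothetical `<name>_err` column convention; the converter's actual column names may differ.

```cpp
#include <iomanip>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical container: parameter name -> (value, error).
using ParamMap = std::map<std::string, std::pair<double, double>>;

// Header such as "a,a_err,b,b_err": one value column and one error column
// per parameter, in the map's sorted key order.
std::string csvHeader(const ParamMap& params) {
    std::ostringstream out;
    bool first = true;
    for (const auto& entry : params) {
        if (!first) out << ',';
        out << entry.first << ',' << entry.first << "_err";
        first = false;
    }
    return out.str();
}

// One CSV row (one .fit file) with columns in the same order as csvHeader.
std::string csvRow(const ParamMap& params) {
    std::ostringstream out;
    out << std::setprecision(17);  // keep full double precision for fit values
    bool first = true;
    for (const auto& entry : params) {
        if (!first) out << ',';
        out << entry.second.first << ',' << entry.second.second;
        first = false;
    }
    return out.str();
}
```

Because `std::map` iterates in sorted key order, every file with the same parameter set produces the same column order, so rows from many `.fit` files can be appended under a single header.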
Previously, amplitudes sharing a common amp name in the "reaction::sum::ampName" format were required to be constrained to each other. Now the mapping is saved for each unique amplitude group, e.g. "ampName", "sum::ampName", or the full "reaction::sum::ampName" string.
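A minimal sketch of the grouping described in this commit, assuming every amplitude name has exactly three `::`-separated parts. `splitAmpName` and `groupAmplitudes` are hypothetical helpers for illustration, not the converter's API.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Split "reaction::sum::ampName" into {reaction, sum, ampName}.
std::vector<std::string> splitAmpName(const std::string& full) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = full.find("::", start)) != std::string::npos) {
        parts.push_back(full.substr(start, pos - start));
        start = pos + 2;
    }
    parts.push_back(full.substr(start));
    if (parts.size() != 3)
        throw std::invalid_argument("expected reaction::sum::ampName, got " + full);
    return parts;
}

// Map each grouping key ("ampName", "sum::ampName", full name) to the full
// amplitude names that share it.
std::map<std::string, std::vector<std::string>>
groupAmplitudes(const std::vector<std::string>& fullNames) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const auto& full : fullNames) {
        const auto parts = splitAmpName(full);
        groups[parts[2]].push_back(full);
        groups[parts[1] + "::" + parts[2]].push_back(full);
        groups[full].push_back(full);
    }
    return groups;
}
```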
Files are accessed so many times that it makes more sense to store them. File loading now happens in the constructor. Also added a background-file bool for easily tracking whether or not the background files are present. A template for getting the -t values is also added, but not yet implemented. This will also affect how the other distributions are handled.
The largest addition is a function that extracts the values of interest for the beam energy, incorporating signal and background subtraction. To help with this, a min/max finder function was added to find a common min/max value for a branch across files. A few other report lines were added, along with some fixes so that the code compiles properly.
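The common min/max idea can be sketched as follows, assuming the branch values from each file have already been read into memory (the ROOT file reading is omitted). `commonMinMax` is a hypothetical name; a shared range like this lets data, accepted MC, and background histograms use identical binning.

```cpp
#include <algorithm>
#include <stdexcept>
#include <utility>
#include <vector>

// Return the (min, max) of one branch across all files.
std::pair<double, double>
commonMinMax(const std::vector<std::vector<double>>& valuesPerFile) {
    bool found = false;
    double lo = 0.0, hi = 0.0;
    for (const auto& values : valuesPerFile) {
        if (values.empty()) continue;  // skip files with no entries
        const auto [itMin, itMax] = std::minmax_element(values.begin(), values.end());
        if (!found) {
            lo = *itMin;
            hi = *itMax;
            found = true;
        } else {
            lo = std::min(lo, *itMin);
            hi = std::max(hi, *itMax);
        }
    }
    if (!found) throw std::runtime_error("branch is empty in every file");
    return {lo, hi};
}
```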
Uses an RDataFrame method to compute t from the various 4-vector component branches, then fills a histogram with the t values. If background files are present, it also computes a background histogram and subtracts it from the data histogram before calculating statistics. Aside from this, small reports and comments were added.
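A plain-C++ sketch of the subtract-then-compute-statistics step. It uses a hypothetical fixed-binning `Hist` struct instead of ROOT's `TH1`, and assumes the background histogram is already weighted/normalized and shares the data binning. With ROOT histograms, the subtraction itself would typically be `TH1::Add(bkgHist, -1)`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Minimal stand-in for a fixed-binning histogram (not ROOT's TH1).
struct Hist {
    double lo, hi;                 // axis range
    std::vector<double> content;   // bin contents (sum of weights)

    double binCenter(std::size_t i) const {
        double width = (hi - lo) / content.size();
        return lo + (i + 0.5) * width;
    }
};

// Subtract a background histogram with identical binning, bin by bin.
Hist subtract(const Hist& data, const Hist& bkg) {
    if (data.content.size() != bkg.content.size() || data.lo != bkg.lo || data.hi != bkg.hi)
        throw std::invalid_argument("histograms must share binning");
    Hist out = data;
    for (std::size_t i = 0; i < out.content.size(); ++i)
        out.content[i] -= bkg.content[i];
    return out;
}

// Mean of the (background-subtracted) distribution from bin centers.
double mean(const Hist& h) {
    double sumW = 0.0, sumWX = 0.0;
    for (std::size_t i = 0; i < h.content.size(); ++i) {
        sumW += h.content[i];
        sumWX += h.content[i] * h.binCenter(i);
    }
    if (sumW == 0.0) throw std::runtime_error("empty histogram");
    return sumWX / sumW;
}
```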
Removed the mass-branch arg, as the mass can be calculated from the labeled 4-vectors. The indices can now be set by the user. Aside from this, the files have been formatted.
Having the functions return the created histogram: 1. makes the purpose of each function easier to understand and doesn't hide the map filling in the implementation; 2. allows for printing the histogram for debugging purposes. Also added a function to return the total number of events and its error.
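For the total number of events and its error, a common convention for weighted events is $N = \sum_i w_i$ with $\sigma_N = \sqrt{\sum_i w_i^2}$; after background subtraction, the two independent contributions add in quadrature. The sketch below assumes the background weights already include any normalization; `weightedCount` and `totalEvents` are hypothetical names. For a filled ROOT histogram, `TH1::IntegralAndError` provides similar information.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Sum of weights and its statistical error sqrt(sum of w^2).
// For unweighted events (all w = 1) this reduces to N and sqrt(N).
std::pair<double, double> weightedCount(const std::vector<double>& w) {
    double sumW = 0.0, sumW2 = 0.0;
    for (double x : w) {
        sumW += x;
        sumW2 += x * x;
    }
    return {sumW, std::sqrt(sumW2)};
}

// Background-subtracted yield; the data and background samples are
// independent, so their errors add in quadrature.
std::pair<double, double> totalEvents(const std::vector<double>& dataW,
                                      const std::vector<double>& bkgW) {
    auto [nData, errData] = weightedCount(dataW);
    auto [nBkg, errBkg] = weightedCount(bkgW);
    return {nData - nBkg, std::hypot(errData, errBkg)};
}
```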
In order to save the coherent sums, a new AmplitudeParser class was created to parse the amplitude names and categorize them into groups based on the quantum numbers they contain. This relies on known "naming schemes" for the amplitudes. Currently the most common schemes are supported, with instructions for how to add new schemes.
Also added a quick method to get the reaction string, which was helpful for the normInt functions. This commit also includes some formatting.
Test status for this pull request: SUCCESS Summary: /work/halld/pull_request_test/halld_sim^csv_converter/tests/summary.txt Build log: /work/halld/pull_request_test/halld_sim^csv_converter/make_csv_converter.log
Test status for this pull request: SUCCESS Summary: summary.txt Build log: make_csv_converter.log
Thanks for developing this converter. I tested it on a few fit outputs, and it works for standard partial-wave-style fits where the amplitude naming follows the usual expected structure. I did run into a crash for an SDME fit with a slightly different but still valid AmpTools configuration, where the intensity function has multiple factors in one sum. This is different from the mass-independent PWA-style fits that I tested successfully, though similar structures may also appear in some mass-dependent PWA fits. In my case the … The crash happens in … So I think this is mostly a robustness issue rather than a problem with the general converter logic. It would be helpful if the converter could fall back to full amplitude names when the shortened naming scheme cannot be inferred reliably, rather than crashing. I have made the test …
This request is to merge a script and a set of classes that allow any AmpTools-based analysis to convert its .fit results into a comma-separated values (CSV) file. Several plotters already exist for analyzing fit results per bin, and these are very well suited for analyzing the angular distributions, but mass-independent fits must "stitch" together their fit results to observe any behavior of the amplitudes and phases across mass bins. In addition, the hundreds of fit results produced by bootstrap or randomized fits have no standard way to be aggregated. This CSV converter is designed to fill this gap in the analysis process. Below I've provided a short description of each component added.
convert_to_csv
This is the primary script that users will interact with. A user with several fit results `result_1.fit`, `result_2.fit`, ... can simply execute `convert_to_csv` on them, and a CSV will be made where each row corresponds to one `.fit` file and the columns hold the AmpTools fit outputs, parameters, intensities, and phase differences.
This CSV can then be read into a Python pandas DataFrame or a ROOT tree or RDataFrame, or used by practically any programming language, and then plotted. The script is designed to be as generic as possible, so that any AmpTools-based analysis can use it. Some more highlighted features of the script:
- `--data-file` flag: reads the associated data files of the result (with optional weights and/or background) and extracts the info to a CSV file
- `--lower-vertex-indices` flag: tells the `ROOTDataConverter` which 4-vector indices correspond to the upper or lower vertex, allowing the correct calculation of the mass and $-t$
- `--naming-scheme` flag: see AmplitudeParser for more details
FitConverter
Handles the `.fit` -> `.csv` conversion. This class stores: … (see AmplitudeParser below)
It currently supports `.fit` -> `.csv` conversion, but can easily be expanded to any desired file format, because all the results of interest are stored in various maps; writing to CSV is as easy as iterating over the maps.
ROOTDataConverter
This class is responsible for extracting the PWA-related information from a ROOT file. It stores: …
Like the `FitConverter`, it can be extended to file formats beyond CSV. To get the info, the class uses the data and Monte Carlo files associated with the fit, and, if available, it also properly incorporates event weights or background files. As discussed above, to calculate the mass and $-t$, the user specifies the 4-vector indices.
AmplitudeParser
This was the biggest hurdle for generalizing the converter. Often we are not just interested in the individual amplitudes and phases, but also in their (in)coherent sums, like the "total reflectivity contribution" or the "behavior of JL waves summed over the spin projections". The problem is that these sums are typically defined manually, because the amplitudes (and thus their quantum numbers) are user defined. The only way to identify them for grouping is through the naming scheme of the amplitudes, but not everyone uses the same scheme.
This class tries to identify the amplitude naming scheme used, and defines a set of possible sums based on the quantum numbers given in the scheme. It currently supports:
- JLme: the currently recommended generic format
- eJPmL: used for some vector-pseudoscalar analyses
- Lme: a common scheme for two-pseudoscalar analyses
Users can easily extend it to other schemes.
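One way scheme detection with a fallback could look, in the spirit of the review comment above about falling back to full amplitude names: a name matching a hypothetical JLme-style pattern (for example `1P-1+`, meaning $J=1$, $L=P$, $m=-1$, $\epsilon=+$) is grouped with its $m$ partners into a coherent sum, while any other name keeps its full name as its own group. The regex and helper names are illustrative assumptions; the real AmplitudeParser's encoding of $m$ and $\epsilon$ may differ, and the input here is the amplitude part after the last `::`.

```cpp
#include <map>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Quantum numbers from a hypothetical JLme-style name such as "1P-1+".
struct WaveQN {
    std::string J, L, m, e;
};

std::optional<WaveQN> parseJLme(const std::string& name) {
    static const std::regex pattern(R"(^(\d+)([SPDFG])(-?\d+)([+-])$)");
    std::smatch match;
    if (!std::regex_match(name, match, pattern)) return std::nullopt;
    return WaveQN{match[1].str(), match[2].str(), match[3].str(), match[4].str()};
}

// Group amplitudes into coherent sums over m for each (J, L, e).
// Names that do not fit the scheme fall back to their own full name,
// so an unexpected name yields a single-member group instead of a crash.
std::map<std::string, std::vector<std::string>>
groupOverM(const std::vector<std::string>& ampNames) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const auto& name : ampNames) {
        auto qn = parseJLme(name);
        std::string key = qn ? qn->J + qn->L + qn->e : name;
        groups[key].push_back(name);
    }
    return groups;
}
```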
Updates from previous version
For those using the older standalone version of this script shown in the last tutorial, I figure it's worth listing some key differences:
- halld_sim now