4 changes: 3 additions & 1 deletion README/ReleaseNotes/v638/index.md
@@ -32,7 +32,8 @@ The following people have contributed to this new version:
Florian Uhlig, GSI,\
Devajith Valaparambil Sreeramaswamy, CERN/EP-SFT,\
Vassil Vassilev, Princeton,\
Sandro Wenzel, CERN/ALICE
Sandro Wenzel, CERN/ALICE,\
Petr Jacka, Czech Technical University in Prague

## Deprecation and Removal

@@ -134,6 +135,7 @@ If you want to keep using `TList*` return values, you can write a small adapter
to numbers such as 8 would share one 3-d histogram among 8 threads, greatly reducing the memory consumption. This might slow down execution if the histograms
are filled at very high rates. Use a lower number in this case.
- The Snapshot method has been refactored so that it no longer needs compile-time information (i.e. either template arguments or JIT-ting) to know the input column types. This means that any Snapshot call that specifies the template arguments, e.g. `Snapshot<int, float>(..., {"intCol", "floatCol"})`, is now redundant and the template arguments can safely be removed from the call. At the same time, Snapshot no longer needs to JIT-compile the column types, which in practice gives large speedups depending on the number of columns that need to be written to disk; in certain cases (e.g. when writing O(10000) columns) the speedup can be larger than an order of magnitude. The templated Snapshot overload is now deprecated, issues a compile-time warning when called, and is scheduled for removal in ROOT 6.40.
- Added the `HistoNSparseD` action, which fills a sparse N-dimensional histogram (`THnSparseD`); see the sketch below.
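A minimal sketch of how these two items look in user code, assuming a ROOT build that contains both changes; all tree, file and column names below are made up for illustration:

```cpp
// Sketch only: column, tree and file names are illustrative.
#include <ROOT/RDataFrame.hxx>

void release_notes_sketch()
{
   ROOT::RDataFrame df{100}; // 100 empty events
   auto dfCols = df.Define("intCol", [] { return 1; })
                   .Define("floatCol", [] { return 2.f; })
                   .Define("x0", [] { return 1.; })
                   .Define("x1", [] { return 2.; })
                   .Define("x2", [] { return 3.; })
                   .Define("x3", [] { return 4.; });

   // Snapshot no longer needs template arguments (the templated overload is deprecated):
   dfCols.Snapshot("events", "snapshot_sketch.root", {"intCol", "floatCol"});

   // New HistoNSparseD action: fill a sparse 4-dimensional histogram from four columns.
   auto hs = dfCols.HistoNSparseD(
      {"hs", "sparse hist", 4, {40, 40, 40, 40}, {0., 0., 0., 0.}, {10., 10., 10., 10.}},
      {"x0", "x1", "x2", "x3"});
   hs->GetEntries(); // triggers the event loop for the histogram
}
```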

## Python Interface

16 changes: 13 additions & 3 deletions bindings/distrdf/python/DistRDF/Operation.py
@@ -46,12 +46,21 @@
# The histogram model can be passed as the keyword argument 'model'. All
# Histo*D specializations have the same name for this argument. If it is
# present, we know the execution can proceed safely.
if not "model" in self.kwargs:

Check failure on line 49 (GitHub Actions / ruff): bindings/distrdf/python/DistRDF/Operation.py:49:16: E713 Test for membership should be `not in`
# If the keyword argument was not passed, we need to check the first
# positional argument. In all Histo*D overload where it is present,
# it is always the first argument.
if not isinstance(self.args[0],
(tuple, ROOT.RDF.TH1DModel, ROOT.RDF.TH2DModel, ROOT.RDF.TH3DModel, ROOT.RDF.THnDModel)):
if not isinstance(
self.args[0],
(
tuple,
ROOT.RDF.TH1DModel,
ROOT.RDF.TH2DModel,
ROOT.RDF.TH3DModel,
ROOT.RDF.THnDModel,
ROOT.RDF.THnSparseDModel,
),
):
message = (
"Creating a histogram without a model is not supported in distributed mode. Please make sure to "
"specify the histogram model when rerunning the distributed RDataFrame application. For example:\n\n"
@@ -104,6 +113,7 @@
"Histo2D": Histo,
"Histo3D": Histo,
"HistoND": Histo,
"HistoNSparseD": Histo,
"Max": Action,
"Mean": Action,
"Min": Action,
@@ -116,7 +126,7 @@
"StdDev": Action,
"Sum": Action,
"VariationsFor": VariationsFor,
"Vary": Transformation
"Vary": Transformation,
}


11 changes: 11 additions & 0 deletions bindings/distrdf/test/test_operation.py
@@ -1,8 +1,8 @@
import unittest

from DistRDF import Operation

import ROOT

Check failure on line 5 (GitHub Actions / ruff): bindings/distrdf/test/test_operation.py:1:1: I001 Import block is un-sorted or un-formatted


class ClassifyTest(unittest.TestCase):
@@ -112,3 +112,14 @@
"""Creating a histogram without model raises ValueError."""
with self.assertRaises(ValueError):
_ = Operation.create_op("HistoND", ["a", "b", "c", "d"])

def test_histonsparsed_with_thnsparsedmodel(self):
"""THnDModel"""
op = Operation.create_op("HistoNSparseD", ROOT.RDF.THnSparseDModel(), ["a", "b", "c", "d"])
self.assertIsInstance(op, Operation.Histo)
self.assertEqual(op.name, "HistoNSparseD")

def test_histonsparsed_without_model(self):
"""Creating a histogram without model raises ValueError."""
with self.assertRaises(ValueError):
_ = Operation.create_op("HistoNSparseD", ["a", "b", "c", "d"])
11 changes: 10 additions & 1 deletion hist/hist/inc/THn.h
@@ -36,7 +36,7 @@ class THn: public THnBase {
THn() = default;
THn(const char* name, const char* title, Int_t dim, const Int_t* nbins,
const Double_t* xmin, const Double_t* xmax);

THn(const char *name, const char *title, const std::vector<TAxis> &axes);
THn(const char *name, const char *title, Int_t dim, const Int_t *nbins,
const std::vector<std::vector<double>> &xbins);

@@ -226,6 +226,15 @@ class THnT: public THn {
THn(name, title, dim, nbins, xmin, xmax),
fArray(dim, nbins, true) {}

THnT(const char *name, const char *title, const std::vector<TAxis> &axes) : THn(name, title, axes)
{
const Int_t dim = axes.size();
std::vector<Int_t> nbins(dim);
for (Int_t i = 0; i < dim; i++)
nbins[i] = axes.at(i).GetNbins();
fArray = TNDArrayT<T>(dim, nbins.data(), true);
}

THnT(const char *name, const char *title, Int_t dim, const Int_t *nbins,
const std::vector<std::vector<double>> &xbins)
: THn(name, title, dim, nbins, xbins), fArray(dim, nbins, true)
6 changes: 5 additions & 1 deletion hist/hist/inc/THnSparse.h
@@ -43,7 +43,6 @@ class THnSparse: public THnBase {
TExMap fBinsContinued; ///<! Filled bins for non-unique hashes, containing pairs of (bin index 0, bin index 1)
THnSparseCompactBinCoord *fCompactCoord; ///<! Compact coordinate

THnSparse(const THnSparse&) = delete;
THnSparse& operator=(const THnSparse&) = delete;

protected:
@@ -77,6 +76,11 @@
const std::vector<TAxis>& axes,
Int_t chunksize = 1024 * 16);

THnSparse(const char *name, const char *title, Int_t dim, const Int_t *nbins,
const std::vector<std::vector<double>> &xbins, Int_t chunksize = 1024 * 16);

THnSparse(const THnSparse &other);

~THnSparse() override;

static THnSparse* CreateSparse(const char* name, const char* title,
9 changes: 9 additions & 0 deletions hist/hist/src/THn.cxx
@@ -186,6 +186,15 @@ THn::THn(const char* name, const char* title,
{
}

THn::THn(const char *name, const char *title, const std::vector<TAxis> &axes) : THnBase(name, title, axes)
{
const Int_t dim = axes.size();
std::vector<Int_t> nbins(dim);
for (Int_t i = 0; i < dim; i++)
nbins[i] = axes.at(i).GetNbins();
fSumw2 = TNDArrayT<Double_t>(dim, nbins.data(), kTRUE /*overflow*/);
}

THn::THn(const char *name, const char *title, Int_t dim, const Int_t *nbins,
const std::vector<std::vector<double>> &xbins)
: THnBase(name, title, dim, nbins, xbins), fSumw2(dim, nbins, kTRUE /*overflow*/)
38 changes: 38 additions & 0 deletions hist/hist/src/THnSparse.cxx
@@ -630,6 +630,44 @@ THnBase(name, title, axes),
}

////////////////////////////////////////////////////////////////////////////////
/// Construct a THnSparse with dim dimensions and unequal (variable) binning.
/// nbins gives the number of bins per dimension and xbins the bin edges for each dimension.
/// chunksize is the number of bins stored per content chunk.

THnSparse::THnSparse(const char *name, const char *title, Int_t dim, const Int_t *nbins,
const std::vector<std::vector<double>> &xbins, Int_t chunksize)
: THnBase(name, title, dim, nbins, xbins), fChunkSize(chunksize), fFilledBins(0), fCompactCoord(nullptr)
{
fCompactCoord = new THnSparseCompactBinCoord(dim, nbins);
fBinContent.SetOwner();
}
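A brief usage sketch of this constructor (the axis edges below are arbitrary; the unit test further down in this diff exercises the same code path):

```cpp
// Sketch: a THnSparseD with 2 dimensions and unequal (variable) bin edges per axis.
#include "THnSparse.h"
#include <vector>

void sparse_unequal_bins_sketch()
{
   std::vector<Int_t> nbins = {3, 4};
   std::vector<std::vector<double>> edges = {{0., 1., 2., 5.},        // 3 bins on axis 0
                                             {0., 0.5, 1., 2., 10.}}; // 4 bins on axis 1
   THnSparseD hs("hs", "unequal binning", 2, nbins.data(), edges);
   // Bin content is still allocated lazily, in chunks of the default chunksize.
}
```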

////////////////////////////////////////////////////////////////////////////////
/// Construct a THnSparse as a copy of "other"

THnSparse::THnSparse(const THnSparse &other)
: THnBase(other),
fChunkSize(other.fChunkSize),
fFilledBins(other.fFilledBins),
fBins(other.fBins),
fBinsContinued(other.fBinsContinued),
fCompactCoord(nullptr)
{

TObjArray *copiedContent = (TObjArray *)other.fBinContent.Clone();
fBinContent = *copiedContent;
copiedContent->SetOwner(kFALSE);
delete copiedContent;
fBinContent.SetOwner(kTRUE);

Int_t dim = other.GetNdimensions();
std::vector<Int_t> nbins(dim);
for (Int_t i = 0; i < dim; i++)
nbins[i] = other.GetAxis(i)->GetNbins();

fCompactCoord = new THnSparseCompactBinCoord(dim, nbins.data());
}

/// Destruct a THnSparse

THnSparse::~THnSparse() {
37 changes: 37 additions & 0 deletions hist/hist/test/THn.cxx
@@ -5,6 +5,43 @@
#include "TH1.h"
#include "TH2.h"

// Constructors for THn and THnSparse
TEST(THn, Constructors)
{

std::vector<int> nbins = {4, 5, 6};
std::vector<double> xmin = {0., 0., 0.};
std::vector<double> xmax = {4., 5., 6.};

std::vector<std::vector<double>> edges = {{0, 1, 2, 3, 4}, {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 4, 5, 6}};

std::vector<TAxis> axes = {TAxis(nbins[0], xmin[0], xmax[0]), TAxis(nbins[1], xmin[1], xmax[1]),
TAxis(nbins[2], xmin[2], xmax[2])};

THnD hn_v1("hn_v1", "hn_v1", 3, nbins.data(), xmin.data(), xmax.data());
THnD hn_v2("hn_v2", "hn_v2", 3, nbins.data(), edges);
THnD hn_v3("hn_v3", "hn_v3", axes);
THnI hn_v4("hn_v4", "hn_v4", axes);
THnD hn_v5(hn_v1);

THnSparseD hs_v1("hs_v1", "hs_v1", 3, nbins.data(), xmin.data(), xmax.data());
THnSparseD hs_v2("hs_v2", "hs_v2", 3, nbins.data(), edges);
THnSparseD hs_v3("hs_v3", "hs_v3", axes);
THnSparseI hs_v4("hs_v4", "hs_v4", axes);
THnSparseD hs_v5(hs_v1);

std::vector<THnBase *> hns = {&hn_v1, &hn_v2, &hn_v3, &hn_v4, &hn_v5, &hs_v1, &hs_v2, &hs_v3, &hs_v4, &hs_v5};
for (THnBase *hn : hns) {
EXPECT_EQ(hn->GetNdimensions(), 3);
for (int dim = 0; dim < 3; ++dim) {
EXPECT_EQ(hn->GetAxis(dim)->GetNbins(), nbins[dim]);
for (int bin = 1; bin <= (int)edges[dim].size(); ++bin) {
EXPECT_DOUBLE_EQ(hn->GetAxis(dim)->GetBinLowEdge(bin), edges[dim][bin - 1]);
}
}
}
}

// Filling THn
TEST(THn, Fill) {
Int_t bins[2] = {2, 3};
22 changes: 22 additions & 0 deletions roottest/python/distrdf/backends/check_reducer_merge.py
@@ -3,7 +3,7 @@
import numpy
import pytest
import ROOT
from DistRDF.Backends import Dask

Check failure on line 6 (GitHub Actions / ruff): roottest/python/distrdf/backends/check_reducer_merge.py:6:30: F401 `DistRDF.Backends.Dask` imported but unused


class TestReducerMerge:
@@ -125,6 +125,28 @@
assert histond_distrdf.GetEntries() == histond_rdf.GetEntries()
assert histond_distrdf.GetNbins() == histond_rdf.GetNbins()

def test_histonsparsed_merge(self, payload):
"""Check the working of HistoND merge operation in the reducer."""
nbins = (10, 10, 10, 10)
xmin = (0.0, 0.0, 0.0, 0.0)
xmax = (100.0, 100.0, 100.0, 100.0)
modelTHNSparseD = ("name", "title", 4, nbins, xmin, xmax)
colnames = ("x0", "x1", "x2", "x3")

connection, _ = payload
distrdf = ROOT.RDataFrame(100, executor=connection)

rdf = ROOT.RDataFrame(100)

distrdf_withcols = self.define_four_columns(distrdf, colnames)
rdf_withcols = self.define_four_columns(rdf, colnames)

histond_distrdf = distrdf_withcols.HistoNSparseD(modelTHNSparseD, colnames)
histond_rdf = rdf_withcols.HistoNSparseD(modelTHNSparseD, colnames)

assert histond_distrdf.GetEntries() == histond_rdf.GetEntries()
assert histond_distrdf.GetNbins() == histond_rdf.GetNbins()

def test_profile1d_merge(self, payload):
"""Check the working of Profile1D merge operation in the reducer."""
# Operations with DistRDF
30 changes: 30 additions & 0 deletions tree/dataframe/inc/ROOT/RDF/HistoModels.hxx
@@ -20,6 +20,10 @@ class TH3D;
template <typename T>
class THnT;
using THnD = THnT<double>;
template <typename T>
class THnSparseT;
class TArrayD;
using THnSparseD = THnSparseT<TArrayD>;
class TProfile;
class TProfile2D;

@@ -123,6 +127,32 @@ struct THnDModel {
std::shared_ptr<::THnD> GetHistogram() const;
};

struct THnSparseDModel {
TString fName;
TString fTitle;
int fDim;
std::vector<int> fNbins;
std::vector<double> fXmin;
std::vector<double> fXmax;
std::vector<std::vector<double>> fBinEdges;
Int_t fChunkSize;

THnSparseDModel() = default;
THnSparseDModel(const THnSparseDModel &) = default;
~THnSparseDModel();
THnSparseDModel(const ::THnSparseD &h);
THnSparseDModel(const char *name, const char *title, int dim, const int *nbins, const double *xmin,
const double *xmax, Int_t chunksize = 1024 * 16);
// alternate version with std::vector to allow more convenient initialization from PyROOT
THnSparseDModel(const char *name, const char *title, int dim, const std::vector<int> &nbins,
const std::vector<double> &xmin, const std::vector<double> &xmax, Int_t chunksize = 1024 * 16);
THnSparseDModel(const char *name, const char *title, int dim, const int *nbins,
const std::vector<std::vector<double>> &xbins, Int_t chunksize = 1024 * 16);
THnSparseDModel(const char *name, const char *title, int dim, const std::vector<int> &nbins,
const std::vector<std::vector<double>> &xbins, Int_t chunksize = 1024 * 16);
std::shared_ptr<::THnSparseD> GetHistogram() const;
};
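A short sketch of the std::vector-based construction mentioned in the comment above, followed by materializing the histogram; the names are placeholders:

```cpp
// Sketch: build a THnSparseDModel from std::vectors and obtain the underlying THnSparseD.
#include <ROOT/RDF/HistoModels.hxx>
#include "THnSparse.h"
#include <vector>

void model_sketch()
{
   std::vector<int> nbins = {10, 10};
   std::vector<double> xmin = {0., 0.};
   std::vector<double> xmax = {1., 1.};
   ROOT::RDF::THnSparseDModel model("hs", "model-built sparse histogram", 2, nbins, xmin, xmax);
   auto hs = model.GetHistogram(); // std::shared_ptr<THnSparseD>
   // hs->GetNdimensions() == 2 here.
}
```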

struct TProfile1DModel {
TString fName;
TString fTitle;
3 changes: 2 additions & 1 deletion tree/dataframe/inc/ROOT/RDF/InterfaceUtils.hxx
@@ -89,6 +89,7 @@ struct Histo1D{};
struct Histo2D{};
struct Histo3D{};
struct HistoND{};
struct HistoNSparseD{};
struct Graph{};
struct GraphAsymmErrors{};
struct Profile1D{};
@@ -121,7 +122,7 @@ struct HistoUtils<T, false> {
static bool HasAxisLimits(T &) { return true; }
};

// Generic filling (covers Histo2D, HistoND, Profile1D and Profile2D actions, with and without weights)
// Generic filling (covers Histo2D, HistoND, HistoNSparseD, Profile1D and Profile2D actions, with and without weights)
template <typename... ColTypes, typename ActionTag, typename ActionResultType, typename PrevNodeType>
std::unique_ptr<RActionBase>
BuildAction(const ColumnNames_t &bl, const std::shared_ptr<ActionResultType> &h, const unsigned int nSlots,
75 changes: 75 additions & 0 deletions tree/dataframe/inc/ROOT/RDF/RInterface.hxx
@@ -40,6 +40,7 @@
#include "TH2.h" // For Histo actions
#include "TH3.h" // For Histo actions
#include "THn.h"
#include "THnSparse.h"
#include "TProfile.h"
#include "TProfile2D.h"
#include "TStatistic.h"
@@ -2286,6 +2287,80 @@ public:
columnList.size());
}

////////////////////////////////////////////////////////////////////////////
/// \brief Fill and return a sparse N-dimensional histogram (*lazy action*).
/// \tparam FirstColumn The first type of the column the values of which are used to fill the object. Inferred if not
/// present.
/// \tparam OtherColumns A list of the other types of the columns the values of which are used to fill the
/// object.
/// \param[in] model The returned histogram will be constructed using this as a model.
/// \param[in] columnList
/// A list containing the names of the columns that will be passed when calling `Fill`.
/// (N columns for unweighted filling, or N+1 columns for weighted filling)
/// \return the N-dimensional histogram wrapped in a RResultPtr.
///
/// This action is *lazy*: upon invocation of this method the calculation is
/// booked but not executed. See RResultPtr documentation.
///
/// ### Example usage:
/// ~~~{.cpp}
/// auto myFilledObj = myDf.HistoNSparseD<float, float, float, float>({"name","title", 4,
/// {40,40,40,40}, {20.,20.,20.,20.}, {60.,60.,60.,60.}},
/// {"col0", "col1", "col2", "col3"});
/// ~~~
///
template <typename FirstColumn, typename... OtherColumns> // need FirstColumn to disambiguate overloads
RResultPtr<::THnSparseD> HistoNSparseD(const THnSparseDModel &model, const ColumnNames_t &columnList)
{
std::shared_ptr<::THnSparseD> h(nullptr);
{
ROOT::Internal::RDF::RIgnoreErrorLevelRAII iel(kError);
h = model.GetHistogram();

if (int(columnList.size()) == (h->GetNdimensions() + 1)) {
h->Sumw2();
} else if (int(columnList.size()) != h->GetNdimensions()) {
throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
}
}
return CreateAction<RDFInternal::ActionTags::HistoNSparseD, FirstColumn, OtherColumns...>(columnList, h, h,
fProxiedPtr);
}

////////////////////////////////////////////////////////////////////////////
/// \brief Fill and return a sparse N-dimensional histogram (*lazy action*).
/// \param[in] model The returned histogram will be constructed using this as a model.
/// \param[in] columnList A list containing the names of the columns that will be passed when calling `Fill`
/// (N columns for unweighted filling, or N+1 columns for weighted filling)
/// \return the N-dimensional histogram wrapped in a RResultPtr.
///
/// This action is *lazy*: upon invocation of this method the calculation is
/// booked but not executed. Also see RResultPtr.
///
/// ### Example usage:
/// ~~~{.cpp}
/// auto myFilledObj = myDf.HistoNSparseD({"name","title", 4,
/// {40,40,40,40}, {20.,20.,20.,20.}, {60.,60.,60.,60.}},
/// {"col0", "col1", "col2", "col3"});
/// ~~~
///
RResultPtr<::THnSparseD> HistoNSparseD(const THnSparseDModel &model, const ColumnNames_t &columnList)
{
std::shared_ptr<::THnSparseD> h(nullptr);
{
ROOT::Internal::RDF::RIgnoreErrorLevelRAII iel(kError);
h = model.GetHistogram();

if (int(columnList.size()) == (h->GetNdimensions() + 1)) {
h->Sumw2();
} else if (int(columnList.size()) != h->GetNdimensions()) {
throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
}
}
return CreateAction<RDFInternal::ActionTags::HistoNSparseD, RDFDetail::RInferredType>(
columnList, h, h, fProxiedPtr, columnList.size());
}

////////////////////////////////////////////////////////////////////////////
/// \brief Fill and return a TGraph object (*lazy action*).
/// \tparam X The type of the column used to fill the x axis.
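To close this section, a hedged sketch of the weighted variant of the new HistoNSparseD action documented above: per the overloads shown in RInterface.hxx, passing N+1 columns for an N-dimensional model makes the last column a per-entry weight and enables Sumw2 on the result. Column names are illustrative.

```cpp
// Sketch: weighted HistoNSparseD fill (4 coordinate columns + 1 weight column).
#include <ROOT/RDataFrame.hxx>

void weighted_sparse_sketch()
{
   ROOT::RDataFrame df{100};
   auto dfCols = df.Define("x0", [] { return 1.; })
                   .Define("x1", [] { return 2.; })
                   .Define("x2", [] { return 3.; })
                   .Define("x3", [] { return 4.; })
                   .Define("w", [] { return 0.5; });

   // Five columns for a 4-dimensional model: the extra "w" column is used as the fill weight.
   auto hs = dfCols.HistoNSparseD(
      {"hs", "weighted sparse hist", 4, {40, 40, 40, 40}, {0., 0., 0., 0.}, {10., 10., 10., 10.}},
      {"x0", "x1", "x2", "x3", "w"});
   hs->GetEntries();
}
```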