
Commit ffdab66

[TMVA][Preprocessing] - Additional normalisation method
Add a scaling VarTransform to TMVA preprocessing. Like normalisation, it scales the data linearly, but the sign of each input value is retained in the output. I have extended the VariableNormalizeTransform class, in the style of the VariableGaussTransform class, so that it can transform data to lie within the range [-1,1] without an offset; the sign of the input data is therefore unchanged by the transformation. This is proving essential for my neural network analyses, which treat detector hit data as an image classification problem and use ReLU activation functions at the beginning of the network. I have also added a description to the TMVA documentation.
1 parent eecf269 commit ffdab66


4 files changed, 66 additions, 17 deletions

documentation/tmva/UsersGuide/DataPreprocessing.tex

Lines changed: 18 additions & 6 deletions
@@ -13,16 +13,17 @@ \section{Data Preprocessing}
 decomposition are available for input and target variables, gaussianization, uniformization and decorrelation
 discussed below can only be used for input variables.
 
-Apart from five variable transformation methods mentioned above, an unsupervised variable selection method Variance Threshold is also implemented in TMVA. It follows a completely different processing pipeline. It is discussed in detail in section \ref{sec:varianceThreshold}.
+Apart from six variable transformation methods mentioned above, an unsupervised variable selection method Variance Threshold is also implemented in TMVA. It follows a completely different processing pipeline. It is discussed in detail in section \ref{sec:varianceThreshold}.
 
 \subsection{Transforming input variables}
 \label{sec:variableTransform}
 
-Currently five preprocessing\index{Discriminating variables!preprocessing of}
+Currently six preprocessing\index{Discriminating variables!preprocessing of}
 transformations\index{Discriminating variables!transformation of}
 are implemented in TMVA:
 \begin{itemize}
 \item variable normalisation;
+\item variable scaling;
 \item decorrelation via the square-root of the covariance matrix ;
 \item decorrelation via a principal component decomposition;
 \item transformation of the variables into Uniform distributions (``Uniformization'').
@@ -100,6 +101,17 @@ \subsubsection{Variable normalisation\index{Discriminating variables!normalisati
 Normalisation may also render minimisation processes, such as the adjustment of
 neural network weights, more effective.
 
+\subsubsection{Variable scaling\index{Discriminating variables!scaling of}}
+\label{sec:scaling}
+
+The larger absolute value of the minimum and maximum values is determined from the training events
+and used to scale the dataset to lie within $[-1,1]$. There is no offset added and thus the original
+sign of the input is maintained. E.g Input data with a range $[x,y]$ where $|y|>|x|$ will transform
+to the range $[x/|y|,1]$.
+As with Normalisation, this may also render minimisation processes, such as the adjustment of
+neural network weights, more effective especially if you are using sign sensitive activation
+functions such as the a rectified linear unit ( ReLU ).
+
 \subsubsection{Variable decorrelation\index{Discriminating variables!decorrelation of}}
 \label{sec:decorrelation}
 
@@ -225,10 +237,10 @@ \subsubsection{Booking and chaining transformations for some or all input variab
 Variable transformations to be applied prior to the MVA training (and application)
 can be defined independently for each MVA method with the booking option
 {\tt VarTransform=<type>}, where {\tt <type>} denotes the desired transformation
-(or chain of transformations). The available transformation types are normalisation,
+(or chain of transformations). The available transformation types are normalisation, scaling,
 decorrelation, principal component analysis and Gaussianisation, which are labelled by
-\code{Norm}, \code{Deco}, \code{PCA}, \code{Uniform}, \code{Gauss}, respectively, or, equivalently,
-by the short-hand notations \code{N}, \code{D}, \code{P}, \code{U} , \code{G}.
+\code{Norm}, \code{Scale}, \code{Deco}, \code{PCA}, \code{Uniform}, \code{Gauss}, respectively, or, equivalently,
+by the short-hand notations \code{N}, \code{S}, \code{D}, \code{P}, \code{U} , \code{G}.
 
 Transformations can be {\em chained} allowing the consecutive application of all defined
 transformations to the variables for each event.
@@ -326,7 +338,7 @@ \subsection{Variable selection based on variance}
 \label{eq:meanecalculation}
 \mu_j = \frac{\sum_{i=1}^N w_i x_{j}(i)}{\sum_{i=1}^N w_i}
 \eeq
-Unlike above five variable transformation method, this Variance Threshold method is implemented in DataLoader class. After loading dataset in the DataLoader object, we can apply this method. It returns a new DataLoader with the selected variables which have variance strictly greater than the threshold value passed by user. Default value of threshold is zero i.e. remove the variables which have same value in all the events.
+Unlike above six variable transformation method, this Variance Threshold method is implemented in DataLoader class. After loading dataset in the DataLoader object, we can apply this method. It returns a new DataLoader with the selected variables which have variance strictly greater than the threshold value passed by user. Default value of threshold is zero i.e. remove the variables which have same value in all the events.
 
 \begin{codeexample}
 \begin{tmvacode}
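The new "Variable scaling" subsection describes dividing each variable by the larger of |min| and |max| taken from the training events, with no offset. The following is a minimal standalone C++ sketch contrasting that rule with the existing normalisation formula used in VariableNormalizeTransform::Transform. RangeParams, FitRange, Normalise and Scale are hypothetical names chosen for illustration, not TMVA API, and the sketch uses double rather than Float_t and ignores TMVA's per-class bookkeeping.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-variable range, standing in for the min/max values that
// VariableNormalizeTransform derives from the training events.
struct RangeParams {
    double min;
    double max;
};

// Determine the minimum and maximum of one variable over the training values.
RangeParams FitRange(const std::vector<double>& training) {
    assert(!training.empty());
    auto mm = std::minmax_element(training.begin(), training.end());
    return {*mm.first, *mm.second};
}

// "Norm": linear map of [min, max] onto [-1, 1]; the offset can flip signs.
double Normalise(double x, const RangeParams& p) {
    assert(p.max > p.min);  // a constant variable cannot be normalised
    return (x - p.min) / (p.max - p.min) * 2.0 - 1.0;
}

// "Scale": divide by max(|min|, |max|) with no offset, so the sign is kept.
double Scale(double x, const RangeParams& p) {
    const double m = std::max(std::fabs(p.min), std::fabs(p.max));
    assert(m > 0.0);  // an all-zero variable has nothing to scale by
    return x / m;
}
```

With training values {-2, 0.5, 4}, Scale maps the range [-2, 4] onto [-0.5, 1], matching the $[x/|y|,1]$ rule in the documentation, whereas Normalise sends the positive value 0.5 to a negative number. That sign change is what matters for a ReLU placed at the start of a network.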

tmva/tmva/inc/TMVA/VariableNormalizeTransform.h

Lines changed: 3 additions & 1 deletion
@@ -51,7 +51,7 @@ namespace TMVA {
 
 typedef std::vector<Float_t> FloatVector;
 typedef std::vector< FloatVector > VectorOfFloatVectors;
-VariableNormalizeTransform( DataSetInfo& dsi );
+VariableNormalizeTransform( DataSetInfo& dsi, TString strcor="" );
 virtual ~VariableNormalizeTransform( void );
 
 void Initialize() override;
@@ -77,6 +77,8 @@ namespace TMVA {
 
 private:
 
+Bool_t fNoOffset;
+
 void CalcNormalizationParams( const std::vector< Event*>& events);
 
 // mutable Event* fTransformedEvent;

tmva/tmva/src/VariableNormalizeTransform.cxx

Lines changed: 41 additions & 10 deletions
@@ -54,9 +54,13 @@ Linear interpolation class
 ////////////////////////////////////////////////////////////////////////////////
 /// constructor
 
-TMVA::VariableNormalizeTransform::VariableNormalizeTransform( DataSetInfo& dsi )
-: VariableTransformBase( dsi, Types::kNormalized, "Norm" )
+TMVA::VariableNormalizeTransform::VariableNormalizeTransform( DataSetInfo& dsi, TString strcor )
+: VariableTransformBase( dsi, Types::kNormalized, "Norm" ),
+fNoOffset(kFALSE)
 {
+if (strcor=="Scale") {fNoOffset = kTRUE;
+SetName("Scale");
+}
 }
 
 ////////////////////////////////////////////////////////////////////////////////
@@ -143,10 +147,16 @@ const TMVA::Event* TMVA::VariableNormalizeTransform::Transform( const TMVA::Even
 
 min = minVector.at(iidx);
 max = maxVector.at(iidx);
-Float_t offset = min;
-Float_t scale = 1.0/(max-min);
 
-Float_t valnorm = (val-offset)*scale * 2 - 1;
+Float_t valnorm;
+if (!fNoOffset) {
+Float_t offset = min;
+Float_t scale = 1.0/(max-min);
+valnorm = (val-offset)*scale * 2 - 1;
+} else {
+fabs(max)>fabs(min) ? valnorm=val/fabs(max) : valnorm=val/fabs(min);
+}
+
 output.push_back( valnorm );
 
 ++iidx;
@@ -188,10 +198,16 @@ const TMVA::Event* TMVA::VariableNormalizeTransform::InverseTransform(const TMVA
 
 min = minVector.at(iidx);
 max = maxVector.at(iidx);
-Float_t offset = min;
-Float_t scale = 1.0/(max-min);
 
-Float_t valnorm = offset+((val+1)/(scale * 2));
+Float_t valnorm;
+if (!fNoOffset) {
+Float_t offset = min;
+Float_t scale = 1.0/(max-min);
+valnorm = offset+((val+1)/(scale * 2));
+} else {
+fabs(max)>fabs(min) ? valnorm=val*fabs(max) : valnorm=val*fabs(min);
+}
+
 output.push_back( valnorm );
 
 ++iidx;
@@ -282,8 +298,15 @@ std::vector<TString>* TMVA::VariableNormalizeTransform::GetTransformationStrings
 
 Char_t type = (*itGet).first;
 UInt_t idx = (*itGet).second;
-Float_t offset = min;
-Float_t scale = 1.0/(max-min);
+Float_t offset;
+Float_t scale;
+if (!fNoOffset) {
+offset = min;
+scale = 1.0/(max-min);
+} else {
+offset = 0.;
+fabs(max)>fabs(min) ? scale=.5/fabs(max) : scale=.5/fabs(min);
+}
 TString str("");
 VariableInfo& varInfo = (type=='v'?fDsi.GetVariableInfo(idx):(type=='t'?fDsi.GetTargetInfo(idx):fDsi.GetSpectatorInfo(idx)));
 
@@ -329,6 +352,7 @@ void TMVA::VariableNormalizeTransform::AttachXMLTo(void* parent)
 {
 void* trfxml = gTools().AddChild(parent, "Transform");
 gTools().AddAttr(trfxml, "Name", "Normalize");
+gTools().AddAttr(trfxml, "UseOffsetOrNot", (fNoOffset?"NoOffset":"UseOffset") );
 VariableTransformBase::AttachXMLTo( trfxml );
 
 Int_t numC = (GetNClasses()<= 1)?1:GetNClasses()+1;
@@ -353,6 +377,13 @@ void TMVA::VariableNormalizeTransform::AttachXMLTo(void* parent)
 
 void TMVA::VariableNormalizeTransform::ReadFromXML( void* trfnode )
 {
+TString UseOffsetOrNot;
+
+gTools().ReadAttr(trfnode, "UseOffsetOrNot", UseOffsetOrNot );
+
+if (UseOffsetOrNot == "NoOffset") fNoOffset = kTRUE;
+else fNoOffset = kFALSE;
+
 Bool_t newFormat = kFALSE;
 
 void* inpnode = NULL;
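The patched Transform() and InverseTransform() must stay exact inverses in both modes, and the chosen mode is persisted through the "UseOffsetOrNot" XML attribute ("NoOffset" or "UseOffset"). The following is a minimal standalone C++ sketch of that round trip, assuming a single variable with known min and max. The class LinearTransform and its member names are hypothetical and only mirror the arithmetic in the diff; it is not the TMVA class.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical mirror of the two modes in the patched Transform()/InverseTransform().
class LinearTransform {
public:
    LinearTransform(double min, double max, bool noOffset)
        : fMin(min), fMax(max), fNoOffset(noOffset) {
        // Each mode needs a non-zero denominator.
        assert(noOffset ? Denominator() > 0.0 : max > min);
    }

    double Forward(double val) const {
        if (!fNoOffset)  // "Norm": map [min, max] onto [-1, 1]
            return (val - fMin) / (fMax - fMin) * 2.0 - 1.0;
        return val / Denominator();  // "Scale": no offset, sign preserved
    }

    double Inverse(double val) const {
        if (!fNoOffset)  // same as offset + (val + 1) / (scale * 2), scale = 1/(max - min)
            return fMin + (val + 1.0) * (fMax - fMin) / 2.0;
        return val * Denominator();
    }

    // Value written to the "UseOffsetOrNot" attribute, as in AttachXMLTo().
    std::string OffsetAttr() const { return fNoOffset ? "NoOffset" : "UseOffset"; }

    // As in ReadFromXML(): only the exact string "NoOffset" selects scaling.
    static bool NoOffsetFromAttr(const std::string& attr) { return attr == "NoOffset"; }

private:
    // Larger absolute value of min and max, used only by the scaling mode.
    double Denominator() const { return std::fmax(std::fabs(fMin), std::fabs(fMax)); }

    double fMin;
    double fMax;
    bool fNoOffset;
};
```

Because only the exact string NoOffset switches on scaling when reading, any other stored value falls back to ordinary normalisation, which matches the if/else in the patched ReadFromXML().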

tmva/tmva/src/VariableTransform.cxx

Lines changed: 4 additions & 0 deletions
@@ -158,6 +158,10 @@ void CreateVariableTransforms(const TString& trafoDefinitionIn,
 if (variables.Length() == 0) variables = "_V_,_T_";
 transformation = new VariableNormalizeTransform(dataInfo);
 }
+else if (trName == "S" || trName == "Scale" || trName == "ScaleNorm" ) {
+if (variables.Length() == 0) variables = "_V_,_T_";
+transformation = new VariableNormalizeTransform(dataInfo,"Scale");
+}
 else
 log << kFATAL << Form("Dataset[%s] : ",dataInfo.GetName())
 << "<ProcessOptions> Variable transform '"
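The new branch in CreateVariableTransforms() accepts three spellings for the scaling mode and, like the normalisation branch, applies the transform to inputs and targets ("_V_,_T_") when no variable selection is given. The following is a standalone C++ sketch of that dispatch. NormMode, TransformChoice and Choose are hypothetical names, and the sketch covers only the N/Norm spellings documented in the Users Guide plus the three scaling spellings from this diff. Other transform names, which TMVA handles in branches not shown here, are reported as unknown.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two VariableNormalizeTransform modes.
enum class NormMode { kNorm, kScale, kUnknown };

struct TransformChoice {
    NormMode mode;
    std::string variables;  // variable selection after defaulting
};

TransformChoice Choose(const std::string& trName, std::string variables) {
    // Same default as the patch: apply to input variables and targets.
    if (variables.empty()) variables = "_V_,_T_";
    if (trName == "N" || trName == "Norm")
        return {NormMode::kNorm, variables};
    if (trName == "S" || trName == "Scale" || trName == "ScaleNorm")
        return {NormMode::kScale, variables};
    // TMVA itself ends up at the kFATAL log for names no branch recognises.
    return {NormMode::kUnknown, variables};
}
```

In a booking string this corresponds to writing VarTransform=S (or VarTransform=Scale), as listed in the updated documentation.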
