| Simple learning algorithm |
The learning algorithm (tree growth option) starts with one
linear function piece. This fits the data like linear regression does. If the error
is too high, the piece splits into two, each new piece fitting only some of the sample
points. The two pieces are joined by taking either a minimum or a maximum of their values.
Splitting continues, forming a tree, but a piece is never split if its error is
within what can be caused by noise in the data. Estimating that noise
is the first step in solving the problem (see below). This is a new, simple and
effective technique for achieving good generalization. |
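The structure described above can be sketched in a few lines. This is only an illustration, not the ALN library's code: the names `Linear`, `MinMax`, `evaluate` and `should_split` are hypothetical, and the split rule simply compares a piece's RMS error with an estimated noise level.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Linear:
    """A leaf: one linear piece, bias + sum(w_i * x_i)."""
    weights: List[float]
    bias: float


@dataclass
class MinMax:
    """An interior node joining its children with a minimum or a maximum."""
    op: str  # "min" or "max"
    children: List["Node"]


Node = Union[Linear, MinMax]


def evaluate(node: Node, x: List[float]) -> float:
    """Evaluate the tree of linear pieces at the input point x."""
    if isinstance(node, Linear):
        return node.bias + sum(w * xi for w, xi in zip(node.weights, x))
    values = [evaluate(child, x) for child in node.children]
    return min(values) if node.op == "min" else max(values)


def should_split(piece_rmse: float, noise_rmse: float) -> bool:
    """Assumed rule: split only if the error exceeds what noise could cause."""
    return piece_rmse > noise_rmse


# Two pieces joined by a maximum give the absolute value function |x|.
tree = MinMax("max", [Linear([1.0], 0.0), Linear([-1.0], 0.0)])
```

Here the tree has a single maximum node with two leaves; deeper trees nest `MinMax` nodes in the same way.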
| Automatic choice of architecture |
You no longer have to worry about whether the architecture you
choose will allow your network to produce a good result. By (optionally) growing a tree of
minimum and maximum operators with linear functions at the leaves, and controlling the
splitting of pieces with too high an error, the architecture fits the data by
construction. |
| Simple form of results |
The result of ALN training is a function made up of linear
pieces connected into a continuous function surface. One goal is to keep the number
of pieces small, so noise is not fitted. A result made up of a small number of
linear pieces is easier to analyze and check than a more complicated expression. If your
application is safety-critical, you don't have to fear unexpected results or
evaluate the function on an astronomical number of points; you just check all the linear
pieces of the result carefully. |
| Smooth results |
The linear pieces have quadratic fillets that fill in the
corners between them smoothly. You control how much smoothing is used. The
deviation from the continuous piecewise linear function is bounded by a value you set. |
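One way to picture a quadratic fillet with a bounded deviation is a smoothed maximum of two values. The sketch below is an assumption for illustration, not necessarily the fillet ALNs use: `smooth_max` and `max_deviation` are hypothetical names, and the corner of max(u, v) is replaced by a parabola wherever the two values are within 4 * max_deviation of each other.

```python
def smooth_max(u: float, v: float, max_deviation: float) -> float:
    """Maximum of u and v with the corner rounded by a quadratic fillet.

    The result never falls below max(u, v) and never exceeds it by more
    than max_deviation (the largest gap occurs where u == v).
    """
    c = 4.0 * max_deviation  # half-width of the fillet, measured in u - v
    d = u - v
    if c == 0.0 or abs(d) >= c:
        return max(u, v)  # outside the fillet: the plain linear pieces
    # Inside the fillet, |d| is replaced by the parabola d^2/(2c) + c/2,
    # which matches |d| in value and slope at d = -c and d = +c.
    return 0.5 * (u + v) + 0.5 * (d * d / (2.0 * c) + 0.5 * c)
```

Setting `max_deviation` to zero recovers the unsmoothed piecewise linear function; a smoothed minimum follows from `-smooth_max(-u, -v, max_deviation)`.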
| Real-time evaluation speed |
After training, the input space can be partitioned into boxes
within each of which only a few linear pieces are active. This is called an ALN
decision tree. Most linear pieces don't have to be evaluated to compute a specific
output. The input lies in a certain box, and only the linear pieces touching that
box need to be considered. Computations which don't have to be done are omitted. |
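The idea of skipping inactive pieces can be illustrated in one dimension. This is a simplified sketch, not the actual ALN decision tree construction: it assumes the function is a maximum of linear pieces given as `(weight, bias)` pairs, and the helper names are hypothetical. A piece is kept for a box only if it could equal the maximum somewhere in that box.

```python
def piece_range(piece, lo, hi):
    """Smallest and largest value of one linear piece on the box [lo, hi]."""
    w, b = piece
    a, c = w * lo + b, w * hi + b
    return min(a, c), max(a, c)


def active_pieces(pieces, lo, hi):
    """Indices of pieces that can attain the maximum somewhere in [lo, hi]."""
    ranges = [piece_range(p, lo, hi) for p in pieces]
    # The maximum of all pieces is at least this large everywhere in the box.
    floor = max(r[0] for r in ranges)
    return [i for i, r in enumerate(ranges) if r[1] >= floor]


def build_boxes(pieces, lo, hi, n_boxes):
    """Partition [lo, hi] into equal boxes, each with its active pieces."""
    width = (hi - lo) / n_boxes
    boxes = []
    for k in range(n_boxes):
        b_lo, b_hi = lo + k * width, lo + (k + 1) * width
        boxes.append((b_lo, b_hi, active_pieces(pieces, b_lo, b_hi)))
    return boxes


def evaluate_fast(pieces, boxes, x):
    """Evaluate the maximum using only the pieces active in x's box."""
    for b_lo, b_hi, active in boxes:
        if b_lo <= x <= b_hi:
            return max(pieces[i][0] * x + pieces[i][1] for i in active)
    raise ValueError("x lies outside the partitioned region")


# Five tangent lines to x**2, evaluated on [-2, 2] split into 8 boxes.
pieces = [(-4.0, -4.0), (-2.0, -1.0), (0.0, 0.0), (2.0, -1.0), (4.0, -4.0)]
boxes = build_boxes(pieces, -2.0, 2.0, 8)
```

In this example each box needs only two or three of the five pieces. The linear scan over boxes is kept for brevity; a real implementation would locate the box with a tree or index lookup.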
| Scalability |
The method of omitting computations described under
"real-time evaluation speed" becomes increasingly efficient as the
problem grows. In the limit of very large problems, this is far better than
"massive parallelism" using fast hardware. |
| Non-normalized data |
The learning algorithms are invariant to translation and
scaling of the data, so you don't have to normalize it. This may not be the case with
the neural networks you are currently using; e.g., some backpropagation algorithms are not
invariant in this way, so the data has to be normalized first. |
| Localized sensitivity analysis |
A certain input variable may affect the output to a greater or
lesser extent depending on the values of all input components. You can analyze linear
pieces, on which all sensitivities (which are just the weights) are constant. This
helps to interpret your result and gain value from it, say in areas like data mining. |
| Control of sensitivities |
If you know the sensitivities (partial derivatives) of your
result will always be within certain bounds, you can enforce those bounds and make sure
your result has properties you know it should have. This is simply control of the bounds
on weights of linear pieces, and is preserved when fillets smooth the piecewise linear
solution. |
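Bounding sensitivities amounts to keeping each weight of each linear piece inside an interval. A minimal sketch of that step follows; `clamp_weights` is a hypothetical helper, and applying it after every weight update is an assumption, not a description of the ALN training code.

```python
def clamp_weights(weights, bounds):
    """Force each weight into its allowed [lower, upper] interval."""
    return [min(max(w, lo), hi) for w, (lo, hi) in zip(weights, bounds)]


# First input: sensitivity between 0 and 1. Second input: non-negative
# sensitivity with no upper bound. Third input: between -1 and 1.
bounds = [(0.0, 1.0), (0.0, float("inf")), (-1.0, 1.0)]
clamped = clamp_weights([2.5, -0.3, 0.4], bounds)
```

A lower bound of zero on a weight makes every piece non-decreasing in that input.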
| Control over function shape |
If the ideal functional relationship you seek is convex, you
can constrain your result to be convex (up or down). It is forced to conform to what
you know the real-world solution has to look like. |
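The convexity constraint follows from a standard fact: a maximum of linear pieces is always convex, and a minimum of linear pieces is always concave. The sketch below illustrates this with hypothetical helpers `max_of_lines` and `min_of_lines`, where each piece is an assumed `(weight, bias)` pair in one input variable.

```python
def max_of_lines(pieces, x):
    """Maximum of linear pieces: a convex function of x."""
    return max(w * x + b for w, b in pieces)


def min_of_lines(pieces, x):
    """Minimum of linear pieces: a concave function of x."""
    return min(w * x + b for w, b in pieces)


def is_midpoint_convex(f, points, tol=1e-12):
    """Check f((a + b) / 2) <= (f(a) + f(b)) / 2 for all pairs of points."""
    return all(
        f(0.5 * (a + b)) <= 0.5 * (f(a) + f(b)) + tol
        for a in points
        for b in points
    )


pieces = [(-1.0, 0.0), (0.5, -1.0), (2.0, -4.0)]
grid = [k * 0.25 for k in range(-20, 21)]
```

Restricting a result to a single maximum (or a single minimum) of linear pieces is therefore one way to guarantee the shape by construction.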
| Control over increase and decrease |
If you know your result must increase (or decrease) in a
certain variable to be realistic, you can enforce that property during training.
This is very useful in real-world systems satisfying physical or economic laws. Some
researchers have called these constraints "hints". |
| Estimation of noise in data |
One of the most difficult problems in using neural nets, or
in pattern recognition in general, is filtering the noise out of your result. With
ALNs, you estimate that noise first. After that step, you set an output tolerance so
the next training result fits the information in your data, but not the noise. This
represents a significant advance over methods which merely stop training at a certain
point. At that point in other systems, parts of the net may be overtrained, others
undertrained. With ALNs, it is the error of a piece that determines when to stop.
This is illustrated in ALNBench in the case where the error is of constant magnitude. If
the relative error is constant, the output should be transformed into its logarithm before
training. If the error varies in other ways, the same idea can be applied using the
Dendronic Learning Engine. The above process is shown in a series of images. |
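The reason for the logarithm is that an error of constant relative size becomes an error of constant magnitude after the transform, which is the case a single output tolerance handles. A small sketch, assuming strictly positive outputs; `to_log` and `from_log` are hypothetical helper names, not ALNBench functions.

```python
import math


def to_log(outputs):
    """Transform positive outputs to logarithms before training."""
    if any(y <= 0 for y in outputs):
        raise ValueError("the log transform needs strictly positive outputs")
    return [math.log(y) for y in outputs]


def from_log(predictions):
    """Map predictions made in log space back to the original scale."""
    return [math.exp(p) for p in predictions]


# True values 1 and 100, each measured 10% too high.
true_values = [1.0, 100.0]
measured = [1.1, 110.0]

# Absolute errors differ by a factor of 100 ...
abs_errors = [m - t for m, t in zip(measured, true_values)]
# ... but the errors in log space are both log(1.1).
log_errors = [lm - lt for lm, lt in zip(to_log(measured), to_log(true_values))]
```

After training on the transformed outputs, predictions are mapped back with `from_log`.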