INTERNATIONAL JOURNAL OF RESEARCH AND SCIENTIFIC INNOVATION (IJRSI)
ISSN No. 2321-2705 | DOI: 10.51244/IJRSI | Volume XII Issue VIII August 2025
www.rsisinternational.org
Symbolic Regression for Approximating Analytic Solutions to
Differential Equations
Dionisel Y. Regalado
Northwestern Mindanao State College of Science and Technology, Philippines
DOI: https://doi.org/10.51244/IJRSI.2025.120800044
Received: 24 July 2025; Accepted: 31 July 2025; Published: 02 September 2025
ABSTRACT
Approximate analytic expressions are obtained for initial value problems that admit only numerical solutions.
Symbolic regression was utilized to obtain such analytic expressions. For functions f that are Lipschitz
continuous, results revealed that the maximum absolute error (sup-norm) is bounded.
Keywords: symbolic regression, initial value problem, finite difference method
INTRODUCTION
Many differential equations used in Engineering and the Sciences have no closed-form analytic solutions but
are numerically solved (Hoffman et al., 2001). Numerical methods are techniques used to evaluate a solution at
a point without necessarily knowing the exact analytic form of the solution to an ordinary differential equation
(ODE). In most applications, approximate numerical solutions may be sufficient, but for purposes of analyzing
the mathematical properties of the solution, it may be necessary to derive an analytic expression for it. For
such situations, approximate symbolic expressions may serve as important bases for further mathematical
analysis. Recent developments in genetic programming can be exploited to provide such approximate solutions
through symbolic regression.
A first order differential equation is an initial value problem (IVP) if (Iserles, 2008):

u'(t) = f(t, u(t)), u(t_0) = u_0, (1)

where f : [t_0, ∞) × R^d → R^d and u_0 ∈ R^d. For higher order systems, it is possible to analyze the system as a
larger set of first order systems such as (1) by employing extra variables; thus, without loss of generality, one
may restrict attention to (1). For instance, the second order equation u''(t) = f(t, u(t)) can be written as the
pair of first order equations

z'(t) = f(t, u(t)) and z(t) = u'(t), (2)

where z is an extra variable.
Existence of a unique solution to (1) is guaranteed by the Picard-Lindelöf Theorem, which states that a unique
solution exists provided f is Lipschitz continuous. We state Lipschitz continuity in the case of real-valued
functions. A real-valued function f : R → R is Lipschitz continuous if there exists a positive real K such that
for all real x_1 and x_2:

|f(x_1) - f(x_2)| ≤ K |x_1 - x_2|. (3)

In fact, let g : R → R be everywhere differentiable; then g is Lipschitz continuous if and only if g has a
bounded first derivative (Grossmann et al., 2007).
A numerical method for solving (1) with boundary conditions is the finite difference method. Starting from (1),
replace the derivative u'(t) by the approximation (Strikwerda, 2004):

u'(t) ≈ (u(t + h) - u(t)) / h, (4)

which gives:

u(t + h) ≈ u(t) + h u'(t) = u(t) + h f(t, u(t)). (5)
To apply formula (5), choose a step-size h and construct the sequence t_0, t_1 = t_0 + h, t_2 = t_0 + 2h, …,
t_n = t, and denote u_n = u(t_n).
Expression (5) is then transformed into the recursive relation:

u_{n+1} = u_n + h f(t_n, u_n). (6)

By choosing h small enough, we obtain the ordered pairs (t_0, u_0), (t_1, u_1), (t_2, u_2), …, (t_n, u_n). We now
seek a function u*(t) that passes through all of the points such that:

sup_{t_i} |u*(t_i) - u(t_i)| (7)

is minimum.
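As a concrete illustration of recursion (6), the following Python sketch advances an IVP with the forward Euler
step and collects the ordered pairs (t_n, u_n). The right-hand side f, the interval, and the step size below are
placeholders chosen for illustration, not values from this paper.

import math

# Forward Euler iteration of recursion (6): u_{n+1} = u_n + h*f(t_n, u_n).
def euler(f, t0, u0, t_end, h):
    ts, us = [t0], [u0]
    t, u = t0, u0
    while t < t_end - 1e-12:      # guard against floating-point overshoot
        u = u + h * f(t, u)       # one step of (6)
        t = t + h
        ts.append(t)
        us.append(u)
    return ts, us

# Example: u' = -u, u(0) = 1, whose exact solution is exp(-t).
ts, us = euler(lambda t, u: -u, 0.0, 1.0, 1.0, 0.001)
print(us[-1], math.exp(-1.0))     # approx 0.3677 vs 0.3679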
Theoretical and Experimental Results
We start by defining an initial value problem.
Definition 1. Let f : [t_0, ∞) × R → R be a real valued function. A first order equation is an initial value
problem (IVP) if

u'(t) = f(t, u(t)), t ≥ t_0, u(t_0) = u_0.

The function u(t) is a solution to the IVP if it satisfies the differential equation. We seek necessary and
sufficient conditions for the IVP to have a unique solution.
Definition 2. A real-valued function f is Lipschitz continuous if there exists a constant K > 0 such that for
all x_1, x_2:

|f(x_1) - f(x_2)| ≤ K |x_1 - x_2|.
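For instance, f(x) = sin(x) is Lipschitz continuous with K = 1, since |sin(x_1) - sin(x_2)| ≤ |x_1 - x_2| for
all real x_1, x_2. A quick numerical spot-check of this bound (illustrative only, not part of the development):

import math, random

# Verify |sin(x1) - sin(x2)| <= 1 * |x1 - x2| on random pairs (K = 1).
random.seed(0)
for _ in range(1000):
    x1, x2 = random.uniform(-10, 10), random.uniform(-10, 10)
    assert abs(math.sin(x1) - math.sin(x2)) <= abs(x1 - x2) + 1e-12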
Lemma 1. Let f be Lipschitz continuous; then f has a bounded first derivative.

Proof: From the definition of a derivative:

f'(x) = lim_{h→0} (f(x + h) - f(x)) / h.

By Lipschitz continuity of f, there exists a K > 0 for which

|f(x + h) - f(x)| ≤ K |h|,

which yields

|(f(x + h) - f(x)) / h| ≤ K for all h ≠ 0.

It follows that

|f'(x)| ≤ K. ∎
The Picard-Lindelöf Theorem guarantees a unique solution to the IVP.

Theorem 1 (Picard-Lindelöf). Let

u'(t) = f(t, u(t)), u(t_0) = u_0.

Let f be uniformly Lipschitz continuous in u and continuous in t. Then for some ε > 0, there exists a unique
solution to the IVP on [t_0 - ε, t_0 + ε].
Proof. Write the IVP as the integral equation:

u(t) = u_0 + ∫_{t_0}^{t} f(s, u(s)) ds, t ∈ [t_0 - ε, t_0 + ε], (8)

where ε is to be determined. Define:

T : C[t_0 - ε, t_0 + ε] → C[t_0 - ε, t_0 + ε], (Tu)(t) = u_0 + ∫_{t_0}^{t} f(s, u(s)) ds.

Hence, a solution of (8) is a fixed point of T. We show that T satisfies a Lipschitz condition:

||Tu - Tv|| = sup_t | ∫_{t_0}^{t} [f(s, u(s)) - f(s, v(s))] ds |
           ≤ sup_t ∫_{t_0}^{t} |f(s, u(s)) - f(s, v(s))| ds
           ≤ L sup_t ∫_{t_0}^{t} |u(s) - v(s)| ds
           ≤ L ε ||u - v||, (9)

where L is the Lipschitz constant of f in u (independent of u, by uniform Lipschitz continuity) and the norm on
C[t_0 - ε, t_0 + ε] is:

||w|| = sup { |w(t)| : t ∈ [t_0 - ε, t_0 + ε] }.

Choose ε < 1/L. It follows from this choice of ε and (9) that T is a contraction mapping. By Banach's fixed
point theorem, there exists a unique fixed point u(t) for which:

(Tu)(t) = u(t),

which solves the IVP. ∎
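The contraction argument is constructive: iterating T from any starting guess converges to the fixed point. The
sketch below performs this Picard iteration numerically, with the integral in (8) evaluated by the trapezoidal
rule; the test problem u' = u, u(0) = 1 on [0, 0.5] is an assumption chosen purely for illustration.

import numpy as np

# Picard iteration u_{k+1} = T(u_k) on a fixed grid over [t0, t0 + eps].
def picard(f, t0, u0, eps, n_grid=200, n_iter=20):
    t = np.linspace(t0, t0 + eps, n_grid)
    u = np.full_like(t, u0)                  # starting guess u_0(t) = u0
    for _ in range(n_iter):
        g = f(t, u)
        integral = np.concatenate(([0.0],    # cumulative trapezoidal rule
            np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(t))))
        u = u0 + integral                    # apply the operator T
    return t, u

# Example: u' = u, u(0) = 1; the fixed point is u(t) = exp(t).
t, u = picard(lambda t, u: u, 0.0, 1.0, 0.5)
print(abs(u - np.exp(t)).max())              # small, limited by the grid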
Finite Difference Method.

Consider the IVP and suppose that we wish to find u(t_1) at some later value t_1 > t_0. If a closed form analytic
solution can be obtained, then the problem is trivial. If, however, no such closed form solution is available,
then we turn to numerical methods (Strikwerda, 2004). The simplest and most commonly-used method is the
finite difference method. The method is based on the Taylor series expansion:
u(t + h) = u(t) + h u'(t) + (h²/2!) u''(t) + ⋯ (10)
By truncating all terms after the first derivative term, we have:

u(t + h) = u(t) + h u'(t) + O(h²). (11)
Definition 4. A function f(h) is "big Oh" of g(h), written

f(h) = O(g(h)),

if there exists a constant C such that

|f(h)| ≤ C |g(h)| as h → 0,

or:

lim_{h→0} |f(h)| / |g(h)| < ∞.

For example, 3h² + h³ = O(h²), since |3h² + h³| ≤ 4h² for |h| ≤ 1.
The error term in (11) is O(h²), which tends to zero as h → 0. It follows that:

u(t + h) ≈ u(t) + h u'(t) (12)
with an error proportional to h². Equation (12) can be rewritten as the recursive relation:

u(t_n + h) = u(t_n) + h f(t_n, u(t_n)), (13)

that is, writing u_n = u(t_n),

u_{n+1} = u_n + h f(t_n, u_n).
The closed interval [t_0, t] is divided into non-overlapping sub-intervals of length h, i.e. [t_n, t_n + h]. The
number of such sub-intervals is:

n = (t - t_0) / h. (14)
Convergence. For (13) to be useful, we need to show that the sequence {u_n}_{n=0}^∞ converges.

Lemma 2. On the closed interval [t_0, t], the sequence {u_n}_{n=0}^∞ of (13) is a Cauchy sequence.
Proof.

Let m > n and let u_{k+1} = u_k + h f(t_k, u_k), where h is the step size and the function f is Lipschitz
continuous. Now,

|u_m - u_n| = |(u_m - u_{m-1}) + (u_{m-1} - u_{m-2}) + ⋯ + (u_{n+1} - u_n)|
            ≤ h ( |f(t_{m-1}, u_{m-1})| + |f(t_{m-2}, u_{m-2})| + ⋯ + |f(t_n, u_n)| ), (15)

where each term is finite because the Lipschitz continuous function f is bounded on the bounded set of iterates.
Let

M = max_{n ≤ k ≤ m-1} |f(t_k, u_k)|. (16)

Then:

|u_m - u_n| ≤ (m - n) h M → 0 as h → 0, (17)

hence,

{u_n}_{n=0}^∞ is a Cauchy sequence. ∎ (18)
Theorem 2. The sequence {u_n}_{n=0}^∞ converges.

Proof.

Since {u_n(t)}_{n=0}^∞ is a Cauchy sequence of real numbers for each t, it follows that {u_n(t)}_{n=0}^∞
converges. The Picard-Lindelöf theorem guarantees that u_n(t) → u(t) as h → 0. ∎
Symbolic Regression

Symbolic regression is a type of regression analysis that does not specify the functional form of the relationship
between two variables. It utilizes a genetic algorithm to execute the analysis and is included as an option in
several statistical software packages. For this purpose, a one-month trial version of the software Eureqa was used
to analyze the given data set (Regalado et al., 2019).
The data set analyzed consists of the ordered pairs {(t_i, u_i)}_{i=0}^{n} generated by the recursive relation
(13). As shown in the preceding sections, the computed values u_i = u_i(t_i) can approximate the true solution
u(t_i) as closely as desired. These approximations provide a reliable input for symbolic regression, enabling the
discovery of an analytic expression that best fits the numerical solution.
Let u_i = u(t_i) and consider the pairs {(t_i, u_i)}_{i=0}^{n}. In traditional regression analysis, we assume a
model of the form:

u(t) = f(t; β) + ε(t), (19)

where f(t; β) is a functional form that is completely specified except for the parameter values β, and the ε(t)
are random errors with zero expectation and constant variance. For instance, f(t) may be a third degree
polynomial:

u(t) = a + bt + ct² + dt³ + ε(t), (20)

where a, b, c and d are parameters to be estimated from the data.
In symbolic regression, the functional form of f(t) is not specified but is assumed to be derived from a class of
functions called building blocks. Let

B ⊆ C[t_0, t] (21)

be the set of building blocks, each a continuous function. Symbolic regression then searches this space for an
optimal combination of building blocks that best fits the observations. Fitness is a user-defined quantity such as
the mean absolute error (MAE), the maximum error (ME), or the squared correlation goodness of fit (R²).
The search process is implemented by applying the principles of a genetic algorithm (GA). We consider the case
when B has a finite number of building blocks:

B = { b_1, b_2, …, b_m }.

Each b_i(t) is assigned a fitness value, e.g. the maximum error, when fitted to the observations. Let
Φ = { φ(b_1), φ(b_2), …, φ(b_m) } (22)

be the set of fitness values for the building blocks. The building blocks are then arranged from the fittest to
the least fit. Let

b_(1), b_(2), …, b_(m)

denote this arrangement, where the subscripts indicate ordering based on the fitness values, i.e. b_(1) is more
fit than b_(2).
A second generation of building blocks is obtained by combining the most fit building blocks and recomputing the
fitness values of the resulting combinations. The process continues until a desired fitness value is obtained.
Gelly et al. (2017) proved that, under sufficient conditions, the solution given by genetic programming converges,
as the number of examples goes to infinity, toward the actual function used to generate the examples. This
property is known in statistical learning as Universal Consistency. The authors also provided new results on bloat
analysis, distinguishing structural bloat from functional bloat. Structural bloat occurs when no optimal solution,
that is, no function exactly matching all possible examples, lies in the search space; in such cases, the authors
demonstrated that optimal solutions of increasing accuracy will also exhibit increasing complexity. Functional
bloat, on the other hand, is the bloat that takes place when program length keeps growing even though an optimal
solution (of known complexity) does lie in the search space.
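To make the search loop concrete, the sketch below implements a deliberately small version of it in Python:
candidates are random combinations of a few building blocks, fitness is the maximum absolute error on the data,
and each generation keeps and recombines the fittest candidates. Every name and building block here is an
assumption chosen for illustration; production tools such as Eureqa use far richer expression trees and operators.

import math, random

# Illustrative (assumed) building blocks and binary combiners.
BLOCKS = [lambda t: 1.0, lambda t: t, lambda t: t * t,
          lambda t: math.sin(t), lambda t: math.exp(t)]
COMBINE = [lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b]

def random_expr(depth=2):
    # A candidate is a nested combination of building blocks.
    if depth == 0 or random.random() < 0.3:
        return random.choice(BLOCKS)
    op, l, r = random.choice(COMBINE), random_expr(depth - 1), random_expr(depth - 1)
    return lambda t: op(l(t), r(t))

def fitness(expr, data):
    # Maximum absolute error over the observations (smaller is fitter).
    try:
        return max(abs(expr(t) - u) for t, u in data)
    except OverflowError:
        return float("inf")

def evolve(data, pop=200, gens=30):
    population = [random_expr() for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda e: fitness(e, data))
        elite = population[: pop // 4]           # keep the fittest quarter
        children = []
        for _ in range(pop - len(elite)):        # recombine elite candidates
            a, b, op = random.choice(elite), random.choice(elite), random.choice(COMBINE)
            children.append(lambda t, a=a, b=b, op=op: op(a(t), b(t)))
        population = elite + children
    return min(population, key=lambda e: fitness(e, data))

# Example: recover u(t) = t^2 + sin(t) from sampled points.
random.seed(1)
data = [(i / 20, (i / 20) ** 2 + math.sin(i / 20)) for i in range(21)]
best = evolve(data)
print(fitness(best, data))   # small maximum error if the target is reachable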
Illustration and Application

We provide a simple illustration of the proposed method for deriving an approximate analytic solution to a
differential equation based on numerically-derived ordered pairs. The illustration can be solved analytically;
thus, the solution is known, which allows us to evaluate the proposed procedure more clearly.
Consider the IVP:

u'(t) = 2 + 5u(t), u(0) = 0,

whose exact solution is:

u(t) = (2/5)(e^{5t} - 1).
We confine the search domain to the interval [0, 1] and we seek the value u(1). We chose step sizes h = 0.1,
0.01, 0.001, 0.0001, 0.00001 and observed the maximum absolute error:

sup_{t ∈ [0,1]} |u_h(t) - u(t)|,

where u_h denotes the finite difference solution with step size h.
The finite difference equation gives:

u_{n+1} = u_n (1 + 5h) + 2h, u_0 = 0.
Table 1 shows the relationship between step size and the maximum absolute error (sup).

Table 1: Relationship of step size and maximum absolute error

step size    sup
0.1          36.3
0.01         6.765
0.001        0.735
0.0001       0.0744
0.00001      0.0742
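As a check on Table 1, the short sketch below iterates the finite difference equation and tracks the largest
deviation from the exact solution over [0, 1] (a minimal verification, assuming the IVP and exact solution stated
above):

import math

def sup_error(h):
    # Iterate u_{n+1} = u_n(1 + 5h) + 2h on [0, 1] and compare with the
    # exact solution u(t) = (2/5)(exp(5t) - 1) at each grid point.
    n = round(1.0 / h)
    u, worst = 0.0, 0.0
    for k in range(n):
        u = u * (1.0 + 5.0 * h) + 2.0 * h
        t = (k + 1) * h
        worst = max(worst, abs(u - 0.4 * (math.exp(5.0 * t) - 1.0)))
    return worst

for h in (0.1, 0.01, 0.001, 0.0001):
    print(h, sup_error(h))   # approx 36.3, 6.77, 0.735, 0.074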
The estimated regression equation shows that as the step size increases, the maximum absolute error likewise
increases.
sup = 0.865 + 357 (step size)

Predictor    Coef      SE Coef    T        P
Constant     0.8647    0.7886     1.10     0.353
step size    356.63    17.55      20.33    0.000

S = 1.533    R-Sq = 99.3%    R-Sq(adj) = 99.0%
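The same fit can be reproduced with an ordinary least squares line through the five (step size, sup) pairs of
Table 1; a minimal check:

import numpy as np

# OLS of the sup-error on the step size, matching the output above.
h = np.array([0.1, 0.01, 0.001, 0.0001, 0.00001])
sup = np.array([36.3, 6.765, 0.735, 0.0744, 0.0742])
slope, intercept = np.polyfit(h, sup, 1)
print(intercept, slope)   # approx 0.865 and 356.6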
Using h = 0.001, we applied symbolic regression, using a trial version of Eureqa, to obtain a closed form
expression for the solution of the IVP. A portion of the data set is reproduced below:
Table 2: Portion of the Finite Difference Solution with h = 0.001

t        u
0        0
0.001    0.002
0.002    0.00401
0.003    0.00603
0.004    0.00806
0.005    0.010101
0.006    0.012151
0.007    0.014212
0.008    0.016283
0.009    0.018364
0.01     0.020456
The solution chosen has a maximum error of 0.00000515. The exact expression is shown in the screenshot below:
Figure 1: Screenshot of the Final Solution
The final approximate solution u*(t) to the IVP, with constants rounded to the first decimal place, is the closed
form expression shown in Figure 1.
The maximum difference between the approximate solution and the actual solution is:

sup_t |u*(t) - u(t)| = 0.00000515.
Let

u_h(t_i) = the solution obtained at t_i by the finite difference method,
u*(t_i) = the symbolic regression value at t_i using (t_i, u_h(t_i)) as inputs,
u(t_i) = the actual value of the solution at t_i.

Then, by the triangle inequality,

sup_{t_i} |u*(t_i) - u(t_i)| ≤ sup_{t_i} |u*(t_i) - u_h(t_i)| + sup_{t_i} |u_h(t_i) - u(t_i)|.

As h → 0, sup_{t_i} |u_h(t_i) - u(t_i)| → 0, and if the solution u(t) is in the search space,
sup_{t_i} |u*(t_i) - u_h(t_i)| → 0. Thus,

u*(t) → u(t) as h → 0.
REFERENCES
1. Blickle, T. and Thiele, L. (1994). Genetic programming and redundancy. In J. Hopf, editor, Genetic Algorithms Workshop at KI-94, pages 33-38. Max-Planck-Institut für Informatik.
2. Gelly, S., Teytaud, O., Bredeche, N., and Schoenauer, M. (2017). Symbolic regression, parsimony, and some theoretical considerations about GP search space. Equipe TAO, INRIA Futurs, LRI, Bat. 490, University Paris-Sud, 91405 Orsay Cedex, France.
3. Grossmann, C., Roos, H.-G., and Stynes, M. (2007). Numerical Treatment of Partial Differential Equations. Springer Science & Business Media.
4. Hoffman, J. D. and Frankel, S. (2001). Numerical Methods for Engineers and Scientists. CRC Press, Boca Raton.
5. Iserles, A. (2008). A First Course in the Numerical Analysis of Differential Equations. Cambridge University Press.
6. Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA.
7. Regalado, D. Y., et al. (2019). Approximate Analytic Solution to the Lotka-Volterra Predator-Prey Differential Equations Model. Journal of Higher Education Research Disciplines, 4(1), 38-47.
8. Strikwerda, J. (2004). Finite Difference Schemes and Partial Differential Equations (2nd ed.). SIAM.