Performance determination of
a single cell oil process.
L. Muniglia, S. Papanikolaou, C. Fonteix, I. Marc.
Laboratoire des Sciences du Génie Chimique - U.P.R. C.N.R.S. 6811,
13 rue du Bois de la Champelle, 54500 VANDŒUVRE FRANCE
Telephone : (33) 3 83 44 83 30 - Facsimile : (33) 3 83 44 83 28
e-mail : imarc@ensic.u-nancy.fr
Laboratoire des Sciences du Génie Chimique - U.P.R. C.N.R.S. 6811,
E.N.S.A.I.A., 2 Av. de la forêt de Haye, B.P. 172, 54505 VANDŒUVRE FRANCE
Telephone : (33) 3 83 59 58 42 - Facsimile : (33) 3 83 59 58 04
Abstract
Developing a process requires an optimization phase so that the process can be
operated under the best conditions. Experimental design makes it possible to obtain
a maximum of information from a minimum of experiments. A new approach based on
fuzzy logic, named Fuzzy Dynamic Experimental Design (F.D.E.D.) and developed in
our laboratory, has been validated on a bioprocess. The aim of this work is to show
the efficiency of the method on a chosen example: the production of single cell oil.
The study builds on earlier work carried out on the yeast Yarrowia lipolytica. With
the help of F.D.E.D., the covering rate of the experimental domain was evaluated,
and a few experiments were then added to increase it. The validity of the
calculated models was checked against new experiments that were not used for the
identification of parameters. The results show a satisfactory prediction.
1. Introduction
F.D.E.D. is a new method of experimental planning developed within the
L.S.G.C. It has already been successfully tested on chemical reactions, in
particular on the decylation of lactose (Fonteix et al., 1997).
Several characteristics of the method justify testing it on a bioprocess.
First, more so than chemical reactions, bioprocesses are often dynamic
phenomena characterized by strongly nonlinear variations. These characteristics
restrict the use of classical designs, which are well suited to static experiments,
especially for models that are nonlinear in their parameters. A classical method
also requires a model to be chosen before experimentation, which is difficult with
little or only partial knowledge of the system. A method such as F.D.E.D., which
does not require an a priori choice of model, is therefore of interest for planning
initial experiments or for completing earlier ones. In addition, bioprocesses are
often costly in time and in experiments. F.D.E.D. can use all the available
knowledge by integrating it into the design. The method is thus evolutive: no
information is lost, and the information can be completed iteratively.
F.D.E.D. therefore seems to meet these constraints, but it has not yet been
tested on a bioprocess. Consequently, we chose to apply it to the production
of single cell oil, and more specifically to the microbial production of a cocoa
butter equivalent.
2. F.D.E.D. theory
2.1. References definition
Classical experimental design defines a set of "static" experiments
$E = \{E_1, E_2, \ldots, E_n\}$. Here, the experiments $E_i$ are references which
are not realized. Suppose that a dynamic experiment is made with sampling times
$t_k$ ($k$ varies from 1 to $p$).
A "static" experiment $E_i$ is a set of qualification levels corresponding to each
measured state variable $x_j$ ($j$ varies from 1 to $q$):
$E_i = \{L_{i1}, L_{i2}, \ldots, L_{ij}, \ldots, L_{iq}\}$.
The reference fuzzy set $F_{ik}$ of $E_i$,
$F_{ik} = \sum_{L_{ij} \in E_i} f_{ijk}/L_{ij}$, represents the total
accomplishment of experiment $E_i$ at sampling time $t_k$. Thus
$f_{ijk} = 1 \;\; \forall i, j, k$.
2.2. Experimental data treatment
2.2.1. Expression of $r_i$
The dynamic experiment is realized. The fuzzy set $V$ of $E$,
$V = \sum_{E_i \in E} r_i/E_i$, is the accomplishment of the experimental design
through the dynamic experiment. But $V = \bigcup_k V_k$ with
$V_k = \sum_{E_i \in E} a_{ik}/E_i$, the accomplishment of the experimental
design through the dynamic experiment at sampling time $t_k$.
It follows that $r_i = \sup_k (a_{ik})$.
2.2.2. Expression of $a_{ik}$
The fuzzy set $W_{ik}$ of $E_i$, $W_{ik} = \sum_{L_{ij} \in E_i} b_{ijk}/L_{ij}$,
is the accomplishment of the reference through the dynamic experiment. So
$b_{ijk}$ is the possibility that the experimental measurement of variable $x_j$ at
time $t_k$ is compatible with the qualification level $L_{ij}$. Thus $a_{ik}$ is the
necessity of $W_{ik}$ referring to $F_{ik}$. It reads:
$$a_{ik} = N(W_{ik}; F_{ik}) = \inf_{L_{ij} \in E_i} \max(b_{ijk},\, 1 - f_{ijk}) = \inf_{L_{ij} \in E_i} b_{ijk}.$$
2.2.3. Expression of $b_{ijk}$
The membership function $\mu_{A_{jk}}(x_j)$ of fuzzy set $A_{jk}$, defined by
$A_{jk} = \int_{x_j} \mu_{A_{jk}}(x_j)/x_j$, is deduced from measurement and
simulation of $x_j$ at $t_k$. This membership function integrates the known
measurement uncertainty and the model accuracy. A membership function
$\mu_{L_{ij}}(x_j)$ of fuzzy set $Q_{ij}$, defined by
$Q_{ij} = \int_{x_j} \mu_{L_{ij}}(x_j)/x_j$, is associated with each qualification
level $L_{ij}$. A human expert chooses the qualification levels and the associated
membership functions. Thus $b_{ijk}$ is the possibility of $A_{jk}$ referring to
$Q_{ij}$:
$$b_{ijk} = \Pi(A_{jk}; Q_{ij}) = \sup_{x_j} \min\big(\mu_{A_{jk}}(x_j),\, \mu_{L_{ij}}(x_j)\big).$$
2.2.4. Membership function $\mu_{A_{jk}}$
The membership function $\mu_{M_{jk}}(x_j)$ of fuzzy set $M_{jk}$, defined by
$M_{jk} = \int_{x_j} \mu_{M_{jk}}(x_j)/x_j$, is deduced from the measurement of
$x_j$ at $t_k$ and its known uncertainty (figure 1a). This membership function can
be triangular or gaussian. $\mu_{M_{jk}}$ is the possibility to have $x_j$ at $t_k$
according to the measurement at this time. The membership function
$\mu_{S_{jk}}(x_j)$ of fuzzy set $S_{jk}$, defined by
$S_{jk} = \int_{x_j} \mu_{S_{jk}}(x_j)/x_j$, results from the simulation of $x_j$
at $t_k$ and from the acceptable inaccuracy of the model, chosen by the human
expert (figure 1b).
Thus $A_{jk} = M_{jk} \cap S_{jk}$ and
$\mu_{A_{jk}}(x_j) = \min\big(\mu_{M_{jk}}(x_j),\, \mu_{S_{jk}}(x_j)\big)$.
Figure 1. Presentation of membership functions.
(a) $\mu_{M_{jk}}$, for measurement of state variable $x_j$ at $t_k$ ($x_{mjk}$) with estimated uncertainty $\varepsilon_{mjk}$
(b) $\mu_{S_{jk}}$, for model simulation of state variable $x_j$ at $t_k$ ($x_{sjk}$) with acceptable inaccuracy $\varepsilon_{sjk}$
2.3. Global relationship
Finally, the accomplishment degree $r_i$ of the "static" experiment $i$ through the
dynamic experiment reads
$$r_i = \sup_k \Big( \inf_{L_{ij} \in E_i} \Big( \sup_{x_j} \min\big( \min(\mu_{M_{jk}}(x_j),\, \mu_{S_{jk}}(x_j)),\; \mu_{L_{ij}}(x_j) \big) \Big) \Big).$$
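The global relationship can be evaluated numerically by approximating the supremum over $x_j$ on a grid. The following is a minimal sketch under stated assumptions: all membership functions are taken to be triangular, and the names `triangular` and `accomplishment_degree`, the grid and the half-widths are illustrative choices that do not come from the original study.

```python
import numpy as np

def triangular(x, center, half_width):
    """Triangular membership: 1 at center, 0 beyond center +/- half_width (assumed shape)."""
    return np.clip(1.0 - np.abs(x - center) / half_width, 0.0, 1.0)

def accomplishment_degree(x_meas, eps_meas, x_sim, eps_sim, levels, grid):
    """Approximate r_i = sup_k inf_j sup_x min(min(mu_M, mu_S), mu_L) on a grid.

    x_meas, x_sim : arrays of shape (p, q), measured and simulated values
    eps_meas      : half-width for the measurement uncertainty
    eps_sim       : half-width for the acceptable model inaccuracy
    levels        : list of q (center, half_width) pairs, one level L_ij per variable
    grid          : 1-D array of x values used to approximate the sup over x_j
    """
    p, q = x_meas.shape
    a = np.empty(p)
    for k in range(p):
        b = np.empty(q)
        for j in range(q):
            mu_m = triangular(grid, x_meas[k, j], eps_meas)
            mu_s = triangular(grid, x_sim[k, j], eps_sim)
            mu_a = np.minimum(mu_m, mu_s)           # A_jk = M_jk intersected with S_jk
            mu_l = triangular(grid, *levels[j])     # level L_ij
            b[j] = np.max(np.minimum(mu_a, mu_l))   # possibility b_ijk
        a[k] = b.min()                              # necessity a_ik (since f_ijk = 1)
    return a.max()                                  # r_i = sup over sampling times

# Illustrative call: one state variable, two sampling times (made-up values).
grid = np.linspace(0.0, 1.0, 101)
x_meas = np.array([[0.10], [0.50]])
x_sim = np.array([[0.12], [0.50]])
r = accomplishment_degree(x_meas, 0.05, x_sim, 0.05, [(0.5, 0.25)], grid)
```

The grid resolution bounds the accuracy of the inner supremum; a finer grid or an analytical intersection of the triangles could be used instead.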
3. Characteristics of the system
3.1. Initial experiments
Our purpose is to find the composition of a yeast culture medium that gives
the highest lipid accumulation by the yeast.
Table 1. Characteristics of the 9 experiments already realized.
              Medium components (g/l)                      Factors
N°   Glucose  Glycerol  Stearin  Ammonium  Yeast      x      y      z
                                 sulfate   extract
1      0       10.5      10.0     0.5       2.0      0.64   0.34   0.17
2     33.0      0         0       0.1       0.5      0.00   0.00   0.81
3      0       18.0       0       0.1       0.5      0.00   0.99   0.43
4     16.0      9.0       0       0.1       0.5      0.00   0.35   0.61
5      0       22.7      11.7     0.5       0.5      0.50   0.50   0.44
6      0       34.5      11.5     0.5       0.5      0.39   0.60   0.55
7      0        0        14.0     0.7       2.0      0.98   0.00   0.13
8     30.5      0         0       0.1       2.0      0.00   0.00   0.27
9      0       30.5       0       0.1       2.0      0.00   0.98   0.26
Lipid synthesis takes place when carbon substrates are present. Nitrogen
supply is also known to be important for this kind of production: lipid accumulation
by the yeast is reported to begin when nitrogen is exhausted, that is, when the
[C]/[N] ratio is high (Gill et al., 1977).
At the beginning of the study, preliminary runs were carried out to test
the ability of the chosen yeast, Yarrowia lipolytica, to accumulate lipids and to
grow on the designated carbon substrates. Three substrates were chosen for their
low cost and their potential: glucose, raw (unpurified) glycerol and
stearin (free fatty acids from animal fat). As a consequence, nine experiments
already carried out with these three substrates, before the beginning of the
experimental design, can be used as a source of data. These experiments and their
characteristics are presented in table 1.
3.2. Factors
The system must be studied to define which response is obtained under which
operating conditions. Factors are the elements that the user can modify; they
constitute the input variables of the experimental design. Three factors
were chosen to describe the culture medium, that is, the carbon supply (type and
quantity) and the [C]/[N] value.
All three factors are calculated from the total elementary concentration
of carbon in the medium, [C]. This quantity is obtained by adding the carbon
contribution of each substrate. On an elementary mass basis, stearin, raw glycerol
and glucose contain respectively 76%, 39% and 40% of carbon.
Then, [C] is:
[C] = 0.76[stearin] + 0.39[glycerol] + 0.40[glucose]
The first two factors, x and y, characterise the initial composition of the carbon
supply. x expresses the fraction of carbon brought by stearin, and y the fraction
brought by raw glycerol. x and y are therefore fractions with values between
0.0 and 1.0.
$$x = \frac{0.76\,[\text{stearin}]}{[C]}, \qquad y = \frac{0.39\,[\text{glycerol}]}{[C]}$$
The carbon supply due to glucose (40% of the total glucose mass) is not represented
as a factor but can easily be deduced from x and y:
$$[\text{glucose}] = \frac{[C]}{0.40}\,(1 - x - y)$$
The third factor, z, describes the nitrogen supply in the medium. z
represents the ratio between [C] and [N]. Tested values range from 0 to 340, which
is why a coefficient is applied to normalise z between 0 and 1.
The calculated values of the factors are shown in table 1.
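As a worked check of the definitions of [C], x and y, the short sketch below recomputes the carbon factors from the medium composition. The function names `carbon_factors` and `glucose_from_factors` are illustrative, not from the original work. The factor z is not computed because the nitrogen content of the nitrogen sources and the normalising coefficient are not given in the text.

```python
# Carbon mass fractions stated in the text for each substrate.
CARBON_FRACTION = {"stearin": 0.76, "glycerol": 0.39, "glucose": 0.40}

def carbon_factors(stearin, glycerol, glucose):
    """Return ([C], x, y) from substrate concentrations in g/l."""
    c_stearin = CARBON_FRACTION["stearin"] * stearin
    c_glycerol = CARBON_FRACTION["glycerol"] * glycerol
    c_glucose = CARBON_FRACTION["glucose"] * glucose
    total = c_stearin + c_glycerol + c_glucose
    return total, c_stearin / total, c_glycerol / total

def glucose_from_factors(total_carbon, x, y):
    """Recover the glucose concentration (g/l) from [C], x and y."""
    return total_carbon * (1.0 - x - y) / CARBON_FRACTION["glucose"]

# Experiment 5 of table 1: 22.7 g/l glycerol and 11.7 g/l stearin, no glucose.
total_c, x5, y5 = carbon_factors(stearin=11.7, glycerol=22.7, glucose=0.0)
print(round(x5, 2), round(y5, 2))
```

For experiment 5 the rounded values agree with the x = 0.50 and y = 0.50 listed in table 1.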
3.3. Membership functions and covering rate
To evaluate the covering rate of the domain, the domain is divided into several
fuzzy sets. In this way, each factor belongs, with a known degree, to one of the
three following levels: low, medium or high (Kuehm et al., 1996).
These levels and the resulting degrees depend on the membership functions,
which are specific to each factor. The user defines these functions separately for
each factor on the basis of a priori knowledge. The user can thus favour a
potentially interesting area by narrowing the function there: the narrower an
area, the more precisely experiments must be located in it to obtain a
satisfactory covering rate. Conversely, a wide area gives a good rate with few
experiments. The three functions are represented in figure 2.
Figure 2. Membership functions of factors (levels low, medium and high for
x = 0.76[stearin]/[C], y = 0.39[glycerol]/[C] and z = [C]/[N]).
For each experiment, the three factors are represented in a fuzzy way by means of
these membership functions. The covering rate is calculated here by the simplified
relationship
$$r_i = \sup_k \, \inf_{L_{ij} \in E_i} \mu_{L_{ij}}(x_{mjk}),$$
where the $E_i$ are all the feasible combinations of low, medium and high for the
three factors.
The domain of study is defined by the three factors, which impose the
following constraints:
0.0 ≤ x, y, z ≤ 1.0
and 0.0 ≤ x + y ≤ 1.0.
Figure 3. Representation of the domain with the covering rate
of each area for 9 experiments (axes: 0.76[stearin]/[C], 0.39[glycerol]/[C] and [C]/[N]).
Thus, the domain is represented as half a cube divided into 18 smaller fuzzy
sets defined by the levels low, medium and high of each of the three factors. The
covering rate of the 18 areas was determined from the fuzzy values of the factors.
Figure 3 shows each area with its covering rate for the nine initial
experiments.
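The simplified covering rate can be sketched as follows. Several points are assumptions made for illustration only: the membership functions are taken as triangles centred on 0, 0.5 and 1 (the actual shapes of figure 2 are chosen by the user and differ per factor), the index k is read as running over the experiments, and an area is counted as feasible when its x and y levels are compatible with x + y ≤ 1, which is one way to recover the 18 areas mentioned in the text.

```python
from itertools import product

LEVEL_ORDER = ["low", "medium", "high"]
# Assumed triangle centres; the real membership functions are user-defined.
LEVEL_CENTERS = {"low": 0.0, "medium": 0.5, "high": 1.0}

def membership(level, value, half_width=0.5):
    """Assumed triangular membership of a factor value in a level."""
    return max(0.0, 1.0 - abs(value - LEVEL_CENTERS[level]) / half_width)

def feasible_areas():
    """Level combinations (x, y, z) compatible with the constraint x + y <= 1."""
    areas = []
    for lx, ly, lz in product(LEVEL_ORDER, repeat=3):
        if LEVEL_ORDER.index(lx) + LEVEL_ORDER.index(ly) <= 2:
            areas.append((lx, ly, lz))
    return areas

def covering_rates(experiments):
    """r_i = max over experiments of the min over factors of the level membership."""
    rates = {}
    for area in feasible_areas():
        rates[area] = max(
            min(membership(level, value) for level, value in zip(area, point))
            for point in experiments
        )
    return rates

# Factors (x, y, z) of the nine initial experiments, from table 1.
initial = [(0.64, 0.34, 0.17), (0.00, 0.00, 0.81), (0.00, 0.99, 0.43),
           (0.00, 0.35, 0.61), (0.50, 0.50, 0.44), (0.39, 0.60, 0.55),
           (0.98, 0.00, 0.13), (0.00, 0.00, 0.27), (0.00, 0.98, 0.26)]
rates = covering_rates(initial)
```

Because the membership functions are assumed, the rates produced by this sketch are not expected to match those of figure 3.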
3.4. Criteria
The aim is to optimize the production of single cell oil. It is therefore necessary
to improve the quantity and the quality of the lipids produced as well as the
efficiency of the culture. For this reason, the following three criteria were chosen:
- productivity: $P = \dfrac{[\text{Intracellular lipids}]}{[\text{Total biomass}] \times \text{Time}}$
- yield: $Y = \dfrac{[\text{Carbon in intracellular lipids}]}{[\text{Total consumed carbon}]}$
- composition: $Co = \dfrac{[\text{Unsaturated fatty acids}]}{[\text{Total fatty acids}]}$
Figure 4. Example of choice of the optimal point for the calculation of criteria
(concentrations in g/l versus time in hours): ● stearin, ○ biomass and × intracellular lipids.
These criteria are calculated for each experiment as described above. For the
calculation, one optimal point was chosen for each experiment, at the time when the
ratio [intracellular lipids]/[biomass] was highest, as shown in figure 4.
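The choice of the optimal point and the three criteria can be sketched as below. This is only an illustration: the function and argument names are hypothetical, every series is assumed to be sampled at the same times with strictly positive biomass, and the scaling applied to the published values (which appear to be expressed on a different scale) is not stated in the text and is therefore not reproduced.

```python
def optimal_index(lipids, biomass):
    """Index of the sample where lipids/biomass is highest (biomass must be > 0)."""
    ratios = [lip / bio for lip, bio in zip(lipids, biomass)]
    return max(range(len(ratios)), key=lambda i: ratios[i])

def criteria_at_optimum(time_h, lipids, biomass, lipid_carbon,
                        consumed_carbon, unsaturated_fa, total_fa):
    """Productivity, yield and composition evaluated at the optimal point."""
    i = optimal_index(lipids, biomass)
    return {
        "P": lipids[i] / (biomass[i] * time_h[i]),   # productivity
        "Y": lipid_carbon[i] / consumed_carbon[i],   # carbon yield
        "Co": unsaturated_fa[i] / total_fa[i],       # unsaturated fraction
    }

# Made-up time series, for illustration only.
result = criteria_at_optimum(
    time_h=[10.0, 20.0, 30.0],
    lipids=[1.0, 4.0, 3.0],
    biomass=[10.0, 10.0, 10.0],
    lipid_carbon=[0.5, 3.0, 2.0],
    consumed_carbon=[5.0, 12.0, 14.0],
    unsaturated_fa=[0.2, 1.0, 0.9],
    total_fa=[1.0, 4.0, 3.0],
)
```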
4. Supplementary experiments
4.1. Number of new experiments
Nine exploratory experiments were carried out to verify the feasibility of the
study. New runs must now be added to obtain the best possible design given these
first nine. The chosen model is a second degree polynomial, which needs ten
parameters, so at least one more experiment is required for the identification of
parameters. Four further experiments were proposed to estimate the variance of the
measurement error and to determine the confidence region of the parameters.
4.2. Choice of new experiments
With the help of F.D.E.D., five additional experiments were sufficient to obtain a
satisfactory covering rate. In this case, we chose not to use an optimization
criterion but to rely on human knowledge and decision support.
The covering rates obtained with the five new experiments added with
F.D.E.D. (see table 2) are represented in figure 5.
Figure 5. Representation of the domain with the covering rate
of each area for 14 experiments (axes: 0.76[stearin]/[C], 0.39[glycerol]/[C] and [C]/[N]).
4.3. Comparison with optimized ones
For comparison, we considered two statistical criteria for selecting new
experiments: D-optimality and the rotatability measure described by Khuri (1988).
Experiments were added according to these two criteria with the help of a
diploid genetic algorithm (Perrin et al., 1997). Table 2 presents the different
complemented designs obtained.
Table 2. Comparison of four obtained designs.
              9 experiments       9 experiments       9 experiments       9 experiments
              + 5 F.D.E.D.        + 5 D-optimality    + 5 rotatability    + 3 D-optimality
                                                                          + 2 rotatability
Factors       x     y     z       x     y     z       x     y     z       x     y     z
10            0.71  0.29  0.43    0.00  1.00  1.00    0.66  0.34  1.00    0.00  1.00  1.00
11            0.70  0.30  0.93    1.00  0.00  1.00    0.00  0.00  0.00    1.00  0.00  1.00
12            0.75  0.24  0.09    0.00  0.62  0.00    0.39  0.61  0.00    0.50  0.00  0.96
13            0.00  0.49  0.11    0.45  0.00  1.00    0.00  0.00  0.37    0.00  0.00  0.42
14            0.49  0.00  0.40    0.41  0.00  0.00    0.52  0.48  0.48    0.00  0.00  0.48
D-optimality  5.8e-4              8.8e-3              3.3e-5              2.3e-3
Rotatability  16.2%               12.0%               37.0%               19.1%
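To make the D-optimality comparison concrete, the sketch below builds the design matrix of a second degree polynomial in three factors and computes det(MᵀM), a common form of the D-optimality criterion. The normalisation used to obtain the values of table 2 is not stated in the text, so this sketch is not expected to reproduce them; the names `design_matrix` and `d_criterion` are illustrative.

```python
import numpy as np

def design_matrix(points):
    """Rows [1, x, y, z, x^2, y^2, z^2, xy, xz, yz] for each (x, y, z) point."""
    pts = np.asarray(points, dtype=float)
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    return np.column_stack([np.ones(len(pts)), x, y, z,
                            x * x, y * y, z * z, x * y, x * z, y * z])

def d_criterion(M):
    """det(M^T M), computed through slogdet for numerical robustness."""
    sign, logdet = np.linalg.slogdet(M.T @ M)
    return sign * np.exp(logdet)

# Nine initial experiments (table 1) plus the five F.D.E.D. points (table 2).
fded_design = [
    (0.64, 0.34, 0.17), (0.00, 0.00, 0.81), (0.00, 0.99, 0.43),
    (0.00, 0.35, 0.61), (0.50, 0.50, 0.44), (0.39, 0.60, 0.55),
    (0.98, 0.00, 0.13), (0.00, 0.00, 0.27), (0.00, 0.98, 0.26),
    (0.71, 0.29, 0.43), (0.70, 0.30, 0.93), (0.75, 0.24, 0.09),
    (0.00, 0.49, 0.11), (0.49, 0.00, 0.40),
]
M_fded = design_matrix(fded_design)
value = d_criterion(M_fded)
```

The same function can be applied to the other three designs of table 2 to rank them, keeping in mind that only the ranking, not the absolute value, is meaningful without the original normalisation.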
The comparison of the results shows that the best compromise is obtained by
adding three D-optimality experiments and two experiments that increase the
rotatability. However, the calculation is then complex because two different
algorithms must be used. F.D.E.D., in contrast, relies only on reasoning. That is
why it is the method applied in practice.
5. Modeling
5.1. Second degree polynomial model
Polynomials can reproduce any set of experimental values with any desired
precision, provided that the degree is high enough. When determining a
polynomial model, it is natural to begin with a first degree model, which
is a linear model in the sense of Hadamard. Such a model quickly shows its limits,
however, because the phenomena are not linear. For this reason, a second degree
polynomial model was chosen to represent the criteria in a satisfactory way. For
three factors, ten parameters must therefore be determined. The structure of the
model is the following:
$$h = a_0 + a_1 x + a_2 y + a_3 z + a_4 x^2 + a_5 y^2 + a_6 z^2 + a_7 xy + a_8 xz + a_9 yz$$
where h is the value of the criterion calculated for each experiment,
$a_0, \ldots, a_9$ are the parameters of the polynomial to be determined,
x, y and z represent respectively the three factors (stearin, raw
glycerol and [C]/[N] ratio).
At least ten experiments were necessary for the calculation. The nine
experiments defined in table 1 and the five defined in table 2 with F.D.E.D. were used.
5.2. Parameters calculation
The procedure is the same for the three criteria. H is the matrix of the
criterion values for the 14 experiments, M is the matrix of factors and A is the
matrix of unknown parameters.
We write:
$$H = M A + \varepsilon \qquad (1)$$
where $\varepsilon$ represents the difference observed between
measurement and prediction.
$\varepsilon$ is made up of independent random errors distributed as $N(0, v)$,
where $v$ is the unknown variance.
Maximum likelihood estimation is applied to the unknown parameters,
i.e. the matrix A and the variance v.
Equation (1) gives the way to estimate $\hat{A}$ and $\hat{v}$:
$$\hat{A} = (M^T M)^{-1} M^T H$$
$$\hat{v} = \frac{1}{n_e}\,(H - M\hat{A})^2$$
where $n_e$ is the number of experiments and $(H - M\hat{A})^2$ denotes the sum of
squared residuals.
In this case, $\hat{v}$ is a biased estimate.
The confidence region of the parameters is determined as described in the book
by Walter and Pronzato (1997). This method is useful for models which are
nonlinear in their parameters. For models which are linear, it reduces
to the technique described by Draper and Smith (1981).
We can write:
$$(A - \hat{A})^T M^T M (A - \hat{A}) \le \frac{n_p}{n_e - n_p}\, F_\alpha(n_p,\, n_e - n_p)\, (H - M\hat{A})^2$$
where $n_p$ is the number of parameters,
i.e.
$\frac{1}{n_p v}\,(A - \hat{A})^T M^T M (A - \hat{A})$ is a $\chi^2$ variable with $n_p$
degrees of freedom, divided by $n_p$,
and
$\frac{1}{(n_e - n_p)\, v}\,(H - M\hat{A})^2$ is a $\chi^2$ variable with $(n_e - n_p)$
degrees of freedom, divided by $(n_e - n_p)$.
We can deduce an unbiased estimate of v:
$$\hat{v}_{wb} = \frac{1}{n_e - n_p}\,(H - M\hat{A})^2.$$
The calculated values of $\hat{v}_{wb}$ are given in table 3 and those of the
parameters in table 4.
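The estimation step can be sketched with NumPy as follows. Instead of forming $(M^T M)^{-1}$ explicitly, the sketch uses `numpy.linalg.lstsq`, which gives the same least-squares solution when M has full column rank and is usually better conditioned. The function name `fit_linear_model` and the toy data are illustrative assumptions, not the data of the study.

```python
import numpy as np

def fit_linear_model(M, H):
    """Estimate A in H = M A + e, with biased (ML) and unbiased variance estimates.

    M : array of shape (n_e, n_p), one row of regressors per experiment
    H : array of shape (n_e,), measured criterion values
    """
    A_hat, *_ = np.linalg.lstsq(M, H, rcond=None)
    residual = H - M @ A_hat
    rss = float(residual @ residual)      # sum of squared residuals
    n_e, n_p = M.shape
    v_biased = rss / n_e                  # maximum likelihood estimate
    v_unbiased = rss / (n_e - n_p)        # estimate without bias
    return A_hat, v_biased, v_unbiased

# Toy example: four measurements fitted by a constant (n_e = 4, n_p = 1).
M_toy = np.ones((4, 1))
H_toy = np.array([0.0, 1.0, 0.0, 1.0])
A_toy, v_ml, v_wb = fit_linear_model(M_toy, H_toy)
```

In the toy example the residual sum of squares is 1, so the two variance estimates differ only by the divisor (4 versus 3), which is exactly the bias correction described above.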
Table 3. Variance without bias for the three criteria,
productivity (P), yield (Y) and composition (Co).
         Productivity   Yield   Composition
v_wb     206.6          106.4   63.9
Table 4. Calculated parameters for the three criteria,
productivity (P), yield (Y) and composition (Co).
       a0       a1      a2      a3      a4      a5      a6       a7      a8      a9
P     -0.95   124.6    27.9    57.3   -54.9   -49.6   -32.1   -158.2   -84.2    53.2
Y      4.5     62.4    -1.4    -6.1    -9.6   -16.0     4.5    -55.2   -55.6    56.1
Co    67.9   -151.3   -64.9   -11.2    90.5    64.3    16.0     51.0    22.7    -7.1
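As a consistency check, the second degree polynomial can be evaluated with the coefficients of table 4 and compared with the calculated values reported in table 5. The sketch below does this for the productivity (P) and composition (Co) rows; the names `COEFFS` and `predict` are illustrative. Small differences from table 5 presumably come from the rounding of the published coefficients.

```python
# Coefficients a0..a9 copied from table 4 (P and Co rows).
COEFFS = {
    "P":  [-0.95, 124.6, 27.9, 57.3, -54.9, -49.6, -32.1, -158.2, -84.2, 53.2],
    "Co": [67.9, -151.3, -64.9, -11.2, 90.5, 64.3, 16.0, 51.0, 22.7, -7.1],
}

def predict(criterion, x, y, z):
    """Evaluate h = a0 + a1 x + a2 y + a3 z + a4 x^2 + ... + a9 yz."""
    a = COEFFS[criterion]
    terms = [1.0, x, y, z, x * x, y * y, z * z, x * y, x * z, y * z]
    return sum(ai * ti for ai, ti in zip(a, terms))

# Experiment 2 (x = 0, y = 0, z = 0.81) and experiment 7 (x = 0.98, y = 0, z = 0.13).
p_exp2 = predict("P", 0.0, 0.0, 0.81)
p_exp7 = predict("P", 0.98, 0.0, 0.13)
co_exp2 = predict("Co", 0.0, 0.0, 0.81)
```

These three predictions fall within about 0.1 of the calculated values listed in table 5 for the same experiments (24.4, 64.6 and 69.4).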
5.3. Model validity
Given the variances above, the acceptable errors on the three
criteria are:
± 28.2 for productivity;
± 20.2 for yield;
± 15.7 for composition.
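The text does not state how these bounds were derived. They appear consistent with a normal 95% interval, that is ± 1.96 times the square root of the unbiased variance of table 3. The sketch below checks this reading, which is an inference and not a statement of the original method.

```python
import math

# Unbiased variances from table 3.
UNBIASED_VARIANCE = {"productivity": 206.6, "yield": 106.4, "composition": 63.9}
Z_95 = 1.96  # assumed two-sided 95% normal quantile

def acceptable_error(variance, z=Z_95):
    """Half-width of the interval: z times the standard deviation."""
    return z * math.sqrt(variance)

bounds = {name: round(acceptable_error(v), 1)
          for name, v in UNBIASED_VARIANCE.items()}
print(bounds)  # {'productivity': 28.2, 'yield': 20.2, 'composition': 15.7}
```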
Table 5 compares the model predictions with the measurements.
The results show that no error exceeds the 95% confidence interval.
To verify the validity of the model, seven new experiments were
determined with the help of F.D.E.D. They were chosen to increase the covering
rates in certain areas. They are presented in table 6, and the new covering rates
are shown in figure 6.
Table 5. Comparison between values of the model and experimental ones
for the fourteen experiments (ε = meas. - calc.).
Exp      Factors          Productivity (P)        Yield (Y)           Composition (Co)
N°     x     y     z     meas.  calc.    ε      meas.  calc.    ε      meas.  calc.    ε
1     0.64  0.34  0.17   23.9   28.3   -4.4     27.6   22.5    5.1      8.0    5.2    2.8
2     0.0   0.0   0.81   26.6   24.4    2.2      3.6    2.6    1.0     70.0   69.4    0.6
3     0.0   0.99  0.43   11.2   19.7   -8.5      3.9    9.7   -5.8     70.0   61.8    8.2
4     0.0   0.35  0.61   33.2   37.1   -3.9     10.2   12.0   -1.8     49.5   50.7   -1.2
5     0.50  0.50  0.44   37.5   21.8   15.7     19.3   13.1    6.2      7.5   12.8   -5.3
6     0.39  0.60  0.55   30.0   22.4    7.6     18.7   12.4    6.3     14.6   20.0   -5.4
7     0.98  0.0   0.13   74.5   64.6    9.9     55.9   48.6    7.3      3.9    8.2   -4.3
8     0.0   0.0   0.27   10.6   12.0   -1.4      2.7    3.2   -0.5     65.5   66.0   -0.5
9     0.0   0.98  0.26    8.7    5.0    3.7      3.5    0.7    2.8     56.2   62.5   -6.3
10    0.71  0.29  0.43   16.5   30.9  -14.4      9.2   19.2  -10.0     14.9    7.3    7.6
11    0.70  0.30  0.93   14.3   15.6   -1.3      7.3    7.8   -0.5     19.7   19.6    0.1
12    0.75  0.24  0.09   28.9   37.4   -8.5     20.8   31.8  -11.0      5.2    3.1    2.1
13    0.0   0.49  0.11   13.1    9.5    3.6      3.8    2.3    1.5     51.4   50.1    1.3
14    0.49  0.0   0.40   47.8   48.2   -0.4     19.6   20.2   -0.6     18.2   18.0    0.2
Table 6. Comparison between values of the model and experimental ones
for the seven new experiments (ε = meas. - calc.).
Exp      Factors          Productivity (P)        Yield (Y)           Composition (Co)
N°     x     y     z     meas.  calc.    ε      meas.  calc.    ε      meas.  calc.    ε
15    0.00  0.00  0.38    7.9   16.2   -8.3      1.9    2.9   -1.0     70.0   66.0    4.0
16    0.00  1.00  0.37    9.3   13.4   -4.1      2.4    5.9   -3.5     62.0   62.8   -0.8
17    0.00  0.00  0.25    9.1   11.3   -2.2      4.3    3.3    1.0     73.0   66.1    6.9
18    0.97  0.00  0.11   69.0   65.2    3.8     44.1   49.5   -5.4      1.0    7.6   -6.6
19    0.00  0.97  0.18    9.6   -2.4   12.0      3.6   -3.3    6.9     62.0   62.8   -0.8
20    0.44  0.34  0.32   12.7   32.3  -19.6     11.5   16.3   -4.8     13.9   12.3    1.6
21    0.00  0.41  0.35   11.0   26.1  -15.1      6.0    7.8   -1.8     59.1   49.1   10.0
Again, errors do not exceed statistically admissible limits.
Figure 6. Representation of the domain with the covering rate
of each area for 21 experiments (axes: 0.76[stearin]/[C], 0.39[glycerol]/[C] and [C]/[N]).
6. Conclusion
The results show that F.D.E.D. offers a good compromise between D-optimality
and rotatability. A design supplemented first with three experiments chosen for
D-optimality and then with two chosen for rotatability performs slightly better
than F.D.E.D. on these criteria. One of the principal advantages of F.D.E.D. is
the ease with which supplementary experiments can be found rapidly. It needs few
calculations and no optimization algorithm. The optimality criteria are lower with
F.D.E.D., but the technique is a good guide for heuristic design. Moreover,
F.D.E.D. is well suited to determining exploratory experiments at the beginning of
a new process and to finding quickly the generalization experiments needed to
validate a model.
7. References
Draper, N.R. and Smith, H. (1981). Applied Regression Analysis, John Wiley & Sons,
Inc., 2nd edition.
Fonteix, C., Viennet, R. and Marc, I. (7-9 May 1997). A new experimental design for
multicriteria optimization: application to a biochemical synthesis. 2nd Int.
Symposium on Mathematical modelling and simulation in agriculture and
bio-industries, IMACS, Budapest, 57-62.
Gill, C., Hall, M. and Ratledge, C. (1977). Lipid Accumulation in an Oleaginous Yeast
with possession of ATP:Citrate lyase. Applied and Environmental
Microbiology, 33, 231-239.
Khuri, A.I. (1988). A Measure of Rotatability for Response-Surface Designs.
Technometrics, 30, 1, 95-104.
Kuehm, N., Fonteix, C. and Marc, I. (1996). Fuzzy dynamic experimental sets for the
determining of the validity of a state estimation. RAIRO-APII-JESA Journal
Européen des Systèmes Automatisés, 30, 5, 737-753.
Perrin, E., Mandrille, A., Vivalda, J.C., Fonteix, C. and Marc, I. (1997). Optimisation
globale par stratégie d'évolution : technique utilisant la génétique des
individus diploïdes. RAIRO-Operations Research, 31, 2, 151-201.
Walter, E. and Pronzato, L. (1997). Identification of parametric models from
experimental data. Communications and Control Engineering Series,
Springer, London.