It was asked how feasible it would be to put a GNUe implementation
into production. Nick Rusnov (nickr) noted that "People are using it
in production - well components of it". Jason Cater (jcater)
explained that "most gnue implementations right now are using the
form builder/interface components". The caveat was that "if you are
looking for a complete financials package, then GNUe is not ready
for you" as of time of writing. However, if you were looking for
tools to easily build an interface to your own database, then GNUe
might be suitable. It was more a framework for business/database
application development than a shrink-wrapped package of
applications as of time of writing.

Malek Hadj-Ali (lekma) reported "after benchmarking a little bit the
appserver, i got some ugly results". The bottleneck appeared to be in
the XML-RPC (remote procedure call) code rather than in the
application server itself. So Malek had searched for an alternative
to XML-RPC and finally found a spec for a binary RPC protocol, called
Hessian, that was partly implemented in Python. He reimplemented it
in Python, taking care to keep the API the same as for the existing
XML-RPC used by GNUe. The results were: "the encoding part is
slightly slower than xmlrpc - the decoding is way faster than xmlrpc
- so on the overall hessian is faster". However, Reinhard Müller
(reinhard) was still concerned about the loss in performance compared
to just running a query directly against the database - "xmlrpc adds
an overhead of 20000% - and hessian adds an overhead of 6000% - which
both are disgusting numbers :(".
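
The kind of measurement described above, timing encoding and decoding
separately, can be sketched with Python's standard library. This is
only an illustration and not Malek's benchmark: the sample records
and the helper names encode() and decode() are assumptions, and only
the XML-RPC side is shown because Hessian is not part of the standard
library.

```python
import timeit
import xmlrpc.client

# Hypothetical sample data: a result set of 1000 small records.
rows = [{"id": i, "name": "record %d" % i} for i in range(1000)]


def encode(payload):
    """Serialise a result set as an XML-RPC method response."""
    return xmlrpc.client.dumps((payload,), methodresponse=True)


def decode(message):
    """Parse an XML-RPC response back into Python objects."""
    params, _method = xmlrpc.client.loads(message)
    return params[0]


message = encode(rows)
assert decode(message) == rows  # the round trip must be lossless

# Time each direction separately, since the two can differ a lot.
encode_seconds = timeit.timeit(lambda: encode(rows), number=20)
decode_seconds = timeit.timeit(lambda: decode(message), number=20)
print("encode: %.4fs  decode: %.4fs  message: %d characters"
      % (encode_seconds, decode_seconds, len(message)))
```

Running the same two timings against another protocol with the same
encode/decode interface would give a like-for-like comparison.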

In any case, "the simple fact that appserver itself (without the rpc
protocol) adds 4000 % overhead is not nice - what strikes me odd is
that I think we did performance tests and they were acceptable".
Johannes Vetter (johannesV) wondered if "maybe the number of records
was too low then", noting that the GNUe Schema Definition (.gsd)
format "has shown to be not very well suited for large files". Malek
confirmed that generating the gsd file seemed to be the bottleneck.
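
To put the quoted percentages in perspective, the arithmetic can be
sketched as below. This assumes that "overhead" means the extra time
relative to the direct database query, which the discussion does not
state explicitly; the function names are hypothetical.

```python
def overhead_percent(direct_seconds, indirect_seconds):
    """Extra time of the indirect route, as a percentage of the direct one."""
    return (indirect_seconds - direct_seconds) / direct_seconds * 100.0


def slowdown_factor(overhead):
    """How many times slower the indirect route is for a given overhead."""
    return 1.0 + overhead / 100.0


# The three overhead figures quoted in the discussion.
for label, overhead in [("xmlrpc", 20000), ("hessian", 6000),
                        ("appserver alone", 4000)]:
    print("%s: %d%% overhead = %.0f times slower than a direct query"
          % (label, overhead, slowdown_factor(overhead)))
```

Under that assumption, an overhead of 20000% means the request takes
about 201 times as long as the direct query, 6000% about 61 times,
and 4000% about 41 times.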

Malek sent his sample data and code to Johannes, who ran them
himself, and got even more extreme results - just over a second
for running a query directly against the database, and almost 18
minutes going via Application Server and XML-RPC. Malek said that
he had dug into the code enough to determine that "the for loop in
fetch is the killer", but did not understand the GNUe code internals
enough to progress any further. Reinhard said that the data
from the database should only be fetched once and then
cached before being processed to XML-RPC - "it might be a good idea
to check postgresql logs to see if the sql statements against the db
are those that we would expect - actually finding out that a bug
somewhere causes a new sql statement to be issued for every
record would be an easy explanation of the bad performance,
and should be fixable - but i fear it's not that easy...". Johannes
confirmed from the database logs that only one SQL statement was
being generated.
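
The check Reinhard suggested, counting the SQL statements that
actually reach the database, can be illustrated with a small sketch.
It uses an in-memory SQLite database and its trace callback as a
stand-in for PostgreSQL and its logs; the table, the data and the
function names are hypothetical and are not taken from GNUe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person (name) VALUES (?)",
                 [("person %d" % i,) for i in range(100)])
conn.commit()

# Record every statement from here on, much like a database log would.
statements = []
conn.set_trace_callback(statements.append)


def fetch_once(connection):
    """Expected behaviour: one statement returns every record."""
    return connection.execute(
        "SELECT id, name FROM person ORDER BY id").fetchall()


def fetch_per_record(connection):
    """The suspected bug: one extra statement for each record."""
    ids = [row[0] for row in
           connection.execute("SELECT id FROM person ORDER BY id")]
    return [connection.execute(
                "SELECT id, name FROM person WHERE id = ?", (i,)).fetchone()
            for i in ids]


def count_statements(fetch):
    """Run a fetch function and report how many statements it issued."""
    del statements[:]
    rows = fetch(conn)
    return rows, len(statements)


rows_once, issued_once = count_statements(fetch_once)
rows_each, issued_each = count_statements(fetch_per_record)
print("single query: %d statement(s)" % issued_once)
print("per-record queries: %d statement(s)" % issued_each)
```

Both functions return the same rows, so only the statement count
reveals the difference. In the case discussed here the count was
already one, which ruled this explanation out.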

Johannes did some testing with hotshot, a high-performance logging
profiler for Python, and was able to produce some statistics on which
parts of the code were taking the most time to run.
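
A minimal sketch of this kind of profiling run is shown below. It
uses cProfile and pstats from the standard library as a stand-in,
since hotshot is not available in Python 3; the functions being
profiled are hypothetical and only mimic a fetch loop that builds
one record at a time.

```python
import cProfile
import io
import pstats


def build_record(i):
    """Hypothetical per-record work done inside the fetch loop."""
    return {"id": i, "name": "record %d" % i}


def fetch(count):
    """Hypothetical fetch loop that builds one record per row."""
    return [build_record(i) for i in range(count)]


# Profile a single call and collect the statistics.
profiler = cProfile.Profile()
profiler.runcall(fetch, 50000)

# Write the five most expensive entries, by cumulative time, to a string.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists call counts and time per function, which is the sort
of statistic that points at a hot loop.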

The next day, Malek reported "my hessian impl seems to be now faster
than xmlrpc in decoding and encoding (after a bit of tuning on
unicode)" - also "it generates smaller messages which is good for
network". He confirmed there were no new external dependencies in
the code - he had implemented the hessian protocol directly within
the existing GNUe code base.

The next day, Reinhard developed further optimisations. Previously,
GNUe had been using "the same datasources library for forms and
appserver" but this had "a lot of overhead that was only for forms -
like tracking dirty records, caching, etc". For Application Server,
these sorts of issues "were anyway done in appserver itself". So he
had written a more 'raw' ResultSet function, removing this overhead,
which Application Server could then use directly. He added "as a
side effect, it should also greatly reduce the memory footprint".
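
The difference between a forms-style result set and a 'raw' one can
be sketched as follows. This is not GNUe's datasources code: the
class TrackedRecord and the functions tracked_result_set() and
raw_result_set() are hypothetical, and SQLite stands in for the
database only to keep the example self-contained.

```python
import sqlite3


class TrackedRecord:
    """Forms-style record: keeps the original values to detect edits."""

    def __init__(self, fields):
        self._original = dict(fields)
        self._current = dict(self._original)

    def __getitem__(self, name):
        return self._current[name]

    def __setitem__(self, name, value):
        self._current[name] = value

    def is_dirty(self):
        return self._current != self._original


def tracked_result_set(cursor):
    """Cache every row as a TrackedRecord (two dicts per row)."""
    names = [column[0] for column in cursor.description]
    return [TrackedRecord(zip(names, row)) for row in cursor]


def raw_result_set(cursor):
    """Yield plain dicts one at a time: no caching, no dirty tracking."""
    names = [column[0] for column in cursor.description]
    for row in cursor:
        yield dict(zip(names, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)", [("a",), ("b",)])

query = "SELECT id, name FROM item ORDER BY id"
tracked = tracked_result_set(conn.execute(query))
raw = list(raw_result_set(conn.execute(query)))
```

The raw variant does less work per row and holds only one row at a
time while iterating, which is one plausible reading of the reduced
memory footprint mentioned above.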
Also, he wondered if these optimisations "might also be interesting
for reports, as AFAICT reports should also be able to work without
that overhead" - except possibly "the missing master/detail ability
might be a problem...". However, this was still only "a part of the
performance loss in appserver", which was still significantly slower
than accessing the data directly using the psycopg database driver.
Even so, GNUe Application Server was already faster, as of time of
writing, than some other, slower, database drivers!

Reinhard said he would change Application Server's data access code
to use the new "ResultSet.raw() function" he had written. He also
made various other optimisations, not all of which yielded the
expected results - "python profiling is full of surprises - changing
a single assignment statement into a simpler form increased overall
appserver performance by > 5%"!