*** SachaS has joined #gnuenterprise
*** reinhard has joined #gnuenterprise
good morning all
*** rajiv_ has quit IRC
*** derek has quit IRC
*** SachaS has quit IRC
*** SachaS has joined #gnuenterprise
hi reinhard
i have the open source ebXML messaging system running on my laptop
there is an add-on to it which scans a directory. so gnue could write an outgoing business document to a directory and the add-on will pick it up, create an ebXML message and send it to the ebXML messaging system, which then sends the ebXML message including the business document to a trading partner
same for an incoming business document: the ebXML messaging system receives it (and sends maybe a signed acknowledgement if that was agreed) ... and forwards the business document to the add-on, which in return can forward the business document to gnue, either also through a directory or some ... gnue import or gnue webservice or some other gnue glue
good night
bye
*** SachaS has quit IRC
*** btami has joined #gnuenterprise
good morning
*** johannesV has joined #gnuenterprise
good morning
*** johannesV_ has joined #gnuenterprise
*** johannesV has quit IRC
*** kilo has joined #gnuenterprise
good morning
*** dimas_ has quit IRC
*** lekma has joined #gnuenterprise
good morning
reinhard: the file RpcDoc.py in gnue.common.rpc.drivers.xmlrpc, is it still used/needed by xmlrpc?
needed: no, AFAICT
I think it is a tool to create an interface description, but I don't know how it works, and have never used it
actually I don't even know whether it is still functional or not
ok, thx
reinhard: do you have 5 minutes to spare?
yes
after benchmarking the appserver a little bit, i got some ugly results
4 min for a simple request on 185k records
47 secs were spent in appserver, the rest in xmlrpc
including or not including fetch?
erh
spent in xmlrpc?
well that was a fetch request, count
fetch request, count, fetch
yep
building the response
the direct request with psycopg
yuck
the server side xmlrpc code?
is 1.something seconds
well part of it is server side, the other part is decoding client side
have you tried pyro?
erh
you don't have python on client side, right?
so there are 2 problems, i tried to address one, and i need your help to finish
ok
i tried pyro, but i couldn't make it work
i'm talking only python, server/client, no javascript involved in my tests
ok
the first problem is the 47s spent in appserver for a simple "from bla select bla" when the driver spends only 1s
but the problem i tried to address is the second one, the rpc
i searched for an alternative to xmlrpc
ok
there are few
i finally found a spec for a binary rpc protocol that was partly implemented in python
it's called hessian, url: http://caucho.com/hessian/
it seemed to be faster on paper
so i reimplemented it in python
and i built a python server, all based on xmlrpclib and SimpleXMLRPCServer to keep the api
results are: the encoding part is slightly slower than xmlrpc, the decoding is way faster than xmlrpc
so overall hessian is faster
a small test to simulate the same request shows only 1 min 40 sec for a 4 min xmlrpc request
I really would wonder where all that performance gets lost parsing xml
that would mean you still lose 1 minute for a 1.something sec query in the rpc layer
so now i'm trying to get a server and client adapter to work, i.e.
while xmlrpc adds an overhead of 20000%
to really compare both protocols, before submitting
and hessian adds an overhead of 6000%
which both are disgusting numbers :(
i do agree
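As an aside, the directory handoff SachaS sketched at the top of the log might look something like this in Python; a minimal polling sketch, where the outbox/sent paths and the send_to_msh() hook are hypothetical placeholders, not names from a real add-on:

    # Minimal polling sketch of the add-on SachaS describes: gnue drops an
    # outgoing business document into a directory, the add-on picks it up
    # and hands it to the ebXML messaging system.  The directory names and
    # the send_to_msh() hook are hypothetical.

    import os
    import shutil
    import time

    OUTBOX = "/var/spool/gnue/outbox"   # gnue writes outgoing documents here
    SENT = "/var/spool/gnue/sent"       # processed documents are moved here

    def send_to_msh(payload):
        # placeholder: wrap the document in an ebXML message and hand it
        # over to the messaging system (e.g. via an HTTP POST)
        raise NotImplementedError

    def scan_once():
        for name in os.listdir(OUTBOX):
            path = os.path.join(OUTBOX, name)
            if not os.path.isfile(path):
                continue
            f = open(path, "rb")
            try:
                send_to_msh(f.read())
            finally:
                f.close()
            # move the file away so it is not sent twice
            shutil.move(path, os.path.join(SENT, name))

    if __name__ == "__main__":
        while True:
            scan_once()
            time.sleep(5)   # poll every few seconds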
just fyi, i asked friends to make the same kind of request on a very well known erp
and?
and for example, a sap request returns in 1 min 45 sec for 516,390 records and displays two fields. Second try: 1 min 35 sec.
the db is in Munich (DE), the sap server is in Hildesheim (DE), the client is in Breda (NL)
the link between the client and sap is a 64k line
so we're not in good shape performance wise
indeed
even if the sap db and server used for this request are some kind of monstrous machine
that's not the point
serving the worldwide production of a big german company
hm
these are horrible numbers ...
the simple fact that appserver itself (without the rpc protocol) adds 4000% overhead is not nice
ha!
reinhard, right
I knew that johannesV_ couldn't stay quiet in this discussion ;-)
i think i'd like to dig into that ...
right
this is on an average server, on a local network
lekma, what about your test-data? is it generated or real-life data?
only a string(2)
lekma, can you post your benchmarks? i mean the code used to perform it
wait guys, before i do that i need to finish hessian integration and test again so i can do the same tests
what strikes me odd is that I think we did performance tests and they were acceptable
well, that hessian stuff is one thing ... appserver is the other one
reinhard, maybe the number of records was too low then
ok wait, i'll try to send you files in a minute
lekma, don't hurry ... :)
the problem is the gsdgen thing that takes forever for 185000 records
if i send you csv files and a script to put it in db, is it ok?
lekma, yes that's ok
gsd has shown to be not very well suited for large files
lekma: FWIW, we've noticed too that gsdgen takes forever
and, apart from that, reading it in with gnue-schema also takes forever
it's not the gsdgen only, it is gsd at all
right
johannesV_: should i resend?
lekma, 01003_* is not working, the other files are ok
ok, thanks ...
all files here
you should have them all now
so what i need to finnish my puzzle is the little bit of magic that makes: gnue-appserver --rpctype blah working
s/finnish/finish
apart from adding the driver in common, what should i do?
you might need to add the rpctype to geasRpcServer.py
if your rpctype is called "foorpc" then you have to add it to gnue.common.rpc.drivers
and there must be a module implementing a class called "ServerAdapter"
* johannesV_ populating backend from csv file
ok, this looks like it needs some time ...
why haven't you read the csv-file once, put it into an array and then reused it for insertion?
hmmm... well that's why you're the one doing gnue!! :)
:)
well, actually this doesn't really matter ... moving the commit out of the loop gains much more
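A minimal sketch of the pattern johannesV_ just suggested - read the csv file once, insert inside the loop, commit a single time after it; sqlite3 only serves to keep the sketch self-contained (psycopg follows the same DB-API shape, with %s markers), and the file and table names are made up:

    # Read the csv file once into a list, insert every row, and commit a
    # single time after the loop -- a commit per row is what makes
    # populating the backend slow.

    import csv
    import sqlite3

    rows = list(csv.reader(open("benchmark.csv")))   # read the file once

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE benchmark (value TEXT)")

    cur = conn.cursor()
    for row in rows:
        cur.execute("INSERT INTO benchmark (value) VALUES (?)", (row[0],))

    conn.commit()   # one commit for all rows, not one per row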
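Going back to the --rpctype question above: a rough skeleton of such a driver module, assuming only what reinhard stated (a module under gnue.common.rpc.drivers exposing a class named "ServerAdapter"); the constructor signature and the transport handling are illustrative guesses, not the real gnue.common driver interface:

    # Hypothetical skeleton for gnue/common/rpc/drivers/foorpc/ServerAdapter.py.
    # The hard facts from the discussion are only the module location and the
    # class name "ServerAdapter"; everything else here is a guess.

    import SocketServer

    class _RequestHandler(SocketServer.StreamRequestHandler):
        def handle(self):
            # decode a foorpc request from self.rfile, look the method up
            # on self.server.service, encode the result to self.wfile
            pass

    class ServerAdapter:
        def __init__(self, service, host="localhost", port=8765):
            self._server = SocketServer.TCPServer((host, port), _RequestHandler)
            self._server.service = service   # exported service object

        def serve(self):
            # block and answer requests until the process is stopped
            self._server.serve_forever()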
ok, i'm off for lunch now ... but i can find some time in the afternoon
i'll do some base-tests ... i think I'll use the profile module ...
i need help with sockets
if i do server.system.listMethods() everything is fine
but if i do server.open(), i get httplib.ResponseNotReady client side
and
DB000: File "/usr/lib/gnue/python/gnue/common/rpc/drivers/hessian/ServerAdapter.py", line 379, in close_request
DB000:   self.serverAdapter._clearClientObjects (request.getpeername ())
DB000: File "", line 1, in getpeername
DB000: error: (107, 'Transport endpoint is not connected')
server side
hm, i haven't checked sockets recently ...
but i got an interesting result with your benchmark:
johannes@chicago:~/prj/gnue/benchmarks$ python 00000_test_psycopg.py
0:00:01.558194
johannes@chicago:~/prj/gnue/benchmarks$ python 00000_test_xmlrpc.py
0:17:48.472561
that's quite shocking ...
ewwwww
17 mins
wow
your server is even slower than mine or what
i've got the database on server a, and appserver on server b, as well as the client (server b)
obviously your postgresql server is slower than mine, cause:
python 00000_test_psycopg.py
0:00:00.005379
the 1 sec i was referring to was with another connection, sorry
which is even worse
anyway
what about this:
johannes@chicago:~/prj/gnue/benchmarks$ gcvs other.py -c appserver
R= 185000
0:00:01.433922
using the language-interface ... :)
j/k ... that is only the session.find() ... (doing the default-cache-size fetch only)
i've added an ObjectList iteration now ...
should correspond to the 47 secs i was referring to, modulo server/network performance
just to compare:
python 00000_test_xmlrpc.py
0:04:36.839102
you should buy yourself faster computers :)
well, my client is a 3ghz machine with 1gb ram and the db-server is a 750mhz cpu with 512mb ram
the network is 100mbit
maybe i should start a new instance of appserver after populating the backend ...
oh yes
something i forgot: the xmlrpc request eats 500mb of memory on the appserver
which is a lot
so the 17 minutes may partly come from this particular point
I think the first step to optimize would be to find out which parts of the time fall on the request, the count, and the fetch
ah, easy that is ...
johannes@chicago:~/prj/gnue/benchmarks$ python 00000_test_xmlrpc.py
List: 0:00:01.223186
Count: 0:00:01.225652
the fetch is the killer
yes
in fact the for loop in fetch is the killer
i did get to this point when i started to look for rpc alternatives, cause gnue code and me, we're not really friends :)
did you test whether anything changes if you fetch bigger chunks?
nope
*** btami has quit IRC
appserver is eating more than 500mb memory and takes about 30% of the cpu-time
ok, that's definitely way too much ... (at least for my machine ...)
apart from the rpc driver perf problems, i believe there is space for improvement in what i call the pre-rpc stage (typeconv for example) and in the way appserver requests the db
I don't think it's the way appserver requests the db, because that happens in "request"
rpc driver wise, encoding can't really be faster (except if you use the cPickle/marshal modules, which you shouldn't do)
not in fetch (AFAICT)
the instance__getValue doesn't request db?
no, it should get the data from the cache in data.py
and if data is not in cache?
data is in cache if it was included in the request
the case that data is not in cache can only happen on load, not on fetch
but still it might be a good idea to check the postgresql logs to see if the sql statements against the db are those that we would expect
actually, finding out that a bug somewhere causes a new sql statement to be issued for every record would be an easy explanation of the bad performance, and should be fixable
but i fear it's not that easy...
johannesV_: another idea might be to do the same performance test on top of the data.py layer
that's the nice thing about appserver's code stack - we can test on the datasources, data.py, geasSession and xmlrpc layers and see which layer adds what amount of overhead
there is only one sql statement generated
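On the "fetch bigger chunks" question above: at the DB-API level the difference is fetchone() per record versus fetchmany() per chunk; a small sketch, with an arbitrary chunk size:

    # fetchone() costs one driver call per record; fetchmany() amortizes
    # that cost over a whole chunk.  The chunk size of 1000 is arbitrary.

    def iter_chunks(cursor, size=1000):
        # yield lists of up to `size` rows until the result set is exhausted
        while True:
            chunk = cursor.fetchmany(size)
            if not chunk:
                break
            yield chunk

    # usage, given an executed DB-API cursor:
    #   for chunk in iter_chunks(cursor):
    #       for row in chunk:
    #           process(row)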
i'm about to create a hotshot-environment for appserver
what is a hotshot-environment?
:)
10.9 hotshot -- High performance logging profiler
see python.org/doc
oh
ah
you remember that you also did a samples/testing/benchmark dir in appserver some time ago?
yes i know
bbl
*** dimas_ has joined #gnuenterprise
wow, a profile for the fetch() call only has about 6mb :)
*** jamest has joined #gnuenterprise
back
ok, I've got some first results
having a backend with 18500 records, the xmlrpc-stuff takes about 10 seconds (overall)
one second is used for checktype
another second is used within EventController.dispatchEvent
0.7 seconds are used in data.read, and data.write takes another 0.7
speaking of which, total time?
yes, for those calls only
that is about 30% of the total time
I mean what is the total time?
10 secs
10.042
ah sorry, i misunderstood
are you sure there's xmlrpc in these 10 secs?
ah, i have to mention that these results are derived from the fetch-call only
ah, ok
erm, so this is a different test than the one that took 17 min?
yea, i don't want to spend 17 minutes per test :(
that seems coherent with my results
i was curious where the time is being used within the fetch
47 secs spent in fetch for 185k records
oh, I didn't see you reduced the number of records by a factor of 10
can you find out whether this is a specific call to checktype?
we might consider doing "assert checktype" in performance critical loops
where are the 6 missing seconds spent?
actually, did you consider testing the difference with an optimized run?
reinhard: what is an optimized run?
python -O ... (IIRC)
then python compiles the code with an optimizer and removes all asserts
i'll try that
it performs in the same overall time (the xmlrpc-script takes 21 seconds)
how can i pass -O to the gnue-appserver script?
add it in bin/gacvs: python2.x -OO ...
checktype is called 425500 times for 18500 records
johannesV_: where are the 6 missing seconds spent?
in about 40 other calls ... ranging from 0.043 seconds up to 0.5 seconds
i'd say most of the time is in data.py
4.3 seconds are spent in data.py, 1.33 in ResultSet.py
so apparently data.py adds quite a share of the overhead
looking at the cumulative times it appears that data.__nextBackendRecord takes 5.5 seconds of the 10
that's at least a starting point
well, there's a lot of stuff happening from within there
AFAICS it only does resultSet.[first|next]Record and then a __buildRecords
is __buildRecords using most of the time?
obviously no. much more ... all that rubbish with merging with the cache, checking for ordering and the like
2.302 is for __buildRecords
the -OO trick doesn't seem to be really helpful, 1 sec gain on a 20 sec run (xmlrpc included)
lekma, that's the way it looks here too
*** kilo has left #gnuenterprise
2.302 for __buildRecords is cumulative? (i mean including everything called from within there)
yes
where are the other 3 seconds?
2.865  0.000  ResultSet.py:397(nextRecord)
ah ok, I misinterpreted
johannesV_: 1.33 in ResultSet.py - so that was non-cumulative?
right, that was the timing per call
it's quite tricky to read all those numbers from the Stats class :)
I think we might really try to optimize bottom up
i.e. first look that we can make ResultSet/Connection as fast as possible
then try to remove overhead from data.py
etc
ok, in ResultSet.py we have 1.486 in __cacheNextRecord() and 1.212 in __move
so that is 2.69 out of that 2.865
1.212 in __move sounds like way too much, given that IIRC this function doesn't do anything useful in our context here
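The "assert checktype" idea above, spelled out: an assert statement is stripped completely under python -O, so the 425500 checktype calls would vanish from optimized runs. The checktype below is a stand-in with an assumed signature, not the real gnue.common function; note it must return a true value to be usable inside assert:

    # Stand-in for gnue.common's checktype; the real signature may differ.
    # It returns True so that it can live inside an assert statement.
    def checktype(value, expected):
        if not isinstance(value, expected):
            raise TypeError("expected %s, got %r" % (expected, value))
        return True

    def fetch_loop(records):
        for record in records:
            # under "python -O" the whole assert statement, including the
            # checktype call, is compiled away, so the hot loop pays nothing
            assert checktype(record, dict)
            # ... process record ...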
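For reference, a hotshot session like the one johannesV_ is running probably looks roughly like this (hotshot is the Python 2 standard library's "high performance logging profiler"; the profiled fetch_all function and the file name are placeholders):

    import hotshot
    import hotshot.stats

    def fetch_all():
        # placeholder for the call under test (e.g. one appserver fetch)
        pass

    prof = hotshot.Profile("fetch.prof")
    prof.runcall(fetch_all)        # run the function under the profiler
    prof.close()

    st = hotshot.stats.load("fetch.prof")   # gives a pstats.Stats object
    st.sort_stats("cumulative")
    st.print_stats(20)                      # top 20 entries
    st.print_callees("dispatchEvent")       # where dispatchEvent's time goes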
could be the __sync only ...
1.087 in __sync
ui
isEmpty 0.058, _activate 0.052 and 1.071 for EventController.dispatchEvent(dsMoved)
ok, so that's the next candidate :)
there should be (AFAICT) no event handler attached to this event at all
0.2 seconds for creating an Event instance and 0.822 seconds for RecordSet.py __dispatchEvent
hmm
not sure who is listening to dsCursorMoved
usually the user interface is listening
I don't think anybody should listen in appserver
hmm
GDataSource always creates an EventController instance
but it has to
that's ok
yes, but there should be no listeners
so the dispatchEvent should actually do nothing (more or less)
yeah, should be that way
but where do those 1.072 seconds go then (in dispatchEvent)?
*** btami has joined #gnuenterprise
can you read that from the profile?
yes
i've got 1.072 seconds in EventController.py dispatchEvent and another 0.822 in RecordSet.py __dispatchEvent
can you break that down to the lines?
which lines?
37000  0.861  0.000  1.071  0.000  EventController.py:114(dispatchEvent)
from EventController.py dispatchEvent
18500  0.213  0.000  0.822  0.000  RecordSet.py:751(__dispatchEvent)
ah ok
maybe i can find out something about the callees
what do those numbers mean?
of these funcs: ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
so if dispatchEvent is a function with 10 lines, you can't find out how the time is distributed over these lines?
not really
i can look at which functions are called within dispatchEvent and then look for their timings
>>> st.print_callees('dispatchEvent')
   Ordered by: cumulative time
   List reduced from 45 to 2 due to restriction <'dispatchEvent'>
Function                                called...
EventController.py:114(dispatchEvent)  Event.py:58(__init__)(37000)  0.210
RecordSet.py:751(__dispatchEvent)       EventController.py:114(dispatchEvent)(18500)  1.071
so this looks like the time really is spent *within* dispatchEvent, not in a subfunction?
you could, just for testing, cut dispatchEvent into 3 or 4 pieces to get more detailed data
on the rpc side:
repr(value) length == 600000
marshal encode 0.17 seconds, 7000005 bytes
marshal decode 0.14 seconds
pickle encode 4.72 seconds, 5300075 bytes
pickle decode 2.18 seconds
cPickle encode 0.53 seconds, 5300075 bytes
cPickle decode 0.43 seconds
without the long, boolean, and unicode values. new data repr length: 500000
marshal encode 0.17 seconds, 5200005 bytes
marshal decode 0.09 seconds
pickle encode 4.0 seconds, 2550061 bytes
pickle decode 1.54 seconds
cPickle encode 0.32 seconds, 2550061 bytes
cPickle decode 0.16 seconds
xmlrpclib encode 3.28 seconds, 26500019 bytes
xmlrpclib decode 22.0 seconds
hessianrpclib encode 4.37 seconds, 4350004 bytes
hessianrpclib decode 6.81 seconds
various encode/decode tests
the xmlrpclib runs are with sgmlop, which is supposed to be faster than expat
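A minimal harness in the spirit of the numbers above, timing marshal, cPickle and xmlrpclib round-trips on one synthetic payload (Python 2 module names, matching the log; the payload is made up and much smaller than lekma's 185k-record one):

    import marshal
    import cPickle
    import time
    import xmlrpclib

    value = [["abcd", 123, 1.5] for i in range(100000)]   # synthetic payload

    def timed(label, encode, decode):
        t0 = time.time()
        data = encode(value)
        t1 = time.time()
        decode(data)
        t2 = time.time()
        print "%-10s encode %.2fs  decode %.2fs  %d bytes" % (
            label, t1 - t0, t2 - t1, len(data))

    timed("marshal", marshal.dumps, marshal.loads)
    timed("cPickle",
          lambda v: cPickle.dumps(v, 2),     # binary pickle protocol
          cPickle.loads)
    timed("xmlrpclib",
          lambda v: xmlrpclib.dumps((v,)),   # params must be a tuple
          xmlrpclib.loads)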
*** jcater has joined #gnuenterprise
*** jamest has quit IRC
*** ramingo has joined #gnuenterprise
*** ramingo has left #gnuenterprise
*** jamest has joined #gnuenterprise
*** btami has quit IRC
anybody with sockets experience?
*** johannesV_ has quit IRC
*** johannesV_ has joined #gnuenterprise
what kind of experience?
I've had to do a few sockets clients in python
well, that was to help me solve a curious problem
but i found where my error was in the meantime, thx
*** derek has joined #gnuenterprise
*** klasstek has joined #gnuenterprise
*** johannesV_ has quit IRC
*** lekma has quit IRC
*** sjc has joined #gnuenterprise
*** jamest has left #gnuenterprise
*** jamest has joined #gnuenterprise
*** jamest has quit IRC
*** jamest has joined #gnuenterprise
*** kilo has joined #gnuenterprise
good night all
*** reinhard has quit IRC
*** kilo has quit IRC
*** klasstek has quit IRC
*** jamest has quit IRC
*** derek has quit IRC
*** derek has joined #gnuenterprise
*** sjc has quit IRC