Repeated lookup operation returns different results
Repeating the same request against at least 2 different REST endpoints
returned different results from one call to the next.
Calls were .../space/memories/category and
.../space/attributes/cp=type&v=*
Also tested using only the Java API, with the same behavior.
Rebooting the cluster seemed to resolve the issue. The
hypothesis is that it is some kind of caching issue.
Demonstrated problem to Jim.
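A minimal sketch of how the repeated-lookup check could be scripted, assuming Python. The `fetch` callable is a placeholder for whichever REST call is under test (the full URLs are not reproduced here), and `count_distinct_responses` and `is_consistent` are hypothetical helper names, not part of any Saffron API. The idea is to issue the same call several times and tally the distinct response bodies; a healthy space should yield exactly one.

```python
from collections import Counter
from typing import Callable


def count_distinct_responses(fetch: Callable[[], str], attempts: int = 10) -> Counter:
    """Call fetch() `attempts` times and tally each distinct response body."""
    tally = Counter()
    for _ in range(attempts):
        tally[fetch()] += 1
    return tally


def is_consistent(tally: Counter) -> bool:
    """True when every call returned the same body (or no calls were made)."""
    return len(tally) <= 1


if __name__ == "__main__":
    # Stand-in for a real REST call: simulates three nodes that each answer
    # with a different result set, in round-robin order.
    canned = iter(["result-A", "result-B", "result-C"] * 3)
    tally = count_distinct_responses(lambda: next(canned), attempts=9)
    print(dict(tally), "consistent:", is_consistent(tally))
```

In practice `fetch` would wrap the actual HTTP request and return the response body as a string; the simulated responses above are only there to make the sketch runnable.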
2 Posted by Kyle Nakamoto on 12 Mar, 2010 09:07 PM
The data for this space was ingested solely with the REST interface. We are using a 3-machine cluster for this case. Saffron version 8.0.0-1784.
Support Staff 3 Posted by Jim Fleming on 12 Mar, 2010 09:32 PM
Submitted to development.
Support Staff 4 Posted by Jim Fleming on 14 Mar, 2010 01:31 PM
Kyle, is this reproducible? If so, can you turn on your MySQL general query log so we can see the queries being issued?
see: http://dev.mysql.com/doc/refman/5.1/en/query-log.html for more information.
Thanks,
Jim
Support Staff 5 Posted by Jim Fleming on 15 Mar, 2010 01:22 PM
Kyle, we also have a page on MySQL logging:
http://docs.saffronsierra.com/MySQL-Logging
If you see the problem again, you can make the change to the MySQL configuration, then run:
creset -r
to restart MySQL. You should then be able to run the REST query and see what queries MySQL is receiving. We are trying to reproduce the problem here, but if you are able to reproduce it first, it would be good to see that log.
Thanks,
Jim
Support Staff 6 Posted by Jim Fleming on 16 Mar, 2010 06:42 PM
Kyle, unfortunately we cannot reset MySQL (with a new config) once the cluster has been started, so I would recommend that you turn on the general query log for MySQL and then immediately try to reproduce the problem. The general query log tends to grow pretty quickly, and there are scripts that you can run to rotate the log if you feel inclined, see:
http://themattreid.com/wordpress/?p=34
However, your best bet may be to try to reproduce first. If that doesn't work, you can think about using the script to delete the log nightly, for example.
Let's start with logging the head node and use a wild-card query, so that it is broadcast to all nodes. Follow the steps here and add the line:
log
to /etc/my.cnf on the head node. Restart MySQL via creset -r and you should be good. If you see the problem, you can tail the log: /var/run/mysqld/mysqld.log. Let me know if you have any questions.
Thanks,
Jim
Support Staff 7 Posted by Jim Fleming on 04 May, 2010 07:03 PM
Issue has been resolved.
Jim Fleming resolved this discussion on 04 May, 2010 07:03 PM.
Kyle Nakamoto re-opened this discussion on 03 Aug, 2010 08:53 PM
8 Posted by Kyle Nakamoto on 03 Aug, 2010 08:54 PM
This problem is still an issue. It has not been fixed as of smb-8.0.0-2913-patch5.
Support Staff 9 Posted by Jared Peterson on 04 Aug, 2010 01:31 PM
Kyle,
Looking at the notes on bug #379 (attached to this issue), I see that the original issue was due to caches not being cleared properly when clearing a space in SaffronAdmin. Is this still one of the steps in reproducing now, or is this a different use case?
Also, is this a cluster or single box install?
Thanks,
Jared
10 Posted by Kyle Nakamoto on 04 Aug, 2010 01:46 PM
Jared,
This is still a cluster environment, and reproducing still involves clearing the space. We saw this behavior appear again after clearing a full space (w/ admin), reingesting the data, and then querying the space (all without restarting). This behavior would not be seen on a single-machine cluster, as the query results tend to cycle with the number of node machines in the cluster.
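One way to check whether the results cycle with the node count is sketched below, assuming Python and assuming the response bodies from consecutive identical queries have already been collected into a list. `cycle_period` is a hypothetical helper, not part of any Saffron tooling. It looks for the smallest repeat period that holds across the whole sequence and requires at least two full cycles of evidence.

```python
from typing import Optional, Sequence


def cycle_period(responses: Sequence[str]) -> Optional[int]:
    """Return the smallest period p with responses[i] == responses[i + p]
    for every valid i, considering only periods seen at least twice in full.

    Returns None when no such period fits or there are too few samples.
    """
    n = len(responses)
    # p <= n // 2 guarantees at least two complete cycles were observed.
    for p in range(1, n // 2 + 1):
        if all(responses[i] == responses[i + p] for i in range(n - p)):
            return p
    return None


if __name__ == "__main__":
    # Simulated bodies from nine consecutive identical queries.
    observed = ["A", "B", "C"] * 3
    print("period:", cycle_period(observed))
```

A period of 1 means every call returned the same body. A period equal to the number of nodes (3 in the cluster described earlier in this thread) would be consistent with each node answering from its own stale state, though it would not prove that on its own.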
Thanks,
Kyle
Support Staff 11 Posted by Jared Peterson on 04 Aug, 2010 01:54 PM
Kyle,
Thanks... I'll attempt to reproduce here.
Jared
Support Staff 12 Posted by Jared Peterson on 04 Aug, 2010 03:21 PM
Kyle,
When you ingest after having cleared the space, do you notice any failures during the ingest? Have you looked at the logs after the ingest? The reason I'm asking is that I want to make sure that the difference in results isn't actually related to some other ingestion failure. In my testing here I'm seeing a difference between runs, but it appears to be related to an ingestion failure.
Thanks,
Jared