So Lynux, that’s an interesting-sounding solution. It would be great if it made some difference for Guitsboy. We’ll see. I notice that you say you’ve not yet tried it, though, and fair enough. Thanks for offering it.
But I’m curious: did you ever resolve your original problem? If not, hopefully you saw the note I just wrote to Guitsboy, asking him something that may well interest you if you still have your problem. On rereading this thread, from back in April, I’ve also had some new thoughts come to mind, which I’ll share in case they help either of you, or others with this seemingly same issue.
To remind readers who may not want to review the whole thread, you had said originally that “all is fine with low to moderate load, however under heavy load and at random times the replication fails”, and that this failure “manifests in users not being able to login to our application (we store a token in session scope to store logged in status)”. Then it seems you may have concluded that things were down to the error you were seeing in the logs:
Mar 05, 2014 9:55:19 PM org.apache.catalina.ha.session.DeltaManager messageReceived
SEVERE: Manager : Unable to receive message through TCP channel
java.lang.IllegalStateException: removeAttribute: Session already invalidated
And now Guitsboy reports seeing the same error.
But here’s the thing that came to mind for me tonight as I read this: you know, there can be a lot of other reasons that users can feel that they “lose their session”, even without using clustering and replication.
Sometimes the issue is that folks end up with duplicate session tokens (which can happen for various reasons, including perhaps ones in your code, and maybe only when people visit pages in a certain pattern, so that it happens only occasionally and not always).
Then there is an issue that can arise if you are supporting both http and https requests, where Tomcat (not CF) balks at that (see http://www.petefreitag.com/item/817.cfm, and though he shows a solution in IIS you should be able to implement a similar one in mod_rewrite if that was indeed perhaps your issue).
So I’d be curious if either of you may be in a position to have a failing client use any sort of client tool (like Chrome’s dev tools, Firebug, Firefox’s new built-in tools, or IE’s F12 dev tools) to watch the communication between the client and the server, and especially to watch the cookies being sent. You guys both mention using jsessionid. Is it the same cookie value on each request? And/or is there more than one jsessionid cookie? I’ve seen it happen. There could be differences in the domain property reported for the cookie, the httponly property, the secure property, and so on. And you really do want to view the value sent from the client to the server, because if you view the cookie scope on the server a) it may show values set ON the server rather than sent TO the server, and b) it won’t show these additional cookie properties that were in play on the client. CF only sees the cookie name and value.
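If it helps, here is a rough sketch (in Python, just because it is quick to run anywhere) of the kind of check I mean. It assumes you have copied the raw Cookie request header from each of a failing client's requests out of the browser's dev tools. The function names and the sample header values are made up for illustration; they are not from either of your setups.

```python
def parse_cookie_header(header):
    """Split a raw Cookie request header into (name, value) pairs.

    A list is used rather than a dict so that a cookie name sent
    twice in the same request is kept twice, not collapsed to one.
    """
    pairs = []
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        pairs.append((name.strip(), value.strip()))
    return pairs


def audit_session_cookies(headers, cookie_name="JSESSIONID"):
    """Return (duplicates, distinct_values) for the given cookie name.

    duplicates: indexes of requests that sent the cookie more than once.
    distinct_values: every different value seen, in order of appearance.
    """
    duplicates = []
    distinct_values = []
    for index, header in enumerate(headers):
        values = [
            value
            for name, value in parse_cookie_header(header)
            if name.lower() == cookie_name.lower()
        ]
        if len(values) > 1:
            duplicates.append(index)
        for value in values:
            if value not in distinct_values:
                distinct_values.append(value)
    return duplicates, distinct_values


# Hypothetical headers from three consecutive requests by one client.
captured = [
    "JSESSIONID=AAA111; theme=dark",
    "JSESSIONID=AAA111; JSESSIONID=BBB222; theme=dark",
    "JSESSIONID=BBB222; theme=dark",
]
duplicates, distinct_values = audit_session_cookies(captured)
print("Requests sending the cookie more than once:", duplicates)
print("Distinct session cookie values seen:", distinct_values)
```

If the first list is not empty, or the second has more than one value for a single client, that points at the cookie problem I am describing rather than at replication. Note that this looks only at names and values as sent; the domain, httponly, and secure properties still have to be checked in the browser tool itself.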
I’ve helped many people find out that this was the reason for the seeming session loss (and sometimes it was not all requests by all clients but perhaps only some requests for some clients, all on the same server). At least if this is the crux of the problem, you can then tackle WHY it’s happening. There can be many reasons, from code to configuration, so I won’t belabor them now.
But if either of you is able to confirm this, perhaps we can help you both get a little closer to a real explanation and solution for your problem. Again, I’m just guessing a bit based on what you’ve written. I realize it may be that none of this is the problem and that you have hit some other real, unrelated bug. But I do feel you ought to check this out first, as it has indeed been the crux of problems for others, regardless of clustering. It seems worth ruling out, so that you don’t get misled chasing the problem on the assumption that it is about clustering.
As always, hope that helps.
/charlie