MantisBT - VTK
View Issue Details
0014310 | VTK | (No Category) | public | 2013-09-27 03:21 | 2016-08-12 09:55
arjenvr 
Berk Geveci 
urgent | minor | have not tried
closed | moved
5.10.1 
6.2.0 
TBD
crash
0014310: VTK-based applications completely freeze after a couple of hours
When running an animation or simulation where VTK objects are updated each frame, VTK completely freezes after a couple of hours.

The problem is caused by VTK's implementation of vtkTimeStamp. The class uses a static 32-bit integer to hold the current 'time', and increments this value every time Modified is called on any VTK object. This number eventually overflows, after which all modified time comparisons in VTK can break.

This behaviour usually occurs within a day, but may occur within a few hours, depending on the number of objects updated each frame.

Note that ParaView also freezes when an animation is left to run overnight.

The documentation of vtkTimeStamp actually states that the overflow can occur, but goes on to say that "the typical consequence should be that some filters will update themselves when really they don't need to". Needless to say, the opposite is the case. The problem also occurs in 64-bit builds of VTK on Windows, where the unsigned long type is still 32 bits.

Attached is a patch to fix this issue in VTK 5.10.1. It modifies vtkTimeStamp to use a uint64_t type and replaces the direct use of the unsigned long type throughout VTK with the vtkTimeStamp interface. The vtkTimeStamp implementation itself is only modified for Windows systems, but since only this class would have to be changed, it should be trivial to adapt for Mac and Linux systems.

The problem still occurs in VTK 6.0, but due to the rearrangement of the code in directories, the patch requires some editing in order to apply it to VTK 6.0 (and probably some other changes would have to be made).

Since this is a major oversight in VTK and the timestamping mechanism is crucial to VTK's functioning, this should also be fixed in VTK 6.0. People using VTK in long-running simulations are very likely to run into this issue.

hackaton
vtk_5.10.1_vtkTimeStamp.patch (384,236) 2013-09-27 03:21
https://www.vtk.org/Bug/file/9535/vtk_5.10.1_vtkTimeStamp.patch
vtk_6_1_0_vtkTimeStamp (277,585) 2015-12-16 10:41
https://www.vtk.org/Bug/file/9982/vtk_6_1_0_vtkTimeStamp
Issue History
2013-09-27 03:21 | arjenvr | New Issue
2013-09-27 03:21 | arjenvr | File Added: vtk_5.10.1_vtkTimeStamp.patch
2013-12-17 21:52 | Dave DeMarle | Target Version => 6.2.0
2013-12-30 15:01 | David Cole | Note Added: 0032094
2014-01-02 08:52 | David Cole | Note Added: 0032106
2014-01-02 10:40 | David Cole | Note Added: 0032107
2014-01-02 13:06 | Brad King | Note Added: 0032109
2014-05-14 10:11 | David Cole | Note Added: 0032629
2014-10-01 12:48 | Berk Geveci | Assigned To => Berk Geveci
2014-10-01 12:48 | Berk Geveci | Status: backlog => tabled
2014-10-01 12:49 | Berk Geveci | Status: tabled => backlog
2014-10-01 20:06 | Sean McBride | Tag Attached: hackaton
2015-12-16 10:41 | Sylvain | File Added: vtk_6_1_0_vtkTimeStamp
2015-12-16 10:43 | Sylvain | Note Added: 0035574
2016-08-12 09:55 | Kitware Robot | Note Added: 0037310
2016-08-12 09:55 | Kitware Robot | Status: backlog => closed
2016-08-12 09:55 | Kitware Robot | Resolution: open => moved

Notes
(0032094)
David Cole   
2013-12-30 15:01   
For whoever wants to work on fixing the actual overflow problem, here's a good "trick" you can use to reproduce the overflow quickly, rather than running something for hours or days to get there...:

In the source file, vtkTimeStamp.cxx, set the initial value of the GlobalTimeStamp variable to INT_MAX - 1000 rather than 0.

Fiddle around with your own program, and see what the time stamp values are just a few seconds into running your program. Start at INT_MAX minus that number and the overflow will occur just a few seconds in...

Then you can reproduce this, (and then verify that you've fixed the actual problem), by forcing an early overflow.
(0032106)
David Cole   
2014-01-02 08:52   
Here is a gerrit topic to assist in tracking down this problem, and developing a fix for it:

    http://review.source.kitware.com/#/t/3791

The first commit in that topic adds some output to many of the VTK C++ tests that emits output like this:

    <DartMeasurement name="Final_vtkTimeStamp_GetMTime" type="numeric/double">10374</DartMeasurement>

This enables you to measure how high the vtkTimeStamp MTime count gets during the run of each test... Only 3 out of 300+ tests have final MTime values of less than 200, which leads to the code in the second commit.

The code in the second commit forces overflow at *some* point during many of the VTK tests using code like this:

//-------------------------------------------------------------------------
#define CALLS_UNTIL_OVERFLOW 200

//-------------------------------------------------------------------------
void vtkTimeStamp::Modified()
{
#if VTK_SIZEOF_VOID_P == 8
  static vtkAtomicInt<vtkTypeInt64> GlobalTimeStamp(0xffffffffffffffffULL - CALLS_UNTIL_OVERFLOW);
#else
  static vtkAtomicInt<vtkTypeInt32> GlobalTimeStamp(0xffffffff - CALLS_UNTIL_OVERFLOW);
#endif

You can see that this results in many failed tests by observing the CDash results from this gerrit topic.

The tests after pushing just the first commit *all* *pass* on *all* the CDash@home builds:

http://open.cdash.org/index.php?&project=VTK&filtercount=2&showfilters=1&filtercombine=and&field1=buildname/string&compare1=63&value1=T3791timestamp-experiments-1&field2=buildstarttime/date&compare2=83&value2=2014-1-1

The tests after pushing the second commit (and getting good builds) have from 43 to 63 (roughly 5%) failing tests:

http://open.cdash.org/index.php?&project=VTK&filtercount=2&showfilters=1&filtercombine=and&field1=buildname/string&compare1=63&value1=T3791timestamp-experiments-5&field2=buildstarttime/date&compare2=83&value2=2014-1-1

I suspect if you use different numbers than 200 for the CALLS_UNTIL_OVERFLOW value, you'll see variations in which tests fail and how they fail.

Seems to me like the only reasonable solution would be to track all vtkObjects in a global map somewhere, and then reset the modified count of all objects to be 0 or 1 at overflow time, and then bump the modified time of the object that caused the overflow after that. And of course, you'd have to track down all the objects that have a cached MTime variable in them, and update those as well...

This seems like a very large can of worms... Anybody have a good suggestion about how to revamp things to make everything work across the overflow point?

This will also happen more quickly in highly-interactive many-object scenarios, where many MTimes are updated on each mouse move, for example.
(0032107)
David Cole   
2014-01-02 10:40   
Actually, I guess the obvious "simple" fix to this is to make sure that all cached MTimes in all objects are represented as an abstract type, and then simply make that type wide enough to handle more and more bits as needed to avoid overflow.

That may still be quite challenging. I'm not sure it's at all obvious that there's a good way to track down every occurrence of objects caching MTimes...

It would certainly be extensible to avoid the problem indefinitely if we can just make that abstract type wide enough.
(0032109)
Brad King   
2014-01-02 13:06   
Re 0014310:0032107: The attached "vtk_5.10.1_vtkTimeStamp.patch" changes hard-coding of the "unsigned long" timestamp APIs to "vtkTimeStamp", which is similar to the abstract type you propose. I think it is cleaner to separate the integer type from the vtkTimeStamp API so that all GetMTime methods return the integer type and all vtkTimeStamp instances are hidden inside objects.

On 64-bit Linux the current "unsigned long" underlying type is already 64-bit. It is on 32-bit platforms and 64-bit Windows that "unsigned long" is still only 32 bits.

I think it is simplest to just make the integer type always be vtkTypeUInt64, but use a typedef like "vtkTimeStampInt" to make the type easy to replace later.

A future abstract timestamp integer type like that proposed in 0014310:0032107 could generalize itself to a bigint when rollover of a simple integer type would occur.
(0032629)
David Cole   
2014-05-14 10:11   
Holy huge patch files, Batman!

Too bad the patch is not against VTK 6.1, or I'd test it out today. I have a real-world scenario right now where we're running into this a few hours into our run that features live high-frequency animation.

Is anybody actively working on fixing this in VTK 'master' ...?
(0035574)
Sylvain   
2015-12-16 10:43   
Thank you guys for the original patch.
I uploaded an updated one for VTK 6.1.0 where the issue is still there.

We are using VTK in medical software and were hit by this very issue after about 2 hours.

For now I have applied the 64-bit integer solution, but I would like a better fix, as this does not really correct the underlying issue.
(0037310)
Kitware Robot   
2016-08-12 09:55   
Resolving issue as `moved`.

This issue tracker is no longer used. Further discussion of this issue may take place in the current VTK Issues page linked in the banner at the top of this page.