ID: 0014310
Project: VTK
Category: (No Category)
View Status: public
Date Submitted: 2013-09-27 03:21
Last Update: 2016-08-12 09:55
Reporter: arjenvr
Assigned To: Berk Geveci
Priority: urgent
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: moved
Platform: (none)
OS: (none)
OS Version: (none)
Product Version: 5.10.1
Target Version: 6.2.0
Fixed in Version: (none)
Summary: 0014310: VTK-based applications completely freeze after a couple of hours
Description: When running an animation or simulation in which VTK objects are updated every frame, VTK completely freezes after a couple of hours.

The problem is caused by VTK's implementation of vtkTimeStamp. The class uses a static 32-bit integer to hold the current 'time', and increments this value every time Modified is called on any VTK object. This number eventually overflows, after which all modified time comparisons in VTK can break.

This behaviour usually occurs within a day, but may occur within a few hours, depending on the number of objects updated each frame.

Note that ParaView also freezes when an animation is left running overnight.

The documentation of vtkTimeStamp actually states that the overflow can occur, but goes on to say that "the typical consequence should be that some filters will update themselves when really they don't need to". In practice the opposite happens: filters fail to update when they need to. This problem also occurs when compiling VTK as 64-bit code on some platforms (notably 64-bit Windows), because the unsigned long type is still 32 bits there.

Attached is a patch that fixes this issue in VTK 5.10.1. It changes vtkTimeStamp to use a uint64_t and replaces direct uses of the unsigned long type throughout VTK with the vtkTimeStamp interface. The vtkTimeStamp implementation itself is only modified for Windows. Since only this one class would have to change, it should be trivial to adapt for Mac and Linux.

The problem still occurs in VTK 6.0. Because the code was reorganized into new directories, the patch needs some editing before it applies to VTK 6.0, and some other changes would probably be needed as well.

Since this is a major oversight and the timestamping mechanism is crucial to how VTK works, this should also be fixed in VTK 6.0. People using VTK in long-running simulations are very likely to run into this issue.

Tags: hackaton
Project: TBD
Type: crash
Attached Files:
  vtk_5.10.1_vtkTimeStamp.patch (384,236 bytes) 2013-09-27 03:21
  vtk_6_1_0_vtkTimeStamp (277,585 bytes) 2015-12-16 10:41

  Notes
(0032094)
David Cole (developer)
2013-12-30 15:01

For whoever wants to work on fixing the actual overflow problem, here's a good "trick" for reproducing the overflow quickly, rather than running something for hours or days to get there:

In the source file, vtkTimeStamp.cxx, set the initial value of the GlobalTimeStamp variable to INT_MAX - 1000 rather than 0.

Experiment with your own program to see what the timestamp values are a few seconds after it starts. If you start at INT_MAX minus that number, the overflow will occur a few seconds into the run.

Forcing an early overflow this way lets you reproduce the problem, and then verify that you've actually fixed it.
(0032106)
David Cole (developer)
2014-01-02 08:52

Here is a gerrit topic to assist in tracking down this problem, and developing a fix for it:

    http://review.source.kitware.com/#/t/3791

The first commit in that topic adds code to many of the VTK C++ tests that emits output like this:

    <DartMeasurement name="Final_vtkTimeStamp_GetMTime" type="numeric/double">10374</DartMeasurement>

This lets you measure how high the vtkTimeStamp MTime count gets during each test run. Only 3 of the 300+ tests have final MTime values below 200, which led to the code in the second commit.

The code in the second commit forces overflow at *some* point during many of the VTK tests using code like this:

//-------------------------------------------------------------------------
#define CALLS_UNTIL_OVERFLOW 200

//-------------------------------------------------------------------------
void vtkTimeStamp::Modified()
{
#if VTK_SIZEOF_VOID_P == 8
  static vtkAtomicInt<vtkTypeInt64> GlobalTimeStamp(0xffffffffffffffffULL - CALLS_UNTIL_OVERFLOW);
#else
  static vtkAtomicInt<vtkTypeInt32> GlobalTimeStamp(0xffffffff - CALLS_UNTIL_OVERFLOW);
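#endif
  // Rest of Modified(): store the incremented global value, as the unmodified function does.
  this->ModifiedTime = (unsigned long)++GlobalTimeStamp;
}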

You can see that this results in many failed tests by observing the CDash results from this gerrit topic.

After pushing just the first commit, the tests *all* *pass* on *all* the CDash@home builds:

http://open.cdash.org/index.php?&project=VTK&filtercount=2&showfilters=1&filtercombine=and&field1=buildname/string&compare1=63&value1=T3791timestamp-experiments-1&field2=buildstarttime/date&compare2=83&value2=2014-1-1

After pushing the second commit (and getting good builds), between 43 and 63 tests fail on each build (roughly 5%):

http://open.cdash.org/index.php?&project=VTK&filtercount=2&showfilters=1&filtercombine=and&field1=buildname/string&compare1=63&value1=T3791timestamp-experiments-5&field2=buildstarttime/date&compare2=83&value2=2014-1-1

I suspect if you use different numbers than 200 for the CALLS_UNTIL_OVERFLOW value, you'll see variations in which tests fail and how they fail.

Seems to me like the only reasonable solution would be to track all vtkObjects in a global map somewhere, and then reset the modified count of all objects to be 0 or 1 at overflow time, and then bump the modified time of the object that caused the overflow after that. And of course, you'd have to track down all the objects that have a cached MTime variable in them, and update those as well...

This seems like a very large can of worms... Anybody have a good suggestion about how to revamp things to make everything work across the overflow point?

This will also happen more quickly in highly-interactive many-object scenarios, where many MTimes are updated on each mouse move, for example.
(0032107)
David Cole (developer)
2014-01-02 10:40

Actually, I guess the obvious "simple" fix to this is to make sure that all cached MTimes in all objects are represented as an abstract type, and then simply make that type wide enough to handle more and more bits as needed to avoid overflow.

That may still be quite challenging. I'm not sure it's at all obvious that there's a good way to track down every occurrence of objects caching MTimes...

If we can just make that abstract type wide enough, this approach could be extended to avoid the problem indefinitely.
(0032109)
Brad King (developer)
2014-01-02 13:06

Re 0014310:0032107: The attached "vtk_5.10.1_vtkTimeStamp.patch" replaces the hard-coded "unsigned long" in the timestamp APIs with "vtkTimeStamp", which is similar to the abstract type you propose. I think it is cleaner to separate the integer type from the vtkTimeStamp API. That way all GetMTime methods return the integer type, and all vtkTimeStamp instances are hidden inside objects.

On 64-bit Linux the current "unsigned long" underlying type is already 64-bit. It is on 32-bit platforms and 64-bit Windows that "unsigned long" is still only 32 bits.

I think it is simplest to just make the integer type always be vtkTypeUInt64, but use a typedef like "vtkTimeStampInt" to make the type easy to replace later.

A future abstract timestamp integer type like the one proposed in 0014310:0032107 could generalize itself to a bigint if rollover of a simple integer type would occur.
(0032629)
David Cole (developer)
2014-05-14 10:11

Holy huge patch files, Batman!

Too bad the patch is not against VTK 6.1, or I'd test it out today. I have a real-world scenario right now where we're running into this a few hours into our run that features live high-frequency animation.

Is anybody actively working on fixing this in VTK 'master' ...?
(0035574)
Sylvain (reporter)
2015-12-16 10:43

Thank you guys for the original patch.
I uploaded an updated one for VTK 6.1.0, where the issue is still present.

We use VTK in medical software and were hit by this very issue after about 2 hours.

For now I have applied the 64-bit integer solution, but I would like a better fix, as this does not really solve the underlying issue.
(0037310)
Kitware Robot (administrator)
2016-08-12 09:55

Resolving issue as `moved`.

This issue tracker is no longer used. Further discussion of this issue may take place in the current VTK Issues page linked in the banner at the top of this page.

 Issue History
Date Modified Username Field Change
2013-09-27 03:21 arjenvr New Issue
2013-09-27 03:21 arjenvr File Added: vtk_5.10.1_vtkTimeStamp.patch
2013-12-17 21:52 Dave DeMarle Target Version => 6.2.0
2013-12-30 15:01 David Cole Note Added: 0032094
2014-01-02 08:52 David Cole Note Added: 0032106
2014-01-02 10:40 David Cole Note Added: 0032107
2014-01-02 13:06 Brad King Note Added: 0032109
2014-05-14 10:11 David Cole Note Added: 0032629
2014-10-01 12:48 Berk Geveci Assigned To => Berk Geveci
2014-10-01 12:48 Berk Geveci Status backlog => tabled
2014-10-01 12:49 Berk Geveci Status tabled => backlog
2014-10-01 20:06 Sean McBride Tag Attached: hackaton
2015-12-16 10:41 Sylvain File Added: vtk_6_1_0_vtkTimeStamp
2015-12-16 10:43 Sylvain Note Added: 0035574
2016-08-12 09:55 Kitware Robot Note Added: 0037310
2016-08-12 09:55 Kitware Robot Status backlog => closed
2016-08-12 09:55 Kitware Robot Resolution open => moved