[vtkusers] K-means values
Sara Rolfe
smrolfe at u.washington.edu
Wed Mar 16 16:07:35 EDT 2011
Hi David,
It's not a difficult fix; I mainly found it awkward because I thought
I was missing something simple. I understand better how it works now
and can certainly implement it this way. I appreciate your
clarification.
Sara
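
For anyone reading this thread later, here is a rough sketch of what a single learning iteration amounts to conceptually: assign each observation to its nearest center, then average each group. It is plain C++ for a single column of values, not VTK code, and it makes no claim about how vtkKMeansStatistics is implemented internally. The function names closestCenter and updateMeans are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assignment step: index of the center nearest to x (1-D squared distance).
// Hypothetical helper for illustration; not part of VTK.
int closestCenter(double x, const std::vector<double>& centers)
{
  assert(!centers.empty());
  int best = 0;
  double bestDist = (x - centers[0]) * (x - centers[0]);
  for (std::size_t c = 1; c < centers.size(); ++c)
  {
    const double d = (x - centers[c]) * (x - centers[c]);
    if (d < bestDist)
    {
      bestDist = d;
      best = static_cast<int>(c);
    }
  }
  return best;
}

// Update step: one pass that assigns every value to its nearest center and
// returns the mean of each group. A center that attracts no values keeps
// its previous position.
std::vector<double> updateMeans(const std::vector<double>& values,
                                const std::vector<double>& centers)
{
  std::vector<double> sum(centers.size(), 0.0);
  std::vector<std::size_t> count(centers.size(), 0);
  for (double x : values)
  {
    const int id = closestCenter(x, centers);
    sum[id] += x;
    ++count[id];
  }

  std::vector<double> means(centers);
  for (std::size_t c = 0; c < centers.size(); ++c)
  {
    if (count[c] > 0)
    {
      means[c] = sum[c] / static_cast<double>(count[c]);
    }
  }
  return means;
}
```

Calling updateMeans once with the initial centers gives the per-cluster means for that assignment; the entries come back in the same order as the centers passed in, which mirrors David's point that the order follows the initial guesses.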
On Mar 16, 2011, at 11:22 AM, David Thompson wrote:
> Hi Sara,
>
>> It seems like I could solve this by using learning and specifying
>> one iteration, but this seems awkward. If anyone is aware of a
>> better way to access the means from the kMeansStatistics output,
>> could you let me know?
>
> That would indeed be the only way to have the filter compute the
> mean coordinates for each cluster; the mean coordinates are part of
> the statistical model, not the assessment of data, and the model is
> computed by Learn and Derive. I'm sorry you find it awkward, but
> that's the way things are at the moment. Do you have some suggestion
> on how to change things? It doesn't seem to me to involve a
> significant amount of code to get the means computed:
> kMeansStatistics->SetLearnOption( 1 ); // This is on by default.
> kMeansStatistics->SetMaxNumIterations( 1 );
> nor a lot of code to access them:
> vtkMultiBlockDataSet* model = vtkMultiBlockDataSet::SafeDownCast(
>   kMeansStatistics->GetOutputDataObject( 1 ) );
> vtkTable* tab = vtkTable::SafeDownCast( model->GetBlock( 0 ) );
> double xc = tab->GetValueByName( label, "x" ).ToDouble();
>
> David
>
>> On Mar 15, 2011, at 3:51 PM, Sara Rolfe wrote:
>>
>>> Hi David,
>>>
>>> Thanks for your reply. Right now I'm using vtkKMeansStatistics
>>> without learning and am following the example here:
>>>
>>> http://www.vtk.org/Wiki/VTK/Examples/InfoVis/KMeansClustering
>>>
>>> The output that I get using kMeansStatistics->GetOutput()->Dump()
>>> shows the original value, the distance to the nearest cluster, and
>>> the id of the cluster it is assigned to, instead of the cluster mean.
>>>
>>> +-----------------+-----------------+------------------+
>>> | Magnitude | distance (0) | closest id (0) |
>>> +-----------------+-----------------+------------------+
>>> | 0.0657005 | 6.44972e-06 | 4 |
>>> | 0.0652216 | 4.24651e-06 | 4 |
>>> | 0.0646891 | 2.33557e-06 | 4 |
>>> | 0.0641142 | 9.08931e-07 | 4 |
>>> | 0.0635069 | 1.19747e-07 | 4 |
>>> | 0.0666587 | 1.2235e-05 | 4 |
>>>
>>> I think I will probably use learning, but I'd like to get it
>>> working without first.
>>>
>>> Thanks,
>>> Sara
>>>
>>> On Mar 15, 2011, at 3:27 PM, Thompson, David C wrote:
>>>
>>>> Hi Sara,
>>>>
>>>>> I'm using vtkKMeansStatistics to successfully cluster data points.
>>>>> However, I'm missing how you access the actual cluster mean values,
>>>>> instead of just their labels. It looks like the order of the labels
>>>>> may not correspond to the values of the means. Is this true?
>>>>
>>>> I'm not clear on what you mean by "label". I've run the filter on
>>>> data with 2 columns (named x & y) and with 2 sets of initial
>>>> cluster center coordinates specified on the LEARN_PARAMETERS
>>>> input: one for k=2 and one for k=3. I get this table:
>>>>
>>>> +--------+---+------------+---------+-------------+-----------+-----------+
>>>> | Run ID | k | Iterations | Error   | Cardinality | x         | y         |
>>>> +--------+---+------------+---------+-------------+-----------+-----------+
>>>> | 0      | 2 | 3          | 1528.94 | 772         | 0.166201  | 0.12059   |
>>>> | 0      | 2 | 3          | 498.266 | 228         | 2.79467   | 2.99856   |
>>>> | 1      | 3 | 15         | 546.596 | 397         | -0.341883 | -0.486857 |
>>>> | 1      | 3 | 15         | 546.946 | 405         | 0.758854  | 0.855424  |
>>>> | 1      | 3 | 15         | 381.077 | 198         | 2.99941   | 3.14951   |
>>>> +--------+---+------------+---------+-------------+-----------+-----------+
>>>>
>>>> as the first block of output 1 (i.e.,
>>>> GetOutputDataObject( 1 ).GetBlock( 0 ).Dump() will produce the
>>>> above). The first 2 rows contain the cluster mean values
>>>> corresponding to the run with k=2 and the final 3 rows have the
>>>> same for the run with k=3. Because there are 2 coordinates (x &
>>>> y) for each cluster center, there is no good way to order cluster
>>>> centers by their means. Instead, their order matches the initial
>>>> guesses at cluster centers specified on the LEARN_PARAMETERS
>>>> input if it exists. Otherwise, the order is random because the
>>>> initial guesses are produced randomly. Is this what you wanted to
>>>> know?
>>>>
>>>> David
>>>
>>> _______________________________________________
>>> Powered by www.kitware.com
>>>
>>> Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html
>>>
>>> Please keep messages on-topic and check the VTK FAQ at: http://www.vtk.org/Wiki/VTK_FAQ
>>>
>>> Follow this link to subscribe/unsubscribe:
>>> http://www.vtk.org/mailman/listinfo/vtkusers
>>