<html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office"><head><!--[if gte mso 9]><xml><o:OfficeDocumentSettings><o:AllowPNG/><o:PixelsPerInch>96</o:PixelsPerInch></o:OfficeDocumentSettings></xml><![endif]--></head><body><div style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:16px;"><div><div><div><div>I was worried that might be the case. I recently came across an article in which the author suggests using UTF-8 for all internal C/C++ strings: <a href="http://www.nubaria.com/en/blog/?p=289" rel="nofollow" target="_blank" class="enhancr_card_6506204616">Using UTF-8 as the internal representation for strings in C and C++ with Visual Studio | Nubaria Blog</a> <br><div><br>This seems like a sensible approach to me. It is also the solution adopted by the Free Pascal Compiler (FPC) team.</div></div><div><div><br>If the VTK readers/writers were instantiated with a particular encoding via a constructor parameter, would this not provide both flexibility and certainty? Has there been any discussion about this? 
Is all this work a lower priority?<br></div></div></div><br></div><div><br></div><div class="ydpdae72f64signature"><div style="font-size:16px;"><div>Todd Martin, Ph.D.<br></div><div>Freelance Engineer/Software Architect.</div><br></div></div></div>
<div><br></div><div><br></div>
<div id="yahoo_quoted_6908739160" class="yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
On Friday, May 11, 2018, 12:02:41 PM GMT+12, David Gobbi &lt;david.gobbi@gmail.com&gt; wrote:
</div>
<div><br></div>
<div><br></div>
<div><div id="yiv3520944144"><div><div dir="ltr"><div>For the most part, VTK strings are in the local 8-bit encoding, whatever that happens to be. On Linux and Mac, the local 8-bit encoding is pretty much guaranteed to be UTF-8. On Windows, if you're in North America or western Europe, it's Latin-1 or, more precisely, Windows-1252.</div><div><br clear="none"></div><div>The reason for this is that the IO classes (readers, writers, etc.) simply pass 8-bit strings (filenames etc.) when calling system IO functions. VTK uses ifstream(const char *fname, ...) and lets the system decide how "fname" is encoded. But this is not consistent across all of the readers, since some readers use third-party libraries to handle the IO, and then you're at the mercy of whatever encoding that third-party library uses.</div><div><br clear="none"></div><div>On the display side of things (e.g. when using the VTK text mapper classes), I believe that VTK actually does use UTF-8, but I haven't experimented to be sure that all the VTK display classes work the same way.</div><div><br clear="none"></div><div>In other words, strings are a bit of a mess in VTK unless you're willing to be satisfied with ASCII.</div><div><br clear="none"></div><div><div style="color:rgb(34,34,34);font-family:arial, sans-serif;font-size:small;font-style:normal;font-weight:400;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-color:initial;">The vtkUnicodeString is UCS-4 (32-bit code points).</div></div><div><br clear="none"></div> - David<div><br clear="none"><div class="yiv3520944144yqt0499830681" id="yiv3520944144yqt05808"><div class="yiv3520944144gmail_extra"><br clear="none"><div class="yiv3520944144gmail_quote">On Thu, May 10, 2018 at 5:35 PM, Todd via vtk-developers <span dir="ltr">&lt;<a rel="nofollow" shape="rect" ymailto="mailto:vtk-developers@vtk.org" target="_blank" href="mailto:vtk-developers@vtk.org">vtk-developers@vtk.org</a>&gt;</span> wrote:<br 
clear="none"><blockquote class="yiv3520944144gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex;"><div>Can someone please tell me the default/expected encoding for a std::string in VTK? I'm assuming it is UTF-8. Therefore I expect vtkUnicodeString (a terrible name) is encoded as UTF-16. Is that correct?</div></blockquote></div><br clear="none"></div></div></div></div></div></div></div>
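<div><br></div>
<div>To make the point about local 8-bit encodings concrete, here is a minimal Python sketch. It is not VTK code, and the filename is a hypothetical example. It shows the same name producing different bytes under UTF-8 and Windows-1252, what results when UTF-8 bytes are read as Windows-1252, and that a UCS-4/UTF-32 representation takes four bytes per code point:</div>
<pre>
```python
# Hypothetical filename containing one non-ASCII character (u-umlaut, U+00FC).
name = "M\u00fcller.vtk"

# The same text encoded in two different 8-bit encodings.
utf8_bytes = name.encode("utf-8")      # typical local encoding on Linux and Mac
cp1252_bytes = name.encode("cp1252")   # Windows-1252, western-European Windows

print(utf8_bytes)    # b'M\xc3\xbcller.vtk'  (two bytes for the umlaut)
print(cp1252_bytes)  # b'M\xfcller.vtk'      (one byte for the umlaut)

# Reading UTF-8 bytes as Windows-1252 raises no error here; it silently
# yields a different string (mojibake).
misread = utf8_bytes.decode("cp1252")
print(ascii(misread))  # 'M\xc3\xbcller.vtk'

# UCS-4 / UTF-32 stores every code point in a fixed 32-bit unit.
utf32_bytes = name.encode("utf-32-le")
print(len(name), len(utf32_bytes))  # 10 40
```
</pre>
<div>Under these assumptions, a filename stored as UTF-8 but handed to an API that expects Windows-1252 would be interpreted like the misread value above, which may be one reason non-ASCII filenames are fragile when the encoding is left to the system.</div>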
</div>
</div></div></body></html>