
Found issue with space encoding in Description file in RF2 for SNOMED CT release 20210331

  • Posts: 50
2 years 7 months ago #7121 by Anibal Jodorcovsky
OK, so what does that mean exactly? I'm new to this, so please bear with me. I'm assuming your answer is telling me that somebody within CHI is then responsible for fixing this?


  • Posts: 13
2 years 7 months ago #7120 by Jon Zammit
Hi Anibal,

The descriptions you have highlighted in your screenshot are part of the Canadian extension. You can determine that from the moduleId attribute, which in this case is 20611000087101 |Canada Health Infoway French module (core metadata concept)|.
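As a rough illustration of checking moduleId programmatically, here is a minimal Python sketch. It assumes the Description file is tab-delimited with a header row containing a `moduleId` column, as in the standard RF2 layout. The function name `count_by_module` and the sample rows are hypothetical and are not taken from the release.

```python
import csv
import io
from collections import Counter

# moduleId quoted above for the Canada Health Infoway French module
CHI_FRENCH_MODULE = "20611000087101"

def count_by_module(lines):
    """Count description rows per moduleId in tab-delimited RF2 lines."""
    # QUOTE_NONE: RF2 terms may contain literal double quotes
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["moduleId"] for row in reader)

# Hypothetical in-memory sample standing in for the real file
sample = io.StringIO(
    "id\teffectiveTime\tactive\tmoduleId\tconceptId\tlanguageCode\t"
    "typeId\tterm\tcaseSignificanceId\n"
    "1\t20210331\t1\t20611000087101\t100\tfr\t"
    "900000000000013009\texemple\u00a0un\t900000000000448009\n"
    "2\t20210331\t1\t900000000000207008\t100\ten\t"
    "900000000000013009\texample one\t900000000000448009\n"
)
counts = count_by_module(sample)
print(counts[CHI_FRENCH_MODULE])
```

For the real file, the same function could be given a file object opened with `open(path, encoding="utf-8", newline="")`, assuming the release is UTF-8 encoded.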

I hope that helps.

Regards,

Jon Zammit


  • Posts: 50
2 years 7 months ago #7119 by Anibal Jodorcovsky
Hi all,

Not sure if this is the right place to post this, but given the group description I thought it'd be worth a shot.

I'm trying to automate a whole bunch of tasks that were done by hand previously within our group.

To this end, I’m writing several scripts and SQL against a MS Access DB that houses the RF2 SNOMED CT CAD release.

One of our tools is not working as expected when doing comparisons, and after a lot of digging I discovered that the source files within the RF2 release encode the space between words differently in some cases.

The file in question is this:

C:\Users\aniba\Desktop\SnomedCT_Canadian_EditionRelease_PRODUCTION_20210331T120000Z\Full\Terminology sct2_Description_Full_CanadianEdition_20210331.txt

See attachment - taken from a Sublime Text capture - where we can see the issue [hmmm, I can't find a way to upload an attachment to a topic].

I uploaded the screenshot to a public Google Drive folder, here it is:

drive.google.com/file/d/1XtIPvxFXQNHFPyHdzIlwXRxNsKFQ7lFI/view?usp=sharing

That’s the screen where I’m seeing “hidden” characters in some of the terms.

Notice how the space between words in several terms is encoded as <0xa0> rather than <0x20>, as it should be and as it is in all other terms.
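To list the affected rows without relying on an editor, something like the following Python sketch might help. It assumes the file is read as UTF-8 text, in which case the character shown as <0xa0> appears as U+00A0, and that the header row names the `id` and `term` columns. The function name `find_nbsp_terms` and the two sample rows are hypothetical, with a reduced column set for brevity.

```python
import csv
import io

NBSP = "\u00a0"  # non-breaking space, displayed as <0xa0> in the capture

def find_nbsp_terms(lines):
    """Return (id, term) pairs whose term contains a non-breaking space."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row["id"], row["term"]) for row in reader if NBSP in row["term"]]

# Hypothetical sample: row 1 uses NBSP, row 2 uses a regular space
sample = io.StringIO(
    "id\tterm\n"
    "1\tfracture\u00a0ouverte\n"
    "2\tfracture ouverte\n"
)
hits = find_nbsp_terms(sample)
print(hits)
```

Printing the result with `repr` or as a list, as here, makes the `\xa0` escape visible, which is easier to spot than the raw character.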

<0xa0> is the non-breaking space (U+00A0 in Unicode), which lies outside 7-bit ASCII, and it should not be used in txt files like this, in particular when we need to be consistent. So we should use either <0xa0> for all spaces or <0x20> for all of them.

This is causing our tools to break, and they are unable to compare terms automatically.
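Until the source is sorted out, one possible workaround is to normalize the spaces on our side before comparing. This is only a sketch under the assumption that a non-breaking space and a regular space should count as the same character for our comparisons. The helper names `normalize_spaces` and `same_term` are hypothetical.

```python
NBSP = "\u00a0"  # non-breaking space (U+00A0)

def normalize_spaces(term):
    """Replace each non-breaking space with a regular space (0x20)."""
    return term.replace(NBSP, " ")

def same_term(a, b):
    """Compare two terms, treating NBSP and a regular space as equal."""
    return normalize_spaces(a) == normalize_spaces(b)

print(same_term("fracture\u00a0ouverte", "fracture ouverte"))
```

This leaves the release files untouched and only changes how terms are compared, so the original encoding remains available if the non-breaking spaces turn out to be intentional.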

Is this something that comes from SNOMED International or is this something that comes from CHI?

