Found issue with space encoding in Description file in RF2 for SNOMED CT release 20210331

Anibal Jodorcovsky
Topic author
Offline

Posts: 50

3 years 7 months ago #7121 by Anibal Jodorcovsky

Replied by Anibal Jodorcovsky on topic Found issue with space encoding in Description file in RF2 for SNOMED CT release 20210331

OK, so what does that mean exactly? I'm new to this, so please bear with me. I'm assuming your answer is telling me that somebody within CHI is then responsible for fixing this?


Jon Zammit
Offline

Posts: 13

3 years 7 months ago #7120 by Jon Zammit

Replied by Jon Zammit on topic Found issue with space encoding in Description file in RF2 for SNOMED CT release 20210331

Hi Anibal,

The descriptions you have highlighted in your screenshot are part of the Canadian extension. You can tell from the moduleId attribute, which in this case is 20611000087101 |Canada Health Infoway French module (core metadata concept)|.
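For anyone who wants to check this directly against the file, here is a minimal sketch, assuming the standard tab-delimited RF2 description layout with a header row. The function name `terms_by_module` and the sample rows are made up for illustration; only the moduleId 20611000087101 comes from this thread.

```python
import csv
from collections import defaultdict

# moduleId quoted above for the Canada Health Infoway French module.
CHI_FRENCH_MODULE = "20611000087101"


def terms_by_module(lines):
    """Group description terms by moduleId.

    `lines` is any iterable of tab-delimited RF2 description rows,
    starting with the header row (an open file object works too).
    """
    # QUOTE_NONE: treat any quote characters inside terms as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["moduleId"]].append(row["term"])
    return dict(grouped)


# Made-up sample rows, for illustration only (not taken from the release).
HEADER = ("id\teffectiveTime\tactive\tmoduleId\tconceptId\t"
          "languageCode\ttypeId\tterm\tcaseSignificanceId")
SAMPLE = [
    HEADER,
    "1\t20210331\t1\t20611000087101\t100\tfr\t"
    "900000000000013009\texemple de terme\t900000000000448009",
    "2\t20210331\t1\t900000000000207008\t100\ten\t"
    "900000000000013009\texample term\t900000000000448009",
]

grouped = terms_by_module(SAMPLE)
print(grouped.get(CHI_FRENCH_MODULE, []))
```

To run it on the real file, pass a file object opened with `encoding="utf-8"` and `newline=""` instead of `SAMPLE`.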

I hope that helps.

Regards,

Jon Zammit


Anibal Jodorcovsky
Topic author
Offline

Posts: 50

3 years 7 months ago #7119 by Anibal Jodorcovsky

Found issue with space encoding in Description file in RF2 for SNOMED CT release 20210331 was created by Anibal Jodorcovsky

Hi all,

Not sure if this is the right place to post this, but given the group description I thought it'd be worth a shot.

I'm trying to automate a whole bunch of tasks that were done by hand previously within our group.

To this end, I’m writing several scripts and SQL queries against an MS Access DB that houses the RF2 SNOMED CT CAD release.

One of our tools is not working as expected when doing comparisons. After a lot of digging, I discovered that the source files within the RF2 release encode the space between words differently in some cases.

The file in question is this:

C:\Users\aniba\Desktop\SnomedCT_Canadian_EditionRelease_PRODUCTION_20210331T120000Z\Full\Terminology\sct2_Description_Full_CanadianEdition_20210331.txt

See the attachment, taken from a Sublime Text capture, where we can see the issue [hmmm, I can't find a way to upload an attachment to a topic].

I uploaded the screenshot to a public google folder, here it is:

drive.google.com/file/d/1XtIPvxFXQNHFPyHdzIlwXRxNsKFQ7lFI/view?usp=sharing

That’s the screen where I’m seeing “hidden” characters in some of the terms.

Notice how the space between words in several terms is encoded as <0xa0> rather than <0x20>, which is what it should be and what all the other terms use.
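The same check can be done without a screenshot. Below is a minimal sketch, assuming the terms have already been read into Python strings; `find_nbsp_terms` and the sample terms are made up for illustration.

```python
NBSP = "\u00a0"  # the character Sublime Text displays as <0xa0>


def find_nbsp_terms(terms):
    """Return (index, visible_term) pairs for terms containing U+00A0.

    The hidden character is replaced with the literal text "<0xa0>"
    so it shows up when printed.
    """
    hits = []
    for index, term in enumerate(terms):
        if NBSP in term:
            hits.append((index, term.replace(NBSP, "<0xa0>")))
    return hits


# Made-up sample terms: the second one uses U+00A0 between words.
sample = ["fracture du f\u00e9mur", "fracture\u00a0du\u00a0f\u00e9mur"]
for index, shown in find_nbsp_terms(sample):
    print(index, shown)
```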

<0xa0> is the non-breaking space (U+00A0). It comes from the extended character set rather than plain ASCII, and it should not be mixed with regular spaces in txt files like this, in particular when we need to be consistent. So, we should use either <0xa0> for all spaces, or <0x20>.

This is causing our tools to break, since they are unable to compare terms automatically.
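Until the source files are corrected, one possible workaround is to normalize the space character on both sides before comparing. This is a minimal sketch of that idea, not a fix for the release itself; `normalize_spaces` and `same_term` are hypothetical helper names.

```python
NBSP = "\u00a0"  # non-breaking space, shown as <0xa0> in the editor


def normalize_spaces(term):
    """Replace every non-breaking space with a regular space (0x20)."""
    return term.replace(NBSP, " ")


def same_term(a, b):
    """Compare two terms while ignoring the NBSP vs. space difference."""
    return normalize_spaces(a) == normalize_spaces(b)


# Made-up example: the two strings differ only in the space character.
print(same_term("fracture\u00a0du f\u00e9mur", "fracture du f\u00e9mur"))
```

An explicit replace only touches that one character. `unicodedata.normalize("NFKC", term)` would also fold U+00A0 into a regular space, but it rewrites other characters as well, so it may change more than intended.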

Is this something that comes from SNOMED International or is this something that comes from CHI?


Moderators: Linda Monico, Himanshu Khetarpal, Helen Wu
