Extracting information from SNOMED CT RF2 files


Guillermo Reynoso
Posts: 11

3 years 1 month ago #7237 by Guillermo Reynoso

Reply from Guillermo Reynoso on the topic Extracting information from SNOMED CT RF2 files

Hi Anibal,

The terminology server returns data about one concept, organized as collections of descriptions, axioms, and relationships. To identify the descriptions relevant to Canadian French or Canadian English, filter by the IDs of the respective language reference sets rather than by module ID.

The "Canadian preferences" represented in the language reference sets can reference descriptions from different modules and languages.

For each description in the array of descriptions bound to the concept, you have a collection of acceptabilities called the "acceptabilityMap." Each entry represents the acceptability (Preferred, or just acceptable as a synonym) for a given language reference set.

For example, in this JSON segment:

{
  "active": true,
  "moduleId": "11000241103",
  "released": true,
  "releasedEffectiveTime": 20210331,
  "descriptionId": "849831000241115",
  "term": "frère (personne)",
  "conceptId": "70924004",
  "typeId": "900000000000003001",
  "acceptabilityMap": {
    "20581000087109": "PREFERRED"
  },
  "type": "FSN",
  "caseSignificance": "CASE_INSENSITIVE",
  "lang": "fr",
  "effectiveTime": "20210331"
},
{
  "active": true,
  "moduleId": "900000000000207008",
  "released": true,
  "releasedEffectiveTime": 20170731,
  "descriptionId": "811033015",
  "term": "Brother (person)",
  "conceptId": "70924004",
  "typeId": "900000000000003001",
  "acceptabilityMap": {
    "900000000000509007": "PREFERRED",
    "19491000087109": "PREFERRED",
    "900000000000508004": "PREFERRED"
  },
  "type": "FSN",
  "caseSignificance": "CASE_INSENSITIVE",
  "lang": "en",
  "effectiveTime": "20170731"
}

The first description ("frère (personne)") is "PREFERRED" for the Canadian French language reference set (represented by the ID 20581000087109). Please ignore the language code ("lang") and the module ID; the server is telling you that this is the preferred FSN for Canadian French.

"acceptabilityMap": {
"20581000087109": "PREFERRED"
}

The second description ("Brother (person)") is referenced as preferred in three language reference sets:

"acceptabilityMap": {
"900000000000509007": "PREFERRED",
"19491000087109": "PREFERRED",
"900000000000508004": "PREFERRED"
}

It is the preferred FSN in the "900000000000509007" US English language refset, the "19491000087109" Canadian English language reference set, and also in the "900000000000508004" GB English language reference set.

In conclusion, to extract the Preferred Term for Canadian English, iterate over the collection of descriptions of type "SYNONYM" and select the one that has the value "PREFERRED" for the acceptability map entry "19491000087109". Acceptable synonyms will have the value "19491000087109": "ACCEPTABLE". The same applies to FSNs (filter for type "FSN") and to Canadian French (filter for the Canadian French language reference set ID 20581000087109 instead of the Canadian English one, 19491000087109).
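
As a rough illustration of the filtering described above, here is a minimal Python sketch. It assumes the concept JSON exposes its descriptions under a top-level "descriptions" key, and that the host and path quoted elsewhere in this thread are reachable over https; both are assumptions, as are the function names, which are made up for this example.

```python
import json
import urllib.request

CA_EN_LANG_REFSET = "19491000087109"  # Canadian English language reference set
CA_FR_LANG_REFSET = "20581000087109"  # Canadian French language reference set

# Assumption: the thread quotes host and path without a scheme; https is a guess.
BASE_URL = "https://browser.snomedtools.org/snowstorm/snomed-ct/browser"
BRANCH = "MAIN/SNOMEDCT-CA/2021-09-30"


def concept_url(concept_id, base_url=BASE_URL, branch=BRANCH):
    """Build the concept lookup URL for one concept ID."""
    return f"{base_url}/{branch}/concepts/{concept_id}?descendantCountForm=inferred"


def fetch_concept(concept_id):
    """Download and parse the concept JSON (requires network access)."""
    request = urllib.request.Request(
        concept_url(concept_id), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def find_terms(concept, refset_id, desc_type="SYNONYM", acceptability="PREFERRED"):
    """Return active terms of desc_type with the given acceptability in refset_id.

    The module ID and the "lang" field are deliberately ignored; only the
    acceptabilityMap entry for the language reference set is checked.
    """
    return [
        d["term"]
        for d in concept.get("descriptions", [])  # assumed key name
        if d.get("active")
        and d.get("type") == desc_type
        and d.get("acceptabilityMap", {}).get(refset_id) == acceptability
    ]


# Example usage (requires network access):
# concept = fetch_concept("70924004")
# print(find_terms(concept, CA_EN_LANG_REFSET))                   # preferred term
# print(find_terms(concept, CA_FR_LANG_REFSET, desc_type="FSN"))  # French FSN
```

Passing acceptability="ACCEPTABLE" would list the acceptable synonyms instead.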


Anibal Jodorcovsky
Topic author

Posts: 50

3 years 1 month ago #7226 by Anibal Jodorcovsky

Reply from Anibal Jodorcovsky on the topic Extracting information from SNOMED CT RF2 files

Guillermo,

I'm trying to write a Python script to extract info from Snowstorm, as you've hinted. It's mostly working well, except that I've now run into a weird situation with one particular set.

I'm calling this:

browser.snomedtools.org/snowstorm/snomed-ct/browser/MAIN/SNOMEDCT-CA/2021-09-30/concepts/70924004?descendantCountForm=inferred

Within the JSON response I'm looking for a block that has a moduleId of 20621000087109 (Canadian English) or 20611000087101 (Canadian French). However, I don't see those blocks at all in the JSON response. How is that possible when I can clearly see the entries in the SNOMED CT Canadian Edition browser?

Am I missing something in the URL call?


Anibal Jodorcovsky
Topic author

Posts: 50

3 years 1 month ago #7206 by Anibal Jodorcovsky

Reply from Anibal Jodorcovsky on the topic Extracting information from SNOMED CT RF2 files

Excellent response! This is great! Now I have some work to do.

The main issue on my side is making sure the people on the terminology team have access to the tools. I can develop a Python script to automate this, but they'll need to have Python installed (which they don't right now), or I'll need to find a way to distribute the script with the Python interpreter included.

Thanks again.


Guillermo Reynoso
Posts: 11

3 years 1 month ago #7205 by Guillermo Reynoso

Reply from Guillermo Reynoso on the topic Extracting information from SNOMED CT RF2 files

Hi Anibal,

I agree with mlambot that Access is not an option. Most implementations querying SNOMED using relational databases use MySQL (for example, the SNOMED International Release Validation Framework [RVF]) when intensive QA or content/pattern analysis is required. It works great.

However, for the kind of lookups you need to do for your users, I would say it depends entirely on the size of the list of concepts to retrieve.
1) Up to a few thousand concepts: the most efficient way is to use one of the Snowstorm terminology servers behind the online browsers that host the SNOMED CT Canadian Edition (usually the latest official release; other options might be available), instead of using the RF2 distribution or even installing Snowstorm locally. Just use the API, retrieve the JSON corresponding to a "SNOMED CT Concept", and navigate the collections to find the information you need for your output.

For example, if you look up browser.snomedtools.org/snowstorm/snomed-ct/browser/MAIN/SNOMEDCT-CA/2021-09-30/concepts/195967001?descendantCountForm=inferred you will get all the descriptions for "195967001 |Asthma (disorder)|" in the SNOMED CT Canadian Edition, 20210930 version. Just replace the concept ID in the URL and extract the descriptions from the JSON you receive (using cURL, Python, or any programming language able to retrieve content from a URL).

That would be an easy solution. However, learning the full Snowstorm API is very useful, as mlambot says. The 12-hour, self-paced Terminology Services course from SNOMED International is free for anyone from member countries like Canada. You will find more information here: courses.ihtsdotools.org/product?catalog=TSC and here: elearning.ihtsdotools.org/course/view.php?id=15&section=5

2) If you need to look up information for more than a few thousand concepts but not the entire release, you can download Snowstorm and install it locally. You can also obtain it from Docker Hub. I am not sure whether the latest version of Snowstorm (the one handling concrete domains) is on Docker Hub, but you can obtain the source code or the release binaries and install it locally. Then you need to follow the instructions and load the Canadian Edition. It takes some time, but it is a good exercise and it is well explained in the above course.

3) If you need to get info for 400,000 concepts, you will likely not query the server one concept at a time. There are some options to batch the requests, or perhaps to use "concept-minis" (concepts that come with only the info required for browser lists or results). However, using a full-blown terminology server to grab all the data again for a very limited purpose is less efficient. So I would probably just write a script in Python, Node.js, or Java that reads the Canadian Edition description snapshot into memory, then reads the language refset snapshots line by line to identify the preferred descriptions and output them. This executes in a few seconds and is useful for simple, repetitive tasks. For very intensive and repetitive data handling operations (like converting entire releases into other formats) we usually consider scripts and in-memory concept representations instead of databases or terminology servers. I will ask my team whether they already have something coded that is similar to what you need, but it is certainly worth experimenting with any option other than Access. Perhaps an in-memory SQL database, if you prefer not to manage the RF2 files directly.
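
To give an idea of what such a script for option 3 might look like, here is a minimal Python sketch, not a tested production tool. It assumes the standard tab-separated RF2 snapshot layout with a header row (description columns such as id, active, conceptId, typeId, term; language refset columns such as active, refsetId, referencedComponentId, acceptabilityId). The function names and the file paths in the usage comment are hypothetical.

```python
PREFERRED = "900000000000548007"  # acceptabilityId: Preferred
FSN = "900000000000003001"        # typeId: Fully specified name
SYNONYM = "900000000000013009"    # typeId: Synonym


def read_rf2(lines):
    """Yield one dict per RF2 row; the first line is the tab-separated header.

    Rows are split on tabs only, because RF2 terms may contain quote
    characters that a CSV parser could misinterpret.
    """
    lines = iter(lines)
    header = next(lines).rstrip("\r\n").split("\t")
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            yield dict(zip(header, line.split("\t")))


def preferred_terms(description_lines, language_lines, refset_id, type_id=SYNONYM):
    """Map conceptId -> preferred term of type_id in the given language refset."""
    # Step 1: load the active descriptions of the wanted type into memory.
    descriptions = {
        row["id"]: (row["conceptId"], row["term"])
        for row in read_rf2(description_lines)
        if row["active"] == "1" and row["typeId"] == type_id
    }
    # Step 2: stream the language refset and keep the preferred members.
    result = {}
    for row in read_rf2(language_lines):
        if (
            row["active"] == "1"
            and row["refsetId"] == refset_id
            and row["acceptabilityId"] == PREFERRED
            and row["referencedComponentId"] in descriptions
        ):
            concept_id, term = descriptions[row["referencedComponentId"]]
            result[concept_id] = term
    return result


# Example usage (paths are placeholders for the files in your release package):
# with open("description_snapshot.txt", encoding="utf-8") as desc, \
#         open("language_snapshot.txt", encoding="utf-8") as lang:
#     terms = preferred_terms(desc, lang, refset_id="19491000087109")
```

Only the description snapshot is held in memory; the language refset is streamed line by line, which matches the approach described above.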

Hope this helps. I encourage everyone to get familiar with Snowstorm in particular and terminology servers in general, as in a few years terminology distribution will be more like a service. Most users don't want to go through the hassle of getting big files, loading them, debugging load issues, updating them frequently, etc. They just want to use the terminology in efficient ways that avoid overhead.

Cordially,
Guillermo


Anibal Jodorcovsky
Topic author

Posts: 50

3 years 1 month ago #7204 by Anibal Jodorcovsky

Reply from Anibal Jodorcovsky on the topic Extracting information from SNOMED CT RF2 files

OK, that's good, but how do I perform this type of query against a terminology server? Say I have a terminology server available (Snowstorm, or some other); what's the next step to do what I'm trying to achieve?

From what I understand, I'd have to write some code in some language to issue queries against the terminology server for each of the concept IDs that I'm given, and do the reference checks in my code, so that I can find what I'm looking for? As you may surmise from my answer, I'm not very familiar with the query capabilities of FHIR and the terminology server, but with a little bit of guidance or sample code I can manage it afterwards.

If you prefer to have a call to discuss, let me know. I don't want to be a burden.


Marie-alexandra Lambot
Posts: 10

3 years 1 month ago #7203 by Marie-alexandra Lambot

Reply from Marie-alexandra Lambot on the topic Extracting information from SNOMED CT RF2 files

Hi,

Why don't you use a real terminology server like Snowstorm instead of dreadful MS Access, which can't support a full release and nearly dies every time you try to load the description file?

You can use the test instance of Snowstorm that SNOMED International has made available if you only need to run a few queries. If the latest Canadian release isn't uploaded yet, you can likely ask Rory to add it.
snowstorm-training.snomedtools.org/snowstorm/snomed-ct/swagger-ui.html

You can also download Snowstorm from GitHub and install it on your computer, provided it has the minimum RAM required. There is a course on how to set up Snowstorm on the SNOMED International e-learning platform. It's not complicated; even I managed it with only a bit of help from ICT to change the virtual machine setting in the BIOS.

There you can easily extract a specific description type in a specific language.



Moderators: Linda Monico, Naomi Brooks, Helen Wu
