User:Fæ/Radiopaedia

From NC Commons
Annotated X-ray of pelvis.
Ultrasound in mp4 format.

Introduction

Radiopaedia is a free, open-edit website that builds a database of reference radiology cases.

This project uploads copies of the individual images, photographs and videos from Radiopaedia for easier use on wiki projects.

Analysis

The source site is composed of cases contributed by various users, each case with a unique rID. A case contains one or more studies, each with a unique number. A study may include one or more images or videos in different formats. Multiple images may be collated in a carousel view, or presented as a stack where the viewer can scroll up and down through the image thumbnails (ref https://radiopaedia.org/articles/stacks). Each image has a unique database number used to create links, but image and study numbers are invisible to the website viewer.

Carousels display the images within a given stack as named by "plane_projection" and "aux_modality" in the order given by "position". Example metadata for a displayed image:

{
  "id":52747474,
  "fullscreen_filename":"https://prod-images-static.radiopaedia.org/images/52747474/0ebafbd8d4d8728e8aa1eb97b157fd_big_gallery.jpeg",
  "public_filename":"https://prod-images-static.radiopaedia.org/images/52747474/0ebafbd8d4d8728e8aa1eb97b157fd.png",
  "plane_projection":"Axial",
  "aux_modality":"FLAIR",
  "position":22,
  "content_type":"image/png",
  "width":1010,
  "height":1075,
  "show_feature":false,
  "show_pin":false,
  "show_case_key_image":false,
  "show_stack_key_image":false,
  "download_image_url":"/images/52747474/download?layout=false"
}

Note that the filenames are URLs to thumbnails; the full-size version always requires the download button to be pressed to launch an authorized download.
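The grouping described above can be illustrated with a minimal sketch. It assumes the per-image metadata has already been parsed into Python dictionaries with the field names shown in the example; the function name `group_into_stacks` and the sample records are hypothetical and are not taken from the project source code.

```python
from collections import defaultdict


def group_into_stacks(images):
    """Group image metadata by (plane_projection, aux_modality).

    Each group is sorted by "position", mirroring the carousel order.
    """
    stacks = defaultdict(list)
    for image in images:
        key = (image.get("plane_projection"), image.get("aux_modality"))
        stacks[key].append(image)
    for members in stacks.values():
        members.sort(key=lambda image: image["position"])
    return dict(stacks)


# Hypothetical records, trimmed to the fields used here.
sample = [
    {"id": 3, "plane_projection": "Axial", "aux_modality": "FLAIR", "position": 2},
    {"id": 1, "plane_projection": "Axial", "aux_modality": "FLAIR", "position": 1},
    {"id": 2, "plane_projection": "Sagittal", "aux_modality": "T1", "position": 1},
]

stacks = group_into_stacks(sample)
for (plane, modality), members in stacks.items():
    print(plane, modality, [image["id"] for image in members])
```

A stack whose plane and modality are both null ends up under the key `(None, None)`, so several unlabelled stacks would be merged by this sketch; a real implementation would need another way to tell them apart.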

Because a 3D scan of a patient may be represented by image stacks, a single case can contain a large number of images. For example, case 25855 has 3 stacks with a total of 250 images.

Configuration

All files are under the parent category Category:Radiopaedia images, with images classed by system (Category:Radiopaedia images by system), by modality (Category:Radiopaedia images by type), or by case for multiple stacks of images (Category:Radiopaedia images by case). Many "non-case" photographs fall under Radiopaedia images for Not Applicable; some of these were taken or derived from Wikimedia Commons photographs.

Naming scheme:

<title> (Radiopaedia <rID>).<ext>
<title> (Radiopaedia <rID>-<study number> <plane_projection> <aux_modality> <position>).<ext>

The version without a study number is used when the case has only one image or video file. In some cases aux_modality and/or plane_projection is 'null', or the two together are not unique between image stacks. When they are unusable, a sequential letter based on the number of stacks is used instead (A, B, C...). For example, https://radiopaedia.org/cases/flash-mode-ceus uses "Oblique" six times and has one null.

Filename example:

Abdominal aortic aneurysm (Radiopaedia 83581-98689 Sagittal C+ portal venous phase 91).jpg
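As a rough illustration of the naming scheme, the sketch below assembles a filename from its parts. The function `build_filename` and its parameters are hypothetical. Placing the fallback stack letter where the plane and modality would otherwise go is an assumption, since the exact placement is not spelled out here. The spelling "Radiopaedia" follows the filename example.

```python
def build_filename(title, rid, ext, study=None, plane=None, modality=None,
                   position=None, stack_letter=None):
    """Assemble an upload filename following the naming scheme above."""
    if study is None:
        # Short form: the case has only one image or video file.
        return f"{title} (Radiopaedia {rid}).{ext}"
    if stack_letter is not None:
        # Assumption: the letter replaces unusable plane/modality labels.
        label = stack_letter
    else:
        label = " ".join(part for part in (plane, modality) if part)
    return f"{title} (Radiopaedia {rid}-{study} {label} {position}).{ext}"


name = build_filename(
    "Abdominal aortic aneurysm", 83581, "jpg",
    study=98689, plane="Sagittal", modality="C+ portal venous phase",
    position=91,
)
print(name)
```

With these arguments the result matches the filename example given above.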

On upload, the file is given an upload comment in the file history. This links to this page and includes:

rID:<case ID> (batch #<batch sequential count>-<image case sequence number> <stack ref><image stack sequence number>)

Technical

Python3 with Pywikibot is used to gather the data from Radiopaedia. The conventional Python Requests module is used to read the web pages, but as the pages are generated rather than static, their content is not immediately available. The workaround is to use Selenium to run a Firefox browser instance, both to gather metadata after the case page is fully rendered and to operate the download process. As a result the batch process is fairly slow, needing lags of several seconds to ensure rendering has finished. It is also fairly fragile, needing manual oversight to handle unexpected extension types or other failures, so it cannot be farmed out to a headless server. As a live browser is used, it is not suitable to run while the same local machine is being used for other activities, like videoconferencing, and this also means that upload-by-url cannot be used to speed up the processing.

A stumbling block has been bypassing the download 'Save to' pop-up. This is achieved by setting all possible MIME types for the downloads in the webdriver.FirefoxProfile(). Without this, even if manual intervention works, those choices are lost between run sessions:

fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/force-download,image/png,image/part,image/jpeg,image/x-bmp,text/plain,image/gif,video/mp4,application/pdf,")
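For context, here is a minimal sketch of how such preferences might be assembled and passed to a Firefox session. It swaps in `webdriver.FirefoxOptions` (the Selenium 4 style) for the `FirefoxProfile` object used by the project. The helper names, the download directory argument and the two `browser.download.*` preferences are assumptions added for illustration; the MIME list copies the line above.

```python
# MIME types copied from the preference line above.
MIME_TYPES = [
    "application/force-download", "image/png", "image/part", "image/jpeg",
    "image/x-bmp", "text/plain", "image/gif", "video/mp4", "application/pdf",
]


def download_preferences(download_dir, mime_types=MIME_TYPES):
    """Return Firefox preferences that save matching downloads silently."""
    return {
        # Assumption: 2 tells Firefox to use the custom directory below.
        "browser.download.folderList": 2,
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def make_driver(download_dir):
    """Start Firefox with the preferences applied (requires Selenium)."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for name, value in download_preferences(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


prefs = download_preferences("/tmp/radiopaedia-downloads")
print(prefs["browser.helperApps.neverAsk.saveToDisk"])
```

Keeping the MIME types in a list makes it easier to add a type when an unexpected extension appears during a batch run.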

Source code is available on GitHub.

Copyright

All files published on Radiopaedia are covered by a CC-BY-NC-SA 3.0 release (ref license). As a fall-back check at the time of upload to NCC, the web page source header is checked for a "license" link, and this is used to create the Permission statement on the image page. Should the license on Radiopaedia change in the future, this automated check and the source code are reliable evidence of verification.
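A check of this kind could look roughly like the sketch below, which uses only the Python standard library. It assumes the license is exposed as a `link` or `a` element carrying `rel="license"`; the actual markup on Radiopaedia and the logic in the project source code may differ. The class name, function name and sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values of elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.license_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "license" in rel_values and attributes.get("href"):
            self.license_urls.append(attributes["href"])


def find_license_urls(html):
    """Return every license URL found in the given HTML text."""
    parser = LicenseLinkParser()
    parser.feed(html)
    return parser.license_urls


# Hypothetical page source, for illustration only.
sample_html = """<html><head>
<link rel="license" href="https://creativecommons.org/licenses/by-nc-sa/3.0/">
</head><body></body></html>"""

print(find_license_urls(sample_html))
```

An empty result would be a signal to stop the upload rather than assume a license.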

The template {{CC-BY-NC-SA-3.0}} is used to provide a link to the Creative Commons license page; this may later be replaced by a version of the Wikimedia Commons license template(s) if imported.

Housekeeping

Wikimedia Commons duplicates: Some files, such as scans of illustrations in Gray's Anatomy, are already on Wikimedia Commons. A generalized housekeeping script can check for identical SHA1 checksums between NC Commons and Wikimedia Commons files based on any NC Commons site search. Ref source code.
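The checksum comparison can be sketched as follows. This is not the housekeeping script itself: it hashes a local file and builds a lookup URL for the MediaWiki API module `list=allimages` with its `aisha1` parameter, without sending any request. The function names are hypothetical, and working from a local copy of the file is an assumption.

```python
import hashlib
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def commons_lookup_url(sha1_hex):
    """Build an API URL listing Wikimedia Commons files with this SHA1."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"
```

Any entry returned under `query.allimages` for that URL would indicate a file with identical content already on Wikimedia Commons.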

Bugs and known errors

Failure cases:

Known errors:

  • https://radiopaedia.org/cases/buckle-fracture-distal-radius-1, the downloaded image is a BMP file, which is not an allowed format. Such files are skipped.
  • After a few hours of running, there may be a Bad gateway error. It is unknown if this is caused by an anti-bot tool or something else. The page is reloaded after a pause.
  • Local download filenames are chosen by Radiopaedia. These names can be unpredictable, and the same name may be used for a series of different sequential downloads. For example, Abdominal aortic aneurysm (Radiopaedia 75131-86203 Sagittal C+ arterial phase 11).jpg (one image in a stack of 47) was given the local download name of abdominal-aortic-aneurysm-38.jpg, where "38" was unrelated to the stack count or any other index. Fortunately, this appears to create only a local cosmetic problem.
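The "reload after a pause" handling for the Bad gateway error could be sketched as a small retry helper. The function name, the attempt count and the pause length are hypothetical, and catching every exception is a simplification; a real run would likely check for the specific error before retrying.

```python
import time


def retry_with_pause(action, attempts=3, pause_seconds=60, sleep=time.sleep):
    """Call action(); on failure, pause and try again up to `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # simplification: retries on any error
            last_error = error
            if attempt < attempts:
                sleep(pause_seconds)
    raise last_error
```

A page reload could then be wrapped as `retry_with_pause(lambda: driver.get(url))`, where `driver` and `url` stand for the Selenium session and case page in use.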