Contributor

GVHD

CLASSIFICATION FOR GVHD AND TUBULAR ADENOMA 

LABELING 

In this task, the coordinates of the marked regions of interest (ROIs) are extracted with the QuPath application. ROIs are chosen according to the blue-marked areas on the image, and each blue-marked area is covered with ROIs of different sizes and shapes to increase data variation. The extracted coordinates are then applied to the unmarked images so that the blue marks do not appear in the cropped data.

For each image, around 40-50 ROIs are extracted. The center coordinate of each ROI is used as the anchor for cropping patches at two sizes: 128×128 and 256×256.
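
The cropping step can be sketched as follows. This is a minimal illustration and not the project's actual code: the function name `crop_box`, the handling of crops near the image border, and the example image dimensions are all assumptions.

```python
def crop_box(cx, cy, size, img_w, img_h):
    """Return (left, top, right, bottom) of a size x size crop centred on (cx, cy).

    Assumption: a crop that would cross the image border is shifted inward
    so it keeps its full size. The text does not say how border cases
    are handled.
    """
    if size > img_w or size > img_h:
        raise ValueError("crop size exceeds image dimensions")
    half = size // 2
    left = min(max(cx - half, 0), img_w - size)
    top = min(max(cy - half, 0), img_h - size)
    return left, top, left + size, top + size


# Hypothetical ROI centre in a hypothetical 2000 x 1500 image.
boxes = {size: crop_box(1000, 700, size, 2000, 1500) for size in (128, 256)}
```

A box in this (left, top, right, bottom) form can be passed directly to an image library's crop routine, for example Pillow's `Image.crop`.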

The ROIs are labeled with one of three class identifiers: 0 (Tubular Adenoma), 1 (GVHD), 2 (None).

The ROIs with the ‘None’ label are selected randomly from coordinates that do not overlap any GVHD or Tubular Adenoma ROI.
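
One way to pick such non-overlapping regions is rejection sampling: draw a random box, and keep it only if it does not intersect any labeled box. The sketch below assumes axis-aligned boxes in (left, top, right, bottom) form; the function names, the fixed seed, and the example coordinates are hypothetical and not taken from the study.

```python
import random


def boxes_overlap(a, b):
    """True if two (left, top, right, bottom) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def sample_none_boxes(labeled_boxes, img_w, img_h, size, n, seed=0, max_tries=10000):
    """Rejection-sample up to n size x size boxes that avoid every labeled box."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(max_tries):
        if len(chosen) == n:
            break
        left = rng.randint(0, img_w - size)
        top = rng.randint(0, img_h - size)
        box = (left, top, left + size, top + size)
        # Assumption: 'None' boxes must not overlap each other either.
        if not any(boxes_overlap(box, other) for other in labeled_boxes + chosen):
            chosen.append(box)
    return chosen


# Hypothetical GVHD / Tubular Adenoma boxes in a hypothetical 2000 x 1500 image.
labeled = [(100, 100, 356, 356), (900, 600, 1156, 856)]
none_boxes = sample_none_boxes(labeled, 2000, 1500, size=128, n=10)
```

The `max_tries` cap keeps the loop from running forever when the labeled regions leave too little free space, in which case fewer than `n` boxes are returned.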

Figure 1: Selected ROIs for a GVHD label 

Figure 2: Randomly selected ROIs for ‘None’ label 

FEATURE EXTRACTION 

Feature extraction is a method for generating a numerical representation of an image, which can then be used as input to a model. Models known as foundation models are trained to produce such representations.

ROI image -> [0.0010228882, 0.821333, 0.48481488, 0.13029815, ...]

The foundation model we used for feature extraction on GVHD images is ‘Prov-GigaPath’. For each ROI, this model generates a feature vector of 1536 floating-point values (shape 1×1536).

For 728 ROIs, the extracted features form a table of 728 rows and 1536 columns, stored in a CSV file.
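
The layout of that file can be sketched as below. The `extract_features` function is a hypothetical placeholder that returns a dummy vector of the right length; it does not call Prov-GigaPath, and the column names `f0` to `f1535` are likewise assumptions made for illustration.

```python
import csv
import io

EMBED_DIM = 1536  # vector length stated in the text


def extract_features(roi_id):
    """Placeholder for the foundation-model forward pass (hypothetical).

    Returns a dummy vector so the CSV layout can be shown.
    """
    return [0.0] * EMBED_DIM


def write_feature_csv(roi_ids, out):
    """Write a header row, then one row of EMBED_DIM features per ROI."""
    writer = csv.writer(out)
    writer.writerow([f"f{i}" for i in range(EMBED_DIM)])
    for roi_id in roi_ids:
        writer.writerow(extract_features(roi_id))


# StringIO stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_feature_csv([f"roi_{i}" for i in range(728)], buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

Here `rows` holds one header row followed by 728 data rows of 1536 values each.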

IDENTIFICATION AND CLASSIFICATION

The classification task in our study tests whether the generated features make the classes distinguishable from one another. As mentioned above, there are three classes, assigned according to the ROI coordinates. This label information is added to the CSV file as a separate ‘label’ column.

The classification experiment uses the random forest algorithm. Its performance is summarized by the AUC (area under the curve), which measures the model’s ability to distinguish between classes. An AUC value closer to 1 indicates a better-performing model, while a value close to 0.5 indicates a model with no discriminative ability.
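
To make the meaning of AUC concrete, the sketch below computes it for a two-class case as the fraction of (positive, negative) pairs in which the positive example receives the higher score. The function name `binary_auc` and the toy labels and scores are illustrative assumptions, not data from the study. With three classes, one common approach is to compute this value one-vs-rest per class and average; the text does not state which variant was used.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels are 0/1, scores are the model's scores for the positive class.
    Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (not from the study).
labels = [1, 1, 1, 0, 0, 0]
perfect = binary_auc(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])        # every positive ranked first
uninformative = binary_auc(labels, [0.5, 0.5, 0.5, 0.5, 0.5, 0.5])  # scores carry no information
mixed = binary_auc(labels, [0.9, 0.4, 0.7, 0.6, 0.2, 0.1])          # one positive ranked below a negative
```

The three toy cases give 1.0, 0.5, and 8/9 respectively, matching the interpretation above: perfect separation, no discriminative ability, and something in between.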

dataset       model         test_auc  test_acc  test_precision  test_recall  test_kappa  cvt_auc  cvt_acc  cvt_precision  cvt_recall
gvhd_128.csv  randomforest  0.982     0.877     0.9             0.854        0.796       0.944    0.86     0.874          0.849

You can also see the additional classification results in the image below: