# HRA API Use Case Documentation

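[Click here](hra-api-client-usage.ipynb) to view the installation and general documentation for the hra-api Python module.

This notebook demonstrates one particular use case of the HRA API module: for tissue blocks registered in the kidney, it compiles the anatomical structures (AS) each block collides with, the cell types (CT) located in those structures, and the donor label of each block.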

# Comments

 - An error log is [here](README.md).

# INDEX

1. [Import HRA API module](#import_client)
2. [Imports](#import_packages)
3. [Configuration](#configuration)
4. [Check database status](#status)
5. [Using kidney](#kidney)
6. [Export](#export)


<a id='import_client'></a>
# We first install and import the necessary packages, including hra_api_client.

In [17]:
!pip install requests 
!pip install hra_api_client 
!pip install pandas

Note: you may need to restart the kernel to use updated packages.


In [3]:
import hra_api_client
import requests
from hra_api_client.api import v1_api as default_api

<a id='import_packages'></a>
# Import all other packages required to produce output as expected.
The pandas module provides the DataFrame used to display and aggregate the results and to write them to a CSV file. <br>
The time module is used to pause between database status checks, and pprint to print nested results readably.

In [4]:
import pandas as pd
import time
from pprint import pprint

<a id='configuration'></a>
# Configuration
You'll need to point the host in the configuration to the instance you'd like to work with. 
More Info [here](https://github.com/hubmapconsortium/ccf-ui/blob/main/ccf-api-usage.ipynb).

In [5]:

configuration = hra_api_client.Configuration(
    host = "https://apps.humanatlas.io/api"
)
api_client = hra_api_client.ApiClient(configuration)

api_instance = default_api.V1Api(api_client)

<a id='status'></a>
# Check Database Status
This is an optional step which can be used to reduce wait times for other methods.

In [6]:
# Poll the database status every 2 seconds until it reports 'Ready'
db_ready = False
result = None
while not db_ready:
    result = api_instance.db_status()
    pprint(result)
    if result.status == 'Ready':
        db_ready = True
    else:
        print('Database not ready yet! Retrying...', result)
        time.sleep(2)
print('Database ready!\n', result)

DatabaseStatus(status='Ready', checkback=3600000, load_time=22594, message='Database successfully loaded')
Database ready!
 status='Ready' checkback=3600000 load_time=22594 message='Database successfully loaded'
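
The loop above keeps polling until the service answers `Ready`. A minimal sketch of the same idea with an upper bound on the waiting time follows. `wait_until_ready`, its parameters, and the fake status source are hypothetical helpers introduced for illustration; they are not part of `hra_api_client`.

```python
import time
from types import SimpleNamespace


def wait_until_ready(fetch_status, poll_seconds=2.0, timeout_seconds=60.0,
                     sleep=time.sleep):
    """Poll fetch_status() until it reports 'Ready' or the timeout elapses.

    fetch_status is any zero-argument callable that returns an object
    with a .status attribute (assumption: same shape as the result above).
    """
    waited = 0.0
    while True:
        result = fetch_status()
        if result.status == 'Ready':
            return result
        if waited >= timeout_seconds:
            raise TimeoutError(f'Database not ready after {waited:.0f} s')
        sleep(poll_seconds)
        waited += poll_seconds


# Fake status source so the sketch runs without network access
fake_statuses = iter(['Loading', 'Loading', 'Ready'])


def fake_fetch():
    return SimpleNamespace(status=next(fake_statuses))


result = wait_until_ready(fake_fetch, sleep=lambda seconds: None)
print(result.status)
```

With the real client, the call would presumably be `wait_until_ready(api_instance.db_status)`.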


<a id='kidney'></a>
# Our use case is using Kidney as the reference organ. 
We are using the ontology link for kidney directly: http://purl.obolibrary.org/obo/UBERON_0002113

The `scene` method gives us a list of all the organs and tissue blocks that make up a 3D scene. 

In [7]:
sex = "both"
ontology_terms = ["http://purl.obolibrary.org/obo/UBERON_0002113"]
sceneResult = None
try:
    sceneResult = api_instance.scene(sex=sex, ontology_terms=ontology_terms)
    pprint(sceneResult)
except hra_api_client.ApiException as e:
    print("Exception when calling DefaultApi->scene: %s\n" % e)

[SpatialSceneNode(name=None, tooltip=None, unpickable=None, geometry=None, lighting=None, zoom_based_opacity=None, zoom_to_on_load=None, scenegraph=None, scenegraph_node=None, color=None, opacity=None, transform_matrix=None, priority=None, entity_id=None, ccf_annotations=None, representation_of='http://purl.obolibrary.org/obo/UBERON_0002097', reference_organ='https://purl.humanatlas.io/ref-organ/skin-female/v1.4#primary', sex='Female', additional_properties={'@id': 'https://purl.humanatlas.io/ref-organ/skin-female/v1.4#primary', '@type': 'SpatialSceneNode', 'tooltip': 'skin', 'scenegraph': 'https://cdn.humanatlas.io/digital-objects/ref-organ/skin-female/v1.4/assets/3d-vh-f-skin.glb', 'scenegraphNode': 'VH_F_skin', 'transformMatrix': [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0.4, 0, 0, 1], 'color': [255, 255, 255, 255], 'opacity': 0.5, 'unpickable': True, '_lighting': 'pbr', 'zoomBasedOpacity': False}),
 SpatialSceneNode(name=None, tooltip=None, unpickable=None, geometry=None, lighting=None,

We convert the `sceneResult` into a dictionary where the keys are the `entity_id`s of the tissue blocks and the values are the anatomical structures in the kidney the tissue blocks collide with.

In [8]:
sceneEntityASDict = {}
for scene in sceneResult:
    # Skip nodes without an entity ID, e.g. the reference organs themselves
    if scene.entity_id is None:
        continue
    if scene.entity_id not in sceneEntityASDict:
        sceneEntityASDict[scene.entity_id] = []
    # Guard against nodes that carry no annotations
    sceneEntityASDict[scene.entity_id].extend(scene.ccf_annotations or [])

pprint(sceneEntityASDict)

{'http://dx.doi.org/10.1681/ASN.2016091027#Donor1_TissueBlock1': ['http://purl.obolibrary.org/obo/UBERON_0013702',
                                                                  'http://purl.obolibrary.org/obo/UBERON_0002113',
                                                                  'http://purl.obolibrary.org/obo/UBERON_0004539'],
 'http://dx.doi.org/10.1681/ASN.2016091027#Donor2_TissueBlock1': ['http://purl.obolibrary.org/obo/UBERON_0013702',
                                                                  'http://purl.obolibrary.org/obo/UBERON_0002113',
                                                                  'http://purl.obolibrary.org/obo/UBERON_0004539'],
 'http://dx.doi.org/10.1681/ASN.2016091027#Donor3_TissueBlock1': ['http://purl.obolibrary.org/obo/UBERON_0013702',
                                                                  'http://purl.obolibrary.org/obo/UBERON_0002113',
                                                                  'http://purl
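
To make the shape of this dictionary concrete, here is a small sketch that runs the same grouping on stand-in scene nodes. The node values (`block-1`, `as-A`, ...) and the helper `group_annotations` are made up for illustration; only the attribute names `entity_id` and `ccf_annotations` come from the result above.

```python
from collections import defaultdict
from types import SimpleNamespace

# Stand-in scene nodes (hypothetical values) with the two attributes used here
mock_scene = [
    SimpleNamespace(entity_id=None, ccf_annotations=None),  # reference organ
    SimpleNamespace(entity_id='block-1', ccf_annotations=['as-A', 'as-B']),
    SimpleNamespace(entity_id='block-2', ccf_annotations=['as-A']),
    SimpleNamespace(entity_id='block-1', ccf_annotations=['as-C']),
]


def group_annotations(nodes):
    """Map each entity_id to the list of AS IRIs annotated on its nodes."""
    grouped = defaultdict(list)
    for node in nodes:
        if node.entity_id is None:
            continue  # nodes without an ID are not tissue blocks
        grouped[node.entity_id].extend(node.ccf_annotations or [])
    return dict(grouped)


block_to_as = group_annotations(mock_scene)
print(block_to_as)
```

Using `defaultdict(list)` removes the explicit membership check, and `or []` keeps the loop from failing on a node whose annotations are `None`.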

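We then use a SPARQL query to get the AS label, the CT label, and the CT IRI for the AS IRIs obtained in the last step. The first cell below runs the query with a hard-coded list of AS IRIs; the second builds the same list from `sceneEntityASDict` and stores the CTs by AS in a dictionary, which serves as a look-up when compiling the final table later.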

In [9]:
request = {"query": '''PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ccf: <http://purl.org/ccf/>

SELECT DISTINCT (STR(?asLabel) as ?as_label) (STR(?qlabel) as ?cell_label) ?as_iri ?cell_iri WHERE {
  ?cell_iri ccf:ccf_located_in ?as_iri .
  ?cell_iri rdfs:label ?qlabel .
  ?as_iri rdfs:label ?asLabel .

  FILTER (?as_iri in (<http://purl.obolibrary.org/obo/UBERON_0001227>, <http://purl.obolibrary.org/obo/UBERON_0006517>, <http://purl.obolibrary.org/obo/UBERON_0002015>, <http://purl.obolibrary.org/obo/UBERON_0001228>, <http://purl.obolibrary.org/obo/UBERON_0004538>, <http://purl.obolibrary.org/obo/UBERON_0008716>, <http://purl.obolibrary.org/obo/UBERON_0001224>, <http://purl.obolibrary.org/obo/UBERON_0004200>, <http://purl.obolibrary.org/obo/UBERON_0001284>, <http://purl.obolibrary.org/obo/UBERON_0001223>, <http://purl.obolibrary.org/obo/UBERON_0001225>, <http://purl.obolibrary.org/obo/UBERON_0004539>, <http://purl.obolibrary.org/obo/UBERON_0000362>, <http://purl.obolibrary.org/obo/UBERON_0013702>, <http://purl.obolibrary.org/obo/UBERON_0002113>, <http://purl.obolibrary.org/obo/UBERON_0002189>, <http://purl.obolibrary.org/obo/UBERON_0001226>, <http://purl.obolibrary.org/obo/UBERON_0000056>))

}'''
}


# Use only 'application/json' as the format; any other format results in errors.
try:
    # help(api_instance.sparql_post)
    api_response = api_instance.sparql_post(
        sparql_query_request=request, format='application/json')
    pprint(api_response)
except hra_api_client.ApiException as e:
    print("Exception when calling DefaultApi->sparql_post: %s\n" % e)

{'head': {'vars': ['as_label', 'cell_label', 'as_iri', 'cell_iri']},
 'results': {'bindings': [{'as_iri': {'type': 'uri',
                                      'value': 'http://purl.obolibrary.org/obo/UBERON_0000362'},
                           'as_label': {'type': 'literal',
                                        'value': 'renal medulla'},
                           'cell_iri': {'type': 'uri',
                                        'value': 'http://purl.obolibrary.org/obo/CL_1000718'},
                           'cell_label': {'type': 'literal',
                                          'value': 'kidney inner medulla '
                                                   'collecting duct principal '
                                                   'cell'}},
                          {'as_iri': {'type': 'uri',
                                      'value': 'http://purl.obolibrary.org/obo/UBERON_0000362'},
                           'as_label': {'type': 'literal',
                   

In [10]:
# Define a SPARQL query template; the %s placeholder is filled with the AS IRIs below
query = '''PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ccf: <http://purl.org/ccf/>

SELECT DISTINCT (STR(?asLabel) as ?as_label) (STR(?qlabel) as ?cell_label) ?as_iri ?cell_iri WHERE {
  ?cell_iri ccf:ccf_located_in ?as_iri .
  ?cell_iri rdfs:label ?qlabel .
  ?as_iri rdfs:label ?asLabel .

  FILTER (?as_iri in (%s))

}'''

# Collect all unique AS IRIs
purlIris = set()
for asIris in sceneEntityASDict.values():
    purlIris.update(asIris)

# Construct the comma-separated list of <IRI> terms for the FILTER clause
purlString = ", ".join("<" + purl + ">" for purl in purlIris)

# Finalize the query
query = query % purlString

# Declare and initialize a variable to hold results from the query
bindings = []

try:
    api_response = api_instance.sparql(
        query=query, format='application/json')
    bindings = api_response['results']['bindings']
except hra_api_client.ApiException as e:
    print("Exception when calling DefaultApi->sparql: %s\n" % e)

# Declare and initialize a dictionary that maps each AS IRI to the set of CT IRIs located in it
anatomical_structure_cell_type_combinations = {}

for binding in bindings:
    as_iri = binding['as_iri']['value']
    cell_iri = binding['cell_iri']['value']
    # Create the set the first time an AS IRI is seen, then always add the cell type
    anatomical_structure_cell_type_combinations.setdefault(as_iri, set()).add(cell_iri)

pprint(anatomical_structure_cell_type_combinations)

{'http://purl.obolibrary.org/obo/UBERON_0000056': {'http://purl.obolibrary.org/obo/CL_1000308',
                                                   'http://purl.obolibrary.org/obo/CL_1000708',
                                                   'http://purl.obolibrary.org/obo/CL_1001428',
                                                   'https://purl.org/ccf/ASCTB-TEMP_basal-cell-of-ureter-urothelium',
                                                   'https://purl.org/ccf/ASCTB-TEMP_detrusor-smooth-muscle-cell-of-ureter',
                                                   'https://purl.org/ccf/ASCTB-TEMP_dpt-fibroblast-cell-of-ureter',
                                                   'https://purl.org/ccf/ASCTB-TEMP_endothelial-cell-of-ureter',
                                                   'https://purl.org/ccf/ASCTB-TEMP_intermediate-cell-of-ureter-urothelium',
                                                   'https://purl.org/ccf/ASCTB-TEMP_lipofibroblast-cell-of-ureter',
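
The two steps in this cell, building the `FILTER` list and turning the bindings into a look-up, can be sketched offline as below. The response follows the `head` / `results` / `bindings` layout printed earlier, but the `example.org` IRIs are placeholders, and `iri_filter_values` and `cell_types_by_structure` are hypothetical helper names.

```python
def iri_filter_values(iris):
    """Format IRIs as '<a>, <b>' for a SPARQL FILTER (... IN (...)) clause."""
    # Sorting makes the generated query text reproducible across runs
    return ', '.join(f'<{iri}>' for iri in sorted(set(iris)))


def cell_types_by_structure(response):
    """Map each AS IRI to the set of CT IRIs found in a SPARQL JSON response."""
    lookup = {}
    for binding in response['results']['bindings']:
        as_iri = binding['as_iri']['value']
        cell_iri = binding['cell_iri']['value']
        lookup.setdefault(as_iri, set()).add(cell_iri)
    return lookup


# Placeholder response in the same layout as the API output above
sample_response = {
    'head': {'vars': ['as_iri', 'cell_iri']},
    'results': {'bindings': [
        {'as_iri': {'type': 'uri', 'value': 'http://example.org/as/1'},
         'cell_iri': {'type': 'uri', 'value': 'http://example.org/ct/1'}},
        {'as_iri': {'type': 'uri', 'value': 'http://example.org/as/1'},
         'cell_iri': {'type': 'uri', 'value': 'http://example.org/ct/2'}},
        {'as_iri': {'type': 'uri', 'value': 'http://example.org/as/2'},
         'cell_iri': {'type': 'uri', 'value': 'http://example.org/ct/1'}},
    ]},
}

filter_values = iri_filter_values(['http://example.org/as/2',
                                   'http://example.org/as/1'])
lookup = cell_types_by_structure(sample_response)
print(filter_values)
print(lookup)
```

Each binding contributes exactly one AS and CT pair, so a single pass over the bindings is enough.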


We now reverse `sceneEntityASDict` so that each AS IRI maps to the set of `entity_id`s of the tissue blocks that collide with it.

In [11]:
def reverseObject(input_object):
    reversed_object = {}

    for key, values in input_object.items():
        for value in values:
            if value not in reversed_object:
                reversed_object[value] = set()
            reversed_object[value].add(key)
    return reversed_object

ASEntityDict = reverseObject(sceneEntityASDict)
pprint(ASEntityDict)

{'http://purl.obolibrary.org/obo/UBERON_0000056': {'https://entity.api.hubmapconsortium.org/entities/11d794402e067e54888bb8a4af4184e0',
                                                   'https://entity.api.hubmapconsortium.org/entities/263854e0346766efa516bd69bfc1e035'},
 'http://purl.obolibrary.org/obo/UBERON_0000362': {'https://entity.api.hubmapconsortium.org/entities/015f9976f3fed4c35dbbaad622a413d9',
                                                   'https://entity.api.hubmapconsortium.org/entities/0a558711c6b9d5b9029717d722a2c31c',
                                                   'https://entity.api.hubmapconsortium.org/entities/0be6dee31365fd1e590a45263df043c6',
                                                   'https://entity.api.hubmapconsortium.org/entities/0ec930e0f4a8fcf12ea136a28e1acad4',
                                                   'https://entity.api.hubmapconsortium.org/entities/11d794402e067e54888bb8a4af4184e0',
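
As a quick check of what the reversal does, the sketch below applies an equivalent function to a two-block toy mapping and reverses the result again. The identifiers `block-1`, `as-A`, and so on are placeholders, and `reverse_mapping` is an illustrative variant of `reverseObject` built on `defaultdict`.

```python
from collections import defaultdict


def reverse_mapping(mapping):
    """Turn {key: iterable of values} into {value: set of keys}."""
    reversed_mapping = defaultdict(set)
    for key, values in mapping.items():
        for value in values:
            reversed_mapping[value].add(key)
    return dict(reversed_mapping)


block_to_as = {'block-1': ['as-A', 'as-B'], 'block-2': ['as-A']}
as_to_block = reverse_mapping(block_to_as)
print(as_to_block)

# Reversing twice gives back the original relationship, with sets as values
round_trip = reverse_mapping(as_to_block)
print(round_trip)
```

The values become sets, so a tissue block that lists the same AS twice is recorded only once.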
                                               

Using the method `tissue_blocks`, we get donor label information for each `entity_id`.

In [12]:
sex = "both"

# We use the ontology term for "left kidney". Just using the term for "kidney" will result in an error.
ontology_terms = ["http://purl.obolibrary.org/obo/UBERON_0004538"] 

tissueBlockResult = None

try:
    tissueBlockResult = api_instance.tissue_blocks(sex=sex, ontology_terms=ontology_terms)
except hra_api_client.ApiException as e:
    print("Exception when calling DefaultApi->tissue_blocks: %s\n" % e)

# Convert the tissueBlockResult objects to a list of dicts (empty if the call failed)
tissueBlockResultList = [block.to_dict() for block in tissueBlockResult or []]
pprint(tissueBlockResultList)

[{'@id': 'https://entity.api.hubmapconsortium.org/entities/039ce6e8d5fa342811a4b317846b7281',
  '@type': 'Sample',
  'datasets': [],
  'description': '35 x 19 x 3 millimeter, 3 millimeter, 1 Sections',
  'donor': {'@id': 'https://entity.api.hubmapconsortium.org/entities/2510644405f4e7fd86d31af0001b840f',
            '@type': 'Donor',
            'description': 'Entered 12/26/2019, Jamie Allen, TMC-Vanderbilt',
            'label': 'Male, Age 56, BMI 32.5',
            'link': 'https://portal.hubmapconsortium.org/browse/donor/2510644405f4e7fd86d31af0001b840f',
            'providerName': 'TMC-Vanderbilt'},
  'label': 'Registered 6/10/2020, Jamie Allen, TMC-Vanderbilt',
  'link': 'https://portal.hubmapconsortium.org/browse/sample/039ce6e8d5fa342811a4b317846b7281',
  'sampleType': 'Tissue Block',
  'sectionCount': 1,
  'sectionSize': 3,
  'sectionUnits': 'millimeter',
  'sections': [{'@id': 'https://entity.api.hubmapconsortium.org/entities/03dd86ef9ffd739b4cad8b99eb1e3fbf',
              

To make it easier to find the donor labels, we convert the above result to a dictionary that relates each `entity_id` to its donor label.

In [13]:
blockDonorLabel = {}
for tissueBlock in tissueBlockResultList:
    blockDonorLabel[tissueBlock['@id']] = tissueBlock['donor']['label']

pprint(blockDonorLabel)

{'https://entity.api.hubmapconsortium.org/entities/039ce6e8d5fa342811a4b317846b7281': 'Male, '
                                                                                      'Age '
                                                                                      '56, '
                                                                                      'BMI '
                                                                                      '32.5',
 'https://entity.api.hubmapconsortium.org/entities/07c995983aac113b442c0dc0c0b42ae5': 'Male, '
                                                                                      'Age '
                                                                                      '53, '
                                                                                      'BMI '
                                                                                      '26.5',
 'https://entity.api.hubmapconsortium.org/entities/088adc477043f
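
If the donor labels need to be aggregated later, they can be split into fields. The sketch below assumes every label follows the `Sex, Age N, BMI N` pattern of the two labels shown above; that format is an assumption, not something the API guarantees, and `parse_donor_label` is a hypothetical helper. Fields that are absent come back as `None`.

```python
import re


def parse_donor_label(label):
    """Split a label such as 'Male, Age 56, BMI 32.5' into separate fields."""
    first_part = label.split(',')[0].strip()
    sex = first_part if first_part in ('Male', 'Female') else None
    age = re.search(r'\bAge (\d+(?:\.\d+)?)', label)
    bmi = re.search(r'\bBMI (\d+(?:\.\d+)?)', label)
    return {
        'sex': sex,
        'age': float(age.group(1)) if age else None,
        'bmi': float(bmi.group(1)) if bmi else None,
    }


parsed = parse_donor_label('Male, Age 56, BMI 32.5')
print(parsed)

# A label without a BMI part (hypothetical) leaves that field empty
partial = parse_donor_label('Female, Age 38')
print(partial)
```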

<a id='export'></a>
# Export 

Now we create the report, using data from the above methods. The goal is to compile the data with these columns: `tissue_block_id`, `as_iri`, `cell_iri`, `donor_label`. 

In [14]:
# Flatten the dictionaries to a list of tuples
long_scene_entity_as_dict = [(k, v) for k, values in sceneEntityASDict.items() for v in values]
long_as_ct_combinations = [(k, v) for k, values in anatomical_structure_cell_type_combinations.items() for v in values]

# Create DataFrames
df_tissue_block_as_iri = pd.DataFrame(long_scene_entity_as_dict, columns=['tissue_block_id', 'as_iri'])
df_tissue_block_donor_label = pd.DataFrame(blockDonorLabel.items(), columns=['tissue_block_id', 'donor_label'])
df_as_ct_combinations = pd.DataFrame(
    long_as_ct_combinations, columns=['as_iri', 'cell_iri'])

# Merge the DataFrames
merged_intermediate = pd.merge(df_tissue_block_as_iri, df_tissue_block_donor_label, on="tissue_block_id")
df = pd.merge(merged_intermediate, df_as_ct_combinations, on="as_iri")

# Print the merged DataFrame
pprint(df)

                                        tissue_block_id  \
0     https://entity.api.hubmapconsortium.org/entiti...   
1     https://entity.api.hubmapconsortium.org/entiti...   
2     https://entity.api.hubmapconsortium.org/entiti...   
3     https://entity.api.hubmapconsortium.org/entiti...   
4     https://entity.api.hubmapconsortium.org/entiti...   
...                                                 ...   
5265                https://www.atlas-d2k.org/id/W-RA1W   
5266                https://www.atlas-d2k.org/id/W-RA1W   
5267                https://www.atlas-d2k.org/id/W-RA1W   
5268                https://www.atlas-d2k.org/id/W-RA1W   
5269                https://www.atlas-d2k.org/id/W-RA1W   

                                             as_iri             donor_label  \
0     http://purl.obolibrary.org/obo/UBERON_0002113  Male, Age 56, BMI 32.5   
1     http://purl.obolibrary.org/obo/UBERON_0002113  Male, Age 56, BMI 32.5   
2     http://purl.obolibrary.org/obo/UBERON_0002113  M
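
Both merges use the pandas default, an inner join, so a tissue block without an entry in the donor-label table drops out of the report. The toy frames below (placeholder IDs, not real data) sketch how a left join with `indicator=True` can reveal which blocks would be lost.

```python
import pandas as pd

# Toy frames with placeholder identifiers
blocks = pd.DataFrame({
    'tissue_block_id': ['block-1', 'block-1', 'block-2'],
    'as_iri': ['as-A', 'as-B', 'as-A'],
})
donors = pd.DataFrame({
    'tissue_block_id': ['block-1'],
    'donor_label': ['Male, Age 56, BMI 32.5'],
})

# Default inner join: only blocks present in both frames survive
inner = pd.merge(blocks, donors, on='tissue_block_id')

# Left join with an indicator column marks the rows that found no donor label
checked = pd.merge(blocks, donors, on='tissue_block_id',
                   how='left', indicator=True)
missing = sorted(
    checked.loc[checked['_merge'] == 'left_only', 'tissue_block_id'].unique()
)

print(len(inner))
print(missing)
```

Whether dropped blocks matter depends on the report; the check only makes the effect of the join visible.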

This will save the output as a CSV file. <br>
You can change the name of the file in the first line of the next block.

In [15]:
# Set file name
fileName = "exampleFileName"

# Header mapping: existing column name -> header to write to the CSV file.
# Names that are not columns of df (here 'as_label') are ignored by rename.
header_mapping = {
    'tissue_block_id': 'Tissue Block ID',
    'as_iri': 'AS IRI',
    'as_label': 'AS Label',
    'cell_iri': 'CT IRI',
    'donor_label': 'Donor Label'
}

# Rename columns using the header mapping
df_renamed = df.rename(columns=header_mapping)

# Save the DataFrame to a CSV file with the new headers
df_renamed.to_csv(f'{fileName}.csv', index=False)

print(df_renamed)

                                        Tissue Block ID  \
0     https://entity.api.hubmapconsortium.org/entiti...   
1     https://entity.api.hubmapconsortium.org/entiti...   
2     https://entity.api.hubmapconsortium.org/entiti...   
3     https://entity.api.hubmapconsortium.org/entiti...   
4     https://entity.api.hubmapconsortium.org/entiti...   
...                                                 ...   
5265                https://www.atlas-d2k.org/id/W-RA1W   
5266                https://www.atlas-d2k.org/id/W-RA1W   
5267                https://www.atlas-d2k.org/id/W-RA1W   
5268                https://www.atlas-d2k.org/id/W-RA1W   
5269                https://www.atlas-d2k.org/id/W-RA1W   

                                             AS IRI             Donor Label  \
0     http://purl.obolibrary.org/obo/UBERON_0002113  Male, Age 56, BMI 32.5   
1     http://purl.obolibrary.org/obo/UBERON_0002113  Male, Age 56, BMI 32.5   
2     http://purl.obolibrary.org/obo/UBERON_0002113  M

# Getting a summary

In [16]:
# Summary -- text
AsCounts = df['as_iri'].value_counts()
CtCounts = df['cell_iri'].value_counts()
blockCounts = df['tissue_block_id'].value_counts()

print(f'SUMMARY:\n\
Number of Blocks: {len(blockCounts)},\n\
Number of Unique ASs identified: {len(AsCounts)},\n\
Number of Unique CTs identified: {len(CtCounts)}')

SUMMARY:
Number of Blocks: 67,
Number of Unique ASs identified: 8,
Number of Unique CTs identified: 71
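
The same counts can also be obtained with `nunique`, which additionally allows a per-block breakdown. This is a sketch on a toy frame with placeholder values, not a replacement for the summary above.

```python
import pandas as pd

# Toy report with the same column names as df (placeholder values)
toy = pd.DataFrame({
    'tissue_block_id': ['block-1', 'block-1', 'block-2', 'block-2'],
    'as_iri': ['as-A', 'as-A', 'as-A', 'as-B'],
    'cell_iri': ['ct-1', 'ct-2', 'ct-1', 'ct-3'],
})

# Number of distinct values per column
summary = toy[['tissue_block_id', 'as_iri', 'cell_iri']].nunique()

# Number of distinct cell types associated with each tissue block
cell_types_per_block = toy.groupby('tissue_block_id')['cell_iri'].nunique()

print(summary)
print(cell_types_per_block)
```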
