Create PubMed Citations Automatically Using PubMed API / by Alexander Hadik

Recently I was working on a small website for a research lab at Brown University. Naturally, the lab asked for a list of their publications to be included on the site. That list changes frequently, though, and I didn't want to burden the lab with updating their site every time they publish.

Luckily, PubMed has an API that allows you to retrieve data from their databases in XML or JSON form. For me, JSON was perfect, as I could easily parse the data and present it on the website using jQuery.

There are a few different resources PubMed offers, and each allows you to retrieve different info. In my case, I wanted to get a list of all publications by a specific author, and then retrieve the details of each of those articles. Unfortunately I haven't found a way to do that in one query, so I resorted to a two-step process: get the article IDs for every article by an author, then get the details for each of those IDs with separate requests. If this were done every time a page loads, it wouldn't be practical. If the update is done once every 24 hours, though, the process is acceptable.

The data can be retrieved in JSON format using specific URLs that contain the desired search terms. I found a great resource that details the different PubMed search sources available, and the parameters available to search each of them with.

http://entrezajax.appspot.com/developer.html

The two services I used were ESearch and ESummary. ESearch is for retrieving the full list of work by an author, and ESummary is for getting the details on each work. I'm working on a website, so I used JavaScript and jQuery to retrieve the data with the following code:

$.getJSON('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=larschan,erica[author]&retmode=json', function(data){
    var ids = data.esearchresult.idlist;
    var publications = [];
    iterateJSON(ids, publications);
});

Let's break this first piece of code down. I'm using the jQuery getJSON function to retrieve and parse the JSON returned from the following GET request:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=larschan,erica[author]&retmode=json

There are four important parts to this, and you can construct your own version of a request URL using the link I provided above as reference.

  • The base URL:
    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
    • This specifies that I want to use the ESearch feature.
  • The database: 
    db=pubmed
    • This specifies that I want to search the PubMed database, but others can be used as well.
  • The search term: 
    term=larschan,erica[author]
    • This specifies that I want to search for entries that match the full author name of Erica Larschan.
  • The return form: 
    retmode=json
    • This specifies that I want the data returned in JSON format as opposed to XML.
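
The four parts above can be assembled in code rather than typed by hand. Here is a minimal sketch of that idea. Note that buildESearchUrl and ESEARCH_BASE are hypothetical names I made up for illustration, and that I'm assuming you want the search term percent-encoded with the standard encodeURIComponent function so that the comma and square brackets travel safely in the URL.

```javascript
// Base URL of the ESearch service, as used in the request above.
var ESEARCH_BASE = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

// buildESearchUrl is a hypothetical helper, not part of jQuery or PubMed.
// db:   the database to search, e.g. 'pubmed'
// term: the search term, e.g. 'larschan,erica[author]'
function buildESearchUrl(db, term) {
    var params = [
        'db=' + encodeURIComponent(db),
        'term=' + encodeURIComponent(term),
        'retmode=json' // ask for JSON instead of XML
    ];
    return ESEARCH_BASE + '?' + params.join('&');
}

var url = buildESearchUrl('pubmed', 'larschan,erica[author]');
console.log(url);
```

The resulting string could be passed to $.getJSON in place of the hard-coded URL, which makes it easier to swap in a different author later.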

This request returns a JSON object from which you can extract a list of article IDs, which I do with:

var ids = data.esearchresult.idlist;
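
If you want to be a little more defensive about that lookup, something like the sketch below might help. The extractIds name is hypothetical, and the sample object is a made-up fragment with placeholder IDs. It only mirrors the esearchresult.idlist path used above and is not a real PubMed response.

```javascript
// extractIds is a hypothetical helper: it returns the list of IDs, or an
// empty array if the response doesn't have the expected shape.
function extractIds(data) {
    if (!data || !data.esearchresult || !Array.isArray(data.esearchresult.idlist)) {
        return [];
    }
    // Copy the array so that later pop() calls don't modify the response.
    return data.esearchresult.idlist.slice();
}

// Made-up response fragment for illustration; the IDs are placeholders.
var sample = { esearchresult: { idlist: ['11111111', '22222222'] } };

var sampleIds = extractIds(sample);
console.log(sampleIds);        // the two placeholder IDs
console.log(extractIds({}));   // an empty array
```

Returning an empty array for an unexpected response means the rest of the code only ever has to deal with a list.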

Now comes the problem of retrieving the summary for each of these articles. I do this recursively with a callback function so that the IDs are iterated through only as fast as the data can be retrieved. Because the requests are asynchronous, a plain loop won't pause for one summary request to complete before moving on to the next ID, so by the time a callback runs, a loop variable declared with var no longer holds the ID that was requested.
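
To see the problem concretely, here is a small sketch that runs without any network access. fakeGetSummary is a hypothetical stand-in for $.getJSON that queues its callback instead of calling it right away, and the IDs are placeholders. It is only meant to show what goes wrong with a naive loop, not how PubMed or jQuery behave internally.

```javascript
// Hypothetical stand-in for $.getJSON: it queues the callback instead of
// calling it immediately, the way a real network request would.
var pendingCallbacks = [];
function fakeGetSummary(id, callback) {
    pendingCallbacks.push(function () {
        callback({ uid: id });
    });
}

// Naive version: a plain loop with a `var` declared inside it.
function naiveLoop(ids) {
    var seen = [];
    for (var i = 0; i < ids.length; i++) {
        var id = ids[i]; // a single `id` variable shared by every callback
        fakeGetSummary(id, function (summary) {
            // By the time this runs, the loop has already finished and
            // `id` holds the last element of `ids`.
            seen.push({ expected: summary.uid, actual: id });
        });
    }
    return seen;
}

// Simulate the responses arriving after the loop has finished.
function flushPending() {
    while (pendingCallbacks.length > 0) {
        pendingCallbacks.shift()();
    }
}

var seen = naiveLoop(['111', '222', '333']);
flushPending();
console.log(seen); // every `actual` value is '333'
```

Each callback receives the right summary, but the id it reads from the enclosing scope is always the last one.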

Instead, my recursive approach involves popping the requested ID from the list of IDs, and passing the now smaller ID list along to the next iteration, like so:

function iterateJSON(idlist, publications) {

    // Guard: nothing to fetch if the search returned no IDs.
    if(idlist.length===0){
        console.log(publications);
        return;
    }

    var id = idlist.pop();
    $.getJSON('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id='+id+'&retmode=json', function(summary){

        // The summary for this article is keyed by its ID.
        var entry = summary.result[id];
        var citation = "";

        // Append each author's name, followed by a comma.
        for(var i=0; i<entry.authors.length; i++){
            citation+=entry.authors[i].name+', ';
        }
        citation+=' "'+entry.title+'" <i>'+entry.fulljournalname+'</i> '+entry.volume+'.'+entry.issue+' ('+entry.pubdate+'): '+entry.pages+'.';

        console.log(citation);
        publications.push(citation);

        // Recurse on the remaining IDs, or finish once the list is empty.
        if(idlist.length!==0){
            iterateJSON(idlist, publications);
        }else{
            console.log(publications);
        }

    });
}

This function builds citation strings in a roughly MLA-style format and collects them in the publications array, which can then be used on the front end of my website, using something like AngularJS. If the list of IDs is not empty, the function calls itself recursively with the reduced ID list. Once the list is empty, the function uses the data and terminates, in this case just printing it to the console for testing purposes.
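
One possible refinement is to pull the string building out into its own function, so it can be tried without making any requests. The sketch below assumes an entry object with the same fields the code above reads (authors, title, fulljournalname, volume, issue, pubdate, pages). formatCitation is a hypothetical name, and the sample entry is entirely made up for illustration.

```javascript
// formatCitation is a hypothetical helper that turns one summary entry
// into a citation string. Joining the names avoids a trailing comma
// after the last author.
function formatCitation(entry) {
    var names = [];
    for (var i = 0; i < entry.authors.length; i++) {
        names.push(entry.authors[i].name);
    }
    return names.join(', ') + '. "' + entry.title + '" <i>' +
        entry.fulljournalname + '</i> ' + entry.volume + '.' + entry.issue +
        ' (' + entry.pubdate + '): ' + entry.pages + '.';
}

// Made-up entry for illustration only; not a real publication.
var sampleEntry = {
    authors: [{ name: 'Doe J' }, { name: 'Roe R' }],
    title: 'An Example Title.',
    fulljournalname: 'Journal of Examples',
    volume: '12',
    issue: '3',
    pubdate: '2014 Jan',
    pages: '45-67'
};

console.log(formatCitation(sampleEntry));
// Doe J, Roe R. "An Example Title." <i>Journal of Examples</i> 12.3 (2014 Jan): 45-67.
```

Inside the getJSON callback, this could be called as formatCitation(summary.result[id]) in place of the inline string concatenation.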