Looking for information about meta-analysis in R (subject for an upcoming post as it has become a popular practice to analyze data from different Genome Wide Association studies) I came across this tutorial from The R User Conference 2013 – I couldn´t make it this time, even when it was held so close, maybe Los Angeles next year…
Back to the topic at hand, that is how I found out about the RISmed package which is meant to retrieve information from PubMed. It looked really interesting because, as you may imagine,this is one of the most used resources in my daily routine.
Its use is quite straightforward. First, you define the query and download data from the database (be careful about your IP being blocked from accessing NCBI in the case of large jobs!) . Then, you might use the information to look for trends on a topic of interest or extracting specific information from abstracts, getting descriptives,…
In order to try it out, I decided to get data regarding what has been published relating to Next Generation Sequencing. For doing so, I adopted the search terms proposed in the paper by Jia et al. Through the following code we can get the PubMed results for these search terms since 1980:
query = "(exome OR whole OR deep OR high-throughput OR (next AND generation) OR (massively AND parallel)) AND sequencing"
ngs_search <- EUtilsSummary(query, type="esearch",db = "pubmed",mindate=1980, maxdate=2013, retmax=30000)
ngs_records <- EUtilsGet(ngs_search)
years <- Year(ngs_records)
ngs_pubs_count <- as.data.frame(table(years))
This code allow us to get published papers on this topic per year. By getting also data about the total number of publications per year, we are able to normalize the data. The complete R code, once the data are downloaded and edited can be found at FreshBiostats GitHub Gist. In the next graph, we can see the publication trend for Next Generation Sequencing per year:
I was also curious about which ones would be the journals with the highest number of publications on this topic. Using the following code we can get the count of NGS publications per journal:
journal <- MedlineTA(ngs_records)
ngs_journal_count <- as.data.frame(table(journal))
ngs_journal_count_top25 <- ngs_journal_count[order(-ngs_journal_count[,2]),][1:25,]
Again, the complete code that allows us to normalize the data by the total number of publications per journal, as well as the following barplots showing the result, is available at our Gist
You cand find some other examples using this package at Dave Tangs Bioinformatics blog. Additionally, some alternatives to the use of RISmed package can be found at R Chronicle and R Psychologist blogs.
Other potential applications of this package include creating a co-author network, as is described in Matthew Maenner´s blog.
Search and analyze carefully!