Text mining with Julie Glanville

The two advanced searching workshops went brilliantly, with lots of group discussion and questions from the audience. The only downside to the day was the too-cold air-conditioning. Can you concentrate with icy air being blasted on you? Luckily the day was warm, so lots of us (including me) went outside during breaks to warm up. A lot of ground was covered over the day, which gave everyone plenty to think about.

The first workshop was about using text mining tools to build search strategies. I had heard of PubReMiner, a tool that mines PubMed records using words from your query to bring up MeSH terms, top authors and journals, word frequency and other data. I used this tool last week to get a list of words for a search about spiritual care I am working on. Another tool similar to PubReMiner is GoPubMed (you could use both to build strategies, as each has different visuals). MeSH on Demand, a tool provided by the National Library of Medicine, takes a piece of text, say a systematic review protocol, and mines PubMed to bring up relevant MeSH terms and PMIDs. If you have a strategy ready, you could use this tool to find out if relevant PMIDs have been captured in your results.

These tools identify single words and MeSH terms, but what if you want to identify phrases? There is a tool for that and it is called TerMine. The example Julie used was a full Medline record, but you can use larger pieces of text. Then there is EndNote, which you can use to analyse frequency data from databases other than Medline. There is a bit of work involved in setting it up to do this, though. And I guess that the records you analyse would have to be relevant ones in order to build up a useful list of terms. If you can do this to the full text in EndNote, that would be great! I will have to do some experimenting.

The last two freely available tools demonstrated were Quertle and Knalij. Knalij is a fun tool that looks like it demonstrates relationships between concepts. And Quertle, which I had forgotten about, now has relationship terms (called power terms) that you can use to connect concepts and bring up records mined from the web. And if you want to take a break from looking at heaps of text, you can take an alternate route by using a tool called VOSviewer.
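To get a feel for what the word frequency part of these tools is doing, here is a minimal Python sketch. It is not how PubReMiner or any of the other tools actually work internally; it just counts how many records each word appears in. The sample titles and the tiny stopword list are made up for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this"}


def term_frequencies(records, top_n=10):
    """Count how many records each word appears in (document frequency)."""
    counts = Counter()
    for text in records:
        # Use a set so a word repeated within one record is counted once.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical record titles, not real search results.
    sample = [
        "Spiritual care in palliative nursing",
        "Spiritual needs of patients in hospital",
        "Chaplaincy and pastoral care for patients",
    ]
    for word, n in term_frequencies(sample, top_n=5):
        print(f"{word}: {n}")
```

Words that turn up in many of your known relevant records are candidates for free-text terms in the strategy.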

So now that you have a selection of terms, how do you combine them into a search strategy? The most popular concept model is PICO, but not all questions fit into this model. What if you have questions about public health or adverse effects? What do you do when you have questions that don’t lend themselves to A AND B AND C strategies? This is where searching for multiple combinations comes into play. I did one of these recently: one combination missed a relevant article that a different combination found, and vice versa.
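One way to picture the multiple-combinations idea is to build an OR block for each concept and then AND the blocks together in pairs rather than all at once. The Python sketch below does that. The concept names, the terms and the generic AND/OR syntax are my own assumptions for illustration, not something shown in the workshop, and a real database would need its own field tags and syntax.

```python
from itertools import combinations


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def or_block(terms):
    """Join the synonyms for one concept with OR."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def combination_searches(concepts, size=2):
    """Return one AND-ed search string per combination of `size` concepts."""
    blocks = {name: or_block(terms) for name, terms in concepts.items()}
    return [
        " AND ".join(blocks[name] for name in combo)
        for combo in combinations(blocks, size)
    ]


if __name__ == "__main__":
    # Hypothetical concepts and terms.
    concepts = {
        "topic": ["spiritual care", "chaplaincy"],
        "setting": ["hospital", "hospice"],
        "population": ["nurses", "patients"],
    }
    for search in combination_searches(concepts):
        print(search)
```

Each string is run as its own search and the results are pooled, so an article missed by one combination can still be picked up by another.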

The last section of this half-day workshop was about ensuring strategy quality. A common method is to have some gold standard articles and see if your strategy captures them. You could also use the Related Items in PubMed to find relevant articles to test with. Other methods include testing with specialised sets, e.g. CENTRAL or a service called Epistemonikos (Database of Best Evidence). Another way to test your strategy is to work out the ratio of relevant records to irrelevant records. And finally, there is the Capture-Recapture method. This method, which I hadn’t heard of before, is a way of estimating size by capturing and then recapturing records.
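Two of these checks are easy to sketch in Python. The first is the gold standard test: what fraction of your known relevant articles does the strategy retrieve? The second is capture-recapture. I am assuming here the classic two-sample (Lincoln-Petersen) estimate, where the total is estimated as the first search's count times the second search's count, divided by the overlap. The exact version presented in the workshop may differ, and the record IDs below are made up.

```python
def gold_standard_recall(gold_ids, retrieved_ids):
    """Fraction of gold standard records that the strategy retrieved."""
    gold = set(gold_ids)
    if not gold:
        raise ValueError("need at least one gold standard record")
    return len(gold & set(retrieved_ids)) / len(gold)


def capture_recapture_estimate(first_ids, second_ids):
    """Lincoln-Petersen estimate of the total number of relevant records.

    Assumes the two searches are independent samples of the same pool of
    relevant records, which is rarely exactly true in practice.
    """
    first, second = set(first_ids), set(second_ids)
    overlap = len(first & second)
    if overlap == 0:
        raise ValueError("no overlap between searches; estimate is undefined")
    return len(first) * len(second) / overlap


if __name__ == "__main__":
    # Hypothetical record IDs, not real PMIDs.
    gold = [1, 2, 3, 4]
    search_a = [2, 3, 4, 9]
    search_b = [3, 4, 5, 6, 7, 8]
    print("Recall:", gold_standard_recall(gold, search_a))
    print("Estimated total:", capture_recapture_estimate(search_a, search_b))
```

A small overlap between two decent searches suggests there is a lot of relevant material that neither search is finding. A large overlap suggests you are getting close to everything.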

Lunch! Then the following post will be about workshop 2.


2 responses to “Text mining with Julie Glanville”

  1. Hi Catherine

    Thanks so much for this post, I’m doing a lot of work with researchers doing systematic reviews. Would have really benefitted from this session but a pre Easter trip from Newcastle to Melbourne was unfortunately impossible. Keen to check a few of these resources out. Looking forward to your next post.


  2. Hi Catherine
    this is great, lots of good ideas and information, really makes me wish even more I had been able to go to this! Thank you for putting it up and I look forward to the blog about workshop 2!
