Understanding PubMed User Behavior

From The Bioinformationista:


A recently published article in Database: The Journal of Biological Databases and Curation investigates the needs and behavior of PubMed users through the analysis of log data. The authors analyzed 23 million user sessions with more than 58 million user queries. [The article: Understanding PubMed user search behavior through log analysis]

For the most part, the results - and the authors have many - don't surprise me.  Users tend to use few words* in their searches.  Result sets vary considerably in size.  Searches begin and end on the first page for a vast majority of searchers.  Etc.  

But there are a few surprises (and of course it's nice to have statistical evidence to support 'intuitions'):

1.  The article's supplemental site includes a table comparing user search data from March '08 and February '09. According to the data, there are over 50 million searches each month and well over 1 million each day. Note: the authors discarded several million sessions that seemed to represent atypical PubMed usage, so the chart doesn't represent the full picture. In reality, the volume is much larger.

From the article's supplemental site. Source: Dogan RI, Murray GC, Névéol A, and Lu Z. 'Understanding PubMed user search behavior through log analysis.' Database: The Journal of Biological Databases and Curation.

2.  Author searches are by far the most frequent PubMed query.  I knew author searching was common, but I didn't realize the extent to which it predominated. 

Figure 4 in the article. Source: Dogan RI, Murray GC, Névéol A, and Lu Z. 'Understanding PubMed user search behavior through log analysis.' Database: The Journal of Biological Databases and Curation.

3. The interface design affects which abstracts are viewed in a set of results:

PubMed users are more likely to click the first and last returned citation of each result page. This suggests that rather than simply following the retrieval order of PubMed, users are influenced by the results page format when selecting returned citations.

As you can see from the chart (again, from the paper), the first result is clicked the most frequently. After that there is a steady decline until the third-to-last citation, at which point the number of clicks and abstracts viewed steadily increases.

Source: Dogan RI, Murray GC, Névéol A, and Lu Z. 'Understanding PubMed user search behavior through log analysis.' Database: The Journal of Biological Databases and Curation.

4. Five percent of PubMed searches result in no subsequent action (i.e., clicks) on the part of the searcher. The authors interpret this as a 5% abandonment rate, which would definitely be a statistical floor, considering there'd be many abandoned searches that do include user 'actions'. You search, check a few abstracts, consider them irrelevant, and move on, abandoning your search. This type of (common) scenario isn't discussed in the study and perhaps can't be captured in the log data.

5. Not necessarily a surprise, but a 'thing' of interest. The larger the result set, the more likely the user is to run another query, which corroborates my experience working with students and researchers in all areas of health. Searchers tend to be put off by large numbers of results in PubMed, unlike in Google, where result counts are ignored.
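For the curious, the abandonment figure in point 4 can be sketched in a few lines of Python. This is only an illustration: the post doesn't describe the authors' actual log format or method, so the `LogEvent` record, its `session_id` and `action` fields, and the `abandonment_rate` function are all hypothetical, and the events are assumed to arrive in chronological order. A query counts as abandoned here if no click follows it before the next query (or the end of the log) in the same session.

```python
from collections import namedtuple

# Hypothetical log record (not the authors' format): one row per user action.
# action is either "query" or "click".
LogEvent = namedtuple("LogEvent", ["session_id", "action"])


def abandonment_rate(events):
    """Return the fraction of queries with no click before the session's next query.

    Assumes events are in chronological order.
    """
    total_queries = 0
    abandoned = 0
    # session_id -> True while the session's latest query has had no click yet
    awaiting_click = {}
    for event in events:
        if event.action == "query":
            total_queries += 1
            if awaiting_click.get(event.session_id):
                abandoned += 1  # previous query in this session got no click
            awaiting_click[event.session_id] = True
        elif event.action == "click":
            awaiting_click[event.session_id] = False
    # Queries still waiting for a click when the log ends are abandoned too.
    abandoned += sum(1 for waiting in awaiting_click.values() if waiting)
    return abandoned / total_queries if total_queries else 0.0


# Made-up sample data, purely for illustration.
sample_events = [
    LogEvent("s1", "query"),
    LogEvent("s1", "click"),
    LogEvent("s1", "query"),  # no click follows
    LogEvent("s2", "query"),  # no click follows
    LogEvent("s3", "query"),
    LogEvent("s3", "click"),
    LogEvent("s3", "click"),
]
print(abandonment_rate(sample_events))
```

Note that a measure like this only catches searches with zero clicks. A search where someone opens a few abstracts and then gives up still counts as 'not abandoned', which is why the 5% reads as a floor.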

There are several other interesting observations in the article, so take a look.


* The authors use the word 'tokens' to represent individual search terms.  'Words' would, of course, be misleading since things that aren't words (like proteins, genes, acronyms, etc.) make up a large portion of searches. 

The images all come from Dogan RI, Murray GC, Névéol A, Lu Z. Understanding PubMed user search behavior through log analysis. Database: The Journal of Biological Databases and Curation (2009).


How can I make my presentations this exciting?


Early Morning Presentation, Open Access, WHO's HINARI

WHO's HINARI connects developing countries with database content. After a presentation I gave this morning (at what was to me the unknown hour of 7am), a Fellow asked me if our library resources were available to people not affiliated with UM. More specifically, people not affiliated with UM from a not-for-profit organization. More specifically, people not affiliated with UM from a not-for-profit organization from a developing country.

I mentioned that our license agreements were restrictive, and only UM affiliates could access our e-resources.  I further went on to talk about open access, and in particular PubMed Central and DOAJ as access alternatives.  I noted his contact information and promised to get back to him with more info.*

To make a(n unnecessarily) long story short, I ran the question by a colleague who's directing the library's global outreach initiatives, and she mentioned the WHO's HINARI program. This program provides developing countries, identified by GNI per capita, with access to a strong collection of biomedical literature at no to low cost. Not only does it provide access to the literature, but it also offers great training documentation.

I knew philanthropic endeavours like this existed, but this was the first I'd heard of HINARI, as far as I can remember (even though it's been around forevs). If only infrastructure weren't a problem...

* And get back to him I did. 




On the future of music...

...from Gert Jonke's Homage to Czerny:


...Not only are musical instruments bothersome crutches that have to be thrown away, but audible notes and tones also do nothing but interfere with the pure experience of music, distorting it beyond recognition!  Listen, said Schleifer to Pfeifer, hammering out a few loud chords on the keyboard, these unnecessarily jarring noises, these sounds, these so-called notes and melodies that we hear while listening to music, these don't have anything to do with the actual intent of music, and it's wrong to assume that the note, that notes themselves make up music - rather, these sounds that become audible when music is played are only an offensive acoustic excretory product that results from the performance of the content of music and in the process falsifies and destroys it! In future, Schleifer explained, all that matters will be that music can be experienced without the disruptive additional noise of these so-called notes and tones. (61)


PubMed Advanced Search

Not sure why this video was picked up by MedGadget nearly 6 months after it was created, but it's an effective overview of PubMed's Advanced Search.