OR/16/047 Analysis of responses to questions

Nkisi-Orji, I. 2016. Semantic information retrieval for geoscience resources: results and analysis of an online questionnaire of current web search experiences. British Geological Survey Internal Report, OR/16/047.

Section 1: literature or data gathering searches

Questions in this section (Q1–4) relate to web searches where one wants to find a comprehensive list of all relevant results (e.g. literature search, data gathering), so completeness of results is the most important measure of success.

Q1

Q1: For this first sort of search, which search applications are useful to you?

	Popular search engine (e.g. Google, Bing, Yahoo)
	Publication citations (e.g. Google Scholar, Science Direct)
	Cross discipline data portal (e.g. data.gov, INSPIRE geoportal, Scottish SDI)
	Earth Sciences catalogue (e.g. NERC Data Catalogue, NERC library, NORA)
	Discipline/community specific catalogue (e.g. MEDIN for marine data, ESDAC for soil data etc.)
	BGS intranet tools (dtSearch for text resources, discovery metadata)
	Other (please specify)

Results and analysis

Fifty-nine percent (59%) of respondents chose publication citations (e.g. Google Scholar and Science Direct) as the most useful application for this sort of search (see Figure 1).

**Figure 1** Search engine preferences for literature searches.

Eighty-two percent (82%) found publication citations most or often useful for literature and data gathering searches. This is expected because these are repositories for journal articles and other scholarly works. The popularity of publication citations is closely matched by popular search engines (79% most or often useful) for this sort of search. It is not unusual to begin a literature search using popular search engines since they often look in publication citations as well. For example, Google search often displays results from Google Scholar. Earth science catalogues are fairly popular, as 53% found them most or often useful.

Over 57% never used discipline-specific catalogues, and 26% of respondents did not report any search application as most useful. Wikipedia, BGS Library Catalogue and Web of Science^[1] were identified as other useful search tools.

Q2

Q2: For this first sort of search, how often are you satisfied with the results after:

	Using a small number (<5) of words in a free text search
	Using a large number (>5) words in a free text search
	Using logical operators in a free text search (AND/NOT/OR etc.)
	Using advanced search features to search within specific metadata fields (keywords, title, author etc.)

Results and analysis

Fifty percent (50%) of respondents were usually satisfied with search results of short queries (that is, when fewer than 5 words were used in the search).

Usual satisfaction with search results dropped to 44% for long queries (that is, more than 5 words), as seen in Figure 2. Often, search applications only return documents which contain most of the search terms. Hence, using fewer search terms tends to increase the number of hits. Since the search intent is to gather a comprehensive list of relevant results, more results may translate to greater satisfaction. A likely alternative explanation is that the use of long queries indicates search instances where users find it difficult to properly express their information needs. Search results will not be satisfactory if the search terms used fail to appropriately convey an information need.

Thirty-six percent (36%) were usually satisfied with results of advanced search features (that is, search within specific metadata fields); 47% had never used logical operators, and results from their use were the least satisfactory. The use of logical operators in search requires additional skills which may seem excessive to many users.

**Figure 2** Satisfaction with search results according to query length or search engine features used.
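The effect of query length on the number of hits, discussed for Q2, can be illustrated with a minimal sketch. It assumes a strict form of matching in which a document must contain every query term (real search applications are usually more lenient, returning documents that contain most of the terms). The corpus and the function name `matching_documents` are hypothetical and were not part of the questionnaire.

```python
# Toy corpus: hypothetical document titles, not taken from the questionnaire.
DOCUMENTS = [
    "shale gas exploration in northern england",
    "groundwater flooding in chalk aquifers",
    "shale gas and groundwater contamination risk",
    "chalk aquifers and borehole records",
]


def matching_documents(query, documents):
    """Return the documents that contain every term of the query (AND matching)."""
    terms = query.lower().split()
    return [doc for doc in documents if all(term in doc.split() for term in terms)]


# A short query and a longer query that adds further terms to it.
short_hits = matching_documents("shale gas", DOCUMENTS)
long_hits = matching_documents("shale gas groundwater contamination risk", DOCUMENTS)

print(len(short_hits), "hits for the short query")
print(len(long_hits), "hits for the long query")
```

Under this assumption, every document returned for the longer query is also returned for the shorter one, so adding terms can only keep the result set the same size or shrink it.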

Q3

Q3: For this first sort of search, what is the maximum number of search results you are willing to assess, rather than refining your search criteria or changing the search engine?

1–10

10–20

20–50

50+

Results and analysis

Eighty-eight percent (88%) of respondents will assess more than the first 10 search results.

As shown in Figure 3, a majority of respondents (59%) will assess 10 to 20 search results. With default settings, this corresponds to the second results page of popular search applications (e.g. Google, Bing and Yahoo). Since the search intent is to identify multiple relevant documents, there is an increased tendency to assess more than the first few results. A significant 24% will go on to assess between 20 and 50 search result entries.

**Figure 3** Number of search results assessed.

Q4

Q4: For this first sort of search, could you give a few examples of some recent searches you conducted, and any comments on the relevance of results returned?

Results and analysis

Forty-two (42) sample queries were collected, which showed a mixture of navigational and informational queries.

Highlighted sample queries in Figure 4 (e.g. #9: Garcia et al 2006 Tetrapods) are instances where the search intent seems navigational. Navigational queries are queries whose underlying information needs are specific web sites or publications which are known (or assumed) to exist. On the other hand, the intent of informational queries is to find the best documents that meet an information need. Our interest is in informational queries, which are more pertinent to information retrieval (IR) research. Query #7 (shale gas mechanics) is an interesting example, as the respondent noted that search results from a different domain affected search precision.

Most of the sample queries are short queries (<5 words), with an average query length of 2.8 words. Query #36 in Figure 4 shows an interesting search strategy where search results are limited to documents which contain exact phrases in quotes. This is an implicit use of the AND logical operator and can help to filter out irrelevant documents.

**Figure 4** Examples of search queries with comments. Highlighted entries are where specific content is being sought.
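The quoted-phrase strategy noted for Query #36 can be sketched as follows. The sketch compares plain AND matching with exact phrase matching; a phrase match requires all terms to be present (hence the implicit AND) and additionally requires them to be adjacent and in order. The example texts and the function names `contains_all_terms` and `contains_phrase` are hypothetical illustrations, not content from the questionnaire.

```python
def contains_all_terms(text, query):
    """AND matching: every query term occurs somewhere in the text."""
    words = text.lower().split()
    return all(term in words for term in query.lower().split())


def contains_phrase(text, phrase):
    """Phrase matching: the query terms occur adjacent and in the given order."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


# Hypothetical document texts used only for illustration.
doc_a = "carbon capture and storage in saline aquifers"
doc_b = "storage of carbon samples after capture"

for doc in (doc_a, doc_b):
    print(doc)
    print("  AND match:   ", contains_all_terms(doc, "carbon capture"))
    print("  phrase match:", contains_phrase(doc, "carbon capture"))
```

Both texts satisfy the AND query, but only the first contains the exact phrase, which is how quoting can filter out documents that merely mention the same words.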

Section 2: searches that ask a specific question

Questions in this section (Q5–8) relate to web searches where one is looking for the answer to a specific question, i.e. the relevance ranking of the results is the most important measure of success.

Q5

Q5: For this second sort of search, which search applications are useful to you?

	Popular search engine (e.g. Google, Bing, Yahoo)
	Publication citations (e.g. Google Scholar, Science Direct)
	Cross discipline data portal (e.g. data.gov, INSPIRE geoportal, Scottish SDI)
	Earth Sciences catalogue (e.g. NERC Data Catalogue, NERC library, NORA)
	Discipline/community specific catalogue (e.g. MEDIN for marine data, ESDAC for soil data etc.)
	BGS intranet tools (dtSearch for text resources, discovery metadata)
	Other (please specify)

Results and analysis

Seventy-nine percent (79%) of respondents chose popular search engines (e.g. Google, Bing and Yahoo) as their most useful application for this sort of search, as shown in Figure 5.

Ninety-five percent (95%) said that popular search engines were most or often useful for this sort of search. The next most popular search applications are publication citations (e.g. Google Scholar), which are most useful to about 33% of respondents. Earth Sciences catalogues and BGS intranet tools were occasionally useful to a significant proportion of respondents (47% and 35% respectively). Cross discipline data portals (e.g. data.gov) and discipline/community specific catalogues (e.g. MEDIN for marine data) were never used by a majority of respondents. Wikipedia, GSW, GDI and COPAC are other useful search applications that were identified by respondents.

**Figure 5** Search engine preferences to ask specific questions.

Q6

Q6: For this second sort of search, how often are you satisfied with the results after:

	Using a small number (<5) of words in a free text search
	Using a large number (>5) words in a free text search
	Using logical operators in a free text search (AND/NOT/OR etc.)
	Using advanced search features to search within specific metadata fields (keywords, title, author etc.)

Results and analysis

Sixty-one percent (61%) of respondents were usually satisfied with search results when more than 5 words were used in the search query.

As seen in Figure 6, there was no clear distinction in search result satisfaction between short queries (<5 words) and long queries (>5 words). Sixty-one percent (61%) were usually satisfied with results of long queries, which is slightly higher than the 58% who were usually satisfied with results of short queries. However, the proportion of respondents who were sometimes satisfied was higher for short queries. Similar to the previous type of search (see Section 1, Q2), the use of logical operators produced the least satisfaction among respondents who use them.

**Figure 6** Satisfaction with search results according to query length or search engine features used.

Q7

Q7: For this second sort of search, what is the maximum number of search results you are willing to assess, rather than refining your search criteria or changing the search engine?

1–10

10–20

20–50

50+

Results and analysis

**Figure 7** Number of search results assessed.

Figure 7 shows that 50% of respondents assess the first 10 search results only. Eighty percent (80%) will not assess more than 20 search results.

Fifty percent (50%) will not assess more than 10 search result entries before attempting other search strategies in order to obtain better results. Thirty percent (30%) assess 10 to 20 search results while 20% assess more than 20 results. Compared to the previous sort of search (see Section 1, Q3), the number of search results which respondents assess is lower for this sort of search. Respondents are either unwilling to assess many search results or have no need to assess many search results. The latter is a possible explanation since a relevant entry among the first few search results can make assessing additional results unnecessary.

Q8

Q8: For this second sort of search, could you give examples of some recent searches you conducted, and any comments on the relevance of results returned?

Results and analysis

Fourteen sample queries were collected with associated comments.

Query #11 (deposit) in Figure 8 is an example where the search term has popular alternative senses. When using popular search engines (e.g. Google), results that describe the financial sense of ‘deposit’ are expected to be prominent in search results, thereby reducing search precision (if the geological sense was intended). Such interference is expected to be less pronounced when searching in domain-specific document collections. The average query length in collected examples is 3.5 words per query.

**Figure 8** Examples of recent searches to ask specific questions.

Section 3: semantic search features

Questions in this section assess respondents’ reception of semantic search features and how they want such features to be implemented.

Q9

Q9: How often do you have to perform multiple searches or construct an advanced search query in order to also search all the narrower/child terms of your original search intent?

always

usually

sometimes

seldom

never

Results and analysis

Although no one always performed searches to include narrower/child terms of their search intent, 64% of respondents usually or sometimes used this search strategy.

Narrower/child terms are more specific terms used to describe a concept. For example, when the search intent is the ‘Longmyndian Supergroup’, one may also include in the search terms the narrower ‘Wentnor Group’, which is a more specific rock unit. This way, relevant documents which did not mention ‘Longmyndian Supergroup’ are also discovered. Twenty-seven percent (27%) of respondents sometimes include narrower terms of their search intent and 36% usually do this. Fourteen percent (14%) have never attempted to include narrower terms of their original search intent.

**Figure 9** Tendency to perform multiple searches or use advanced search features to include narrower terms to original search intent.
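The narrower-term strategy described for Q9 can be sketched as a simple query expansion step. The sketch below assumes a small hand-written mapping from a term to its narrower terms, containing only the Longmyndian Supergroup example from the text; the names `NARROWER`, `expand_with_narrower` and `to_or_query` are hypothetical, and a real implementation would read the hierarchy from a controlled vocabulary rather than a hard-coded dictionary.

```python
# Hypothetical narrower-term mapping holding only the example from the text.
NARROWER = {
    "Longmyndian Supergroup": ["Wentnor Group"],
}


def expand_with_narrower(term, narrower=NARROWER):
    """Return the term followed by all of its narrower terms, at any depth."""
    expanded = [term]
    seen = {term}
    queue = [term]
    while queue:
        current = queue.pop(0)
        for child in narrower.get(current, []):
            if child not in seen:  # guard against cycles and duplicates
                seen.add(child)
                expanded.append(child)
                queue.append(child)
    return expanded


def to_or_query(terms):
    """Join quoted terms with OR so that a match on any one of them counts."""
    return " OR ".join(f'"{term}"' for term in terms)


print(to_or_query(expand_with_narrower("Longmyndian Supergroup")))
```

A single expanded query of this kind stands in for the multiple manual searches that respondents reported performing.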

Q10

Q10: How often do you have to perform multiple searches or construct an advanced search query in order to include all the equivalent terms or alternative spellings of your original search intent?

always

usually

sometimes

seldom

never

Results and analysis

**Figure 10** Tendency to perform multiple searches or use advanced search features to include equivalent terms to original search intent.

Sixty-four percent (64%) of respondents usually or sometimes include equivalent terms in their searches.

While no respondent always includes equivalent terms or alternative spellings of their original search intent, only 9% have never attempted this search strategy (see Figure 10). Although summary responses here are very similar to those of Q9 (that is, the tendency to include narrower or child terms of the original search intent), as many as 55% of respondents answered Q9 and Q10 differently. This reflects differences in how respondents go about meeting their information needs and a need to separate the features that can assist in each search strategy.

Q11

Q11: If a search feature was available that could include the narrower and equivalent terms from controlled vocabularies, would you prefer that this functionality was:

	always included implicitly
	included by default but can be turned off by the user
	not included by default but can be turned on by the user
	not included at all, not of benefit to me

Results and analysis

Ninety-five percent (95%) of respondents think that a search feature to include narrower or equivalent terms from controlled vocabularies in their original search intent is beneficial.

Ninety percent (90%) of those who want narrower or equivalent terms included from controlled vocabularies prefer to have control over the feature's use (that is, the ability to turn it on or off). The other 10% want such a feature included implicitly. Forty-eight percent (48%) of respondents prefer that such a feature be included by default with the ability to turn it off, while 38% do not want it turned on by default. Only 5% of respondents think that such a feature is not of benefit to them. Considering that a significant 43% either do not want this feature as a default search option or do not deem it beneficial, it may be most appropriate to include it as an optional search feature which a user can turn on.

**Figure 11** How a search feature to add narrower or equivalent terms to search intent should be included.
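The percentages quoted for Q11 relate to each other through simple arithmetic, which the following sketch reproduces. It works only from the rounded percentages reported in the text, so the derived values are approximate; the share choosing "always included implicitly" is inferred here by subtraction and is an assumption rather than a reported figure. All variable names are illustrative.

```python
# Rounded percentages of all respondents, as reported in the text.
default_on = 48       # included by default, can be turned off
default_off = 38      # not included by default, can be turned on
not_beneficial = 5    # not included at all

# Respondents who consider the feature beneficial.
beneficial = 100 - not_beneficial

# Assumption: the remainder chose "always included implicitly" (not reported directly).
implicit = beneficial - default_on - default_off

# Respondents who want the feature but with user control over it.
want_control = default_on + default_off
control_share = 100 * want_control / beneficial

# Respondents who do not want it on by default, or see no benefit.
off_or_no_benefit = default_off + not_beneficial

print(f"beneficial: {beneficial}%")
print(f"want control, as a share of those who find it beneficial: {control_share:.1f}%")
print(f"not on by default or not beneficial: {off_or_no_benefit}%")
```

The control share computed from rounded inputs comes out slightly above the 90% stated in the text, which is consistent with rounding in the reported percentages.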

Q12

Q12: How often do you find that your search results are dominated by results that are not relevant?

always

usually

sometimes

seldom

never

Results and analysis

All respondents have, at some point, found their search results dominated by irrelevant result entries.

**Figure 12** How often search results are dominated by results that are not relevant.

Seventy-seven percent (77%) of respondents always, usually or sometimes find their search results dominated by irrelevant content. Figure 12 shows that although this does not always happen for most respondents, only 23% said it happened seldom. Hence, a feature that can filter out irrelevant search results would be beneficial.

Q13

Q13: If a search function was available that could search on the intended context/meaning of the search term entered, rather than just matching the term as typed, would you prefer to

	always specify the context/meaning of your search terms as you build the search (e.g. pick them from a controlled vocabulary)
	specify the context/meaning of your search terms only if there is ambiguity (e.g. pick the correct definition from a list)
	let the search engine decide which context/meaning to use, depending on my previous actions or preferences
	not have this feature, not of benefit to me

Results and analysis

Eighty-one percent (81%) of respondents want to be able to select the intended context or meaning of search terms only when there is ambiguity.

Responses to this question indicate strong support for a feature that allows users to select from a list of competing definitions whenever there is ambiguity in search terms. As shown in Figure 13, about 10% want the intended context/meaning of search terms to be decided by the search engine. Only 5% think that a search function to resolve ambiguity in search terms is of no benefit to them.

**Figure 13** How a search feature to disambiguate search terms should be included.
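The option preferred by most respondents in Q13, choosing a meaning only when a term is ambiguous, can be sketched as follows. The sketch assumes a hypothetical lookup table of senses per term, using the ‘deposit’ example from Q8; the sense descriptions, the extra term ‘borehole’ and the function name `senses_needing_choice` are illustrative assumptions, and a real implementation would draw the senses from controlled vocabularies.

```python
# Hypothetical sense inventory; the descriptions are illustrative only.
SENSES = {
    "deposit": [
        "geological: a natural accumulation of minerals or sediment",
        "financial: money placed in an account",
    ],
    "borehole": [
        "geological: a narrow shaft drilled into the ground",
    ],
}


def senses_needing_choice(query, senses=SENSES):
    """Return {term: candidate senses} for the ambiguous query terms only."""
    ambiguous = {}
    for term in query.lower().split():
        candidates = senses.get(term, [])
        if len(candidates) > 1:  # prompt the user only when senses compete
            ambiguous[term] = candidates
    return ambiguous


for term, candidates in senses_needing_choice("borehole deposit").items():
    print(f"Which meaning of '{term}' did you intend?")
    for number, sense in enumerate(candidates, start=1):
        print(f"  {number}. {sense}")
```

Terms with a single known sense, or with no entry at all, pass through without interrupting the user, so the prompt appears only where there is genuine ambiguity.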

Q14

Q14: Which vocabularies would be useful to you in the sort of semantic search functionality described above?

Results and analysis

A list of geoscience-related vocabularies was presented to respondents so that they could select as many as they thought useful. As shown in Figure 14, about 78% of respondents selected the Geoscience thesaurus as a useful vocabulary for implementing semantic search features. This thesaurus describes general geoscience-related concepts, so it is not surprising that it was thought useful by most respondents. This is unlike more specialised vocabularies such as Chemical analytes (selected by 5.6%) and Fossil taxonomy (selected by 11.1%), which may not be relevant to most respondents.

The usefulness distribution of selected vocabularies can provide an indication of which vocabularies to prioritise if a sub-selection is to be used for implementing semantic search features. In an added comment, one respondent pointed out that a semantic search feature which requires users to know the content of the vocabularies being used would be too complicated for users.

**Figure 14** Preference of vocabularies to implement semantic search.

Q15

Q15: Might you be willing to volunteer 1 hour of your time to help evaluate a search tool which implements features like the above?

Results and analysis

Eight (8) respondents indicated willingness to volunteer to help in semantic search tool evaluation.

Q16

Q16: Please provide any other relevant comments such as current search challenges, features you value in a search engine (existing or desired), preferred search engines not mentioned in questionnaire, etc.

Results and analysis

Four comments were received from respondents. They pointed out the importance of reusing existing tools when implementing a search feature and the need to allow for different search strategies.

References

[1] Web of Science. [cited 20 September 2016]. Available from wok.mimas.ac.uk
