Complex Solr search
SureChEMBL uses the Solr Standard Query Parser internally. Therefore, if you are comfortable writing a complex query with the Solr syntax, you may do so using all available Solr Query field names.
Note: See Solr query field names and examples for a complete list of fields.
Solr is a high-performance, full-featured text search platform built on the Apache Lucene library. Written in Java, it is designed for applications that require full-text search. SureChEMBL relies on Solr because of the extensive full text found in patent literature.
All queries are built using some basic components: Terms, Phrases, Fields, and Operators. The exact method for combining the components may vary from one query language to the next. Because SureChEMBL is based on the Solr query parser, the following details will assist you in building complex queries in SureChEMBL.
There are two types of terms: words and phrases. A “word” is a continuous string of characters without any spaces, such as "gleevec" or "kinase".
A phrase is a group of words treated as an individual unit such as "kinase inhibitor". For Solr to locate the phrase exactly as it is written, the phrase must be surrounded by double quotes.
Solr supports field-specific data. A field is a holder for a particular kind of data, for example patent numbers or abstracts. When performing a search you may specify a field. In some parts of SureChEMBL the field names are provided by the application itself. In the query field you may choose to provide the specific fields. You can search a field by typing the field name followed by a colon ":" and then the term you are looking for. See Solr query field names and examples for more details on the Solr field names and their definition.
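The `field:term` pattern and the phrase-quoting rule described above can be sketched in a few lines of Python. This is only an illustration: the helper names `quote_phrase` and `field_query` are hypothetical and are not part of SureChEMBL or Solr. The field names `pa` and `clm` are taken from the examples on this page.

```python
def quote_phrase(text: str) -> str:
    """Wrap multi-word text in double quotes so it is searched as a phrase."""
    text = text.strip()
    already_quoted = text.startswith('"') and text.endswith('"')
    if any(ch.isspace() for ch in text) and not already_quoted:
        return f'"{text}"'
    return text


def field_query(field: str, value: str) -> str:
    """Build a field:value clause, quoting the value when it is a phrase."""
    return f"{field}:{quote_phrase(value)}"


print(field_query("pa", "novartis"))           # pa:novartis
print(field_query("clm", "kinase inhibitor"))  # clm:"kinase inhibitor"
```

The resulting strings can be pasted into the query field as they are.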
Boolean operators are used to combine separate queries into a single complex query. SureChEMBL uses three Boolean operators: AND, OR, and NOT.
Note: The Boolean operators must ALWAYS be written in uppercase.
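Because a lowercase operator is easy to type by accident, it can help to normalize a query before submitting it. The following Python sketch uppercases `and`, `or`, and `not` outside double-quoted phrases. The function name `uppercase_operators` is hypothetical, and the sketch assumes that any unquoted occurrence of these three words is meant as an operator rather than as a search term.

```python
import re

# Matches and/or/not as whole words, in any letter case.
_OPERATOR = re.compile(r"\b(and|or|not)\b", re.IGNORECASE)
# Matches double-quoted phrases so the words inside them are left untouched.
_PHRASE = re.compile(r'"[^"]*"')


def _upper(segment: str) -> str:
    """Uppercase every operator word found in an unquoted segment."""
    return _OPERATOR.sub(lambda m: m.group(0).upper(), segment)


def uppercase_operators(query: str) -> str:
    """Uppercase Boolean operators everywhere except inside quoted phrases."""
    parts = []
    last = 0
    for phrase in _PHRASE.finditer(query):
        parts.append(_upper(query[last:phrase.start()]))
        parts.append(phrase.group(0))  # keep the phrase exactly as written
        last = phrase.end()
    parts.append(_upper(query[last:]))
    return "".join(parts)


print(uppercase_operators('clm:"salt and pepper" and pa:pfizer'))
# clm:"salt and pepper" AND pa:pfizer
```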
Examples using the Assignee(s) or Applicant(s) field (pa):
pa:ASTRAZENECA AND ic:A61K0031497
Only those patents containing the Applicant value of "ASTRAZENECA" and also containing the IPCR of "A61K 31/497" appear in the results.
pa:ASTRAZENECA OR ic:A61K0031497
All patents containing the Applicant value of "ASTRAZENECA" appear in the results, regardless of their IPCR. Likewise, all patents containing the IPCR of "A61K 31/497" appear in the results, regardless of the applicant.
pa:ASTRAZENECA NOT ic:A61K0031497
Only those patents containing the Applicant value of "ASTRAZENECA" where the IPCR of "A61K 31/497" does not occur appear in the results.
pa:novartis AND pd:[20101201 TO 20120812] AND ic:(C07D040910 OR A61K OR A61P)
This search returns all Novartis grants and applications published between 1 December 2010 and 12 August 2012 (inclusive) and classified under IPCR C07D 409/10, A61K, or A61P.
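The `pd` range in the query above uses dates written as `YYYYMMDD`. As a minimal sketch, a range clause in that form could be generated from Python `date` objects as follows. The helper name `pd_range` is hypothetical; only the `pd:[start TO end]` pattern comes from the example.

```python
from datetime import date


def pd_range(start: date, end: date) -> str:
    """Format a publication-date range clause as pd:[YYYYMMDD TO YYYYMMDD]."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"pd:[{start:%Y%m%d} TO {end:%Y%m%d}]"


print(pd_range(date(2010, 12, 1), date(2012, 8, 12)))
# pd:[20101201 TO 20120812]
```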
(ic:A61K OR ic:A61P) AND pdyear:2011 AND clm:"kinase inhibitor"
This search returns patents classified under IPCR A61K or A61P, published in 2011, whose claims contain the phrase "kinase inhibitor".
pd:[20120101 TO 20120331] pa:(pfizer OR Merck OR Astrazeneca)
When the example above is combined with US Granted, selected in the Document Sources section, it returns US granted patents, published in the first quarter of 2012, issued to Pfizer or Merck or AstraZeneca.
pa:(bayer OR astra OR novartis OR Genentech OR merck) AND desc:(chemotherap* AND ("Phosphoinositide 3-kinases"~3 OR Pi3K))
This search returns publications whose applicant/assignee matches bayer, astra, novartis, Genentech, or merck, and whose description contains a word beginning with "chemotherap" (for example, chemotherapy or chemotherapies) together with either the words Phosphoinositide and 3-kinases within three words of each other or the term Pi3K.
The default Boolean operator for SureChEMBL is AND. If you enter two words into a query field without quotes or any operator, the application returns all documents containing both terms. The terms will not necessarily appear immediately adjacent to one another.
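To see what the default operator does to a query, the implied AND can be written out explicitly. The Python sketch below splits a query into its top-level clauses (ignoring spaces inside quotes, parentheses, and square brackets) and inserts AND wherever two clauses sit side by side with no operator between them. The function names are hypothetical, and the sketch only handles the top level of the query; it does not rewrite the contents of nested groups.

```python
OPERATORS = {"AND", "OR", "NOT"}


def top_level_clauses(query: str) -> list[str]:
    """Split on whitespace that lies outside quotes, (), and []."""
    clauses, current = [], []
    depth, in_quotes = 0, False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch in "([":
            depth += 1
        elif not in_quotes and ch in ")]":
            depth -= 1
        if ch.isspace() and depth == 0 and not in_quotes:
            if current:  # a top-level space ends the current clause
                clauses.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        clauses.append("".join(current))
    return clauses


def make_and_explicit(query: str) -> str:
    """Insert AND between adjacent clauses that have no operator between them."""
    out = []
    for clause in top_level_clauses(query):
        if out and clause not in OPERATORS and out[-1] not in OPERATORS:
            out.append("AND")
        out.append(clause)
    return " ".join(out)


print(make_and_explicit("pd:[20120101 TO 20120331] pa:(pfizer OR Merck OR Astrazeneca)"))
# pd:[20120101 TO 20120331] AND pa:(pfizer OR Merck OR Astrazeneca)
```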
Operator | Definition |
---|---|
AND | Intersection: For an individual document to be included in the results set, it must contain both of the individual query elements. |
OR | Union: For an individual document to be included in the results set, it only has to contain one of the individual query elements. |
NOT | Difference: For an individual document to be included in the results set, it must contain the first listed individual query element and must not contain the second. |
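The intersection, union, and difference definitions map directly onto set operations. The Python sketch below illustrates this with two small sets of document IDs. The IDs are invented purely for illustration and do not correspond to real patents or to actual SureChEMBL results.

```python
# Hypothetical document IDs, invented for illustration only.
applicant_matches = {"DOC-1", "DOC-2", "DOC-3"}  # e.g. documents matching pa:ASTRAZENECA
ipcr_matches = {"DOC-2", "DOC-3", "DOC-4"}       # e.g. documents matching ic:A61K0031497

both = applicant_matches & ipcr_matches        # AND: intersection
either = applicant_matches | ipcr_matches      # OR: union
first_only = applicant_matches - ipcr_matches  # NOT: difference

print(sorted(both))        # ['DOC-2', 'DOC-3']
print(sorted(either))      # ['DOC-1', 'DOC-2', 'DOC-3', 'DOC-4']
print(sorted(first_only))  # ['DOC-1']
```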