REFERENCE.tsv

The REFERENCE file describes the provenance of the deposition, including the title, authors and abstract. It can describe a published reference, a paper that has been submitted but not yet published, or a dataset that is not associated with any publication.

The file should contain sufficient details for a user to locate your publication or project by reading the reference. For example, for data associated with a project, the ABSTRACT field could contain a short description of the dataset and a link to the project site.

  • RIDX, TITLE, YEAR, ABSTRACT, AUTHORS and REF_TYPE are mandatory; all other fields are optional unless the table below states otherwise.

  • For a deposited dataset, YEAR should be the year we received the file.

  • Either a PMID or a DOI is mandatory. If your data do not have one, please contact us before submission. We can provide a ChEMBL DOI for datasets until an external DOI has been generated.

  • We strongly recommend listing up to 3 stable points of contact in the CONTACT field.
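The rules in the list above can be checked before submission. The following is a minimal sketch, not an official validator: it assumes each REFERENCE row has already been read into a dictionary keyed by column name, and the function name `check_mandatory` and the example values (including the DOI) are hypothetical.

```python
# Fields the list above names as mandatory.
MANDATORY_FIELDS = ("RIDX", "TITLE", "YEAR", "ABSTRACT", "AUTHORS", "REF_TYPE")


def check_mandatory(row):
    """Return a list of problems found in one REFERENCE row (a dict)."""
    problems = []
    for field in MANDATORY_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"{field} is mandatory but empty")
    # Either a PMID or a DOI must be present.
    has_pmid = bool((row.get("PUBMED_ID") or "").strip())
    has_doi = bool((row.get("DOI") or "").strip())
    if not (has_pmid or has_doi):
        problems.append("either PUBMED_ID or DOI must be provided")
    return problems


# Hypothetical dataset reference; every value here is a placeholder.
example_row = {
    "RIDX": "Example_Screen_2024",
    "TITLE": "Hypothetical screening dataset",
    "YEAR": "2024",
    "ABSTRACT": "Short description of the dataset and a link to the project site.",
    "AUTHORS": "Example Organisation",
    "REF_TYPE": "Dataset",
    "DOI": "10.1234/example",
}

print(check_mandatory(example_row))
```

An empty list means none of these particular rules was violated; it does not guarantee that the file will be accepted.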

The TITLE, ABSTRACT and AUTHORS fields of the REFERENCE file should be populated for datasets as well as publications. These can be brief (e.g. an organisation name can be provided for the AUTHORS field), and the abstract can be a simple summary of the experiments and their overall purpose.

Guidance on the use of the CONTACT field

The CONTACT field should contain a link to the profile of someone willing to be contacted about details of the dataset: ideally an ORCID ID or ResearcherID, or alternatively a LinkedIn or Academia.edu profile. Up to 3 contacts can be provided.

The contact does not have to be the corresponding author, but should be a stable point of contact at the main institution(s) involved. To simplify data protection for everyone, it must be a link to a profile rather than a direct personal contact detail such as an email address. For the format of an ORCID identifier, see https://support.orcid.org/hc/en-us/articles/360006897674-Structure-of-the-ORCID-Identifier
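As an illustration of the guidance above, the sketch below checks a single contact entry: it rejects anything that looks like an email address and, for ORCID links, verifies the final check character using the ISO 7064 MOD 11-2 scheme that ORCID documents. The function names are hypothetical, the accepted URL form (`https://orcid.org/...`) is an assumption, and how multiple contacts are separated within the field is not specified here, so the sketch handles one entry at a time.

```python
import re

# Assumed form of an ORCID profile link: https://orcid.org/ followed by 16 characters.
ORCID_URL = re.compile(r"^https://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_checksum_ok(orcid):
    """Verify the last character of an ORCID identifier (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "")
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


def check_contact(contact):
    """Return None if the entry looks acceptable, otherwise a short message."""
    contact = contact.strip()
    is_link = contact.lower().startswith(("http://", "https://"))
    if "@" in contact and not is_link:
        return "looks like an email address; use a profile link instead"
    match = ORCID_URL.match(contact)
    if match:
        if orcid_checksum_ok(match.group(1)):
            return None
        return "ORCID checksum does not match"
    if is_link:
        return None  # some other profile link, e.g. LinkedIn or Academia.edu
    return "expected a link to a profile"


# 0000-0002-1825-0097 is the sample identifier used in ORCID's own documentation.
print(check_contact("https://orcid.org/0000-0002-1825-0097"))
print(check_contact("someone@example.org"))
```

The checksum only catches mistyped identifiers; it cannot confirm that the profile exists or belongs to the intended person.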

| Header | Description | Existence | Data Type |
| --- | --- | --- | --- |
| RIDX | The RIDX set by the depositor - a primary key | Mandatory | Any character up to a length of 200. Should not start with 0. Should be a meaningful unique identifier for each Reference, not just a number. |
| PUBMED_ID | PubMed ID | Mandatory (if the document has a PMID) | Any positive integer up to a length of 11. |
| JOURNAL_NAME | Journal name | Optional | Any character up to a length of 50. Use the standard NIH NLM Catalog abbreviated name of the journal. |
| YEAR | Year of publication | Mandatory | Any integer up to a length of 4 between 1900 and 2050 |
| VOLUME | The volume of the publication | Optional | Any character up to a length of 50 |
| ISSUE | The issue of the publication | Optional | Any character up to a length of 50 |
| FIRST_PAGE | The first page of the article | Mandatory if it is a Publication | Any positive integer up to a length of 50 |
| LAST_PAGE | The last page of the article | Optional | Any positive integer up to a length of 50 |
| REF_TYPE | The type of reference (Publication, Patent, Dataset, Book) | Mandatory | One of Patent, Publication, Dataset or Book |
| TITLE | The title of the reference | Mandatory | Any character up to a length of 500 |
| DOI | The Digital Object Identifier | Mandatory - Must be present, may be empty if there is a PUBMED_ID | Any DOI up to a length of 200 |
| PATENT_ID | The Patent Identifier | Optional | Any Patent Identifier up to a length of 200 |
| ABSTRACT | The abstract of the article. For a dataset, include a description of the dataset here | Mandatory | A very large text field |
| AUTHORS | A list of the authors of the publication | Mandatory | Any character up to a length of 4000 |
| CONTACT | A contact profile of someone willing to be contacted about details of the dataset (see the guidance above) | Recommended | Any character up to a length of 200 |
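The length and type constraints listed for each field above lend themselves to an automated check. The sketch below is one possible approach rather than the validation actually applied on submission: it assumes the file is tab-separated with a header row of field names, and `check_types` and the sample row are hypothetical.

```python
import csv
import io

# Maximum lengths taken from the Data Type descriptions above.
MAX_LENGTH = {
    "RIDX": 200, "PUBMED_ID": 11, "JOURNAL_NAME": 50, "YEAR": 4,
    "VOLUME": 50, "ISSUE": 50, "FIRST_PAGE": 50, "LAST_PAGE": 50,
    "TITLE": 500, "DOI": 200, "PATENT_ID": 200, "AUTHORS": 4000,
    "CONTACT": 200,
}
POSITIVE_INTEGER_FIELDS = ("PUBMED_ID", "FIRST_PAGE", "LAST_PAGE")
REF_TYPES = ("Publication", "Patent", "Dataset", "Book")


def is_integer(text):
    """True if text consists only of ASCII digits."""
    return text.isascii() and text.isdigit()


def check_types(row):
    """Return a list of data type problems for one REFERENCE row (a dict)."""
    problems = []
    for field, limit in MAX_LENGTH.items():
        if len(row.get(field) or "") > limit:
            problems.append(f"{field} is longer than {limit} characters")
    for field in POSITIVE_INTEGER_FIELDS:
        value = row.get(field) or ""
        if value and not (is_integer(value) and int(value) > 0):
            problems.append(f"{field} must be a positive integer")
    year = row.get("YEAR") or ""
    if not (is_integer(year) and 1900 <= int(year) <= 2050):
        problems.append("YEAR must be an integer between 1900 and 2050")
    if (row.get("RIDX") or "").startswith("0"):
        problems.append("RIDX should not start with 0")
    if row.get("REF_TYPE") not in REF_TYPES:
        problems.append("REF_TYPE must be one of " + ", ".join(REF_TYPES))
    return problems


# Hypothetical file content with only a few columns, for illustration.
header = ["RIDX", "REF_TYPE", "YEAR", "TITLE", "FIRST_PAGE"]
values = ["Example_Screen_2024", "Dataset", "2024", "Hypothetical screening dataset", ""]
tsv_text = "\t".join(header) + "\n" + "\t".join(values) + "\n"

rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
for row in rows:
    print(row["RIDX"], check_types(row))
```

To check a real file, the same `csv.DictReader` call can be given an open file handle instead of the in-memory string.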
