Constraints on input files

Rules

The 'rules' referred to are summarized in table form at the end of this page.
  • Rules are applied when validating the file.
  • Each rule is associated with a single 'Penalty Score' (PS) value, which can range from 0 to 9 inclusive.
  • The higher the score, the more serious the problem. Scores of 9 cause an automatic load failure.
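As an illustration of the scoring rules above, here is a minimal Python sketch. The page does not say how individual scores are combined, so the sketch assumes the worst score decides the outcome. The names `RuleViolation`, `summarise` and `LOAD_FAILURE_SCORE` are hypothetical and not part of the loader.

```python
from dataclasses import dataclass

# Per the rules above, a score of 9 causes an automatic load failure.
LOAD_FAILURE_SCORE = 9


@dataclass
class RuleViolation:
    field: str   # header of the offending column, e.g. "AIDX"
    rule: str    # hypothetical label such as "existence" or "pattern"
    score: int   # Penalty Score, 0 to 9 inclusive


def summarise(violations):
    """Return (worst_score, load_fails) for a list of RuleViolation objects."""
    for v in violations:
        if not 0 <= v.score <= 9:
            raise ValueError(f"PS out of range for {v.field}: {v.score}")
    # Assumption: the most serious violation determines the file's outcome.
    worst = max((v.score for v in violations), default=0)
    return worst, worst >= LOAD_FAILURE_SCORE
```

A file with no violations gives `(0, False)`. A single violation with a score of 9 is enough to make the second element `True`.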

Field names and data types for the deposited files

(Existence PS is the internal error code raised when the field is missing. External depositors may ignore it.)

ASSAY

Header | Existence PS | DataType in database | Datatype rule | Datatype rule PS | Pattern | Pattern PS | Depend | Depend PS
AIDX | 9 | VARCHAR2(200 BYTE) NOT NULL ENABLE | Any character up to a length of 200 | 9 | - | - | - | -
RIDX | 0 | VARCHAR2(200 BYTE) | Any character up to a length of 200 | 9 | - | - | - | -
ASSAY_DESCRIPTION | 9 | VARCHAR2(4000 BYTE) | Any character up to a length of 4000 | 9 | - | - | - | -
ASSAY_TYPE | 9 | VARCHAR2(1 BYTE) | Any character up to a length of 1 | 9 | Accepted Assay types (ADMET, A, Functional, F, Binding, B, Unassigned, U, Physiochemical, P, Toxicity, T) [gp019] | 5 | gd2: A short desc of gd2. Target field: ASSAY_TAX_ID | 0
ASSAY_TEST_TYPE | 0 | VARCHAR2(20 BYTE) | Any character up to a length of 20 | 9 | Accepted Assay test types (in vitro, in vivo, ex vivo) | 0 | - | -
ASSAY_ORGANISM | 0 | VARCHAR2(250 BYTE) | Any character up to a length of 250 | 9 | - | - | - | -
ASSAY_STRAIN | 0 | VARCHAR2(200 BYTE) | Any character up to a length of 200 | 9 | - | - | - | -
ASSAY_TAX_ID | 0 | NUMBER(11,0) | Any integer up to a length of 11 | 9 | Positive integer or '0' (regex='^\d*$') [gp003] | 9 | gd3: If populated with an integer, then some text expected in this field. Target field: ASSAY_ORGANISM | 0
ASSAY_SOURCE | 0 | VARCHAR2(100 BYTE) | Any character up to a length of 100 | 9 | - | - | - | -
ASSAY_TISSUE | 0 | VARCHAR2(100 BYTE) | Any character up to a length of 100 | 9 | - | - | - | -
ASSAY_CELL_TYPE | 0 | VARCHAR2(100 BYTE) | Any character up to a length of 100 | 9 | - | - | - | -
ASSAY_SUBCELLULAR_FRACTION | 0 | VARCHAR2(100 BYTE) | Any character up to a length of 100 | 9 | - | - | - | -
TARGET_TYPE | 0 | VARCHAR2(25 BYTE) | Any character up to a length of 25 | 9 | Target type list | 5 | gd1: Type of target restricts kind of accession used. Target field: TARGET_ACCESSION | 7
TARGET_NAME | 0 | VARCHAR2(400 BYTE) | Any character up to a length of 400 | 9 | - | - | - | -
TARGET_ACCESSION | 0 | VARCHAR2(255 BYTE) | Any character up to a length of 255 | 9 | An integer or a UniProt ID [gp024] | 2 | - | -
TARGET_ORGANISM | 0 | VARCHAR2(100 BYTE) | Any character up to a length of 100 | 9 | - | - | - | -
TARGET_TAX_ID | 0 | NUMBER(11,0) | Any integer up to a length of 11 | 9 | Positive integer or '0' (regex='^\d*$') [gp003] | 9 | - | -
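Dependency rule gd3 on ASSAY_TAX_ID (target field ASSAY_ORGANISM) can be pictured with the rough Python sketch below. The real rule involves several regexes that this page does not show, so the function `gd3_ok` and its treatment of blank values are assumptions made only for illustration.

```python
import re


def gd3_ok(row: dict) -> bool:
    """Hypothetical check: an integer ASSAY_TAX_ID needs a non-empty ASSAY_ORGANISM."""
    tax_id = (row.get("ASSAY_TAX_ID") or "").strip()
    organism = (row.get("ASSAY_ORGANISM") or "").strip()
    # Assumption: the rule only fires when the source field holds an integer.
    if re.fullmatch(r"[0-9]+", tax_id):
        return bool(organism)
    return True
```

Under these assumptions, a row with ASSAY_TAX_ID `9606` and a blank ASSAY_ORGANISM fails the check, while a row with no ASSAY_TAX_ID passes.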

ASSAY_PARAM

Header | Existence PS | DataType in database | Datatype rule | Datatype rule PS | Pattern | Pattern PS | Depend | Depend PS
AIDX | 9 | VARCHAR2(200 BYTE) NOT NULL ENABLE | Any character up to a length of 200 | 9 | - | - | - | -
TYPE | 9 | VARCHAR2(250 BYTE) | Any character up to a length of 250 | 9 | - | - | - | -
RELATION | 0 | VARCHAR2(50 BYTE) | Any character up to a length of 50 | 9 | Relation symbol (=, >, <, ~, <=, >=, >>, <<) [gp022] | 2 | - | -
VALUE | 0 | NUMBER | Any number (incl decimals, negatives and sci Notn) | 9 | Any Number. Decimal, Sci Notn, +/- [gp005] | 9 | - | -
UNITS | 0 | VARCHAR2(100 BYTE) | Any character up to a length of 100 | 9 | - | - | - | -
TEXT_VALUE | 0 | VARCHAR2(4000 BYTE) | Any character up to a length of 4000 | 9 | - | - | - | -
COMMENTS | 0 | VARCHAR2(4000 BYTE) | Any character up to a length of 4000 | 9 | - | - | - | -

COMPOUND_RECORD

Header | Existence PS | DataType in database | Datatype rule | Datatype rule PS | Pattern | Pattern PS | Depend | Depend PS
CIDX | 9 | VARCHAR2(200 BYTE) NOT NULL ENABLE | Any character up to a length of 200 | 9 | - | - | - | -
RIDX | 0 | VARCHAR2(200 BYTE) | Any character up to a length of 200 | 9 | - | - | - | -
COMPOUND_KEY | 0 | VARCHAR2(250 BYTE) | Any character up to a length of 250 | 9 | - | - | - | -
COMPOUND_NAME | 0 | VARCHAR2(4000 BYTE) | Any character up to a length of 4000 | 9 | - | - | - | -
COMPOUND_SOURCE | 0 | VARCHAR2(400 BYTE) | Any character up to a length of 400 | 9 | - | - | - | -
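The datatype rules can be pre-checked before deposition. The Python sketch below is one possible way to do so, not the loader's own code. It assumes the database character set is UTF-8, so that a `VARCHAR2(n BYTE)` limit counts encoded bytes rather than characters. The helper names `fits_varchar2_bytes` and `fits_number` are hypothetical.

```python
import re

_INTEGER = re.compile(r"^-?[0-9]+$")


def fits_varchar2_bytes(value: str, max_bytes: int) -> bool:
    """VARCHAR2(n BYTE): the limit applies to bytes (UTF-8 assumed here)."""
    return len(value.encode("utf-8")) <= max_bytes


def fits_number(value: str, precision: int) -> bool:
    """NUMBER(p,0): an integer with at most p significant digits."""
    if not _INTEGER.match(value):
        return False
    # Leading zeros do not count towards the precision.
    digits = value.lstrip("-").lstrip("0") or "0"
    return len(digits) <= precision
```

For example, a CIDX of 200 ASCII characters fits `VARCHAR2(200 BYTE)`, but 101 copies of `é` do not, because each one takes two bytes in UTF-8.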

COMPOUND_CTAB

Header | Existence PS | DataType in database | Datatype rule | Datatype rule PS | Pattern | Pattern PS | Depend | Depend PS
CIDX | 9 | VARCHAR2(200 BYTE) NOT NULL ENABLE | Any character up to a length of 200 | 9 | - | - | - | -
CTAB | 0 | CLOB | A very large text field | 9 | - | - | - | -

REFERENCE

Header | Existence PS | DataType in database | Datatype rule | Datatype rule PS | Pattern | Pattern PS | Depend | Depend PS
RIDX | 9 | VARCHAR2(200 BYTE) NOT NULL ENABLE | Any character up to a length of 200. Will warn if this starts with 0 | 9 | - | - | - | -
PUBMED_ID | 0 | NUMBER(11,0) | Any integer up to a length of 11 | 9 | Positive integer (regex='^[1-9]\d*$') [gp006] | 1 | - | -
JOURNAL_NAME | 0 | VARCHAR2(50 BYTE) | Any character up to a length of 50 | 9 | - | - | - | -
YEAR | 0 | NUMBER(4,0) | Any integer up to a length of 4 | 9 | 1900 <= year < 2050 [gp031] | 9 | - | -
VOLUME | 0 | VARCHAR2(50 BYTE) | Any character up to a length of 50 | 9 | - | - | - | -
ISSUE | 0 | VARCHAR2(50 BYTE) | Any character up to a length of 50 | 9 | - | - | - | -
FIRST_PAGE | 0 | VARCHAR2(50 BYTE) | Any character up to a length of 50 | 9 | Positive integer (regex='^[1-9]\d*$') [gp006] | 4 | - | -
LAST_PAGE | 0 | VARCHAR2(50 BYTE) | Any character up to a length of 50 | 9 | Positive integer (regex='^[1-9]\d*$') [gp006] | 3 | - | -
REF_TYPE | 9 | VARCHAR2(50 BYTE) | Any character up to a length of 50 | 9 | An accepted reference type [case ins] (Patent, Publication, Dataset, Book) [gp032] | 2 | - | -
TITLE | 0 | VARCHAR2(500 BYTE) | Any character up to a length of 500 | 9 | - | - | - | -
DOI | 0 | VARCHAR2(200 BYTE) | Any character up to a length of 200 | 9 | A Digital Object Identifier (regex='^(10\.\d\d\d\d+\/.*)$') [gp010] | 5 | - | -
PATENT_ID | 0 | VARCHAR2(200 BYTE) | Any character up to a length of 200 | 9 | A Patent Identifier (regex='^(WO|EP|US)\-?\d+.*$') [gp011] | 5 | - | -
ABSTRACT | 0 | CLOB | A very large text field | 9 | - | - | - | -
AUTHORS | 0 | VARCHAR2(4000 BYTE) | Any character up to a length of 4000 | 9 | - | - | - | -
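To see how the Pattern and Pattern PS columns of the REFERENCE table fit together, the Python sketch below applies each pattern to one row and reports the penalty attached to every failure. The regexes and scores are copied from this page. The `pattern_failures` function, the dictionary layout, and the choice to skip empty fields are assumptions made for illustration.

```python
import re

# field -> (rule ID, regex, Pattern PS), as listed for REFERENCE on this page
REFERENCE_PATTERNS = {
    "PUBMED_ID": ("gp006", re.compile(r"^[1-9]\d*$"), 1),
    "YEAR": ("gp031", re.compile(r"^(19\d\d|20(0|1|2|3|4)\d)$"), 9),
    "FIRST_PAGE": ("gp006", re.compile(r"^[1-9]\d*$"), 4),
    "LAST_PAGE": ("gp006", re.compile(r"^[1-9]\d*$"), 3),
    "REF_TYPE": ("gp032",
                 re.compile(r"^(Patent|Publication|Dataset|Book)$", re.IGNORECASE), 2),
    "DOI": ("gp010", re.compile(r"^(10\.\d\d\d\d+\/.*)$"), 5),
    "PATENT_ID": ("gp011", re.compile(r"^(WO|EP|US)\-?\d+.*$"), 5),
}


def pattern_failures(row: dict) -> list:
    """Return (field, rule ID, PS) for every populated field that fails its pattern."""
    failures = []
    for field, (rule_id, regex, ps) in REFERENCE_PATTERNS.items():
        value = row.get(field, "")
        # Assumption: empty optional fields are not pattern-checked.
        if value and not regex.search(value):
            failures.append((field, rule_id, ps))
    return failures
```

A row with PUBMED_ID `0` and YEAR `2051` yields two failures with scores 1 and 9. A REF_TYPE of `book` passes because gp032 is matched case-insensitively.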

ACTIVITY

Header | Existence PS | DataType in database | Datatype rule | Datatype rule PS | Pattern | Pattern PS | Depend | Depend PS
CIDX | 9 | VARCHAR2(200 BYTE) NOT NULL ENABLE | Any character up to a length of 200 | 9 | - | - | - | -
CRIDX | 0 | VARCHAR2(200 BYTE) | Any character up to a length of 200 | 9 | - | - | - | -
SRC_ID_CIDX | 0 | NUMBER(4,0) | Any integer up to a length of 4 | 9 | Positive integer (regex='^[1-9]\d*$') [gp006] | 9 | - | -
AIDX | 9 | VARCHAR2(200 BYTE) NOT NULL ENABLE | Any character up to a length of 200 | 9 | - | - | - | -
SRC_ID_AIDX | 0 | NUMBER(4,0) | Any integer up to a length of 4 | 9 | Positive integer (regex='^[1-9]\d*$') [gp006] | 9 | - | -
RIDX | 0 | VARCHAR2(200 BYTE) | Any character up to a length of 200 | 9 | - | - | - | -
TEXT_VALUE | 0 | VARCHAR2(1000 BYTE) | Any character up to a length of 1000 | 9 | - | - | - | -
RELATION | 0 | VARCHAR2(50 BYTE) | Any character up to a length of 50 | 9 | Relation symbol (=, >, <, ~, <=, >=, >>, <<) [gp022] | 2 | - | -
VALUE | 0 | NUMBER | Any number (incl decimals, negatives and sci Notn) | 9 | Any Number. Decimal, Sci Notn, +/- | 9 | - | -
UPPER_VALUE | 0 | NUMBER | Any number (incl decimals, negatives and sci Notn) | 9 | - | - | - | -
UNITS | 0 | VARCHAR2(100 BYTE) | Any character up to a length of 100 | 9 | - | - | - | -
SD_MINUS | 0 | NUMBER | Any number (incl decimals, negatives and sci Notn) | 9 | - | - | - | -
SD_PLUS | 0 | NUMBER | Any number (incl decimals, negatives and sci Notn) | 9 | - | - | - | -
ACTIVITY_COMMENT | 0 | VARCHAR2(4000 BYTE) | Any character up to a length of 4000 | 9 | - | - | - | -
CRIDX_CHEMBLID | 0 | VARCHAR2(200 BYTE) | Any character up to a length of 200 | 9 | CHEMBLID format (regex='^CHEMBL\d+$') [gp023] | 0 | - | -
CRIDX_DOCID | 0 | VARCHAR2(200 BYTE) | Any character up to a length of 200 | 9 | - | - | - | -
ACT_ID | 0 | NUMBER(11,0) | Any integer up to a length of 11 | 9 | - | - | - | -
TEOID | 0 | NUMBER(11,0) | Any integer up to a length of 11 | 9 | - | - | - | -
TYPE | 9 | VARCHAR2(250 BYTE) | Any character up to a length of 250 | 9 | - | - | - | -

ACTIVITY_PROPERTIES

Header | Existence PS | DataType in database | Datatype rule | Datatype rule PS | Pattern | Pattern PS | Depend | Depend PS
ACT_ID | 9 | NUMBER(11,0) | Any integer up to a length of 11 | 9 | - | - | - | -
TYPE | 9 | VARCHAR2(250 BYTE) | Any character up to a length of 250 | 9 | - | - | - | -
RELATION | 0 | VARCHAR2(50 BYTE) | Any character up to a length of 50 | 9 | Relation symbol (=, >, <, ~, <=, >=, >>, <<) [gp022] | 2 | - | -
VALUE | 0 | NUMBER | Any number (incl decimals, negatives and sci Notn) | 9 | Any Number. Decimal, Sci Notn, +/- [gp005] | 9 | - | -
UNITS | 0 | VARCHAR2(100 BYTE) | Any character up to a length of 100 | 9 | - | - | - | -
TEXT_VALUE | 0 | VARCHAR2(1000 BYTE) | Any character up to a length of 1000 | 9 | - | - | - | -
COMMENTS | 0 | VARCHAR2(4000 BYTE) | Any character up to a length of 4000 | 9 | - | - | - | -
RESULT_FLAG | 0 | NUMBER(1,0) | Any integer up to a length of 1 | 9 | 0 or 1 (regex='^(0|1)*$') [gp001] | 9 | - | -

ACTIVITY_SUPPLEMENTARY

Header | Existence PS | DataType in database | Datatype rule | Datatype rule PS | Pattern | Pattern PS | Depend | Depend PS
TYPE | 9 | VARCHAR2(250 BYTE) | Any character up to a length of 250 | 9 | - | - | - | -
RELATION | 0 | VARCHAR2(50 BYTE) | Any character up to a length of 50 | 9 | Relation symbol (=, >, <, ~, <=, >=, >>, <<) [gp022] | 2 | - | -
VALUE | 0 | NUMBER | Any number (incl decimals, negatives and sci Notn) | 9 | Any Number. Decimal, Sci Notn, +/- [gp005] | 9 | - | -
UNITS | 0 | VARCHAR2(100 BYTE) | Any character up to a length of 100 | 9 | - | - | - | -
TEXT_VALUE | 0 | VARCHAR2(1000 BYTE) | Any character up to a length of 1000 | 9 | - | - | - | -
COMMENTS | 0 | VARCHAR2(4000 BYTE) | Any character up to a length of 4000 | 9 | - | - | - | -
REGID | 9 | NUMBER(11,0) | Any integer up to a length of 11 | 9 | - | - | - | -
ACT_ID | 9 | NUMBER(11,0) | Any integer up to a length of 11 | 9 | - | - | - | -
SAMID | 9 | NUMBER(11,0) | Any integer up to a length of 11 | 9 | - | - | - | -

Summary of Pattern and Dependency Rules

Regexes are only shown for Pattern rules. Dependency rules involve a number of regexes, and so are not easily shown here.
Rule Type | Rule ID | Short Description | Regex | Long Description
DEPENDENCY | gd1 | Type of target restricts kind of accession used. | - | The type of target restricts the kind of accession that should be used. For example, if the target type is 'Protein', then a UniProt ID is expected. Target Field='TARGET_ACCESSION'
DEPENDENCY | gd2 | A short desc of gd2 | - | A longer desc of gd2. Target Field='ASSAY_TAX_ID'
DEPENDENCY | gd3 | If populated with an integer, then some text expected in this field | - | If this field is populated with an integer, then the target field should contain some text. Thus if the ASSAY_TAX_ID is given, then a name should also be provided for the organism. Target Field='ASSAY_ORGANISM'
PATTERN | gp001 | 0 or 1 | ^(0|1)*$ | 0 or 1
PATTERN | gp003 | Positive integer or '0' | ^\d*$ | A positive integer or zero
PATTERN | gp005 | Any Number. Decimal, Sci Notn, +/- | ^\-?(\d+(\.\d+)?|\.\d+)(e\-?\+?\d\d?)?$ | A number. May be a decimal, may be positive or negative, and may be in scientific notation. Case insensitive
PATTERN | gp006 | Positive integer | ^[1-9]\d*$ | A positive integer. Not zero
PATTERN | gp010 | A Digital Object Identifier | ^(10\.\d\d\d\d+\/.*)$ | A Digital Object Identifier
PATTERN | gp011 | A Patent Identifier | ^(WO|EP|US)\-?\d+.*$ | A Patent Identifier. WO, EP, US
PATTERN | gp018 | Accepted Assay test types [case ins] | ^(in vitro|in vivo|ex vivo)$ | Accepted Assay test types. Case insensitive match
PATTERN | gp019 | Accepted Assay types [case ins] | ^(ADMET|A|Functional|F|Binding|B|Unassigned|U|Physiochemical|P|Toxicity|T)$ | Accepted Assay types. Case insensitive match
PATTERN | gp021 | Accepted Target types [case ins] | ^(None|NUCLEIC\-ACID|NUCLEIC ACID|TISSUE|PROTEIN|ORGANISM|CELL\-LINE|CELL\_LINE| CELL LINE|ADMET|UNKNOWN|UNCHECKED|SUBCELLULAR|NO TARGET|PROTEIN COMPLEX|PROTEIN FAMILY|PROTEIN COMPLEX GROUP|CHIMERIC PROTEIN|SELECTIVITY GROUP|PROTEIN\-PROTEIN INTERACTION|SINGLE PROTEIN|MOLECULAR|NON\-MOLECULAR|UNDEFINED|PHENOTYPE|PROTEIN NUCLEIC\-ACID COMPLEX|SMALL MOLECULE|OLIGOSACCHARIDE|METAL|LIPID|MACROMOLECULE| Other)$ | Accepted Target types. Case insensitive match
PATTERN | gp022 | Relation symbol (=, >, etc.) | ^(\=|\>|\<|\~|\<\=|\>\=|\>\>|\<\<)\=?$ | An accepted relation symbol
PATTERN | gp023 | CHEMBLID format | ^CHEMBL\d+$ | The accepted format for a CHEMBLID
PATTERN | gp024 | An integer or a UniProt ID | ^(\d+)|([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}) | An integer or a UniProt ID. Acceptable 'Accession IDs'
PATTERN | gp031 | 1900 <= year < 2050 | ^(19\d\d|20(0|1|2|3|4)\d)$ | 1900 <= year < 2050
PATTERN | gp032 | An accepted reference type [case ins] | ^(Patent|Publication|Dataset|Book)$ | An accepted reference type. Case insensitive match
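Depositors who want to test values against the Pattern rules before submitting could compile the regexes as in the Python sketch below. Only a subset of the rules is included. The case-insensitive flag is applied where the descriptions above say so. The `PATTERN_RULES` mapping and the `matches` helper are hypothetical names, and Python's `re` module is an assumption, since this page does not say which regex engine the loader uses.

```python
import re

# Regexes copied from the summary; IGNORECASE where the rule says "case ins".
PATTERN_RULES = {
    "gp001": re.compile(r"^(0|1)*$"),
    "gp003": re.compile(r"^\d*$"),
    "gp005": re.compile(r"^\-?(\d+(\.\d+)?|\.\d+)(e\-?\+?\d\d?)?$", re.IGNORECASE),
    "gp006": re.compile(r"^[1-9]\d*$"),
    "gp018": re.compile(r"^(in vitro|in vivo|ex vivo)$", re.IGNORECASE),
    "gp022": re.compile(r"^(\=|\>|\<|\~|\<\=|\>\=|\>\>|\<\<)\=?$"),
    "gp023": re.compile(r"^CHEMBL\d+$"),
}


def matches(rule_id: str, value: str) -> bool:
    """True if value satisfies the named Pattern rule."""
    return PATTERN_RULES[rule_id].search(value) is not None
```

With these definitions, `matches("gp005", "1.5e-3")` and `matches("gp018", "In Vitro")` both hold, while `matches("gp005", "+1")` does not, because gp005 allows only a leading minus sign. As written, gp001 and gp003 use `*` and so also accept an empty string.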