The datasets below are included in the development work.

ECLIPSE is a large administrative dataset in England that includes data on 25 million subjects, covering one third of the English population. The de-identified patient database is ideally suited to this project: it contains regular extractions from general practice records (including medication use, blood results, clinical conditions and basic demographic information) from over 2,000 GP practices. For the purpose of our development work, we will use the electronic health records of around 1 million Norfolk residents, covering disease and treatment over a period of up to 25 years.

THE NORFOLK ARTHRITIS REGISTER (NOAR) is an inception cohort of patients with early inflammatory arthritis that was established in 1989 and now holds the records of over 5,000 patients followed serially. Participants enrolled in NOAR have been followed by nurse-led clinical assessments at 1, 3, 5, 7, 10, 15 and 20 years. Over 90,000 samples (including DNA and serial serum samples, with inflammatory marker measurements) are stored and curated in the Biorepository in Norwich.

In addition to its size, length of follow-up and depth of phenotypic and genetic data, NOAR is linked to the EPIC-NORFOLK cohort, a long-term, population-based register of 25,000 initially healthy subjects followed longitudinally for 30 years. This linkage provides unique data on the health status and biomarkers of NOAR participants prior to the onset of their inflammatory disease.

UK BIOBANK is a population-based cohort of 502,506 participants aged 40-69 years at recruitment between 2006 and 2010 (25). Baseline assessments included an extensive physical assessment and the collection of blood, urine and saliva samples. Dietary questionnaires have been completed on up to four occasions by over 200,000 participants. Genotyping has been undertaken on all participants.