Dictionary of Discovery Sciences terms

A high-level list of terms, acronyms and abbreviations frequently used by the Discovery Sciences team

Dictionary

Algorithm

[Definition]

A sequence of instructions or a set of rules that is followed to complete a task, investigate and solve a problem, or perform a computation. Algorithms are unambiguous and, in the case of Sensyne Health, are used for purposes such as calculations, data processing and automated reasoning.

Anonymised data

[Definition]

Information which does not relate to an identified or identifiable natural person, or personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. Most medical studies use pseudonymised data, which refers to the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information, as long as such additional information is kept separately and subject to technical and organisational measures to ensure non-attribution to an identified or identifiable individual.
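
One way pseudonymisation is sometimes implemented is to replace direct identifiers with a keyed hash, where the key plays the role of the separately held "additional information". The sketch below is only an illustration under that assumption; the key, field names and records are hypothetical and not taken from any Sensyne Health system.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex encoded)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key: it would be stored separately from the study data.
SECRET_KEY = b"example-key-kept-separately"

# Hypothetical records containing a direct identifier.
records = [
    {"patient_id": "A-001", "age": 67},
    {"patient_id": "A-002", "age": 54},
]

# The study dataset keeps the clinical fields but not the original identifier.
pseudonymised = [
    {"pseudo_id": pseudonymise(r["patient_id"], SECRET_KEY), "age": r["age"]}
    for r in records
]
```

Without the key, the pseudo identifiers cannot easily be linked back to the original identifiers, while the same patient always maps to the same pseudo identifier.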

Artificial Intelligence (AI)

[Definition]

The ability of a machine to demonstrate ‘intelligence’, such as simulating human intelligence or exhibiting traits associated with a human mind such as learning and problem-solving. This means that the device perceives its environment, reasons about it and takes actions that maximise its chance of successfully achieving its goals.

BNF

[Abbreviation]

The British National Formulary (BNF) provides prescribers with information (including indications, dose and side effects) on medications which are prescribed in the UK and is published jointly by the British Medical Association and the Royal Pharmaceutical Society.

CHA2DS2–VASc

[Abbreviation]

A tool used in clinical practice to predict the one-year risk of ischaemic stroke in patients with atrial fibrillation. It is commonly used in conjunction with the HAS-BLED score to inform the decision to prescribe anticoagulation to patients with atrial fibrillation.
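
The score is a simple sum of points for risk factors. The sketch below uses the commonly published component weights, which are not listed in this dictionary; the function and parameter names are hypothetical, and the code is an illustration rather than a clinical tool.

```python
def cha2ds2_vasc(age, female, heart_failure, hypertension,
                 diabetes, stroke_or_tia, vascular_disease):
    """Sum the CHA2DS2-VASc points for one patient (commonly published weights)."""
    score = 0
    score += 1 if heart_failure else 0      # C: congestive heart failure
    score += 1 if hypertension else 0       # H: hypertension
    if age >= 75:                           # A2: age 75 or over
        score += 2
    elif age >= 65:                         # A: age 65 to 74
        score += 1
    score += 1 if diabetes else 0           # D: diabetes mellitus
    score += 2 if stroke_or_tia else 0      # S2: prior stroke, TIA or thromboembolism
    score += 1 if vascular_disease else 0   # V: vascular disease
    score += 1 if female else 0             # Sc: sex category (female)
    return score


# Hypothetical patient: a 78-year-old woman with hypertension and diabetes.
example_score = cha2ds2_vasc(
    age=78, female=True, heart_failure=False, hypertension=True,
    diabetes=True, stroke_or_tia=False, vascular_disease=False,
)
```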

Classification (algorithms)

[Definition]

Classification is the general process of grouping items into categories. Within the field of machine learning, classification algorithms describe the subset of machine learning algorithms which automatically assign items into categories based on a learned mathematical representation using ‘training data’. One example is a decision tree algorithm which classifies whether or not a patient has heart failure.
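
As a minimal sketch of the decision tree idea, the code below learns a one-level tree (a "decision stump") from training data: a single threshold on one feature. The feature (ejection fraction, in percent), the toy values and the function names are assumptions made purely for illustration.

```python
def fit_stump(values, labels):
    """Learn the threshold that misclassifies the fewest training items.

    Rule being learned: predict 1 (heart failure) when value < threshold.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted(set(values)):
        predictions = [1 if v < threshold else 0 for v in values]
        errors = sum(p != y for p, y in zip(predictions, labels))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold, value):
    """Assign a category to a new item using the learned threshold."""
    return 1 if value < threshold else 0


# Toy training data (made up): ejection fraction and a heart failure label.
train_ef = [25, 30, 35, 38, 50, 55, 60, 65]
train_hf = [1, 1, 1, 1, 0, 0, 0, 0]

threshold = fit_stump(train_ef, train_hf)
new_patient_label = predict(threshold, 28)
```

A full decision tree repeats this kind of split recursively over several features.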

Clinical Trial: ‘ACTIVE Arm’

[Definition]

The part of a clinical trial where a group of participants receives an intervention / treatment considered to be effective (or active) by health care providers – as opposed to a placebo.

Clinical Trial: ‘CONTROL Arm’

[Definition]

The part of a clinical trial in which participants do not receive the substance with the proposed effect (for example, they receive a placebo instead). Their outcomes are compared with those of the active arm.

Clinical Trial: ‘SYNTHETIC CONTROL Arm’

[Definition]

Instead of collecting data from patients recruited for a trial who have been assigned to the control arm, synthetic control arms model those comparators using real-world data that has previously been collected from sources such as health data generated during routine care, including electronic health records; administrative claims data; patient-generated data from fitness trackers or home medical equipment; disease registries; and historical clinical trial data.

Clinical pathway

[Definition]

A care/clinical pathway is a complex intervention for the mutual decision-making and organisation of care processes for a well-defined group of patients during a well-defined period. The aim of a care pathway is to enhance the quality of care across the continuum by improving risk-adjusted patient outcomes, promoting patient safety, increasing patient satisfaction, and optimising the use of resources.

Clustering

[Definition]

Grouping patients, admissions or other entities so that members of the same group share similar characteristics, while those characteristics differ between groups.
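
One widely used clustering technique is k-means, which is chosen here only as an example and is not prescribed by this dictionary. The sketch below groups toy admissions described by two made-up characteristics (age and length of stay in days), starting from hand-picked initial centroids.

```python
def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its assigned points; repeat for a fixed number of rounds."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Toy data (made up): (age, length of stay in days) per admission.
admissions = [(30, 2), (32, 3), (28, 2), (75, 10), (80, 12), (78, 11)]

# Hypothetical starting centroids: one younger and one older admission.
centroids, clusters = kmeans(admissions, centroids=[admissions[0], admissions[3]])
```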

Control population / control group

[Definition]

A group that doesn't receive a treatment or other intervention in a study. In machine learning this could be used to define a population that doesn't have an outcome that is being predicted.

Data Representation (Machine Learning)

[Definition]

A representation of the data that is suitable for machine learning algorithms. This is commonly a vector of features / characteristics per entity but can also be a sequence of features.
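
The two forms mentioned above can be sketched as follows: one feature vector per patient, and a sequence of feature vectors (one per visit). The feature names and values are hypothetical.

```python
# Hypothetical feature order shared by every vector.
FEATURE_NAMES = ["age", "systolic_bp", "has_diabetes"]


def to_vector(record):
    """Turn one record (a dict) into a fixed-order list of numbers."""
    return [float(record[name]) for name in FEATURE_NAMES]


# One vector per entity (here, a patient).
patient = {"age": 67, "systolic_bp": 142, "has_diabetes": True}
vector = to_vector(patient)

# A sequence of vectors for one entity (here, one vector per visit).
visits = [
    {"age": 67, "systolic_bp": 150, "has_diabetes": True},
    {"age": 68, "systolic_bp": 138, "has_diabetes": True},
]
sequence = [to_vector(v) for v in visits]
```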

Deep Learning

[Definition]

A type of machine learning that trains a network with multiple layers to learn, from raw input, a hierarchy of characteristics ranging from low-level to high-level features.

Diagnosis codes

[Definition]

A structured and standardised approach to recording diseases, disorders and symptoms. Currently, the International Statistical Classification of Diseases and Related Health Problems (ICD) revision 10 is the standard for medical classification.

Dimensionality reduction / Embedding

[Definition]

The process of reducing the number of variables that are considered. This is often performed to identify and visualise structure within a high-dimensional space. An embedding is the resulting low-dimensional space in which the entity is represented.

Electronic Health Records / Electronic Patient Records

[Definition]

Data that is generated during a patient's journey in the medical system and is captured in an electronic format.

Endotype

[Definition]

A subtype of a disease / condition, which is defined by a common underlying cause / mechanism usually referred to as the pathophysiology.

Feature selection

[Definition]

The process of selecting a subset of relevant features (variables, predictors) for use in model construction.

Features

[Definition]

An individual measurable property or characteristic of a phenomenon being observed.

Fine-tuning

[Definition]

The process of taking a network model that has already been trained for a given task and adapting it to perform a second, similar task.

HAS-BLED

[Abbreviation]

A risk prediction tool used in clinical practice which provides an estimate of the risk of major bleeding over one year in patients taking anticoagulants for atrial fibrillation.

Hyperparameter search

[Definition]

The task of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process.
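
A simple form of hyperparameter search is a grid search: try each candidate value and keep the one that scores best on held-out validation data. The sketch below tunes k, the number of neighbours in a one-dimensional k-nearest-neighbours classifier; the classifier, the candidate grid and the toy data are all assumptions chosen for illustration.

```python
def knn_predict(train_x, train_y, x, k):
    """Predict a 0/1 label by majority vote among the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train_x, train_y, val_x, val_y, k):
    """Fraction of validation items predicted correctly for a given k."""
    correct = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(val_x, val_y))
    return correct / len(val_x)


# Toy data (made up); the training labels contain some noise.
train_x = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
train_y = [0, 0, 0, 1, 0, 1, 1, 1]
val_x = [2.5, 3.5, 6.5, 7.5]
val_y = [0, 0, 1, 1]

# Grid search: score every candidate value of the hyperparameter k.
grid = [1, 3, 5]
scores = {k: accuracy(train_x, train_y, val_x, val_y, k) for k in grid}
best_k = max(grid, key=lambda k: scores[k])
```

Here k controls the learning process but is not itself learned from the training data, which is what makes it a hyperparameter.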

ICD-10

[Abbreviation]

The International Statistical Classification of Diseases and Related Health Problems is a classification system produced by the World Health Organisation and contains a coding system for diseases as well as other medical information such as symptoms and causes of disease or injury.

IO (immuno-oncology)

[Abbreviation]

Used to refer to immunotherapy drugs used to treat cancer, trials of such drugs, etc.

LIMS

[Abbreviation]

Laboratory Information Management System

Machine Learning (ML)

[Definition]

A subset of artificial intelligence which relies on algorithms and statistical models that allow a computer system to perform a specific task such as identifying subpopulations within a patient cohort. In contrast to explicitly programmed algorithms, ML algorithms are data driven and learn a mathematical representation based on training data.

NICE

[Abbreviation]

The National Institute for Health and Care Excellence is a UK body which produces national evidence-based guidelines in several areas including clinical practice.

OPCS-4

[Abbreviation]

Classification of Interventions and Procedures, Version 4. It is the procedural classification used by clinical coders within National Health Service (NHS) hospitals of NHS England, NHS Scotland, NHS Wales and Health and Social Care in Northern Ireland.

Patient cohort

[Definition]

A group of patients that display a particular set of common characteristics. This could be defined by disease, geography, or outcomes, for example. Within Sensyne, patient cohorts are requested from NHS trusts to answer a particular clinical question.

Patient outcomes

[Definition]

Clinically relevant endpoints, such as occurrence or relapse of the disease, as well as death or any other important events.

Patient stratification

[Definition]

Division of a larger patient group into subgroups with particular phenotypes or endotypes. Identification of these subgroups can improve patient selection for clinical trials or treatment.

Patient subtypes

[Definition]

Patient or disease subtypes are subtypes of a wider disease definition that, for example, display different phenotypes or endotypes, or respond differently to a treatment.

Phenotypes

[Definition]

A set of observable physical and clinical characteristics.

Precision medicine

[Definition]

Tailoring of medical treatment to the individual characteristics of each patient. This relies on the ability to classify individuals into subpopulations that differ in their susceptibility to a particular disease or their response to a specific treatment.

Prediction

[Definition]

The output of a machine learning algorithm when it is applied to unseen data.

QRISK

[Definition]

An algorithm used in clinical practice which predicts a patient's risk of having a heart attack or stroke in the next 10 years. It is commonly used to inform the decision to prescribe statins to at-risk patients.

Real world data

[Definition]

Data that is collected and stored without a predefined purpose and that reflects the natural volume and content of patient medical history.

Real world evidence

[Definition]

Real-world studies provide a line of evidence that is complementary to other studies of observational or experimental design, such as randomised controlled trials (RCTs). RCTs are still held as the gold standard for investigating causality, but real-world studies produce essential evidence of therapeutic effectiveness in real-world settings. Evidence from real-world studies is very important for understanding the utility of medical approaches in a broader and more representative patient population.

Regression

[Definition]

Statistical method used in a variety of disciplines that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).
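
The simplest case is linear regression with one independent variable, where a straight line Y = slope * X + intercept is fitted by ordinary least squares. The sketch below uses made-up data that lie exactly on the line Y = 2X + 1; the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable.

    Returns (slope, intercept) of the best-fitting straight line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (made up): the dependent variable Y rises by 2 for each unit of X.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
```

The slope describes the strength and direction of the relationship: how much Y changes for a one-unit change in X.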

Semi-supervised learning

[Definition]

Subtype of machine learning which combines aspects of unsupervised and supervised learning. In this scenario, labelled data is only available for a fraction of the complete dataset. One example is the prediction of patients with a normal or preserved ejection fraction based on their electronic health records when only a small proportion of the patients have been tested.

Supervised learning

[Definition]

Subtype of machine learning which aims to discover patterns within data by making explicit use of (often human annotated) labels. An example is an algorithm which predicts the likelihood of patients having a particular disease by learning the underlying patterns within the patient history. In contrast to unsupervised learning, supervised learning algorithms aim to minimise the difference between the prediction and the actual label.

TLFs

[Abbreviation]

Tables, listings and figures. The term is usually used to describe a document containing summary statistics.

Training, validation and test set

[Definition]

A commonly used technique within machine learning to develop robust algorithms. The available data is split into a training, a validation and a test set. An algorithm is trained on the training set and validated on the validation set to find the optimal set of algorithm parameters. The optimal model is then evaluated on a completely independent test set.
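
A minimal sketch of such a split is shown below. The 60/20/20 proportions, the fixed random seed and the function name are assumptions for illustration; suitable proportions depend on the dataset.

```python
import random


def split(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the items once, then cut them into three non-overlapping sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is the test set
    return train, val, test


# Toy example: 100 hypothetical patient indices.
train, val, test = split(range(100))
```

In patient data it is usually sensible to split by patient rather than by record, so that the same patient does not appear in more than one set.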

Transfer learning

[Definition]

A machine learning technique which aims to transfer knowledge gained by training an ML algorithm on one problem by applying it to a different problem. Transfer learning approaches are often used when only a small dataset is available for the target problem.

Unsupervised learning

[Definition]

Subtype of machine learning which aims to discover undetected patterns within data without any pre-existing labels. An example is the discovery of patient subpopulations in a larger cohort. Unsupervised learning algorithms optimise objective functions that do not depend on labels, for example to find optimal clusters or lower-dimensional representations of the input data.

Weak learning

[Definition]

Subtype of machine learning which is also referred to as weak supervision and is related to supervised learning. In weak learning scenarios, very noisy and imprecise sources are used to generate labels. Weak learning approaches are often considered when the acquisition of precise labels is not possible, time-consuming or expensive.
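
One common way to generate such noisy labels is to write several simple heuristic rules (often called labelling functions) and combine their votes. The sketch below is an illustration under that assumption; the rules, the clinical notes and the majority-vote scheme are hypothetical.

```python
ABSTAIN = None  # a rule returns this when it has no opinion


def lf_mentions_af(note):
    """Noisy rule: mentioning the condition suggests the patient has it."""
    return 1 if "atrial fibrillation" in note.lower() else ABSTAIN


def lf_negated(note):
    """Noisy rule: a negation phrase suggests the patient does not have it."""
    return 0 if "no evidence of" in note.lower() else ABSTAIN


def lf_anticoagulant(note):
    """Noisy rule: an anticoagulant prescription suggests the patient has it."""
    return 1 if "warfarin" in note.lower() else ABSTAIN


def weak_label(note, labelling_functions):
    """Combine the rules by majority vote; abstain on ties or when no rule fires."""
    votes = [v for v in (lf(note) for lf in labelling_functions) if v is not ABSTAIN]
    if not votes or sum(votes) * 2 == len(votes):
        return ABSTAIN
    return 1 if sum(votes) * 2 > len(votes) else 0


RULES = [lf_mentions_af, lf_negated, lf_anticoagulant]
notes = [
    "Known atrial fibrillation, on warfarin.",
    "No evidence of arrhythmia.",
    "No evidence of atrial fibrillation.",
    "Routine follow-up.",
]
labels = [weak_label(note, RULES) for note in notes]
```

The third note shows why such labels are imprecise: two rules disagree, so the combined label is left undecided rather than guessed.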
