
A novel machine learning algorithm for patient stratification of longitudinal real world data using clinical endpoints

Using de-identified and anonymised longitudinal patient data from Oxford University Hospitals NHS Foundation Trust, Sensyne Health’s data science team has developed a novel algorithm to discover sub-groups of patients who share both a similar medical history and future clinical endpoints.

December 1, 2021

This can be used, for instance, to identify more suitable patient sub-groups for clinical trials with specific clinical endpoints such as survival or future cardiac events. The findings will be published in the proceedings of the Machine Learning for Health (ML4H) 2021 conference (4th December 2021). A preprint of the publication is already available here.

Figure 1: Illustration of the longitudinal clustering algorithm. Previous patient observations, as well as clinical endpoints, are used as inputs for a deep learning model which clusters patients based on their medical history and future risk of clinical events. The focus of the model can be adjusted to be trajectory-oriented, endpoint-oriented, or a combination of both.


Most algorithms developed to cluster patients from electronic health records (EHRs) focus on finding similarities within the medical history in a purely unsupervised manner, meaning that they do not consider any future clinical endpoints. As EHRs collect data for clinical care, and not often with research in mind, unsupervised clustering methods often find clusters driven by spurious associations such as patient drop-out, IT infrastructure changes, or administrative differences between healthcare providers.

More recently, predictive clustering algorithms have been developed which use patient endpoints to guide the clustering of patients from their EHRs. These approaches ensure that the discovered groups of patients share similar clinically relevant endpoints such as survival or the onset of new diseases. They are, however, unable to distinguish between the different patient histories which lead to these endpoints. 

A new algorithm developed by Sensyne Health has been designed to find clusters of patients who share similar longitudinal medical histories, whilst also ensuring patients in each cluster share similar risks of future clinical endpoints. The deep learning-based approach consists of a recurrent neural network autoencoder with a reconstruction loss (to learn patient history), an endpoint loss (to learn future clinical endpoints), and a clustering loss (to discover similar groups of patients). The researcher can balance or remove individual losses depending on the desired outcome of the longitudinal patient clustering. The algorithm leverages a range of data types (continuous, binary, nominal, ordinal, categorical) from EHR data, including diagnoses, procedures, medications, and laboratory measurements, and can be further extended to include information from images and clinical reports.
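To make the idea of balancing or removing losses concrete, here is a minimal sketch that assumes the three losses are combined as a weighted sum. The post does not state the exact form used in the paper, so the weighted sum, the names `LossWeights` and `combined_loss`, and the numeric loss values are all illustrative assumptions rather than the published method.

```python
from dataclasses import dataclass


@dataclass
class LossWeights:
    """Hypothetical weights; setting one to 0.0 removes that loss term."""
    reconstruction: float = 1.0
    endpoint: float = 1.0
    clustering: float = 1.0


def combined_loss(reconstruction, endpoint, clustering, weights):
    """Weighted sum of the three per-batch loss values (an assumed form)."""
    return (
        weights.reconstruction * reconstruction
        + weights.endpoint * endpoint
        + weights.clustering * clustering
    )


# Made-up loss values for a single batch, purely for illustration.
batch = {"reconstruction": 1.0, "endpoint": 0.5, "clustering": 0.25}

settings = {
    "trajectory only": LossWeights(endpoint=0.0),        # ignore endpoints
    "endpoint only": LossWeights(reconstruction=0.0),    # ignore history
    "combined": LossWeights(),                           # use all three
}

for name, weights in settings.items():
    print(name, combined_loss(**batch, weights=weights))
```

In this sketch, the three settings loosely mirror the trajectory-oriented, endpoint-oriented, and combined modes described in the post; intermediate weights would shift the balance between them.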

The algorithm was demonstrated using Sensyne’s cardiovascular dataset (available on the SENSIGHT™ platform). Sensyne Health’s data science team created a cohort of 29,229 patients with a diagnosis of diabetes, using the time to a cardiovascular event after the first diabetes diagnosis as an endpoint. We evaluated how well our algorithm identifies patient clusters defined by only the patient trajectory (Figure 2a), only the clinical endpoint (Figure 2c), and a combination of both (Figure 2b). Using a combination of the reconstruction, endpoint, and clustering losses, the algorithm found distinct clusters of diabetes patients who share similar histories, whilst ensuring the same clusters of patients share similar risks of future cardiovascular events. The arrows in Figure 2 indicate how, with our proposed method of combining trajectories and endpoints, clusters with similar trajectories are split into groups with distinct endpoint risk (Figure 2a → Figure 2b) and clusters with the same endpoint risk are split into groups with distinct trajectories (Figure 2c → Figure 2b). Alternative methods are unable to identify these clusters.


Figure 2: Cumulative incidence curves for time to a cardiovascular endpoint (stroke or myocardial infarction) for each cluster discovered by the machine learning algorithms. The plots show results for: (a) unsupervised clustering with 3 clusters, focussing on past trajectories and showing smaller differences between incidence curves; (b) a combined trajectory and endpoint clustering with 5 clusters; and (c) a predictive clustering with 3 clusters, focussing on the cardiovascular endpoint and showing large differences between incidence curves. The arrows indicate how, with our proposed method of combining trajectories and endpoints, clusters with similar trajectories are split into groups with distinct endpoint risk ((a) → (b)) and clusters with the same endpoint risk are split into groups with distinct trajectories ((c) → (b)).
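For readers unfamiliar with cumulative incidence curves, the sketch below shows one simple way to compute such a curve per cluster: one minus the Kaplan-Meier survival estimate. This is an assumption for illustration only. The post does not say which estimator was used, this version ignores competing risks such as death from other causes, and the function names and toy records are hypothetical.

```python
from collections import defaultdict


def cumulative_incidence(times, events):
    """Return [(t, 1 - S(t))] at each distinct event time.

    times: follow-up time per patient.
    events: True if the endpoint occurred, False if the patient was censored.
    Uses the Kaplan-Meier product-limit estimate of S(t).
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, 1.0 - survival))
        # Patients with an event or censoring at t leave the risk set.
        at_risk -= n_leaving
    return curve


def curves_by_cluster(records):
    """records: iterable of (cluster_id, time, event) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for cluster, time, event in records:
        grouped[cluster][0].append(time)
        grouped[cluster][1].append(event)
    return {c: cumulative_incidence(t, e) for c, (t, e) in grouped.items()}


# Toy records (cluster, years of follow-up, event observed); not real data.
toy = [
    ("A", 1, True), ("A", 2, False), ("A", 3, True), ("A", 4, True),
    ("B", 2, False), ("B", 5, True), ("B", 6, False),
]

for cluster, curve in sorted(curves_by_cluster(toy).items()):
    print(cluster, curve)
```

Clusters whose curves rise quickly correspond to higher endpoint risk, which is how the differences between clusters in the figure can be read.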


This approach could be used as a tool to aid clinical decision-making by indicating future risks for patients. In addition, it can be used to improve recruitment and patient cohort design in clinical trials by indicating whether patients are more or less likely to be at risk of a clinical event based on their medical histories.

Reference:

Oliver Carr, Avelino Javer, Patrick Rockenschaub, Owen Parsons, and Robert Dürichen. Longitudinal patient stratification of electronic health records with flexible adjustment for clinical outcomes. Proceedings of Machine Learning for Health. 2021 Dec 4.


Robert Dürichen, Head of Machine Learning Research, Sensyne Health 
