Our Commitment to Cancer Research

Deep 6 AI Blog



May is Cancer Research Month so we asked our Chief Oncologist, Dr. BJ Rimel, and our lead data scientist, Scott Hoch, to shed some light on how Deep 6 is helping advance cancer research through AI-enabled Clinical Trial Acceleration Software.

Deep 6 AI: Why is oncology important to Deep 6?

Scott: Our goal is to help clinical research staff match patients with their trials in minutes not months. This means we need to extract the specific medical information from a patient’s Electronic Medical Record (EMR) that is relevant to clinical trial eligibility so that the right patients are identified.


To find which information will be most useful, we speak with our clients, clinical researchers, and disease specialists about what they are looking for, and what causes them the most difficulty. We also explore the inclusion criteria from actual trials on clinicaltrials.gov.


Because of the large volume of oncology research and the complexity of its trial criteria, we have found that targeting cancer-focused patient attributes has some of the biggest impact on our clients and their patients. We believe this trend will accelerate as research progresses and cancer treatment becomes more and more personalized.


Deep 6 AI: How do we optimize our cohort building tool to solve problems that oncologists face every day?

Scott: There are three main efforts we focus on that impact cancer-related clinical trials.

  • Measured values in both structured and unstructured patient data: Laboratory test results are a good example. We identify key values that relate directly to oncology inclusion and exclusion criteria, such as PSA doubling times, cancer stage, and chemotherapy treatments. If you want to find a PSA doubling time of 4, it’s easy enough to match that exact phrase in a record, but if you’re looking for a PSA doubling time of >2, the value has to be captured as a number when the data is ingested so that it can be compared against the threshold.
  • Relationships between medical concepts: There are many ways to express the same medical concept. If you’re searching for “breast cancer” you also want to find “malignant neoplasms of the left breast.” To surface these variations, we maintain an internal graph of medical knowledge, and you can select from the variations it finds in our cohort builder. We are always updating this internal understanding so that the tool keeps improving over time.
  • Individually targeted medical concepts: While our classification system helps identify various representations of concepts, some topics still require extra care, either because they are more difficult to identify within unstructured data or because they are particularly useful to a large number of trials. We build specific models to identify these concepts.
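The difference between matching an exact phrase and matching a threshold such as "PSA doubling time of >2" can be sketched in a few lines of Python. This is only an illustration, not Deep 6 AI's implementation: the regular expression, the function names, and the sample phrasings are all assumptions, and real clinical notes vary far more than this pattern allows.

```python
import re

# Hypothetical pattern (an assumption for illustration): matches phrasings
# such as "PSA doubling time of 4" or "PSA doubling time: 1.5".
PSADT_PATTERN = re.compile(
    r"psa\s+doubling\s+time\s*(?:of|is|was|:|=)?\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_psa_doubling_times(note: str) -> list[float]:
    """Return every PSA doubling time mentioned in the note as a number."""
    return [float(value) for value in PSADT_PATTERN.findall(note)]


def meets_threshold(note: str, minimum: float) -> bool:
    """True if any extracted doubling time is strictly greater than minimum."""
    return any(value > minimum for value in extract_psa_doubling_times(note))
```

With the values stored as numbers, a criterion like ">2" becomes a plain comparison, for example `meets_threshold("PSA doubling time of 4 months", 2)`, instead of a search for one literal string.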


We are continuously improving our internal classification models so that users can find every instance of concepts they are looking for. Above you can see a subsection of our model that highlights “cancer” as a concept (blue dot in the center). When a researcher looks for “cancer,” they will also be searching for all of the other representations of cancer (red dots) so that they can find “malignant neoplasms”, “Ca – Unspecified site NOS”, etc. 
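One way to picture this kind of lookup is as a traversal over a graph of related terms. The sketch below is a toy example under stated assumptions: the graph contents, the `CONCEPT_GRAPH` name, and the `expand_concept` function are hypothetical, and the edges simply mean "narrower term or alternate phrasing". It is not the internal model described here.

```python
from collections import deque

# Toy graph (an assumption for illustration): each concept points to
# narrower concepts or alternate phrasings of the same idea.
CONCEPT_GRAPH = {
    "cancer": [
        "malignant neoplasms",
        "Ca – Unspecified site NOS",
        "breast cancer",
    ],
    "breast cancer": ["malignant neoplasms of the left breast"],
}


def expand_concept(term: str, graph: dict[str, list[str]]) -> set[str]:
    """Collect the term plus everything reachable from it (breadth-first)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen
```

Searching for "cancer" with this toy graph would also return the breast cancer phrasings, because they are reachable from the starting concept.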

Dr. BJ Rimel
Chief Oncologist, Deep 6 AI
Gynecologic Oncologist, Cedars-Sinai Medical Center

Deep 6 AI: What is Triple Negative Breast Cancer (TNBC) and why is it challenging for researchers?

Dr. Rimel: Breast cancer is the most common cancer in women, affecting one out of every 9 women in their lifetimes. It is also a heavily researched cancer and generates a large demand for clinical trial patients. If a breast cancer is negative for the estrogen receptor, the progesterone receptor, and the HER-2/neu receptor, it is called triple negative, and it is inherently resistant to many of the anti-estrogen therapies available for the more common estrogen/progesterone positive breast cancers. Unfortunately, this makes TNBC harder to treat and options remain limited. Clinical trials specifically targeting this population are more common due to the increased medical need.


Triple negative breast cancer (TNBC) is hard to identify because measurements of the tumor’s hormone status can appear in different places in a medical record. Even if they are found in the same sentence, we have found over 800,000 different ways of writing this diagnosis from “triple negative breast cancer” to “triple – left breast ca”, “(er/pr/h2)-”, etc. These phrasings can further be complicated by the inclusion of staging, TNM measurements, and hereditary cancer genes such as BRCA1/2.

Deep 6 AI: Why is TNBC challenging for machine learning?

Scott: This is an interesting machine learning issue because the difficulty in this rule-based approach doesn’t lie in complicated neural nets and natural language processing, but rather in surfacing all of the relevant patterns without a definitive source of truth.

Deep 6 AI: Can you translate that into non-data-science terms?

Scott: Sorry… It’s hard because doctors refer to TNBC in hundreds of different ways in the patient record, so we need to identify them all and then create special rules for making sense of them so that our tool understands them in their proper context.

Deep 6 AI: How did we solve it?

Scott: This type of issue comes up quite often when dealing with medical language as there are many ways to say the same thing. This is further complicated by slang, dialects, and short-hand notations that organically arise across hospitals and even departments. 


We focused our energy on building an infrastructure that allows data scientists and medical professionals to rapidly explore data around a particular topic such as TNBC. This system asks medical professionals to answer questions about data and identifies all of the logical units within a concept. For example, an estrogen receptor can be represented as ER => [er, estrogen, estrogen receptor, -er, etc.] and breast cancers can have their locations described by LOCATION => [left, right, right-sided, left-sided, bilateral, unilateral, etc.]. Our systems then examine the permutations of these logical units, such as [‘ER’, ‘PR’, ‘H2’, ‘NEG’], [‘TRIPLE_NEG’, ‘LOCATION’, ‘BREAST_CANCER’], etc., to generate a comprehensive list of the ways a concept can be represented, in a form that is easy to keep up to date.
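The idea of combining logical units into candidate phrasings can be illustrated with a short Python sketch. It is a simplification under stated assumptions rather than the system itself: the unit names follow the examples in this post, but the variant lists are abbreviated and partly assumed, and the function names are hypothetical. The sketch takes the Cartesian product of the variants within each fixed sequence of units, and does not reorder the units.

```python
from itertools import product

# Abbreviated, partly assumed variant lists for each logical unit.
UNITS = {
    "ER": ["er", "estrogen receptor"],
    "PR": ["pr", "progesterone receptor"],
    "H2": ["her2", "h2"],
    "NEG": ["negative", "neg", "-"],
    "TRIPLE_NEG": ["triple negative", "triple neg"],
    "LOCATION": ["left", "right", "bilateral"],
    "BREAST_CANCER": ["breast cancer", "breast ca"],
}

# Sequences of logical units, as in the examples in the text.
TEMPLATES = [
    ["ER", "PR", "H2", "NEG"],
    ["TRIPLE_NEG", "LOCATION", "BREAST_CANCER"],
]


def expand_template(template: list[str], units: dict[str, list[str]]) -> set[str]:
    """Generate every phrasing for one sequence of logical units."""
    choices = [units[name] for name in template]
    return {" ".join(combo) for combo in product(*choices)}


def all_phrasings(templates: list[list[str]], units: dict[str, list[str]]) -> set[str]:
    """Union of the phrasings generated by every template."""
    phrasings: set[str] = set()
    for template in templates:
        phrasings |= expand_template(template, units)
    return phrasings
```

Even these small lists yield 36 phrasings from two templates, which hints at how the count grows as units and variants are added. Keeping the variants in one table also means that a newly discovered abbreviation only has to be added in one place.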


Deep 6 AI: How long did it take?

Scott: Building the software for collaboration around pattern identification was a considerable effort. However, we now reuse these tools to annotate other highly complex medical concepts that customers bring to our attention and can go from a prioritized issue to a deployed solution in around 2 weeks.


Deep 6 AI: What’s next?

Scott: As personalized medicine and genomic testing become more common in healthcare, we are building out a more robust suite of biomarker identification tools at Deep 6. Beyond biomarkers, we are excited to release tools that will make it easier to search for positive lymph node counts, cancer staging, and the lines of therapy a patient has received. These goals are ambitious, but we have run some small-scale experiments internally that have me very optimistic about the future.

Scott Hoch
Lead Data Scientist, Deep 6 AI
