AI-designed “hyperfoods” can possibly help prevent cancer

We now live longer than ever. Yet, we are not necessarily living healthier: with a rapidly aging population, people are experiencing a continuous growth of chronic diseases such as cancer and metabolic, neurological, and heart disorders. This drives healthcare costs through the roof and puts a significant strain on public health systems [1].

A large part of the problem resides in poor dietary choices. Unhealthy diets kill more people than cigarettes and are responsible for 1 out of 5 deaths globally; in 2017, this amounted to nearly 11 million lives. Besides the obvious culprit, unhealthy highly processed food, a less obvious killer is the low intake of healthy foods such as whole grains, vegetables, fruits, nuts, seeds, and legumes [2].

Take cancer for example: rightfully considered the plight of the modern era, it will affect every second reader of this post at one point in their lifetime. Although the perspective seems gloomy, the good news is that nearly 40% of all oncological diseases could be prevented through dietary and lifestyle changes alone [3] — a finding that encourages us to look more carefully at what we eat, as diet is perhaps the single most important modifiable risk factor for cancer.

The dark matter

In the past decades, nutrition science has made excellent progress in analysing the six major nutrition categories influencing human health and disease: proteins, carbohydrates, fats, minerals, vitamins, and water. National nutritional databases track about 150 components from these categories and they appear on every food package.

Yet, there is growing evidence that thousands of other molecules from a broad variety of chemical classes (such as polyphenols, flavonoids, terpenoids, and indoles, which are abundant in plants and are often naturally responsible for their colour, taste, and odour) might help prevent and fight diseases [4]. Most of these compounds remain largely unexplored by experts, untracked by regulators, and unknown to the public at large, thus truly deserving the name “dark matter of nutrition” [5].

With every bite of food, we put in our mouth hundreds of such biologically active compounds. These molecules interact with each other the moment we swallow them, and as the food gets digested and metabolised, also react with other biomolecules in our body and with trillions of bacteria in our guts.

Many of the compounds found in plant-based foods belong to the same chemical classes as drugs. It is therefore not surprising that almost half of the small molecules approved for anti-cancer therapies are derived from natural products. These drugs are generally better tolerated and less toxic to healthy cells [6].

Drugs and foods

Traditional drug molecules are designed to bind to biomolecular targets associated with particular disease processes, the most important of which involve proteins [7]. Classical drug therapy follows the “one disease–one drug–one target” paradigm, trying to identify one “druggable” protein associated with the disease that can be targeted by the drug. In reality, drugs are rarely so selective [8], and the intricate network (or “graph”) of interactions between proteins produces a “network effect” that can interfere with multiple biological processes, like one falling domino knocking over an entire row.

Protein-protein interactions (PPIs) [9] are considered the next generation of therapeutic targets, and most pharmaceutical companies have now extended their drug discovery programs to PPIs [10]. To leverage the large amounts of molecular interaction data produced by modern high-throughput technologies, machine learning (ML) is becoming increasingly prominent. However, unlike images and audio signals, where ML has achieved breakthrough results in the past decade, network-structured data require a different family of methods called “graph ML”.

Graph ML, also known as “graph representation learning” or “geometric deep learning”, is a recent hot topic in machine learning, which I have covered extensively in my Towards Data Science blog. Typical graph ML architectures, called graph neural networks, implement some form of message passing on the graph, allowing different nodes to exchange information. In the simplest formulation, message passing takes the form of linear diffusion or a “random walk” on the graph [11].
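To make the idea of diffusion-style message passing concrete, here is a minimal sketch in plain Python. The four-node graph and the function name `diffuse` are made up for illustration. Each step replaces a node's value with the average of its neighbours' values, which corresponds to applying the row-normalised random-walk operator to the signal; real graph neural networks add learnable weights and nonlinearities on top of this.

```python
# Toy undirected graph as an adjacency list (hypothetical example, not real data).
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}


def diffuse(signal, graph):
    """One message-passing step: every node takes the mean of its neighbours' values."""
    return {
        node: sum(signal[nb] for nb in neighbours) / len(neighbours)
        for node, neighbours in graph.items()
    }


# Start with all the "mass" on node A and watch it spread over the graph.
signal = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}
for step in range(3):
    signal = diffuse(signal, graph)
    print(step + 1, {node: round(value, 3) for node, value in signal.items()})
```

After a few steps the initial spike on one node has leaked to nodes several hops away, which is the kind of network-wide spreading the rest of this post relies on.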

In a paper published in 2019 in the Nature journal Scientific Reports, we applied graph ML to hunt for anti-cancer molecules in food using protein-protein and drug-protein interaction graphs [12]. The drug-protein interactions were represented as signals on the PPI graph, and a learnable diffusion process was applied to model the network-wide effect of a drug.

We used a training set of nearly 2000 clinically approved drugs, about 10% of which were labeled as anti-cancer, to train a classifier that predicts the anti-cancer drug-likeness of new molecules from the way they interact with the PPI graph. We then fed the trained classifier with about 8000 food-based molecules for which protein interactions are known. Our model identified over a hundred anti-cancer drug-like candidates, which we called “cancer-beating molecules”. The main advantage of using an ML approach is that such molecules can be discovered automatically by exploiting massive openly available datasets.
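The overall shape of this pipeline can be sketched on toy data. This is not the model from the paper: the six-protein graph, the target sets, the labels, the fixed mixing parameter `alpha`, and the logistic-regression classifier are all stand-ins chosen to keep the example short. Each molecule is encoded as an indicator signal on the proteins it binds, the signal is diffused over the PPI graph, and a classifier trained on labeled drugs then scores unseen food molecules.

```python
import math

# Hypothetical toy PPI graph: protein -> proteins it interacts with.
ppi = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2", "P4"],
    "P4": ["P3", "P5"],
    "P5": ["P4", "P6"],
    "P6": ["P5"],
}
proteins = sorted(ppi)


def diffused_profile(targets, steps=2, alpha=0.5):
    """Spread a molecule's target indicator over the PPI graph.

    Each step mixes a protein's own value with the mean of its neighbours
    (a lazy random walk). alpha is fixed here for simplicity; in a learnable
    diffusion it would be fitted to the data.
    """
    x = {p: 1.0 if p in targets else 0.0 for p in proteins}
    for _ in range(steps):
        x = {
            p: (1 - alpha) * x[p] + alpha * sum(x[q] for q in ppi[p]) / len(ppi[p])
            for p in proteins
        }
    return [x[p] for p in proteins]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=500):
    """Logistic regression fitted by stochastic gradient descent (stand-in classifier)."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def score(model, targets):
    """Predicted anti-cancer drug-likeness of a molecule given its protein targets."""
    w, b = model
    x = diffused_profile(targets)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Made-up training drugs: (protein targets, 1 = anti-cancer, 0 = other).
drugs = [
    ({"P1"}, 1), ({"P2"}, 1), ({"P1", "P3"}, 1),
    ({"P5"}, 0), ({"P6"}, 0), ({"P5", "P6"}, 0),
]
model = train_logistic([diffused_profile(t) for t, _ in drugs], [y for _, y in drugs])

# Made-up food molecules, ranked from most to least drug-like.
food_molecules = {"molecule_x": {"P2", "P3"}, "molecule_y": {"P6"}}
ranking = sorted(food_molecules, key=lambda m: score(model, food_molecules[m]), reverse=True)
print(ranking)
```

The point of diffusing before classifying is that two molecules can look similar to the classifier even when they bind different proteins, as long as those proteins sit in the same neighbourhood of the network.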

We then resorted again to machine learning, using natural language processing (NLP) techniques to mine the trove of medical literature for experimental evidence of the anti-cancer effects of the identified molecules [13]. We also had to rule out compounds with excessive toxicity. This served as a first validation step, relying on in vitro and in vivo experiments reported in the literature.

The good stuff

A key limitation of existing literature on food-based compounds is its focus on specific compounds, such as antioxidants, in isolation. You have certainly seen foods hailed as being rich in antioxidants and often marketed under the label of “superfoods”. Yet, while the regular consumption of such foods can reduce the risk of cancer formation (“oncogenesis”), the antiproliferative agents contained therein do not appear to consistently confer the same level of benefit when acting individually [14].

This phenomenon is similar to the concurrent use of multiple drug substances in medical practice (technically known as “polypharmacy”), which can lead both to undesired side effects and to a synergistic action more powerful than that of each drug on its own [15]. The anti-cancer effect of some foods is thus the result of a combination of biologically active substances and is determined by both their antagonistic and synergistic actions and the way in which these simultaneously act on different oncogenic biological mechanisms.

Tea and citrus fruits are examples of foods fulfilling both of these conditions: first, they contain multiple anti-cancer drug-like compounds identified by our ML model and confirmed from medical literature, and second, these compounds exert complementary anti-cancer effects [16].
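As a rough illustration of these two conditions, a food can be described as a mapping from its candidate compounds to the mechanisms they act on, and then checked for both the number of compounds and the diversity of mechanisms. In the sketch below, the tea entries follow footnote [16]; `example_food`, `compound_z`, and the thresholds of two compounds and two mechanisms are assumptions made purely for contrast, not values from our study.

```python
# Compound -> mechanisms. The tea entries follow footnote [16];
# "example_food" and "compound_z" are invented for contrast.
profiles = {
    "tea": {
        "epigallocatechin gallate": {"protection against oxidative DNA damage"},
        "lupeol": {"suppression of inflammation"},
        "procyanidin": {"apoptosis induction", "cell cycle arrest"},
    },
    "example_food": {
        "compound_z": {"suppression of inflammation"},
    },
}


def summarise(profile):
    """Return (number of candidate compounds, number of distinct mechanisms)."""
    mechanisms = set().union(*profile.values())
    return len(profile), len(mechanisms)


def meets_both_conditions(profile, min_compounds=2, min_mechanisms=2):
    """Several candidate compounds AND complementary mechanisms (thresholds are assumptions)."""
    n_compounds, n_mechanisms = summarise(profile)
    return n_compounds >= min_compounds and n_mechanisms >= min_mechanisms


for food, profile in profiles.items():
    print(food, summarise(profile), meets_both_conditions(profile))
```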

With this understanding, we constructed the anti-cancer molecular profiles of over 250 different food ingredients, highlighting prominent champions that we called “hyperfoods”. Besides the aforementioned tea and citrus fruits, cabbage, celery, and sage are rather common, cheap, and broadly available hyperfoods. In a sense, this comes as no surprise, as many of these foods are advocated as healthy choices by nutrition experts and there is overwhelming evidence of their health benefits.

Do not, however, rush to make a cabbage smoothie after reading this post, as you will likely be disappointed: chances are that it will taste awful. We are still missing a final step of putting the hyperfood ingredients together into recipes that taste and look great. This is where we enlisted the help of Jozef Youssef, the founder and Chef Patron of Kitchen Theory [17], who created simple, affordable, and delicious recipes using our hyperfood ingredients. In fact, hyperfoods are not only for Michelin restaurant-goers and fans of haute cuisine: many simple, traditional, everyday recipes are already packed with cancer-beating ingredients.

Next steps

First, we need to bear in mind that cooking involves physical and chemical processes that may change the molecular content of food. If, for example, we fry our ingredients at high temperatures, many of the cancer-beating molecules are likely to disappear. We can represent food preparation as a computational graph with cooking transformations modeled as edges, and optimise it by choosing the operations that best preserve the anti-cancer molecular composition.
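One simple way to picture such a computational graph is sketched below. The preparation states, the operations, and above all the retention fractions are invented for illustration and are not measured values. Nodes are states of the ingredient, edges are cooking operations annotated with the fraction of beneficial molecules assumed to survive, and the best plan is the path whose product of retained fractions is largest.

```python
# Hypothetical preparation graph: state -> [(operation, next state, retained fraction)].
# The retention numbers are invented for illustration, not measurements.
steps = {
    "raw": [("chop", "chopped", 0.95)],
    "chopped": [
        ("fry", "cooked", 0.40),
        ("steam", "cooked", 0.80),
        ("blanch", "blanched", 0.90),
    ],
    "blanched": [("saute", "cooked", 0.85)],
    "cooked": [],
}


def best_plan(state, goal):
    """Return (retained fraction, list of operations) for the best path.

    Assumes the graph is acyclic; returns (0.0, None) if the goal is unreachable.
    """
    if state == goal:
        return 1.0, []
    best = (0.0, None)
    for operation, next_state, retained in steps[state]:
        fraction, plan = best_plan(next_state, goal)
        if plan is not None and retained * fraction > best[0]:
            best = (retained * fraction, [operation] + plan)
    return best


fraction, plan = best_plan("raw", "cooked")
print(plan, round(fraction, 3))
```

With these made-up numbers, steaming narrowly beats blanching followed by sautéing, and both are far ahead of frying. A realistic version would track individual molecules rather than a single aggregate fraction.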

Second, in addition to cancer-beating molecules, food also contains molecules giving it taste, smell, and characteristic flavour [18]. Many foods share multiple such components: you might be surprised to learn, for example, that garlic and tea have more than a hundred flavour molecules in common. The secret of food pairing is to combine ingredients with similar or complementary flavour molecular profiles [19]. What was thought to be some kind of “black magic” of top chefs can now be automated: we can potentially use graph ML to generate recipes that strike the optimal balance between health, taste, and maybe even aesthetics. It is not impossible that one day our computer-generated recipes will even challenge Michelin-star chefs [20].
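The simplest version of the food-pairing idea can be written in a few lines: treat each ingredient as a set of flavour molecules and measure the overlap between sets. The ingredient and molecule names below are placeholders (real profiles could come from a resource such as FlavorDB [18]), and the Jaccard index is just one reasonable choice of similarity, not necessarily the one used in the food-pairing literature.

```python
from itertools import combinations

# Hypothetical flavour profiles: ingredient -> set of flavour molecule names.
flavours = {
    "ingredient_a": {"m1", "m2", "m3", "m4"},
    "ingredient_b": {"m2", "m3", "m4", "m5"},
    "ingredient_c": {"m6", "m7"},
}


def shared(a, b):
    """Flavour molecules that two ingredients have in common."""
    return flavours[a] & flavours[b]


def jaccard(a, b):
    """Overlap of two flavour profiles: |intersection| / |union|."""
    union = flavours[a] | flavours[b]
    return len(shared(a, b)) / len(union) if union else 0.0


# Rank all ingredient pairs from most to least similar flavour profile.
pairs = sorted(
    combinations(sorted(flavours), 2),
    key=lambda pair: jaccard(*pair),
    reverse=True,
)
for a, b in pairs:
    print(a, b, round(jaccard(a, b), 2))
```

Viewing ingredients as nodes and shared molecules as weighted edges turns this table of overlaps into a flavour graph, which is the kind of structure graph ML methods operate on.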

Last but not least, when it comes to tastes, the only common truth is that de gustibus non est disputandum, as the Latin proverb goes. Recipe design must be highly personalised, taking into account one’s taste preferences as well as many other parameters such as dietary restrictions, genetics, disease history, and the gut microbiome. We envision a future where everyone will have a digital “food passport” storing personal nutritional data, so when you order food online or eat out, your meal will be optimised for your health and food profile.

Final thoughts

Hyperfoods is the first attempt to apply graph-based ML methods in order to predict the health effects of biologically active molecules in foods by modelling the “network effects” of their interactions with biomolecules in our body. The use of graph ML methods allows us to identify which foods contain ingredients that might work in a similar way to medical drugs and have the potential to prevent or beat diseases. While cancer is an important class of diseases, the same approach can be applied to discover foods that can help prevent neurodegenerative, cardiovascular, or viral diseases [21].

In the longer term, our ambition is to provide a quantum leap in how our food is “prescribed”, designed, and prepared, so that we can all live healthier, happier, and better lives.


[1] M. J. Prince et al., The burden of disease in older people and implications for health policy and practice (2015), Lancet 385:549–562.

[2] GBD 2017 Diet Collaborators, Health effects of dietary risks in 195 countries (2019), Lancet 393:1958–1972 found that 11 million deaths were attributable to dietary risk factors, of which 3 million are due to low intake of whole grains, and 2 million are due to low intake of fruits. For comparison, the WHO estimates that tobacco kills 8 million people each year.

[3] M. S. Donaldson, Nutrition and cancer: a review of the evidence for an anti-cancer diet (2004), Nutrition Journal 3, estimates that 30%-40% of all cancers can be prevented by lifestyle and dietary measures alone and suggests that adhering to proposed dietary guidelines (that include recommended intake of cruciferous vegetables, flax seeds, and fruits) is likely to lead to a 60%–70% decrease in breast, colorectal, and prostate cancers and a 40%–50% decrease in lung cancer, along with similar reductions in cancers at other sites.

[4] Experimental studies suggest that these molecules participate in multiple mechanisms contributing to the prevention or treatment of various cancers, including regulating the activity of inflammatory mediators and growth factors, suppressing cancer cell survival, proliferation, and invasion, as well as angiogenesis and metastasis. See e.g. A. K. Singh et al., Emerging importance of dietary phytochemicals in fight against cancer: Role in targeting cancer stem cells (2017), Critical Reviews in Food Science and Nutrition 57: 3449–3463 or R. Baena Ruiz and P. Salinas Hernandez, Cancer chemoprevention by dietary phytochemicals: Epidemiological evidence (2016), Maturitas 94:13–19.

[5] This analogy to “dark matter” in physics appeared in R. R. da Silva et al., Illuminating the dark matter in metabolomics (2015), PNAS 112(41):12549–12550 as well as in A.-L. Barabási et al., The unmapped chemical complexity of our diet (2019), Nature Food 1:33–37.

[6] Many drugs are derived from plants, which is often reflected in their names: for example, ephedrine takes its name from the plant genus Ephedra, atropine from the belladonna plant (Atropa belladonna), and acetylsalicylic acid, commonly known as aspirin, from the bark of the willow tree (Salix alba), whose medicinal properties have been known since antiquity. In oncological medicine, prominent examples are the analogs of the natural compound camptothecin, which is extracted from the Camptotheca acuminata tree and is also well known in traditional medicine. Four such molecules (topotecan, irinotecan, belotecan, and trastuzumab deruxtecan) are widely used in cancer chemotherapy. D. J. Newman and G. M. Cragg, Natural products as sources of new drugs from 1981 to 2014 (2016), Journal of Natural Products 79(3):629–661 report that almost half of anti-cancer therapies are derived from natural products.

[7] Proteins are the powerhouses of living cells and are literally the “molecules of life”, as we are currently unaware of any life forms that are not protein-based. In our body, proteins are responsible for catalysing chemical reactions (enzymes), giving structure to tissues (collagen), transporting oxygen (haemoglobin), and defending us against pathogens (antibodies), to mention a few. Proteins are synthesised in cells by a special chemical mechanism that reads out the genetic code and translates it into a sequence of amino acids: short sequences of DNA nucleotides called “codons” encode the 20 proteinogenic amino acids. We have around 20 thousand proteins encoded in our genome that interact with each other and with other molecules. Because of the crucial role of proteins in biochemical processes, they are used as drug targets: typical drugs are small molecules designed in such a way that they can chemically attach themselves (“bind”) to specific proteins.

[8] It is estimated that a drug molecule can bind to nearly 50 proteins, see B. Srinivasan et al., Experimental validation of FINDSITE(comb) virtual ligand screening results for eight proteins yields novel nanomolar and micromolar binders (2014), Journal of Cheminformatics 6:16, so the one drug–one target assumption is very far from reality.

[9] Protein-protein interaction networks are one example of the graphs exploited in “network medicine”, a term coined and popularised in A.-L. Barabási, Network medicine: from obesity to the “diseasome” (2007), New England Journal of Medicine 357:404–407.

[10] A. Mullard, Protein-protein interaction inhibitors get into the groove (2012), Nature Reviews Drug Discovery 11(3):173–175 calls PPI targets an “unmined gold reserve”. Macrocycles are one example of drug-like small molecules that disrupt protein-protein interactions and accelerate cancer cell death. Despite their therapeutic relevance and untapped abundance, their adoption is hindered by technical hurdles, see T. L. Nero et al., Oncogenic protein interfaces: small molecules, big challenges (2014), Nature Reviews Cancer 14(4):248–262 and D. E. Scott et al., Small molecules, big targets: drug discovery faces the protein–protein interaction challenge (2016), Nature Reviews Drug Discovery 15:533–550.

[11] See M. M. Bronstein et al. Geometric deep learning: going beyond Euclidean data (2017), IEEE Signal Processing Magazine 34(4):18–42 and my blog posts on this topic.

[12] K. Veselkov et al., HyperFoods: Machine intelligent mapping of cancer-beating molecules in foods (2019), Scientific Reports 9.

[13] The amount of scientific literature on cancer is enormous, with one paper published every 3 to 4 minutes on average, making it impossible to digest even for the most industrious human scientists. We used an NLP system for named entity recognition developed earlier by D. Galea et al., Exploiting and assessing multi-source data for supervised biomedical named entity recognition (2018), Bioinformatics 34(14):2474–2482. The supplementary materials to our paper provide a detailed list of compounds found in food and the experimental evidence of their anti-cancer effects.

[14] Apples are a good example of why one has to consider the antagonistic or synergistic effects of multiple compounds: apple extracts contain bioactive compounds that have been shown to inhibit tumor cell growth in vitro. Yet, the effect varies greatly depending on whether the peel is preserved: apples with peel inhibit colon cancer cell proliferation by 43% vs only 29% for apples without peel, see M. V. Eberhardt et al. Antioxidant activity of fresh apples (2000), Nature 405:903–904.

[15] M. Zitnik et al., Modeling polypharmacy side effects with graph convolutional networks (2018), Bioinformatics 34(13):457–466, applied graph ML to protein-protein and protein-drug interaction graphs in order to predict the side effects of polypharmacy.

[16] Tea is a rich source of catechins (epigallocatechin gallate), terpenoids (lupeol), and tannins (procyanidin), which exert strong and complementary anti-cancer effects by protecting against DNA damage induced by reactive oxygen species, suppressing inflammation, and inducing apoptosis and cancer cell cycle arrest, respectively. Several recent meta-analyses demonstrated that the consumption of green tea leads to delayed cancer onset, lower rates of cancer recurrence after treatment, and increased rates of long-term cancer remission, see V. Gianfredi et al., Green tea consumption and risk of breast cancer and recurrence: A systematic review and meta-analysis of observational studies (2018), Nutrients 10, and Y. Guo et al., Green tea and the risk of prostate cancer: A systematic review and meta-analysis (2017), Medicine 96(13). The second example, sweet orange, is a citrus fruit containing the compounds didymin (citrus flavonoid), obacunone (limonoid glucose), and β-elemene, known for their strong antioxidant, pro-apoptotic, and chemosensitization effects. The inverse association between citrus fruit intake and incidence of cancer was shown in S. Cirmi et al., Anticancer potential of citrus juices and their extracts: A systematic review of both preclinical and clinical studies (2017), Frontiers in Pharmacology 8.

[17] I first met Kirill at the World Economic Forum meeting in 2015. We quickly became friends, in part due to our shared Russian background, and then colleagues once I joined Imperial College in 2018. Kirill met Jozef at the Future of Computing and Food conference in 2018. Jozef is the Chef Patron of Kitchen Theory, a high-end restaurant that looks like a chemical lab and where guests are invited to participate in psychophysical experiments such as eating jellyfish while listening to crunchy sounds (this changes the perception of the food’s taste).

[18] FlavorDB is an online resource allowing us to explore the content of over 25 thousand flavour molecules in nearly 1000 foods.

[19] Y.-Y. Ahn et al., Flavor network and the principles of food pairing (2011), Scientific Reports 1 showed, by analysing the graph of shared flavour molecules, that Western cuisines tend to use ingredient pairs with common flavour compounds, whereas East Asian cuisines tend to avoid compound-sharing ingredients.

[20] We are not the first to work on automatic recipe generation: multiple attempts have been made, the most notable being the IBM Cognitive Cooking project. However, we are the first, to the best of my knowledge, to go beyond flavour and try to account for bioactive molecules.

[21] The use of graph ML appears very promising in repurposing existing drugs for new diseases (“drug repositioning”), which can dramatically cut the cost and time of developing new therapies. As part of the projects DRUGS and CoronaAI, we are currently using the Vodafone distributed computing platform DreamLab to find antiviral compounds in food and combinations of existing drugs that could have a therapeutic effect on cancer and COVID-19. I also serve as a scientific advisor to the pharmaceutical startup Relation Therapeutics that, in collaboration with Mila and the Gates Foundation, is combining graph ML with active learning techniques to find combinatorial therapy against COVID-19.

Original post: https://towardsdatascience.com/hyperfoods-9582e5d9a8e4
