The Machine Learning (ML) hype is real. The term is appearing everywhere. You hear phrases like “AI is the new electricity.” The anti-hype is also starting to gain traction. Some still believe that any ML method is just a black box. One thing is clear: the hype is due to the remarkable and hard-to-dismiss impacts of ML on fields ranging from computational medicine to finance. The field of metal casting has not yet felt the impact. In fact, the first paper to extend the application of ML to the field of alloy solidification modeling was published only very recently [1]. This work is intended to shape a productive debate in the casting simulation community about how to utilize ML capabilities in the field of metal casting.
Why Machine Learning for Metal Casting?
Metal casting simulations have so far had to rely entirely on mesh-based numerical methods such as the finite-element or finite-difference method. Computer simulations based on these methods have helped to improve the quality of castings to the extent that nowadays, using casting simulation software is an everyday practice for casting engineers. Despite the positive impact that those simulations have had, simulating realistic solidification models with a spatial resolution that is high enough to fully resolve the physical phenomena incorporated in the simulated model is still computationally intractable. Therefore, engineers are left with no choice but to rely on models that make oversimplifying assumptions. As an example, they typically use models that disregard melt convection entirely or account for it by just increasing the heat conductivity. As another example, dynamic coupling of solidification calculations to thermodynamic databases, which is necessary for realistic simulation of multicomponent alloys, remains beyond what can be achieved in an industrial simulation. Even the oversimplified models can only be simulated on meshes that are typically not fine enough, and therefore the simulation results are typically not fully resolved. Despite all these limitations, running simulations still takes a few hours to a few days and requires expensive hardware.
The underlying reason for all the above shortcomings is surprisingly simple: current simulations use numerical differentiation to compute the derivatives in an equation, and numerical differentiation suffers from discretization errors. These errors grow with the size of the differentiation step (mesh size or time step), so keeping them small requires a relatively small step size. That, in turn, increases the computational cost of the simulations and limits the size of the simulation domain.
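The link between step size and discretization error can be illustrated with a minimal sketch. It is not taken from any casting code: the test function (sine), the evaluation point, and the function names are assumptions chosen only for illustration. For a first-order scheme such as the forward difference used here, halving the step roughly halves the error.

```python
import math


def forward_difference(f, x, h):
    """First-order forward-difference approximation of f'(x) with step h."""
    return (f(x + h) - f(x)) / h


def discretization_error(h, x=1.0):
    """Absolute error of the forward difference for f = sin at the point x.

    The exact derivative of sin is cos, so the error can be measured directly.
    """
    return abs(forward_difference(math.sin, x, h) - math.cos(x))


if __name__ == "__main__":
    # Illustrative step sizes (assumed values); each one halves the previous.
    for h in (0.1, 0.05, 0.025):
        print(f"h = {h:<6} error = {discretization_error(h):.2e}")
```

Higher-order schemes reduce the error faster as the step shrinks, but the trade-off between step size and accuracy remains.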
In addition to the problems associated with computational cost, current casting simulation methods have reached maturity over the past few decades, and it is unlikely that a radically new technique, one that addresses problems known to be very challenging in solidification simulations, will emerge from them. In other words, if we in the casting simulation community use only the methods that have been tried in the past decades, revolutionary progress in the field will be improbable.
Performing fast, fully resolved, and large-scale casting simulations is an intimidating goal. In the following, I discuss why we can reasonably expect to achieve that goal by adopting modeling concepts initially developed in the ML domain, more specifically, a method known as deep neural networks.
Theory-Trained Neural Networks (TTNs)
Machine learning owes its current popularity mainly to deep neural networks. These networks are computing systems that consist of several simple but highly interconnected processing elements called neurons, which map an input array to one or multiple outputs. Each neuron has a bias and connection weights, the values of which are determined in a process called training. After a network is trained, it can be used to make predictions on new input data.
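The description above can be sketched in a few lines. This is a minimal illustration with assumptions: the layer sizes, the hand-picked weights and biases in `EXAMPLE_LAYERS`, and the tanh activation are all hypothetical; in practice the parameter values come out of training, and the output layer is often left linear.

```python
import math


def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus a bias, passed through tanh."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all see the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def network(inputs, layers):
    """Feed the input array through each (weight_rows, biases) layer in turn."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hypothetical parameters: 2 inputs -> 3 hidden neurons -> 1 output.
EXAMPLE_LAYERS = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]

if __name__ == "__main__":
    print(network([0.3, 0.7], EXAMPLE_LAYERS))
```

Once the weights and biases are fixed, a prediction is just this chain of multiplications, additions, and activation evaluations.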
Deep neural networks are now transforming fields such as speech recognition, computer vision, and computational medicine. Their application was recently extended to the field of alloy solidification modeling by the author. In a procedure termed theory-training, I used a theoretical (i.e., mathematical) solidification model to train neural networks for a solidification benchmark problem. Theory-trained Neural Networks (TTNs) do not need any prior knowledge of the solution of the governing equations or any external data for training. They self-train by relying on the ability of a neural network to learn the solution of partial differential equations (PDEs). In the deep learning literature, that ability is sometimes referred to by the term “solving PDEs”; I, however, prefer and use the term “learning the solution of PDEs” instead. The reason is that TTNs can predict the solution of a PDE without actually solving it; the term “solving PDEs” simply neglects that powerful capability. Paul Dirac, one of the most significant theoretical physicists of the 20th century, once said, “I consider that I understand an equation when I can predict the properties of its solutions, without actually solving it.” Since TTNs can predict the solution of an equation without actually solving it, it is reasonable to argue that TTNs have learned the equations they were trained on, and the term “learning the solution” correctly emphasizes that.
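The idea of training on an equation rather than on data can be sketched with a toy problem. The following is not the solidification model or the training procedure of the cited work; every choice in it is an assumption made for illustration: the ordinary differential equation du/dt = -u with u(0) = 1, the three-neuron tanh network, the starting parameters in `INITIAL`, and the use of a finite-difference gradient with step halving in place of backpropagation and a standard optimizer. The loss contains only the residual of the equation and its initial condition, so no solution data enter the training.

```python
import math

HIDDEN = 3                              # number of hidden neurons (assumed)
POINTS = [i / 10 for i in range(11)]    # collocation points in [0, 1]


def model(params, t):
    """Return (u, du/dt) of a one-hidden-layer tanh network at time t."""
    u, du = params[-1], 0.0             # last parameter is the output bias
    for k in range(HIDDEN):
        a, w, b = params[3 * k: 3 * k + 3]
        h = math.tanh(w * t + b)
        u += a * h
        du += a * w * (1.0 - h * h)     # exact derivative of a * tanh(w*t + b)
    return u, du


def loss(params):
    """Mean squared residual of du/dt + u = 0 plus a penalty for u(0) = 1."""
    residual = 0.0
    for t in POINTS:
        u, du = model(params, t)
        residual += (du + u) ** 2
    u0, _ = model(params, 0.0)
    return residual / len(POINTS) + (u0 - 1.0) ** 2


def gradient(params, eps=1e-6):
    """Central-difference gradient of the loss (stand-in for backpropagation)."""
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((loss(up) - loss(down)) / (2 * eps))
    return grad


def train(params, steps=200, rate=0.5):
    """Gradient descent; each step is halved until the loss decreases."""
    current = loss(params)
    for _ in range(steps):
        grad = gradient(params)
        step = rate
        while step > 1e-8:
            trial = [p - step * g for p, g in zip(params, grad)]
            trial_loss = loss(trial)
            if trial_loss < current:
                params, current = trial, trial_loss
                break
            step /= 2
    return params, current


# Hypothetical starting parameters: (a, w, b) per hidden neuron, then the bias.
INITIAL = [0.5, 1.0, 0.0, -0.5, 0.5, 0.1, 0.3, -1.0, -0.2, 0.0]

if __name__ == "__main__":
    trained, final_loss = train(list(INITIAL))
    print("loss before:", loss(INITIAL), "after:", final_loss)
    print("u(0.5) predicted:", model(trained, 0.5)[0], "exact:", math.exp(-0.5))
```

After training, evaluating `model` at any time t gives a prediction without any time marching, which is the sense in which the network has learned the solution rather than solved the equation.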
Before discussing the benefits that TTNs can potentially bring to the casting simulation community, I will touch on their two main advantages over a mesh-based method. Consider a finite-difference method and imagine we want to perform a d-dimensional simulation in a domain with length $L$ in each spatial direction and from time zero to $t_1$. To estimate the CPU time of that simulation, let me perform a first-order analysis. Because all the nodes in the mesh need to be scanned at all the time steps, the computational time $t_{cpu}$ is proportional to the product of the total number of time steps $N_t$ and the number of grid points in the simulation $N_x$:

$$t_{cpu} \sim N_t N_x \sim \frac{t_1}{\Delta t}\left(\frac{L}{\Delta x}\right)^{d} \sim t_1 L^{d} (\Delta x)^{-(2+d)}$$

The last relation follows from the stability limit of an explicit time-marching scheme in a diffusion-controlled system, which requires $\Delta t \sim (\Delta x)^2$. The fact that d appears in the exponent causes two problems, which are discussed next.
The first problem is that the appearance of d in the exponent makes fully resolved simulations (i.e., simulations on a sufficiently fine mesh) computationally very expensive. Imagine d = 3 (i.e., three-dimensional simulations). Refining the mesh by just a factor of two will increase the computational time by a factor of thirty-two. In other words, a simulation that was taking one day will now take about one month. The second problem is that the appearance of d in the exponent severely restricts large-scale simulations (simulations on the scale of industrial parts produced with casting). Increasing the size of the domain by a factor of only two in each direction will increase CPU time by a factor of eight. Because of these two problems, a large-scale, fully resolved simulation is typically impossible in practice and never fast. Again, these problems arise because the number of dimensions d appears in the exponent, and they are linked to a problem referred to as the curse of dimensionality in fields such as finance.
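The two factors quoted above follow directly from the scaling of CPU time with the domain length to the power d and the mesh spacing to the power -(2 + d). A small sketch makes the arithmetic explicit; the function name and its parameters are assumptions introduced only for this illustration.

```python
def relative_cpu_time(refine=1.0, enlarge=1.0, dims=3):
    """Cost multiplier implied by t_cpu ~ t1 * L**d * dx**-(2 + d).

    refine:  factor by which the mesh spacing dx is divided
    enlarge: factor by which the domain length L is multiplied
    dims:    number of spatial dimensions d
    """
    return enlarge ** dims * refine ** (2 + dims)


if __name__ == "__main__":
    print(relative_cpu_time(refine=2))             # halving dx in 3D
    print(relative_cpu_time(enlarge=2))            # doubling L in 3D
    print(relative_cpu_time(refine=2, enlarge=2))  # both at once
```

Applying both changes at once multiplies the two factors, which is why large-scale and fully resolved simulations are so hard to combine on a mesh.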
In our first-order analysis for estimating the computational time, d appeared in the exponent (and caused the problems discussed above) because the equations were discretized. As TTNs do not discretize the equations, they can be expected not to have those problems. In other words, one can expect to train networks that can simulate a phenomenon with full resolution and at large scale. This appears to be the first main advantage of TTNs compared to a mesh-based method. That expectation is further supported by the fact that, in fields such as mathematical finance, the curse of dimensionality has been successfully overcome by using deep neural networks (instead of mesh-based methods).
The second advantage of TTNs compared to a mesh-based method stems from the fact that, as mentioned earlier, TTNs can predict the solution of an equation without actually solving it. This reduces the computational cost associated with predictions to nearly zero. In other words, regardless of the computational cost associated with training TTNs, which as explained above can be expected to be entirely tractable even for a fully resolved and large-scale simulation, their predictions are almost instantaneous. This opens the possibility for having fast, fully resolved, large-scale simulations.
Properly utilizing the above two advantages (overcoming the problems linked to the curse of dimensionality and having instantaneous predictions) can potentially re-invent casting simulations by enabling us to perform fast (almost instantaneous), fully resolved (i.e., equivalent to a mesh-independent simulation in current mesh-based methods), and large-scale (i.e., at the scale of the actual part and not smaller) casting simulations. In practice, this can result in having a network that can instantaneously predict, for example, hard-to-resolve defects, such as channel segregates (i.e., freckles, or A-segregates in steel casting) or porosity, regardless of the size of the simulation domain.
Although utilizing the above advantages of TTNs in the field seems very promising, actually achieving those goals is a challenging task, mainly because of difficulties in training TTNs. For example, as I have shown in [1], ensuring something as fundamental as non-negative predicted solid fractions turns out to be a non-trivial task. A few of the outstanding research issues are:
- comparing the performance of different optimizers
- understanding the role of network depth and width and the size of the training dataset in the performance of a TTN
- theory-training using solidification models that incorporate melt convection
[1] Torabi Rad, M., Viardin, A., Schmitz, G. J., and Apel, M. “Theory-training deep neural networks for an alloy solidification benchmark problem.” Computational Materials Science 180 (2020) 109687.