How I became an AI Consultant: Interview Questions & Answers

In the past few weeks, I interviewed at Accenture, Wavestone, Google and IBM for an AI consultant position, and I enjoyed it a lot. As such, I decided to write something about this positive experience. I will discuss how I prepared, present the questions I was asked during the process and share some tips at the end to help you get hired.


I always wanted to work for an AI-driven company; however, I realized that I enjoyed the strategic aspect more than coding. I appreciate having to understand the client's strategy, the challenges and opportunities the company faces, and the client's data & analytics capabilities. Ideally, by combining these with an understanding of state-of-the-art AI, a consultant can identify the most important AI initiatives for the company. Still, every AI consultant must understand all steps of an AI project. Since I also have a full-time job, it took me 2–3 months in total to prepare. I received three offers at the end of this process.

Before getting into the questions, I recommend that everyone interested in becoming an AI consultant understand the elements mentioned below, because you will be challenged on them:

  • Ability to solve complex business challenges using analytic algorithms and AI
  • Ability to design, build and deploy predictive and prescriptive models using statistical modeling, machine learning, and optimization
  • Ability to use structured decision-making to complete projects.
  • Ability to manage an entire ML project from business issue identification, data audit to model maintenance in production.

Beyond the technological aspect, I was told that the real skills AI consultants need are the ability to read between the lines when a client asks for one thing but means another, and the ability to think strategically about data (how to build solid data network effects, how to create barriers to entry thanks to Machine Learning, etc.).

The interview — Part I (Machine Learning)

My interview was divided into three parts: Machine Learning, AI Strategy and general knowledge. Below is the list of the questions I had to answer.

1. What is the difference between Regression and classification ML techniques?
Answer: Both regression and classification are supervised machine learning techniques. In supervised learning, the model is trained on a labelled data set: during training we provide the correct labels, and the algorithm tries to learn the mapping from input to output. If the labels are discrete values (e.g. A, B), it is a classification problem; if the labels are continuous values (e.g. 1.23, 1.333), it is a regression problem.
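As a minimal illustration of the distinction, the hypothetical helper below makes the label-type rule concrete (it is not a real library function, just a sketch):

```python
# Hypothetical helper: the task type follows from the label type.
def task_type(labels):
    """Return 'classification' for discrete labels (e.g. 'A', 'B'),
    'regression' for continuous ones (e.g. 1.23, 1.333)."""
    if all(isinstance(y, (bool, int, str)) for y in labels):
        return "classification"
    return "regression"

print(task_type(["A", "B", "A"]))   # classification
print(task_type([1.23, 1.333]))     # regression
```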

2. What are dropouts?
Answer: Dropout is a simple way to prevent a neural network from overfitting. It consists of randomly dropping out some of the units in the network during training. It is similar to natural reproduction, where nature produces offspring by combining distinct genes (dropping out others) rather than strengthening their co-adaptation.
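A minimal sketch of (inverted) dropout on a list of activations, assuming a drop probability `p`; real frameworks apply this per layer, during training only:

```python
import random

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    scaling the survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return list(activations)
    return [0.0 if random.random() < p else a / (1.0 - p) for a in activations]

random.seed(0)
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5))  # some units zeroed, survivors doubled
```

At inference time (`training=False`) the function is the identity, which is exactly how dropout layers behave in evaluation mode.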

3. What is the Hidden Markov Model?
Answer: Hidden Markov Models (HMMs) are a class of probabilistic graphical model that allows us to predict a sequence of unknown (hidden) variables from a set of observed variables. A simple example of an HMM is predicting the weather (hidden variable) based on the type of clothes that someone wears (observed).
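The weather/clothes example can be sketched with the forward algorithm, which computes the probability of an observation sequence under an HMM. All probabilities below are made-up illustrative numbers:

```python
# Illustrative HMM: hidden weather states, observed clothing choices.
states = ["Sunny", "Rainy"]
start = {"Sunny": 0.6, "Rainy": 0.4}
trans = {"Sunny": {"Sunny": 0.7, "Rainy": 0.3},
         "Rainy": {"Sunny": 0.4, "Rainy": 0.6}}
emit = {"Sunny": {"t-shirt": 0.8, "coat": 0.2},
        "Rainy": {"t-shirt": 0.1, "coat": 0.9}}

def forward(observations):
    """Forward algorithm: P(observation sequence) under the HMM."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
                 for s in states}
    return sum(alpha.values())

print(round(forward(["t-shirt", "coat"]), 4))  # 0.2216
```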

4. What is model accuracy and model performance?
Answer: Accuracy is just one measure of model performance. Model performance describes how well a model predicts on a given dataset, measured through one or more metrics; accuracy is a performance metric for classification models that relates the number of correctly classified examples to the total number of examples. Depending on the problem (e.g. imbalanced classes), other performance metrics such as precision, recall, or F1 score may be more informative than accuracy.

5. What do you mean by Overfitting and Underfitting algorithms?
Answer: Overfitting and underfitting are both responsible for poor performance.
An overfitted model gives a good performance on the training data but generalizes poorly to other data.
An underfitted model gives a poor performance even on the training data and, consequently, also generalizes poorly. Both overfitted and underfitted models show bad performance on new data compared to well-fitted models.

6. Which data augmentation techniques would you prefer for an object recognition problem?
Answer: Cropping, changing luminosity, horizontal flipping, rescaling, and zooming. Deep learning models are very data-hungry and require a lot of examples to train well. Data augmentation lets the model see objects from various angles and under various conditions without collecting new data, so I usually rely on these techniques.
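As a toy sketch, here is horizontal flipping on an image represented as a list of rows; real pipelines operate on tensors, but the idea is the same label-preserving transform:

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left-to-right.
    A flipped photo of a cat is still a cat, so the label is preserved."""
    return [row[::-1] for row in image]

img = [[1, 2, 3],
       [4, 5, 6]]
print(horizontal_flip(img))  # [[3, 2, 1], [6, 5, 4]]
```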

7. What are the different types of Machine Learning?
Answer: Supervised learning (learning from labelled examples), unsupervised learning (finding structure in unlabelled data), and reinforcement learning (learning by interacting with an environment and receiving rewards). Semi-supervised learning is sometimes added as a hybrid of the first two.

8. What are hyperparameters in Deep Neural Networks?
Answer: Hyperparameters are variables that define the structure of the network (for example, the number of hidden layers) and how it is trained (for example, the learning rate). Unlike model parameters, they are set before training rather than learned from the data.
Examples of hyperparameters:
Learning rate, maximum tree depth, number of hidden layers, …
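A small sketch of how such hyperparameters are enumerated for tuning, e.g. in a grid search; the names and values below are illustrative assumptions, not recommendations:

```python
import itertools

# Illustrative hyperparameter grid.
grid = {"learning_rate": [0.01, 0.1], "hidden_layers": [1, 2, 3]}

# Cartesian product of all values -> one dict per candidate configuration.
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
print(len(configs))  # 6 candidate configurations to train and compare
```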

9. How does data overfitting occur and how can it be fixed?
Answer: Overfitting occurs when a statistical model or machine learning algorithm captures the noise of the data rather than the underlying pattern. This causes the algorithm to show low bias but high variance in its outcomes.

10. How can you prevent Overfitting (name and explain different techniques)?
Answer: First of all, you should consider model selection (choosing a less complex model).

Secondly, it is important to know that cross-validation does not by itself prevent overfitting; however, it can detect it. Cross-validation splits the training data into multiple mini train-test splits, which can then be used to tune your model. Feeding more data to the machine learning model can also help, although collecting more data is not always possible.

Many times, the data set contains irrelevant features or predictor variables that are not needed for analysis. Such features only increase the complexity of the model, thus leading to possibilities of data overfitting. Therefore, such redundant variables must be removed.

A machine learning model is trained iteratively, which allows us to check how well each iteration performs. After a certain number of iterations, however, the model's performance starts to saturate, and further training results in overfitting; one must therefore know where to stop. This is achieved through early stopping.

Regularization: Regularization can be done in a number of ways; the method depends on the type of learner. For example, pruning is performed on decision trees, the dropout technique is used on neural networks, and parameter tuning (e.g. L1/L2 penalties) can also be applied to address overfitting.

Ensemble models: Ensemble learning is a technique used to create multiple Machine Learning models, which are then combined to produce more accurate results. This is one of the best ways to prevent overfitting. An example is Random Forest, which uses an ensemble of decision trees to make more accurate predictions and to avoid overfitting. Ensemble models can also be used to avoid underfitting by combining several weak learners; however, since ensembling increases the resulting model's complexity, the result can itself be prone to overfitting, so you need to be careful.
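Early stopping, one of the techniques above, can be sketched in a few lines; the loss values and `patience` threshold below are illustrative:

```python
def train_with_early_stopping(validation_losses, patience=2):
    """Return the epoch to roll back to: stop once validation loss has not
    improved for `patience` consecutive epochs."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # performance has saturated; stop training
    return best_epoch

# Validation loss improves, saturates, then rises -> keep the epoch-3 model.
print(train_with_early_stopping([0.9, 0.6, 0.5, 0.45, 0.47, 0.50, 0.55]))  # 3
```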

11. What is the role of the Activation Function?
Answer: The activation function introduces non-linearity into the neural network, helping it learn more complex functions. Without it, the network would only be able to learn linear functions, i.e. linear combinations of its input data. Concretely, an activation function is the function in an artificial neuron that produces the neuron's output from its inputs.
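A tiny demonstration of why the non-linearity matters: two stacked linear layers collapse into a single linear map, while inserting a ReLU between them breaks that collapse (the weights below are arbitrary illustrative numbers):

```python
def linear(x, w, b):
    """A single linear unit: w * x + b."""
    return w * x + b

def relu(x):
    """A common activation function: max(0, x)."""
    return max(0.0, x)

x = -1.5
# Two stacked linear layers are still one linear map (here 6x + 2.5).
two_linear = linear(linear(x, 2.0, 1.0), 3.0, -0.5)
# A ReLU between the layers makes the composition non-linear.
with_relu = linear(relu(linear(x, 2.0, 1.0)), 3.0, -0.5)
print(two_linear, with_relu)  # -6.5 -0.5
```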

The interview — Part II (AI Strategy)

1. Before considering implementing AI-powered solutions, what should a company do?
Answer: Organizations need data strategies that cover not only the current state of business and technology but also future objectives. Three elements are key:

  • Strategic data acquisition
  • Unified data warehouses
  • Processes to determine data value

2. From a strategy point of view, how should a company start their first AI project?
Answer: According to Andrew Ng, it is key for the first few AI projects to succeed rather than to be the most valuable AI projects. They should be meaningful enough that the initial successes help the company gain familiarity with AI and convince C-level executives to invest in further AI projects, yet not so small that others would consider them trivial.

3. How would you describe the “virtuous circle of AI” and what company uses this strategy?
Answer: Product → Users → Data → Product. The best products attract the most users, the most users usually generate the most data, and with modern ML, more data makes the product better. Uber & Amazon (Alexa) have implemented this strategy.

4. Why don’t most AI PoCs reach production stage?


5. How would you help a company determine if they should make or buy an AI solution?
Answer: This decision depends mostly on data maturity, exclusive access to datasets, the presence of in-house data scientists, the strategic need to own the AI model, budget, and the type of project. There are also use cases where relying on an external service provider is simply not feasible (e.g. privacy concerns). If a company not only has exclusive access to valuable data but also bases its business model on this data, building an in-house AI model is likely the best strategy.

6. What will be the first steps of your work if a company wants to implement AI?
Answer: It would be wise to spend time identifying, together with the client, a relevant AI use case that would support the overall company objective. Furthermore, it is difficult to promise a working model without thoroughly evaluating the data; thus, it is also difficult to estimate the concrete business impact of an AI project without an initial data analysis.

7. How would you educate C-level executives and employees about AI?
Answer: It is essential to not only build an AI solution but also create a real data/AI culture in the company. The AI Strategy teams should consist of Product Managers, Data Scientists, Business Developers and end-users. Employee education can be accomplished through a combination of in-person and online courses in addition to hands-on experience.

8. What are the five core components of an AI strategy?
Answer: The core components of an AI Strategy are Data, Infrastructure, Algorithms, Skills, and Organization.

9. How would you help a small company improve its data strategy?
Answer: First of all, it is key to discover the data maturity level of the company. 
Often, companies already have the data they need to tackle business issues, but decision-makers simply don’t know how they can use this data to make key decisions. Companies can encourage a more comprehensive look at data by being specific about the business problems and opportunities they need to address. Furthermore, existing IT architectures may prevent the integration of siloed information, and managing unstructured data often remains beyond traditional IT capabilities.

10. If a company lacks data for a project, what would you recommend?
Answer: Obviously, it depends on the nature of the project. Before exploring technical solutions, it might be worth building a data-gathering mechanism in advance or relying on open-source data. A lot of data is available for ML, and some companies are ready to give it away. It can also be beneficial to form partnerships with other organizations in order to obtain relevant data.

In general, the simpler the machine learning algorithm, the better it will learn from small data sets. From an ML perspective, small data requires models that have low complexity (or high bias) to avoid overfitting the model to the data. I noticed that the Naive Bayes algorithm is among the simplest classifiers and as a result learns remarkably well from relatively small data sets.

You can also rely on other linear models and decision trees. Indeed, they can also perform relatively well on small data sets. Basically, simple models are able to learn from small data sets better than more complicated models (neural networks) since they are essentially trying to learn less.

For very small datasets, Bayesian methods are generally the best in class, although the results can be sensitive to your choice of prior. I think that the naive Bayes classifier and ridge regression are the best predictive models.

Transfer learning techniques should be considered when you do not have enough target training data, and the source and target domains have some similarities but are not identical.

Data augmentation means increasing the number of data points; in traditional row/column data, that means increasing the number of rows or objects. Finally, the Synthetic Minority Over-sampling Technique (SMOTE) and Modified-SMOTE are two techniques that generate synthetic data. Simply put, SMOTE takes minority-class data points and creates new data points that lie on the straight line between two nearest minority-class neighbours.
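SMOTE's core interpolation step can be sketched as follows. This is a simplified, illustrative version: the real algorithm also performs the nearest-neighbour search and repeats this step many times.

```python
import random

def smote_point(a, b, rng=random):
    """Create one synthetic minority sample on the segment between two
    nearest minority-class neighbours a and b (SMOTE's interpolation step)."""
    t = rng.random()  # random position along the line segment, in [0, 1)
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

random.seed(1)
print(smote_point([1.0, 2.0], [3.0, 6.0]))  # a new point between the two inputs
```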

The interview — Part III (General Knowledge)

At this point, I was asked only three questions:
1. Can you describe your latest AI project?
Answer (depends on your past projects): 
I had to develop a solution that could identify fake products based on pictures uploaded by consumers. As part of a team, we created a solution based on custom Deep Learning CNN models using transfer learning. During the project, our biggest issue was the lack of data and scalability for the company. We solved this issue by relying on data augmentation techniques and synthetic data.

2. Can you name new business models related to AI?
Answer: Some examples include smart cities & data monetization, predictive maintenance and Industry 4.0, federated learning and healthcare data, etc. [If you know others, please mention them in the comments]

3. What major evolution will impact AI development?
Answer: The increasing democratization of AI and the development of low-code/no-code AI tools, which could disrupt the way AI projects are managed.


  • An AI consultant is more than a data scientist. They must assist in designing an overall system from requirements to deployment, manage the project (project management skills) and help provide expertise with capacity building so that internal resources in the DevOps team(s) can support the AI infrastructure and code.
  • I recommend making sure you understand basic concepts such as bias-variance trade-off, overfitting, gradient descent, L1/L2 regularization, Bayes Theorem, etc.
  • Understand an AI project from the business point of view (KPIs, long term goals, scalability, etc.)
  • You’ll be challenged on your ability to structure your ideas and present complex concepts in an easy and precise way


