I tried ChatGPT, a variant of the GPT-3 language model that is specifically designed for generating human-like text in a conversational context. And of course, like most of us, I wondered: can an AI do my job? And can it do it better than me?
I have 2 years of experience working as a Data Analyst and an Analytics Engineer. According to BBC Science Focus, ChatGPT has ingested 570 GB of data. So who writes better SQL?
This game will be based on 3 LeetCode SQL challenges (one easy, two medium). I will write every solution first, and then send the exercise to ChatGPT and see which solution works best.
I will provide links to every challenge so you can try as well to beat ChatGPT.
Challenge 1 (Easy)
This challenge is called Customer placing the largest number of orders.
Here is the query I wrote:
WITH layer_1 AS ( SELECT customer_number, COUNT(DISTINCT order_number) AS order_number FROM orders GROUP BY customer_number ) SELECT customer_number FROM layer_1 ORDER BY order_number DESC LIMIT 1
It passed with a correct runtime:
Now let’s see how ChatGPT performs on this one.
Here is what ChatGPT answered:
ChatGPT even explains what it did. I don’t find the query very readable — one of the reasons why I love common table expressions is readability — but let’s see how it performs.
Impressively, it works, but it is slower than my result. Although I am glad to do better than ChatGPT, I would have loved to know how I can improve this query.
Challenge 2 (Medium)
The next challenge is called Tree Node.
The first query I wrote was this one:
# Write your MySQL query statement below WITH l1 AS ( SELECT t.id, c.id AS c_id, t.p_id FROM Tree t LEFT JOIN Tree c ON c.p_id = t.id ), l2 AS ( SELECT id, COUNT(DISTINCT c_id) AS nb_childrens, COUNT(DISTINCT p_id) AS nb_parents FROM l1 GROUP BY id ) SELECT id, CASE WHEN nb_childrens >0 AND nb_parents >0 THEN "Inner" WHEN nb_childrens > 0 THEN "Root" ELSE "Leaf" END AS type FROM l2
I ran it before submitting it, and it got the wrong result…
Turns out I hadn’t paid enough attention to the examples, especially the second one:
When a node is both a leaf and a root, it should be output as a root. I changed the order of my CASE WHEN to this:
CASE WHEN nb_childrens >0 AND nb_parents >0 THEN "Inner" WHEN nb_parents > 0 THEN "Leaf" ELSE "Root" END
And this time, it passed!
My query got an average result:
Time for ChatGPT to play the game!
ChatGPT read all the examples and did not make my mistake on nodes that are both a root and a leaf:
But it still gave me a wrong answer:
ChatGPT didn’t return one row per id.
So I decided to give ChatGPT a hint:
And it was able to correct it!
We both made a mistake in this challenge and both were able to fix it (although I had to give a hint to ChatGPT). I had a slightly better result in terms of runtime, so point goes to me but it was close!
Challenge 3 (Medium)
The last challenge is called Capital Gain/Loss.
Here is the query I wrote:
SELECT stock_name, SUM( CASE WHEN operation = "Buy" THEN -1*price ELSE price END ) AS capital_gain_loss FROM Stocks GROUP BY stock_name
It passed but the runtime is pretty bad, more than 90% of players did better than me on this one.
Let’s see if ChatGPT is one of them.
Let’s try this solution:
Again, it gets the wrong result. ChatGPT joins one buy with every future sell and not only the corresponding one, hence its solution only works if there is just one pair of Buy/Sell operations.
I tried to tell ChatGPT to fix its mistake, without giving it a hint:
Unfortunately, it gives me exactly the same query. So I complained:
This time, ChatGPT and I used a similar structure and the query passed. However, I am not sure that ChatGPT really understood what it did wrong in the first place.
The runtime of ChatGPT is slightly worse than mine.
Challenge ChatGPT 🤖 Data Analyst 👩💻 Winner ----------- --------------- --------------------- --------------------- 1 ✅ (22%) ✅ (62%) Data Analyst 👩💻 2 ❌ ✅ (36%) ❌ ✅ (54%) Data Analyst 👩💻 3 ❌ ❌ ✅ (8%) ✅ (5%) Data Analyst 👩💻
I think it’s fair to say that I “won” this SQL challenge against ChatGPT. I was still impressed by its abilities, and amazed that it is able to correct its mistake! ChatGPT definitely beats me on speed; it writes valid SQL syntax in a few seconds, whereas I need a few minutes.
But it still got 50% of it wrong. And even if it had completed successfully every challenge, I wouldn’t worry about my job. Stakeholders never come to analysts with such a well-defined request and output examples. They come with a business question, and we have to think about the best way to answer it with available data. Can ChatGPT do that?
I asked ChatGPT its opinion on the matter:
That sums it all up!
I hope you enjoyed this article! If you did, please follow me for more content on SQL and Data Analysis.
Original post: https://towardsdatascience.com/can-chatgpt-write-better-sql-than-a-data-analyst-f079518efab2