For further information on artificial intelligence, see my first AI story.
In this story, I will walk you through a complete coding example of a machine learning application. This application will be for an online music store that needs a reliable way to predict what kind of music its users are interested in.
The steps to follow in developing an artificial intelligence application using Python were mentioned in my first story. I list them again here:
- Import the data
- Clean the data — Remove duplicate data. If the data is text-based, convert it to numerical values.
- Split the data into training and test sets — Make sure our model produces the correct result.
- Create a model — Select an algorithm to analyze the data, such as decision trees or neural networks. Each algorithm has pros and cons. What makes Python such a popular language in AI is that libraries already exist that implement many of these algorithms. The library I will use is scikit-learn.
- Train the model.
- Make predictions — When you start, your predictions are likely to be inaccurate.
- Evaluate and improve.
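The steps above can be sketched end to end in a few lines. This is a minimal sketch, not the code from this story: the rows are a subset of the music-store data shown later, typed inline so the block runs on its own, and the 20% test split, the deliberate duplicate row, and the `random_state` value are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Import the data (inline here; normally pd.read_csv("music.csv")).
rows = [
    (20, 1, "Hip-hop"), (23, 1, "Hip-hop"), (25, 1, "Hip-hop"),
    (26, 1, "jazz"), (29, 1, "jazz"),
    (31, 1, "classical"), (33, 1, "classical"), (37, 1, "classical"),
    (20, 0, "dance"), (21, 0, "dance"), (25, 0, "dance"),
    (26, 0, "acoustic"), (27, 0, "acoustic"), (30, 0, "acoustic"),
    (31, 0, "classical"), (34, 0, "classical"), (35, 0, "classical"),
    (35, 0, "classical"),  # deliberate duplicate, removed in step 2
]
music_data = pd.DataFrame(rows, columns=["age", "gender", "genre"])

# 2. Clean the data: drop exact duplicate rows.
music_data = music_data.drop_duplicates().reset_index(drop=True)

# 3. Split into input/output, then into training and test sets.
X = music_data[["age", "gender"]]
y = music_data["genre"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 4-5. Create and train the model.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# 6. Make predictions on the held-out rows.
predictions = model.predict(X_test)

# 7. Evaluate: fraction of test rows predicted correctly.
accuracy = accuracy_score(y_test, predictions)
print(f"accuracy on {len(X_test)} test rows: {accuracy:.2f}")
```

With so few rows, the score can swing a lot from one split to another, which is why evaluating and improving is a step of its own.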
My first story contains the link for downloading the Python programming language.
The initial data will be derived from the music store's current users. The main purpose of this sample application is to increase music sales, so it should reliably predict what kind of music each new user likes.
The final data will come in the form of an Excel spreadsheet or CSV file. This is a popular format for storing initial data. One site that provides data like this is kaggle.com.
I will use the Python programming language and two libraries commonly used in AI:
Pandas — A data analysis library built around a concept called the data frame. A data frame is a two-dimensional object, similar to an Excel spreadsheet.
Scikit-learn — Provides algorithms such as decision trees and neural networks.
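To make the data frame idea concrete, here is a small sketch. The three rows are illustrative values in the same shape as the music-store data, and the variable name `frame` is my own choice.

```python
import pandas as pd

# A tiny data frame: each key is a column, each list position is a row.
frame = pd.DataFrame({
    "age": [20, 26, 31],
    "gender": [1, 1, 0],
    "genre": ["Hip-hop", "jazz", "classical"],
})

print(frame.shape)               # (rows, columns)
print(list(frame.columns))       # column names
print(frame.loc[1, "genre"])     # one cell, by row label and column name
print(frame[frame["age"] > 25])  # filter rows, like a spreadsheet filter
```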
It’s best to use Anaconda to install Jupyter for application development like this. Anaconda is available here.
I created the three initial data files with the free, open-source program LibreOffice. A CSV file is just a text file, viewable and editable with Notepad, vi, or WordPad. The data in these CSV files is loaded into computer memory with the Python library pandas. The input .csv file and the output .csv file are merely the user profile data in music.csv, split into input and output sections.
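Because a CSV file is plain text, loading it with pandas can be sketched without any file on disk. In the sketch below, `io.StringIO` stands in for the real file, and the three rows are a small illustrative sample rather than the full music.csv.

```python
import io

import pandas as pd

# The first lines of a CSV file, exactly as a text editor would show them.
csv_text = """age,gender,genre
20,1,Hip-hop
26,1,jazz
31,0,classical
"""

# read_csv accepts a file path or any file-like object; StringIO stands in
# for a file on disk so this block needs no external file.
music_data = pd.read_csv(io.StringIO(csv_text))
print(music_data)
print(music_data.dtypes)  # age and gender parse as integers, genre as text
```

With a real file, the call would simply be `pd.read_csv("music.csv")`, as in the full program.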
There are three columns of data. The first two are the user's age and gender, which were split off into the input. The third column, the genre, was split off into the output.
The three CSV files used in the program are based on several assumptions. I split the data into an input portion and an output portion to feed the decision tree algorithm.
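The same input/output split can also be done in pandas instead of by hand in a spreadsheet. This is a sketch of that alternative, not how the files for this story were made; the names `input_data` and `output_data` and the three sample rows are assumptions for illustration.

```python
import io

import pandas as pd

csv_text = """age,gender,genre
20,1,Hip-hop
26,1,jazz
31,0,classical
"""
music_data = pd.read_csv(io.StringIO(csv_text))

# Input: every column except the one we want to predict.
input_data = music_data.drop(columns=["genre"])
# Output: only the column we want to predict.
output_data = music_data[["genre"]]

# Each part can be written back out as its own CSV (shown as text here;
# pass a file name such as "InputMusic.csv" to write to disk instead).
print(input_data.to_csv(index=False))
print(output_data.to_csv(index=False))
```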
Keeping in mind that Python is an interpreted language, I display the code, the contents of the original CSV file, and the contents of the split CSV files.
The assumptions made were that males younger than 26 preferred hip-hop, males from 26 to 30 liked jazz, and males over 30 liked classical. Females younger than 26 liked the dance genre, females from 26 to 30 liked acoustic, and females over 30 liked classical.
For gender, a 1 means male, and a 0 means female.
age, gender, genre
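The assumptions and the gender encoding can be written down as a few lines of code. This is only a sketch: the function name `assumed_genre`, the four raw profiles, and the "male"/"female" text values are hypothetical, added here to show how a text column becomes the 1/0 encoding.

```python
import pandas as pd


def assumed_genre(age: int, gender: int) -> str:
    """Genre under the story's assumptions (gender: 1 = male, 0 = female)."""
    if age > 30:
        return "classical"
    if gender == 1:
        return "Hip-hop" if age < 26 else "jazz"
    return "dance" if age < 26 else "acoustic"


# Hypothetical raw profiles where gender is still stored as text.
raw = pd.DataFrame({
    "age": [20, 29, 27, 34],
    "gender": ["male", "male", "female", "female"],
})

# Convert the text column to the 1/0 encoding used in this story.
raw["gender"] = raw["gender"].map({"male": 1, "female": 0})

# Label each row by applying the assumptions.
raw["genre"] = [assumed_genre(a, g) for a, g in zip(raw["age"], raw["gender"])]
print(raw)
```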
For the model, the example uses the decision tree algorithm, provided by the DecisionTreeClassifier class in the tree module of the scikit-learn library. The decision tree algorithm takes the input and output data in order to form its predictions. If the results look bad, you can try a neural network algorithm instead.
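To see what the decision tree actually learns from input and output data, scikit-learn can print the fitted tree as text with `export_text`. The sketch below uses a subset of the rows shown in this story, typed inline so it runs on its own; the `random_state` value is an assumption added for repeatability.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rows = [
    (20, 1, "Hip-hop"), (23, 1, "Hip-hop"), (25, 1, "Hip-hop"),
    (26, 1, "jazz"), (29, 1, "jazz"),
    (31, 1, "classical"), (33, 1, "classical"), (37, 1, "classical"),
    (20, 0, "dance"), (21, 0, "dance"), (25, 0, "dance"),
    (26, 0, "acoustic"), (27, 0, "acoustic"), (30, 0, "acoustic"),
    (31, 0, "classical"), (34, 0, "classical"), (35, 0, "classical"),
]
music_data = pd.DataFrame(rows, columns=["age", "gender", "genre"])
X = music_data[["age", "gender"]]  # input
y = music_data["genre"]            # output

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Print the learned if/else rules (age and gender thresholds) as text.
rules = export_text(model, feature_names=["age", "gender"])
print(rules)
```

If a neural network is worth trying, scikit-learn's `MLPClassifier` in `sklearn.neural_network` offers the same `fit` and `predict` calls, so it can be swapped in for the model line.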
Note that there is no data for a twenty-one-year-old male or a twenty-two-year-old female. The model was asked for predictions for these genders and ages. The model inferred the likely music preferences of these new users from the existing data, and once their actual preferences are known, their data can be added to the training set, which is what makes this machine learning.
predictions = model.predict([[21, 1], [22, 0]])
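The idea of adding new users' data back in can be sketched as follows. The rows are a subset of the data shown in this story, typed inline. The "confirmed" genres for the two new users are hypothetical values, not data from the story, and `random_state` is an assumption added for repeatability.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rows = [
    (20, 1, "Hip-hop"), (23, 1, "Hip-hop"), (25, 1, "Hip-hop"),
    (26, 1, "jazz"), (29, 1, "jazz"),
    (31, 1, "classical"), (33, 1, "classical"), (37, 1, "classical"),
    (20, 0, "dance"), (21, 0, "dance"), (25, 0, "dance"),
    (26, 0, "acoustic"), (27, 0, "acoustic"), (30, 0, "acoustic"),
    (31, 0, "classical"), (34, 0, "classical"), (35, 0, "classical"),
]
music_data = pd.DataFrame(rows, columns=["age", "gender", "genre"])
features = ["age", "gender"]

model = DecisionTreeClassifier(random_state=0)
model.fit(music_data[features], music_data["genre"])

# New users, given as a data frame with the same column names as training.
new_users = pd.DataFrame([[21, 1], [22, 0]], columns=features)
predictions = model.predict(new_users)
print(predictions)

# Once a new user's real preference is known, append it and retrain.
# These genre values are hypothetical confirmations for illustration.
confirmed = new_users.assign(genre=["Hip-hop", "dance"])
music_data = pd.concat([music_data, confirmed], ignore_index=True)
model.fit(music_data[features], music_data["genre"])
```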
Below is the entire code, along with the output displayed in the interpreter.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
music_data = pd.read_csv("music.csv")
input = pd.read_csv('InputMusic.csv')
print(input)
    age  gender
0    20       1
1    23       1
2    25       1
3    26       1
4    29       1
5    20       0
6    31       1
7    33       1
8    37       1
9    20       0
10   21       0
11   25       0
12   26       0
13   27       0
14   30       0
15   31       0
16   34       0
17   35       0
output = pd.read_csv('outputMusic.csv')
print(output)
        genre
0     Hip-hop
1     Hip-hop
2     Hip-hop
3        jazz
4        jazz
5        jazz
6   classical
7   classical
8   classical
9       dance
10      dance
11      dance
12   acoustic
13   acoustic
14   acoustic
15  classical
16  classical
17  classical
model = DecisionTreeClassifier()
model.fit(input, output)
predictions = model.predict([[21, 1], [22, 0]])
print(predictions)
['Hip-hop' 'dance']
print(music_data)
    age  gender      genre
0    20       1    Hip-hop
1    23       1    Hip-hop
2    25       1    Hip-hop
3    26       1       jazz
4    29       1       jazz
5    20       0       jazz
6    31       1  classical
7    33       1  classical
8    37       1  classical
9    20       0      dance
10   21       0      dance
11   25       0      dance
12   26       0   acoustic
13   27       0   acoustic
14   30       0   acoustic
15   31       0  classical
16   34       0  classical
17   35       0  classical
Original post: https://medium.com/illumination/complete-example-of-machine-learning-74939b6e24ee