This post continues from the previous posts in this series.
You need to complete Feature engineering for machine learning models before proceeding.
We start from the file used in the previous post, where we ended up with a single feature-engineered DataFrame.
At the end of that file, we add a storeDF method.
def storeDF(dataFrame: DataFrame, path: String) = {
  dataFrame.write.option("header", "true").csv(path)
}
At the end of the main method, we store moviesFeaturedDF.
val moviesFeaturedFile = "../../../Downloads/ml-latest-small/movies-featured.csv"
storeDF(moviesFeaturedDF.repartition(1), moviesFeaturedFile)
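We repartition it into a single partition so that it is stored as a single file. That is fine for a small dataset like this one, but it should be avoided in most cases, because all the data then has to be written by a single task.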
$ ls -lh ../../../Downloads/ml-latest-small/movies-featured.csv
total 600
-rw-r--r-- 1 sakthipriyan staff 0B Dec 25 12:19 _SUCCESS
-rw-r--r-- 1 sakthipriyan staff 298K Dec 25 12:19 part-00000-538b57a1-58c0-4801-9258-fbd806151d80-c000.csv
$ head ../../../Downloads/ml-latest-small/movies-featured.csv/part-00000-538b57a1-58c0-4801-9258-fbd806151d80-c000.csv
movieId,title,label,gCount,tCount,Drama,Comedy,Romance,Thriller,Action,Adventure,Crime,Sci-Fi,Mystery,Fantasy,Children,Horror,Animation,Musical,War,Documentary,Film-Noir,IMAX,Western,In Netflix queue,atmospheric,superhero,Disney,religion,funny,quirky,surreal,psychology,thought-provoking,tCrime,suspense,politics,visually appealing,tSci-fi,dark comedy,twist ending,dark,mental illness,tComedy
1,Toy Story (1995),0.0,5,2,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Jumanji (1995),0.0,3,4,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Grumpier Old Men (1995),1.0,2,2,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Father of the Bride Part II (1995),1.0,1,2,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Sabrina (1995),1.0,2,1,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
11,"American President, The (1995)",0.0,3,2,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
14,Nixon (1995),0.0,1,2,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
16,Casino (1995),0.0,2,1,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
17,Sense and Sensibility (1995),0.0,2,1,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
As you can see, moviesFeaturedFile is actually a directory, and the data is written as a single CSV part file inside it.
Without repartition, the output might have been split across multiple part files.
Let's move on to building the machine learning models now.
Following is the Scala source code for HandsOn3.scala.
We will be writing all the new steps into def main(args: Array[String]).
import org.apache.spark.ml.classification.{MultilayerPerceptronClassificationModel, MultilayerPerceptronClassifier}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object HandsOn3 {

  // Method to load data from CSV into a DataFrame.
  def loadDF(spark: SparkSession, path: String): DataFrame = {
    // "csv" is the built-in CSV data source (Spark 2.0+).
    spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load(path)
  }

  // Method to create the Spark session.
  def getSpark(): SparkSession = {
    val spark = SparkSession.builder
      .appName("Simple Application")
      .master("local[*]")
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")
    spark
  }

  def main(args: Array[String]): Unit = {
    // Get Spark
    val spark = getSpark()
    val moviesFeaturedFile = "../../../Downloads/ml-latest-small/movies-featured.csv"
    val moviesFeaturedDF = loadDF(spark, moviesFeaturedFile)
    // More code goes here.
  }
}
The moviesFeaturedDF loaded above will be used for training and testing the model.
Schema
moviesFeaturedDF.printSchema()
root
|-- movieId: integer (nullable = true)
|-- title: string (nullable = true)
|-- label: double (nullable = true)
|-- gCount: integer (nullable = true)
|-- tCount: integer (nullable = true)
|-- Drama: double (nullable = true)
|-- Comedy: double (nullable = true)
|-- Romance: double (nullable = true)
|-- Thriller: double (nullable = true)
|-- Action: double (nullable = true)
|-- Adventure: double (nullable = true)
|-- Crime: double (nullable = true)
|-- Sci-Fi: double (nullable = true)
|-- Mystery: double (nullable = true)
|-- Fantasy: double (nullable = true)
|-- Children: double (nullable = true)
|-- Horror: double (nullable = true)
|-- Animation: double (nullable = true)
|-- Musical: double (nullable = true)
|-- War: double (nullable = true)
|-- Documentary: double (nullable = true)
|-- Film-Noir: double (nullable = true)
|-- IMAX: double (nullable = true)
|-- Western: double (nullable = true)
|-- In Netflix queue: double (nullable = true)
|-- atmospheric: double (nullable = true)
|-- superhero: double (nullable = true)
|-- Disney: double (nullable = true)
|-- religion: double (nullable = true)
|-- funny: double (nullable = true)
|-- quirky: double (nullable = true)
|-- surreal: double (nullable = true)
|-- psychology: double (nullable = true)
|-- thought-provoking: double (nullable = true)
|-- tCrime: double (nullable = true)
|-- suspense: double (nullable = true)
|-- politics: double (nullable = true)
|-- visually appealing: double (nullable = true)
|-- tSci-fi: double (nullable = true)
|-- dark comedy: double (nullable = true)
|-- twist ending: double (nullable = true)
|-- dark: double (nullable = true)
|-- mental illness: double (nullable = true)
|-- tComedy: double (nullable = true)
Sample Data
moviesFeaturedDF.show(5)
+-------+--------------------+-----+------+------+-----+------+-------+--------+------+---------+-----+------+-------+-------+--------+------+---------+-------+---+-----------+---------+----+-------+----------------+-----------+---------+------+--------+-----+------+-------+----------+-----------------+------+--------+--------+------------------+-------+-----------+------------+----+--------------+-------+
|movieId| title|label|gCount|tCount|Drama|Comedy|Romance|Thriller|Action|Adventure|Crime|Sci-Fi|Mystery|Fantasy|Children|Horror|Animation|Musical|War|Documentary|Film-Noir|IMAX|Western|In Netflix queue|atmospheric|superhero|Disney|religion|funny|quirky|surreal|psychology|thought-provoking|tCrime|suspense|politics|visually appealing|tSci-fi|dark comedy|twist ending|dark|mental illness|tComedy|
+-------+--------------------+-----+------+------+-----+------+-------+--------+------+---------+-----+------+-------+-------+--------+------+---------+-------+---+-----------+---------+----+-------+----------------+-----------+---------+------+--------+-----+------+-------+----------+-----------------+------+--------+--------+------------------+-------+-----------+------------+----+--------------+-------+
| 1| Toy Story (1995)| 0.0| 5| 2| 0.0| 1.0| 0.0| 0.0| 0.0| 1.0| 0.0| 0.0| 0.0| 1.0| 1.0| 0.0| 1.0| 0.0|0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|
| 2| Jumanji (1995)| 0.0| 3| 4| 0.0| 0.0| 0.0| 0.0| 0.0| 1.0| 0.0| 0.0| 0.0| 1.0| 1.0| 0.0| 0.0| 0.0|0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|
| 3|Grumpier Old Men ...| 1.0| 2| 2| 0.0| 1.0| 1.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|
| 5|Father of the Bri...| 1.0| 1| 2| 0.0| 1.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|
| 7| Sabrina (1995)| 1.0| 2| 1| 0.0| 1.0| 1.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|
+-------+--------------------+-----+------+------+-----+------+-------+--------+------+---------+-----+------+-------+-------+--------+------+---------+-------+---+-----------+---------+----+-------+----------------+-----------+---------+------+--------+-----+------+-------+----------+-----------------+------+--------+--------+------------------+-------+-----------+------------+----+--------------+-------+
only showing top 5 rows
def trainAndTest(df: DataFrame, features: Array[String]): (DataFrame, DataFrame) = {
  // Data Transformation
  val assembler = new VectorAssembler().setInputCols(features).setOutputCol("features")
  // Only movieId, features, label selected here.
  val data = assembler.transform(df).select("movieId", "features", "label")
  // Split the data into train and test
  val splits = data.randomSplit(Array(0.8, 0.2), seed = 1234L)
  val train = splits(0)
  val test = splits(1)
  // Return train and test as Tuple
  (train, test)
}
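To make the VectorAssembler step in trainAndTest more concrete, here is a minimal plain-Scala sketch of the idea: the selected columns of a row are packed, in order, into a single vector. This is only an illustration and not Spark's implementation; the `AssemblerSketch` object, the `Row` type alias and the `assemble` helper are hypothetical names introduced for this example.

```scala
object AssemblerSketch {
  // Hypothetical stand-in for one DataFrame row: column name -> numeric value.
  type Row = Map[String, Double]

  // Pick the requested columns, in the given order, and pack them into one vector.
  def assemble(row: Row, inputCols: Array[String]): Array[Double] =
    inputCols.map(col => row(col))

  def main(args: Array[String]): Unit = {
    // Values taken from the Toy Story row in the sample data.
    val toyStory: Row = Map("gCount" -> 5.0, "tCount" -> 2.0, "Drama" -> 0.0, "Comedy" -> 1.0)
    val vector = assemble(toyStory, Array("gCount", "tCount", "Drama"))
    println(vector.mkString("[", ", ", "]")) // prints [5.0, 2.0, 0.0]
  }
}
```

Columns that are not listed (Comedy here) are simply left out of the vector, which is why the choice of features matters for the model.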
We use data.randomSplit to split the input DataFrame into train and test.
val features = Array("gCount", "tCount", "Drama")
val (train, test) = trainAndTest(moviesFeaturedDF, features)
Here, gCount, tCount and Drama are the features selected for the model.
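The seeded 80/20 split can be pictured with a small plain-Scala sketch. It assumes a simple scheme in which every row is independently sent to train with probability 0.8, using a generator created from the seed. It is an illustration of the idea only, not Spark's actual sampling code, and `SplitSketch` and its `randomSplit` helper are hypothetical names.

```scala
import scala.util.Random

object SplitSketch {
  // Each row goes to train with probability trainFraction, decided by a seeded generator.
  def randomSplit[A](rows: Seq[A], trainFraction: Double, seed: Long): (Vector[A], Vector[A]) = {
    val rng = new Random(seed)
    // Draw exactly one random number per row, then separate the two groups.
    val tagged = rows.toVector.map(row => (row, rng.nextDouble() < trainFraction))
    val train = tagged.collect { case (row, true) => row }
    val test = tagged.collect { case (row, false) => row }
    (train, test)
  }

  def main(args: Array[String]): Unit = {
    val (train, test) = randomSplit(1 to 1000, 0.8, seed = 1234L)
    // Sizes are close to 80/20 but usually not exact; the same seed repeats the same split.
    println(s"train = ${train.size}, test = ${test.size}")
  }
}
```

Two things carry over to the real split: the sizes are only approximately 80/20, and fixing the seed makes the split reproducible, so accuracy numbers from different runs can be compared.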
In this post, we will go through the Multilayer perceptron classifier machine learning model.
Multilayer perceptron classifier (MLPC) is a classifier based on a feedforward artificial neural network.
def trainMLPC(train: DataFrame, features: Array[String]): MultilayerPerceptronClassificationModel = {
  // Setting up Hyper Parameters. Specify layers for the neural network:
  // Input layer of size (features.length),
  // Two intermediate layers of size (features.length + 2) and (features.length + 1)
  // Output of size 4 (classes)
  val layers = Array[Int](features.length, features.length + 2, features.length + 1, 4)
  // Create the trainer and set its parameters
  val trainer = new MultilayerPerceptronClassifier()
    .setLabelCol("label")
    .setFeaturesCol("features")
    .setLayers(layers)
    .setBlockSize(128)
    .setSeed(1234L)
    .setMaxIter(100)
  // This would train the MultilayerPerceptronClassifier for the given data and hyper parameters.
  trainer.fit(train)
}
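The layer sizes used in trainMLPC can be worked out by hand. The sketch below rebuilds the same layers array and counts the trainable parameters, assuming a standard fully connected network with one bias per neuron. The `MlpLayers` object and its helpers are hypothetical names used only for this illustration.

```scala
object MlpLayers {
  // Layer sizes as in trainMLPC: input layer, two hidden layers, output layer (4 classes).
  def layersFor(numFeatures: Int, numClasses: Int = 4): Array[Int] =
    Array(numFeatures, numFeatures + 2, numFeatures + 1, numClasses)

  // For each pair of consecutive layers: (inputs + 1 bias) * outputs.
  def parameterCount(layers: Array[Int]): Int =
    layers.zip(layers.drop(1)).map { case (in, out) => (in + 1) * out }.sum

  def main(args: Array[String]): Unit = {
    val layers = layersFor(3) // gCount, tCount, Drama
    println(layers.mkString(", "))  // prints 3, 5, 4, 4
    println(parameterCount(layers)) // prints 64
  }
}
```

For three features this gives (3 + 1) * 5 + (5 + 1) * 4 + (4 + 1) * 4 = 64 parameters. The network grows with the number of selected features because the hidden layer sizes are derived from features.length.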
By changing layers, we can change the intermediate hidden layers.
In the main method, add the following.
val mlpcModel = trainMLPC(train, features)
// Here, we apply the MLPC model built using train data on the test dataset.
val resultMLPC = mlpcModel.transform(test)
// The printAccuracy method detailed below prints the accuracy of the model on the test data.
printAccuracy(resultMLPC, features)
Following is the printAccuracy method, which prints the accuracy of the model by comparing the actual vs predicted class.
def printAccuracy(dataFrame: DataFrame, features: Array[String]): Unit = {
  val predictionAndLabels = dataFrame.select("prediction", "label")
  // The evaluator reads the "prediction" and "label" columns by default.
  val evaluator = new MulticlassClassificationEvaluator().setMetricName("accuracy")
  val accuracy = 100 * evaluator.evaluate(predictionAndLabels)
  val featureList = features.mkString(", ")
  println(f"Accuracy = $accuracy%.2f%% Features: $featureList. ")
}
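The accuracy metric itself is simple: it is the fraction of rows where the predicted class equals the actual label. A minimal plain-Scala sketch of that calculation follows; `AccuracySketch` and its helpers are hypothetical names, and the four (prediction, label) pairs are made-up values for illustration, not results from the dataset.

```scala
import java.util.Locale

object AccuracySketch {
  // Fraction of (prediction, label) pairs where the two values match.
  def accuracy(predictionAndLabels: Seq[(Double, Double)]): Double =
    if (predictionAndLabels.isEmpty) 0.0
    else predictionAndLabels.count { case (p, l) => p == l }.toDouble / predictionAndLabels.size

  // Format as a percentage with two decimals, independent of the system locale.
  def asPercent(acc: Double): String =
    "%.2f".formatLocal(Locale.ROOT, 100 * acc) + "%"

  def main(args: Array[String]): Unit = {
    // Made-up pairs: two of the four predictions are correct.
    val pairs = Seq((1.0, 1.0), (0.0, 1.0), (2.0, 2.0), (3.0, 0.0))
    println("Accuracy = " + asPercent(accuracy(pairs))) // prints Accuracy = 50.00%
  }
}
```

With 4 classes, always guessing the same class would already be right for every movie in that class, so accuracy should be read against the class distribution rather than against 0%.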
def main(args: Array[String]): Unit = {
  // Init spark and load dataframe
  val spark = getSpark()
  val moviesFeaturedFile = "../../../Downloads/ml-latest-small/movies-featured.csv"
  val moviesFeaturedDF = loadDF(spark, moviesFeaturedFile)
  // Select features and create train and test dataFrames
  val features = Array("gCount", "tCount", "Drama")
  val (train, test) = trainAndTest(moviesFeaturedDF, features)
  // Train Model using train data for the selected features.
  val mlpcModel = trainMLPC(train, features)
  // Apply trained model on the test and predict the class.
  val resultMLPC = mlpcModel.transform(test)
  // Print the accuracy of the model.
  printAccuracy(resultMLPC, features)
}
In this function, all subsets of the given features are derived and the empty set is filtered out.
def getFeatures(features: Array[String]): List[Array[String]] = {
  features.toSet.subsets().toList.filter(set => set.nonEmpty).map(set => set.toArray)
}
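It helps to know how many combinations this produces before looping over them. The following self-contained sketch repeats the same subset logic in a hypothetical `FeatureSubsets` object and counts the results: n features give 2^n - 1 non-empty subsets.

```scala
object FeatureSubsets {
  // Same idea as getFeatures: every non-empty subset of the given features.
  def nonEmptySubsets(features: Array[String]): List[Array[String]] =
    features.toSet.subsets().filter(_.nonEmpty).map(_.toArray).toList

  def main(args: Array[String]): Unit = {
    val three = nonEmptySubsets(Array("gCount", "tCount", "Drama"))
    println(three.length) // prints 7, that is 2^3 - 1

    val seven = nonEmptySubsets(
      Array("gCount", "tCount", "Drama", "Comedy", "Romance", "Thriller", "Action"))
    println(seven.length) // prints 127, that is 2^7 - 1
  }
}
```

The count doubles with every extra feature, so this exhaustive approach is only practical for a handful of features. Because the features pass through a Set, the order of names inside each subset is not guaranteed to match the order of the input array.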
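Updated main method, which iterates over every feature subset. With 7 features there are 127 non-empty subsets, so the model is trained and tested 127 times.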
def main(args: Array[String]): Unit = {
  // Init spark and load dataFrame
  val spark = getSpark()
  val moviesFeaturedFile = "../../../Downloads/ml-latest-small/movies-featured.csv"
  val moviesFeaturedDF = loadDF(spark, moviesFeaturedFile)
  // Select set of features
  val featuresArray = Array("gCount", "tCount", "Drama", "Comedy", "Romance", "Thriller", "Action")
  for (features <- getFeatures(featuresArray)) {
    val (train, test) = trainAndTest(moviesFeaturedDF, features)
    // Train Model using train data for the selected features.
    val mlpcModel = trainMLPC(train, features)
    // Apply trained model on the test and predict the class.
    val resultMLPC = mlpcModel.transform(test)
    // Print the accuracy of the model.
    printAccuracy(resultMLPC, features)
  }
}
Output of the above program for the given set of features:
Accuracy = 38.46% Features: Thriller.
Accuracy = 45.79% Features: tCount.
Accuracy = 40.29% Features: Comedy.
Accuracy = 47.25% Features: Romance.
Accuracy = 47.25% Features: Drama.
Accuracy = 47.25% Features: gCount.
Accuracy = 40.29% Features: Action.
Accuracy = 45.79% Features: Thriller, tCount.
Accuracy = 39.56% Features: Thriller, Comedy.
Accuracy = 41.39% Features: Thriller, Romance.
Accuracy = 43.22% Features: Thriller, Drama.
Accuracy = 49.08% Features: Thriller, gCount.
Accuracy = 42.12% Features: Thriller, Action.
Accuracy = 48.72% Features: tCount, Comedy.
Accuracy = 46.15% Features: tCount, Romance.
Accuracy = 43.96% Features: tCount, Drama.
Accuracy = 53.85% Features: tCount, gCount.
Accuracy = 50.92% Features: tCount, Action.
Accuracy = 42.12% Features: Comedy, Romance.
Accuracy = 42.49% Features: Comedy, Drama.
Accuracy = 51.28% Features: Comedy, gCount.
Accuracy = 46.52% Features: Comedy, Action.
Accuracy = 41.76% Features: Romance, Drama.
Accuracy = 45.79% Features: Romance, gCount.
Accuracy = 40.29% Features: Romance, Action.
Accuracy = 49.45% Features: Drama, gCount.
Accuracy = 41.76% Features: Drama, Action.
Accuracy = 46.15% Features: gCount, Action.
Accuracy = 50.92% Features: Thriller, tCount, Comedy.
Accuracy = 43.96% Features: Thriller, tCount, Romance.
Accuracy = 46.52% Features: Thriller, tCount, Drama.
Accuracy = 54.58% Features: Thriller, tCount, gCount.
Accuracy = 51.28% Features: Thriller, tCount, Action.
Accuracy = 40.29% Features: Thriller, Comedy, Romance.
Accuracy = 39.19% Features: Thriller, Comedy, Drama.
Accuracy = 50.92% Features: Thriller, Comedy, gCount.
Accuracy = 48.35% Features: Thriller, Comedy, Action.
Accuracy = 41.39% Features: Thriller, Romance, Drama.
Accuracy = 49.82% Features: Thriller, Romance, gCount.
Accuracy = 42.12% Features: Thriller, Romance, Action.
Accuracy = 50.18% Features: Thriller, Drama, gCount.
Accuracy = 40.29% Features: Thriller, Drama, Action.
Accuracy = 50.55% Features: Thriller, gCount, Action.
Accuracy = 48.72% Features: tCount, Comedy, Romance.
Accuracy = 46.52% Features: tCount, Comedy, Drama.
Accuracy = 56.78% Features: tCount, Comedy, gCount.
Accuracy = 52.38% Features: tCount, Comedy, Action.
Accuracy = 45.42% Features: tCount, Romance, Drama.
Accuracy = 51.65% Features: tCount, Romance, gCount.
Accuracy = 49.45% Features: tCount, Romance, Action.
Accuracy = 53.85% Features: tCount, Drama, gCount.
Accuracy = 48.72% Features: tCount, Drama, Action.
Accuracy = 58.24% Features: tCount, gCount, Action.
Accuracy = 42.86% Features: Comedy, Romance, Drama.
Accuracy = 52.01% Features: Comedy, Romance, gCount.
Accuracy = 46.15% Features: Comedy, Romance, Action.
Accuracy = 50.92% Features: Comedy, Drama, gCount.
Accuracy = 43.22% Features: Comedy, Drama, Action.
Accuracy = 51.28% Features: Comedy, gCount, Action.
Accuracy = 49.82% Features: Romance, Drama, gCount.
Accuracy = 41.76% Features: Romance, Drama, Action.
Accuracy = 50.55% Features: Romance, gCount, Action.
Accuracy = 50.55% Features: Drama, gCount, Action.
Accuracy = 50.92% Features: Thriller, tCount, Comedy, Romance.
Accuracy = 52.01% Features: Thriller, tCount, Comedy, Drama.
Accuracy = 51.65% Features: Thriller, tCount, Comedy, gCount.
Accuracy = 54.95% Features: Thriller, tCount, Comedy, Action.
Accuracy = 45.42% Features: Thriller, tCount, Romance, Drama.
Accuracy = 52.01% Features: Thriller, tCount, Romance, gCount.
Accuracy = 49.82% Features: Thriller, tCount, Romance, Action.
Accuracy = 57.14% Features: Thriller, tCount, Drama, gCount.
Accuracy = 49.45% Features: Thriller, tCount, Drama, Action.
Accuracy = 54.21% Features: Thriller, tCount, gCount, Action.
Accuracy = 39.56% Features: Thriller, Comedy, Romance, Drama.
Accuracy = 51.65% Features: Thriller, Comedy, Romance, gCount.
Accuracy = 48.35% Features: Thriller, Comedy, Romance, Action.
Accuracy = 51.28% Features: Thriller, Comedy, Drama, gCount.
Accuracy = 48.72% Features: Thriller, Comedy, Drama, Action.
Accuracy = 51.65% Features: Thriller, Comedy, gCount, Action.
Accuracy = 51.28% Features: Thriller, Romance, Drama, gCount.
Accuracy = 42.49% Features: Thriller, Romance, Drama, Action.
Accuracy = 50.55% Features: Thriller, Romance, gCount, Action.
Accuracy = 50.92% Features: Thriller, Drama, gCount, Action.
Accuracy = 49.08% Features: tCount, Comedy, Romance, Drama.
Accuracy = 52.75% Features: tCount, Comedy, Romance, gCount.
Accuracy = 54.21% Features: tCount, Comedy, Romance, Action.
Accuracy = 57.14% Features: tCount, Comedy, Drama, gCount.
Accuracy = 50.92% Features: tCount, Comedy, Drama, Action.
Accuracy = 57.51% Features: tCount, Comedy, gCount, Action.
Accuracy = 54.58% Features: tCount, Romance, Drama, gCount.
Accuracy = 47.99% Features: tCount, Romance, Drama, Action.
Accuracy = 54.21% Features: tCount, Romance, gCount, Action.
Accuracy = 56.78% Features: tCount, Drama, gCount, Action.
Accuracy = 52.01% Features: Comedy, Romance, Drama, gCount.
Accuracy = 43.96% Features: Comedy, Romance, Drama, Action.
Accuracy = 52.38% Features: Comedy, Romance, gCount, Action.
Accuracy = 52.01% Features: Comedy, Drama, gCount, Action.
Accuracy = 49.82% Features: Romance, Drama, gCount, Action.
Accuracy = 50.55% Features: Thriller, tCount, Comedy, Romance, Drama.
Accuracy = 53.11% Features: Thriller, tCount, Comedy, Romance, gCount.
Accuracy = 55.31% Features: Thriller, tCount, Comedy, Romance, Action.
Accuracy = 55.68% Features: Thriller, tCount, Comedy, Drama, gCount.
Accuracy = 54.95% Features: Thriller, tCount, Comedy, Drama, Action.
Accuracy = 56.41% Features: Thriller, tCount, Comedy, gCount, Action.
Accuracy = 54.95% Features: Thriller, tCount, Romance, Drama, gCount.
Accuracy = 48.35% Features: Thriller, tCount, Romance, Drama, Action.
Accuracy = 51.65% Features: Thriller, tCount, Romance, gCount, Action.
Accuracy = 57.14% Features: Thriller, tCount, Drama, gCount, Action.
Accuracy = 51.65% Features: Thriller, Comedy, Romance, Drama, gCount.
Accuracy = 48.72% Features: Thriller, Comedy, Romance, Drama, Action.
Accuracy = 52.01% Features: Thriller, Comedy, Romance, gCount, Action.
Accuracy = 51.65% Features: Thriller, Comedy, Drama, gCount, Action.
Accuracy = 51.65% Features: Thriller, Romance, Drama, gCount, Action.
Accuracy = 54.58% Features: tCount, Comedy, Romance, Drama, gCount.
Accuracy = 48.72% Features: tCount, Comedy, Romance, Drama, Action.
Accuracy = 57.51% Features: tCount, Comedy, Romance, gCount, Action.
Accuracy = 55.68% Features: tCount, Comedy, Drama, gCount, Action.
Accuracy = 56.04% Features: tCount, Romance, Drama, gCount, Action.
Accuracy = 52.75% Features: Comedy, Romance, Drama, gCount, Action.
Accuracy = 54.58% Features: Thriller, tCount, Comedy, Romance, Drama, gCount.
Accuracy = 53.48% Features: Thriller, tCount, Comedy, Romance, Drama, Action.
Accuracy = 56.78% Features: Thriller, tCount, Comedy, Romance, gCount, Action.
Accuracy = 57.88% Features: Thriller, tCount, Comedy, Drama, gCount, Action.
Accuracy = 53.85% Features: Thriller, tCount, Romance, Drama, gCount, Action.
Accuracy = 52.75% Features: Thriller, Comedy, Romance, Drama, gCount, Action.
Accuracy = 54.58% Features: tCount, Comedy, Romance, Drama, gCount, Action.
Accuracy = 57.88% Features: Thriller, tCount, Comedy, Romance, Drama, gCount, Action.
Accuracy = 58.24% Features: tCount, gCount, Action.
As you can see, this combination has given the highest accuracy so far.
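Rather than scanning 127 lines by eye, the best combination can be picked programmatically. The sketch below parses lines in the format printed by printAccuracy and returns the one with the highest accuracy. The `BestFeatures` object, the `Result` case class and the regular expression are assumptions made for this example, and the three input lines are copied from the output above.

```scala
object BestFeatures {
  final case class Result(accuracy: Double, features: Seq[String])

  // Matches lines such as "Accuracy = 58.24% Features: tCount, gCount, Action. "
  private val Line = """Accuracy = (\d+(?:\.\d+)?)% Features: (.+?)\.\s*""".r

  // Returns None for lines that are not in the expected format.
  def parse(line: String): Option[Result] = line match {
    case Line(acc, feats) => Some(Result(acc.toDouble, feats.split(", ").toSeq))
    case _                => None
  }

  // Highest accuracy among the parsable lines, if there are any.
  def best(lines: Seq[String]): Option[Result] = {
    val results = lines.flatMap(l => parse(l))
    if (results.isEmpty) None else Some(results.maxBy(_.accuracy))
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "Accuracy = 38.46% Features: Thriller. ",
      "Accuracy = 58.24% Features: tCount, gCount, Action. ",
      "Accuracy = 53.85% Features: tCount, gCount. ")
    best(lines).foreach(r => println(s"${r.accuracy}% with ${r.features.mkString(", ")}"))
    // prints 58.24% with tCount, gCount, Action
  }
}
```

In the program itself, the same result could be had without parsing by collecting each accuracy and feature list inside the for loop and taking the maximum at the end.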
In the next post, we will simplify this problem to binary classification instead of multiclass classification. We will use both the Random forest classifier and the Multilayer perceptron classifier machine learning models and compare them against each other.