In this blog post, we will quickly upload, analyze, and visualize YouTube data. The full analysis cycle is performed in the Ideata Analytics interface and takes only about 10 minutes.
This exercise will help you understand how easy and fast it is to analyze your datasets in Apache Spark using Ideata Analytics.
We have a sample YouTube data file, and we will try to draw out insights such as the top-rated videos, the highest-rated video, the top users, etc.
Step 1 : Upload your data
We have downloaded a sample YouTube file in tab-delimited format. Ideata Analytics provides an out-of-the-box connector for uploading delimited files. The step is straightforward: we create a new connection, click on the delimited file option, provide the necessary details (such as “tab” as the separator), and click on upload to load the file into the system.
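For readers curious about what reading a tab-delimited file amounts to outside the interface, here is a minimal sketch in plain Python. It is only an illustration, not how Ideata Analytics works internally; the function name and the two sample rows are made up for this example.

```python
import csv
import io


def read_tab_delimited(stream):
    """Parse a tab-delimited text stream into a list of rows (lists of strings)."""
    # Blank lines come back from csv.reader as empty lists; skip them.
    return [row for row in csv.reader(stream, delimiter="\t") if row]


# Hypothetical two-row sample standing in for the real YouTube file.
sample = io.StringIO(
    "vid001\tuser_a\tMusic\t4.5\t120\n"
    "vid002\tuser_b\tComedy\t5.0\t300\n"
)
rows = read_tab_delimited(sample)
print(len(rows), rows[0])
```

With a real file, you would pass an open file handle (for example `open(path, newline="")`) instead of the in-memory sample.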
Step 2 : Clean your data
Once the data is uploaded successfully, we can see a preview of it. The data looks well structured and does not need any formatting.
Column Rename : The only thing missing in the file is column names, which we can quickly provide on the preview screen. We will rename each column to what it represents, such as video id, so that the data is easy to understand on the analysis screen. Once done, we will click on finish to create the dataset and make it available in the system for analysis.
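The renaming step can be pictured as attaching a list of names to header-less rows. The sketch below shows one way to do that in plain Python. The column names and their order are assumptions made for illustration; the real file's layout may differ.

```python
# Hypothetical column names; the actual order of fields in the file is an assumption.
COLUMNS = ["video_id", "uploader", "category", "rating", "num_ratings"]


def name_columns(rows, columns=COLUMNS):
    """Turn header-less rows into dictionaries keyed by column name."""
    named = []
    for row in rows:
        # Fail loudly if a row does not match the expected number of fields.
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}")
        named.append(dict(zip(columns, row)))
    return named


records = name_columns([["vid001", "user_a", "Music", "4.5", "120"]])
print(records[0]["category"])
```

Checking the field count up front is a small design choice that surfaces malformed rows early instead of silently dropping values.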
Step 3 : Find the answer to Question 1
We will now figure out which video categories have the most videos uploaded.
To answer this question, we go to the analysis screen and drag the category column from the left panel into the dimension (x-axis). This gives us a quick bar chart of all the categories along with the number of times each one appears. As we are only interested in the top 10, we can specify that in the right panel by selecting the “limit” checkbox and setting it to top 10.
Screenshot callouts: drag the “category” column from the left panel into the dimension area, then select the “limit” checkbox to restrict the results to the top 10.
The resulting chart shows the top video categories. We can see the actual numbers by selecting the underlying data checkbox, which displays the data table.
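The chart in this step is a count of rows per category, limited to the largest groups. As a rough equivalent, here is a small Python sketch using `collections.Counter`. The `category` key and the sample records are hypothetical.

```python
from collections import Counter


def top_categories(records, limit=10):
    """Count videos per category and return the `limit` most frequent ones."""
    counts = Counter(record["category"] for record in records)
    return counts.most_common(limit)


# Hypothetical sample records; only the category field matters here.
sample_records = [
    {"category": name}
    for name in ["Music", "Comedy", "Music", "Sports", "Music", "Comedy"]
]
top = top_categories(sample_records, limit=2)
print(top)
```

The list of (category, count) pairs plays the same role as the underlying data table behind the chart.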
Step 4 : Find the answer to Question 2
In this step, we will try to find the top 5 videos that have a 5-star rating and the highest number of ratings.
To answer this question, we first drag video id into the dimension, which lists all the video ids present in the data. Next, we want to keep only the rows where the rating is 5.0, so we add a filter from the right panel.
Now we have a list of videos with a 5-star rating only. We then drag another column, “number of ratings”, into the y-axis, which gives us a graph showing how many ratings each video has.
We can now find the videos with the highest number of ratings by limiting the result to the top 5. In this case we changed the chart type from area to bar to showcase the results better.
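The filter-then-limit logic of this step can also be written as a short Python sketch. The field names `video_id`, `rating`, and `num_ratings` are hypothetical, and the sketch assumes each video id appears only once in the data, so no aggregation per video is needed.

```python
def top_five_star_videos(records, limit=5):
    """Keep videos rated exactly 5.0, then return those with the most ratings."""
    five_star = [r for r in records if float(r["rating"]) == 5.0]
    # Sort by number of ratings, largest first, and keep the first `limit` rows.
    ranked = sorted(five_star, key=lambda r: int(r["num_ratings"]), reverse=True)
    return ranked[:limit]


# Hypothetical sample; values are strings, as they would be after reading a text file.
sample_videos = [
    {"video_id": "a", "rating": "5.0", "num_ratings": "10"},
    {"video_id": "b", "rating": "4.9", "num_ratings": "999"},
    {"video_id": "c", "rating": "5.0", "num_ratings": "50"},
    {"video_id": "d", "rating": "5", "num_ratings": "7"},
]
best = top_five_star_videos(sample_videos, limit=2)
print([video["video_id"] for video in best])
```

Note that video "b" is excluded despite having the most ratings, because the rating filter is applied before the ranking, just as in the steps above.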
Step 5 : Share results
Finally, you can share the reports with your team by clicking on export, which gives you options to download the report as an image, Excel, or PDF file, or to embed it.
So, through a simple exercise, we have seen how easy it is to analyze and visualize your datasets in Ideata Analytics.
Try it out on your own by signing up for a 15-day free trial of the application here: http://www.ideata-analytics.com/trial