GSoC 2018 Project [ Business Intelligence with daru ] discussion

Discussions for the Google Summer of Code project 2018: Business Intelligence with daru

Hello everyone,

In the first phase, I will add importers to daru-io that store the output of the log-parsing gem in a dataframe, and then generate several metrics and their corresponding plot objects.

  1. Modifications to the log-analyzing gem: the request-log-analyzer gem is currently used as a CLI tool, so we will need to modify it a bit for our use. I will create a local fork of the gem and use it for this purpose.
  2. Next, I will use the methods added to the above gem to add importers to daru-io.

I am not sure where to add the methods that calculate the metrics and generate their plot objects. Should I create a new repository for them? I request everyone to provide input.

You can extend or override the methods of request-log-analyzer in our gem.

Please review PR#75.

What particular methods do you have in mind? What would be their signatures and behavior?

As of now, I have only thought of metrics to be generated from the Rails log, which are akin to those in the request-log-analyzer gem.

  1. Most requested assets
    def most_requested()
    # returns df/plot listing the most requested assets and their number of hits
    # columns of parsed df used: :rendered_file
  2. Distribution of HTTP requests
    def http_dist()
    # returns df/plot of the frequency of each type of request
    # columns of parsed df used: :method :line_type
  3. HTTP status returned
    def http_status()
    # returns df/plot of the different HTTP statuses returned
    # columns of parsed df used: :status
  4. Request Duration
    def request_time()
    # returns df/plot of view names with their converge time
    # columns of parsed df used: :rendered_file :partial_duration
  5. Rendering Time
    def render_time()
    # returns df/plot of assets and their rendering time
    # columns of parsed df used: :rendered_file :view
  6. Database Time
    # returns df/plot of ActiveRecord time for each rendering
    # columns of parsed df used: :rendered_file :db

These methods can be used to return dataframes or plots. I want your opinion on which repository to add these methods to. A new repository can also be made to club this together with the data cleaning library.
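
To make the idea concrete, here is a minimal sketch of how two of the metrics above reduce to generic operations: a frequency count over one column and a per-group mean. It uses plain Ruby hashes as a stand-in for rows of the parsed dataframe, so no daru API is used or implied. The helper names `frequency` and `mean_by` and the sample rows are hypothetical, made up only for illustration.

```ruby
# Hypothetical sample rows; the column names follow the list above.
sample_rows = [
  { method: 'GET',  status: 200, rendered_file: 'posts/index.html.erb', view: 12.0, db: 3.0 },
  { method: 'GET',  status: 200, rendered_file: 'posts/index.html.erb', view: 8.0,  db: 1.0 },
  { method: 'POST', status: 302, rendered_file: 'posts/new.html.erb',   view: 4.0,  db: 2.0 },
  { method: 'GET',  status: 404, rendered_file: 'errors/404.html.erb',  view: 1.0,  db: 0.0 }
]

# Count how often each value of `column` occurs, most frequent first.
# Ties are broken by the value's string form so the order is deterministic.
def frequency(rows, column)
  counts = rows.each_with_object(Hash.new(0)) { |row, acc| acc[row[column]] += 1 }
  counts.sort_by { |value, count| [-count, value.to_s] }.to_h
end

# Mean of `value_column` within each group of `group_column`.
def mean_by(rows, group_column, value_column)
  rows.group_by { |row| row[group_column] }.transform_values do |group|
    group.sum { |row| row[value_column] } / group.size.to_f
  end
end

puts frequency(sample_rows, :status).inspect               # HTTP status returned
puts frequency(sample_rows, :rendered_file).inspect        # most requested assets
puts mean_by(sample_rows, :rendered_file, :view).inspect   # rendering time per asset
```

Seen this way, `http_status`, `http_dist` and `most_requested` are the same count applied to different columns, which may help decide where such methods belong.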

I am planning to add all these plotting metrics to Daru::View this week. Please let me know if you would like to have any changes in the workflow.

Sorry for the late answer, but I believe the direction should be updated a bit.
The thing is, the goal is “more generic goodness in daru (based on ‘analyze rails logs’ task)”, not “make a Rails log analyzer (eventually using some of daru)”.

Therefore, what needs thinking about is not very specific, Rails-log-centric analysis methods, but something like:

  • make some (very rough at the beginning!) demo app – it can be just one script that reads some log and outputs stats (first just numbers/data, then plots).
  • while doing so, look at the current DataFrame and daru-view abilities, detect what is missing for this to be comfortable and doable in one-liners, and implement that (not specific methods like most_requested_rails_http_routes, but generic statistical and plotting gimmicks)
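
A very rough version of such a one-file script could look like the sketch below. It is only an illustration under stated assumptions: the log text is made up, only the familiar Rails "Completed ..." line is parsed, and plain Ruby stands in for DataFrame, so none of the names here (`parse_completed`, `summarize`) are daru or request-log-analyzer APIs.

```ruby
# Made-up sample log; a real script would read a file instead.
sample_log = <<~LOG
  Started GET "/posts" for 127.0.0.1 at 2018-05-20 10:00:00 +0530
  Completed 200 OK in 15ms (Views: 10.2ms | ActiveRecord: 2.1ms)
  Started GET "/posts/1" for 127.0.0.1 at 2018-05-20 10:00:05 +0530
  Completed 200 OK in 25ms (Views: 18.0ms | ActiveRecord: 4.0ms)
  Started GET "/missing" for 127.0.0.1 at 2018-05-20 10:00:09 +0530
  Completed 404 Not Found in 5ms (Views: 1.0ms | ActiveRecord: 0.0ms)
LOG

# Turn every "Completed ..." line into a row; all other lines are skipped.
def parse_completed(log_text)
  pattern = /\ACompleted (?<status>\d{3}) .*? in (?<duration>\d+)ms/
  matches = log_text.each_line.map { |line| pattern.match(line) }.compact
  matches.map { |m| { status: m[:status].to_i, duration: m[:duration].to_i } }
end

# First just numbers: request count, mean and max duration, hits per status.
def summarize(rows)
  return { requests: 0 } if rows.empty?

  durations = rows.map { |row| row[:duration] }
  {
    requests: rows.size,
    mean_duration_ms: durations.sum / durations.size.to_f,
    max_duration_ms: durations.max,
    by_status: rows.group_by { |row| row[:status] }.transform_values(&:size)
  }
end

puts summarize(parse_completed(sample_log)).inspect
```

Each step that feels clumsy here (grouping, counting, averaging per group) is a candidate for a generic one-liner in DataFrame.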

The end result of the work should be envisioned as:

  • daru can now read Rails (and other) logs too!
  • cool new statistical/business analytics methods in DataFrame
  • some generic things in daru-view that are also pretty useful for logs

…and some demo app to showcase how Daru can be useful (for Rails too, but also for a lot of other things).

Hello @rohitner,

Please keep reporting to us every 1 or 2 days on where you are now, what the plan is, and the challenges you are facing.

Regards,
Shekhar

Hey everyone,

Going by the updated directions, I think daru-view is already a friendly plugin for plotting any type of metric. As the metrics required vary from user to user, anyone can build their own if given a sample Rails app that plots some of the common metrics (which I will be doing in the last phase, as per the timeline).
If the aim is to introduce more generic methods rather than log-centric ones, it would be better to start directly with the next part: creating the business intelligence module for daru.
WDYT? @zverok @shekharrajak

Rohit

Hey everyone,

I have started working on the Daru::BI module and will be adding data cleaning methods for Daru::DataFrame this week. Please let me know about any alterations required in the plan.

Rohit