Discussions for the Google Summer of Code project 2018: Business Intelligence with daru
Hello everyone,
In the first phase, I will be adding importers to daru-io that store the output of a log-parsing gem in a dataframe, and then generating several metrics and their corresponding plot objects.
- Modifications in the analyzing gem: The gem request-log-analyzer is currently used as a CLI tool. We will need to modify it a bit for our use. I will create a local fork of the gem and use it for the purpose.
- Next, I will use the methods added to the above gem to add importers to daru-io.
I am not sure where to add the methods that calculate the metrics and generate their plot objects. Should I create a new repository for them? I request everyone to provide inputs.
You can extend or override the methods of the request-log-analyzer in our gem.
Please review PR#75.
What particular methods do you have in mind? What would be their signature and behavior?
As of now, I have only thought of the metrics to be generated from Rails logs, which are akin to those in the request-log-analyzer gem.
- Most requested assets
def most_requested()
  # return df/plot containing list of most requested assets and no. of hits
  # columns of parsed df used: :rendered_file
end
- Distribution of HTTP requests
def http_dist()
  # return df/plot of the frequency of each type of request
  # columns of parsed df used: :method :line_type
end
- HTTP status returned
def http_status()
  # return df or plot of different types of HTTP status returned
  # columns of parsed df used: :status
end
- Request Duration
def request_time()
  # return df/plot of view names with their converge time
  # columns of parsed df used: :rendered_file :partial_duration
end
- Rendering Time
def render_time()
  # return df/plot of assets and their rendering time
  # columns of parsed df used: :rendered_file :view
end
- Database Time
# return df/plot of ActiveRecord time for each rendering
# columns of parsed df used: :rendered_file :db
These methods can be used to return dataframes or plots. I want your opinion on which repository to add these methods to. A new repository can also be made to club this together with the data cleaning library.
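To make the idea concrete, here is a minimal plain-Ruby sketch of the computation behind a metric like most_requested or http_status. It deliberately avoids daru and request-log-analyzer so that it runs on its own. The rows, their values, and the helper name count_by are hypothetical; only the column names :rendered_file and :status come from the list above.

```ruby
# Hypothetical parsed log rows. In the real workflow these would come from
# the parsed dataframe; every value below is made up for illustration.
rows = [
  { rendered_file: 'posts/index.html.erb', status: 200 },
  { rendered_file: 'posts/show.html.erb',  status: 200 },
  { rendered_file: 'posts/index.html.erb', status: 304 },
  { rendered_file: 'users/show.html.erb',  status: 404 },
  { rendered_file: 'posts/index.html.erb', status: 200 }
]

# Count how often each value of `column` occurs, most frequent first.
# `count_by` is an illustrative helper name, not an existing daru method.
def count_by(rows, column)
  counts = Hash.new(0)
  rows.each { |row| counts[row[column]] += 1 }
  counts.sort_by { |_value, hits| -hits }.to_h
end

most_requested = count_by(rows, :rendered_file) # hits per rendered file
status_counts  = count_by(rows, :status)        # hits per HTTP status
```

Both metrics reduce to the same frequency count over one column, so the resulting hash could be turned into a dataframe or handed to a plot in either case.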
I am planning to add all these plotting metrics to Daru::View this week. Please let me know if you would like to have any changes in the workflow.
Sorry for answering late, but I believe that the direction should be updated a bit.
The thing is, the goal is “more generic goodness in daru (based on the ‘analyze rails logs’ task)”, not “make a Rails log analyzer (eventually using some of daru)”.
Therefore, what should be thought about is not very particular Rails-log-centric analyzing methods, but something like:
- make some (very rough at the beginning!) demo app – it can be just one script that reads some log and outputs stats (first just numbers/data, then plots).
- while doing so, look at current DataFrame and daru-view abilities, detect what is missing for this to be comfortable and doable in one-liners, and implement that (not specific methods like most_requested_rails_http_routes, but generic statistical and plotting gimmicks)
The end result of the work should be envisioned as:
- daru can now read Rails (and other) logs too!
- cool new statistical/business analytics methods in DataFrame
- some generic things in daru-view that are also pretty useful for logs
…and some demo app to showcase how Daru can be useful (for Rails too, but also for a lot of other things).
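A rough sketch of what "generic rather than log-specific" could look like, in plain Ruby so that it runs without daru. The helper name mean_by and all row values are hypothetical; only the column names :rendered_file and :db are taken from earlier in this thread. It is not an existing daru or daru-view method.

```ruby
# Made-up rows standing in for a parsed log; values are illustrative only.
rows = [
  { rendered_file: 'posts/index.html.erb', db: 1.5 },
  { rendered_file: 'posts/index.html.erb', db: 2.5 },
  { rendered_file: 'users/show.html.erb',  db: 4.0 }
]

# Group rows by `key` and average the `value` column within each group.
# Nothing here knows about Rails logs: any pair of columns works.
# `mean_by` is a hypothetical name used for this sketch.
def mean_by(rows, key, value)
  rows.group_by { |row| row[key] }.transform_values do |group|
    group.sum { |row| row[value] } / group.size.to_f
  end
end

db_time_per_view = mean_by(rows, :rendered_file, :db)
```

The same call covers "database time per view" here and would cover any other grouped average, which is the sort of one-liner a demo script could then plot.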
Hello @rohitner,
Please keep reporting to us every 1 or 2 days on where you are now, what the plan is, and the challenges you are facing.
Regards,
Shekhar
Hey everyone,
Going by the updated directions, I think daru/view is already a friendly plugin for plotting any type of metrics. As the required metrics vary from user to user, users can build their own if given a sample Rails app that plots some of the common metrics (which I will be doing in the last phase as per the timeline).
If the aim is to introduce more generic methods rather than log-centric ones, it will be better to directly start with the next part of creating the business intelligence module for daru.
WDYT? @zverok @shekharrajak
Rohit
Hey everyone,
I have started working on the Daru::BI module and will be adding data cleaning methods for Daru::DataFrame this week. Please let me know about any alterations required in the plan.
Rohit