Discussions for the Google Summer of Code project 2018: Business Intelligence with daru
In the first phase, I will be adding importers to daru-io for storing the output of the log-parsing gem in a dataframe, and for generating several metrics along with their corresponding plot objects.
- Modifications to the analyzing gem: the request-log-analyzer gem is currently used as a CLI tool. We will need to modify it a bit for our purposes, so I will create a local fork of the gem and work with that.
- Next, I will use the methods added to the above gem to add importers to daru-io.
I am not sure where to add the methods that calculate the metrics and generate their plot objects. Should I create a new repository for them? I request everyone to provide inputs.
You can extend or override the methods of request-log-analyzer in our gem.
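To illustrate the idea of overriding a gem's methods without editing its source, here is a minimal sketch using `Module#prepend`. Note that `LogTracker` is a hypothetical stand-in class, not an actual class from request-log-analyzer; the real gem's class and method names would differ.

```ruby
# Hypothetical stand-in for a request-log-analyzer class;
# the real gem's class names and methods may differ.
class LogTracker
  def report
    "plain text report"
  end
end

# Module#prepend inserts this module before the class in the
# method lookup chain, so our #report overrides the gem's while
# still being able to call the original via `super`.
module DataFrameReport
  def report
    { original: super, format: :dataframe } # wrap output for daru
  end
end

LogTracker.prepend(DataFrameReport)

result = LogTracker.new.report
puts result.inspect
```

Compared with monkey-patching by reopening the class, `prepend` keeps the original implementation reachable through `super`, which suits the "extend or override" approach suggested here.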
What particular methods do you have in mind? What would be their signatures and behavior?
As of now, I have only thought of metrics generated from Rails logs, akin to those in the request-log-analyzer gem.
- Most requested assets
  # returns df/plot listing the most requested assets and their number of hits
  # columns of parsed df used: :rendered_file
- Distribution of HTTP requests
  # columns of parsed df used: :method, :line_type
  # returns df/plot of the frequency of each request type
- HTTP status returned
  # returns df/plot of the different HTTP status codes returned
  # columns of parsed df used: :status
- Request Duration
  # returns df/plot of view names with their request durations
  # columns of parsed df used: :rendered_file, :partial_duration
- Rendering Time
  # returns df/plot of assets and their rendering times
  # columns of parsed df used: :rendered_file, :view
- Database Time
  # returns df/plot of ActiveRecord time for each rendering
  # columns of parsed df used: :rendered_file, :db
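To make the shape of these metrics concrete, here is an illustrative sketch of the first one ("most requested assets") in plain Ruby. It assumes the parsed log is available as an array of row hashes keyed by the columns above (the same data a Daru::DataFrame could be built from); the sample rows are invented for the example.

```ruby
# Illustrative sample of parsed log rows; in practice these would
# come from the request-log-analyzer output.
rows = [
  { rendered_file: 'users/index', status: 200 },
  { rendered_file: 'users/show',  status: 200 },
  { rendered_file: 'users/index', status: 500 },
]

# "Most requested assets": count hits per :rendered_file,
# sorted by number of hits, descending.
hits = rows.group_by { |r| r[:rendered_file] }
           .map { |file, group| [file, group.size] }
           .sort_by { |_, count| -count }

puts hits.inspect
# => [["users/index", 2], ["users/show", 1]]
```

Each of the other metrics follows the same pattern of grouping on one column and aggregating another, which is why they could plausibly share one generic group-and-aggregate helper rather than six separate methods.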
These methods can return either dataframes or plots. I would like your opinion on which repository to add them to. A new repository could also be made to club them together with the data cleaning library.
I am planning to add all these plotting metrics to Daru::View this week. Please let me know if you would like any changes in the workflow.
Sorry for answering late, but I believe the direction should be updated a bit.
The thing is, the goal is “more generic goodness in daru (based on the ‘analyze Rails logs’ task)”, not “make a Rails log analyzer (eventually using some of daru)”.
Therefore, what should be thought about is not very particular Rails-log-centric analyzing methods, but something like:
- make some (very rough at the beginning!) demo app – it can be just one script that reads some log and outputs stats (first just numbers/data, then plots).
- while doing so, look at the current DataFrame and daru-view abilities, detect what is missing for this to be comfortable and doable in one-liners, and implement that (not specific methods like most_requested_rails_http_routes, but generic statistical and plotting gimmicks)
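The "one script that reads some log and outputs stats" idea could start as roughly the following sketch. The log format here is an invented, simplified Rails-style "Completed" line used only for illustration; a real demo would read an actual log file and feed the extracted data into daru.

```ruby
# Very rough demo: read log lines, extract HTTP status codes,
# and print how often each one occurred.
log = <<~LOG
  Completed 200 OK in 12ms
  Completed 404 Not Found in 3ms
  Completed 200 OK in 9ms
LOG

# Pull the three-digit status code out of each "Completed" line.
statuses = log.each_line
              .filter_map { |line| line[/Completed (\d{3})/, 1] }

counts = statuses.tally # status code => number of occurrences
puts counts.inspect
```

Starting from a script like this makes the gaps visible: whatever glue code it needs beyond one or two daru calls is a candidate for the generic DataFrame/daru-view methods described above.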
The end result of the work should be envisioned as:
- daru can now read Rails (and other) logs too!
- cool new statistical/business analytics methods in DataFrame
- some generic things in daru-view that are nonetheless pretty useful for logs
…and some demo app to showcase how Daru can be useful (for Rails too, but also for lot of other things).
Please keep reporting to us every 1 or 2 days on where you are now, what the plan is, and what challenges you are facing.
Going by the updated directions, I think daru-view is already a friendly plugin for plotting any type of metric. As the required metrics vary from user to user, one can build them on one's own when given a sample Rails app plotting some of the common metrics (which I will be doing in the last phase, as per the timeline).
If the aim is to introduce more generic methods rather than log-centric ones, it would be better to start directly with the next part: creating the business intelligence module for daru.
WDYT? @zverok @shekharrajak
I have started working on the Daru::BI module and will be adding data cleaning methods for Daru::DataFrame this week. Please let me know about any alterations required in the plan.
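As a sketch of the kind of data cleaning method meant here, one of the simplest candidates is dropping rows that contain missing values. The example below works on plain row hashes rather than a real Daru::DataFrame, and the method name and data are hypothetical, chosen only to illustrate the shape of such a helper.

```ruby
# Hypothetical data-cleaning helper: drop any row that contains
# a nil value. A Daru::BI version would operate on a DataFrame.
def drop_incomplete_rows(rows)
  rows.reject { |row| row.values.any?(&:nil?) }
end

rows = [
  { path: '/users', db_time: 4.2 },
  { path: '/posts', db_time: nil }, # incomplete row
  { path: '/home',  db_time: 1.1 },
]

clean = drop_incomplete_rows(rows)
puts clean.size
# => 2
```

Other cleaning methods in the same family (filling missing values with a default, deduplicating rows, normalizing column types) would follow the same row-wise pattern.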