Question on statistics collection


by SaraR

Hi,

can you please help me understand the following comment in StatisticsUtil::needsToBeConsidered():
 "// data came from filter, not from resource: this needs to be considered only if this is robot"

Why do we want to collect statistics of requests "caught" in the filter only if they were requested by a robot and not by a "normal" browser?

The purpose of my question is that I'm trying to find a way to log visits to the pages on my site.
In the main page of the application I call the StatisticsServlet, so accesses to this main page are correctly logged in statistics. However, most subsequent navigations in the site aren't, because they are loaded with Ajax into a <div> (by setting the innerHTML attribute of that element), and hence the StatisticsServlet is not called again.
The statistics filter always gets the request, but then doesn't log it because extractFromParams is false...

I'm hoping that by understanding this I'll be able to better figure out my options and decide on the correct solution, so any help will be much appreciated!

Thanks in advance!
Sara




Re: Question on statistics collection

by Roman Puchkovskiy-2

Hi, Sara.

There are two 'entry points' for statistics collection in AtLeap:

1. StatisticsCollectionServlet, which must be called manually. As you
can see in CollectStatsTag (which is used in frontendLayout.jsp), values
like referrer, screen resolution and others are obtained by JavaScript
and passed as GET parameters.
This is the preferred method if you want to get something which is
available only through JavaScript and not through HTTP headers (like
screen resolution). At the same time, this does not work for:
a. browsers with JS disabled,
b. indexing robots which usually do not execute JS at all.
So, the data is passed using GET parameters (and this is the case when
extractFromParams is true).

2. StatisticsCollectionFilter: it wraps requests to our resource
servlet, actions and JSPs. It's a fallback for cases when
StatisticsCollectionServlet cannot be used. It's intended to be used for
the following:
a. browsers with JS disabled,
b. indexing robots
c. statistics about non-pages (i.e. resources)
Here, data (for statistics) is extracted from request headers
(extractFromParams is false in this case).
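To make the two paths concrete, here is a minimal sketch in plain Java (no servlet API, so it runs standalone; Java 10+ for the Charset overload of URLDecoder.decode). The class RawDataSketch, the parameter names `ref` and `res`, and the map-based header lookup are hypothetical stand-ins, not AtLeap's real RawData. The point is only that the servlet path can carry JS-only values such as screen resolution, while the filter path sees just what HTTP headers provide, and each path sets extractFromParams accordingly.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the data each entry point can extract.
// Not AtLeap's RawData class; names and parameters are illustrative.
final class RawDataSketch {
    final String referrer;
    final String screenResolution;   // only obtainable via JavaScript
    final String userAgent;
    final boolean extractFromParams; // true = servlet path, false = filter path

    private RawDataSketch(String referrer, String screenResolution,
                          String userAgent, boolean extractFromParams) {
        this.referrer = referrer;
        this.screenResolution = screenResolution;
        this.userAgent = userAgent;
        this.extractFromParams = extractFromParams;
    }

    // Servlet path: JS computed the values and sent them as GET parameters.
    // Assumed parameter names: "ref" (referrer) and "res" (screen resolution).
    static RawDataSketch fromQueryString(String query, String userAgentHeader) {
        Map<String, String> params = new HashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return new RawDataSketch(params.get("ref"), params.get("res"),
                                 userAgentHeader, true);
    }

    // Filter path: only what the HTTP headers carry; no screen resolution.
    // A real container's getHeader() is case-insensitive; this map is not.
    static RawDataSketch fromHeaders(Map<String, String> headers) {
        return new RawDataSketch(headers.get("Referer"), null,
                                 headers.get("User-Agent"), false);
    }
}
```

A robot hitting a page only ever produces the second kind of object, which is why the filter path matters for it.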

Please note that some requests may be processed twice in this scheme:
if a request is made to a page which uses the servlet (and the browser
executes our JS), then it will be considered both by the servlet and by
the filter. To prevent this, there's a check in
StatisticsUtil.needsToBeConsidered() which begins with the following code:
        if (!skip && data.isNotResource() && !extractFromParams) {
            // data came from filter, not from resource: this needs to be
            // considered only if this is robot

Here the following is done: if this is a request to a page (i.e. not
resource) AND extractFromParams is false (which means that call comes
from the filter), then we get suspicious: maybe we should ignore this
request. But it should NOT be ignored if this is a known robot as it
never calls our statistics servlet (and hence no double counting occurs
for the robot).
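A simplified model of that guard, as a sketch: only the condition `!skip && data.isNotResource() && !extractFromParams` is taken from the quoted code. Passing everything as plain booleans, and the `isKnownRobot` flag standing in for whatever robot detection AtLeap actually performs, are assumptions.

```java
// Hypothetical, simplified model of the double-counting guard.
final class DoubleCountGuard {
    private DoubleCountGuard() {}

    static boolean needsToBeConsidered(boolean skip, boolean isNotResource,
                                       boolean extractFromParams,
                                       boolean isKnownRobot) {
        boolean ignore = skip;
        if (!skip && isNotResource && !extractFromParams) {
            // Page request seen by the filter. A JS-enabled browser also calls
            // the servlet for this page, so counting it here would count the
            // visit twice; robots never run JS, so they are kept.
            ignore = !isKnownRobot;
        }
        // Servlet calls and resource requests are counted unless an earlier
        // check already decided to skip them.
        return !ignore;
    }
}
```

Note the side effect discussed later in this thread: a browser with JS disabled also lands in the "page seen by the filter" branch and is not counted.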

As for enabling counting of asynchronous requests, I see the following ways:
1. Modify CollectStatsTag so that it can be used in the HTML document
body, too (say, add a parameter which would switch from generating a
STYLE tag to generating a SCRIPT tag). In this case you probably would
not even need to modify the StatisticsCollectionServlet.
After this modification, just insert <atleap:collectStats type="script" />
in the JSP which gets called by the async request.
2. Change the logic of that check in the needsToBeConsidered() method so
that it's aware of your async request.
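For the second way, one possible approach (an assumption, not something AtLeap does today) is to recognise async requests by the `X-Requested-With: XMLHttpRequest` header. Libraries such as jQuery and Prototype typically send it with their same-origin Ajax calls; hand-written XMLHttpRequest code has to set it itself. Because such a fragment never triggers the servlet, counting it from the filter cannot cause double counting. The class and method names below are hypothetical.

```java
import java.util.Map;

// Hypothetical async-aware variant of the needsToBeConsidered() guard.
final class AsyncAwareGuard {
    private AsyncAwareGuard() {}

    // Assumed convention: Ajax calls carry "X-Requested-With: XMLHttpRequest".
    // The sketch uses an exact-case key; real header lookup is case-insensitive.
    static boolean isAsync(Map<String, String> headers) {
        return "XMLHttpRequest".equalsIgnoreCase(headers.get("X-Requested-With"));
    }

    static boolean needsToBeConsidered(boolean skip, boolean isNotResource,
                                       boolean extractFromParams,
                                       boolean isKnownRobot,
                                       boolean isAsyncRequest) {
        boolean ignore = skip;
        if (!skip && isNotResource && !extractFromParams) {
            // Robots and async fragments never reach the servlet, so counting
            // them from the filter cannot double count.
            ignore = !(isKnownRobot || isAsyncRequest);
        }
        return !ignore;
    }
}
```

This only stays free of double counting as long as the async fragments do not also call the servlet, so it is an alternative to the first way rather than a complement.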

Actually, there's one more problem with the first way: it will count the
main page URL each time an async request is made. I'm not sure whether
this is correct for you, so you may need some further modifications.
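For the first way, here is a sketch of what a SCRIPT-emitting variant of the tag could output. It fires an image beacon at the statistics servlet and passes the fragment's own URL explicitly, which would address the problem of counting the main page URL on every async request. The servlet path and the `url`/`res` parameter names are hypothetical; the real ones are whatever CollectStatsTag and StatisticsCollectionServlet use. One caveat: browsers do not execute SCRIPT elements inserted via innerHTML, so this only helps if the Ajax code evaluates the fragment's scripts (some libraries offer an option for that) or calls the servlet itself from its callback.

```java
// Hypothetical markup generator for a type="script" variant of the tag.
final class CollectStatsScriptSketch {
    private CollectStatsScriptSketch() {}

    // pageUrl is the fragment's own URL, so the main page is not re-counted.
    static String renderScript(String servletUrl, String pageUrl) {
        return "<script type=\"text/javascript\">\n"
             + "(new Image()).src = " + jsString(servletUrl)
             + " + '?url=' + encodeURIComponent(" + jsString(pageUrl) + ")"
             + " + '&res=' + screen.width + 'x' + screen.height;\n"
             + "</script>";
    }

    // Minimal escaping into a single-quoted JavaScript string literal.
    static String jsString(String s) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '<':  sb.append("\\x3C"); break; // never emit a literal "</script>"
                default:   sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }
}
```

The beacon is a plain GET with parameters, so on the server side it arrives exactly like the existing servlet call (extractFromParams is true).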

If you make some modifications to AtLeap code while working on your
issue, we would be grateful if you share them with us.

Roman Puchkovskiy

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@...
For additional commands, e-mail: dev-help@...


Re: Question on statistics collection

by SaraR

Hi Roman,

First of all, thanks for the quick feedback!

Regarding your answer: OK, now I understand the use of the StatisticsCollectionFilter as a fallback for:
   b. indexing robots
   c. statistics about non-pages (i.e. resources)

I just didn't get how the StatisticsCollectionFilter is also a fallback for the other scenario you pointed out:
   a. browsers with JS disabled (which can kind of be seen as my case, if the browser had JS disabled)

In this case, as you mentioned, the servlet is not called, so we reach StatisticsUtil.serverVisit() from the StatisticsCollectionFilter, and hence extractFromParams == false.
So this implies the page will be discarded in the StatisticsUtil.needsToBeConsidered() method, because it enters this "if":
    if (!skip && data.isNotResource() && !extractFromParams)
but since it's not a robot, it then returns false, and so the page is not registered in statistics...

I understand that the "if" prevents double counting of pages when the servlet is called, but it seems that when the servlet is not called, no statistics are registered at all! The changes you propose are meant to get around this, right?

Or maybe I just didn't get the sentence "Here, data (for statistics) is extracted from request headers". Is there another class collecting the statistics data specifically from headers, besides the code executed in StatisticsUtil.serverVisit() when StatisticsUtil.needsToBeConsidered() returns false?...

Thanks again and best regards,
Sara

PS: I'll be glad to share any changes I make! Although I'll first try to get around my issue by calling the servlet, which also seems to me the preferable entry point (since in principle we won't have problems with the fallbacks...)

Re: Question on statistics collection

by Roman Puchkovskiy-2

Err, my fault: case a. will not actually count anyone, you are right :)
As far as I remember, we decided not to count the few users who disable
JS rather than count twice the users who do not disable it.

It is the StatisticsCollectionFilter that extracts data from request
headers, but this happens before serveVisit() is called (serveVisit()
actually processes data of type RawData, which has already been
extracted either from GET parameters or from request headers).
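Put together, the flow can be modelled as below: both entry points extract their data first and then hand it to a common processing step that looks only at the extracted data and the extractFromParams flag, never at the raw request. This is a hypothetical sketch (Java 16+ for the record); `processVisit` merely stands in for StatisticsUtil's visit-processing method, and the robot flag for its robot detection. It reproduces the outcome discussed in this thread: a JS-enabled browser is counted once, a browser with JS disabled not at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: extraction happens in the entry points,
// processing works only on the already-extracted data.
final class StatsPipelineSketch {
    record RawData(String url, boolean isNotResource, boolean isKnownRobot) {}

    private final List<String> counted = new ArrayList<>();

    // Entry point 1: servlet called from JS; data came from GET parameters.
    void onServletCall(RawData fromParams) {
        processVisit(fromParams, true);
    }

    // Entry point 2: filter; data was read from request headers beforehand.
    void onFilteredRequest(RawData fromHeaders) {
        processVisit(fromHeaders, false);
    }

    // Stand-in for StatisticsUtil's processing method (hypothetical name).
    private void processVisit(RawData data, boolean extractFromParams) {
        boolean pageSeenByFilter = data.isNotResource() && !extractFromParams;
        if (pageSeenByFilter && !data.isKnownRobot()) {
            return; // assume the servlet counts this page (true only if JS runs)
        }
        counted.add(data.url());
    }

    List<String> counted() {
        return List.copyOf(counted);
    }
}
```

In the usage this models, a JS-enabled visit to /index goes through both entry points and is recorded once, a JS-disabled visit to /about is recorded zero times, and a robot or a resource request is recorded by the filter alone.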

If you call the servlet directly, I think it would be enough, but in
this case you will have some nasty JS code in your JSPs. Actually,
that's why the tag was written, but now it seems a bit inflexible, as it
inserts a STYLE tag, which is OK in the page head but not in its body.

Roman Puchkovskiy



Re: Question on statistics collection

by SaraR

Just to "conclude" this thread: we followed the solution of calling the statistics servlet in our JSPs and JS.
It's not the prettiest way, but it was what was possible, and it works!
