Web Analytics: what about Packet Sniffing?

Since we come across web analytics solutions that use packet (or IP) sniffing from time to time, I took the opportunity during the Emetrics Summit 2006 in London to ask the vendors present what they thought of this fairly new technology. Interestingly, their opinions were all along the same lines.

(Image: the first packet sniffer ever)

For those who don't know the technique: packet sniffing is designed to monitor network traffic in order to recognize and decode certain packets of interest. It makes use of a black box that is installed in the network.
Packet sniffing is also widely used by hackers and crackers to gather information illegally about networks they intend to break into. Using a packet sniffer it is possible to capture data such as passwords, IP addresses and the protocols in use on the network, information that will help an attacker infiltrate the network.

And lately it is being used for web analytics reporting as well.
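To make this concrete, here is a minimal sketch in Python of the kind of decoding a sniffing-based collector performs once it has captured the bytes of an HTTP request off the wire. The function name and the fields it extracts are my own illustration, not any vendor's actual code; real capture also needs a raw socket or a libpcap-based library plus administrator privileges, which is omitted here.

```python
def decode_http_request(payload: bytes) -> dict:
    """Decode a captured HTTP request payload into the fields a web
    analytics collector cares about (hypothetical helper)."""
    head, _, _body = payload.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path,
            "host": headers.get("host", ""),
            "referer": headers.get("referer", ""),
            "user_agent": headers.get("user-agent", "")}

# Example: a request as it might appear in a sniffed TCP payload.
raw = (b"GET /products/index.html?cid=summer06 HTTP/1.1\r\n"
       b"Host: www.example.com\r\n"
       b"Referer: http://www.google.com/search?q=widgets\r\n"
       b"User-Agent: Mozilla/5.0\r\n\r\n")
hit = decode_http_request(raw)
print(hit["path"])  # '/products/index.html?cid=summer06'
```

From fields like the path, host and referrer, a sniffing-based tool can then build the same page view and campaign reports that a tag-based tool would.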

So what was the outcome of the nice little discussion I started during Emetrics:

  1. The technique is very useful if you want detailed IT-related reports such as page and server error reporting, time to serve, bandwidth usage, etc.
    Although these reports are important, they should be seen as separate from pure web analytics. These days web analytics is about campaign reports, segmentation, conversion and integration with other data sources. In other words it has become a marketing/business tool; it is no longer a tool for IT to check technical details and the number of visitors and page views, as it still was two years ago.
    That is also the reason why at OX2 we have partnered with K-Performance. With their application, IT can automatically monitor, load-test and improve the performance of its network and applications.
  2. Packet sniffing makes use of a black box that is placed in the network. But what if you don’t have one centralized network? Then you have to put a black box in each network, which quickly becomes expensive.
    And at that point consultancy isn’t even included yet! The hard part of using a packet sniffing tool for web analytics is configuring all the raw data that comes in. This goes much faster with tagging or standard log files, as that data is far more structured.
  3. In most packet sniffing implementations for web analytics, tagging is still needed to obtain the necessary reports. This was confirmed by John Marshall, CEO of ClickTracks, which offers log file analysis, client-side tagging and packet sniffing. John also said that packet sniffing isn’t that flexible after all, and that only a very small percentage of his clients make use of it.
    In fact, most vendors that offer packet sniffing also offer a client-side tagging solution.
  4. And then of course there is the privacy issue. Packet sniffing captures all information that is sent over the network, including logins, passwords and any other personal data that may be required when you register on a site or buy online.
    Most company policies don’t allow this kind of data capture, and some countries don’t allow it at all.

I think we can conclude that packet sniffing is still a very good tool for IT, and in some cases also for web analytics, but as Jim Sterne said: “It isn’t a standard in web analytics, and it won’t become one either.” Or as the representative of Omniture vividly put it: "A pretty dress on an ugly elephant!"

For the time being we keep on using the client side tagging solution!


6 Responses to “Web Analytics: what about Packet Sniffing?”

  1. Ivo Rehberger Says:

    Hello.
    I have some comments on your article. Whenever I hear about using packet sniffing for web analytics, it is always the same: the packet sniffing tool is said to be more suitable for technical (traffic) analysis than for business-oriented web analytics (and I really do agree that web analytics is about business, not traffic). However, packet sniffing should be seen as a technique for collecting detailed clickstream data, not as a solution that is unable to provide business-oriented analytics. Once you have high-quality, comprehensive clickstream data (in a data warehouse), you can generate whatever type of report you want. And packet sniffing provides the most complete and detailed clickstream data collection that can be obtained.

    To point 2, “The hard part using a packet sniffing tool for Web Analytics is the configuration of all the raw data that is coming in.” That depends entirely on the tool you are using and is not generally true! NEXTWELL provides a clickstream processing engine (called Clipen) that is based on packet sniffing and generates well-structured, processed clickstream data intended for immediate loading into a data warehouse. The tool processes the raw data, cleans it, normalizes it, performs quality assurance and so on (for those interested, see our website).

    To point 4, “And then of course we have the privacy issue.” It is a fact that logins and passwords are present in network communication, but this is no more of a privacy issue than usual: logins and passwords are also stored in back-end systems and similar databases. What is important is to discard such information after capturing it and never store it in a database or data warehouse. On the other hand, data submitted by visitors to a website can be very useful for web analytics purposes without compromising user privacy. Use only tools that provide means to reject confidential data and that support privacy!

    Ivo Rehberger

  2. web Says:

    A potential problem with the sniffer approach is that it only sees traffic arriving at the web server. Any page views that are satisfied by a proxy server or from any other caching mechanism beyond the network perimeter of the site will not be recorded.

    Tagging has its problems too, but far fewer than the sniffing approach.

  3. Aurélie Pols Says:

    Thanks Ivo for your reply and also web for your post.

    The thing is that in point 1 you mention that everything is nicely stored in a data warehouse and that clickstream analysis can be done. Great, but…

    But most clients are not there yet.
    Let’s be honest: we start off WA projects with some sense of what needs to be monitored and then dissect the online strategy to arrive at the KPIs. That is already a long road, and ideally my contact persons are not IT people.
    So imagine that I start talking to a CMO, who has some experience in the online field but is no specialist in the online channel, about a data warehouse!! I can already imagine his or her face, and the fact that (s)he will have to rely much more heavily on IT support to get the job done.

    WA needs to leave the IT realm. Turn it any way you like: IT is not the business person and is not responsible for strategic decisions. They are there to make the thing work and, in the best case, to have an idea of what doesn’t work. That’s it. Period.

    Furthermore, except for the ANWB, which uses Moniforce in the Netherlands, I am still not overwhelmed by the examples that actually use packet sniffing. As long as big players such as Microsoft or Amazon still go for tagging, I’m sticking with it.

    Last but not least, your website mentions pros and cons for the 3 techniques: logs, tags and sniffing.
    I’m afraid to say your information is inaccurate. I’m talking about http://www.nextwell.com/data-characteristics.html

    The first line is subjective.
    Line 2: not totally true. With tags, best practice is to place the script at the end of the page, ensuring the page has fully downloaded.
    Line 3: yes, the POST method is an issue with logs, but it can be circumvented with tags.
    Line 5: I don’t want hits! I want page views, visits and unique visitors.
    Lines 7 & 8: not for web analytics but for monitoring. Split the competences up or else it will never evolve!
    Line 9: totally untrue: broken links can be monitored with tagging if done well. I do it all the time, thank you very much, and it’s not complicated.

    If I go over the entire list again, except for monitoring, which can be done with other, simpler and cheaper tools, I still don’t see the advantage, sorry.

  4. Ivo Rehberger Says:

    Aurelie,
    thank you for your reply. As I wrote, WA must always be for the business and must be driven by business people; I totally agree with you. However, if we discuss the technological fundamentals of WA, we discuss log data, page tagging data and packet-sniffed data. Each WA tool or platform has its technological base, but that does not mean it is intended for this or that audience. What I object to is the fact that packet sniffing is presented as a technology only for IT people. No, it is just a technology that may serve as the base for any WA tool, including ones intended for business people. Each tool must be fed with data, but that in itself does not determine whether the tool has a business or a technical purpose.

    The same goes for my data warehouse remark: just replace “data warehouse” with “database” or “data storage”. Any WA tool needs some kind of data storage, regardless of whether it holds detailed or aggregated data. Data provided by packet sniffing technology is processed and aggregated in a similar way to data from page tags or log records. It is the analytical (presentation) layer of a WA tool that determines its fitness for business applications, not the data collection layer or the database layer.

    Of course, there is no need to expose such technical details to business people when talking about the benefits of a particular WA tool. Business people need readable reports, KPIs and valuable information. They do not need to know whether the reports or KPIs are acquired via page tagging, from log files or via packet sniffing.

    I must respond to your comments about “Data characteristics” page from NEXTWELL website.

    “The first line is subjective.”
    Why? When you implement page tagging, you have to add the page tags to the website application, which (more or less) affects the application; that is a fact. Even when you are dealing with log files, you must download the files from the website’s server, which may affect its operation. Packet sniffing is non-invasive: it does not even touch the sniffed website’s server, it requires no website modification, and it can therefore be deployed quickly and easily.

    “Line 2: not totally true. With tags, best practice is to place the script at the end of the page, assuring for full download of the page.”
    So when the download of the page is stopped halfway, you do not get the page view, because the script is never executed. But the page may still be readable: the visitor may take in the page’s message or click on any link already present. Packet sniffing provides detailed information about how much of the page was really downloaded, so that event may or may not be counted as a page view. Page tagging provides no information about client-side disconnects. And are you sure the script is always executed even when the page is fully downloaded? Here is an interesting account of page tagging accuracy: http://groups.yahoo.com/group/webanalytics/message/5657
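The partial-download argument can be sketched in a few lines of Python. This is a simplified model of my own, not any vendor's actual logic: a sniffer that has seen the response's Content-Length header and the data packets that followed can compute how much of the page actually reached the visitor, and then apply a policy for whether to count the hit as a page view.

```python
def download_completeness(content_length: int, delivered: int) -> float:
    """Fraction of the response body the client actually received,
    as a sniffer could compute from observed packets (simplified model)."""
    if content_length <= 0:
        return 1.0
    return min(delivered / content_length, 1.0)

def count_as_page_view(content_length: int, delivered: int,
                       threshold: float = 0.5) -> bool:
    """Hypothetical policy: count the hit as a page view only if at
    least `threshold` of the page body reached the visitor."""
    return download_completeness(content_length, delivered) >= threshold

# A visitor hits Stop after 30 KB of an 80 KB page:
print(count_as_page_view(80_000, 30_000))  # False: under the 50% threshold
print(count_as_page_view(80_000, 60_000))  # True
```

A bottom-of-page tag, by contrast, yields only a binary signal: either the script ran or it did not.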

    “Line 3: yes, POST methods is an issue with logs but can be circumvented with tags.”
    Without fairly intensive website modifications there is no way to get POST data into tags, so page tagging by itself does not have this ability. That is why I gave that line a +/- score on our website.

    “Line 5: I don’t want hits! I want page views, visits and unique visitors.”
    And what about measuring ad impressions when the ads are served from the original site, or internal ad impressions? What about measuring product brochure or catalog downloads? What about measuring software trial downloads? I agree, you do not need hits; you need valuable information derived from hits. Atomic data is about being able to measure everything you need to measure, beyond page views.

    “Lines 7 & 8: not for Web Analytics but for monitoring. Split the competences up or else it will never evolve!”
    My opinion differs. More technical information always helps to understand visitor interaction and his or her possible (dis)satisfaction.

    “Line 9: totally untrue: broken links can be monitored with tagging if done well. I do it all the time, thank you very much and it’s not complicated.”
    It is possible only via custom error pages served by the web server, and those pages must include the page tags. So it requires website modification and a server capable of serving custom error pages, and thus it is not generally available with page tagging. Moreover, when such a custom error page is not sent to the client because of a server error, or is not fully downloaded, the page tag script is not executed and no event is generated.

    The information presented on the “Data characteristics” page is correct. In fact, another disadvantage of page tagging is missing from it. Let’s say a few words about third-party cookies. Third-party cookies are one of the most important techniques page tagging is based on. Unlike the first-party cookies that packet sniffing and web logs benefit from, third-party cookies are commonly considered a privacy issue, because they allow an Internet user’s behaviour to be monitored across many websites, so a detailed user profile can be created from them. That is why users, as well as many personal anti-spyware and firewall applications, filter third-party cookies. Even cookies from well-known hosted web analytics solutions that use page tagging are filtered by anti-spyware applications (using lists of the solutions’ domain names). This situation has a big negative impact on page tagging accuracy when determining unique visitors or even page views. It has gone so far that page tagging applications must use domain masking and similar techniques to avoid the filtering, but the problem of third-party cookie filtering remains. Do not assume that page tagging is as accurate as it is presented!
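The first-party versus third-party distinction comes down to whether the cookie's domain matches the site the visitor is browsing. A privacy filter can classify a tracking cookie with a comparison like this Python sketch (simplified, with hypothetical helper names: real browsers consult the public-suffix list rather than just taking the last two labels of the hostname):

```python
def registrable_domain(host: str) -> str:
    """Naive 'site' extraction: last two labels of the hostname.
    (Real browsers consult the public-suffix list instead.)"""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])

def is_third_party(page_host: str, cookie_domain: str) -> bool:
    """True if the cookie is set for a different site than the page,
    i.e. the kind of cookie privacy filters commonly block."""
    return registrable_domain(page_host) != registrable_domain(cookie_domain.lstrip("."))

# A hosted-analytics tag on www.shop.com setting its own vendor cookie:
print(is_third_party("www.shop.com", ".stats-vendor.com"))  # True: gets filtered
# The same tag writing a cookie under the site's own domain:
print(is_third_party("www.shop.com", ".shop.com"))          # False: first-party
```

This is also why domain masking works: serving the tag's cookie from a hostname under the site's own domain makes the comparison come out first-party.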

    web,
    thank you also for your post. You are right: caching anywhere between server and client may mean that a particular hit never reaches the website’s server and is therefore not seen by the packet sniffer. However, on today’s Internet caching is less of a problem, as its main objective, saving Internet bandwidth, has become largely obsolete. And so-called “proxy busting” techniques effectively prevent caching of website content.
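For illustration, the two usual halves of such proxy busting, no-cache response headers plus a unique query-string token, can be sketched in Python (the header values are standard HTTP, but the parameter name `nocache` and the function names are illustrative choices of mine):

```python
import uuid
from urllib.parse import urlsplit, urlunsplit

NO_CACHE_HEADERS = {
    # Headers a server can send so proxies do not satisfy the hit locally.
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}

def bust_cache(url: str) -> str:
    """Append a unique token so every request URL is distinct and
    therefore cannot be answered from a proxy cache."""
    scheme, netloc, path, query, frag = urlsplit(url)
    token = "nocache=" + uuid.uuid4().hex
    query = query + "&" + token if query else token
    return urlunsplit((scheme, netloc, path, query, frag))

u1 = bust_cache("http://www.example.com/index.html")
u2 = bust_cache("http://www.example.com/index.html")
print(u1 != u2)  # True: two requests, two distinct URLs
```

Because each URL is unique, every hit must travel all the way to the origin server, where the sniffer can see it.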

  5. Siegert Dierickx Says:

    Dear Ivo,

    Thanks for your comments regarding this topic.

    I just came back from a long and well-deserved holiday, and as it was I who originally posted about packet sniffing on this blog, I would like to reply to your latest comment. If Aurélie wants to add something, I’m pretty sure she will…

    I fully agree with you that data provided by packet sniffing technology is processed and aggregated in a similar way to data from page tagging or log records.
    In our experience, however, we have obtained the best results with tagging; it has proved to be the best solution for web analytics in terms of accuracy, automation and flexibility, but also in terms of the cost of the technology.
    We have also chosen tagging because of the integration with the CMS. At OX2 we have built our own CMS, OniSystem (see http://www.OniSystem.eu). So we know the best practice for integrating the tags into the website and for making sure the script is activated on each page visit, even when a visitor clicks the stop button after only half the page has loaded.
    And, contrary to what you say, it is not difficult at all to report on broken links and other errors when using tagging. With a CMS it is quite easy to create custom error pages and integrate the appropriate tagging.
    The same goes for ad impressions and any kind of download. Not difficult at all to report on these kinds of things.
    What surprised me most in your post was your statement about cookies: “Third party cookies are one of the most important techniques the page tagging is based on.”
    I know of some tagging-based web analytics products, NedStat for example, that still use third-party cookies, but for WebTrends this has not been the case since the release of version 7. WebTrends gives you the option of either using first-party cookies generated by their tagging server, or using the existing first-party cookies generated by your own web server.
    So when talking about accuracy, it’s as accurate as you can get!

    But I guess the discussion about the best technology for web analytics is a never-ending story, with everybody having their preferred method of collecting data.
    Don’t get us wrong, we are not against packet sniffing! We position ourselves as web analytics generalists and product-independent, so we are open to all technologies. Our job is to help the client choose the solution that best fits his needs, reports as well as possible on his website and comes as close as possible to meeting his objectives.
    In other words, if you want to convince us of packet sniffing technology through an online demo or conference call, we would be happy to take part.
    And if you want to learn more about tagging, just give us a shout.

  6. karthikeyan Says:

    I need to know ways to detect and prevent packet sniffers in a network.

    And also ways of detecting and preventing promiscuous devices on the network.

Comments are closed.

