
Tuesday, March 16, 2010

Characterizing Browsing Strategies in the World-Wide Web

Abstract


This paper presents the results of a study conducted at Georgia Institute of Technology that captured client-side user events of NCSA's XMosaic. Actual user behavior, as determined from client-side log file analysis, supplemented our understanding of user navigation strategies as well as provided real interface usage data. Log file analysis also yielded design and usability suggestions for WWW pages, sites and browsers. The methodology of the study and findings are discussed along with future research directions.
Keywords


Hypertext Navigation, Log Files, User Modeling
Introduction


With the prolific growth of the World-Wide Web (WWW) [Berners-Lee et al., 1992] in the past year, there has been an increased demand for an understanding of the WWW audience. Several studies exist that determine demographics and some behavioral characteristics of WWW users via self-selection [Pitkow & Recker, 1994a, 1994b]. Though highly informative, such studies only provide high-level trends in Web use (e.g., how frequently a Web browser is used to access research reports, weather information, etc.). Other areas of audience analysis, such as navigation strategies and interface usage, remain unstudied. Thus, the surveys provide estimations of who is using the WWW, but fail to provide detailed information on exactly how the Web is being used. Actual user behavior, as determined from client-side log file analysis, can supplement the understanding of Web users with more concrete data. Log file analysis also yields design and usability guidelines for WWW pages, sites and browsers.
This paper presents the results of a three-week study conducted at Georgia Institute of Technology that captured client-side user events of NCSA's XMosaic. Specifically, the paper first presents a review of related hypertext browsing and searching literature and how it relates to the Web, followed by a description of the study's methodology. An analysis of user navigation patterns ensues. Lastly, a discussion and recommendations for document design are presented.

Literature Review


Many studies have addressed user strategies and usability of closed hypermedia systems, databases and library information systems [Carmel et al., 1992]. Most distinguish between browsing and searching. Cove and Walsh [Cove & Walsh, 1988] identify three browsing strategies:
Search browsing: directed search, where the goal is known
General-purpose browsing: consulting sources that have a high likelihood of containing items of interest
Serendipitous browsing: purely random browsing
This continuum provides a useful middle ground between browsing as a method of completing a task and open-ended browsing with no particular goal in mind. Marchionini [Marchionini, 1989] further develops this distinction in designating open and closed tasks. Closed tasks have a specific answer and often integrate subgoals. Open tasks are much more subject-oriented and less specific. Browsing can be used as a method of fulfilling either open or closed tasks.
Intuitively, it would seem that browsing and searching are not mutually exclusive activities. In Bates's [Bates, 1989] work on berrypicking, a user's search strategy is constantly evolving through browsing. Users often move back and forth between strategies. Similarly, Bieber and Wan [Bieber & Wan, 1994] discuss the use of backtracking within a multi-windowed hypertext environment. They introduce the concept of "task-based backtracking," in which a user backtracks to compare information from different sources for the same task or to work on two tasks simultaneously. A similar technique, in a Web environment, would be backtracking to review previously retrieved pages.
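To make the backtracking idea concrete, here is a minimal sketch, not taken from either paper, of how a browser-style history stack lets a user return to previously retrieved pages; the class and page names are illustrative assumptions.

```python
class BrowsingSession:
    """Minimal illustration of stack-based backtracking in a hypertext browser.

    The class and page names here are hypothetical, for illustration only.
    """

    def __init__(self, start_page):
        self.current = start_page
        self.back_stack = []          # pages the user can backtrack to

    def follow_link(self, page):
        # Following a link pushes the current page onto the back stack.
        self.back_stack.append(self.current)
        self.current = page

    def back(self):
        # Backtracking pops the most recently visited page, if any remain.
        if self.back_stack:
            self.current = self.back_stack.pop()
        return self.current


session = BrowsingSession("home.html")
session.follow_link("results.html")
session.follow_link("methodology.html")
print(session.back())                 # -> results.html, a previously retrieved page
```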

All of these studies were performed on closed, single-author systems. The WWW, however, is an open, collaborative and exceedingly dynamic hypermedia system. These previous findings provide the basis and structure for describing the ways a user population behaves in a dynamic information ecology like the WWW.

Given that we expect to find the same kinds of strategies used in the WWW, supporting both the browser and the searcher when designing WWW pages and servers is necessary, although difficult. Furthermore, supporting the kind of task switching described by Bates and by Bieber and Wan adds another level of complexity, because that work implies that a user should be able to switch strategies at any time.

It has long been recognized that methods for supporting directed searching are needed. As a response to this, certain WWW servers are completely searchable and there are World-Wide Web search engines available.

Supporting browsing, though, may be a more difficult task. Both Laurel [Laurel, 1991] and Bernstein approach the topic of how to assess and design hypertexts for the browsing user. Laurel considers interactivity to be the primary goal. She defines a continuum for interactivity along three variables: frequency (frequency of choices), range (number of possible choices) and significance (implication of choices). Laurel contends that users will pay the price "often enthusiastically -- in order to gain a kind of lifelikeness, including the possibility of surprise and delight." Bernstein takes a slightly different approach with his "volatile hypertexts" [Bernstein, 1991]. He argues that the value of hypertext lies in its ability to create serendipitous connections between unexpected ideas.

There is a tension between designing for a browser and designing for a searcher. The logical hierarchy of a file structure or a searchable database may work fine for a closed-task, goal-oriented user. But a user looking for the unexpected element or a serendipitous connection may be frustrated by the precision required by these methods. The first step in balancing this problem is to determine what strategies are being used by the population. In order to do this, we collected log files of users interacting with the Web.
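As a rough illustration of what such a log analysis involves, the sketch below tallies navigation actions per user from a hypothetical client-side event log; the record layout (timestamp, user, action, URL) and the action names are assumptions for illustration, not the actual XMosaic instrumentation format.

```python
from collections import Counter

# Hypothetical client-side log records: (timestamp, user, action, url).
# The field layout and action vocabulary are illustrative assumptions.
log = [
    ("1994-08-01T10:02:11", "u01", "hyperlink", "http://www.gatech.edu/"),
    ("1994-08-01T10:02:45", "u01", "hyperlink", "http://www.gatech.edu/reports.html"),
    ("1994-08-01T10:03:01", "u01", "back",      "http://www.gatech.edu/"),
    ("1994-08-01T10:03:20", "u01", "open_url",  "http://info.cern.ch/"),
]

# Tally how often each navigation method is used, per user.
per_user = {}
for _, user, action, _ in log:
    per_user.setdefault(user, Counter())[action] += 1

for user, counts in per_user.items():
    total = sum(counts.values())
    for action, n in counts.most_common():
        print(f"{user}: {action:9s} {n / total:.0%}")
```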

History of the Internet

Before the widespread internetworking that led to the Internet, most communication networks were limited by their nature to allowing communications only between the stations on the local network, and the prevalent computer networking method was based on the central mainframe model. Several research programs began to explore and articulate principles of networking between physically separate networks, leading to the development of the packet switching model of digital networking. These research efforts included those of the laboratories of Donald Davies (NPL), Paul Baran (RAND Corporation), and Leonard Kleinrock at MIT and at UCLA.

The research led to the development of several packet-switched networking solutions in the late 1960s and 1970s, including ARPANET and the X.25 protocols. Additionally, public access and hobbyist networking systems grew in popularity, including Unix-to-Unix Copy (UUCP) and FidoNet. These were, however, still disjointed, separate networks, served only by limited gateways between them. This led to the application of packet switching to develop a protocol for internetworking, where multiple different networks could be joined together into a super-framework of networks. By defining a simple common network system, the Internet Protocol Suite, the concept of the network could be separated from its physical implementation.

This spread of internetworking began to form into the idea of a global network that would be called the Internet, based on standardized protocols officially implemented in 1982. Adoption and interconnection occurred quickly across the advanced telecommunication networks of the western world, and then began to penetrate into the rest of the world as it became the de facto international standard for the global network. However, the disparity of growth between advanced nations and third-world countries led to a digital divide that is still a concern today.
Following commercialization and the introduction of privately run Internet service providers in the 1980s, and the Internet's expansion for popular use in the 1990s, the Internet has had a drastic impact on culture and commerce. This includes the rise of near-instant communication by electronic mail (e-mail), text-based discussion forums, and the World Wide Web. Investor speculation in the new markets provided by these innovations also led to the inflation and subsequent collapse of the dot-com bubble. Despite this, the Internet continues to grow, driven by commerce, ever-greater amounts of online information and knowledge, and social networking, often referred to as Web 2.0.

ARPANET


Len Kleinrock and the first IMP.
Promoted to the head of the information processing office at DARPA, Robert Taylor intended to realize Licklider's ideas of an interconnected networking system. Bringing in Larry Roberts from MIT, he initiated a project to build such a network. The first ARPANET link was established between the University of California, Los Angeles and the Stanford Research Institute at 22:30 hours on October 29, 1969. By December 5, 1969, a four-node network was connected by adding the University of Utah and the University of California, Santa Barbara. Building on ideas developed in ALOHAnet, the ARPANET grew rapidly. By 1981, the number of hosts had grown to 213, with a new host being added approximately every twenty days.
ARPANET became the technical core of what would become the Internet, and a primary tool in developing the technologies used. ARPANET development was centered around the Request for Comments (RFC) process, still used today for proposing and distributing Internet protocols and systems. RFC 1, entitled "Host Software", was written by Steve Crocker of the University of California, Los Angeles, and published on April 7, 1969. These early years were documented in the 1972 film Computer Networks: The Heralds of Resource Sharing.
International collaborations on ARPANET were sparse. For various political reasons, European developers were concerned with developing the X.25 networks. Notable exceptions were the Norwegian Seismic Array (NORSAR) in 1972, followed in 1973 by Sweden with satellite links to the Tanum Earth Station and Peter Kirstein's research group in the UK, initially at the Institute of Computer Science, London University and later at University College London.





Asynchronous vs. Synchronous

Most communications circuits perform functions described in the physical and data link layers of the OSI model. There are two general strategies for communicating over a physical circuit: asynchronous and synchronous. Each has its advantages and disadvantages.

ASYNCHRONOUS

Asynchronous communication uses a transmitter, a receiver and a wire, without coordination about the timing of individual bits. There is no agreement between the two end points on how long the transmitter leaves the signal at a certain level to represent a single digital bit. Each device uses its own clock to measure out the 'length' of a bit. The transmitting device simply transmits. The receiving device has to look at the incoming signal, figure out what it is receiving, and retime its clock to match the incoming signal.

Sending data encoded into your signal requires that the sender and receiver are both using the same encoding/decoding method, and know where to look in the signal to find data. Asynchronous systems do not send separate information to indicate the encoding or clocking. The receiver must work out the clocking of the signal on its own, deciding where to look in the signal stream to find ones and zeroes and where each individual bit stops and starts. None of this information is carried in the signal sent from the transmitting unit.

When the receiver of a signal has to derive how that signal is organized without consulting the transmitting device, the communication is asynchronous. In short, the two ends do not synchronize the connection parameters before communicating. Asynchronous communication is more efficient when loss and error rates over the transmission medium are low, because data is not retransmitted and no time is spent negotiating connection parameters at the beginning of transmission. Asynchronous systems just transmit and let the far-end station figure it out. Asynchronous transmission is sometimes called "best effort" transmission because one side simply transmits, and the other does its best to receive.
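As a small sketch of what the receiver has to do, the example below recovers bytes from a stream of sampled line levels by hunting for a start bit and then reading eight data bits and a stop bit on its own timing. It assumes the common 8-N-1 start/stop framing and one sample per bit period; it is a simplified model, not any particular UART implementation.

```python
def decode_8n1(samples):
    """Decode an idle-high asynchronous line sampled once per bit period.

    Frame format assumed here: 1 start bit (0), 8 data bits (LSB first),
    1 stop bit (1) -- the common "8-N-1" convention. A simplified model,
    not a real UART.
    """
    received = []
    i = 0
    while i < len(samples):
        if samples[i] == 1:          # idle level; keep scanning for a start bit
            i += 1
            continue
        bits = samples[i + 1:i + 9]  # the 8 data bits after the start bit
        stop = samples[i + 9:i + 10]
        if len(bits) == 8 and stop == [1]:
            byte = sum(bit << n for n, bit in enumerate(bits))  # LSB first
            received.append(byte)
        i += 10                      # skip past start + data + stop bits
    return bytes(received)


# The letter 'A' (0x41) framed as start bit, LSB-first data bits, stop bit,
# surrounded by idle line time.
line = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1]
print(decode_8n1(line))              # -> b'A'
```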

EXAMPLES:
Asynchronous communication is used on RS-232 based serial devices such as an IBM-compatible computer's COM 1, 2, 3 and 4 ports, and it is the method used to communicate with an external modem. Asynchronous Transfer Mode (ATM) also uses this style of communication. A computer's PS/2 keyboard and mouse ports are likewise asynchronous serial connections.

Think of asynchronous communication as a quicker way to start transmitting, but a less reliable one.
SYNCHRONOUS

Synchronous systems negotiate the communication parameters at the data link layer before communication begins. Basic synchronous systems synchronize both clocks before transmission begins and reset their numeric counters for errors and the like. More advanced systems may also negotiate features such as error correction and compression.

It is possible for both sides to try to synchronize the connection at the same time, so there is usually a process to decide which end is in control. Both sides can go through a lengthy negotiation cycle in which they exchange communication parameters and status information. Once a connection is established, the transmitter sends out a signal, and the receiver sends back data about that transmission and what it received. This negotiation overhead is hard to justify on low error-rate lines, but it is highly efficient in systems where the transmission medium itself (an electric wire, radio signal or laser beam) is not particularly reliable.
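The sketch below models that negotiation step in miniature: two endpoints exchange their capabilities and settle on a common set of parameters before any data flows. The parameter names and the "agree on the lowest common capability" rule are illustrative assumptions, not a real data link protocol.

```python
# A toy model of synchronous link setup: the two ends exchange their
# capabilities and settle on common parameters before data is sent.
# The parameter names and the lowest-common-capability rule are
# illustrative assumptions, not taken from any real protocol.

def negotiate(caller, answerer):
    """Return the parameters both ends agree to use."""
    agreed = {}
    for param in ("speed_bps", "error_correction", "compression"):
        a, b = caller[param], answerer[param]
        if param == "speed_bps":
            agreed[param] = min(a, b)      # fall back to the slower clock
        else:
            agreed[param] = a and b        # enable a feature only if both support it
    return agreed


caller   = {"speed_bps": 9600, "error_correction": True, "compression": True}
answerer = {"speed_bps": 2400, "error_correction": True, "compression": False}

print(negotiate(caller, answerer))
# -> {'speed_bps': 2400, 'error_correction': True, 'compression': False}
```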