Image Surfer Pro Toolbar

Processing Webpages
The Process Page Button

For most users, this button is the primary way in which new files are added to a fusker collection. The process of adding a URL to a fusker collection consists of the following steps:

How Image Surfer Pro processes data from a webpage depends upon what is currently displayed in the IE window. Image Surfer Pro treats the content displayed in the IE window as one of three types of webpage: a directly displayed image, an ISP Form, or a generic webpage.

NOTE: The Process Page button on the Image Surfer Pro toolbar can also be used to initiate a Directed Search when processing an ISP Form. However, this section of the user's manual covers only how data is collected from the currently displayed webpage; the Directed Search is covered in the ISP Form section of the user's manual.

Extracting Relevant Data

During the extraction process for most webpages, a progress bar similar to the following will be seen at the bottom right corner of your primary display. The window will remain on top of all other windows and cannot be moved. Image Surfer Pro and Internet Explorer will both be unresponsive during this process because the webpage data must remain stable while it is being read.

Progress bar seen while extracting data from a webpage with Image Surfer Pro

The process of extraction involves not only determining what data from the webpage might be relevant but also assigning an Image Surfer Pro data type to each URL. In most cases the assignment is relatively straightforward (any MP4 file is a video, etc.), but depending upon the processing being done the assignment may become more complex.
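
The straightforward case of data type assignment can be illustrated with a short Python sketch. This is not Image Surfer Pro's own code: the function name and the returned labels are assumptions made for illustration, and the image extensions are the ones this manual lists for the Image data type.

```python
import posixpath
from urllib.parse import urlsplit

# Image extensions as listed in this manual for the Image data type.
IMAGE_EXTENSIONS = {".jpg", ".jpe", ".jpeg", ".gif", ".bmp", ".tif", ".tiff",
                    ".pic", ".pct", ".pict", ".pcx", ".pxr", ".png"}
VIDEO_EXTENSIONS = {".mp4"}


def guess_data_type(url: str) -> str:
    """Return a hypothetical type label ('Video', 'Image' or 'Page') for a URL."""
    # Look only at the path so that query strings do not hide the extension.
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    if extension in VIDEO_EXTENSIONS:
        return "Video"
    if extension in IMAGE_EXTENSIONS:
        return "Image"
    # Anything else is treated here as a generic page link (an assumption).
    return "Page"
```

The more complex cases mentioned above (for example, a frame source that turns out to be a video) would need additional context beyond the file extension.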

Direct Image Display

Internet Explorer will directly display image files as easily as it displays webpages. It can sometimes be difficult to determine if you are seeing a webpage or an image. Extraction in this case is rather obvious... the URL of the displayed image is treated as an ISP Image.

ISP Forms

The Process Page button on the Image Surfer Pro toolbar will check the header information of the webpage to determine if it is an Image Surfer Pro generated form. Each selection box on an ISP Form is processed from the top of the page to the bottom. Each Add box which has been checked will add a URL to a list for assimilation.

In most cases the wording of the selection box makes it clear which data type the URL will be assigned. However, the data type assigned to all generic (i.e. non-image, non-video, non-frame) links is determined by the drop down list at the top of the table where the link is displayed.

Cut out from an ISP Form which shows the add all select boxes and data type drop down menu at the top of the Image and Text tables

The setting of the drop down box at the top of the table will be used for all of the generic Add Page To Collection selection boxes within the table but will not be used if the link is known to be a video or image. If you wish to add links as different types from within the same table, select Unknown. Selecting Unknown as the data type will then prompt you individually during assimilation for each of the selected links from the table.
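
The rule described above can be summarized in a few lines of Python. This is only a sketch of the decision, not the program's actual code; the function name, the type labels, and the "Ask" marker are hypothetical.

```python
from typing import Optional

# Types that are recognized from the link itself and never overridden.
SPECIFIC_TYPES = {"Image", "Video", "Frame"}


def resolve_form_type(known_type: Optional[str], table_default: str) -> str:
    """Pick the data type for one checked Add box on an ISP Form.

    known_type    -- type implied by the link itself, or None for a generic link
    table_default -- value of the drop down list at the top of the table
    """
    if known_type in SPECIFIC_TYPES:
        return known_type      # the drop down is ignored for known types
    if table_default == "Unknown":
        return "Ask"           # hypothetical marker: prompt during assimilation
    return table_default       # generic links take the table's setting
```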

Generic Webpages

If the page displayed is neither an image nor an ISP Form, data extraction gets much more complex. Image Surfer Pro breaks apart the HTML code of the website and generates lists of possibly relevant data. The following lists of information are created from the HTML code.

Video (MP4) Files
The attributes of every tag within the HTML code are searched for any reference to an MP4 file. In some cases the "external" content of the tag is also searched.

If the URL of an MP4 file is in the general text of the page and not part of a hyperlink or other page construct, it will not be found. If you wish to add such a URL to your fusker collection, simply select the text on the page and use the URL Capture Bar to add it directly.

Other forms of video files (such as AVI or WMV) are not currently supported as the HTML5 <video> tag does not support them. They will be added in a later release with more general support for the <object> tag.

Frame Data
Frame data is currently taken from <iframe> and <embed> tags. These generic tags are used consistently across the web to provide streaming video data. We do not currently process <object> tags, which are also common but which are typically used to display Flash video and other active Flash content. We hope to better support Flash objects in a later release.

If the data source of an <iframe> or <embed> tag is actually an MP4 file, that source is treated as a video rather than a frame.

Image Files
File references with the following extensions are assigned the Image data type:
.jpg .jpe .jpeg .gif .bmp .tif .tiff .pic .pct .pict .pcx .pxr .png

References to image files are kept in two assimilation lists:
  • Embedded (visible) images on the page
  • Direct images referenced by a hyperlink on the webpage

Page Reference
The URL of the processed webpage is kept as a possible Page object for assimilation.
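
For readers who want a concrete picture of how such lists could be built, the following Python sketch uses the standard html.parser module. It is an approximation written for this manual, not the extraction code used by Image Surfer Pro; the class name, the attribute choices (src for embedded images and frames, href for direct images), and the list names are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

IMAGE_EXT = (".jpg", ".jpe", ".jpeg", ".gif", ".bmp", ".tif", ".tiff",
             ".pic", ".pct", ".pict", ".pcx", ".pxr", ".png")


def has_extension(url, extensions):
    """True if the path part of the URL ends with one of the extensions."""
    return urlsplit(url).path.lower().endswith(extensions)


class PageExtractor(HTMLParser):
    """Collects video, frame, embedded image and direct image URLs."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url  # kept as the possible Page reference
        self.videos, self.frames = [], []
        self.embedded_images, self.direct_images = [], []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            try:
                url = urljoin(self.page_url, value)  # resolve relative links
            except ValueError:
                continue  # skip attribute values that cannot be parsed
            if has_extension(url, (".mp4",)):
                # Any attribute of any tag may reference an MP4 file.
                self._add(self.videos, url)
            elif tag in ("iframe", "embed") and name == "src":
                self._add(self.frames, url)
            elif tag == "img" and name == "src" and has_extension(url, IMAGE_EXT):
                self._add(self.embedded_images, url)
            elif tag == "a" and name == "href" and has_extension(url, IMAGE_EXT):
                self._add(self.direct_images, url)

    @staticmethod
    def _add(bucket, url):
        if url not in bucket:  # keep page order, drop repeats
            bucket.append(url)
```

Because the MP4 check comes first, an <iframe> or <embed> source that points at an MP4 file lands in the video list rather than the frame list. Text that merely mentions a URL without being part of a tag is never seen by this parser, which mirrors the limitation on plain-text MP4 URLs.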

Assimilating Desired Content

Not all of the extracted content is always assimilated into the fusker collection, and how the data is assimilated is again dependent upon the source of the data. The assimilation process can be summarized by the following steps:

What Data Will Be Assimilated

All data received from the processing of a directly displayed image or an ISP Form will be assimilated because you expressly selected the data. Images selected this way are not subjected to the {Min embedded image file KBytes for auto collection} configuration on the Images Tab.

When a general webpage is processed your User Preferences are used to determine what data will automatically be assimilated.

Videos
Automatically assimilated based on the {Automatically collect video information found} setting on the Videos Tab.

Frames
Automatically assimilated based on the {Automatically collect frame information found} setting on the Frames Tab.

Images
  • Directly Referenced Images: Images not visible on the webpage but directly referenced by hyperlinks on the webpage are automatically assimilated based on the {Automatically collect direct image links} setting on the Images Tab.
  • Embedded (visible) Images: Images visible on the webpage will be assimilated if the {Automatically collect embedded image links} setting on the Images Tab is set to Always and they are larger than the {Min embedded image file KBytes for auto collection} configuration on the Images Tab.

Detail of the portion of the User Configuration Images Tab dealing with the size of Embedded Image urls

Pages
The webpage URL is never automatically assimilated.
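
The rules above amount to a small decision table. The Python sketch below restates them under a few assumptions: the field names are invented, the video, frame, and direct image settings are modeled as simple on/off switches, and the default values are arbitrary placeholders rather than the program's real defaults.

```python
from dataclasses import dataclass


@dataclass
class Preferences:
    collect_videos: bool = True               # {Automatically collect video information found}
    collect_frames: bool = True               # {Automatically collect frame information found}
    collect_direct_images: bool = True        # {Automatically collect direct image links}
    collect_embedded_images: str = "Always"   # {Automatically collect embedded image links}
    min_embedded_kbytes: float = 20.0         # {Min embedded image file KBytes for auto collection}


def is_auto_assimilated(kind: str, prefs: Preferences, size_kbytes: float = 0.0) -> bool:
    """Decide whether one extracted item is added without asking the user."""
    if kind == "video":
        return prefs.collect_videos
    if kind == "frame":
        return prefs.collect_frames
    if kind == "direct_image":
        return prefs.collect_direct_images
    if kind == "embedded_image":
        # Both conditions must hold: the setting is Always and the file is
        # larger than the configured minimum size.
        return (prefs.collect_embedded_images == "Always"
                and size_kbytes > prefs.min_embedded_kbytes)
    return False  # the page URL itself is never added automatically
```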

Because assimilation is dependent upon your User Preferences and the information extracted from a page, it is possible to process a webpage and have no data automatically added to your fusker collection. When this happens, Image Surfer Pro will inform you what information it did extract from the webpage and let you choose which sets of data are added.

If no data was automatically added to the fusker collection, this dialog allows the user to decide what extracted data they want added to the fusker collection.

You may select as many of the different sets as you like when presented with the choice. Options where no relevant data was extracted will be disabled.

If the {Automatically collect embedded image links} setting on the User Preferences Images Tab was not set to Always, it is possible that Embedded Image information was extracted but not automatically added.

Even after embedded images are selected as a group for assimilation, each of the images is validated against the {Min embedded image file KBytes for auto collection} configuration on the Images Tab.


The original webpage option will always be available even if no other option is.


Forming Trees

Each extracted URL that is chosen for assimilation is made into a single branch ISP Tree consisting of a blank collection segment, a domain access segment, potentially several directory segments, and a file segment of the type determined during extraction.

For example, consider the file reference:
http://www.rexwallpapers.com/images/wallpapers/celebs/sarah/sarah_michelle_gellar_1.jpg

Closeup of the fusker collection view with this single image

This image reference converted to an ISP Tree would have 7 segments under a collection segment.

The domain access segment (http:) represents the protocol used to access the reference. This segment is determined when the URL information is extracted by either the Process Page button on the Image Surfer Pro toolbar or the URL Capture Bar.

Below the domain access segment are 5 directory segments. The topmost of these segments is called the Domain or Root Directory segment. These segments indicate where on the web this file is stored. In this case the file is stored at Rex Wallpapers under the directory images/wallpapers/celebs/sarah.

The final segment is always a file segment (sarah_michelle_gellar_1.jpg). Extraction will have defined the URL as an image because the file extension (.jpg) was recognized as the JPEG image format.

If the assimilated URL was found by processing a directly displayed image, the file segment of the single branch ISP Tree would have Auto Ranging applied to it. URLs extracted from any other type of page will not be auto ranged.
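
The segment breakdown described in this section can be reproduced with a short Python sketch. It is an illustration only: the function name and the (kind, value) tuples are assumptions, and Auto Ranging of the file segment is not modeled.

```python
from urllib.parse import urlsplit


def url_to_branch(url: str):
    """Split a URL into the segments of a single branch tree.

    Returns a list of (kind, value) tuples: a blank collection segment,
    a domain access segment, directory segments, and a file segment.
    """
    parts = urlsplit(url)
    names = [name for name in parts.path.split("/") if name]
    if not parts.scheme or not parts.netloc or not names:
        raise ValueError("expected a URL with a protocol, a domain, and a file name")
    branch = [
        ("collection", ""),                       # blank collection segment
        ("domain access", parts.scheme + ":"),    # e.g. "http:"
        ("directory", parts.netloc),              # Domain (Root Directory) segment
    ]
    branch += [("directory", name) for name in names[:-1]]
    branch.append(("file", names[-1]))
    return branch
```

Applied to the example URL in this section, the sketch yields one domain access segment, five directory segments, and one file segment under the collection segment, matching the count of 7 given in the text.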

Tree Merging

Once the single branch ISP Tree is formed it is merged with your existing fusker collection. The merger process can be a bit complicated but the results are usually intuitive. New directory path information will appear as a new branch in the fusker tree. List and numeric fusks are not automatically generated or expanded at the directory level.

The merger process will equalize the roll up level of the new fusker tree to match that of the branches it is compared to in the existing fusker tree. This can help propagate your structural preferences automatically as you build your fusker collection.

At the file segment level the setting of the {Auto combine individual files into fusked files} configuration for each file type governs how file segments are merged. If it is enabled, the new file segment will combine information with the first file segment of the same type found within the same path to form an optimally fusked file segment. This includes any information added by the auto ranging done when the single branch ISP Tree was formed. The Optimization Process is applied after combining the information such that duplicate references are removed and the final segment may be either list or numerically fusked.
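
A much simplified picture of the merge can be given in Python. The sketch below only shows how a single branch can be folded into a nested tree so that shared directories are reused, new paths appear as new branches, and duplicate file references are dropped. The nested dictionary layout and the function name are assumptions; roll up levels, file types, and the list or numeric fusk optimization are deliberately left out.

```python
def merge_branch(tree: dict, directories: list, file_name: str) -> dict:
    """Merge one single-branch path into a nested tree of dictionaries.

    Each node is a dict with optional "dirs" (child nodes by name) and
    "files" (file names stored at that level).
    """
    node = tree
    for name in directories:
        # Reuse an existing branch if present, otherwise start a new one.
        node = node.setdefault("dirs", {}).setdefault(name, {})
    files = node.setdefault("files", [])
    if file_name not in files:   # duplicate references are not stored twice
        files.append(file_name)
    return tree
```

For example, merging 1.jpg and 2.jpg under the same path leaves one directory node holding both files, while a file under a different directory creates a sibling branch.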

Sample interactive task progress window shown by Image Surfer Pro during the data assimilation processing.

The single branch ISP Trees are merged into the fusker collection in groups. Each group may contain several trees in the order they were found in the HTML of the processed page. Status of the group mergers is shown in an interactive task progress window similar to this one. The assimilation of any group may be stopped without affecting the assimilation of other groups.

The groups themselves are added in the following order:
  • Page
  • Embedded Images
  • Direct Image References
  • Frames
  • Videos

Displaying Results

As the single branch ISP Trees are merged into your fusker collection, the selection is constantly moved to the last merged file segment. When the last URL from the last group has been assimilated into the fusker collection, a specific view of the last segment is generated. The resulting webpage looks similar to an Expanded view of the file segment except the entire path of the last URL is maintained and used to choose the correct iteration step at each fusked segment in the path. This should guarantee the content shown includes the last file assimilated.

The visualization may or may not include all of the information added. Information could be added which had different domain access protocols, different domains, different directory paths, or just different file types.

Related User Preferences:

Image of User Preferences Dialog with the General tab selected - nothing highlighted
Image of User Preferences Dialog with the Processing tab selected - Auto Range Configuration highlighted
Image of User Preferences Dialog with the Processing tab selected - nothing highlighted
Image of User Preferences Dialog with the Views tab selected - nothing highlighted
Image of User Preferences Dialog with the Images tab selected - Collection and Optimization highlighted
Image of User Preferences Dialog with the Videos tab selected - Collection and Optimization highlighted
Image of User Preferences Dialog with the Frames tab selected - Collection and Optimization highlighted
Image of User Preferences Dialog with the Pages tab selected - Optimization highlighted


Processing Tab: Auto Range
The six inputs in the Auto Ranging Configuration block of the Processing Tab all play a significant role in how images directly displayed in the browser are added to the fusker collection. They determine whether or not Auto Range Fusking is performed, how large the numerical range of files will be, and where the range starts relative to the image file being processed.
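
To give a feel for what a numerical range around a file might look like, here is a small Python sketch. It is a guess at the general idea, not the Auto Range algorithm itself: the function name and the two parameters (range_size and before) are hypothetical and do not correspond to the six inputs of the Auto Ranging Configuration block.

```python
import posixpath
import re


def auto_range(file_name: str, range_size: int = 10, before: int = 0):
    """Expand a numbered file name into a list of neighbouring file names.

    range_size -- how many file names to generate (hypothetical parameter)
    before     -- how far below the file's own number the range starts
    """
    stem, extension = posixpath.splitext(file_name)
    runs = list(re.finditer(r"\d+", stem))
    if not runs:
        return [file_name]          # nothing to range over
    last = runs[-1]                 # use the last run of digits in the name
    number = int(last.group())
    width = len(last.group())       # preserve zero padding such as 007
    start = max(0, number - before)
    return [
        stem[:last.start()] + str(n).zfill(width) + stem[last.end():] + extension
        for n in range(start, start + range_size)
    ]
```

With range_size=3, the file sarah_michelle_gellar_1.jpg expands to the files numbered 1 through 3; with before=1, img007.jpg expands to img006.jpg, img007.jpg, and img008.jpg.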

Images Tab: Image Collection
There are three controls for image collection. The first, {Automatically collect direct image links}, deals with images that are not visible on the webpage but which were directly referenced by hyperlinks on the processed page. The other two, {Automatically collect embedded image links} and {Min embedded image file KBytes for auto collection}, are applied to images visible on the processed page. They determine whether or not the images are extracted and automatically collected, and the minimum file size required for these images to be added to the fusker collection.

Images Tab: Auto Optimize
If the {Auto combine individual Images into fusked images} configuration is checked, any images being added to the same directory in the fusker collection will be grouped into a fusked file. The form of the fusked file will be optimized and may be either a list or numeric fusk.

Videos Tab: Video Collection
The {Automatically collect video information found} configuration determines whether or not MP4 video files found during the processing of the webpage are automatically added to the fusker collection.

Videos Tab: Auto Optimize
If the {Auto combine individual Videos into fusked Videos} configuration is checked, any videos being added to the same directory in the fusker collection will be grouped into a fusked file with other videos. The form of the fusked file will be optimized and may be either a list or numeric fusk.

Frames Tab: Frame Collection
The {Automatically collect frame information found} configuration determines whether or not data found in <iframe> and <embed> tags is automatically added to the fusker collection.

Frames Tab: Auto Optimize
If the {Auto combine individual Frames into fusked Frames} configuration is checked, any frame information being added to the same directory in the fusker collection will be grouped into a fusked file with other frames. The form of the fusked file will be optimized and may be either a list or numeric fusk.

Pages Tab: Auto Optimize
If the {Auto combine individual Pages into fusked Pages} configuration is checked, any page URL being added to the same directory in the fusker collection will be grouped into a fusked file with other pages. The form of the fusked file will be optimized and may be either a list or numeric fusk.

Differences in Free and Full Versions

Screen capture of free version limitation dialog

Processing Image Galleries:
In the Free Version of Image Surfer Pro, the Process Page button on the Image Surfer Pro toolbar will only process a directly displayed image file. It will not process an ISP Form or general webpage. If the displayed page is not a direct image file, you will be given the option of adding the webpage to the fusker collection as a page URL.

Use Limitation:
The Free Version of Image Surfer Pro will only allow you to use the Process Page button a limited number of times per browsing session. The primary use of the free version of Image Surfer Pro is to visualize existing fusker collection files and provide a limited feel for the ability to modify those fusker collections. Building extensive fusker collections with the free version will be difficult.

Screen Capture Examples

Sample screen capture after using the Process Page button

Examples of using the button are separated into several pages: