For most users, this button is the primary way in which new files are added to a fusker collection. The process of adding a URL to a fusker collection consists of the following steps:
How Image Surfer Pro processes data from a webpage depends upon what is currently displayed in the IE window. Image Surfer Pro treats the content displayed in the IE window as one of three types of webpage:
NOTE: The button can also be used to initiate a Directed Search when processing an ISP Form. However, in this section of the user's manual we will cover only how the data is collected from the currently displayed webpage and leave the directed search to the ISP Form section of the user's manual.
During the extraction process for most webpages, a progress bar will be seen at the bottom right corner of your primary display. The window will remain on top of all other windows and cannot be moved. Image Surfer Pro and Internet Explorer will both be unresponsive during this process, as the webpage data needs to remain stable while it is extracted.
The process of extraction involves not only determining what data from the webpage might be relevant but also assigning an Image Surfer Pro data type to each URL. In most cases the assignment is relatively straightforward (any MP4 file is a video, for example), but depending upon the processing being done the assignment may become more complex.
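The simple case of type assignment can be sketched in a few lines of Python. This is only an illustration, not Image Surfer Pro's own code: the function name `classify_url` and the labels "video", "image", and "generic" are assumptions made for the example, while the image extensions are the ones listed under Image Files later in this section.

```python
from urllib.parse import urlparse
import posixpath

# Extensions this manual lists for the Image data type.
IMAGE_EXTENSIONS = {
    ".jpg", ".jpe", ".jpeg", ".gif", ".bmp", ".tif", ".tiff",
    ".pic", ".pct", ".pict", ".pcx", ".pxr", ".png",
}
# MP4 is the only video format the manual describes as supported.
VIDEO_EXTENSIONS = {".mp4"}


def classify_url(url: str) -> str:
    """Return a hypothetical type label based on the file extension."""
    path = urlparse(url).path                  # ignore query string and fragment
    ext = posixpath.splitext(path)[1].lower()  # e.g. ".jpg"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    return "generic"                           # anything else needs more context
```

The more complex cases mentioned above (frames, ISP Form selections) depend on where the URL was found, not just on its extension, so they are not covered by this sketch.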
Direct Image Display
Internet Explorer will directly display image files as easily as it displays webpages. It can sometimes be difficult to determine if you are seeing a webpage or an image. Extraction in this case is rather obvious: the URL of the displayed image is treated as an ISP Image.
ISP Forms
The button will check the header information of the webpage to determine if it is an Image Surfer Pro generated form. Each selection box on an ISP Form is processed from the top of the page to the bottom. Each Add box that has been checked will add a URL to a list for assimilation.
In most cases the wording of the selection box makes it clear which data type the URL will be assigned. However, the data type assigned to all generic (i.e. non-image, non-video, non-frame) links is determined by the drop-down list at the top of the table where the link is displayed.
The setting of the drop-down list at the top of the table will be used for all of the generic Add Page To Collection selection boxes within the table but will not be used if the link is known to be a video or image. If you wish to add links as different types from within the same table, select Unknown. Selecting Unknown as the data type will then prompt you individually during assimilation for each of the selected links from the table.
Generic Webpages
If the page displayed is neither an image nor an ISP Form, data extraction gets much more complex. Image Surfer Pro breaks apart the HTML code of the webpage and generates lists of possibly relevant data. The following lists of information are created from the HTML code.
Video (MP4) Files
The attributes of every tag within the HTML code are searched for any reference to an MP4 file. In some cases the "external" content of the tag is also searched.
If the URL of an MP4 file is in the general text of the page and not part of a hyperlink or other page construct, it will not be found. If you wish to add such a URL to your fusker collection, simply select the text on the page and use the URL Capture Bar to add it directly. Other forms of video files (such as AVI or WMV) are not currently supported, as the HTML5 <video> tag does not support them. They will be added in a later release with more general support for the <object> tag.
Frame Data
Frame data is currently taken from <iframe> and <embed> tags. These generic tags are used consistently across the web to provide streaming video data. We do not currently process <object> tags, which are also common but are typically used to display Flash video and other active Flash content. We hope to better support Flash objects in a later release.
If the data source of an <iframe> or <embed> tag is actually an MP4 file, that source is treated as a video rather than a frame.
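Image Files
File references with the following extensions are assigned the Image data type: .jpg .jpe .jpeg .gif .bmp .tif .tiff .pic .pct .pict .pcx .pxr .png
References to image files are kept in two assimilation lists: directly referenced images (not visible on the webpage, but linked by hyperlinks on it) and embedded images (visible on the webpage).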
Page Reference
The URL of the processed webpage is kept as a possible Page object for assimilation.
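The way these lists might be built from a page's HTML can be sketched with Python's standard `html.parser` module. This is a rough illustration under stated assumptions, not the actual Image Surfer Pro implementation: the class name `PageExtractor`, the helper `extract`, and the list names are hypothetical, embedded images are assumed to come from `<img src>`, and directly referenced images from `<a href>`. The "external" tag content the manual mentions is not searched here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

IMAGE_EXTENSIONS = {
    ".jpg", ".jpe", ".jpeg", ".gif", ".bmp", ".tif", ".tiff",
    ".pic", ".pct", ".pict", ".pcx", ".pxr", ".png",
}


class PageExtractor(HTMLParser):
    """Hypothetical extractor that builds the lists described above."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url      # kept as the possible Page reference
        self.videos = []
        self.frames = []
        self.direct_images = []       # linked by hyperlinks, not visible
        self.embedded_images = []     # visible on the page

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            try:
                url = urljoin(self.page_url, value)
                ext = posixpath.splitext(urlparse(url).path)[1].lower()
            except ValueError:
                continue              # value cannot be parsed as a URL
            if ext == ".mp4":
                # An MP4 reference in any attribute is a video, even when
                # it is the source of an <iframe> or <embed> tag.
                self._add(self.videos, url)
            elif tag in ("iframe", "embed") and name == "src":
                self._add(self.frames, url)
            elif ext in IMAGE_EXTENSIONS and tag == "img" and name == "src":
                self._add(self.embedded_images, url)
            elif ext in IMAGE_EXTENSIONS and tag == "a" and name == "href":
                self._add(self.direct_images, url)

    @staticmethod
    def _add(target, url):
        if url not in target:         # skip duplicate references
            target.append(url)


def extract(page_url, html):
    """Parse the HTML text and return the populated extractor."""
    parser = PageExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser
```

Relative references are resolved against the page URL so that every entry in the lists is a complete URL.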
Not all of the extracted content is always assimilated into the fusker collection, and how the data is assimilated again depends upon the source of the data. The assimilation process can be summarized by the following steps:
All data received from the processing of a directly displayed image or an ISP Form will be assimilated because you expressly selected the data. Images selected are not subjected to the {Min embedded image file KBytes for auto collection} configuration on the Images Tab.
When a general webpage is processed, your User Preferences are used to determine what data will automatically be assimilated.
Videos
Automatically assimilated based on the {Automatically collect video information found} setting on the Videos Tab.
Frames
Automatically assimilated based on the {Automatically collect frame information found} setting on the Frames Tab.
Images
Directly Referenced Images: Images not visible on the webpage but directly referenced by hyperlinks on the webpage are automatically assimilated based on the {Automatically collect direct image links} setting on the Images Tab.
Embedded (visible) Images: Images visible on the webpage will be assimilated if the {Automatically collect embedded image links} setting on the Images Tab is set to Always and they are larger than the {Min embedded image file KBytes for auto collection} setting on the Images Tab.
Pages
The webpage URL is never automatically assimilated.
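The rules for a general webpage can be summarized as a small decision function. The sketch below is an assumption-laden illustration: the `Preferences` field names, their default values, and the type labels are invented for the example, and the video, frame, and direct image settings are modelled as simple on/off switches, which the manual does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Preferences:
    # Field names and defaults are illustrative, not the product's own.
    collect_videos: bool = True            # Videos Tab
    collect_frames: bool = True            # Frames Tab
    collect_direct_images: bool = True     # Images Tab
    collect_embedded_images: str = "Always"  # Images Tab
    min_embedded_kbytes: int = 20          # Images Tab


def auto_assimilate(kind: str, prefs: Preferences,
                    size_kbytes: Optional[float] = None) -> bool:
    """Decide whether data extracted from a general webpage is added automatically."""
    if kind == "video":
        return prefs.collect_videos
    if kind == "frame":
        return prefs.collect_frames
    if kind == "direct_image":
        return prefs.collect_direct_images
    if kind == "embedded_image":
        # Needs the Always setting and a file larger than the minimum size.
        return (prefs.collect_embedded_images == "Always"
                and size_kbytes is not None
                and size_kbytes > prefs.min_embedded_kbytes)
    return False  # "page" and anything else: never automatic
```

This function applies only to general webpages. Data from a directly displayed image or an ISP Form is always assimilated, so it would bypass this check entirely.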
You may select as many of the different sets as you like when presented with the choice. Options where no relevant data was extracted will be disabled.
If the {Automatically collect embedded image links} setting on the User Preferences Images Tab was not set to Always, it is possible that Embedded Image information was extracted but not automatically added. Even after embedded images are selected as a group for assimilation, each of the images is validated against the {Min embedded image file KBytes for auto collection} configuration on the Images Tab. The original webpage option will always be available even if no other option is.
Each extracted URL that is chosen for assimilation is made into a single branch ISP Tree consisting of a blank collection segment, a domain access segment, potentially several directory segments, and a file segment of the type determined during extraction.
For example, consider the file reference:
http://www.rexwallpapers.com/images/wallpapers/celebs/sarah/sarah_michelle_gellar_1.jpg
This image reference converted to an ISP Tree would have 7 segments under a collection segment.
The domain access segment (http:) represents the protocol used to access the reference. This segment is determined when the URL information is extracted by either the Process Webpage button or the URL Capture Bar.
Below the domain access segment are 5 directory segments. The topmost of these segments is called the Domain or Root Directory segment. These segments indicate where on the web this file is stored. In this case the file is stored at Rex Wallpapers under the directory images/wallpapers/celebs/sarah.
The final segment is always a file segment (sarah_michelle_gellar_1.jpg). Extraction will have defined the URL as an image because the file extension (.jpg) was recognized as the JPEG image format.
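The breakdown of this example URL into segments can be reproduced with a short Python sketch. The function name `url_to_segments` and the tuple layout are hypothetical, and the sketch assumes the URL ends in a file name. It only illustrates the segment order described above.

```python
from urllib.parse import urlparse


def url_to_segments(url):
    """Split a file URL into (segment kind, text) pairs for a single branch."""
    parts = urlparse(url)
    names = [p for p in parts.path.split("/") if p]
    file_name = names.pop()                    # assumes the URL names a file

    segments = [("collection", "")]            # blank collection segment
    segments.append(("domain access", parts.scheme + ":"))
    segments.append(("directory", parts.netloc))  # Domain / Root Directory
    segments.extend(("directory", name) for name in names)
    segments.append(("file", file_name))
    return segments


example = ("http://www.rexwallpapers.com/images/wallpapers/celebs/sarah/"
           "sarah_michelle_gellar_1.jpg")
for kind, text in url_to_segments(example):
    print(f"{kind:14} {text}")
```

For the example URL this yields one domain access segment, five directory segments, and one file segment below the blank collection segment, matching the count of seven given above.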
If the assimilated URL was found by processing a directly displayed image, the file segment of the single branch ISP Tree would have Auto Ranging applied to it. URLs extracted from any other type of page will not be auto ranged.
Tree Merging
Once the single branch ISP Tree is formed it is merged with your existing fusker collection. The merger process can be a bit complicated but the results are usually intuitive. New directory path information will appear as a new branch in the fusker tree. List and numeric fusks are not automatically generated or expanded at the directory level.
The merger process will equalize the roll up level of the new fusker tree to match that of the branches it is compared to in the existing fusker tree. This can help propagate your structural preferences automatically as you build your fusker collection.
At the file segment level the setting of the {Auto combine individual files into fusked files} configuration for each file type governs how file segments are merged. If it is enabled, the new file segment will combine information with the first file segment of the same type found within the same path to form an optimally fusked file segment. This includes any information added by the auto ranging done when the single branch ISP Tree was formed. The Optimization Process is applied after combining the information such that duplicate references are removed and the final segment may be either list or numerically fusked.
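To give a feel for what combining file segments means, the sketch below merges file names from the same directory, removes duplicates, and reports either a numeric range or a plain list. It is a simplified illustration and not the actual Optimization Process: the function name, the returned tuples, and the rule that only unpadded, consecutive numbers form a numeric fusk are all assumptions made for the example.

```python
import re

# Prefix, trailing number, and extension, e.g. "pic_", "12", ".jpg".
NUMBERED = re.compile(r"^(.*?)(\d+)(\.[^.]+)$")


def combine_file_names(names):
    """Combine file names into a hypothetical numeric or list fusk description."""
    unique = sorted(set(names))               # duplicate references are removed
    if not unique:
        raise ValueError("at least one file name is required")
    if len(unique) == 1:
        return ("single", unique[0])

    matches = [NUMBERED.match(name) for name in unique]
    if all(matches):
        frames = {(m.group(1), m.group(3)) for m in matches}
        digits = [m.group(2) for m in matches]
        unpadded = all(d == str(int(d)) for d in digits)  # no leading zeros
        numbers = sorted(int(d) for d in digits)
        consecutive = numbers == list(range(numbers[0], numbers[-1] + 1))
        if len(frames) == 1 and unpadded and consecutive:
            prefix, suffix = next(iter(frames))
            return ("numeric", prefix, numbers[0], numbers[-1], suffix)

    return ("list", unique)                   # fall back to a list fusk
```

For instance, `combine_file_names(["pic_2.jpg", "pic_1.jpg", "pic_3.jpg", "pic_2.jpg"])` describes a numeric range from 1 to 3, while names with no shared numeric pattern fall back to a list.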
As the single branch ISP Trees are merged into your fusker collection, the selection is constantly moved to the last merged file segment. When the last URL from the last group has been assimilated into the fusker collection, a specific view of the last segment is generated. The resulting webpage looks similar to an Expanded view of the file segment, except the entire path of the last URL is maintained and used to choose the correct iteration step at each fusked segment in the path. This should guarantee the content shown includes the last file assimilated.
The visualization may or may not include all of the information added. Information could be added which had different domain access protocols, different domains, different directory paths, or just different file types.
Processing Tab: Auto Range
The six inputs in the Auto Ranging Configuration block of the processing tab all play a significant
role in how images directly displayed in the browser display are added to the fusker collection.
They determine whether or not Auto Range Fusking is performed, how large the numerical range of files
will be, and where the range starts relative to the image file being processed.
Images Tab: Image Collection
There are three controls for image collection. The first, {Automatically collect direct image links},
deals with images that are not visible on the webpage but which were directly referenced by hyperlinks on
the processed page. The other two, {Automatically collect embedded image links} and {Min embedded
image file KBytes for auto collection}, are applied to images visible on the processed page. They
determine whether or not the images are extracted and automatically collected, and the minimum file size required
for these images to be added to the fusker collection.
Images Tab: Auto Optimize
If the {Auto combine individual Images into fusked images} configuration is checked, any images being added to the same directory in the fusker collection will be grouped into a fusked file. The form of the fusked file will be optimized and may be either a list or numeric fusk.
Videos Tab: Video Collection
The {Automatically collect video information found} configuration determines whether or not MP4 video
files found during the processing of the webpage are automatically added to the fusker collection.
Videos Tab: Auto Optimize
If the {Auto combine individual Videos into fusked Videos} configuration is checked, any videos being
added to the same directory in the fusker collection will be grouped into a fusked file with other videos. The
form of the fusked file will be optimized and may be either
a list or numeric fusk.
Frames Tab: Frame Collection
The {Automatically collect frame information found} configuration determines whether or not data found
in <iframe> and <embed> tags is automatically added to the fusker collection.
Frames Tab: Auto Optimize
If the {Auto combine individual Frames into fusked Frames} configuration is checked, any frame
information being added to the same directory in the fusker collection will be grouped into a fusked file with
other frames. The form of the fusked file will be optimized
and may be either a list or numeric fusk.
Pages Tab: Auto Optimize
If the {Auto combine individual Pages into fusked Pages} configuration is checked, any page URL
being added to the same directory in the fusker collection will be grouped into a fusked file with other pages.
The form of the fusked file will be optimized and may be either
a list or numeric fusk.
Processing Image Galleries:
In the Free Version of Image Surfer Pro the Process Webpage
button will only process a directly displayed image file. It will not process an ISP Form or general webpage.
If the displayed page is not a direct image file, you will be given the option of adding the webpage to the
fusker collection as a page URL.
Use Limitation:
The Free Version of Image Surfer Pro will only allow you to use the Process Webpage button a limited number
of times per browsing session. The primary use of the free version of Image Surfer Pro is to visualize
existing fusker collection files and provide a limited feel for the ability to modify those fusker collections.
Building extensive fusker collections with the free version will be difficult.
Examples of using the Process Webpage button are separated into several pages: