Each of the media types supported by Image Surfer Pro has an associated Auto Collection configuration. While the Processing Web Pages along with the Directed Search sections of this manual do a good job of describing how webpages are processed, they are not written from the perspective of each of the media types. The topics they cover are large and some details of how each media type is collected may need some clarification. In this section of the manual we wish to focus specifically on each media type, how and when it is collected, and why you might wish to have specific configurations for these User Preferences.
As detailed in Processing Web Pages, there are three steps to processing a web page: Extraction, Assemilation, and Results Display. Here we will focus on the Assemilation process.
Assemilation also consists of three steps: determining what will be assemilated, forming trees, and merging trees. Other sections of this manual cover tree formation and how trees merge. Here we wish to detail how it is determined what data is assemilated, specifically how it is decided whether data extracted from a processed webpage is automatically assemilated.
Auto Collection is defined as the designation of specific data extracted from a webpage to be automatically processed for assemilation. In most cases you will have an idea of what type of media links you want to collect from the webpages you are surfing. If you enable the Auto Collect configurations for this type of media, the processing will assemilate the desired data after extraction completes. However, you may want to simply turn all Auto Collection off, in which case you will be prompted to choose what data is assemilated from the sets of extracted data.
Warning – Any data set not configured to automatically collect will be discarded if data is extracted for a data set which is configured to automatically collect!
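The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the behavior as this manual describes it, not Image Surfer Pro's actual code; the `DataSet` class, its fields, and the `select_for_assemilation` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataSet:
    """Hypothetical stand-in for one set of extracted links (e.g. videos)."""
    name: str
    links: List[str] = field(default_factory=list)
    auto_collect: bool = False  # mirrors the Auto Collect user preference


def select_for_assemilation(data_sets: List[DataSet]) -> Tuple[str, List[DataSet]]:
    """Decide which extracted data sets move on to assemilation.

    Returns ("automatic", sets) when at least one auto-collect set has data;
    every other set is then discarded, as the warning above describes.
    Returns ("prompt", sets) when nothing was collected automatically, so the
    user chooses among whatever was extracted.
    """
    automatic = [ds for ds in data_sets if ds.auto_collect and ds.links]
    if automatic:
        return "automatic", automatic
    return "prompt", [ds for ds in data_sets if ds.links]


if __name__ == "__main__":
    extracted = [
        DataSet("direct images", ["http://example.com/a.jpg"], auto_collect=True),
        DataSet("embedded images", ["http://example.com/logo.png"], auto_collect=False),
    ]
    mode, chosen = select_for_assemilation(extracted)
    print(mode, [ds.name for ds in chosen])
```

In this sketch the embedded image set is dropped because the direct image set was collected automatically, which is exactly the situation the warning is about.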
It has been said that "an image is worth a thousand words," and this concept has never been more clearly realized than it is on the Internet. Every webpage you visit has multiple images on it. Images are sometimes the entire focus of a webpage and sometimes they are the accents on the page. They play the role of buttons, decoration, and even advertisements. In most cases, the images you wish to collect will seem obvious to you while you ignore thousands of other images on the very same pages.
Because images are a static medium (with the possible exception of a few file formats such as GIF), they are often used as the source of a hyperlink. Web browsers also treat images in a special way: if a URL links directly to an image, most browsers will simply display the image as if it were a webpage. In many cases the entire purpose of a webpage is to provide links to individual images. In almost every case these are the images you wish to collect because they will never be buttons, decoration, or ads.
Image Surfer Pro gives you Auto Collection configurations for images broken into two groups:
{Automatically collect direct image links}
In almost every case you will want to enable this configuration. The original purpose of Image Surfer Pro was to collect quality links to quality images which could be shared between users. Direct image links found on a processed page are typically the focus of the page and the entire reason you were surfing the page to start with.
You may wish to turn this configuration off if you are processing pages which have mixed video and image content (such as a mixed media Free Hosted Gallery) and you want to collect only the video media.
{Automatically collect embedded image links}
In almost every case you will want to set this to Directed Search. In this case, only embedded images found on specific pages during a directed search will be automatically assemilated. In other processing they will only appear as an option for assemilation if no other data set was automatically collected.
Unlike all the other Auto Collection configurations, this configuration affects both extraction and assemilation.

Setting | Extracted | Assemilated |
---|---|---|
Always | From every page processed | Always automatic |
Directed | Pages where not many direct image or video references are found. Typically directed search 3rd level pages or any directly processed webpage that isn't a FHG. | Automatic only after a directed search |
Never | Pages where not many direct image or video references are found. Typically directed search 3rd level pages or any directly processed webpage that isn't a FHG. | Never automatic. User prompt if nothing was automatically assemilated. |
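The table can also be read as a small decision function. The following Python sketch only restates the table and the paragraph before it; the names (`EmbeddedSetting`, `should_extract`, `assemilation_mode`) are hypothetical, and the test for "not many direct image or video references" is passed in as a plain boolean because this manual does not state the actual threshold.

```python
from enum import Enum


class EmbeddedSetting(Enum):
    """The three values of {Automatically collect embedded image links}."""
    ALWAYS = "Always"
    DIRECTED = "Directed"
    NEVER = "Never"


def should_extract(setting: EmbeddedSetting, few_direct_links: bool) -> bool:
    """Extracted column: Always extracts from every page; the other two
    settings extract only from pages with few direct image/video links."""
    if setting is EmbeddedSetting.ALWAYS:
        return True
    return few_direct_links


def assemilation_mode(setting: EmbeddedSetting,
                      after_directed_search: bool,
                      other_set_auto_collected: bool) -> str:
    """Assemilated column: returns "automatic", "prompt", or "skip"."""
    if setting is EmbeddedSetting.ALWAYS:
        return "automatic"
    if setting is EmbeddedSetting.DIRECTED and after_directed_search:
        return "automatic"
    # Not automatic: offered to the user only if nothing else was collected.
    return "skip" if other_set_auto_collected else "prompt"


if __name__ == "__main__":
    print(assemilation_mode(EmbeddedSetting.DIRECTED, True, False))
    print(assemilation_mode(EmbeddedSetting.NEVER, True, False))
```

Note that Directed and Never share the same extraction rule and differ only in whether a directed search makes assemilation automatic.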
{Min embedded image file Kbytes for auto collection}
If you have set {Automatically collect embedded image links} to Directed Search you will probably want to set this slider bar configuration to Always. If on the other hand you have chosen to Always collect embedded images you will probably want to increase the file size based on the types of images you wish to avoid. Images used for decoration and buttons will typically be small in size. Advertisements will be larger. The optimal setting will take some experimentation with the types of sites you typically process for embedded images.
All embedded images that are assemilated are subject to comparison with this configuration, regardless of whether they are collected automatically or by choice after no data was selected for automatic collection.
Like images, if the page has a hyperlink to a direct video file, that link is probably what you wish to collect.
Unlike images, if a video is present on a page it is typically the focus of the page and at least one of the reasons the page exists. Videos are rarely used as advertisements and, because they are interactive, they are not used as the source of a hyperlink. Image Surfer Pro will extract any embedded or directly linked video URL.
However, not everything which appears to be a video on a webpage is in fact a video. Internet Explorer supports a multitude of image file formats, but only a single video file format. To be collected as a video, a URL must directly reference an MP4 file. Make sure you understand how videos are presented on webpages.
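To make the "direct MP4 reference" idea concrete, here is a minimal Python sketch that checks whether a URL's path ends in `.mp4`. It is an assumption that a file extension check is a reasonable approximation; this manual does not say how Image Surfer Pro itself recognizes MP4 links, and `looks_like_direct_mp4` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def looks_like_direct_mp4(url: str) -> bool:
    """Return True when the URL path itself names an .mp4 file.

    Only the path is inspected, so query strings and fragments are ignored.
    A player page that merely shows a video does not pass the check.
    """
    path = urlparse(url).path
    return path.lower().endswith(".mp4")


if __name__ == "__main__":
    for candidate in (
        "https://example.com/clips/holiday.mp4",
        "https://example.com/watch?v=abc123",
    ):
        print(candidate, looks_like_direct_mp4(candidate))
```

A link to a player page, such as the second URL in the usage example, would fall under frames or pages rather than videos.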
{Automatically collect video information found}
Unless you do not wish to collect MP4 video file links in your fusker collection, there is very little reason not to enable this configuration.
Frames are active embedded objects on webpages. They have a multitude of uses. Two very typical uses are to display either a video stream or an active advertisement. Unlike actual videos or image files, they are not a direct reference to a source file. They are an open pipe to the hosting server, which makes them highly risky.
It would not be reasonable for Image Surfer Pro to claim support for video content if it did not support video presented in Frame format. Nearly every top-tier tube site, such as YouTube, presents its videos in this way.
{Automatically collect frame information found}
It is highly recommended that frame collection be limited to either selection from an ISP Form or use of the URL Capture Bar. Thus we recommend leaving this configuration disabled.
There is no Auto Collection for pages. The page type is intended to be a catch-all to make sure that any web location can be collected and used, but there is no way to differentiate one general web link from another to say it would be of interest. Auto collection of non-media links would essentially collect every link found on a processed web page, while the vast majority of these links would be of little to no interest to the collector. Think of the favorites or bookmarks within your browser: they represent the web locations you have some interest in returning to, yet they are a tiny fraction of the number of web locations you come across in a normal day of surfing.
Page objects have a very real and valid use within Image Surfer Pro, but you need to designate each page you wish to have in your collection individually.
There are no differences in the capabilities of the Free and Registered versions of Image Surfer Pro related to setting your user preferences. The primary difference between the Free and Registered versions of the software is their ability to build fusker collections. There are constraints for , , and the URL Capture Bar which use the configurations discussed here. The Free version of the software is primarily a viewer for the fusker collection files.