Extract single elements

Related example macros: Demo-Extract, Demo-ExtractRelative

iMacros can extract data from Web sites [iMacros Browser only]. Click on the extract button while in recording mode to bring up the extraction wizard that will help you create the correct EXTRACT command:

[Screenshot: Extraction Wizard]

Note: Internet Explorer 6.0 or later must be installed in order to use the EXTRACT command.

The EXTRACT command, and thus the extraction, is controlled by three parameters: the extraction anchor, the position and the type of extraction. The most important parameter is the extraction anchor. It contains the HTML code surrounding the information that is to be extracted. You must use * at the end of the extraction anchor. If the HTML code given in the anchor appears more than once on a page, the position parameter determines which of the occurrences is extracted. The type of extraction determines whether the result is plain text, HTML source code, a URL, an element's title or the alternative text of an image.

All extraction results can be accessed inside the macro through the built-in variable !EXTRACT. If this variable contains #EANF# (Extraction Anchor Not Found), the extraction was unsuccessful.

Results of multiple extractions in the same macro are separated by an [EXTRACT] tag in the !EXTRACT variable.
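If the content of !EXTRACT is handed to an external script, the individual results can be recovered by splitting on the [EXTRACT] tag. The following is a minimal Python sketch; the function name split_extract and the sample values are hypothetical and not part of iMacros.

```python
from typing import List

EXTRACT_SEPARATOR = "[EXTRACT]"  # tag that separates results in !EXTRACT


def split_extract(raw: str) -> List[str]:
    """Split the content of !EXTRACT into the individual extraction results."""
    if not raw:
        return []  # nothing was extracted
    return raw.split(EXTRACT_SEPARATOR)


# Hypothetical content of !EXTRACT after three EXTRACT commands.
results = split_extract("Widget[EXTRACT]19.99[EXTRACT]In stock")
print(results)
```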

During manual replay of macros including EXTRACT commands in the iMacros Browser, the extraction result is displayed in a dialog window by default. This behaviour can be controlled by setting the built-in !EXTRACT_TEST_POPUP variable.

Some Background on HTML and Extraction
HTML is the language in which web sites are coded. The language consists of so-called tags, which determine how elements are formatted, displayed and aligned. Most HTML tags consist of two parts, an opening part and a closing part. All text between the opening and closing tags is affected by the directives the HTML tag implies. For example, the following HTML snippet

This text is <B>bold</B>

will result in

This text is bold

i.e. the B tag is used to format text in bold face.

When extracting text with iMacros, the following procedure is applied:
·iMacros searches the HTML source of the currently active webpage for an occurrence of the extraction anchor
·If the anchor is found, all text between the opening HTML tag of the anchor and its equivalent closing tag is extracted  
·If the anchor is not found, the result is #EANF#  
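
The procedure above can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions, not the matcher iMacros actually uses: the function extract_text is hypothetical, the search is case-sensitive, and nested tags of the same name are not handled.

```python
import re

EANF = "#EANF#"  # marker for "Extraction Anchor Not Found"


def extract_text(html: str, anchor: str, pos: int = 1) -> str:
    """Toy model: return the text between the anchor's tag and its closing tag."""
    prefix = anchor.rstrip("*")  # the anchor ends with *, which is not searched for
    tag = re.match(r"<\s*([A-Za-z][A-Za-z0-9]*)", prefix)
    if tag is None or pos < 1:
        return EANF
    # Locate the pos-th occurrence of the anchor text in the page source.
    start = -1
    for _ in range(pos):
        start = html.find(prefix, start + 1)
        if start == -1:
            return EANF
    open_end = html.find(">", start)  # end of the opening tag
    if open_end == -1:
        return EANF
    body_start = open_end + 1
    closing = re.search(
        r"</\s*" + re.escape(tag.group(1)) + r"\s*>",
        html[body_start:],
        re.IGNORECASE,
    )
    if closing is None:
        return EANF
    inner = html[body_start:body_start + closing.start()]
    return re.sub(r"<[^>]+>", "", inner)  # drop any tags inside the result


print(extract_text("This text is <B>bold</B>", "<B>*"))  # prints: bold
print(extract_text("This text is <B>bold</B>", "<I>*"))  # prints: #EANF#
```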

Create Extraction Command
To define an EXTRACT command, proceed as follows:

Open the Extraction Wizard (extract button on the Rec tab of the control panel).
Note: If the information you want to extract is inside a framed web site, click inside the frame that contains the information before opening the Extraction Wizard. This generates the FRAME command and marks the frame as active for the extraction.
In the browser window or frame, select the text that you want to extract.
Click the suggest_tag button. The marked information is displayed in the yellowish text area on the left. iMacros also creates a suggestion for the extraction anchor, which is displayed in the orange text field on the right.
Click test_tag to test the extraction anchor. The result is then displayed in the yellow text area on the right side of the wizard. If the result is #EANF# (Extraction Anchor Not Found), you have to alter the extraction anchor in order to extract the data successfully.
If you are satisfied with the result, click add_tag to add the EXTRACT statement to the macro.

Save Extraction Result
There are two methods to retrieve extracted data.

SAVEAS
You can save extracted data directly to a file by manually adding a SAVEAS TYPE=EXTRACT command to the macro. All items that were extracted before the SAVEAS command are saved to the specified file in one row, like this:

"item1", "item2", "item 3", ...

As you can see, the [EXTRACT] tags are substituted by commas. The SAVEAS command erases the content of the !EXTRACT variable afterwards. With the next start of the macro or the next round of a loop, a new line is added to the file.
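
The row format can be mimicked outside iMacros, for example when post-processing an extraction string yourself. The sketch below uses Python's csv module; the function name extract_to_csv_row is hypothetical, and the exact spacing and quoting that SAVEAS writes may differ from this output.

```python
import csv
import io


def extract_to_csv_row(raw: str) -> str:
    """Turn an [EXTRACT]-separated string into one quoted CSV row."""
    items = raw.split("[EXTRACT]")
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(items)  # one macro run (or loop round) becomes one row
    return buffer.getvalue()


# Two runs of the same macro produce two rows.
rows = extract_to_csv_row("item1[EXTRACT]item2[EXTRACT]item 3")
rows += extract_to_csv_row("item4[EXTRACT]item5[EXTRACT]item 6")
print(rows, end="")
```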

iimGetLastExtract()
You can also use the iimGetLastExtract() function of the Scripting Interface to access the extracted data in your application. Any [EXTRACT] tags are included in the returned string and can be used to separate the individual extraction results.

Unsuccessful Extraction
As noted above, if the extraction was unsuccessful, i.e. the extraction anchor could not be found on the page, the !EXTRACT variable holds the string #EANF# (Extraction Anchor Not Found). However, the return value that tells you whether the macro ran successfully is still positive (usually 2). The reason for this behaviour is that a macro can contain many EXTRACT commands, and often only one or a few of them fail to find their extraction anchor. To check whether a particular EXTRACT command was successful, check whether #EANF# is present in the returned string. This can be very useful, for example if you use EXTRACT to check whether a keyword is present on a page: a returned string containing #EANF# indicates that the keyword was not found.
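
Such a check might look as follows in a calling script. This is a sketch in Python that works only on the returned string; the helper names failed_extractions and keyword_present are hypothetical.

```python
from typing import List

EANF = "#EANF#"  # Extraction Anchor Not Found


def failed_extractions(raw: str) -> List[int]:
    """Return the 1-based positions of the EXTRACT commands that found no anchor."""
    items = raw.split("[EXTRACT]")
    return [i for i, item in enumerate(items, start=1) if item.strip() == EANF]


def keyword_present(raw: str) -> bool:
    """Treat a result without #EANF# as 'keyword found on the page'."""
    return EANF not in raw


# Hypothetical returned string: the second of three extractions failed.
print(failed_extractions("Widget[EXTRACT]#EANF#[EXTRACT]In stock"))  # prints: [2]
print(keyword_present("#EANF#"))  # prints: False
```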

Extraction of Dialog Text
To get the text of a dialog, use

SET !EXTRACTDIALOG YES

in the macro. The content of a dialog is then added to the extracted text, i.e. to the !EXTRACT variable.

Extracting From SELECT Elements
In HTML code, drop-down lists are generated by a SELECT tag. For SELECT boxes, the currently active value is extracted. If you want to extract all values of a drop-down list, manually add #ALL# before the extraction anchor.

Extract the currently active value:

EXTRACT POS=1 TYPE=TXT ATTR=<SELECT<SP>size=1<SP>name=main>*

Extract all values in a list:

EXTRACT POS=1 TYPE=TXT ATTR=#ALL#<SELECT<SP>size=1<SP>name=main>*

Extraction and the PRE Tag
Some web pages make use of a <PRE ...> tag in their HTML code. It marks the enclosed text as preformatted: all spaces and carriage returns are rendered exactly as they appear in the source. iMacros extracts the information enclosed in a <PRE> tag correctly, including the formatting. Thus, if you transfer the extracted data via the Scripting Interface, all formatting is retained unchanged. The formatting is changed in only two cases: line breaks are removed when the result is displayed in the test dialog box and when the result is saved using the SAVEAS command. This is necessary to keep the CSV text file well formed, because in the CSV format a line break would start a new row.
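
If you receive preformatted text through the Scripting Interface and want to write it to a one-row-per-record file yourself, the same flattening can be applied by hand. The Python sketch below assumes that line breaks are simply dropped and all other whitespace is kept; the function name flatten_line_breaks is hypothetical, and the exact replacement iMacros performs is not specified here.

```python
def flatten_line_breaks(text: str) -> str:
    """Remove line breaks so the text fits into a single CSV row."""
    # Handle Windows (\r\n), old Mac (\r) and Unix (\n) line endings.
    return text.replace("\r\n", "").replace("\r", "").replace("\n", "")


preformatted = "Name:   Widget\r\nPrice:  19.99\n"
print(flatten_line_breaks(preformatted))
```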

Troubleshooting
Sometimes iMacros cannot suggest a proper extraction anchor automatically. In this case you can create one manually, enter it in the orange text area on the right side of the Extraction Wizard and test it with the test_tag button. Please read all the information in this chapter to get a good overview of how the EXTRACT command can be tweaked manually.







Page URL http://www.iopus.com/imacros/help/extract_single_items.htm