Sometimes extracted information contains line breaks caused by the HTML tag <BR>, like so:
<table>
<tr>
<td>
John Smith<br>Main Street<br>Arlington
</td>
</tr>
</table>
On the web page, this might look like:
John Smith
Main Street
Arlington
You can create only one extraction command for this table cell, i.e.
EXTRACT POS=1 TYPE=TXT ATTR=<TD>*
You cannot create separate extraction commands for name, street and city, because these pieces of information are not enclosed in their own opening and closing HTML tags. With the default TYPE=TXT extraction, all three values are extracted into a single line and are difficult to separate. The result would be
John Smith Main Street Arlington
To work around this problem, use the TYPE=HTM extraction, that is, replace TYPE=TXT with TYPE=HTM in the command above. It preserves all HTML tags inside the text, so the extraction result is:
John Smith<br>Main Street<br>Arlington
This result can be further processed and split with any programming or scripting language using the Scripting Interface. For example, in Visual Basic Script you can use the Split function:
MyArray = Split(extracted_string, "<br>")
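MyArray now has three elements: John Smith, Main Street and Arlington. Note that Split compares case-sensitively by default, so if the extracted text contains the tag in upper case (<BR>), use that spelling as the delimiter.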
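The same split can be done in any other language that drives the Scripting Interface. The following is a minimal sketch in Python, not part of the original example: the function name split_on_br is hypothetical, and the pattern assumes the tag may also appear as <BR>, <br/> or <br /> (the text above only shows <br>). How the extracted string is obtained from the Scripting Interface is left out here; the sketch starts from the string itself.

```python
import re

# Matches <br>, <BR>, <br/> and <br /> (assumed variants; only <br> appears in the example above).
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def split_on_br(extracted_html):
    """Split an extracted HTML fragment at line-break tags and trim each part."""
    parts = BR_TAG.split(extracted_html)
    # Drop empty pieces, e.g. those produced by a trailing <br>.
    return [part.strip() for part in parts if part.strip()]


# The string below stands in for the result of the TYPE=HTM extraction.
fields = split_on_br("John Smith<br>Main Street<br>Arlington")
name, street, city = fields
print(name)
print(street)
print(city)
```

Matching the tag case-insensitively avoids depending on how the browser writes the tag in the extracted HTML.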