The XPATH configuration wizard helps you specify how InterPublish should scan a website.
Based on your instructions, the wizard builds an XPATH query that extracts the required data from the web page or pages. Even with the wizard's help, this is a fairly technical area, and most sites will need assistance from the vendor, at least the first time. Administrators who use the wizard should be aware of the following pitfalls:
Index numbers
Be careful when using index numbers (e.g. a[1], where the index number is 1). They can make your queries more powerful: for example, you can choose the first A element as your title element and the second A element as your URL element. However, if the A elements on a page are laid out irregularly, a query that always looks for the first A element may miss the relevant one. On the other hand, when a page is irregular, index numbers can sometimes help you capture only the relevant links. Decide whether to use index numbers on a case-by-case basis.
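The effect of an index number can be checked outside the wizard. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which supports a subset of XPATH including positional predicates. The sample markup, link paths and variable names are invented for illustration and do not come from InterPublish. The second list item has an extra thumbnail link in front of the title link, so a[1] picks up the wrong element for that item.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: the second <li> is irregular because a
# thumbnail link comes before the title link.
PAGE = """
<ul>
  <li><a href="/r/1">Report one</a> <a href="/r/1.pdf">PDF</a></li>
  <li><a href="/img/2.png">thumbnail</a> <a href="/r/2">Report two</a> <a href="/r/2.pdf">PDF</a></li>
  <li><a href="/r/3">Report three</a> <a href="/r/3.pdf">PDF</a></li>
</ul>
"""

root = ET.fromstring(PAGE)

# a[1] selects the first <a> child of each <li>.
first_links = [a.text for a in root.findall("./li/a[1]")]

# Without an index number, every <a> child of every <li> is selected.
all_links = [a.text for a in root.findall("./li/a")]

print(first_links)     # the irregular item yields "thumbnail", not its title
print(len(all_links))  # total number of links across all items
```

In this sketch the indexed query returns exactly one link per item, but for the irregular item it is the wrong one. The unindexed query misses nothing but also returns the PDF and thumbnail links, which is the trade-off described above.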
Attributes
Sometimes your preview may return no elements, even though it seems that it should. If this happens, check:
- that your Title and Url elements are set. The preview returns nothing if the harvester cannot find a title and a URL for the resource.
- whether the attribute values in your XPATH queries contain problematic numbers or words. Words such as 'odd', 'even', 'first' or 'last', or words containing numbers, can cause too few results to be retrieved, because the query then targets one very specific element instead of a broad range of elements on the page. If your preview is incomplete, remove these words and try again. If problems persist, you may need to change the attribute test to a partial match, for example: li[contains(@class,"views-row")].
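The difference between an exact attribute match and a partial match can be sketched as follows. This example again uses Python's standard xml.etree.ElementTree module, which does not implement the XPATH contains() function, so the partial match is emulated with a plain substring test in Python. That test is an assumption standing in for li[contains(@class,"views-row")], not what the harvester runs internally. The sample markup and variable names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing where each row carries position-specific class words.
PAGE = """
<ul>
  <li class="views-row views-row-1 views-row-odd views-row-first"><a href="/a">A</a></li>
  <li class="views-row views-row-2 views-row-even"><a href="/b">B</a></li>
  <li class="views-row views-row-3 views-row-odd views-row-last"><a href="/c">C</a></li>
</ul>
"""

root = ET.fromstring(PAGE)

# Exact match on the full class value: only the first row qualifies.
narrow = root.findall(
    "./li[@class='views-row views-row-1 views-row-odd views-row-first']"
)

# Emulated partial match (stand-in for contains(@class, "views-row")):
# keep every <li> whose class attribute includes the shared word.
broad = [li for li in root.findall("./li") if "views-row" in li.get("class", "")]

narrow_titles = [li.find("a").text for li in narrow]
broad_titles = [li.find("a").text for li in broad]

print(narrow_titles)
print(broad_titles)
```

The exact match keeps only the row whose class value contains 'first', 'odd' and a number, while the partial match on the shared word keeps every row.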