com.sun.portal.providers.urlscraper
Class URLScraperProvider

java.lang.Object
  extended by com.sun.portal.providers.ProviderAdapter
      extended by com.sun.portal.providers.ProfileProviderAdapter
          extended by com.sun.portal.providers.urlscraper.URLScraperProvider
All Implemented Interfaces:
Provider, ProviderEditTypes, com.sun.portal.providers.util.ProviderProperties, ProviderWidths
Direct Known Subclasses:
XMLProvider

public class URLScraperProvider
extends ProfileProviderAdapter
implements com.sun.portal.providers.util.ProviderProperties

A URLScraperProvider is a content provider that can retrieve and display content from a given URL.

URLScraperProvider acts as an HTTP client and makes a request for the content of the specified URL and then displays it in the channel.

Forwarding of cookies
Each URLScraper channel has a cookiesToForwardList attribute that can be set in the display profile. If a cookie is allowed by this attribute, that cookie in the request coming from the browser is forwarded to the web server specified in the URL. The allCookies attribute can be set to true to forward all cookies. A set-cookie request from that web server is sent back to the browser. The set-cookie request is modified so that the cookie is only sent back to the portal server.
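
The forwarding rule described above can be illustrated with a minimal sketch. The class CookieForwardFilter and its select method are hypothetical names introduced for illustration; they are not part of the URLScraperProvider API, and cookies are modeled as a plain name-to-value map rather than servlet Cookie objects.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the provider API.
final class CookieForwardFilter {

    // Returns the browser cookies that may be forwarded to the scraped web
    // server. When allCookies is true every cookie is forwarded; otherwise
    // only cookies whose names appear in cookiesToForward are kept.
    static Map<String, String> select(Map<String, String> browserCookies,
                                      List<String> cookiesToForward,
                                      boolean allCookies) {
        Map<String, String> forwarded = new LinkedHashMap<>();
        for (Map.Entry<String, String> cookie : browserCookies.entrySet()) {
            if (allCookies || cookiesToForward.contains(cookie.getKey())) {
                forwarded.put(cookie.getKey(), cookie.getValue());
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("session", "abc");
        cookies.put("theme", "dark");
        // Only the allow-listed cookie is forwarded.
        System.out.println(select(cookies, Arrays.asList("session"), false));
        // With allCookies set, the allow-list is ignored.
        System.out.println(select(cookies, Collections.<String>emptyList(), true));
    }
}
```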

URL Rewriting
The content gathered by the channel will be rewritten if the rewriter is available. The ruleset used by the rewriter can be specified in the display profile attribute rulesetID. Relative URLs are converted to absolute URLs. For example, if your portal server is http://portal.iplanet.com/, the web server specified in the URL is http://foo.sesta.com/, and the file contains

<IMG SRC="/images/blah.gif">

then the content sent back to browser via portal server will be rewritten as:

<IMG SRC="http://foo.sesta.com/images/blah.gif">

Otherwise the browser would attempt to read the image from http://portal.iplanet.com/images/blah.gif and would fail to resolve it.
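
The relative-to-absolute conversion in this example follows ordinary URL reference resolution. The sketch below reproduces it with java.net.URI; it does not use the portal rewriter or its rulesets, and the class name AbsoluteUrlDemo is hypothetical.

```java
import java.net.URI;

// Hypothetical demo class; it only illustrates URL resolution and does not
// invoke the portal rewriter.
final class AbsoluteUrlDemo {

    // Resolves a reference found in a page against the URL the page came from.
    static String toAbsolute(String pageUrl, String reference) {
        return URI.create(pageUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // prints http://foo.sesta.com/images/blah.gif
        System.out.println(toAbsolute("http://foo.sesta.com/", "/images/blah.gif"));
    }
}
```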

SSL protected pages
In general, the URLScraperProvider works with SSL pages. The important thing to remember is that the specified URL must not require any interaction, because there is no way to pass that information to the end user.

Timeouts
There are 2 timeout values to consider:

Each URLScraper channel has its own timeout attribute. The channel will wait up to its individual timeout to receive content.
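
One way a per-channel timeout could be enforced is to run the fetch on a worker thread and stop waiting once the limit is reached. The sketch below is an assumption about the general technique, not a description of the provider's implementation. The class TimedFetch is hypothetical, and the timeout is taken in milliseconds because this page does not state the unit of the timeout attribute. A timeout is reported as InterruptedException to mirror the getHttpContent signature documented below.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; the real provider may enforce its timeout differently.
final class TimedFetch {

    // Runs the fetch on a worker thread and waits at most timeoutMillis for it.
    static String fetchWithTimeout(Callable<String> fetch, long timeoutMillis)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(fetch);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Give up on the slow fetch and report the timeout.
                future.cancel(true);
                throw new InterruptedException("no content within " + timeoutMillis + " ms");
            } catch (ExecutionException e) {
                throw new IllegalStateException("fetch failed", e.getCause());
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```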

Encoding
The encoding is determined in the following order:
1. HTTP header, if available (only applies to http(s) URLs)
2. inputEncoding property, if non-blank
3. Tag in content, e.g. meta tag in HTML and WML, XML header for XML, if available (only applies to HTML, XML, and WML, determined based on the MIMEType)
4. System default

MIMEType is determined from the JVM table. If not set, it is determined from the file extension.
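
The precedence above can be expressed as a simple fall-through. In this sketch the three candidate values are passed in as already-extracted strings; how the provider extracts them from the HTTP header or the content is not shown here, and the class EncodingResolver is a hypothetical name.

```java
import java.nio.charset.Charset;

// Hypothetical helper that only models the documented precedence.
final class EncodingResolver {

    // Returns the first usable encoding in the documented order:
    // HTTP header, inputEncoding property, tag in content, system default.
    static String resolve(String httpHeaderCharset,
                          String inputEncoding,
                          String contentTagCharset) {
        if (isSet(httpHeaderCharset)) {
            return httpHeaderCharset;
        }
        if (isSet(inputEncoding)) {
            return inputEncoding;
        }
        if (isSet(contentTagCharset)) {
            return contentTagCharset;
        }
        return Charset.defaultCharset().name();
    }

    // A value counts as set when it is neither null nor blank.
    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```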

Proxy Configuration
The URLScraper channel uses a proxy to scrape the specified URL if a proxy is set in the web server's jvm12.conf file. For example, the proxy can be set as:
http.proxyHost=
http.proxyPort=
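
http.proxyHost and http.proxyPort are the standard Java networking properties, so the effective setting can be inspected from code. The following sketch assumes the properties are visible as JVM system properties; the class ProxySettings is hypothetical, and port 80 is the default the Java networking documentation gives for http.proxyPort.

```java
import java.util.Properties;

// Hypothetical helper for inspecting the proxy properties.
final class ProxySettings {

    // Returns "host:port" when http.proxyHost is set, or null when no proxy
    // is configured. A missing port falls back to 80.
    static String describe(Properties props) {
        String host = props.getProperty("http.proxyHost");
        if (host == null || host.trim().isEmpty()) {
            return null;
        }
        String port = props.getProperty("http.proxyPort", "80");
        return host.trim() + ":" + port.trim();
    }

    public static void main(String[] args) {
        // Shows the proxy the running JVM was started with, if any.
        System.out.println(describe(System.getProperties()));
    }
}
```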

The refreshTime attribute is used for caching and will cause the URL not to be fetched again if the page is reloaded within that time.
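
The caching behavior can be pictured as a time-stamped cache that refetches only after the refresh interval has elapsed. This is a sketch of the idea only: the class RefreshCache is hypothetical, the interval is expressed in milliseconds as an assumption (this page does not state the unit of refreshTime), and the clock value is passed in explicitly to keep the example deterministic.

```java
import java.util.function.Supplier;

// Hypothetical cache illustrating a refresh interval; not the portal's cache.
final class RefreshCache {
    private final long refreshMillis;
    private String cached;
    private long fetchedAt;
    private boolean filled;

    RefreshCache(long refreshMillis) {
        this.refreshMillis = refreshMillis;
    }

    // Returns the cached content if it was fetched less than refreshMillis
    // before 'now'; otherwise calls fetch and stores the new content.
    String get(long now, Supplier<String> fetch) {
        if (!filled || now - fetchedAt >= refreshMillis) {
            cached = fetch.get();
            fetchedAt = now;
            filled = true;
        }
        return cached;
    }
}
```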


Field Summary
protected static String[][] typeTable
          Array of File extensions mapped to the MIMETypes
 
Fields inherited from interface com.sun.portal.providers.util.ProviderProperties
ACTIVE_BULLET_IMAGE, ARRANGE_PROVIDER_JS, ATTACH_IMAGE, BANNER, BANNER_TEMPLATE, BANNER_TEMPLATE_NOCONTEXT, BARE_PROVIDER_WRAPPER_TEMPLATE, BG_COLOR, BGCOLOR, BORDER_COLOR, BORDER_SIZE, BORDER_WIDTH, BORDERLESS_CHANNELS, BRAND_BG_COLOR, BRAND_IMAGE, BRAND_IMAGE_BG_COLOR, BRAND_IMAGE_WIDTH, BRAND_IMAGE2, BRAND_IMAGE2_BG_COLOR, BULLET_COLOR, BULLET_COLOR_JS, CHANNEL_HIGHLIGHT_COLOR, CHANNEL_LINK_COLOR, CHANNELS_BACKGROUND_COLOR, CHANNELS_COLUMN, CHANNELS_HAS_FRAME, CHANNELS_IS_DETACHABLE, CHANNELS_IS_DETACHED, CHANNELS_IS_MAXIMIZABLE, CHANNELS_IS_MINIMIZABLE, CHANNELS_IS_MINIMIZED, CHANNELS_IS_MOVABLE, CHANNELS_IS_REMOVABLE, CHANNELS_ROW, CHANNELS_WIDTH, CONSUME_EVENT_LIST, CONTENT, CONTENT_BAR_IN_CONTENT, CONTENT_BAR_IN_CONTENT_TEMPLATE, CONTENT_BAR_IN_LAYOUT, CONTENT_BAR_IN_LAYOUT_TEMPLATE, CONTENT_LAYOUT, CONTENT_LAYOUT_LINK_COLOR, CONTENT_LAYOUT_TEMPLATE, CONTENT_LAYOUT_TEXT, CONTENT_TEMPLATE, DEFAULT_BORDERLESS_CHANNEL, DEFAULT_CHANNEL_COLUMN, DEFAULT_CHANNEL_HAS_FRAME, DEFAULT_CHANNEL_IS_DETACHABLE, DEFAULT_CHANNEL_IS_DETACHED, DEFAULT_CHANNEL_IS_MAXIMIZABLE, DEFAULT_CHANNEL_IS_MINIMIZABLE, DEFAULT_CHANNEL_IS_MINIMIZED, DEFAULT_CHANNEL_IS_MOVABLE, DEFAULT_CHANNEL_IS_REMOVABLE, DEFAULT_CHANNEL_ROW, DEFAULT_CHANNEL_WIDTH, DESKTOP_URL, DETACH_IMAGE, EDIT_CONTAINER_NAME, EDIT_IMAGE, EDIT_PROVIDER_TEMPLATE, EDIT_TEMPLATE, EMPTY_PROVIDER_CONTENT, ERR_MESSAGE, ERROR_TEMPLATE, ERROR_TEMPLATE_NOCONTEXT, EVENT_PORTLET_MAP, FONT_COLOR, FONT_FACE, FONT_FACE1, FONT_SIZE, FRONT_CONTAINER_NAME, FULLWIDTH_POPUP_HEIGHT, FULLWIDTH_POPUP_WIDTH, GENERATE_EVENT_LIST, GLOBAL_PORTLET_LIST, HAS_FRAME, HEADER_BG_COLOR, HEADER_FONT_COLOR, HEADER_TEXT, HELP_ICON, HELP_IMAGE, HELP_LINK, HELP_TAG, HELP_URL, HELP_URLS, INACTIVE_BULLET_IMAGE, INLINE_ERROR, INLINE_ERROR_TEMPLATE, LAST_CHANNEL_NAME, LAUNCH_POPUP, LAUNCH_POPUP_JS, LAYOUT, LAYOUT_FULL_BOTTOM_TEMPLATE, LAYOUT_FULL_TOP_TEMPLATE, LAYOUT1_TEMPLATE, LAYOUT2_TEMPLATE, LAYOUT3_TEMPLATE, LAYOUT4_TEMPLATE, LINK_SEPARATOR_COLOR, 
LOCALE_STRING, LOGOUT_URL, MAXIMIZE_IMAGE, MAXIMIZED_CHANNEL, MAXIMIZED_TEMPLATE, MENUBAR, MENUBAR_TEMPLATE, MINIMIZE_IMAGE, MINIMIZED_TEMPLATE, NORMALIZE_IMAGE, OPENURL_INPARENT_JS, OPTIONS_TEMPLATE, OVERLOAD_TEMPLATE, PARALLEL_CHANNELS_INIT, PARENT_CONTAINER_NAME, PARENT_TAB_CONTAINER, PERFORM_COLUMN_SUBSTITUTION_JS, PERFORM_SUBSTITUTION_JS, POPUP_MENUBAR_TEMPLATE, POPUP_TEMPLATE, PRODUCT_NAME, PROVIDER_CMDS, PROVIDER_NAME, PROVIDER_TITLE, PROVIDER_WRAPPER_TEMPLATE, REFRESH_PARENT_CONTAINER_ONLY, REMOVE_IMAGE, REMOVE_PROVIDER_JS, S_ATTACH_IMAGE, S_BRAND_IMAGE, S_BRAND_IMAGE2, S_DETACH_IMAGE, S_EDIT_IMAGE, S_HELP_IMAGE, S_MAXIMIZE_IMAGE, S_MINIMIZE_IMAGE, S_NORMALIZE_IMAGE, S_REMOVE_IMAGE, SELECT_ALL_JS, SELECTED_TAB_NAME, SIZE, STACK_TRACE, STATIC_CONTENT, SWITCH_COLUMNS_JS, TAB_COLOR, TAB_FONT_COLOR, TAB_NOTCH_IMAGE, TAB_PORTLET_LIST, TABLE_BG_COLOR, THEME_CHANNEL, THICK_POPUP_HEIGHT, THICK_POPUP_WIDTH, THIN_POPUP_HEIGHT, THIN_POPUP_WIDTH, TIMEOUT, TITLE, TITLE_BAR_COLOR, TITLE_FONT_COLOR, TITLE_TEXT, TOOLBAR_ROLLOVER, TOOLBAR_ROLLOVER_JS, USER_TEMPLATE
 
Fields inherited from interface com.sun.portal.providers.ProviderWidths
WIDTH_FULL_BOTTOM, WIDTH_FULL_TOP, WIDTH_THICK, WIDTH_THIN
 
Fields inherited from interface com.sun.portal.providers.ProviderEditTypes
EDIT_COMPLETE, EDIT_SUBSET
 
Constructor Summary
URLScraperProvider()
          Default constructor.
 
Method Summary
 StringBuffer getContent(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res)
          Get the provider's content by retrieving content from specified URL.
protected  boolean getCookiesToForwardAll()
           
protected  List getcookiesToForwardList()
           
 StringBuffer getEdit(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res)
          Calls the getEdit(Map) method in this object to provide backwards compatibility.
protected  File getFile(String pathname)
          This method is called by getContent() if the url returned by getURL() is a file url.
protected  StringBuffer getFileAsBuffer(String pathName)
          Gets the specified file as a StringBuffer.
protected  String getFormData()
           
protected  String getHttpAuthPassword()
           
protected  String getHttpAuthUid()
           
protected  StringBuffer getHttpContent(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res, String url)
          Get the provider's content by retrieving content from the specified http or https URL.
protected  StringBuffer getHttpContent(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res, String url, boolean ubt)
          Get the provider's content by retrieving content from the specified http or https URL.
 String getInputEncoding()
           Gets the inputEncoding to be used by content.
protected  String getLoginFormData()
           
protected  String getLoginUrl()
           
protected  String getLogoutUrl()
           
protected  String getRuleSetID()
           Gets the urlScraperRulesetID to be used by rewriter.
protected  int getTimeout()
          Gets the timeout property for the provider.
protected  String getURL()
           Gets the url property for the provider.
protected  boolean isHttpAuth()
           
 boolean isPresentable(javax.servlet.http.HttpServletRequest request)
          Determines presentability for channels based on this provider.
 URL processEdit(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res)
          Calls the processEdit(Map) method in this object to provide backwards compatibility.
 
Methods inherited from class com.sun.portal.providers.ProfileProviderAdapter
existsBooleanProperty, existsIntegerProperty, existsListProperty, existsListProperty, existsStringProperty, existsStringProperty, getBooleanProperty, getBooleanProperty, getBooleanProperty, getBooleanProperty, getClientProperty, getIntegerProperty, getIntegerProperty, getIntegerProperty, getIntegerProperty, getListProperty, getListProperty, getMapProperty, getMapProperty, getMapProperty, getMapProperty, getMapProperty, getMapProperty, getStringAttribute, getStringProperty, getStringProperty, getStringProperty, getStringProperty, getStringProperty, getStringProperty, getTemplate, getTemplate, getTemplatePath, isAllowed, setBooleanProperty, setClientProperty, setIntegerProperty, setListProperty, setMapProperty, setStringAttribute, setStringProperty
 
Methods inherited from class com.sun.portal.providers.ProviderAdapter
getContent, getDescription, getEdit, getEditType, getHelp, getHelp, getName, getProviderContext, getRefreshTime, getResourceBundle, getResourceBundle, getTitle, getWidth, init, isEditable, isPresentable, processEdit
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

typeTable

protected static String[][] typeTable
Array of File extensions mapped to the MIMETypes
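
A two-dimensional String array of this kind is typically scanned row by row. The sketch below assumes each row holds an extension followed by its MIME type; the actual row layout and entries of typeTable are not listed on this page, so the table contents and the class TypeTableDemo are hypothetical.

```java
import java.util.Locale;

// Hypothetical stand-in for a table like typeTable.
final class TypeTableDemo {

    // Assumed layout: { extension, MIME type }. The entries are examples only.
    static final String[][] TYPE_TABLE = {
        {".html", "text/html"},
        {".xml", "text/xml"},
        {".wml", "text/vnd.wap.wml"},
    };

    // Returns the MIME type for the file name's extension, or null if the
    // extension is not in the table.
    static String mimeTypeFor(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String[] row : TYPE_TABLE) {
            if (lower.endsWith(row[0])) {
                return row[1];
            }
        }
        return null;
    }
}
```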

Constructor Detail

URLScraperProvider

public URLScraperProvider()
Default constructor.

Method Detail

getTimeout

protected int getTimeout()
                  throws ProviderException
Gets the timeout property for the provider.

Returns:
timeout value
Throws:
ProviderException - if there is an error getting the timeout property.
See Also:
ProviderException

getURL

protected String getURL()
                 throws ProviderException

Gets the url property for the provider. This is the URL from which the content is fetched.

Returns:
URL value
Throws:
ProviderException - if there is an error getting the URL property.
See Also:
ProviderException

getRuleSetID

protected String getRuleSetID()
                       throws ProviderException

Gets the urlScraperRulesetID to be used by rewriter.

Returns:
String value
Throws:
ProviderException - if there is an error getting the urlScraperRulesetID.
See Also:
ProviderException

getInputEncoding

public String getInputEncoding()
                        throws ProviderException

Gets the inputEncoding to be used for the content. This method returns the inputEncoding that is used in encoding the scraped content.

Returns:
String value
Throws:
ProviderException - if there is an error getting the input encoding.
See Also:
ProviderException

isPresentable

public boolean isPresentable(javax.servlet.http.HttpServletRequest request)
Determines presentability for channels based on this provider. This overrides the base class's implementation to return true for all devices.

Specified by:
isPresentable in interface Provider
Overrides:
isPresentable in class ProviderAdapter
Parameters:
request - the HttpServletRequest
Returns:
boolean true for all devices
See Also:
Provider.isPresentable(javax.servlet.http.HttpServletRequest)

getContent

public StringBuffer getContent(javax.servlet.http.HttpServletRequest req,
                               javax.servlet.http.HttpServletResponse res)
                        throws ProviderException

Get the provider's content by retrieving content from the specified URL. This method internally calls getHttpContent when the URL returned from getURL() is an http or https URL. This method wraps certain thrown exceptions into an error message that is displayed as the channel content.

Specified by:
getContent in interface Provider
Overrides:
getContent in class ProviderAdapter
Parameters:
req - An HttpServletRequest that contains information related to this request for content.
res - An HttpServletResponse that allows the provider to influence the overall response for the desktop page (besides generating the content).
Returns:
Channel content
Throws:
ProviderException - if there was an error generating the content.
See Also:
ProviderException, getHttpContent(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, java.lang.String), getURL()
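
The choice between the http(s) path and the file path can be sketched as a dispatch on the URL scheme. This is an illustration under assumptions, not the provider's code: the class SchemeDispatch and its Source enum are hypothetical, and what the provider does with other schemes is not documented on this page.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical classifier for the URL returned by getURL().
final class SchemeDispatch {

    enum Source { HTTP, FILE, UNSUPPORTED }

    // http and https map to the getHttpContent path, file maps to the
    // getFile path, anything else is reported as unsupported.
    static Source classify(String url) {
        String scheme = URI.create(url).getScheme();
        if (scheme == null) {
            return Source.UNSUPPORTED;
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":
            case "https":
                return Source.HTTP;
            case "file":
                return Source.FILE;
            default:
                return Source.UNSUPPORTED;
        }
    }
}
```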

getHttpContent

protected StringBuffer getHttpContent(javax.servlet.http.HttpServletRequest req,
                                      javax.servlet.http.HttpServletResponse res,
                                      String url)
                               throws InterruptedException,
                                      MalformedURLException,
                                      ProviderException

Get the provider's content by retrieving content from the specified http or https URL.

This method does not handle file URLs. It only handles http or https URLs. If a rewriter is available, the content scraped from the specified URL is rewritten using the ruleset returned by getRuleSetID().

This method throws exceptions for certain exceptional conditions instead of returning an error message in the returned StringBuffer.

Parameters:
req - An HttpServletRequest that contains information related to this request for content.
res - An HttpServletResponse that allows the provider to influence the overall response for the desktop page (besides generating the content).
url - http or https url string
Returns:
Scraped content
Throws:
InterruptedException - if there is a timeout while trying to get the scraped content
MalformedURLException - if the url passed in is not a valid http or https url.
ProviderException - if there was an error generating the content
See Also:
ProviderException, getRuleSetID()

getHttpContent

protected StringBuffer getHttpContent(javax.servlet.http.HttpServletRequest req,
                                      javax.servlet.http.HttpServletResponse res,
                                      String url,
                                      boolean ubt)
                               throws InterruptedException,
                                      MalformedURLException,
                                      ProviderException

Get the provider's content by retrieving content from the specified http or https URL.

This method does not handle file URLs. It only handles http or https URLs. If a rewriter is available, the content scraped from the specified URL is rewritten using the ruleset returned by getRuleSetID().

This method throws exceptions for certain exceptional conditions instead of returning an error message in the returned StringBuffer.

Parameters:
req - An HttpServletRequest that contains information related to this request for content.
res - An HttpServletResponse that allows the provider to influence the overall response for the desktop page (besides generating the content).
url - http or https url string
ubt - Indicates whether to track links external to portal
Returns:
Scraped content
Throws:
InterruptedException - if there is a timeout while trying to get the scraped content
MalformedURLException - if the url passed in is not a valid http or https url.
ProviderException - if there was an error generating the content
See Also:
ProviderException, getRuleSetID()

getFile

protected File getFile(String pathname)
This method is called by getContent() if the url returned by getURL() is a file url.

Returns:
File object specified by the pathname, or null if the file does not exist or cannot be read.
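
The documented return contract (a File, or null when the file is missing or unreadable) corresponds to a check like the following. The class ReadableFile is a hypothetical stand-in and is not the provider's implementation.

```java
import java.io.File;

// Hypothetical helper mirroring the documented return contract.
final class ReadableFile {

    // Returns the File for pathName, or null if it does not exist or
    // cannot be read.
    static File open(String pathName) {
        File file = new File(pathName);
        return (file.exists() && file.canRead()) ? file : null;
    }
}
```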

getFileAsBuffer

protected StringBuffer getFileAsBuffer(String pathName)
                                throws IOException,
                                       ProviderException
Gets the specified file as a StringBuffer.

Returns:
StringBuffer containing the data from the specified file or null if file does not exist or cannot be read.
Throws:
IOException
ProviderException - if there is an error getting the file as StringBuffer.
See Also:
ProviderException
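
Reading a file into a StringBuffer with an explicit encoding might look like the sketch below. The class FileBuffer and its encoding parameter are assumptions made for illustration; the documented method takes only a path name, and how it chooses the encoding is governed by the rules in the Encoding section.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper; the encoding argument is an assumption of this sketch.
final class FileBuffer {

    // Returns the file's text decoded with the given encoding, or null if the
    // file does not exist or cannot be read.
    static StringBuffer read(String pathName, String encoding) throws IOException {
        File file = new File(pathName);
        if (!file.exists() || !file.canRead()) {
            return null;
        }
        StringBuffer content = new StringBuffer();
        try (Reader in = new InputStreamReader(
                new FileInputStream(file), Charset.forName(encoding))) {
            char[] chunk = new char[4096];
            int count;
            // Append each chunk until the end of the stream is reached.
            while ((count = in.read(chunk)) != -1) {
                content.append(chunk, 0, count);
            }
        }
        return content;
    }
}
```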

getCookiesToForwardAll

protected boolean getCookiesToForwardAll()
                                  throws ProviderException
Throws:
ProviderException

getcookiesToForwardList

protected List getcookiesToForwardList()
                                throws ProviderException
Throws:
ProviderException

isHttpAuth

protected boolean isHttpAuth()
                      throws ProviderException
Throws:
ProviderException

getHttpAuthUid

protected String getHttpAuthUid()
                         throws ProviderException
Throws:
ProviderException

getHttpAuthPassword

protected String getHttpAuthPassword()
                              throws ProviderException
Throws:
ProviderException

getLoginUrl

protected String getLoginUrl()
                      throws ProviderException
Throws:
ProviderException

getLogoutUrl

protected String getLogoutUrl()
                       throws ProviderException
Throws:
ProviderException

getLoginFormData

protected String getLoginFormData()
                           throws ProviderException
Throws:
ProviderException

getFormData

protected String getFormData()
                      throws ProviderException
Throws:
ProviderException

getEdit

public StringBuffer getEdit(javax.servlet.http.HttpServletRequest req,
                            javax.servlet.http.HttpServletResponse res)
                     throws ProviderException
Description copied from class: ProviderAdapter
Calls the getEdit(Map) method in this object to provide backwards compatibility. The implementation of this method provides backwards compatibility for providers that only implement the deprecated getEdit(Map) method. It logs a warning informing the administrator that calling this method has performance implications, and that it should be re-implemented using the non-deprecated version of this method. Each time this method is called, the HTTP parameter data in the request object must be converted to the Map form that is accepted by the getEdit(Map) version of this method.

Specified by:
getEdit in interface Provider
Overrides:
getEdit in class ProviderAdapter
Throws:
ProviderException

processEdit

public URL processEdit(javax.servlet.http.HttpServletRequest req,
                       javax.servlet.http.HttpServletResponse res)
                throws ProviderException
Description copied from class: ProviderAdapter
Calls the processEdit(Map) method in this object to provide backwards compatibility.

The implementation of this method provides backwards compatibility for providers that only implement the deprecated processEdit(Map) method. It logs a warning informing the administrator that calling this method has performance implications, and that it should be re-implemented using the non-deprecated version of this method.

Each time this method is called, the HTTP parameter data in the request object must be converted to the Map form that is accepted by the processEdit(Map) version of this method.

Specified by:
processEdit in interface Provider
Overrides:
processEdit in class ProviderAdapter
Throws:
ProviderException