java.lang.Object
  com.sun.portal.providers.ProviderAdapter
    com.sun.portal.providers.ProfileProviderAdapter
      com.sun.portal.providers.urlscraper.URLScraperProvider
A URLScraperProvider is a content provider that can retrieve and display content from a given URL.
URLScraperProvider acts as an HTTP client and makes a request for the content of the specified URL and then displays it in the channel.
Each URLScraper channel has its own timeout attribute. The channel will wait up to its individual timeout to receive content.
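As a rough illustration of a per-channel timeout, the sketch below bounds a fetch with java.util.concurrent. The class TimedFetch, the helper fetchWithTimeout, and the use of milliseconds are assumptions made for this example; this page does not describe how URLScraperProvider implements its wait.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the portal API.
class TimedFetch {
    /** Runs the task and returns its result, or null if it does not finish in time. */
    static String fetchWithTimeout(Callable<String> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(task);
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // the content did not arrive within the timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException("fetch failed", e.getCause());
        } finally {
            executor.shutdownNow(); // interrupts the task if it is still running
        }
    }

    public static void main(String[] args) {
        String fast = fetchWithTimeout(() -> "content", 1000);
        String slow = fetchWithTimeout(() -> { Thread.sleep(2000); return "late"; }, 100);
        System.out.println(fast + " / " + slow);
    }
}
```

In this sketch a timed-out fetch yields null; a real channel would more likely show an error message or signal an exception.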
Forwarding of cookies
Each URLScraper channel has a cookiesToForwardList attribute that can be set in the display profile. If a cookie is allowed by this attribute, that cookie in the request coming from the browser is forwarded to the web server specified in the URL. The allCookies attribute can be set to true to forward all cookies. A set-cookie request from that web server is sent back to the browser, modified so that the cookie is only sent back to the portal server.
URL Rewriting
The content gathered by the channel will be rewritten if
the rewriter is available. The ruleset used by the rewriter can be
specified in the display profile attribute rulesetID.
Relative URLs are converted to absolute URLs. For example, if your portal server is
http://portal.iplanet.com/
and the web server specified in the URL is
http://foo.sesta.com/
and the file contains
<IMG SRC="/images/blah.gif">
then the content sent back to the browser via the portal server will be rewritten as:
<IMG SRC="http://foo.sesta.com/images/blah.gif">
Otherwise, the browser would attempt to read the image from
http://portal.iplanet.com/images/blah.gif
and would fail to resolve it.
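The relative-to-absolute conversion in the example above can be sketched with java.net.URI. This only illustrates the resolution step; the portal's rewriter is driven by a ruleset and is not shown here. The class RelativeUrlDemo and the method toAbsolute are names invented for this sketch.

```java
import java.net.URI;

// Hypothetical illustration, not the portal rewriter.
class RelativeUrlDemo {
    /** Resolves a possibly relative reference against the URL of the scraped page. */
    static String toAbsolute(String baseUrl, String reference) {
        return URI.create(baseUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // prints http://foo.sesta.com/images/blah.gif
        System.out.println(toAbsolute("http://foo.sesta.com/", "/images/blah.gif"));
    }
}
```

A reference that is already absolute is returned unchanged by URI.resolve.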
SSL protected pages
In general, the URLScraperProvider will work with SSL pages. The important thing to remember is that the specified URL must not require any interaction, because there is no way to pass that interaction on to the end user.
Timeouts
There are 2 timeout values to consider:
Encoding
The encoding is determined in the following order:
1. HTTP header, if available (only applies to http(s) URLs)
2. inputEncoding property, if non-blank
3. Tag in the content, if available, e.g. the meta tag in HTML and WML, or the XML header for XML (only applies to HTML, XML, and WML, as determined by the MIMEType)
4. System default
MIMEType is determined from the jvm table. If not set, it is determined
from the file extension.
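The encoding order above can be sketched as a simple fallback chain. The class EncodingOrder, the method chooseCharset, and its parameter names are assumptions for illustration; the provider's own logic lives in getContentEncoding() and is not reproduced here.

```java
import java.nio.charset.Charset;

// Hypothetical sketch of the fallback order, not the provider's implementation.
class EncodingOrder {
    /**
     * Returns the first non-blank candidate in the documented order:
     * HTTP header, inputEncoding property, tag in content, then system default.
     */
    static String chooseCharset(String headerCharset, String inputEncoding, String contentCharset) {
        for (String candidate : new String[] { headerCharset, inputEncoding, contentCharset }) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return Charset.defaultCharset().name(); // system default
    }

    public static void main(String[] args) {
        System.out.println(chooseCharset(null, "UTF-8", "Shift_JIS"));
    }
}
```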
Proxy Configuration
The URLScraper channel uses a proxy to scrape the specified URL if a proxy is set in the web server's jvm12.conf file. For example, the proxy can be set as:
http.proxyHost=
http.proxyPort=
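http.proxyHost and http.proxyPort are standard JVM networking properties. The sketch below shows how code running in that JVM might read them and build a java.net.Proxy. The class ProxyFromProperties and the method currentProxy are hypothetical names, and whether URLScraperProvider reads the properties this way is not stated on this page.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper, not part of the portal API.
class ProxyFromProperties {
    /** Builds a Proxy from http.proxyHost/http.proxyPort, or NO_PROXY when no host is set. */
    static Proxy currentProxy() {
        String host = System.getProperty("http.proxyHost");
        if (host == null || host.trim().isEmpty()) {
            return Proxy.NO_PROXY;
        }
        String portText = System.getProperty("http.proxyPort");
        // The JVM treats 80 as the default HTTP proxy port when none is given.
        int port = (portText == null || portText.trim().isEmpty()) ? 80 : Integer.parseInt(portText.trim());
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host.trim(), port));
    }

    public static void main(String[] args) {
        System.out.println(currentProxy());
    }
}
```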
The refreshTime attribute is used for caching: if the page is reloaded within that time, the URL is not fetched again.
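The caching behavior described for refreshTime can be sketched as a time-based cache. The class RefreshCache, its get method, and the use of milliseconds are assumptions for illustration; the unit of refreshTime and the portal's actual caching code are not given on this page.

```java
import java.util.function.Supplier;

// Hypothetical sketch of refresh-window caching, not the portal's cache.
class RefreshCache {
    private final long refreshMillis;
    private String cached;
    private long fetchedAt;

    RefreshCache(long refreshMillis) {
        this.refreshMillis = refreshMillis;
    }

    /** Returns the cached content if it is younger than the refresh window, otherwise refetches. */
    String get(Supplier<String> fetcher, long nowMillis) {
        if (cached == null || nowMillis - fetchedAt >= refreshMillis) {
            cached = fetcher.get();
            fetchedAt = nowMillis;
        }
        return cached;
    }

    public static void main(String[] args) {
        RefreshCache cache = new RefreshCache(1000);
        System.out.println(cache.get(() -> "fetched", System.currentTimeMillis()));
    }
}
```

The current time is passed in explicitly so the refresh window can be exercised without sleeping.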
NOTE: The getEdit() and processEdit() methods are not implemented in URLScraper.
Field Summary
protected static java.lang.String[][] | typeTable | Array of file extensions mapped to the MIME types
Fields inherited from interface com.sun.portal.providers.ProviderWidths
WIDTH_FULL_BOTTOM, WIDTH_FULL_TOP, WIDTH_THICK, WIDTH_THIN
Fields inherited from interface com.sun.portal.providers.ProviderEditTypes
EDIT_COMPLETE, EDIT_SUBSET
Constructor Summary
URLScraperProvider() | Default constructor.
Method Summary
protected boolean | forward(java.lang.String cookieName, boolean allCookies, java.util.List cookiesToForwardList) | Returns true if the allCookies property is true; otherwise returns true if the cookie name exists in the cookiesToForwardList, and false if it does not.
java.lang.StringBuffer | getContent(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res) | Gets the provider's content by retrieving content from the specified URL.
protected java.lang.String | getContentEncoding(java.lang.String contentType, byte[] bytes, java.lang.String MIMEType) | Gets the charset.
protected java.lang.String | getContentEncodingFromContentBytes(byte[] contentBytes) | Gets the charset from the content.
protected java.io.File | getFile(java.lang.String pathname) | This method is called by getContent() if the URL returned by getURL() is a file URL.
protected java.lang.StringBuffer | getFileAsBuffer(java.lang.String pathName) | Gets the specified file as a StringBuffer.
protected java.lang.StringBuffer | getHttpContent(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res, java.lang.String url) | Gets the provider's content by retrieving content from the specified http or https URL.
java.lang.String | getInputEncoding() | Gets the inputEncoding to be used by content.
protected java.lang.String | getRuleSetID() | Gets the urlScraperRulesetID to be used by the rewriter.
protected int | getTimeout() | Gets the timeout property for the provider.
protected java.lang.String | getURL() | Gets the url property for the provider.
boolean | isPresentable(javax.servlet.http.HttpServletRequest request) | Determines presentability for channels based on this provider.
Methods inherited from class com.sun.portal.providers.ProviderAdapter
getContent, getDescription, getEdit, getEdit, getEditType, getHelp, getHelp, getName, getProviderContext, getRefreshTime, getResourceBundle, getResourceBundle, getTitle, getWidth, init, isEditable, isPresentable, processEdit, processEdit
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
protected static java.lang.String[][] typeTable
Array of file extensions mapped to the MIME types
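A two-dimensional String array like typeTable can map file extensions to MIME types. The sketch below assumes each row holds an extension followed by its MIME type; the row layout, the sample entries, and the names MimeLookup and mimeTypeFor are all assumptions, since the actual contents of typeTable are not listed on this page.

```java
import java.util.Locale;

// Hypothetical lookup table; entries and row layout are assumed, not taken from typeTable.
class MimeLookup {
    static final String[][] TYPE_TABLE = {
        { "html", "text/html" },
        { "xml", "text/xml" },
        { "wml", "text/vnd.wap.wml" },
    };

    /** Returns the MIME type for the file's extension, or null if the extension is unknown. */
    static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension present
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (String[] row : TYPE_TABLE) {
            if (row[0].equals(extension)) {
                return row[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeFor("index.HTML"));
    }
}
```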
Constructor Detail
public URLScraperProvider()
Default constructor.
Method Detail
protected int getTimeout() throws ProviderException
Gets the timeout property for the provider.
Throws:
ProviderException - if there is an error getting the timeout property.
See Also:
ProviderException
protected java.lang.String getURL() throws ProviderException
Gets the url property for the provider. This is the URL from which the content is fetched.
Throws:
ProviderException - if there is an error getting the URL property.
See Also:
ProviderException
protected java.lang.String getRuleSetID() throws ProviderException
Gets the urlScraperRulesetID to be used by the rewriter.
Throws:
ProviderException - if there is an error getting the urlScraperRulesetID.
See Also:
ProviderException
protected boolean forward(java.lang.String cookieName, boolean allCookies, java.util.List cookiesToForwardList)
Returns true if the allCookies property is true. Otherwise, returns true if the cookie name exists in the cookiesToForwardList, and false if it does not.
Parameters:
allCookies - allCookies property value from the display profile
cookiesToForwardList - cookiesToForwardList property value from the display profile
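The decision described for forward() can be sketched as follows. This is a standalone illustration based only on the description above, not the provider's source; the class CookieForwardSketch, the generic List<String> type, and the null handling are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical re-statement of the documented behavior of forward().
class CookieForwardSketch {
    /** Returns true when all cookies are allowed, or when the named cookie is in the list. */
    static boolean forward(String cookieName, boolean allCookies, List<String> cookiesToForwardList) {
        if (allCookies) {
            return true;
        }
        // Assumption: a missing list means no cookie is forwarded.
        return cookiesToForwardList != null && cookiesToForwardList.contains(cookieName);
    }

    public static void main(String[] args) {
        List<String> allowed = Arrays.asList("JSESSIONID", "lang");
        System.out.println(forward("lang", false, allowed));
        System.out.println(forward("tracker", false, allowed));
    }
}
```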
public java.lang.String getInputEncoding() throws ProviderException
Gets the inputEncoding to be used by content. This method returns the inputEncoding that would be used in encoding the scraped content.
Throws:
ProviderException - if there is an error getting the input encoding.
See Also:
ProviderException
public boolean isPresentable(javax.servlet.http.HttpServletRequest request)
Determines presentability for channels based on this provider.
Specified by:
isPresentable in interface Provider
Overrides:
isPresentable in class ProviderAdapter
Parameters:
request - the HttpServletRequest
See Also:
Provider.isPresentable(javax.servlet.http.HttpServletRequest)
public java.lang.StringBuffer getContent(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res) throws ProviderException
Gets the provider's content by retrieving content from the specified URL.
This method internally calls getHttpContent when the URL returned from getURL() is an http or https URL.
This method wraps certain thrown exceptions into an error message that is displayed as the channel content.
Specified by:
getContent in interface Provider
Overrides:
getContent in class ProviderAdapter
Parameters:
req - An HttpServletRequest that contains information related to this request for content.
res - An HttpServletResponse that allows the provider to influence the overall response for the desktop page (besides generating the content).
Throws:
ProviderException - if there was an error generating the content.
See Also:
ProviderException, getHttpContent(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, java.lang.String), getURL()
protected java.lang.StringBuffer getHttpContent(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res, java.lang.String url) throws java.lang.InterruptedException, java.net.MalformedURLException, ProviderException
Gets the provider's content by retrieving content from the specified http or https URL.
This method does not handle file URLs. It only handles http or https URLs.
The content scraped from the specified URL is rewritten, if a rewriter is available, using the ruleset returned by getRuleSetID().
This method throws exceptions for certain exceptional conditions instead of returning an error message in the returned StringBuffer.
Parameters:
req - An HttpServletRequest that contains information related to this request for content.
res - An HttpServletResponse that allows the provider to influence the overall response for the desktop page (besides generating the content).
url - http or https URL string
Throws:
java.lang.InterruptedException - if there is a timeout while trying to get the scraped content
java.net.MalformedURLException - if the URL passed in is not a valid http or https URL.
ProviderException - if there was an error generating the content
See Also:
ProviderException, getRuleSetID()
protected java.io.File getFile(java.lang.String pathname)
This method is called by getContent() if the URL returned by getURL() is a file URL.
protected java.lang.StringBuffer getFileAsBuffer(java.lang.String pathName) throws java.io.IOException, ProviderException
Gets the specified file as a StringBuffer.
Throws:
java.io.IOException
ProviderException - if there is an error getting the file as StringBuffer.
See Also:
ProviderException
protected java.lang.String getContentEncoding(java.lang.String contentType, byte[] bytes, java.lang.String MIMEType) throws ProviderException
Gets the charset. This method determines the charset from the contentType header if it is available (only applies to http(s) URLs), or from the inputEncoding property if it is non-blank, or from a tag in the content if available, e.g. the meta tag in HTML or WML, or the XML header (only applies to HTML, XML, and WML).
Parameters:
contentType - the content type for http(s) URLs, null otherwise
bytes - Bytes from the scraped content
MIMEType - MIMEType for the content
Throws:
ProviderException - if there is an error getting the charset
See Also:
ProviderException
protected java.lang.String getContentEncodingFromContentBytes(byte[] contentBytes)
Gets the charset from the content. This method determines the charset based on the meta tag in the content.
Parameters:
contentBytes - Bytes from the scraped content
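One way to read a charset out of raw content bytes is to decode the first part of the content as ISO-8859-1 and search for a charset or encoding declaration. The sketch below is a simplification under that assumption (it would miss UTF-16 content, for example) and is not the provider's implementation; the class MetaCharsetSketch, the method charsetFromBytes, and the 2048-byte scan limit are invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the provider's parsing code.
class MetaCharsetSketch {
    // Matches charset=... in an HTML/WML meta tag or encoding="..." in an XML header.
    private static final Pattern DECLARATION = Pattern.compile(
            "(?:charset|encoding)\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset name, or null if no declaration is found. */
    static String charsetFromBytes(byte[] contentBytes) {
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives decoding.
        int length = Math.min(contentBytes.length, 2048);
        String head = new String(contentBytes, 0, length, StandardCharsets.ISO_8859_1);
        Matcher matcher = DECLARATION.matcher(head);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] html = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=Shift_JIS\">"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(charsetFromBytes(html));
    }
}
```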