SUMMARY
Microsoft provides the Office 2000 HTML Filter 2.0 as a free download from the OfficeUpdate page on the following Microsoft Web site:
This add-in to Microsoft Word 2000 allows users to save the currently open Word document as plain HTML without the XML data islands that are used by Word for "round tripping." It does this by saving the current file out as a standard Office HTML file, and then removing the XML by using a special DLL (MSFilter.dll) that is installed by the add-in.
This article demonstrates how you can call this DLL from Visual Basic for Applications (VBA) so that you can programmatically produce plain HTML files in Word 2000 without using the user interface (UI) dialog boxes provided by the add-in.
MORE INFORMATION
Microsoft provides programming examples for illustration only, without warranty either
expressed or implied, including, but not limited to, the implied warranties of
merchantability and/or fitness for a particular purpose. This article assumes
that you are familiar with the programming language being demonstrated and the
tools used to create and debug procedures. Microsoft support professionals can
help explain the functionality of a particular procedure, but they will not
modify these examples to provide added functionality or construct procedures to
meet your specific needs. If you have limited programming experience, you may
want to contact a Microsoft Certified Partner or the Microsoft fee-based
consulting line at (800) 936-5200. For more information about Microsoft Certified
Partners, please visit the following Microsoft Web site:
For more information about the support options that are available and about how to contact Microsoft, visit the following Microsoft Web site:
Calling MSFilter from VBA
To remove the XML data islands from an HTML file saved by Word, you can programmatically call the MSPeelerMain function exported from the MSFilter.dll (which is copied to the System directory when the add-in is installed). The prototype for the function looks like the following:
Function MSPeelerMain (ByVal sHtmlFile As String, ByVal sCmdOptions As String) As Integer
The first parameter, sHtmlFile, is the HTML file from which you want to remove Office XML data. The second parameter, sCmdOptions, contains command options that specify which items you want removed or kept. The possible values for sCmdOptions are as follows:
OPTIONS:
-a - keep standard @rule constructs (@font-face, @page)
-b - do not create backup copies
-c - remove all standard CSS properties
-f - overwrite without prompting when output-file already exists
-l - remove LANG attributes
-m - do not output the Generator and Originator META tags
-o - keep Microsoft Office native markup
-r - track rate of reduction
-s - remove the STYLE element
-t - remove non-essential linked files from the thicket
-v - keep VML, remove static images
-x - export a CSS stylesheet (.css) based on <sHtmlFile>
To use a default set of options, just pass "-" as the second string.
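As a minimal sketch of the call described above, the following standalone module runs the filter on an HTML file that Word has already saved and closed. The procedure name FilterExistingHtml and the file path are hypothetical and used only for illustration. The flags are combined into a single string in the same way as in the full example in this article. Because this article does not describe the meaning of the return value, the sketch only prints it.

```vb
Private Declare Function MSPeelerMain Lib "msfilter.dll" _
    (ByVal sHtmlFile As String, ByVal sCmdOptions As String) As Integer

' Hypothetical helper: filters an existing Office HTML file.
Public Sub FilterExistingHtml()
    ' Assumed path; the file must already exist and must not be open in Word.
    Const sFile As String = "C:\MyNewHTMLFile.htm"
    Dim nResult As Integer

    ' Pass "-" to use the default set of options.
    nResult = MSPeelerMain(sFile, "-")

    ' To choose options explicitly, combine the flags in one string instead.
    ' For example, "-bsl" skips the backup copy and removes the STYLE
    ' element and the LANG attributes:
    ' nResult = MSPeelerMain(sFile, "-bsl")

    ' The return value is not documented in this article, so it is
    ' printed to the Immediate window rather than interpreted.
    Debug.Print "MSPeelerMain returned: " & nResult
End Sub
```

If this sketch is placed in the same module as another Declare statement for MSPeelerMain, keep only one copy of the declaration.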
The following Visual Basic code saves the current Word document as native HTML, and then uses the Filter DLL to remove the XML markup:
Private Declare Function MSPeelerMain Lib "msfilter.dll" _
    (ByVal sHtmlFile As String, ByVal sCmdOptions As String) As Integer

' This is the output file for the example; change as needed.
Private Const c_sOutputFile As String = "C:\MyNewHTMLFile.htm"

Public Sub SaveAsSimpleHTML(ByVal sDocFile As String)
    Dim wdApp As Object
    Dim wdDoc As Object

    On Error GoTo Err_Routine

    ' Launch Microsoft Word and open a test document.
    Set wdApp = CreateObject("Word.Application")
    Set wdDoc = wdApp.Documents.Open(sDocFile)

    ' Save the test document as HTML (includes XML data islands).
    wdDoc.SaveAs c_sOutputFile, 8 ' 8 = wdFormatHTML

    ' Close the test document and quit Microsoft Word.
    ' NOTE: The document must be closed before the XML can be removed
    ' because Word maintains an exclusive file lock.
    wdDoc.Close
    Set wdDoc = Nothing
    wdApp.Quit
    Set wdApp = Nothing

    ' Make sure Word has time to shut down properly.
    DoEvents

    ' If the filter is installed, remove the extra XML.
    MSPeelerMain c_sOutputFile, "-tfrb"

    MsgBox "The current file was saved as plain HTML." & _
        vbCrLf & "Location: " & c_sOutputFile

Cleanup:
    ' If an error left Word running, quit it without saving changes
    ' (0 = wdDoNotSaveChanges) so that no hidden instance remains.
    On Error Resume Next
    If Not wdApp Is Nothing Then wdApp.Quit 0
    Exit Sub

Err_Routine:
    If Err.Number = 53 Then
        ' File not found (53) is raised when the DLL cannot be located,
        ' which means that the filter is not installed on this computer.
        ' The file saved above still contains the Office XML.
        MsgBox "The HTML Filter 2.0 DLL is not installed." & vbCrLf & _
            "The file was saved as Office HTML: " & c_sOutputFile
    Else
        MsgBox "An error occurred: " & CStr(Err.Number) & _
            " - " & Err.Description, vbCritical
    End If
    Resume Cleanup
End Sub
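To run the example, call SaveAsSimpleHTML from another procedure in the same project and pass it the full path of a Word document. The following sketch shows one possible way to do this. The procedure name ConvertSampleDocument and the document path are hypothetical, and the existence check with the Dir function is an added precaution rather than something the filter requires.

```vb
' Hypothetical caller for the SaveAsSimpleHTML example above.
Public Sub ConvertSampleDocument()
    ' Assumed source document; replace with a real path.
    Const sSource As String = "C:\MyTestDoc.doc"

    ' Dir returns an empty string when no file matches the path.
    If Len(Dir(sSource)) = 0 Then
        MsgBox "Source document not found: " & sSource, vbExclamation
        Exit Sub
    End If

    ' Saves the document as Office HTML, and then filters it to plain HTML.
    SaveAsSimpleHTML sSource
End Sub
```

The filtered output is written to the path stored in the c_sOutputFile constant, so change that constant before you convert more than one document.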