How To Programmatically Use the HTML Filter DLL to Save Word Documents as Plain HTML (291325)



The information in this article applies to:

  • Microsoft Word 2000

This article was previously published under Q291325

SUMMARY

Microsoft provides the Office 2000 HTML Filter 2.0 as a free download from the OfficeUpdate page on the Microsoft Web site. This add-in to Microsoft Word 2000 allows users to save the currently open Word document as plain HTML without the XML data islands that Word uses for "round tripping." It does this by saving the current file as a standard Office HTML file, and then removing the XML by using a special DLL (MSFilter.dll) that the add-in installs.

This article demonstrates how you can call this DLL from Visual Basic for Applications (VBA) so that you can programmatically produce plain HTML files in Word 2000 without using the user interface (UI) dialog boxes provided by the add-in.

MORE INFORMATION

Microsoft provides programming examples for illustration only, without warranty either expressed or implied, including, but not limited to, the implied warranties of merchantability and/or fitness for a particular purpose. This article assumes that you are familiar with the programming language being demonstrated and the tools used to create and debug procedures. Microsoft support professionals can help explain the functionality of a particular procedure, but they will not modify these examples to provide added functionality or construct procedures to meet your specific needs. If you have limited programming experience, you may want to contact a Microsoft Certified Partner or the Microsoft fee-based consulting line at (800) 936-5200. For more information about Microsoft Certified Partners, about the support options that are available, and about how to contact Microsoft, visit the Microsoft Web site.

Calling MSFilter from VBA

To remove the XML data islands from an HTML file saved by Word, you can programmatically call the MSPeelerMain function exported from MSFilter.dll (which is copied to the System directory when the add-in is installed). The prototype for the function is as follows:
Function MSPeelerMain (ByVal sHtmlFile As String, ByVal sCmdOptions As String) As Integer
				
The first parameter, sHtmlFile, is the HTML file from which you want to remove Office XML data, and the second parameter contains command options that specify which items you want removed. The possible values for sCmdOptions are as follows:
  OPTIONS:
  -a      - keep standard @rule constructs (@font-face, @page)
  -b      - do not create backup copies
  -c      - remove all standard CSS properties
  -f      - overwrite without prompting when output-file already exists
  -l      - remove LANG attributes
  -m      - do not output the Generator and Originator META tags
  -o      - keep Microsoft Office native markup
  -r      - track rate of reduction
  -s      - remove the STYLE element
  -t      - remove non-essential linked files from the thicket
  -v      - keep VML, remove static images
  -x      - export a CSS stylesheet (.css) based on <sHtmlFile>  
				
To use a default set of options, just pass "-" as the second string.
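As a minimal sketch of how the option string might be passed, the following wraps the call in a small helper. The names FilterHtmlFile and FilterExamples are hypothetical and are not part of the add-in. The example assumes that several option letters can be combined after a single hyphen, as the "-tfrb" string used later in this article suggests. This article does not document the meaning of the return value, so the sketch only prints it.

```vba
Private Declare Function MSPeelerMain Lib "msfilter.dll" _
    (ByVal sHtmlFile As String, ByVal sCmdOptions As String) As Integer

' Hypothetical helper (not part of the add-in): runs the filter on an
' existing HTML file. When sOptions is omitted, "-" requests the
' default set of options.
Public Function FilterHtmlFile(ByVal sHtmlFile As String, _
        Optional ByVal sOptions As String = "-") As Integer
    FilterHtmlFile = MSPeelerMain(sHtmlFile, sOptions)
End Function

Public Sub FilterExamples()
    ' Use the default set of options.
    Debug.Print FilterHtmlFile("C:\MyNewHTMLFile.htm")

    ' Assumed combination: no backup copy (-b), remove LANG attributes (-l).
    Debug.Print FilterHtmlFile("C:\MyNewHTMLFile.htm", "-bl")
End Sub
```

The file passed to the helper must already exist and must not be open in Word when the filter runs.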

The following Visual Basic code opens the specified Word document, saves it as native HTML, and then uses the Filter DLL to remove the XML markup:
Private Declare Function MSPeelerMain Lib "msfilter.dll" _
  (ByVal sHtmlFile As String, ByVal sCmdOptions As String) As Integer
  
' This is the output file for the example; change as needed.
Private Const c_sOutputFile As String = "C:\MyNewHTMLFile.htm"

Public Sub SaveAsSimpleHTML(sDocFile As String)
    Dim wdApp As Object
    Dim wdDoc As Object
    
    On Error GoTo Err_Routine
    
    ' Launch Microsoft Word and open the source document.
    Set wdApp = CreateObject("Word.Application")
    Set wdDoc = wdApp.Documents.Open(sDocFile)
    
    ' Save the document as HTML (includes XML data islands).
    wdDoc.SaveAs c_sOutputFile, 8 ' 8 = wdFormatHTML
    
    ' Close the document and quit Microsoft Word.
    ' NOTE: The document must be closed before the XML can be removed
    ' because Word maintains an exclusive file lock.
    wdDoc.Close
    Set wdDoc = Nothing
    wdApp.Quit
    Set wdApp = Nothing
    
    ' Make sure Word has time to shut down properly.
    DoEvents

    ' Remove the extra XML. If the filter is not installed, this call
    ' raises error 53 and control passes to Err_Routine.
    MSPeelerMain c_sOutputFile, "-tfrb"
    
    MsgBox "The document was saved as plain HTML." & _
        vbCrLf & "Location: " & c_sOutputFile
    Exit Sub
    
Err_Routine:
    If Err.Number = 53 And wdApp Is Nothing Then
        ' "File not found" after Word has shut down means that the DLL
        ' could not be found, so the Filter is not installed on this
        ' computer. The output file still contains the Office XML markup.
        MsgBox "The HTML Filter 2.0 DLL is not installed." & vbCrLf & _
            "The file was saved as standard Office HTML: " & c_sOutputFile
    Else
        MsgBox "An error occurred: " & CStr(Err.Number) & _
            " - " & Err.Description, vbCritical
    End If
    
    ' Do not leave a hidden instance of Word running after a failure.
    If Not wdApp Is Nothing Then wdApp.Quit 0 ' 0 = wdDoNotSaveChanges
End Sub
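
Instead of relying on error 53 at call time, you might check for the DLL before you start Word. The following is a minimal sketch under two assumptions: that MSFilter.dll is located in the Windows System directory, as described earlier in this article, and that the code runs in 32-bit VBA, as in Word 2000. The helper name IsFilterInstalled is hypothetical. GetSystemDirectory is a standard Win32 API function.

```vba
Private Declare Function GetSystemDirectory Lib "kernel32" _
    Alias "GetSystemDirectoryA" _
    (ByVal lpBuffer As String, ByVal nSize As Long) As Long

' Hypothetical helper: returns True if msfilter.dll is present in the
' Windows System directory.
Public Function IsFilterInstalled() As Boolean
    Dim sBuffer As String
    Dim lLen As Long

    ' Allocate a buffer of MAX_PATH (260) characters for the path.
    sBuffer = String$(260, vbNullChar)
    lLen = GetSystemDirectory(sBuffer, Len(sBuffer))

    ' On success, lLen is the number of characters copied to the buffer.
    If lLen > 0 And lLen < Len(sBuffer) Then
        ' Dir$ returns an empty string when the file does not exist.
        IsFilterInstalled = _
            (Len(Dir$(Left$(sBuffer, lLen) & "\msfilter.dll")) > 0)
    End If
End Function
```

A caller could test IsFilterInstalled first and skip the conversion, or warn the user, when it returns False. The check only confirms that the file is present. It does not verify the version of the filter.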
				

Modification Type: Major
Last Reviewed: 6/23/2005
Keywords: kbdownload kbAutomation kbhowto KB291325