Information about how to extract Office file formats and schemas (840817)



The information in this article applies to:

  • Microsoft Office Excel 2003
  • Microsoft Excel 2002
  • Microsoft Excel 2000
  • Microsoft Excel 97 for Windows
  • Microsoft Office PowerPoint 2003
  • Microsoft PowerPoint 2002
  • Microsoft PowerPoint 2000
  • Microsoft PowerPoint 97 for Windows
  • Microsoft Office Word 2003
  • Microsoft Word 2002
  • Microsoft Word 2000
  • Microsoft Word 97 for Windows

SUMMARY

If you have to extract file format or schema information for Microsoft Excel, Microsoft PowerPoint, or Microsoft Word, you can use several methods such as API programming calls, XML, RTF, or HTML. If these methods do not address your needs, you may be eligible to participate in a licensing program and receive technical documentation for certain Microsoft Office binary file formats.

INTRODUCTION

This article describes several techniques that are available for extracting file format and schema information for Excel, PowerPoint, and Word.

MORE INFORMATION

Office Application Programming Interfaces (APIs)

The Office binary file formats are designed to be accessed through the Office Application Programming Interfaces (APIs), instead of by direct manipulation of the format. Because of the complexity of the formats, direct manipulation can cause corruption and is strongly discouraged.

For additional information about the Office APIs, visit the following Microsoft Web site:

The Office binary file formats use the Windows Structured Storage APIs. The Office-specific information is stored as streams in this more generalized format. Common elements, such as document properties, can be accessed through the Structured Storage APIs and do not require access to the Office binary file format documentation.

For additional information about the Windows Structured Storage APIs, visit the following Microsoft Web site:

Important: Reading or manipulating the structure directly can cause corruption and is strongly discouraged.
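As a small illustration of the point that Office binary files are Structured Storage (compound) files, the following Python sketch checks whether a stream begins with the eight-byte compound file signature. It only reads the first bytes and never writes, and it is not a substitute for the Structured Storage APIs. The function name looks_like_compound_file and the in-memory sample data are hypothetical and are used only for demonstration.

```python
import io

# Eight-byte signature found at the start of Structured Storage
# (compound) files, which includes Office binary documents.
COMPOUND_FILE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_compound_file(stream):
    """Return True if the binary stream starts with the compound file signature.

    Hypothetical helper for illustration; it reads eight bytes and
    does not modify the stream's underlying data.
    """
    header = stream.read(len(COMPOUND_FILE_SIGNATURE))
    return header == COMPOUND_FILE_SIGNATURE


# Demonstration on in-memory data instead of a real document.
fake_document = io.BytesIO(COMPOUND_FILE_SIGNATURE + b"\x00" * 504)
print(looks_like_compound_file(fake_document))
```

To test a file on disk, open it in binary mode (for example, with open(path, "rb")) and pass the file object to the function. Anything beyond this kind of read-only identification is better left to the Structured Storage APIs.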

XML

XML is a plain-text, Unicode-based metalanguage (a language for defining markup languages). XML is not tied to any programming language, operating system, or software vendor. Many technologies are available for manipulating, structuring, transforming, and querying XML data. As the use of XML has grown, it has become generally accepted that XML is useful not only for describing new document formats for the Web but also for describing structured data. Examples of structured data include information that is typically contained in spreadsheets, program configuration files, and network protocols.
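To show how structured, spreadsheet-like data can be described in XML and then queried, here is a minimal Python sketch that uses the standard xml.etree.ElementTree module. The element names (orders, order, item, quantity) are invented for this example and do not come from any Office XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML that describes two spreadsheet-like rows.
SAMPLE_XML = """<?xml version="1.0"?>
<orders>
  <order id="1"><item>Widget</item><quantity>4</quantity></order>
  <order id="2"><item>Gadget</item><quantity>7</quantity></order>
</orders>"""


def read_orders(xml_text):
    """Parse the sample XML and return one dictionary per <order> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.findall("order"):
        rows.append({
            "id": order.get("id"),
            "item": order.findtext("item"),
            "quantity": int(order.findtext("quantity")),
        })
    return rows


for row in read_orders(SAMPLE_XML):
    print(row)
```

Because the data is plain text with explicit element names, the same document could be read just as easily from another programming language or operating system.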

Microsoft Office includes support for XML schemas. Microsoft maintains a licensing program for certain Office XML schemas.

To learn more about Office XML schemas, visit the following Microsoft Web site to view the Microsoft Office System and XML: Bringing XML to the Desktop article:

To learn more about the licensing program for Office XML schemas, visit the following Microsoft Web site to view the File Format and Standards Licensing Programs article:

Rich Text Format (RTF)

The Rich Text Format (RTF) specification is a method of encoding formatted text and graphics for easy transfer between programs. The RTF specification provides a format for text and graphics interchange that can be used with different output devices, operating environments, and operating systems. RTF uses the American National Standards Institute (ANSI), PC-8, Macintosh, or IBM PC character set to control the representation and the formatting of a document, both on the screen and in print. With the RTF specification, documents that are created under different operating systems and that are created by using different software programs can be transferred between those operating systems and those programs.
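To give a sense of what RTF looks like as an interchange format, the following Python sketch builds a very small RTF document as a string. It assumes ASCII-only input, a single font, and a 12-point font size. The helper names escape_rtf and make_rtf are hypothetical, and the sketch covers only a tiny part of the RTF specification.

```python
# Minimal RTF header: RTF version 1, ANSI character set, default font 0,
# a one-entry font table, then font 0 at 24 half-points (12 points).
RTF_HEADER = r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}\f0\fs24 "


def escape_rtf(text):
    """Escape backslashes and braces, and turn newlines into \\par.

    Simplifying assumption: only ASCII input is supported.
    """
    pieces = []
    for ch in text:
        if ch in "\\{}":
            pieces.append("\\" + ch)      # special characters need a backslash
        elif ch == "\n":
            pieces.append("\\par ")       # paragraph break control word
        elif ord(ch) < 128:
            pieces.append(ch)
        else:
            raise ValueError("this sketch handles ASCII text only")
    return "".join(pieces)


def make_rtf(text):
    """Wrap escaped text in the header and the closing group brace."""
    return RTF_HEADER + escape_rtf(text) + "}"


print(make_rtf("Hello, RTF.\nSecond paragraph with {braces}."))
```

Saving the printed string to a file with an .rtf extension should produce a document that RTF-aware programs can open. A production-quality writer would also need to handle non-ASCII characters and richer formatting.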

For more information about how to write or how to implement a sample RTF reader, visit the following Microsoft Web site, and then type RTF Reader in the Search MSDN For box:

Visio XML schema

Through Microsoft documentation and a royalty-free license, customers and partners can take advantage of the XML schema for Microsoft Visio, the Microsoft diagramming and data visualization tool. The Visio schema provides a complete, W3C-compliant description of the Visio Extensible Markup Language (XML) file format. Organizations can use the schema to access information that is captured in their Visio diagrams and to use that information with other XML-enabled programs, such as customer relationship management (CRM) and enterprise resource planning (ERP) systems, as part of their business processes. For more information and to download the schema, visit the following Microsoft Web site:

HTML

HTML files are text files that include the information that users will see, and tags that specify formatting information about how the information will be presented for display purposes. You can use HTML to store, distribute, and present Office documents and data in a format that can be viewed by using most Web browsers while retaining the rich content and functionality of Office documents.
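Because an HTML file mixes the visible information with the tags that format it, a program can separate the two. The following Python sketch uses the standard html.parser module to collect only the text that a user would see. The class name TextExtractor, the function name visible_text, and the sample markup are hypothetical and are used only for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document and ignore the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; skip whitespace-only runs.
        text = data.strip()
        if text:
            self.parts.append(text)


def visible_text(html):
    """Return the non-empty text fragments found in the HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts


SAMPLE_HTML = (
    "<html><body><h1>Quarterly Report</h1>"
    "<p>Sales rose <b>12%</b>.</p></body></html>"
)
print(visible_text(SAMPLE_HTML))
```

HTML that Office saves is typically much larger than this sample because it also carries formatting and document information, but the same separation of text and tags applies.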

For more information about how to edit HTML, visit the following Microsoft Web site:

For more information about how to work with code, HTML, and resource files, visit the following Microsoft Web site:

Licensing programs

Qualified customers, partners, or government entities that verify that the Office APIs and the XML, RTF, or HTML formats do not address their particular needs may apply to Microsoft to license technical documentation for certain Microsoft Office binary file formats under the following licensing programs:
  • Government License Program

    This program entitles bona fide governmental entity customers of Microsoft to license the Microsoft .doc, .xls, or .ppt file format documentation for certain internal, non-commercial uses.
  • Internal Usage License Program

    This program entitles qualified Microsoft customers to license the Microsoft .doc, .xls, or .ppt file format documentation for use in the development of internal-use software solutions that support the .doc, .xls, or .ppt file formats from Microsoft and to complement Microsoft Office.
  • ISV License Program

    This program entitles qualified software developers to license the Microsoft .doc, .xls, or .ppt file format documentation for use in the development of commercial software products and solutions that support the .doc, .xls, or .ppt file formats from Microsoft and to complement Microsoft Office.
If you verify that a license program applies to your need, contact Microsoft at the following e-mail address to initiate the license qualification and sign-up process:

When you write to Microsoft, provide the following information:
  • The licensing program that you are interested in
  • Your company or agency name
  • Your mailing address
  • Your city
  • Your state or province
  • Your ZIP code or postal code
  • Your country
  • A contact name
  • A contact title
  • A contact telephone number
  • A contact fax number

Modification Type: Major
Last Reviewed: 7/28/2006
Keywords: kbinfo KB840817 kbAudDeveloper