INFO: Improving the Performance of Video Drivers (247058)



The information in this article applies to:

  • Microsoft Windows CE 2.12 Embedded ToolKit (ETK)
  • Microsoft Windows CE 2.11 Embedded ToolKit (ETK)
  • Microsoft Windows CE 2.1 Embedded ToolKit (ETK)
  • Microsoft Windows CE Operating System, Versions 3.0
  • Microsoft Windows CE Operating System, Versions 2.12
  • Microsoft Windows CE Operating System, Versions 2.11
  • Microsoft Windows CE Operating System, Versions 2.0
  • Microsoft Windows CE Platform Builder 2.11
  • Microsoft Windows CE Platform Builder 2.12
  • Microsoft Windows CE Platform Builder 3.0 (BETA)

This article was previously published under Q247058

SUMMARY

Original equipment manufacturers (OEMs) should pay particular attention to graphics performance. Some customers do not implement the software-accelerated graphics functions that can improve performance. Incorporating these functions may increase the read-only memory (ROM) footprint by as much as 10 KB. However, the benefits, which include snappy screen painting and the overall appearance of faster performance, outweigh this cost.

Adaptation kits come with a generic Graphics Primitive Engine (GPE) library that supports all bit depths from 1 to 32 bits per pixel (BPP) and is therefore not optimized for any particular display. If you rely solely on this library, performance on your particular display is likely to be suboptimal. You can achieve significant performance gains by replacing frequently used functions in the GPE library with your own optimized emulation functions. Microsoft provides sample code for optimized 2-BPP, 8-BPP, and 16-BPP emulation functions.

MORE INFORMATION

Please refer to the following sections for more details on how to improve the performance of your video driver:

Supporting Bit Block Transfers in Windows CE Display Drivers

Beginning with Windows CE 2.0, a display driver can make use of up to three levels of BLT processing:
  • BLT emulation provided by GPE (the default).
  • Software accelerations provided by the emulation library or by the embedded developer.
  • Hardware accelerations supported by the native display device hardware.
The display driver chooses one of these methods for each BLT it performs.
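The selection order among these three levels can be summarized in a short, self-contained C++ sketch. It is not GPE code: BltRequest, BltPath, and ChooseBltPath are hypothetical names, and the three Boolean fields stand in for the real checks a driver makes (the rop4 switch, InVideoMemory, and the emulation restrictions). As in the S3Trio64 sample, the GPE default is assigned first and is replaced only when a faster path applies.

```cpp
#include <cassert>

// Hypothetical summary of what a driver knows about one BLT.
struct BltRequest {
    bool hardwareSupportsRop;    // the device can perform this ROP natively
    bool surfacesInVideoMemory;  // destination (and source, if any) in video memory
    bool emulationSupportsBlt;   // an emulation-library routine fits this BLT
};

// The three levels of BLT processing.
enum class BltPath { GpeDefault, SoftwareEmulation, Hardware };

// Assign the GPE default first, then prefer hardware, then emulation.
BltPath ChooseBltPath(const BltRequest& r) {
    BltPath path = BltPath::GpeDefault;  // catch-all, like EmulatedBlt
    if (r.hardwareSupportsRop && r.surfacesInVideoMemory) {
        path = BltPath::Hardware;
    } else if (r.emulationSupportsBlt) {
        path = BltPath::SoftwareEmulation;
    }
    return path;
}
```

For example, a printer BLT whose surfaces live in system memory falls through to emulation (if it qualifies) even when the hardware supports the ROP.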

The following sections briefly introduce the GPE and emulation software libraries. Sample source code from the Embedded Toolkit (ETK) demonstrates how to choose the desired method of BLT processing in the display driver. Build environment directives can be used to control the inclusion of code for accelerations; these directives are introduced along with an explanation of how to include the GPE and emulation libraries in the Windows CE operating system image. Finally, the functions of the emulation library are listed with their associated raster operations (ROPs) and their source and destination pixel depths.

The GPE Library for Default BLT Processing

GPE is the Graphics Primitive Engine class library. In the Embedded Toolkit, the library is provided in binary form and may be found in the Public\Common\Oak\Lib folder. The class library serves as the basis for Windows CE display drivers. GPE provides default processing for BLTs in its EmulatedBlt function.

The Emulation Library for Software-Accelerated BLTs

Windows CE provides an emulation library (as sample code) for software-accelerated BLTs. In the Embedded Toolkit, the library can be found in the Public\Common\Oak\Drivers\Display\Emul folder. The library contains emulated BLT functions for BLTs at various destination pixel depths, including 2, 8, and 16. The emulated BLTs can be used to improve performance over and above the default BLT processing that is provided in GPE. The embedded developer can use this source code as a template for writing more software-accelerated BLTs, as appropriate for the target hardware and software.

Handling BLTs and Accelerations in a Display Driver

In a Windows CE display driver, the driver routes BLT handling to either GPE (the default), the emulation library, or directly to the hardware. The S3Trio64 display driver demonstrates how all three of these methods are invoked. The sample code that is included here can also be found in the Embedded Toolkit in the Platform\Cepc\Drivers\Display\S3trio64 folder. BLT processing begins with a call to the driver's BltPrepare function. It initializes the BLT parameters and determines which function will be used to perform the individual BLTs. Typically, GPE is initialized to handle default BLT processing.

Sample Code

SCODE S3Trio64::BltPrepare(
	GPEBltParms *pBltParms )
{
			// Debug messages and optional timing processing.

	pBltParms->pBlt = EmulatedBlt; // Catch all.
				
For improved performance, BltPrepare can examine the characteristics of the BLT and the associated surfaces to determine whether an accelerated form of the BLT can be used. After initializing the default handler, the display driver may include support for hardware or software accelerations, and it may enclose this support in #ifdef blocks. The ENABLE_ACCELERATION symbol is generally used to control inclusion of the code for hardware accelerations, and the ENABLE_EMULATION symbol is generally used to control inclusion of the code for software-accelerated emulation.

After setting the default handler, the S3Trio64 driver dispatches BLTs to hardware acceleration, if available, and then to software-accelerated emulation, if available.

Sample Code

#ifdef ENABLE_ACCELERATION

#define FUNCNAME(basename) (SCODE (GPE::*)(struct GPEBltParms *))Emulator::Emulated##basename

	if ( pBltParms->pDst->InVideoMemory() ) {
		switch( pBltParms->rop4 )
				
The driver defines a macro, FUNCNAME, that simplifies converting a function in the emulation library into a pointer of the type GPE expects for a BLT handler. The driver next checks whether the destination surface is in video memory. Most display drivers must make this check before using hardware acceleration, because on Windows CE, GDI uses the display driver to render printer output as well as display output, and printer output is rendered in system memory, not in video memory. Most display hardware can perform accelerated drawing only when the destination surface is in video memory (for BLTs that don't take a source) or when both the source and destination surfaces are in video memory (for BLTs that do take a source). If the hardware has this limitation, the driver must check the destination (or the source and destination) before calling the acceleration.
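The video-memory rule can be expressed as a small predicate. This is a sketch that assumes the hardware has the limitation just described; Surface and HardwareCanAccelerate are hypothetical names, and a real driver would call InVideoMemory on the GPE surfaces instead.

```cpp
#include <cassert>

// Hypothetical surface descriptor; a real driver asks the GPE surface
// through InVideoMemory().
struct Surface {
    bool inVideoMemory;  // false for printer (system-memory) surfaces
};

// True if typical display hardware could accelerate this BLT: the
// destination must be in video memory, and so must the source when the
// BLT uses one. Pass src == nullptr for BLTs that take no source.
bool HardwareCanAccelerate(const Surface& dst, const Surface* src) {
    if (!dst.inVideoMemory) {
        return false;  // e.g. GDI rendering printer output
    }
    if (src != nullptr && !src->inVideoMemory) {
        return false;  // source still in system memory
    }
    return true;
}
```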

As shown below, the driver evaluates the ROP code and directs the BLT to a supported hardware acceleration, when available. The ROP for SRCCOPY illustrates how the driver checks for a source surface in video memory before calling the SRCCOPY hardware acceleration.

Sample Code

switch( pBltParms->rop4 )
		{
			case 0x0000:	// BLACKNESS
				SelectSolidColor( 0 );
				pBltParms->pBlt = (SCODE (GPE::*)(struct GPEBltParms *))AcceleratedFillRect;
				break;
			case 0xFFFF:	// WHITENESS
				SelectSolidColor( 0x00ffffff );
				pBltParms->pBlt = (SCODE (GPE::*)(struct GPEBltParms *))AcceleratedFillRect;
				break;
			case 0xF0F0:	// PATCOPY
				if( pBltParms->solidColor == -1 )
				{
					// Pattern brush: don't accelerate pattern fills.
				}
				else
				{
					// Solid brush.
					SelectSolidColor( pBltParms->solidColor );
					pBltParms->pBlt = (SCODE (GPE::*)(struct GPEBltParms *))AcceleratedFillRect;
				}
				break;
			case 0xCCCC:	// SRCCOPY
				if( pBltParms->pLookup || pBltParms->pConvert )
					break;	// Can't accelerate color translations.
				if( pBltParms->bltFlags & BLT_STRETCH )
					break;	// Can't accelerate stretch BLTs.
				// Can't accelerate if the source is not in video memory.
				if( pBltParms->pSrc->InVideoMemory() )
					pBltParms->pBlt = (SCODE (GPE::*)(struct GPEBltParms *))AcceleratedSrcCopyBlt;
				break;
		}
				
The S3Trio64 display driver also demonstrates how to use the emulated BLT library. After setting the default BLT handler and handling hardware accelerations, the driver checks for possible software accelerations; when one is available, it sets the BLT function pointer to the corresponding function in the emulation library. The driver places all BLT handling for 16-BPP destination surfaces inside an #ifdef FB16BPP block. For the mask PATCOPY BLT used for text (ROP4 0xAAF0), the driver first checks that the BLT uses a solid brush. It then dispatches a BLT with a 1-BPP mask to the emulation library's 16-BPP text function, EmulatedBltText16, and a BLT with a 4-BPP mask to the 16-BPP anti-aliased text function, EmulatedBltAlphaText16.

Sample Code

if (pBltParms->pBlt == EmulatedBlt) {
#ifdef FB16BPP
		switch (pBltParms->rop4)
		{
			case 0xAAF0:	// Special PATCOPY ROP for text output: fill where the mask is set.
				// Solid brush (not a pattern brush) and 16-BPP destination?
				if( (pBltParms->solidColor != -1) &&
					(pBltParms->pDst->Format() == gpe16Bpp) )
				{
					WaitForNotBusy();
					if (pBltParms->pMask->Format() == gpe1Bpp)
					{
						pBltParms->pBlt = FUNCNAME(BltText16);
						return S_OK;
					}
					else	// Anti-aliased text (4-BPP mask).
					{
						pBltParms->pBlt = FUNCNAME(BltAlphaText16);
						return S_OK;
					}
				}
				break;
			default:	// Some other ROP4.
				;
	}
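The FUNCNAME macro relies on token pasting (##) to form the emulation function's name and on a cast from a pointer to an Emulator member to the GPE handler type. The following standalone sketch reproduces that pattern with hypothetical Gpe and Emulator classes and a hypothetical ChooseTextBlt16 helper that picks the 1-BPP or 4-BPP (anti-aliased) text routine. It uses static_cast and an explicit &, because standard C++ requires the & when forming a pointer to member.

```cpp
#include <cassert>

struct BltParms;  // opaque here; stands in for GPEBltParms

// Hypothetical stand-in for the GPE base class.
class Gpe {
public:
    using BltFn = int (Gpe::*)(BltParms*);  // handler type stored in pBlt
};

// Hypothetical stand-in for the emulation library class.
class Emulator : public Gpe {
public:
    int EmulatedBltText16(BltParms*)      { return 1; }  // 1-BPP glyph mask
    int EmulatedBltAlphaText16(BltParms*) { return 4; }  // 4-BPP alpha mask
};

// Token pasting builds the member name; static_cast converts the
// derived-class member pointer to the base-class handler type.
#define FUNCNAME(basename) \
    static_cast<Gpe::BltFn>(&Emulator::Emulated##basename)

enum class MaskFormat { Bpp1, Bpp4 };

// Mirrors the 0xAAF0 text case; the caller is assumed to have already
// checked for a solid brush and a 16-BPP destination.
Gpe::BltFn ChooseTextBlt16(MaskFormat mask) {
    return (mask == MaskFormat::Bpp1) ? FUNCNAME(BltText16)
                                      : FUNCNAME(BltAlphaText16);
}
```

Calling the returned pointer through a Gpe pointer is well defined here because the object's dynamic type is Emulator, which contains the member.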
				
Before invoking a BLT function in the emulation library, the driver must always check for conditions that might prevent the BLT from using emulation. This is critical because the functions in the emulation library assume that the driver has validated the BLT before calling them. In the previous sample, the driver checks for a solid brush because the emulation library does not support pattern brushes. The emulation library also cannot handle color conversions, bit-depth conversions, stretching, or transparency. In the path compiled when FB16BPP is not defined, the driver checks these restrictions before dispatching to the 8-BPP emulation functions:

Sample Code

#else // !FB16BPP
		//
		// Bail on any parameter values that would prevent this
		// BLT from being emulated. Because pBlt is still
		// EmulatedBlt, GPE then handles the BLT.
		//
		if ( EGPEFormatToBpp[pBltParms->pDst->Format()] != 8         ||
		     (pBltParms->bltFlags & (BLT_TRANSPARENT | BLT_STRETCH)) ||
		     pBltParms->pLookup                                      ||
		     pBltParms->pConvert )
		{
			return S_OK;
		}
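The emulation restrictions can be collected into one predicate. In this sketch, EmulationCheck and EmulationCanHandle are hypothetical; the fields are a flattened view of the GPEBltParms members discussed here (solidColor, pLookup, pConvert, bltFlags, and the surface formats). The brush check applies only to BLTs that use a brush, because the samples test solidColor only in their PATCOPY and text cases.

```cpp
#include <cassert>

// Hypothetical flattened view of the BLT parameters that matter here.
struct EmulationCheck {
    bool usesBrush;    // PATCOPY, text (0xAAF0), and similar ROPs
    long solidColor;   // -1 means a pattern brush
    bool hasLookup;    // pLookup != NULL: color translation
    bool hasConvert;   // pConvert != NULL: color conversion
    int  srcBpp;       // 0 when the BLT takes no source
    int  dstBpp;
    bool stretch;      // BLT_STRETCH
    bool transparent;  // BLT_TRANSPARENT
};

// True only when none of the cases the emulation library cannot handle
// are present; emulation routines assume the caller checked this.
bool EmulationCanHandle(const EmulationCheck& b) {
    if (b.usesBrush && b.solidColor == -1) return false;       // pattern brush
    if (b.hasLookup || b.hasConvert) return false;             // color conversion
    if (b.srcBpp != 0 && b.srcBpp != b.dstBpp) return false;   // depth conversion
    if (b.stretch || b.transparent) return false;              // stretch/transparency
    return true;
}
```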
				
After validating that emulation can handle the BLT, the display driver examines the ROP, dispatching supported ROPs to the appropriate emulation function. The driver always switches on the ROP4 value. For masking BLTs, the entire WORD is significant; for nonmasking BLTs, only the least significant byte (the ROP3 value) is significant, and in the samples it appears in both bytes, as in 0xCCCC for SRCCOPY. In Windows, ROPs are defined as DWORDs. Windows CE display drivers use only the most significant WORD of the ROP; the lower WORD (the ROP compiler directives) is ignored.

Sample Code

switch (pBltParms->rop4)
		{
			case 0x0000:    // BLACKNESS
				pBltParms->solidColor = 0;
				pBltParms->pBlt = FUNCNAME(BltFill08);
				return S_OK;
			case 0xFFFF:    // WHITENESS
				pBltParms->solidColor = 0x00ffffff;
				pBltParms->pBlt = FUNCNAME(BltFill08);
				return S_OK;
			case 0xF0F0:    // PATCOPY
				if( pBltParms->solidColor != -1 )
				{
					pBltParms->pBlt = FUNCNAME(BltFill08);
					return S_OK;
				}
				break;
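The relationship between a Win32 ROP DWORD and the WORD that the driver switches on can be checked with a little arithmetic. The sketch below assumes the standard wingdi.h values for SRCCOPY, PATCOPY, BLACKNESS, and the destination-only ROP3 (0x00AA0029), and reimplements MAKEROP4 so that it compiles without Windows headers. DriverRop4FromRop3 is a hypothetical helper that assumes a nonmasking BLT carries its ROP3 index in both bytes, as the 0xCCCC and 0xF0F0 cases in the samples suggest.

```cpp
#include <cassert>
#include <cstdint>

using std::uint16_t;
using std::uint32_t;

// Win32 ternary ROP codes (wingdi.h values), redeclared locally.
constexpr uint32_t kSrcCopy   = 0x00CC0020;  // SRCCOPY
constexpr uint32_t kPatCopy   = 0x00F00021;  // PATCOPY
constexpr uint32_t kBlackness = 0x00000042;  // BLACKNESS
constexpr uint32_t kDstOnly   = 0x00AA0029;  // ROP3 0xAA: leave destination unchanged

// Same arithmetic as the Win32 MAKEROP4 macro: the background ROP3
// index goes in the top byte; the foreground ROP fills the rest.
constexpr uint32_t MakeRop4(uint32_t fore, uint32_t back) {
    return ((back << 8) & 0xFF000000u) | fore;
}

// A display driver keeps only the high WORD of the ROP DWORD.
constexpr uint16_t DriverRop4(uint32_t rop) {
    return static_cast<uint16_t>(rop >> 16);
}

// Assumed nonmasking form: the ROP3 index repeated in both bytes.
constexpr uint16_t DriverRop4FromRop3(uint32_t rop3) {
    return static_cast<uint16_t>((((rop3 >> 16) & 0xFFu) << 8) |
                                 ((rop3 >> 16) & 0xFFu));
}
```

With these definitions, MakeRop4(kPatCopy, kDstOnly) has a high WORD of 0xAAF0: PATCOPY (0xF0) where the mask is set and the destination unchanged (0xAA) elsewhere, which is the text ROP the S3Trio64 driver handles.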
				
The following sample function demonstrates how to perform 8-BPP to 16-BPP blit conversions for the SRCCOPY ROP, with optional transparency and no stretching. It can be adapted to drop transparency support or to handle a different ROP or different bit depths. You can hook this blit with the following check in BltPrepare:

Sample Code


	if ( (pBltParms->rop4 == 0xCCCC)
	     && (pBltParms->pLookup)
	     && !(pBltParms->bltFlags & BLT_STRETCH)
	     && (pBltParms->pSrc->Format() == gpe8Bpp)
	     && (pBltParms->pDst->Format() == gpe16Bpp) )
	{
		pBltParms->pBlt = (SCODE (GPE::*)(struct GPEBltParms *))EmulatedBltSrcCpy08to16;
		return S_OK;
	}
	// ...

SCODE TRIDENT::EmulatedBltSrcCpy08to16( GPEBltParms *pParms )
{
	DEBUGMSG(GPE_ZONE_BLT_HI,(TEXT("EmulatedBltSrcCpy08to16\r\n")));

    UINT32      iDstScanStride  = pParms->pDst->Stride();
    BYTE       *pDibBitsDst     = (BYTE *)pParms->pDst->Buffer();
    UINT32      iSrcScanStride  = pParms->pSrc->Stride();
    BYTE       *pDibBitsSrc     = (BYTE *)pParms->pSrc->Buffer();
    PRECTL      prcSrc          = pParms->prclSrc;
    PRECTL      prcDst          = pParms->prclDst;
    int         iNumDstRows;
    int         iNumDstCols;
    BYTE       *pbSrcScanLine;
    BYTE       *pbDstScanLine;
    BYTE       *pbSrc;
    WORD       *pwDstPixel;
    int         i;
    int         j;

    // Caller assures a well-ordered, non-empty rect.
    // Compute size of destination rect.
    iNumDstCols = prcDst->right  - prcDst->left;
    iNumDstRows = prcDst->bottom - prcDst->top;

    // Compute pointers to the starting rows in the src and dst bitmaps.
    pbSrcScanLine = pDibBitsSrc + prcSrc->top * iSrcScanStride + prcSrc->left;
    pbDstScanLine = pDibBitsDst + prcDst->top * iDstScanStride + prcDst->left * 2;
    
    if (pParms->bltFlags & BLT_TRANSPARENT)
    {
        for (i = 0; i < iNumDstRows; i++) 
        {
            // Set up pointers to first bytes on src and dst scanlines.
            pbSrc = pbSrcScanLine;
            pwDstPixel = (WORD *)pbDstScanLine;

            for (j = 0; j < iNumDstCols; j++ ) 
            {
                if (*pbSrc != (BYTE)pParms->solidColor)
                {
                    *pwDstPixel = (WORD)pParms->pLookup[*pbSrc];
                }
                pwDstPixel++;
                pbSrc++;
            }

            // Advance to next scanline.
            pbSrcScanLine += iSrcScanStride;
            pbDstScanLine += iDstScanStride;
        }
    }
    else
    {
        for (i = 0; i < iNumDstRows; i++) 
        {
            // Set up pointers to first bytes on src and dst scanlines.
            pbSrc = pbSrcScanLine;
            pwDstPixel = (WORD *)pbDstScanLine;

            for (j = 0; j < iNumDstCols; j++ ) 
            {
                *pwDstPixel++ = (WORD)pParms->pLookup[*pbSrc++];
            }

            // Advance to next scanline.
            pbSrcScanLine += iSrcScanStride;
            pbDstScanLine += iDstScanStride;
        }
    }

	return S_OK;
}
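The pLookup table used in this function maps each 8-BPP source index to a 16-BPP destination value. In a real driver that table is supplied with the BLT parameters; the sketch below builds an equivalent one purely for illustration, assuming an RGB565 destination format, and mirrors the inner loop of EmulatedBltSrcCpy08to16 for one scanline. PaletteEntry, ToRgb565, BuildLookup565, and CopyRow08to16 are hypothetical names, and a negative transparentIndex means no transparency.

```cpp
#include <cassert>
#include <cstdint>

using std::uint16_t;
using std::uint32_t;
using std::uint8_t;

// Hypothetical 8-BPP palette entry.
struct PaletteEntry { uint8_t r, g, b; };

// Pack 8-bit-per-channel color into RGB565 (assumed destination format).
constexpr uint16_t ToRgb565(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Build a 256-entry index-to-16-BPP lookup table from an 8-BPP palette.
void BuildLookup565(const PaletteEntry (&palette)[256], uint32_t (&lookup)[256]) {
    for (int i = 0; i < 256; ++i) {
        lookup[i] = ToRgb565(palette[i].r, palette[i].g, palette[i].b);
    }
}

// Convert one scanline; pixels equal to transparentIndex are skipped,
// like the BLT_TRANSPARENT branch of EmulatedBltSrcCpy08to16.
void CopyRow08to16(const uint8_t* src, uint16_t* dst, int count,
                   const uint32_t* lookup, int transparentIndex) {
    for (int j = 0; j < count; ++j) {
        if (transparentIndex < 0 || src[j] != transparentIndex) {
            dst[j] = static_cast<uint16_t>(lookup[src[j]]);
        }
    }
}
```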

				

More Samples of Drivers That Use Accelerations

Other display drivers that are provided in the Embedded Toolkit demonstrate how to invoke hardware accelerations and make use of the emulation library. For hardware accelerations, see the S3Virge driver in the Platform\Cepc\Drivers\Display\S3virge folder. For hardware accelerations and use of the emulation library, see the Citizen driver in the Platform\Odo\Drivers\Display\Citizen folder.

How to Include Emulation and GPE Libraries in an Operating System Image

The emulated BLT functions are compiled and then linked into a single library, Emul.lib. The display driver links to the emulation library through its SOURCES file; for example, the SOURCES file for the S3Trio64 display driver lists Emul.lib in its TARGETLIBS. The GPE library, which the Embedded Toolkit provides in binary form, is included in the driver's SOURCES file in the same way.

The S3Trio64 display driver is the default display driver for CEPC. The Platform.bib file in the Platform\Cepc\Files folder directs the ROMIMAGE tool to include the driver in the operating system image. Other CEPC display drivers may replace S3Trio64. See the Platform.bib file for the list of environment variables that may be set to direct the ROMIMAGE tool to use a different display driver.

The ODO 2BPP driver is the default display driver for ODO. The Platform.bib file in the Platform\Odo\Files folder lists the environment variables that may be set to direct ROMIMAGE to use a different display driver.

Function Listing of Emulation Library

File Name     Function                    ROP               Src BPP  Dst BPP  Description
======================================================================================
Ebalph02.cpp  EmulatedBltAlphaText02      AAF0              -        02       Special-case fast Blt function for rendering anti-aliased text. Assumes the mask surface contains the 4-BPP alpha bitmap for the glyph.
Ebalph16.cpp  EmulatedBltAlphaText16      AAF0              -        16       Special-case fast Blt function for rendering anti-aliased text. Assumes the mask surface contains the 4-BPP alpha bitmap for the glyph.
Ebcopy02.cpp  EmulatedBltSrcCopy0202      CCCC              02       02       Implements BitBlt(SRCCOPY)
Ebcopy08.cpp  EmulatedBltSrcCopy0808      CCCC              08       08       Implements BitBlt(SRCCOPY)
Ebcopy16.cpp  EmulatedBltSrcCopy1616      CCCC              16       16       Implements BitBlt(SRCCOPY)
Ebdinv02.cpp  EmulatedBltDstInvert02      5555              -        02       Implements PatBlt(DSTINVERT)
Ebdinv08.cpp  EmulatedBltDstInvert08      5555              -        08       Implements PatBlt(DSTINVERT)
Ebfill02.cpp  EmulatedBltFill02           0000, FFFF, F0F0  -        02       Implements PatBlt(PATCOPY) for ROP F0F0, PatBlt(BLACKNESS) for ROP 0000, PatBlt(WHITENESS) for ROP FFFF
Ebfill08.cpp  EmulatedBltFill08           0000, FFFF, F0F0  -        08       Implements PatBlt(PATCOPY) for ROP F0F0, PatBlt(BLACKNESS) for ROP 0000, PatBlt(WHITENESS) for ROP FFFF
Ebfill16.cpp  EmulatedBltFill16           0000, FFFF, F0F0  -        16       Implements PatBlt(PATCOPY) for ROP F0F0, PatBlt(BLACKNESS) for ROP 0000, PatBlt(WHITENESS) for ROP FFFF
Ebpinv02.cpp  EmulatedBltPatInvert02      5A5A              -        02       Implements PatBlt(PATINVERT)
Ebpinv08.cpp  EmulatedBltPatInvert08      5A5A              -        08       Implements PatBlt(PATINVERT)
Ebsand02.cpp  EmulatedBltSrcAnd0202       8888              02       02       Implements BitBlt(SRCAND)
Ebsand08.cpp  EmulatedBltSrcAnd0808       8888              08       08       Implements BitBlt(SRCAND)
Ebsand16.cpp  EmulatedBltSrcAnd1616       8888              16       16       Implements BitBlt(SRCAND)
Ebsinv02.cpp  EmulatedBltSrcInvert0202    6666              02       02       Implements BitBlt(SRCINVERT)
Ebsinv08.cpp  EmulatedBltSrcInvert0808    6666              08       08       Implements BitBlt(SRCINVERT)
Ebsinv16.cpp  EmulatedBltSrcInvert1616    6666              16       16       Implements BitBlt(SRCINVERT)
Ebspnt02.cpp  EmulatedBltSrcPaint0202     EEEE              02       02       Implements BitBlt(SRCPAINT)
Ebspnt08.cpp  EmulatedBltSrcPaint0808     EEEE              08       08       Implements BitBlt(SRCPAINT)
Ebspnt16.cpp  EmulatedBltSrcPaint1616     EEEE              16       16       Implements BitBlt(SRCPAINT)
Ebtext02.cpp  EmulatedBltText02           AAF0              -        02       Special-case fast Blt function for rendering text, that is, ROP 0xAAF0 (solid-color fill with a mask)
Ebtext08.cpp  EmulatedBltText08           AAF0              -        08       Special-case fast Blt function for rendering text, that is, ROP 0xAAF0 (solid-color fill with a mask)
Ebtext16.cpp  EmulatedBltText16           AAF0              -        16       Special-case fast Blt function for rendering text, that is, ROP 0xAAF0 (solid-color fill with a mask)
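Because the emulation library is organized by ROP and destination depth, a driver could in principle select a routine with a table lookup instead of a switch statement. The following hypothetical sketch encodes part of the listing (it stores function names rather than member-function pointers so that it stands alone) and returns nullptr when the listing has no match, in which case the BLT would be left to GPE. ROP 0xAAF0 maps to two routines at 16 BPP depending on the mask format; the sketch lists only the 1-BPP-mask routine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using std::uint16_t;

// Hypothetical table entry: ROP4, destination depth, routine name.
struct EmulEntry {
    uint16_t    rop4;
    int         dstBpp;
    const char* function;
};

// A subset of the emulation library listing.
constexpr EmulEntry kEmulTable[] = {
    {0xCCCC,  2, "EmulatedBltSrcCopy0202"},
    {0xCCCC,  8, "EmulatedBltSrcCopy0808"},
    {0xCCCC, 16, "EmulatedBltSrcCopy1616"},
    {0x5555,  2, "EmulatedBltDstInvert02"},
    {0x5555,  8, "EmulatedBltDstInvert08"},
    {0x0000,  8, "EmulatedBltFill08"},
    {0xFFFF,  8, "EmulatedBltFill08"},
    {0xF0F0,  8, "EmulatedBltFill08"},
    {0xAAF0, 16, "EmulatedBltText16"},  // 1-BPP mask only
};

// Returns the routine name for a ROP4 and destination depth, or nullptr
// when the listing has no emulated routine (leave the BLT to GPE).
const char* FindEmulatedBlt(uint16_t rop4, int dstBpp) {
    for (const EmulEntry& e : kEmulTable) {
        if (e.rop4 == rop4 && e.dstBpp == dstBpp) return e.function;
    }
    return nullptr;
}
```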
				

Supporting Line Drawing in Windows CE Display Drivers

The previous sections describe the various levels of BLT processing that are available beginning in Windows CE 2.0. Most of that discussion applies to line drawing as well. The exception is that the emulation library that is provided in the Embedded Toolkit does not provide software-accelerated line drawing; the embedded developer can add such functions to the library.
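As an illustration of the kind of software-accelerated line function a developer might add, the following sketch draws a solid line into a 16-BPP buffer with the integer Bresenham algorithm. It is hypothetical and not part of the emulation library: SolidLine16 takes raw buffer parameters instead of GPELineParms, performs no clipping, supports only a plain copy mix, and draws both endpoints.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

using std::uint16_t;
using std::uint8_t;

// Hypothetical solid-line routine for a 16-BPP surface using integer
// Bresenham stepping in all octants. No clipping is performed.
void SolidLine16(uint8_t* bits, int strideBytes,
                 int x0, int y0, int x1, int y1, uint16_t color) {
    const int dx = std::abs(x1 - x0), sx = (x0 < x1) ? 1 : -1;
    const int dy = -std::abs(y1 - y0), sy = (y0 < y1) ? 1 : -1;
    int err = dx + dy;  // combined error term for x and y steps
    for (;;) {
        auto* row = reinterpret_cast<uint16_t*>(bits + y0 * strideBytes);
        row[x0] = color;  // plot the current pixel (plain copy)
        if (x0 == x1 && y0 == y1) break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  // step in x
        if (e2 <= dx) { err += dx; y0 += sy; }  // step in y
    }
}
```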

Handling Line Drawing and Accelerations in a Display Driver

In a Windows CE display driver, the driver routes line drawing to either GPE (the default) or directly to the hardware. The S3Trio64 display driver demonstrates how these methods are invoked:

Sample Code

SCODE S3Trio64::Line(
	GPELineParms *pLineParms,
	EGPEPhase phase )
{
#ifdef ENABLE_ACCELERATION
	if( phase == gpeSingle || phase == gpePrepare )
	{
		pLineParms->pLine = EmulatedLine;
		if( pLineParms->pDst->InVideoMemory()
		    && pLineParms->mix == 0x0d0d )
		{
			SelectSolidColor( pLineParms->solidColor );
			pLineParms->pLine = (SCODE (GPE::*)(struct GPELineParms *))AcceleratedSolidLine;
		}
//		if( pLineParms->mix == 0x0B07 )	.. dotted line
	}
#else
	pLineParms->pLine = EmulatedLine;
#endif
	return S_OK;
}
				
Line drawing begins with a call to the driver's Line function. Typically, GPE is initialized to handle default line drawing. For improved performance, Line can examine the characteristics of the line and the associated surfaces to determine whether an accelerated form of line drawing can be used. The S3Trio64 driver encloses its hardware accelerations in an #ifdef ENABLE_ACCELERATION block. It first sets the default handler, EmulatedLine, and then checks that the destination surface is in video memory and that the line uses the 0x0D0D mix (R2_COPYPEN). If both conditions hold, it sets the function pointer to the driver's AcceleratedSolidLine function.
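The acceleration test in the Line sample reduces to a check on the mix value and the destination surface. The sketch below assumes that each byte of the mix WORD holds a ROP2 code, so 0x0D0D is R2_COPYPEN (13) in both bytes; IsSolidCopyMix and CanAccelerateLine are hypothetical helpers, not GPE functions.

```cpp
#include <cassert>
#include <cstdint>

using std::uint16_t;

// R2_COPYPEN as defined for Win32 SetROP2, redeclared locally.
constexpr int kR2CopyPen = 13;

// Assumed layout: one ROP2 code in each byte of the mix WORD.
constexpr bool IsSolidCopyMix(uint16_t mix) {
    return (mix & 0xFF) == kR2CopyPen && (mix >> 8) == kR2CopyPen;
}

// Mirrors the S3Trio64 condition: destination in video memory and
// mix == 0x0d0d.
constexpr bool CanAccelerateLine(uint16_t mix, bool dstInVideoMemory) {
    return dstInVideoMemory && IsSolidCopyMix(mix);
}
```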

Modification Type: Major    Last Reviewed: 4/14/2004
Keywords:kbinfo KB247058