After several months of hard work, we are very proud of the releases of the last few days: we started an official beta release of FPDI and the FPDI PDF-Parser in version 2, a rewrite from scratch backed by more than a decade of PHP and PDF development experience.
Feedback and issue reports are welcome on all available channels.
We also released new versions of all SetaPDF components.
A highlight of this release is a new extraction strategy in the SetaPDF-Extractor component that allows PHP developers to extract groups of words that are related to each other, such as the words in a column or paragraph. You can see this strategy in action here.
Besides this feature, the release comes with several bug fixes, tweaks, and some changes, as you will see in the release notes below.
This release is also prepared for the upcoming PHP version 7.2. All components were successfully tested on the latest PHP 7.2 beta version.
Check the release notes of the components below.
Log in to download the latest version of the related packages!
SetaPDF-Extractor Component
Feature
Added words result class.
Added word groups result class.
Added a new word group strategy which keeps related words together to allow the extraction of, for example, columns. The strategy also dehyphenates grouped words.
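To illustrate what dehyphenating grouped words means, here is a minimal, language-neutral sketch written in Python. It is not SetaPDF's PHP implementation: the helper name `dehyphenate_lines` and the simplified rule (a word ending a line with "-" is joined with the first word of the next line) are assumptions made for illustration only.

```python
def dehyphenate_lines(lines):
    """Join the lines of one word group into a flat list of words.

    A word that ends a line with "-" is merged with the first word of
    the next non-empty line. (Hypothetical helper, simplified rule.)
    """
    words = []
    carry = ""  # first half of a word that was split by a hyphen
    for index, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue
        if carry:
            # Complete the word that was started on a previous line.
            parts[0] = carry + parts[0]
            carry = ""
        is_last_line = index == len(lines) - 1
        if parts[-1].endswith("-") and len(parts[-1]) > 1 and not is_last_line:
            carry = parts.pop()[:-1]
        words.extend(parts)
    if carry:
        # No continuation followed, so keep the word as it was written.
        words.append(carry + "-")
    return words


group = ["The extrac-", "tion strategy keeps", "columns together"]
print(dehyphenate_lines(group))
```

A real strategy would also have to decide which hyphens are genuine (as in "well-known"); this sketch deliberately ignores that distinction.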
Change
Renamed SetaPDF_Extractor_Result_Segment to SetaPDF_Extractor_Result_Collection (the old class is currently only marked as deprecated).
Refactored SetaPDF_Extractor_Result_(Word|WordWithGlyphs) to support several word parts. As a result, the SetaPDF_Extractor_Result_CompareableInterface interface and its methods were removed from these classes.
SetaPDF_Extractor_Strategy_Glyph::getResult() now returns an instance of SetaPDF_Extractor_Result_Collection.
SetaPDF_Extractor_Strategy_Word::getResult() now returns an instance of SetaPDF_Extractor_Result_Words.
Removed the $cleanUp parameter from the getResult() methods of all strategies (it was only needed in PHP 5.3).
Bugfix
Handle pages and XObjects without a Resources dictionary.
Handle division by zero in Word strategy.
Fixed handling of Rectangle filter if page is rotated and its origin is shifted through its page boundary.
Fixed division by zero for invisible glyphs with a width of zero.
Tweak
Removed unused logic from SetaPDF_Extractor_Result_Bounds::getRectangle().
Added fallback calculation of font bounding box for fonts with an empty bounding box.
Optimized resolving of space width in fallback if font bounding box is empty.
Fixed compatibility issues with PHP 7.2.
Demo
Added demo for word group strategy.
Removed the obsolete demo "ExtractSegments.php", which is superseded by the word group strategy demo.
SetaPDF-Core Component
Feature
Added collidable interface and functionality for geometric classes (point, rectangle).
Added ensure() method to the reader interface and abstract class, which ensures that content of a specific length is available at a specific location.
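As a rough illustration of what collision checks between geometric objects involve, the following Python sketch tests a point against a rectangle and two rectangles against each other. The class and method names (`Point`, `Rectangle`, `contains`, `intersects`) are hypothetical and do not reflect the SetaPDF interface; treating touching edges as a collision is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Rectangle:
    llx: float  # lower-left x
    lly: float  # lower-left y
    urx: float  # upper-right x
    ury: float  # upper-right y

    def contains(self, point):
        """True if the point lies inside the rectangle or on its border."""
        return (self.llx <= point.x <= self.urx
                and self.lly <= point.y <= self.ury)

    def intersects(self, other):
        """True if both rectangles share at least one point."""
        return not (other.llx > self.urx or other.urx < self.llx
                    or other.lly > self.ury or other.ury < self.lly)


page_area = Rectangle(0, 0, 10, 10)
print(page_area.contains(Point(5, 5)))
print(page_area.intersects(Rectangle(11, 0, 20, 10)))
```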
Change
Changed SetaPDF_Core_Parser_CrossReferenceTable::isCompressed() method to return an integer which represents its compression method. The possible methods are available as class constants COMPRESSED_*.
Renamed SetaPDF_Core_Document_CrossReferenceTable::getCompressed() to getCompressedStream().
Added SetaPDF_Core_ColorSpace_DeviceN::getAlternateColorSpace() and marked getAlternateSpace() as deprecated.
Handle numeric values as integers or floats instead of only floats.
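PDF distinguishes between integer and real numeric objects, so a parser can keep tokens such as `42` as integers instead of converting everything to floats. A minimal Python sketch of that distinction follows; the function name `parse_pdf_number` is an assumption, and this is not SetaPDF's parser code.

```python
def parse_pdf_number(token):
    """Return an int for integer tokens such as "42" or "-7" and a
    float for real tokens such as "3.14", "-.5" or "4."."""
    if "." in token:
        return float(token)
    return int(token)


for token in ("42", "-7", "3.14", "-.5", "4."):
    value = parse_pdf_number(token)
    print(token, type(value).__name__, value)
```

Keeping integers as integers avoids writing a value like `1` back as `1.0` and keeps large values such as byte offsets exact.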
Bugfix
Ignore custom non-string values in Info dictionary when synchronizing with XMP data package.
Fixed SetaPDF_Core_Page::toXObject() method if rotation and gradients were in use.
Fixed handling of LZW filter if abbreviation is used.
Fixed calculation of average width in Type0 fonts if no W array is available.
Fallback in SetaPDF_Core_Font_Simple::getAvgWidth() to "missing width" when no glyph width is available.
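Regarding the LZW fix above: the PDF specification allows abbreviated filter names in inline images, for example "LZW" for "LZWDecode". One way to handle them is to normalize a name before looking up the filter, as in this Python sketch. The mapping follows the abbreviations defined in the PDF specification; the function name is hypothetical, and how SetaPDF resolves abbreviations internally is not described here.

```python
# Abbreviated filter names permitted in inline images (PDF specification).
FILTER_ABBREVIATIONS = {
    "AHx": "ASCIIHexDecode",
    "A85": "ASCII85Decode",
    "LZW": "LZWDecode",
    "Fl": "FlateDecode",
    "RL": "RunLengthDecode",
    "CCF": "CCITTFaxDecode",
    "DCT": "DCTDecode",
}


def normalize_filter_name(name):
    """Map an abbreviated filter name to its full name.

    Names that are not abbreviations are returned unchanged.
    """
    return FILTER_ABBREVIATIONS.get(name, name)


print(normalize_filter_name("LZW"))
print(normalize_filter_name("FlateDecode"))
```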
Tweak
Optimized resolving of terminal fields in AcroForm if Kids entries refer to the same object several times.
Added resolving of form fields without widget annotation.
Added several previously unimplemented functionalities to the Type3 font class.
Implemented logic to resolve font bounding box from glyph procedures in Type3 fonts.
Added support for CMap tables without a "begincmap" token.
Fixed reading of corrupted documents when SetaPDF_Core_Parser_CrossReferenceTable::$readOnAccess was set to false.
Optimized resolving of width and height of an XObject if it has a Matrix value.
Fixed behaviour of the stream reader if stream wrappers are in use whose read limit is 8096 bytes.
Allow creation of SetaPDF_Core_Type_IndirectReference instances with SetaPDF_Core_Type_IndirectObjectInterface instead of only SetaPDF_Core_Type_IndirectObject.
Improved performance of SetaPDF_Core_Image_Png::_extractAlphaChannel for toXObject() method.
Code clean up in various classes.
Fixed compatibility issues with PHP 7.2.
Moved token stack to tokenizer class.
Optimized resolving of pages in the page tree.
Added argument checks to formatFrom*() methods in SetaPDF_Core_BitConverter class.
Optimizations in tokenizer class.
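The first tweak in this list concerns field trees in which Kids entries point to the same object more than once. A common way to handle this is to remember which objects were already visited while walking the tree. The Python sketch below shows the idea on a simplified, hypothetical model in which each field is a dict with an "id" and an optional "kids" list; it is not SetaPDF's implementation, and real AcroForm fields need further checks to tell terminal fields from widget annotations.

```python
def collect_terminal_fields(roots):
    """Walk a field tree and return the id of each terminal field once.

    A field without kids counts as terminal in this simplified model.
    """
    seen = set()
    terminals = []
    stack = list(reversed(roots))
    while stack:
        field = stack.pop()
        if field["id"] in seen:
            continue  # the same object was referenced more than once
        seen.add(field["id"])
        kids = field.get("kids", [])
        if kids:
            # Push kids in reverse so they are visited in document order.
            stack.extend(reversed(kids))
        else:
            terminals.append(field["id"])
    return terminals


first = {"id": 1}
second = {"id": 2}
parent = {"id": 0, "kids": [first, second, first]}
print(collect_terminal_fields([parent]))
```

Tracking visited ids also stops the walk from looping forever if a corrupted document contains a cycle.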
Demo
Optimized demo which resolves color spaces (ignore image masks).