import upstream 4.27.0

Jephté Clain 2024-11-27 13:43:12 +04:00
parent 86f34af6f6
commit 09540b1767
155 changed files with 14600 additions and 0 deletions

upstream-4.x/LICENSE

@ -0,0 +1,21 @@
MIT License
Copyright (c) 2022 openspout
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@ -0,0 +1,166 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS

upstream-4.x/README.md

@ -0,0 +1,38 @@
# OpenSpout
[![Latest Stable Version](https://poser.pugx.org/openspout/openspout/v/stable)](https://packagist.org/packages/openspout/openspout)
[![Total Downloads](https://poser.pugx.org/openspout/openspout/downloads)](https://packagist.org/packages/openspout/openspout)
[![Build Status](https://github.com/openspout/openspout/actions/workflows/ci.yml/badge.svg)](https://github.com/openspout/openspout/actions/workflows/ci.yml)
[![Infection MSI](https://img.shields.io/endpoint?style=flat&url=https%3A%2F%2Fbadge-api.stryker-mutator.io%2Fgithub.com%2Fopenspout%2Fopenspout%2F4.x)](https://dashboard.stryker-mutator.io/reports/github.com/openspout/openspout/4.x)
OpenSpout is a community-driven fork of `box/spout`, a PHP library to read and write spreadsheet files
(CSV, XLSX and ODS) in a fast and scalable way. Unlike other file readers or writers, it can process
very large files while keeping memory usage very low (less than 3 MB).
## Documentation
Documentation can be found at [`docs/`](docs).
## Upgrade from `box/spout:v3` to `openspout/openspout:v3`
1. Replace `box/spout` with `openspout/openspout` in your `composer.json`
2. Replace `Box\Spout` with `OpenSpout` in your code
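Step 2 above amounts to a textual namespace rename. A minimal sketch, assuming a plain string replacement is good enough for your codebase; `migrateNamespace` is a hypothetical helper, not part of OpenSpout or `box/spout`:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: rewrites box/spout namespace references in a piece of
 * PHP source code. It is a plain string replacement, so review the result.
 */
function migrateNamespace(string $source): string
{
    return str_replace('Box\\Spout', 'OpenSpout', $source);
}

$before = 'use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;';
echo migrateNamespace($before), PHP_EOL;
```

Because the replacement is purely textual, it also touches comments and string literals that mention `Box\Spout`, so a review of the resulting diff is advisable.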
## Upgrade guide
Version 4 introduced new functionality but also some breaking changes. If you want to upgrade your OpenSpout codebase,
please consult the [Upgrade guide](UPGRADE.md).
## Copyright and License
This is a fork of Box's Spout library: https://github.com/box/spout
Code until and directly descending from commit [`cc42c1d`](https://github.com/openspout/openspout/commit/cc42c1d29fc5d29f07caeace99bd29dbb6d7c2f8)
is copyright of _Box, Inc._ and licensed under the Apache License, Version 2.0:
https://github.com/openspout/openspout/blob/cc42c1d29fc5d29f07caeace99bd29dbb6d7c2f8/LICENSE
Code created, edited and released after the commit mentioned above
is copyright of the _openspout_ GitHub organization and licensed under the MIT License:
https://github.com/openspout/openspout/blob/main/LICENSE

upstream-4.x/UPGRADE.md

@ -0,0 +1,162 @@
# Upgrade guide
## Upgrading from 3.x to 4.0
Beginning with v4, only actively supported [PHP versions](https://www.php.net/supported-versions.php) are supported.
Dropping support for EOL PHP versions, as well as adding support for new PHP versions, will happen in MINOR releases.
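As an illustration of this policy, the `composer.json` shipped with this import requires `php` `~8.2.0 || ~8.3.0 || ~8.4.0`. A minimal sketch of an equivalent runtime guard, assuming you want to fail early in code that is not installed through Composer; `isSupportedPhp` is a hypothetical helper, not part of OpenSpout:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical guard mirroring the constraint "~8.2.0 || ~8.3.0 || ~8.4.0":
 * any patch release of PHP 8.2, 8.3 or 8.4 is accepted.
 */
function isSupportedPhp(int $major, int $minor): bool
{
    return 8 === $major && in_array($minor, [2, 3, 4], true);
}

if (!isSupportedPhp(PHP_MAJOR_VERSION, PHP_MINOR_VERSION)) {
    echo 'Unsupported PHP version: '.PHP_VERSION, PHP_EOL;
}
```

In a Composer-managed project the constraint is already enforced at install time, so a guard like this is only a fallback.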
### Most notable changes
1. OpenSpout is now fully typed
2. Classes and interfaces not consumed by the user are now marked as `@internal`
3. Classes used by the user are all `final`
### Reader & Writer objects
Both readers and writers must now be instantiated directly with the `new` keyword, passing any needed `Options`
object as the first argument:
```php
use OpenSpout\Reader\CSV\Reader;
use OpenSpout\Reader\CSV\Options;
$options = new Options();
$options->FIELD_DELIMITER = '|';
$options->FIELD_ENCLOSURE = '@';
$reader = new Reader($options);
```
### Cell types on writes
Cell types are now handled with separate classes:
```php
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Row;
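// The cell constructors declare the style argument without a default value,
// so it is passed explicitly here (null selects the default style).
$row = new Row([
    new Cell\BooleanCell(true, null),
    new Cell\DateIntervalCell(new DateInterval('P1D'), null),
    new Cell\DateTimeCell(new DateTimeImmutable('now'), null),
    new Cell\EmptyCell(null, null),
    new Cell\FormulaCell('=SUM(A1:A2)', null),
    new Cell\NumericCell(3, null),
    new Cell\StringCell('foo', null),
]);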
```
Auto-typing is still available though:
```php
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Row;
$cell = Cell::fromValue(true); // Instance of Cell\BooleanCell
$row = Row::fromValues([
    true,
    new DateInterval('P1D'),
    new DateTimeImmutable('now'),
    null,
    '=SUM(A1:A2)',
    3,
    'foo',
]);
```
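Auto-typing checks the value in a fixed order, which decides the resulting cell class. The following standalone sketch mirrors the order used by `Cell::fromValue` in this import; `guessCellType` is a hypothetical helper, not part of OpenSpout, and returns class names as plain strings so that it runs without the library:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of OpenSpout): returns the short name of the
 * cell class that the auto-typing rules would select for a value.
 */
function guessCellType(null|bool|DateInterval|DateTimeInterface|float|int|string $value): string
{
    if (is_bool($value)) {
        return 'BooleanCell';
    }
    // null and the empty string both count as "empty".
    if (null === $value || '' === $value) {
        return 'EmptyCell';
    }
    if (is_int($value) || is_float($value)) {
        return 'NumericCell';
    }
    if ($value instanceof DateTimeInterface) {
        return 'DateTimeCell';
    }
    if ($value instanceof DateInterval) {
        return 'DateIntervalCell';
    }
    // Only non-empty strings reach this point; a leading "=" marks a formula.
    if (isset($value[0]) && '=' === $value[0]) {
        return 'FormulaCell';
    }

    return 'StringCell';
}

foreach ([true, '', 3, '3', '=SUM(A1:A2)', 'foo', new DateInterval('P1D')] as $value) {
    echo guessCellType($value), PHP_EOL;
}
```

Note that an empty string yields an `EmptyCell` and that a numeric string such as `'3'` stays a `StringCell`; only strings starting with `=` become a `FormulaCell`. When a specific type is required, instantiating the cell class directly avoids relying on this order.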
## Upgrading from 2.x to 3.0
OpenSpout 3.0 introduced several backwards-incompatible changes. The upgrade from OpenSpout 2.x to 3.0 must therefore
be done with caution.
This guide is meant to ease this process.
### Most notable changes
In 2.x, styles were applied per row; it was therefore impossible to apply different styles to cells in the same row.
With the 3.0 version, this is now possible: each cell can have its own style.
OpenSpout 3.0 tries to enforce better typing. For instance, instead of using/returning generic arrays, OpenSpout now
makes use of specific `Row` and `Cell` objects that can encapsulate more data such as type, style, value.
Finally, **_OpenSpout 3.2 only supports PHP 7.2 and above_**, as other PHP versions are no longer supported by the
community.
### Reader changes
Creating a reader should now be done through the `ReaderEntityFactory`, instead of the `ReaderFactory`.
Also, the `ReaderFactory::create($type)` method was removed and replaced by methods for each reader:
```php
use OpenSpout\Reader\Common\Creator\ReaderEntityFactory; // namespace is no longer "OpenSpout\Reader"
$reader = ReaderEntityFactory::createXLSXReader(); // replaces ReaderFactory::create(Type::XLSX)
$reader = ReaderEntityFactory::createCSVReader(); // replaces ReaderFactory::create(Type::CSV)
$reader = ReaderEntityFactory::createODSReader(); // replaces ReaderFactory::create(Type::ODS)
```
When iterating over the spreadsheet rows, OpenSpout now returns `Row` objects, instead of an array containing row
values. Accessing the row values should now be done this way:
```php
foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) { // $row is a "Row" object, not an array
        $rowAsArray = $row->toArray(); // this is the 2.x equivalent
        // OR
        $cellsArray = $row->getCells(); // this can be used to get access to cells' details
        // ...
    }
}
```
### Writer changes
Writer creation follows the same change as the reader: it should now be done through the `WriterEntityFactory`,
instead of the `WriterFactory`.
Also, the `WriterFactory::create($type)` method was removed and replaced by methods for each writer:
```php
use OpenSpout\Writer\Common\Creator\WriterEntityFactory; // namespace is no longer "OpenSpout\Writer"
$writer = WriterEntityFactory::createXLSXWriter(); // replaces WriterFactory::create(Type::XLSX)
$writer = WriterEntityFactory::createCSVWriter(); // replaces WriterFactory::create(Type::CSV)
$writer = WriterEntityFactory::createODSWriter(); // replaces WriterFactory::create(Type::ODS)
```
Adding rows is also done differently: instead of passing an array, the writer now takes in a `Row` object (or an
array of `Row` objects). Creating such objects can easily be done this way:
```php
// Adding a row from an array of values (2.x equivalent)
$cellValues = ['foo', 12345];
$row1 = WriterEntityFactory::createRowFromArray($cellValues, $rowStyle);
// Adding a row from an array of Cell
$cell1 = WriterEntityFactory::createCell('foo', $cellStyle1); // this cell has its own style
$cell2 = WriterEntityFactory::createCell(12345, $cellStyle2); // this cell has its own style
$row2 = WriterEntityFactory::createRow([$cell1, $cell2]);
$writer->addRows([$row1, $row2]);
```
### Namespace changes for styles
The namespaces for styles have changed. Styles are still created by using a `builder` class.
For the builder, please update your import statements to use the following namespaces:
OpenSpout\Writer\Common\Creator\Style\StyleBuilder
OpenSpout\Writer\Common\Creator\Style\BorderBuilder
The `Style` base class and style definitions like `Border`, `BorderPart` and `Color` also have a new namespace.
If you are using these classes directly via an import statement in your code, please use the following namespaces:
OpenSpout\Common\Entity\Style\Border
OpenSpout\Common\Entity\Style\BorderPart
OpenSpout\Common\Entity\Style\Color
OpenSpout\Common\Entity\Style\Style
### Handling of empty rows
In 2.x, empty rows were not added to the spreadsheet.
In 3.0, `addRow` now always writes a row to the spreadsheet: when the row does not contain any cells, an empty row
is created in the sheet.


@ -0,0 +1,76 @@
{
"name": "openspout/openspout",
"description": "PHP Library to read and write spreadsheet files (CSV, XLSX and ODS), in a fast and scalable way",
"license": "MIT",
"type": "library",
"keywords": [
"php",
"read",
"write",
"csv",
"xlsx",
"ods",
"odf",
"open",
"office",
"excel",
"spreadsheet",
"scale",
"memory",
"stream",
"ooxml"
],
"authors": [
{
"name": "Adrien Loison",
"email": "adrien@box.com"
}
],
"homepage": "https://github.com/openspout/openspout",
"require": {
"php": "~8.2.0 || ~8.3.0 || ~8.4.0",
"ext-dom": "*",
"ext-fileinfo": "*",
"ext-filter": "*",
"ext-libxml": "*",
"ext-xmlreader": "*",
"ext-zip": "*"
},
"require-dev": {
"ext-zlib": "*",
"friendsofphp/php-cs-fixer": "^3.65.0",
"infection/infection": "^0.29.8",
"phpbench/phpbench": "^1.3.1",
"phpstan/phpstan": "^2.0.2",
"phpstan/phpstan-phpunit": "^2.0.1",
"phpstan/phpstan-strict-rules": "^2",
"phpunit/phpunit": "^11.4.3"
},
"suggest": {
"ext-iconv": "To handle non UTF-8 CSV files (if \"php-mbstring\" is not already installed or is too limited)",
"ext-mbstring": "To handle non UTF-8 CSV files (if \"iconv\" is not already installed)"
},
"autoload": {
"psr-4": {
"OpenSpout\\": "src/"
}
},
"autoload-dev": {
"psr-4": {
"OpenSpout\\Benchmarks\\": "benchmarks/"
},
"classmap": [
"tests/"
]
},
"config": {
"allow-plugins": {
"infection/extension-installer": true
}
},
"extra": {
"branch-alias": {
"dev-master": "3.3.x-dev"
}
}
}


@ -0,0 +1,6 @@
{
"$schema": "https://docs.renovatebot.com/renovate-schema.json",
"extends": [
"local>Slamdunk/.github:renovate-config"
]
}


@ -0,0 +1,65 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity;
use DateInterval;
use DateTimeInterface;
use OpenSpout\Common\Entity\Cell\BooleanCell;
use OpenSpout\Common\Entity\Cell\DateIntervalCell;
use OpenSpout\Common\Entity\Cell\DateTimeCell;
use OpenSpout\Common\Entity\Cell\EmptyCell;
use OpenSpout\Common\Entity\Cell\FormulaCell;
use OpenSpout\Common\Entity\Cell\NumericCell;
use OpenSpout\Common\Entity\Cell\StringCell;
use OpenSpout\Common\Entity\Comment\Comment;
use OpenSpout\Common\Entity\Style\Style;
abstract class Cell
{
    public ?Comment $comment = null;

    private Style $style;

    public function __construct(?Style $style)
    {
        $this->setStyle($style);
    }

    abstract public function getValue(): null|bool|DateInterval|DateTimeInterface|float|int|string;

    final public function setStyle(?Style $style): void
    {
        $this->style = $style ?? new Style();
    }

    final public function getStyle(): Style
    {
        return $this->style;
    }

    final public static function fromValue(null|bool|DateInterval|DateTimeInterface|float|int|string $value, ?Style $style = null): self
    {
        if (\is_bool($value)) {
            return new BooleanCell($value, $style);
        }
        if (null === $value || '' === $value) {
            return new EmptyCell($value, $style);
        }
        if (\is_int($value) || \is_float($value)) {
            return new NumericCell($value, $style);
        }
        if ($value instanceof DateTimeInterface) {
            return new DateTimeCell($value, $style);
        }
        if ($value instanceof DateInterval) {
            return new DateIntervalCell($value, $style);
        }
        if (isset($value[0]) && '=' === $value[0]) {
            return new FormulaCell($value, $style, null);
        }

        return new StringCell($value, $style);
    }
}


@ -0,0 +1,24 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Style\Style;
final class BooleanCell extends Cell
{
private readonly bool $value;
public function __construct(bool $value, ?Style $style)
{
$this->value = $value;
parent::__construct($style);
}
public function getValue(): bool
{
return $this->value;
}
}


@ -0,0 +1,31 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Cell;
use DateInterval;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Style\Style;
final class DateIntervalCell extends Cell
{
private readonly DateInterval $value;
/**
* For Excel make sure to set a format onto the style (Style::setFormat()) with the left most unit enclosed with
* brackets: '[h]:mm', '[hh]:mm:ss', '[m]:ss', '[s]', etc.
* This makes sure excel knows what to do with the remaining time that exceeds this unit. Without brackets Excel
* will interpret the value as date time and not duration if it is greater or equal 1.
*/
public function __construct(DateInterval $value, ?Style $style)
{
$this->value = $value;
parent::__construct($style);
}
public function getValue(): DateInterval
{
return $this->value;
}
}


@ -0,0 +1,25 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Cell;
use DateTimeInterface;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Style\Style;
final class DateTimeCell extends Cell
{
private readonly DateTimeInterface $value;
public function __construct(DateTimeInterface $value, ?Style $style)
{
$this->value = $value;
parent::__construct($style);
}
public function getValue(): DateTimeInterface
{
return $this->value;
}
}


@ -0,0 +1,24 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Style\Style;
final class EmptyCell extends Cell
{
private readonly ?string $value;
public function __construct(?string $value, ?Style $style)
{
$this->value = $value;
parent::__construct($style);
}
public function getValue(): ?string
{
return $this->value;
}
}


@ -0,0 +1,29 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Style\Style;
final class ErrorCell extends Cell
{
private readonly string $value;
public function __construct(string $value, ?Style $style)
{
$this->value = $value;
parent::__construct($style);
}
public function getValue(): ?string
{
return null;
}
public function getRawValue(): string
{
return $this->value;
}
}


@ -0,0 +1,31 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Cell;
use DateInterval;
use DateTimeImmutable;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Style\Style;
final class FormulaCell extends Cell
{
public function __construct(
private readonly string $value,
?Style $style,
private readonly null|bool|DateInterval|DateTimeImmutable|float|int|string $computedValue = null,
) {
parent::__construct($style);
}
public function getValue(): string
{
return $this->value;
}
public function getComputedValue(): null|bool|DateInterval|DateTimeImmutable|float|int|string
{
return $this->computedValue;
}
}


@ -0,0 +1,24 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Style\Style;
final class NumericCell extends Cell
{
private readonly float|int $value;
public function __construct(float|int $value, ?Style $style)
{
$this->value = $value;
parent::__construct($style);
}
public function getValue(): float|int
{
return $this->value;
}
}


@ -0,0 +1,24 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Style\Style;
final class StringCell extends Cell
{
private readonly string $value;
public function __construct(string $value, ?Style $style)
{
$this->value = $value;
parent::__construct($style);
}
public function getValue(): string
{
return $this->value;
}
}


@ -0,0 +1,47 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Comment;
/**
* This class defines a comment that can be added to a cell.
*/
final class Comment
{
/** Comment height (CSS style, i.e. XXpx or YYpt). */
public string $height = '55.5pt';
/** Comment width (CSS style, i.e. XXpx or YYpt). */
public string $width = '96pt';
/** Left margin (CSS style, i.e. XXpx or YYpt). */
public string $marginLeft = '59.25pt';
/** Top margin (CSS style, i.e. XXpx or YYpt). */
public string $marginTop = '1.5pt';
/** Visible. */
public bool $visible = false;
/** Comment fill color. */
public string $fillColor = '#FFFFE1';
/** @var TextRun[] */
private array $textRuns = [];
public function addTextRun(?TextRun $textRun): void
{
$this->textRuns[] = $textRun;
}
/**
* The TextRuns for this comment.
*
* @return TextRun[]
*/
public function getTextRuns(): array
{
return $this->textRuns;
}
}


@ -0,0 +1,23 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Comment;
/**
* This class defines rich text in a fluent interface that can be added to a comment.
*/
final class TextRun
{
public string $text;
public int $fontSize = 10;
public string $fontColor = '000000';
public string $fontName = 'Tahoma';
public bool $bold = false;
public bool $italic = false;
public function __construct(string $text)
{
$this->text = $text;
}
}


@ -0,0 +1,169 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity;
use DateInterval;
use DateTimeInterface;
use OpenSpout\Common\Entity\Style\Style;
final class Row
{
/**
* The cells in this row.
*
* @var Cell[]
*/
private array $cells = [];
/** The row style. */
private Style $style;
/** Row height. */
private float $height = 0;
/**
* Row constructor.
*
* @param Cell[] $cells
*/
public function __construct(array $cells, ?Style $style = null)
{
$this
->setCells($cells)
->setStyle($style)
;
}
/**
* @param list<null|bool|DateInterval|DateTimeInterface|float|int|string> $cellValues
*/
public static function fromValues(array $cellValues = [], ?Style $rowStyle = null): self
{
$cells = array_map(static function (null|bool|DateInterval|DateTimeInterface|float|int|string $cellValue): Cell {
return Cell::fromValue($cellValue);
}, $cellValues);
return new self($cells, $rowStyle);
}
/**
* @param array<array-key, null|bool|DateInterval|DateTimeInterface|float|int|string> $cellValues
* @param array<array-key, Style> $columnStyles
*/
public static function fromValuesWithStyles(array $cellValues = [], ?Style $rowStyle = null, array $columnStyles = []): self
{
$cells = array_map(static function (null|bool|DateInterval|DateTimeInterface|float|int|string $cellValue, int|string $key) use ($columnStyles): Cell {
return Cell::fromValue($cellValue, $columnStyles[$key] ?? null);
}, $cellValues, array_keys($cellValues));
return new self($cells, $rowStyle);
}
/**
* @return Cell[] $cells
*/
public function getCells(): array
{
return $this->cells;
}
/**
* @param Cell[] $cells
*/
public function setCells(array $cells): self
{
$this->cells = [];
foreach ($cells as $cell) {
$this->addCell($cell);
}
return $this;
}
public function setCellAtIndex(Cell $cell, int $cellIndex): self
{
$this->cells[$cellIndex] = $cell;
return $this;
}
public function getCellAtIndex(int $cellIndex): ?Cell
{
return $this->cells[$cellIndex] ?? null;
}
public function addCell(Cell $cell): self
{
$this->cells[] = $cell;
return $this;
}
public function getNumCells(): int
{
// When using "setCellAtIndex", it's possible to
// have "$this->cells" contain holes.
if ([] === $this->cells) {
return 0;
}
return max(array_keys($this->cells)) + 1;
}
public function getStyle(): Style
{
return $this->style;
}
public function setStyle(?Style $style): self
{
$this->style = $style ?? new Style();
return $this;
}
/**
* Set row height.
*/
public function setHeight(float $height): self
{
$this->height = $height;
return $this;
}
/**
* Returns row height.
*/
public function getHeight(): float
{
return $this->height;
}
/**
* @return list<null|bool|DateInterval|DateTimeInterface|float|int|string> The row values, as array
*/
public function toArray(): array
{
return array_map(static function (Cell $cell): null|bool|DateInterval|DateTimeInterface|float|int|string {
return $cell->getValue();
}, $this->cells);
}
/**
* Detect whether a row is considered empty.
* An empty row has all of its cells empty.
*/
public function isEmpty(): bool
{
foreach ($this->cells as $cell) {
if (!$cell instanceof Cell\EmptyCell) {
return false;
}
}
return true;
}
}


@ -0,0 +1,46 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Style;
final class Border
{
public const LEFT = 'left';
public const RIGHT = 'right';
public const TOP = 'top';
public const BOTTOM = 'bottom';
public const STYLE_NONE = 'none';
public const STYLE_SOLID = 'solid';
public const STYLE_DASHED = 'dashed';
public const STYLE_DOTTED = 'dotted';
public const STYLE_DOUBLE = 'double';
public const WIDTH_THIN = 'thin';
public const WIDTH_MEDIUM = 'medium';
public const WIDTH_THICK = 'thick';
/** @var array<string, BorderPart> */
private array $parts = []; // initialized so getParts() is safe when no BorderPart is passed
public function __construct(BorderPart ...$borderParts)
{
foreach ($borderParts as $borderPart) {
$this->parts[$borderPart->getName()] = $borderPart;
}
}
public function getPart(string $name): ?BorderPart
{
return $this->parts[$name] ?? null;
}
/**
* @return array<string, BorderPart>
*/
public function getParts(): array
{
return $this->parts;
}
}


@ -0,0 +1,90 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Style;
use OpenSpout\Writer\Exception\Border\InvalidNameException;
use OpenSpout\Writer\Exception\Border\InvalidStyleException;
use OpenSpout\Writer\Exception\Border\InvalidWidthException;
final readonly class BorderPart
{
public const allowedStyles = [
Border::STYLE_NONE,
Border::STYLE_SOLID,
Border::STYLE_DASHED,
Border::STYLE_DOTTED,
Border::STYLE_DOUBLE,
];
public const allowedNames = [
Border::LEFT,
Border::RIGHT,
Border::TOP,
Border::BOTTOM,
];
public const allowedWidths = [
Border::WIDTH_THIN,
Border::WIDTH_MEDIUM,
Border::WIDTH_THICK,
];
private string $style;
private string $name;
private string $color;
private string $width;
/**
* @param string $name @see BorderPart::allowedNames
* @param string $color A RGB color code
* @param string $width @see BorderPart::allowedWidths
* @param string $style @see BorderPart::allowedStyles
*
* @throws InvalidNameException
* @throws InvalidStyleException
* @throws InvalidWidthException
*/
public function __construct(
string $name,
string $color = Color::BLACK,
string $width = Border::WIDTH_MEDIUM,
string $style = Border::STYLE_SOLID
) {
if (!\in_array($name, self::allowedNames, true)) {
throw new InvalidNameException($name);
}
if (!\in_array($style, self::allowedStyles, true)) {
throw new InvalidStyleException($style);
}
if (!\in_array($width, self::allowedWidths, true)) {
throw new InvalidWidthException($width);
}
$this->name = $name;
$this->color = $color;
$this->width = $width;
$this->style = $style;
}
public function getName(): string
{
return $this->name;
}
public function getStyle(): string
{
return $this->style;
}
public function getColor(): string
{
return $this->color;
}
public function getWidth(): string
{
return $this->width;
}
}

View File

@ -0,0 +1,31 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Style;
/**
* This class provides constants to work with text alignment.
*/
final class CellAlignment
{
public const LEFT = 'left';
public const RIGHT = 'right';
public const CENTER = 'center';
public const JUSTIFY = 'justify';
private const VALID_ALIGNMENTS = [
self::LEFT => 1,
self::RIGHT => 1,
self::CENTER => 1,
self::JUSTIFY => 1,
];
/**
* @return bool Whether the given cell alignment is valid
*/
public static function isValid(string $cellAlignment): bool
{
return isset(self::VALID_ALIGNMENTS[$cellAlignment]);
}
}

View File

@ -0,0 +1,37 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Style;
/**
* This class provides constants to work with text vertical alignment.
*/
final class CellVerticalAlignment
{
public const AUTO = 'auto';
public const BASELINE = 'baseline';
public const BOTTOM = 'bottom';
public const CENTER = 'center';
public const DISTRIBUTED = 'distributed';
public const JUSTIFY = 'justify';
public const TOP = 'top';
private const VALID_ALIGNMENTS = [
self::AUTO => 1,
self::BASELINE => 1,
self::BOTTOM => 1,
self::CENTER => 1,
self::DISTRIBUTED => 1,
self::JUSTIFY => 1,
self::TOP => 1,
];
/**
* @return bool Whether the given cell vertical alignment is valid
*/
public static function isValid(string $cellVerticalAlignment): bool
{
return isset(self::VALID_ALIGNMENTS[$cellVerticalAlignment]);
}
}

View File

@ -0,0 +1,88 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Style;
use OpenSpout\Common\Exception\InvalidColorException;
/**
* This class provides constants and functions to work with colors.
*/
final class Color
{
/**
* Standard colors - based on Office Online.
*/
public const BLACK = '000000';
public const WHITE = 'FFFFFF';
public const RED = 'FF0000';
public const DARK_RED = 'C00000';
public const ORANGE = 'FFC000';
public const YELLOW = 'FFFF00';
public const LIGHT_GREEN = '92D040';
public const GREEN = '00B050';
public const LIGHT_BLUE = '00B0E0';
public const BLUE = '0070C0';
public const DARK_BLUE = '002060';
public const PURPLE = '7030A0';
/**
* Returns an RGB color from R, G and B values.
*
* @param int $red Red component, 0 - 255
* @param int $green Green component, 0 - 255
* @param int $blue Blue component, 0 - 255
*
* @return string RGB color
*
* @throws InvalidColorException If a component is outside the 0 - 255 range
*/
public static function rgb(int $red, int $green, int $blue): string
{
self::throwIfInvalidColorComponentValue($red);
self::throwIfInvalidColorComponentValue($green);
self::throwIfInvalidColorComponentValue($blue);
return strtoupper(
self::convertColorComponentToHex($red).
self::convertColorComponentToHex($green).
self::convertColorComponentToHex($blue)
);
}
/**
* Returns the ARGB color of the given RGB color,
* assuming that alpha value is always 1.
*
* @param string $rgbColor RGB color like "FF08B2"
*
* @return string ARGB color
*/
public static function toARGB(string $rgbColor): string
{
return 'FF'.$rgbColor;
}
/**
* Throws an exception if the color component value is outside of bounds (0 - 255).
*
* @throws InvalidColorException
*/
private static function throwIfInvalidColorComponentValue(int $colorComponent): void
{
if ($colorComponent < 0 || $colorComponent > 255) {
throw new InvalidColorException("The RGB components must be between 0 and 255. Received: {$colorComponent}");
}
}
/**
* Converts the color component to its corresponding hexadecimal value.
*
* @param int $colorComponent Color component, 0 - 255
*
* @return string Corresponding hexadecimal value, with a leading 0 if needed, e.g. "0f", "2d"
*/
private static function convertColorComponentToHex(int $colorComponent): string
{
return str_pad(dechex($colorComponent), 2, '0', STR_PAD_LEFT);
}
}

View File

@ -0,0 +1,495 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Entity\Style;
use OpenSpout\Common\Exception\InvalidArgumentException;
/**
* Represents a style to be applied to a cell.
*/
final class Style
{
/**
* Default values.
*/
public const DEFAULT_FONT_SIZE = 11;
public const DEFAULT_FONT_COLOR = Color::BLACK;
public const DEFAULT_FONT_NAME = 'Arial';
/** @var int Style ID */
private int $id = -1;
/** @var bool Whether the font should be bold */
private bool $fontBold = false;
/** @var bool Whether the bold property was set */
private bool $hasSetFontBold = false;
/** @var bool Whether the font should be italic */
private bool $fontItalic = false;
/** @var bool Whether the italic property was set */
private bool $hasSetFontItalic = false;
/** @var bool Whether the font should be underlined */
private bool $fontUnderline = false;
/** @var bool Whether the underline property was set */
private bool $hasSetFontUnderline = false;
/** @var bool Whether the font should be struck through */
private bool $fontStrikethrough = false;
/** @var bool Whether the strikethrough property was set */
private bool $hasSetFontStrikethrough = false;
/** @var int Font size */
private int $fontSize = self::DEFAULT_FONT_SIZE;
/** @var bool Whether the font size property was set */
private bool $hasSetFontSize = false;
/** @var string Font color */
private string $fontColor = self::DEFAULT_FONT_COLOR;
/** @var bool Whether the font color property was set */
private bool $hasSetFontColor = false;
/** @var string Font name */
private string $fontName = self::DEFAULT_FONT_NAME;
/** @var bool Whether the font name property was set */
private bool $hasSetFontName = false;
/** @var bool Whether specific font properties should be applied */
private bool $shouldApplyFont = false;
/** @var bool Whether specific cell alignment should be applied */
private bool $shouldApplyCellAlignment = false;
/** @var string Cell alignment */
private string $cellAlignment;
/** @var bool Whether the cell alignment property was set */
private bool $hasSetCellAlignment = false;
/** @var bool Whether specific cell vertical alignment should be applied */
private bool $shouldApplyCellVerticalAlignment = false;
/** @var string Cell vertical alignment */
private string $cellVerticalAlignment;
/** @var bool Whether the cell vertical alignment property was set */
private bool $hasSetCellVerticalAlignment = false;
/** @var bool Whether the text should wrap in the cell (useful for long or multi-line text) */
private bool $shouldWrapText = false;
/** @var bool Whether the wrap text property was set */
private bool $hasSetWrapText = false;
/** @var int Text rotation */
private int $textRotation = 0;
/** @var bool Whether the text rotation property was set */
private bool $hasSetTextRotation = false;
/** @var bool Whether the cell content should shrink to fit the cell */
private bool $shouldShrinkToFit = false;
/** @var bool Whether the shrink-to-fit property was set */
private bool $hasSetShrinkToFit = false;
private ?Border $border = null;
/** @var null|string Background color */
private ?string $backgroundColor = null;
/** @var null|string Format */
private ?string $format = null;
private bool $isRegistered = false;
private bool $isEmpty = true;
public function __sleep(): array
{
$vars = get_object_vars($this);
unset($vars['id'], $vars['isRegistered']);
return array_keys($vars);
}
public function getId(): int
{
\assert(0 <= $this->id);
return $this->id;
}
public function setId(int $id): self
{
$this->id = $id;
return $this;
}
public function getBorder(): ?Border
{
return $this->border;
}
public function setBorder(Border $border): self
{
$this->border = $border;
$this->isEmpty = false;
return $this;
}
public function isFontBold(): bool
{
return $this->fontBold;
}
public function setFontBold(): self
{
$this->fontBold = true;
$this->hasSetFontBold = true;
$this->shouldApplyFont = true;
$this->isEmpty = false;
return $this;
}
public function hasSetFontBold(): bool
{
return $this->hasSetFontBold;
}
public function isFontItalic(): bool
{
return $this->fontItalic;
}
public function setFontItalic(): self
{
$this->fontItalic = true;
$this->hasSetFontItalic = true;
$this->shouldApplyFont = true;
$this->isEmpty = false;
return $this;
}
public function hasSetFontItalic(): bool
{
return $this->hasSetFontItalic;
}
public function isFontUnderline(): bool
{
return $this->fontUnderline;
}
public function setFontUnderline(): self
{
$this->fontUnderline = true;
$this->hasSetFontUnderline = true;
$this->shouldApplyFont = true;
$this->isEmpty = false;
return $this;
}
public function hasSetFontUnderline(): bool
{
return $this->hasSetFontUnderline;
}
public function isFontStrikethrough(): bool
{
return $this->fontStrikethrough;
}
public function setFontStrikethrough(): self
{
$this->fontStrikethrough = true;
$this->hasSetFontStrikethrough = true;
$this->shouldApplyFont = true;
$this->isEmpty = false;
return $this;
}
public function hasSetFontStrikethrough(): bool
{
return $this->hasSetFontStrikethrough;
}
public function getFontSize(): int
{
return $this->fontSize;
}
/**
* @param int $fontSize Font size, in points
*/
public function setFontSize(int $fontSize): self
{
$this->fontSize = $fontSize;
$this->hasSetFontSize = true;
$this->shouldApplyFont = true;
$this->isEmpty = false;
return $this;
}
public function hasSetFontSize(): bool
{
return $this->hasSetFontSize;
}
public function getFontColor(): string
{
return $this->fontColor;
}
/**
* Sets the font color.
*
* @param string $fontColor RGB color (@see Color)
*/
public function setFontColor(string $fontColor): self
{
$this->fontColor = $fontColor;
$this->hasSetFontColor = true;
$this->shouldApplyFont = true;
$this->isEmpty = false;
return $this;
}
public function hasSetFontColor(): bool
{
return $this->hasSetFontColor;
}
public function getFontName(): string
{
return $this->fontName;
}
/**
* @param string $fontName Name of the font to use
*/
public function setFontName(string $fontName): self
{
$this->fontName = $fontName;
$this->hasSetFontName = true;
$this->shouldApplyFont = true;
$this->isEmpty = false;
return $this;
}
public function hasSetFontName(): bool
{
return $this->hasSetFontName;
}
public function getCellAlignment(): string
{
return $this->cellAlignment;
}
public function getCellVerticalAlignment(): string
{
return $this->cellVerticalAlignment;
}
/**
* @param string $cellAlignment The cell alignment
*/
public function setCellAlignment(string $cellAlignment): self
{
if (!CellAlignment::isValid($cellAlignment)) {
throw new InvalidArgumentException('Invalid cell alignment value');
}
$this->cellAlignment = $cellAlignment;
$this->hasSetCellAlignment = true;
$this->shouldApplyCellAlignment = true;
$this->isEmpty = false;
return $this;
}
/**
* @param string $cellVerticalAlignment The cell vertical alignment
*/
public function setCellVerticalAlignment(string $cellVerticalAlignment): self
{
if (!CellVerticalAlignment::isValid($cellVerticalAlignment)) {
throw new InvalidArgumentException('Invalid cell vertical alignment value');
}
$this->cellVerticalAlignment = $cellVerticalAlignment;
$this->hasSetCellVerticalAlignment = true;
$this->shouldApplyCellVerticalAlignment = true;
$this->isEmpty = false;
return $this;
}
public function hasSetCellAlignment(): bool
{
return $this->hasSetCellAlignment;
}
public function hasSetCellVerticalAlignment(): bool
{
return $this->hasSetCellVerticalAlignment;
}
/**
* @return bool Whether specific cell alignment should be applied
*/
public function shouldApplyCellAlignment(): bool
{
return $this->shouldApplyCellAlignment;
}
public function shouldApplyCellVerticalAlignment(): bool
{
return $this->shouldApplyCellVerticalAlignment;
}
public function shouldWrapText(): bool
{
return $this->shouldWrapText;
}
/**
* @param bool $shouldWrap Should the text be wrapped
*/
public function setShouldWrapText(bool $shouldWrap = true): self
{
$this->shouldWrapText = $shouldWrap;
$this->hasSetWrapText = true;
$this->isEmpty = false;
return $this;
}
public function hasSetWrapText(): bool
{
return $this->hasSetWrapText;
}
public function textRotation(): int
{
return $this->textRotation;
}
/**
* @param int $rotation Text rotation angle
*/
public function setTextRotation(int $rotation): self
{
$this->textRotation = $rotation;
$this->hasSetTextRotation = true;
$this->isEmpty = false;
return $this;
}
public function hasSetTextRotation(): bool
{
return $this->hasSetTextRotation;
}
/**
* @return bool Whether specific font properties should be applied
*/
public function shouldApplyFont(): bool
{
return $this->shouldApplyFont;
}
/**
* Sets the background color.
*
* @param string $color RGB color (@see Color)
*/
public function setBackgroundColor(string $color): self
{
$this->backgroundColor = $color;
$this->isEmpty = false;
return $this;
}
public function getBackgroundColor(): ?string
{
return $this->backgroundColor;
}
/**
* Sets format.
*/
public function setFormat(string $format): self
{
$this->format = $format;
$this->isEmpty = false;
return $this;
}
public function getFormat(): ?string
{
return $this->format;
}
public function isRegistered(): bool
{
return $this->isRegistered;
}
public function markAsRegistered(?int $id): void
{
$this->setId($id);
$this->isRegistered = true;
}
public function isEmpty(): bool
{
return $this->isEmpty;
}
/**
* Sets should shrink to fit.
*/
public function setShouldShrinkToFit(bool $shrinkToFit = true): self
{
$this->hasSetShrinkToFit = true;
$this->shouldShrinkToFit = $shrinkToFit;
return $this;
}
/**
* @return bool Whether the cell content should shrink to fit the cell
*/
public function shouldShrinkToFit(): bool
{
return $this->shouldShrinkToFit;
}
public function hasSetShrinkToFit(): bool
{
return $this->hasSetShrinkToFit;
}
}

View File

@ -0,0 +1,7 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Exception;
final class EncodingConversionException extends OpenSpoutException {}

View File

@ -0,0 +1,7 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Exception;
final class IOException extends OpenSpoutException {}

View File

@ -0,0 +1,7 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Exception;
final class InvalidArgumentException extends OpenSpoutException {}

View File

@ -0,0 +1,7 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Exception;
final class InvalidColorException extends OpenSpoutException {}

View File

@ -0,0 +1,9 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Exception;
use Exception;
abstract class OpenSpoutException extends Exception {}

View File

@ -0,0 +1,7 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Exception;
final class UnsupportedTypeException extends OpenSpoutException {}

View File

@ -0,0 +1,195 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Helper;
use Error;
use OpenSpout\Common\Exception\EncodingConversionException;
/**
* @internal
*/
final readonly class EncodingHelper
{
/**
* Definition of the encodings that can have a BOM.
*/
public const ENCODING_UTF8 = 'UTF-8';
public const ENCODING_UTF16_LE = 'UTF-16LE';
public const ENCODING_UTF16_BE = 'UTF-16BE';
public const ENCODING_UTF32_LE = 'UTF-32LE';
public const ENCODING_UTF32_BE = 'UTF-32BE';
/**
* Definition of the BOMs for the different encodings.
*/
public const BOM_UTF8 = "\xEF\xBB\xBF";
public const BOM_UTF16_LE = "\xFF\xFE";
public const BOM_UTF16_BE = "\xFE\xFF";
public const BOM_UTF32_LE = "\xFF\xFE\x00\x00";
public const BOM_UTF32_BE = "\x00\x00\xFE\xFF";
/** @var array<string, string> Map representing the encodings supporting BOMs (key) and their associated BOM (value) */
private array $supportedEncodingsWithBom;
private bool $canUseIconv;
private bool $canUseMbString;
public function __construct(bool $canUseIconv, bool $canUseMbString)
{
$this->canUseIconv = $canUseIconv;
$this->canUseMbString = $canUseMbString;
$this->supportedEncodingsWithBom = [
self::ENCODING_UTF8 => self::BOM_UTF8,
self::ENCODING_UTF16_LE => self::BOM_UTF16_LE,
self::ENCODING_UTF16_BE => self::BOM_UTF16_BE,
self::ENCODING_UTF32_LE => self::BOM_UTF32_LE,
self::ENCODING_UTF32_BE => self::BOM_UTF32_BE,
];
}
public static function factory(): self
{
return new self(
\function_exists('iconv'),
\function_exists('mb_convert_encoding'),
);
}
/**
* Returns the number of bytes to use as offset in order to skip the BOM.
*
* @param resource $filePointer Pointer to the file to check
* @param string $encoding Encoding of the file to check
*
* @return int Bytes offset to apply to skip the BOM (0 means no BOM)
*/
public function getBytesOffsetToSkipBOM($filePointer, string $encoding): int
{
$byteOffsetToSkipBom = 0;
if ($this->hasBOM($filePointer, $encoding)) {
$bomUsed = $this->supportedEncodingsWithBom[$encoding];
// skip the first N bytes, where N is the length of the BOM
$byteOffsetToSkipBom = \strlen($bomUsed);
}
return $byteOffsetToSkipBom;
}
/**
* Attempts to convert a non-UTF-8 string into UTF-8.
*
* @param null|string $string Non-UTF-8 string to be converted
* @param string $sourceEncoding The encoding used to encode the source string
*
* @return null|string The converted, UTF-8 string (null if the given string was null)
*
* @throws EncodingConversionException If conversion is not supported or if the conversion failed
*/
public function attemptConversionToUTF8(?string $string, string $sourceEncoding): ?string
{
return $this->attemptConversion($string, $sourceEncoding, self::ENCODING_UTF8);
}
/**
* Attempts to convert a UTF-8 string into the given encoding.
*
* @param null|string $string UTF-8 string to be converted
* @param string $targetEncoding The encoding the string should be re-encoded into
*
* @return null|string The converted string, encoded with the given encoding (null if the given string was null)
*
* @throws EncodingConversionException If conversion is not supported or if the conversion failed
*/
public function attemptConversionFromUTF8(?string $string, string $targetEncoding): ?string
{
return $this->attemptConversion($string, self::ENCODING_UTF8, $targetEncoding);
}
/**
* Returns whether the file identified by the given pointer has a BOM.
*
* @param resource $filePointer Pointer to the file to check
* @param string $encoding Encoding of the file to check
*
* @return bool TRUE if the file has a BOM, FALSE otherwise
*/
private function hasBOM($filePointer, string $encoding): bool
{
$hasBOM = false;
rewind($filePointer);
if (\array_key_exists($encoding, $this->supportedEncodingsWithBom)) {
$potentialBom = $this->supportedEncodingsWithBom[$encoding];
$numBytesInBom = \strlen($potentialBom);
$hasBOM = (fgets($filePointer, $numBytesInBom + 1) === $potentialBom);
}
return $hasBOM;
}
/**
* Attempts to convert the given string to the given encoding.
* Depending on what is installed on the server, we will try iconv or mbstring.
*
* @param null|string $string String to be converted
* @param string $sourceEncoding The encoding used to encode the source string
* @param string $targetEncoding The encoding the string should be re-encoded into
*
* @return null|string The converted string, encoded with the given encoding (null if the given string was null)
*
* @throws EncodingConversionException If conversion is not supported or if the conversion failed
*/
private function attemptConversion(?string $string, string $sourceEncoding, string $targetEncoding): ?string
{
// if the string is null or the source and target encodings are the same, it's a no-op
if (null === $string || $sourceEncoding === $targetEncoding) {
return $string;
}
$convertedString = null;
if ($this->canUseIconv) {
set_error_handler(static function (): bool {
return true;
});
$convertedString = iconv($sourceEncoding, $targetEncoding, $string);
restore_error_handler();
} elseif ($this->canUseMbString) {
$errorMessage = null;
set_error_handler(static function ($nr, $message) use (&$errorMessage): bool {
$errorMessage = $message; // @codeCoverageIgnore
return true; // @codeCoverageIgnore
});
try {
$convertedString = mb_convert_encoding($string, $targetEncoding, $sourceEncoding);
} catch (Error $error) {
$errorMessage = $error->getMessage();
}
restore_error_handler();
if (null !== $errorMessage) {
$convertedString = false;
}
} else {
throw new EncodingConversionException("The conversion from {$sourceEncoding} to {$targetEncoding} is not supported. Please install \"iconv\" or \"mbstring\".");
}
if (false === $convertedString) {
throw new EncodingConversionException("The conversion from {$sourceEncoding} to {$targetEncoding} failed.");
}
return $convertedString;
}
}

View File

@ -0,0 +1,29 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Helper\Escaper;
/**
* @internal
*/
interface EscaperInterface
{
/**
* Escapes the given string to make it compatible with the target file format.
*
* @param string $string The string to escape
*
* @return string The escaped string
*/
public function escape(string $string): string;
/**
* Unescapes the given string to make it compatible with PHP.
*
* @param string $string The string to unescape
*
* @return string The unescaped string
*/
public function unescape(string $string): string;
}

View File

@ -0,0 +1,47 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Helper\Escaper;
/**
* @internal
*/
final class ODS implements EscaperInterface
{
/**
* Escapes the given string to make it compatible with ODS.
*
* @param string $string The string to escape
*
* @return string The escaped string
*/
public function escape(string $string): string
{
/*
* 'ENT_DISALLOWED' ensures that invalid characters in the given document type are replaced.
* Otherwise control characters like a vertical tab "\v" will make the XML document unreadable by the XML processor.
*
* @see https://github.com/box/spout/issues/329
*/
return htmlspecialchars($string, ENT_QUOTES | ENT_DISALLOWED, 'UTF-8');
}
/**
* Unescapes the given ODS string to make it compatible with PHP.
*
* @param string $string The string to unescape
*
* @return string The unescaped string
*/
public function unescape(string $string): string
{
// ==============
// = WARNING =
// ==============
// It is assumed that the given string has already had its XML entities decoded.
// This is true if the string is coming from a DOMNode (as DOMNode already decodes XML entities on creation).
// Therefore there is no need to call "htmlspecialchars_decode()".
return $string;
}
}

View File

@ -0,0 +1,193 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Helper\Escaper;
/**
* @internal
*/
final class XLSX implements EscaperInterface
{
/** @var bool Whether the escaper has already been initialized */
private bool $isAlreadyInitialized = false;
/** @var string Regex pattern to detect control characters that need to be escaped */
private string $escapableControlCharactersPattern;
/** @var string[] Map containing escaped values (key) and the control characters they represent (value) */
private array $controlCharactersEscapingMap;
/** @var string[] Map containing control characters to be escaped (key) and their escaped value (value) */
private array $controlCharactersEscapingReverseMap;
/**
* Escapes the given string to make it compatible with XLSX.
*
* @param string $string The string to escape
*
* @return string The escaped string
*/
public function escape(string $string): string
{
$this->initIfNeeded();
$escapedString = $this->escapeControlCharacters($string);
// @NOTE: Using ENT_QUOTES as XML entities ('<', '>', '&') as well as
// single/double quotes (for XML attributes) need to be encoded.
return htmlspecialchars($escapedString, ENT_QUOTES, 'UTF-8');
}
/**
* Unescapes the given XLSX string to make it compatible with PHP.
*
* @param string $string The string to unescape
*
* @return string The unescaped string
*/
public function unescape(string $string): string
{
$this->initIfNeeded();
// ==============
// = WARNING =
// ==============
// It is assumed that the given string has already had its XML entities decoded.
// This is true if the string is coming from a DOMNode (as DOMNode already decodes XML entities on creation).
// Therefore there is no need to call "htmlspecialchars_decode()".
return $this->unescapeControlCharacters($string);
}
/**
* Initializes the control characters if not already done.
*/
private function initIfNeeded(): void
{
if (!$this->isAlreadyInitialized) {
$this->escapableControlCharactersPattern = $this->getEscapableControlCharactersPattern();
$this->controlCharactersEscapingMap = $this->getControlCharactersEscapingMap();
$this->controlCharactersEscapingReverseMap = array_flip($this->controlCharactersEscapingMap);
$this->isAlreadyInitialized = true;
}
}
/**
* @return string Regex pattern containing all escapable control characters
*/
private function getEscapableControlCharactersPattern(): string
{
// control characters values are from 0 to 1F (hex values) in the ASCII table
// some characters should not be escaped though: "\t", "\r" and "\n".
return '[\x00-\x08'.
// skipping "\t" (0x9) and "\n" (0xA)
'\x0B-\x0C'.
// skipping "\r" (0xD)
'\x0E-\x1F]';
}
/**
* Builds the map containing control characters to be escaped
* mapped to their escaped values.
* "\t", "\r" and "\n" don't need to be escaped.
*
* NOTE: the logic has been adapted from the XlsxWriter library (BSD License)
*
* @see https://github.com/jmcnamara/XlsxWriter/blob/f1e610f29/xlsxwriter/sharedstrings.py#L89
*
* @return string[]
*/
private function getControlCharactersEscapingMap(): array
{
$controlCharactersEscapingMap = [];
// control characters values are from 0 to 1F (hex values) in the ASCII table
for ($charValue = 0x00; $charValue <= 0x1F; ++$charValue) {
$character = \chr($charValue);
if (1 === preg_match("/{$this->escapableControlCharactersPattern}/", $character)) {
$charHexValue = dechex($charValue);
$escapedChar = '_x'.\sprintf('%04s', strtoupper($charHexValue)).'_';
$controlCharactersEscapingMap[$escapedChar] = $character;
}
}
return $controlCharactersEscapingMap;
}
/**
* Converts PHP control characters from the given string to OpenXML escaped control characters.
*
* Excel escapes control characters with _xHHHH_ and also escapes any
* literal strings of that type by encoding the leading underscore.
* So "\0" -> _x0000_ and "_x0000_" -> _x005F_x0000_.
*
* NOTE: the logic has been adapted from the XlsxWriter library (BSD License)
*
* @see https://github.com/jmcnamara/XlsxWriter/blob/f1e610f29/xlsxwriter/sharedstrings.py#L89
*
* @param string $string String to escape
*/
private function escapeControlCharacters(string $string): string
{
$escapedString = $this->escapeEscapeCharacter($string);
// if no control characters
if (1 !== preg_match("/{$this->escapableControlCharactersPattern}/", $escapedString)) {
return $escapedString;
}
return preg_replace_callback("/({$this->escapableControlCharactersPattern})/", function ($matches) {
return $this->controlCharactersEscapingReverseMap[$matches[0]];
}, $escapedString);
}
/**
* Escapes the escape character: "_x0000_" -> "_x005F_x0000_".
*
* @param string $string String to escape
*
* @return string The escaped string
*/
private function escapeEscapeCharacter(string $string): string
{
return preg_replace('/_(x[\dA-F]{4})_/', '_x005F_$1_', $string);
}
/**
* Converts OpenXML escaped control characters from the given string to PHP control characters.
*
* Excel escapes control characters with _xHHHH_ and also escapes any
* literal strings of that type by encoding the leading underscore.
* So "_x0000_" -> "\0" and "_x005F_x0000_" -> "_x0000_"
*
* NOTE: the logic has been adapted from the XlsxWriter library (BSD License)
*
* @see https://github.com/jmcnamara/XlsxWriter/blob/f1e610f29/xlsxwriter/sharedstrings.py#L89
*
* @param string $string String to unescape
*/
private function unescapeControlCharacters(string $string): string
{
$unescapedString = $string;
foreach ($this->controlCharactersEscapingMap as $escapedCharValue => $charValue) {
// only unescape characters that don't contain the escaped escape character for now
$unescapedString = preg_replace("/(?<!_x005F)({$escapedCharValue})/", $charValue, $unescapedString);
}
return $this->unescapeEscapeCharacter($unescapedString);
}
/**
* Unescapes the escape character: "_x005F_x0000_" => "_x0000_".
*
* @param string $string String to unescape
*
* @return string The unescaped string
*/
private function unescapeEscapeCharacter(string $string): string
{
return preg_replace('/_x005F(_x[\dA-F]{4}_)/', '$1', $string);
}
}

View File

@ -0,0 +1,164 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Helper;
use OpenSpout\Common\Exception\IOException;
use RecursiveDirectoryIterator;
use RecursiveIteratorIterator;
/**
* @internal
*/
final readonly class FileSystemHelper implements FileSystemHelperInterface
{
/** @var string Real path of the base folder where all the I/O can occur */
private string $baseFolderRealPath;
/**
* @param string $baseFolderPath The path of the base folder where all the I/O can occur
*/
public function __construct(string $baseFolderPath)
{
$realpath = realpath($baseFolderPath);
\assert(false !== $realpath);
$this->baseFolderRealPath = $realpath;
}
public function getBaseFolderRealPath(): string
{
return $this->baseFolderRealPath;
}
/**
* Creates an empty folder with the given name under the given parent folder.
*
* @param string $parentFolderPath The parent folder path under which the folder is going to be created
* @param string $folderName The name of the folder to create
*
* @return string Path of the created folder
*
* @throws IOException If unable to create the folder or if the folder path is not inside of the base folder
*/
public function createFolder(string $parentFolderPath, string $folderName): string
{
$this->throwIfOperationNotInBaseFolder($parentFolderPath);
$folderPath = $parentFolderPath.\DIRECTORY_SEPARATOR.$folderName;
$errorMessage = '';
set_error_handler(static function ($nr, $message) use (&$errorMessage): bool {
$errorMessage = $message;
return true;
});
$wasCreationSuccessful = mkdir($folderPath, 0777, true);
restore_error_handler();
if (!$wasCreationSuccessful) {
throw new IOException("Unable to create folder: {$folderPath} - {$errorMessage}");
}
return $folderPath;
}
/**
* Creates a file with the given name and content in the given folder.
* The parent folder must exist.
*
* @param string $parentFolderPath The parent folder path where the file is going to be created
* @param string $fileName The name of the file to create
* @param string $fileContents The contents of the file to create
*
* @return string Path of the created file
*
* @throws IOException If unable to create the file or if the file path is not inside of the base folder
*/
public function createFileWithContents(string $parentFolderPath, string $fileName, string $fileContents): string
{
$this->throwIfOperationNotInBaseFolder($parentFolderPath);
$filePath = $parentFolderPath.\DIRECTORY_SEPARATOR.$fileName;
$errorMessage = '';
set_error_handler(static function ($nr, $message) use (&$errorMessage): bool {
$errorMessage = $message;
return true;
});
$wasCreationSuccessful = file_put_contents($filePath, $fileContents);
restore_error_handler();
if (false === $wasCreationSuccessful) {
throw new IOException("Unable to create file: {$filePath} - {$errorMessage}");
}
return $filePath;
}
/**
* Delete the file at the given path.
*
* @param string $filePath Path of the file to delete
*
* @throws IOException If the file path is not inside of the base folder
*/
public function deleteFile(string $filePath): void
{
$this->throwIfOperationNotInBaseFolder($filePath);
if (file_exists($filePath) && is_file($filePath)) {
unlink($filePath);
}
}
/**
* Delete the folder at the given path as well as all its contents.
*
* @param string $folderPath Path of the folder to delete
*
* @throws IOException If the folder path is not inside of the base folder
*/
public function deleteFolderRecursively(string $folderPath): void
{
$this->throwIfOperationNotInBaseFolder($folderPath);
$itemIterator = new RecursiveIteratorIterator(
new RecursiveDirectoryIterator($folderPath, RecursiveDirectoryIterator::SKIP_DOTS),
RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($itemIterator as $item) {
if ($item->isDir()) {
rmdir($item->getPathname());
} else {
unlink($item->getPathname());
}
}
rmdir($folderPath);
}
/**
* All I/O operations must occur inside the base folder, for security reasons.
* This function will throw an exception if the folder where the I/O operation
* should occur is not inside the base folder.
*
* @param string $operationFolderPath The path of the folder where the I/O operation should occur
*
* @throws IOException If the folder where the I/O operation should occur
* is not inside the base folder or the base folder does not exist
*/
private function throwIfOperationNotInBaseFolder(string $operationFolderPath): void
{
$operationFolderRealPath = realpath($operationFolderPath);
if (false === $operationFolderRealPath) {
throw new IOException("Folder not found: {$operationFolderPath}");
}
$isInBaseFolder = str_starts_with($operationFolderRealPath, $this->baseFolderRealPath);
if (!$isInBaseFolder) {
throw new IOException("Cannot perform I/O operation outside of the base folder: {$this->baseFolderRealPath}");
}
}
}
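/*
 * Usage sketch (illustrative only, not part of upstream OpenSpout).
 * It assumes the constructor, which is not shown in this part of the file,
 * takes the base folder path. $baseFolder is a hypothetical, existing, writable directory.
 *
 *     $helper = new FileSystemHelper($baseFolder);
 *     $folder = $helper->createFolder($baseFolder, 'export');
 *     $file   = $helper->createFileWithContents($folder, 'a.txt', 'hello');
 *     $helper->deleteFile($file);
 *     $helper->deleteFolderRecursively($folder);
 *
 * Each method first resolves the path it is given with realpath(), so a path
 * that does not exist, or that resolves outside the base folder, raises an IOException.
 */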

@ -0,0 +1,57 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Helper;
use OpenSpout\Common\Exception\IOException;
/**
* @internal
*/
interface FileSystemHelperInterface
{
/**
* Creates an empty folder with the given name under the given parent folder.
*
* @param string $parentFolderPath The parent folder path under which the folder is going to be created
* @param string $folderName The name of the folder to create
*
* @return string Path of the created folder
*
* @throws IOException If unable to create the folder or if the folder path is not inside of the base folder
*/
public function createFolder(string $parentFolderPath, string $folderName): string;
/**
* Creates a file with the given name and content in the given folder.
* The parent folder must exist.
*
* @param string $parentFolderPath The parent folder path where the file is going to be created
* @param string $fileName The name of the file to create
* @param string $fileContents The contents of the file to create
*
* @return string Path of the created file
*
* @throws IOException If unable to create the file or if the file path is not inside of the base folder
*/
public function createFileWithContents(string $parentFolderPath, string $fileName, string $fileContents): string;
/**
* Delete the file at the given path.
*
* @param string $filePath Path of the file to delete
*
* @throws IOException If the file path is not inside of the base folder
*/
public function deleteFile(string $filePath): void;
/**
* Delete the folder at the given path as well as all its contents.
*
* @param string $folderPath Path of the folder to delete
*
* @throws IOException If the folder path is not inside of the base folder
*/
public function deleteFolderRecursively(string $folderPath): void;
}

@ -0,0 +1,80 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common\Helper;
/**
* @internal
*/
final readonly class StringHelper
{
/** @var bool Whether the mbstring extension is loaded */
private bool $hasMbstringSupport;
public function __construct(bool $hasMbstringSupport)
{
$this->hasMbstringSupport = $hasMbstringSupport;
}
public static function factory(): self
{
return new self(\function_exists('mb_strlen'));
}
/**
* Returns the length of the given string.
* It uses the multibyte function if available.
*
* @see strlen
* @see mb_strlen
*/
public function getStringLength(string $string): int
{
return $this->hasMbstringSupport
? mb_strlen($string)
: \strlen($string); // @codeCoverageIgnore
}
/**
* Returns the position of the first occurrence of the given character/substring within the given string.
* It uses the multibyte function if available.
*
* @see strpos
* @see mb_strpos
*
* @param string $char Needle
* @param string $string Haystack
*
* @return int Char/substring's first occurrence position within the string if found (starts at 0) or -1 if not found
*/
public function getCharFirstOccurrencePosition(string $char, string $string): int
{
$position = $this->hasMbstringSupport
? mb_strpos($string, $char)
: strpos($string, $char); // @codeCoverageIgnore
return (false !== $position) ? $position : -1;
}
/**
* Returns the position of the last occurrence of the given character/substring within the given string.
* It uses the multibyte function if available.
*
* @see strrpos
* @see mb_strrpos
*
* @param string $char Needle
* @param string $string Haystack
*
* @return int Char/substring's last occurrence position within the string if found (starts at 0) or -1 if not found
*/
public function getCharLastOccurrencePosition(string $char, string $string): int
{
$position = $this->hasMbstringSupport
? mb_strrpos($string, $char)
: strrpos($string, $char); // @codeCoverageIgnore
return (false !== $position) ? $position : -1;
}
}
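/*
 * Usage sketch (illustrative only, not part of upstream OpenSpout).
 * It assumes UTF-8 source and UTF-8 as the mbstring internal encoding.
 * Note the argument order: needle first, haystack second.
 *
 *     $helper = StringHelper::factory();
 *     $helper->getStringLength('héllo');                      // 5 with mbstring, 6 (bytes) without
 *     $helper->getCharFirstOccurrencePosition('l', 'héllo');  // 2 with mbstring, 3 (bytes) without
 *     $helper->getCharLastOccurrencePosition('l', 'héllo');   // 3 with mbstring, 4 (bytes) without
 *     $helper->getCharFirstOccurrencePosition('z', 'héllo');  // -1, because the needle is not found
 */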

@ -0,0 +1,33 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Common;
use OpenSpout\Common\Exception\InvalidArgumentException;
/**
* @internal
*/
trait TempFolderOptionTrait
{
private string $tempFolder;
final public function setTempFolder(string $tempFolder): void
{
if (!is_dir($tempFolder) || !is_writable($tempFolder)) {
throw new InvalidArgumentException("{$tempFolder} is not a writable folder");
}
$this->tempFolder = $tempFolder;
}
final public function getTempFolder(): string
{
if (!isset($this->tempFolder)) {
$this->setTempFolder(sys_get_temp_dir());
}
return $this->tempFolder;
}
}

@ -0,0 +1,171 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Reader\Exception\ReaderException;
use OpenSpout\Reader\Exception\ReaderNotOpenedException;
/**
* @template T of SheetIteratorInterface
*
* @implements ReaderInterface<T>
*/
abstract class AbstractReader implements ReaderInterface
{
/** @var bool Indicates whether the stream is currently open */
private bool $isStreamOpened = false;
/**
* Prepares the reader to read the given file. It also makes sure
* that the file exists and is readable.
*
* @param string $filePath Path of the file to be read
*
* @throws IOException If the file at the given path does not exist, is not readable or is corrupted
*/
public function open(string $filePath): void
{
if ($this->isStreamWrapper($filePath) && (!$this->doesSupportStreamWrapper() || !$this->isSupportedStreamWrapper($filePath))) {
throw new IOException("Could not open {$filePath} for reading! Stream wrapper used is not supported for this type of file.");
}
if (!$this->isPhpStream($filePath)) {
// we skip the checks if the provided file path points to a PHP stream
if (!file_exists($filePath)) {
throw new IOException("Could not open {$filePath} for reading! File does not exist.");
}
if (!is_readable($filePath)) {
throw new IOException("Could not open {$filePath} for reading! File is not readable.");
}
}
try {
$fileRealPath = $this->getFileRealPath($filePath);
$this->openReader($fileRealPath);
$this->isStreamOpened = true;
} catch (ReaderException $exception) {
throw new IOException(
"Could not open {$filePath} for reading!",
0,
$exception
);
}
}
/**
* Closes the reader, preventing any additional reading.
*/
final public function close(): void
{
if ($this->isStreamOpened) {
$this->closeReader();
$this->isStreamOpened = false;
}
}
/**
* Returns whether stream wrappers are supported.
*/
abstract protected function doesSupportStreamWrapper(): bool;
/**
* Opens the file at the given file path to make it ready to be read.
*
* @param string $filePath Path of the file to be read
*/
abstract protected function openReader(string $filePath): void;
/**
* Closes the reader. To be used after reading the file.
*/
abstract protected function closeReader(): void;
final protected function ensureStreamOpened(): void
{
if (!$this->isStreamOpened) {
throw new ReaderNotOpenedException('Reader should be opened first.');
}
}
/**
* Returns the real path of the given path.
* If the given path is a valid stream wrapper, returns the path unchanged.
*/
private function getFileRealPath(string $filePath): string
{
if ($this->isSupportedStreamWrapper($filePath)) {
return $filePath;
}
// Need to use realpath to fix "Can't open file" on some Windows setups
$realpath = realpath($filePath);
\assert(false !== $realpath);
return $realpath;
}
/**
* Returns the scheme of the custom stream wrapper, if the path indicates a stream wrapper is used.
* For example, php://temp => php, s3://path/to/file => s3...
*
* @param string $filePath Path of the file to be read
*
* @return null|string The stream wrapper scheme or NULL if not a stream wrapper
*/
private function getStreamWrapperScheme(string $filePath): ?string
{
$streamScheme = null;
if (1 === preg_match('/^(\w+):\/\//', $filePath, $matches)) {
$streamScheme = $matches[1];
}
return $streamScheme;
}
/**
* Checks if the given path uses a stream wrapper
* (like php://temp, mystream://foo/bar...).
*
* @param string $filePath Path of the file to be read
*
* @return bool Whether the given path uses a stream wrapper
*/
private function isStreamWrapper(string $filePath): bool
{
return null !== $this->getStreamWrapperScheme($filePath);
}
/**
* Checks if the given path is a supported stream wrapper
* (like php://temp, mystream://foo/bar...).
* If the given path is a local path, returns true.
*
* @param string $filePath Path of the file to be read
*
* @return bool Whether the given path is a supported stream wrapper
*/
private function isSupportedStreamWrapper(string $filePath): bool
{
$streamScheme = $this->getStreamWrapperScheme($filePath);
return null === $streamScheme || \in_array($streamScheme, stream_get_wrappers(), true);
}
/**
* Checks if a path is a PHP stream (like php://output, php://memory, ...).
*
* @param string $filePath Path of the file to be read
*
* @return bool Whether the given path maps to a PHP stream
*/
private function isPhpStream(string $filePath): bool
{
$streamScheme = $this->getStreamWrapperScheme($filePath);
return 'php' === $streamScheme;
}
}
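/*
 * Illustration (not part of upstream OpenSpout) of how open() classifies paths,
 * based on the '/^(\w+):\/\//' pattern used by getStreamWrapperScheme():
 *
 *     '/tmp/data.csv'         no scheme: treated as a local path, so existence
 *                             and readability are checked before opening
 *     'php://temp'            scheme "php": the reader must support stream wrappers;
 *                             the existence and readability checks are skipped
 *     's3://bucket/data.csv'  scheme "s3": accepted only if the reader supports stream
 *                             wrappers and "s3" is listed by stream_get_wrappers()
 *     'C:\data\file.csv'      no scheme, since the colon is not followed by "//"
 *
 * The example paths are hypothetical.
 */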

@ -0,0 +1,15 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\CSV;
use OpenSpout\Common\Helper\EncodingHelper;
final class Options
{
public bool $SHOULD_PRESERVE_EMPTY_ROWS = false;
public string $FIELD_DELIMITER = ',';
public string $FIELD_ENCLOSURE = '"';
public string $ENCODING = EncodingHelper::ENCODING_UTF8;
}

@ -0,0 +1,80 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\CSV;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Common\Helper\EncodingHelper;
use OpenSpout\Reader\AbstractReader;
/**
* @extends AbstractReader<SheetIterator>
*/
final class Reader extends AbstractReader
{
/** @var resource Pointer to the file to be read */
private $filePointer;
/** @var SheetIterator To iterate over the CSV's unique "sheet" */
private SheetIterator $sheetIterator;
private readonly Options $options;
private readonly EncodingHelper $encodingHelper;
public function __construct(
?Options $options = null,
?EncodingHelper $encodingHelper = null
) {
$this->options = $options ?? new Options();
$this->encodingHelper = $encodingHelper ?? EncodingHelper::factory();
}
public function getSheetIterator(): SheetIterator
{
$this->ensureStreamOpened();
return $this->sheetIterator;
}
/**
* Returns whether stream wrappers are supported.
*/
protected function doesSupportStreamWrapper(): bool
{
return true;
}
/**
* Opens the file at the given path to make it ready to be read.
* If Options::$ENCODING was left at its default, it assumes that the file is encoded in UTF-8.
*
* @param string $filePath Path of the CSV file to be read
*
* @throws IOException
*/
protected function openReader(string $filePath): void
{
$resource = fopen($filePath, 'r');
\assert(false !== $resource);
$this->filePointer = $resource;
$this->sheetIterator = new SheetIterator(
new Sheet(
new RowIterator(
$this->filePointer,
$this->options,
$this->encodingHelper
)
)
);
}
/**
* Closes the reader. To be used after reading the file.
*/
protected function closeReader(): void
{
fclose($this->filePointer);
}
}
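/*
 * Usage sketch (illustrative only, not part of upstream OpenSpout).
 * 'data.csv' is a hypothetical file. Row::toArray() is assumed from the Row
 * entity, which is defined elsewhere in the library.
 *
 *     $options = new Options();
 *     $options->FIELD_DELIMITER = ';';
 *     $options->SHOULD_PRESERVE_EMPTY_ROWS = true;
 *
 *     $reader = new Reader($options);
 *     $reader->open('data.csv');                            // throws IOException if missing or unreadable
 *     foreach ($reader->getSheetIterator() as $sheet) {     // a CSV file exposes a single sheet
 *         foreach ($sheet->getRowIterator() as $row) {
 *             $values = $row->toArray();
 *         }
 *     }
 *     $reader->close();
 */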

@ -0,0 +1,219 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\CSV;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Row;
use OpenSpout\Common\Exception\EncodingConversionException;
use OpenSpout\Common\Helper\EncodingHelper;
use OpenSpout\Reader\RowIteratorInterface;
/**
* Iterate over CSV rows.
*/
final class RowIterator implements RowIteratorInterface
{
/**
* Value passed to fgetcsv. 0 means "unlimited" (slightly slower but accommodates very long lines).
*/
public const MAX_READ_BYTES_PER_LINE = 0;
/** @var null|resource Pointer to the CSV file to read */
private $filePointer;
/** @var int Number of read rows */
private int $numReadRows = 0;
/** @var null|Row Buffer used to store the current row, while checking if there are more rows to read */
private ?Row $rowBuffer = null;
/** @var bool Indicates whether all rows have been read */
private bool $hasReachedEndOfFile = false;
private readonly Options $options;
/** @var EncodingHelper Helper to work with different encodings */
private readonly EncodingHelper $encodingHelper;
/**
* @param resource $filePointer Pointer to the CSV file to read
*/
public function __construct(
$filePointer,
Options $options,
EncodingHelper $encodingHelper
) {
$this->filePointer = $filePointer;
$this->options = $options;
$this->encodingHelper = $encodingHelper;
}
/**
* Rewind the Iterator to the first element.
*
* @see http://php.net/manual/en/iterator.rewind.php
*/
public function rewind(): void
{
$this->rewindAndSkipBom();
$this->numReadRows = 0;
$this->rowBuffer = null;
$this->next();
}
/**
* Checks if current position is valid.
*
* @see http://php.net/manual/en/iterator.valid.php
*/
public function valid(): bool
{
return null !== $this->filePointer && !$this->hasReachedEndOfFile;
}
/**
* Move forward to next element. Reads data for the next unprocessed row.
*
* @see http://php.net/manual/en/iterator.next.php
*
* @throws EncodingConversionException If unable to convert data to UTF-8
*/
public function next(): void
{
$this->hasReachedEndOfFile = feof($this->filePointer);
if (!$this->hasReachedEndOfFile) {
$this->readDataForNextRow();
}
}
/**
* Return the current element from the buffer.
*
* @see http://php.net/manual/en/iterator.current.php
*/
public function current(): ?Row
{
return $this->rowBuffer;
}
/**
* Return the key of the current element.
*
* @see http://php.net/manual/en/iterator.key.php
*/
public function key(): int
{
return $this->numReadRows;
}
/**
* This rewinds and skips the BOM if inserted at the beginning of the file
* by moving the file pointer after it, so that it is not read.
*/
private function rewindAndSkipBom(): void
{
$byteOffsetToSkipBom = $this->encodingHelper->getBytesOffsetToSkipBOM($this->filePointer, $this->options->ENCODING);
// sets the cursor after the BOM (0 means no BOM, so rewind it)
fseek($this->filePointer, $byteOffsetToSkipBom);
}
/**
* @throws EncodingConversionException If unable to convert data to UTF-8
*/
private function readDataForNextRow(): void
{
do {
$rowData = $this->getNextUTF8EncodedRow();
} while ($this->shouldReadNextRow($rowData));
if (false !== $rowData) {
// array_map with strval will replace NULL values with empty strings
$rowDataBufferAsArray = array_map('\strval', $rowData);
$this->rowBuffer = new Row(array_map(static function ($cellValue) {
return Cell::fromValue($cellValue);
}, $rowDataBufferAsArray), null);
++$this->numReadRows;
} else {
// If we reach this point, it means end of file was reached.
// This happens when the last lines are empty lines.
$this->hasReachedEndOfFile = true;
}
}
/**
* @param array<int, null|string>|bool $currentRowData
*
* @return bool TRUE if we need to keep reading, FALSE if the data for the current row can be returned
*/
private function shouldReadNextRow($currentRowData): bool
{
$hasSuccessfullyFetchedRowData = (false !== $currentRowData);
$hasNowReachedEndOfFile = feof($this->filePointer);
$isEmptyLine = $this->isEmptyLine($currentRowData);
return
(!$hasSuccessfullyFetchedRowData && !$hasNowReachedEndOfFile)
|| (!$this->options->SHOULD_PRESERVE_EMPTY_ROWS && $isEmptyLine);
}
/**
* Returns the next row, converted if necessary to UTF-8.
* As fgetcsv() does not correctly handle the encoding of non-UTF-8 data,
* we manually remove whitespace with ltrim or rtrim (depending on the byte order).
*
* @return array<int, null|string>|false The row for the current file pointer, encoded in UTF-8 or FALSE if nothing to read
*
* @throws EncodingConversionException If unable to convert data to UTF-8
*/
private function getNextUTF8EncodedRow(): array|false
{
$encodedRowData = fgetcsv(
$this->filePointer,
self::MAX_READ_BYTES_PER_LINE,
$this->options->FIELD_DELIMITER,
$this->options->FIELD_ENCLOSURE,
''
);
if (false === $encodedRowData) {
return false;
}
foreach ($encodedRowData as $cellIndex => $cellValue) {
switch ($this->options->ENCODING) {
case EncodingHelper::ENCODING_UTF16_LE:
case EncodingHelper::ENCODING_UTF32_LE:
// remove whitespace from the beginning of the string, as fgetcsv() adds extra whitespace when it tries to split non-UTF-8 data
$cellValue = ltrim($cellValue);
break;
case EncodingHelper::ENCODING_UTF16_BE:
case EncodingHelper::ENCODING_UTF32_BE:
// remove whitespace from the end of the string, as fgetcsv() adds extra whitespace when it tries to split non-UTF-8 data
$cellValue = rtrim($cellValue);
break;
}
$encodedRowData[$cellIndex] = $this->encodingHelper->attemptConversionToUTF8($cellValue, $this->options->ENCODING);
}
return $encodedRowData;
}
/**
* @param array<int, null|string>|bool $lineData Array containing the cell values for the line
*
* @return bool Whether the given line is empty
*/
private function isEmptyLine($lineData): bool
{
return \is_array($lineData) && 1 === \count($lineData) && null === $lineData[0];
}
}

@ -0,0 +1,53 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\CSV;
use OpenSpout\Reader\SheetInterface;
/**
* @implements SheetInterface<RowIterator>
*/
final readonly class Sheet implements SheetInterface
{
/** @var RowIterator To iterate over the CSV's rows */
private RowIterator $rowIterator;
/**
* @param RowIterator $rowIterator Corresponding row iterator
*/
public function __construct(RowIterator $rowIterator)
{
$this->rowIterator = $rowIterator;
}
public function getRowIterator(): RowIterator
{
return $this->rowIterator;
}
/**
* @return int Index of the sheet
*/
public function getIndex(): int
{
return 0;
}
/**
* @return string Name of the sheet - empty string since CSV does not support that
*/
public function getName(): string
{
return '';
}
/**
* @return bool Always TRUE as there is only one sheet
*/
public function isActive(): bool
{
return true;
}
}

@ -0,0 +1,77 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\CSV;
use OpenSpout\Reader\SheetIteratorInterface;
/**
* @implements SheetIteratorInterface<Sheet>
*/
final class SheetIterator implements SheetIteratorInterface
{
/** @var Sheet The CSV unique "sheet" */
private readonly Sheet $sheet;
/** @var bool Whether the unique "sheet" has already been read */
private bool $hasReadUniqueSheet = false;
/**
* @param Sheet $sheet Corresponding unique sheet
*/
public function __construct(Sheet $sheet)
{
$this->sheet = $sheet;
}
/**
* Rewind the Iterator to the first element.
*
* @see http://php.net/manual/en/iterator.rewind.php
*/
public function rewind(): void
{
$this->hasReadUniqueSheet = false;
}
/**
* Checks if current position is valid.
*
* @see http://php.net/manual/en/iterator.valid.php
*/
public function valid(): bool
{
return !$this->hasReadUniqueSheet;
}
/**
* Move forward to next element.
*
* @see http://php.net/manual/en/iterator.next.php
*/
public function next(): void
{
$this->hasReadUniqueSheet = true;
}
/**
* Return the current element.
*
* @see http://php.net/manual/en/iterator.current.php
*/
public function current(): Sheet
{
return $this->sheet;
}
/**
* Return the key of the current element.
*
* @see http://php.net/manual/en/iterator.key.php
*/
public function key(): int
{
return 1;
}
}

@ -0,0 +1,21 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Common;
/**
* @internal
*/
final readonly class ColumnWidth
{
/**
* @param positive-int $start
* @param positive-int $end
*/
public function __construct(
public int $start,
public int $end,
public float $width,
) {}
}

@ -0,0 +1,64 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Common\Creator;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Common\Exception\UnsupportedTypeException;
use OpenSpout\Reader\CSV\Reader as CSVReader;
use OpenSpout\Reader\ODS\Reader as ODSReader;
use OpenSpout\Reader\ReaderInterface;
use OpenSpout\Reader\XLSX\Reader as XLSXReader;
/**
* This factory is used to create readers, based on the type of the file to be read.
* It supports CSV, XLSX and ODS formats.
*
* @deprecated Guessing mechanisms are brittle by nature and won't be provided by this library anymore
*/
final class ReaderFactory
{
/**
* Creates a reader by file extension.
*
* @param string $path The path to the spreadsheet file. Supported extensions are .csv, .ods and .xlsx
*
* @throws UnsupportedTypeException
*/
public static function createFromFile(string $path): ReaderInterface
{
$extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
return match ($extension) {
'csv' => new CSVReader(),
'xlsx' => new XLSXReader(),
'ods' => new ODSReader(),
default => throw new UnsupportedTypeException('No readers supporting the given type: '.$extension),
};
}
/**
* Creates a reader by mime type.
*
* @param string $path the path to the spreadsheet file
*
* @throws UnsupportedTypeException
* @throws IOException
*/
public static function createFromFileByMimeType(string $path): ReaderInterface
{
if (!file_exists($path)) {
throw new IOException("Could not open {$path} for reading! File does not exist.");
}
$mime_type = mime_content_type($path);
return match ($mime_type) {
'application/csv', 'text/csv', 'text/plain' => new CSVReader(),
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' => new XLSXReader(),
'application/vnd.oasis.opendocument.spreadsheet' => new ODSReader(),
default => throw new UnsupportedTypeException('No readers supporting the given type: '.$mime_type),
};
}
}
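/*
 * Sketch of the alternative to this deprecated factory (illustrative only, not
 * part of upstream OpenSpout): when the format is known, the matching reader
 * can be instantiated directly, as the match arms above already do.
 *
 *     $reader = new \OpenSpout\Reader\XLSX\Reader();   // or CSV\Reader, ODS\Reader
 *     $reader->open($path);                            // $path is a hypothetical .xlsx file
 */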

@ -0,0 +1,51 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Common\Manager;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Row;
/**
* @internal
*/
final class RowManager
{
/**
* Fills the missing indexes of a row with empty cells.
*/
public function fillMissingIndexesWithEmptyCells(Row $row): void
{
$numCells = $row->getNumCells();
if (0 === $numCells) {
return;
}
$rowCells = $row->getCells();
$maxCellIndex = $numCells;
/**
* If the row has empty cells, calling "setCellAtIndex" will add the cell
* but in the wrong place (the new cell is added at the end of the array).
* Therefore, we need to sort the array using keys to have proper order.
*
* @see https://github.com/box/spout/issues/740
*/
$needsSorting = false;
for ($cellIndex = 0; $cellIndex < $maxCellIndex; ++$cellIndex) {
if (!isset($rowCells[$cellIndex])) {
$row->setCellAtIndex(Cell::fromValue(''), $cellIndex);
$needsSorting = true;
}
}
if ($needsSorting) {
$rowCells = $row->getCells();
ksort($rowCells);
$row->setCells($rowCells);
}
}
}

@ -0,0 +1,153 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Common;
use OpenSpout\Reader\Exception\XMLProcessingException;
use OpenSpout\Reader\Wrapper\XMLReader;
use ReflectionMethod;
/**
* @internal
*/
final class XMLProcessor
{
// Node types
public const NODE_TYPE_START = XMLReader::ELEMENT;
public const NODE_TYPE_END = XMLReader::END_ELEMENT;
// Keys associated to reflection attributes to invoke a callback
public const CALLBACK_REFLECTION_METHOD = 'reflectionMethod';
public const CALLBACK_REFLECTION_OBJECT = 'reflectionObject';
// Values returned by the callbacks to indicate what the processor should do next
public const PROCESSING_CONTINUE = 1;
public const PROCESSING_STOP = 2;
/** @var XMLReader The XMLReader object that will help read sheet's XML data */
private readonly XMLReader $xmlReader;
/** @var array<string, array{reflectionMethod: ReflectionMethod, reflectionObject: object}> Registered callbacks */
private array $callbacks = [];
/**
* @param XMLReader $xmlReader XMLReader object
*/
public function __construct(XMLReader $xmlReader)
{
$this->xmlReader = $xmlReader;
}
/**
* @param string $nodeName A callback may be triggered when a node with this name is read
* @param int $nodeType Type of the node [NODE_TYPE_START || NODE_TYPE_END]
* @param callable $callback Callback to execute when the read node has the given name and type
*/
public function registerCallback(string $nodeName, int $nodeType, $callback): self
{
$callbackKey = $this->getCallbackKey($nodeName, $nodeType);
$this->callbacks[$callbackKey] = $this->getInvokableCallbackData($callback);
return $this;
}
/**
* Resumes the reading of the XML file where it was left off.
* Stops whenever a callback indicates that reading should stop or at the end of the file.
*
* @throws XMLProcessingException
*/
public function readUntilStopped(): void
{
while ($this->xmlReader->read()) {
$nodeType = $this->xmlReader->nodeType;
$nodeNamePossiblyWithPrefix = $this->xmlReader->name;
$nodeNameWithoutPrefix = $this->xmlReader->localName;
$callbackData = $this->getRegisteredCallbackData($nodeNamePossiblyWithPrefix, $nodeNameWithoutPrefix, $nodeType);
if (null !== $callbackData) {
$callbackResponse = $this->invokeCallback($callbackData, [$this->xmlReader]);
if (self::PROCESSING_STOP === $callbackResponse) {
// stop reading
break;
}
}
}
}
/**
* @param string $nodeName Name of the node
* @param int $nodeType Type of the node [NODE_TYPE_START || NODE_TYPE_END]
*
* @return string Key used to store the associated callback
*/
private function getCallbackKey(string $nodeName, int $nodeType): string
{
return "{$nodeName}{$nodeType}";
}
/**
* Because the callback can be a "protected" function, we don't want to use call_user_func() directly
* but instead invoke the callback using Reflection. This allows the invocation of "protected" functions.
* Since some functions can be called a lot, we pre-process the callback to only return the elements that
* will be needed to invoke the callback later.
*
* @param callable $callback Array reference to a callback: [OBJECT, METHOD_NAME]
*
* @return array{reflectionMethod: ReflectionMethod, reflectionObject: object} Associative array containing the elements needed to invoke the callback using Reflection
*/
private function getInvokableCallbackData($callback): array
{
$callbackObject = $callback[0];
$callbackMethodName = $callback[1];
$reflectionMethod = new ReflectionMethod($callbackObject, $callbackMethodName);
$reflectionMethod->setAccessible(true);
return [
self::CALLBACK_REFLECTION_METHOD => $reflectionMethod,
self::CALLBACK_REFLECTION_OBJECT => $callbackObject,
];
}
/**
* @param string $nodeNamePossiblyWithPrefix Name of the node, possibly prefixed
* @param string $nodeNameWithoutPrefix Name of the same node, un-prefixed
* @param int $nodeType Type of the node [NODE_TYPE_START || NODE_TYPE_END]
*
* @return null|array{reflectionMethod: ReflectionMethod, reflectionObject: object} Callback data to be used for execution when a node of the given name/type is read or NULL if none found
*/
private function getRegisteredCallbackData(string $nodeNamePossiblyWithPrefix, string $nodeNameWithoutPrefix, int $nodeType): ?array
{
// With prefixed nodes, we should match if (by order of preference):
// 1. the callback was registered with the prefixed node name (e.g. "x:worksheet")
// 2. the callback was registered with the un-prefixed node name (e.g. "worksheet")
$callbackKeyForPossiblyPrefixedName = $this->getCallbackKey($nodeNamePossiblyWithPrefix, $nodeType);
$callbackKeyForUnPrefixedName = $this->getCallbackKey($nodeNameWithoutPrefix, $nodeType);
$hasPrefix = ($nodeNamePossiblyWithPrefix !== $nodeNameWithoutPrefix);
$callbackKeyToUse = $callbackKeyForUnPrefixedName;
if ($hasPrefix && isset($this->callbacks[$callbackKeyForPossiblyPrefixedName])) {
$callbackKeyToUse = $callbackKeyForPossiblyPrefixedName;
}
// Using the null coalescing operator (isset semantics) here because it is much faster than array_key_exists...
return $this->callbacks[$callbackKeyToUse] ?? null;
}
/**
* @param array{reflectionMethod: ReflectionMethod, reflectionObject: object} $callbackData Associative array containing data to invoke the callback using Reflection
* @param XMLReader[] $args Arguments to pass to the callback
*
* @return int Callback response
*/
private function invokeCallback(array $callbackData, array $args): int
{
$reflectionMethod = $callbackData[self::CALLBACK_REFLECTION_METHOD];
$callbackObject = $callbackData[self::CALLBACK_REFLECTION_OBJECT];
return $reflectionMethod->invokeArgs($callbackObject, $args);
}
}
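/*
 * Usage sketch (illustrative only, not part of upstream OpenSpout).
 * RowCounter, its method and the "row" node name are hypothetical. Callbacks must
 * be passed as [object, 'methodName'] arrays; they receive the XMLReader and
 * return one of the PROCESSING_* constants.
 *
 *     final class RowCounter
 *     {
 *         public int $rows = 0;
 *
 *         public function onRowStart(XMLReader $xmlReader): int
 *         {
 *             ++$this->rows;
 *
 *             return $this->rows < 10
 *                 ? XMLProcessor::PROCESSING_CONTINUE
 *                 : XMLProcessor::PROCESSING_STOP;
 *         }
 *     }
 *
 *     $counter = new RowCounter();
 *     $processor = new XMLProcessor($xmlReader);   // $xmlReader: an already opened OpenSpout XMLReader
 *     $processor->registerCallback('row', XMLProcessor::NODE_TYPE_START, [$counter, 'onRowStart']);
 *     $processor->readUntilStopped();              // returns after the 10th "row" start node, or at end of file
 */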

@ -0,0 +1,23 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Exception;
use Throwable;
final class InvalidValueException extends ReaderException
{
private readonly string $invalidValue;
public function __construct(string $invalidValue, string $message = '', int $code = 0, ?Throwable $previous = null)
{
$this->invalidValue = $invalidValue;
parent::__construct($message, $code, $previous);
}
public function getInvalidValue(): string
{
return $this->invalidValue;
}
}

@ -0,0 +1,7 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Exception;
final class IteratorNotRewindableException extends ReaderException {}

@ -0,0 +1,7 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Exception;
final class NoSheetsFoundException extends ReaderException {}

@ -0,0 +1,9 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Exception;
use OpenSpout\Common\Exception\OpenSpoutException;
abstract class ReaderException extends OpenSpoutException {}

@ -0,0 +1,7 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Exception;
final class ReaderNotOpenedException extends ReaderException {}

@ -0,0 +1,7 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Exception;
final class SharedStringNotFoundException extends ReaderException {}

@ -0,0 +1,7 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Exception;
final class XMLProcessingException extends ReaderException {}

@ -0,0 +1,283 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\ODS\Helper;
use DateInterval;
use DateTimeImmutable;
use DOMElement;
use DOMNode;
use DOMText;
use Exception;
use OpenSpout\Common\Helper\Escaper\ODS;
use OpenSpout\Reader\Exception\InvalidValueException;
/**
* @internal
*/
final readonly class CellValueFormatter
{
/**
* Definition of all possible cell types.
*/
public const CELL_TYPE_STRING = 'string';
public const CELL_TYPE_FLOAT = 'float';
public const CELL_TYPE_BOOLEAN = 'boolean';
public const CELL_TYPE_DATE = 'date';
public const CELL_TYPE_TIME = 'time';
public const CELL_TYPE_CURRENCY = 'currency';
public const CELL_TYPE_PERCENTAGE = 'percentage';
public const CELL_TYPE_VOID = 'void';
/**
* Definition of XML nodes names used to parse data.
*/
public const XML_NODE_P = 'p';
public const XML_NODE_TEXT_A = 'text:a';
public const XML_NODE_TEXT_SPAN = 'text:span';
public const XML_NODE_TEXT_S = 'text:s';
public const XML_NODE_TEXT_TAB = 'text:tab';
public const XML_NODE_TEXT_LINE_BREAK = 'text:line-break';
/**
* Definition of XML attributes used to parse data.
*/
public const XML_ATTRIBUTE_TYPE = 'office:value-type';
public const XML_ATTRIBUTE_VALUE = 'office:value';
public const XML_ATTRIBUTE_BOOLEAN_VALUE = 'office:boolean-value';
public const XML_ATTRIBUTE_DATE_VALUE = 'office:date-value';
public const XML_ATTRIBUTE_TIME_VALUE = 'office:time-value';
public const XML_ATTRIBUTE_CURRENCY = 'office:currency';
public const XML_ATTRIBUTE_C = 'text:c';
/**
* List of XML nodes representing whitespaces and their corresponding value.
*/
private const WHITESPACE_XML_NODES = [
self::XML_NODE_TEXT_S => ' ',
self::XML_NODE_TEXT_TAB => "\t",
self::XML_NODE_TEXT_LINE_BREAK => "\n",
];
/** @var bool Whether date/time values should be returned as PHP objects or be formatted as strings */
private bool $shouldFormatDates;
/** @var ODS Used to unescape XML data */
private ODS $escaper;
/**
* @param bool $shouldFormatDates Whether date/time values should be returned as PHP objects or be formatted as strings
* @param ODS $escaper Used to unescape XML data
*/
public function __construct(bool $shouldFormatDates, ODS $escaper)
{
$this->shouldFormatDates = $shouldFormatDates;
$this->escaper = $escaper;
}
/**
* Returns the unescaped, correctly marshalled cell value associated with the given XML node.
*
* @see http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#refTable13
*
* @return bool|DateInterval|DateTimeImmutable|float|int|string The value associated with the cell, empty string if cell's type is void/undefined
*
* @throws InvalidValueException If the node value is not valid
*/
public function extractAndFormatNodeValue(DOMElement $node): bool|DateInterval|DateTimeImmutable|float|int|string
{
$cellType = $node->getAttribute(self::XML_ATTRIBUTE_TYPE);
return match ($cellType) {
self::CELL_TYPE_STRING => $this->formatStringCellValue($node),
self::CELL_TYPE_FLOAT => $this->formatFloatCellValue($node),
self::CELL_TYPE_BOOLEAN => $this->formatBooleanCellValue($node),
self::CELL_TYPE_DATE => $this->formatDateCellValue($node),
self::CELL_TYPE_TIME => $this->formatTimeCellValue($node),
self::CELL_TYPE_CURRENCY => $this->formatCurrencyCellValue($node),
self::CELL_TYPE_PERCENTAGE => $this->formatPercentageCellValue($node),
default => '',
};
}
/**
* Returns the cell String value.
*
* @return string The value associated with the cell
*/
private function formatStringCellValue(DOMElement $node): string
{
$pNodeValues = [];
$pNodes = $node->getElementsByTagName(self::XML_NODE_P);
foreach ($pNodes as $pNode) {
$pNodeValues[] = $this->extractTextValueFromNode($pNode);
}
$escapedCellValue = implode("\n", $pNodeValues);
return $this->escaper->unescape($escapedCellValue);
}
/**
* Returns the cell Numeric value from the given node.
*
* @return float|int The value associated with the cell
*/
private function formatFloatCellValue(DOMElement $node): float|int
{
$nodeValue = $node->getAttribute(self::XML_ATTRIBUTE_VALUE);
$nodeIntValue = (int) $nodeValue;
$nodeFloatValue = (float) $nodeValue;
return ((float) $nodeIntValue === $nodeFloatValue) ? $nodeIntValue : $nodeFloatValue;
}
/**
* Returns the cell Boolean value from the given node.
*
* @return bool The value associated with the cell
*/
private function formatBooleanCellValue(DOMElement $node): bool
{
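// "office:boolean-value" holds the literal "true" or "false"; a plain (bool) cast would turn "false" into TRUE.
return filter_var($node->getAttribute(self::XML_ATTRIBUTE_BOOLEAN_VALUE), FILTER_VALIDATE_BOOLEAN);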
}
/**
* Returns the cell Date value from the given node.
*
* @throws InvalidValueException If the value is not a valid date
*/
private function formatDateCellValue(DOMElement $node): DateTimeImmutable|string
{
// The XML node looks like this:
// <table:table-cell calcext:value-type="date" office:date-value="2016-05-19T16:39:00" office:value-type="date">
// <text:p>05/19/16 04:39 PM</text:p>
// </table:table-cell>
if ($this->shouldFormatDates) {
// The date is already formatted in the "p" tag
$nodeWithValueAlreadyFormatted = $node->getElementsByTagName(self::XML_NODE_P)->item(0);
$cellValue = $nodeWithValueAlreadyFormatted->nodeValue;
} else {
// otherwise, get it from the "date-value" attribute
$nodeValue = $node->getAttribute(self::XML_ATTRIBUTE_DATE_VALUE);
try {
$cellValue = new DateTimeImmutable($nodeValue);
} catch (Exception $previous) {
throw new InvalidValueException($nodeValue, '', 0, $previous);
}
}
return $cellValue;
}
/**
* Returns the cell Time value from the given node.
*
* @return DateInterval|string The value associated with the cell
*
* @throws InvalidValueException If the value is not a valid time
*/
private function formatTimeCellValue(DOMElement $node): DateInterval|string
{
// The XML node looks like this:
// <table:table-cell calcext:value-type="time" office:time-value="PT13H24M00S" office:value-type="time">
// <text:p>01:24:00 PM</text:p>
// </table:table-cell>
if ($this->shouldFormatDates) {
// The time is already formatted in the "p" tag
$nodeWithValueAlreadyFormatted = $node->getElementsByTagName(self::XML_NODE_P)->item(0);
$cellValue = $nodeWithValueAlreadyFormatted->nodeValue;
} else {
// otherwise, get it from the "time-value" attribute
$nodeValue = $node->getAttribute(self::XML_ATTRIBUTE_TIME_VALUE);
try {
$cellValue = new DateInterval($nodeValue);
} catch (Exception $previous) {
throw new InvalidValueException($nodeValue, '', 0, $previous);
}
}
return $cellValue;
}
/**
* Returns the cell Currency value from the given node.
*
* @return string The value associated with the cell (e.g. "100 USD" or "9.99 EUR")
*/
private function formatCurrencyCellValue(DOMElement $node): string
{
$value = $node->getAttribute(self::XML_ATTRIBUTE_VALUE);
$currency = $node->getAttribute(self::XML_ATTRIBUTE_CURRENCY);
return "{$value} {$currency}";
}
/**
* Returns the cell Percentage value from the given node.
*
* @return float|int The value associated with the cell
*/
private function formatPercentageCellValue(DOMElement $node): float|int
{
// percentages are formatted like floats
return $this->formatFloatCellValue($node);
}
private function extractTextValueFromNode(DOMNode $pNode): string
{
$textValue = '';
foreach ($pNode->childNodes as $childNode) {
if ($childNode instanceof DOMText) {
$textValue .= $childNode->nodeValue;
} elseif ($this->isWhitespaceNode($childNode->nodeName) && $childNode instanceof DOMElement) {
$textValue .= $this->transformWhitespaceNode($childNode);
} elseif (self::XML_NODE_TEXT_A === $childNode->nodeName || self::XML_NODE_TEXT_SPAN === $childNode->nodeName) {
$textValue .= $this->extractTextValueFromNode($childNode);
}
}
return $textValue;
}
/**
* Returns whether the given node is a whitespace node. It must be one of these:
* - <text:s />
* - <text:tab />
* - <text:line-break />.
*/
private function isWhitespaceNode(string $nodeName): bool
{
return isset(self::WHITESPACE_XML_NODES[$nodeName]);
}
/**
* The "<text:p>" node can contain the string value directly
* or contain child elements. In the latter case, whitespace is encoded
* as dedicated XML elements, which are converted back to the characters they represent:
* - <text:s /> => space
* - <text:tab /> => tab
* - <text:line-break /> => line break.
*
* @see https://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#__RefHeading__1415200_253892949
*
* @param DOMElement $node The XML node representing a whitespace
*
* @return string The corresponding whitespace value
*/
private function transformWhitespaceNode(DOMElement $node): string
{
$countAttribute = $node->getAttribute(self::XML_ATTRIBUTE_C); // only defined for "<text:s>"
$numWhitespaces = '' !== $countAttribute ? (int) $countAttribute : 1;
return str_repeat(self::WHITESPACE_XML_NODES[$node->nodeName], $numWhitespaces);
}
}
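
The text extraction done by the formatter above can be tried on its own. What follows is a minimal sketch, assuming only PHP's DOM extension: `extractParagraphText()` and `WHITESPACE_NODES` are illustrative stand-ins written for this note, not OpenSpout API, and the XML fragment is made up.

```php
<?php
declare(strict_types=1);

// Illustrative stand-in for the formatter's private whitespace map (not OpenSpout API).
const WHITESPACE_NODES = [
    'text:s' => ' ',
    'text:tab' => "\t",
    'text:line-break' => "\n",
];

// Hypothetical helper: collects the text of a <text:p> node, expanding whitespace elements.
function extractParagraphText(DOMNode $paragraph): string
{
    $text = '';
    foreach ($paragraph->childNodes as $child) {
        if ($child instanceof DOMText) {
            $text .= $child->nodeValue;
        } elseif ($child instanceof DOMElement && isset(WHITESPACE_NODES[$child->nodeName])) {
            // "text:c" carries the repeat count; without it the element stands for one character.
            $count = $child->getAttribute('text:c');
            $text .= str_repeat(WHITESPACE_NODES[$child->nodeName], '' !== $count ? (int) $count : 1);
        } elseif ($child instanceof DOMElement && in_array($child->nodeName, ['text:a', 'text:span'], true)) {
            // Links and spans only wrap more text, so recurse into them.
            $text .= extractParagraphText($child);
        }
    }

    return $text;
}

$xml = '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    .'a<text:s text:c="3"/>b<text:span>c<text:tab/>d</text:span></text:p>';
$document = new DOMDocument();
$document->loadXML($xml);
$text = extractParagraphText($document->documentElement);
```

With this input, `$text` holds `a`, three spaces, `bc`, a tab and `d`. Other child elements are ignored, as in the class above.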

@ -0,0 +1,54 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\ODS\Helper;
use OpenSpout\Reader\Exception\XMLProcessingException;
use OpenSpout\Reader\Wrapper\XMLReader;
/**
* @internal
*/
final class SettingsHelper
{
public const SETTINGS_XML_FILE_PATH = 'settings.xml';
/**
* Definition of XML nodes name and attribute used to parse settings data.
*/
public const XML_NODE_CONFIG_ITEM = 'config:config-item';
public const XML_ATTRIBUTE_CONFIG_NAME = 'config:name';
public const XML_ATTRIBUTE_VALUE_ACTIVE_TABLE = 'ActiveTable';
/**
* @param string $filePath Path of the file to be read
*
* @return null|string Name of the sheet that was defined as active or NULL if none found
*/
public function getActiveSheetName(string $filePath): ?string
{
$xmlReader = new XMLReader();
if (false === $xmlReader->openFileInZip($filePath, self::SETTINGS_XML_FILE_PATH)) {
return null;
}
$activeSheetName = null;
try {
while ($xmlReader->readUntilNodeFound(self::XML_NODE_CONFIG_ITEM)) {
if (self::XML_ATTRIBUTE_VALUE_ACTIVE_TABLE === $xmlReader->getAttribute(self::XML_ATTRIBUTE_CONFIG_NAME)) {
$activeSheetName = $xmlReader->readString();
break;
}
}
} catch (XMLProcessingException) { // @codeCoverageIgnore
// do nothing
}
$xmlReader->close();
return $activeSheetName;
}
}
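
The lookup performed by `getActiveSheetName()` can be reproduced with PHP's native `XMLReader`. The sketch below is an assumption-laden illustration: it reads from a string rather than from a zipped `settings.xml`, and `findActiveSheetName()` and the sample document are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the text of the "ActiveTable" config item, or null if absent.
function findActiveSheetName(string $settingsXml): ?string
{
    $reader = new XMLReader();
    if (!$reader->XML($settingsXml)) {
        return null;
    }

    $activeSheetName = null;
    while ($reader->read()) {
        if (XMLReader::ELEMENT === $reader->nodeType
            && 'config:config-item' === $reader->name
            && 'ActiveTable' === $reader->getAttribute('config:name')
        ) {
            $activeSheetName = $reader->readString();
            break;
        }
    }
    $reader->close();

    return $activeSheetName;
}

$settingsXml = <<<'XML'
<office:document-settings
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0">
  <office:settings>
    <config:config-item-set config:name="ooo:view-settings">
      <config:config-item config:name="ActiveTable" config:type="string">Sheet2</config:config-item>
    </config:config-item-set>
  </office:settings>
</office:document-settings>
XML;

$active = findActiveSheetName($settingsXml);
$missing = findActiveSheetName('<settings/>');
```

When no item is found the result is `null`, which the sheet iterator then interprets as "the first sheet is active".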

@ -0,0 +1,11 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\ODS;
final class Options
{
public bool $SHOULD_FORMAT_DATES = false;
public bool $SHOULD_PRESERVE_EMPTY_ROWS = false;
}
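
`SHOULD_FORMAT_DATES` is handed to the cell value formatter, where it selects between the display text stored in the cell and a PHP object built from the raw attribute. The following is a rough sketch of that switch for a time cell, assuming PHP 8 and the DOM extension: `readTimeCell()` is a hypothetical function, and unlike the library it returns an empty string when the paragraph is missing.

```php
<?php
declare(strict_types=1);

const TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

// Hypothetical helper mirroring the flag: formatted text when true, DateInterval when false.
function readTimeCell(DOMElement $cell, bool $shouldFormatDates): DateInterval|string
{
    if ($shouldFormatDates) {
        // The spreadsheet application already stored the display text in <text:p>.
        $paragraph = $cell->getElementsByTagNameNS(TEXT_NS, 'p')->item(0);

        return $paragraph?->nodeValue ?? '';
    }

    // Otherwise parse the ISO 8601 duration held in "office:time-value".
    return new DateInterval($cell->getAttribute('office:time-value'));
}

$xml = <<<'XML'
<table:table-cell
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:value-type="time" office:time-value="PT13H24M00S">
  <text:p>01:24:00 PM</text:p>
</table:table-cell>
XML;
$document = new DOMDocument();
$document->loadXML($xml);
$cell = $document->documentElement;

$formatted = readTimeCell($cell, true);
$interval = readTimeCell($cell, false);
```

Here `$formatted` is the string from the paragraph, while `$interval` is a `DateInterval` of 13 hours and 24 minutes.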

@ -0,0 +1,72 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\ODS;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Common\Helper\Escaper\ODS;
use OpenSpout\Reader\AbstractReader;
use OpenSpout\Reader\Exception\NoSheetsFoundException;
use OpenSpout\Reader\ODS\Helper\SettingsHelper;
use ZipArchive;
/**
* @extends AbstractReader<SheetIterator>
*/
final class Reader extends AbstractReader
{
private ZipArchive $zip;
private readonly Options $options;
/** @var SheetIterator To iterate over the ODS sheets */
private SheetIterator $sheetIterator;
public function __construct(?Options $options = null)
{
$this->options = $options ?? new Options();
}
public function getSheetIterator(): SheetIterator
{
$this->ensureStreamOpened();
return $this->sheetIterator;
}
/**
* Returns whether stream wrappers are supported.
*/
protected function doesSupportStreamWrapper(): bool
{
return false;
}
/**
* Opens the file at the given file path to make it ready to be read.
*
* @param string $filePath Path of the file to be read
*
* @throws IOException If the file at the given path or its content cannot be read
* @throws NoSheetsFoundException If there are no sheets in the file
*/
protected function openReader(string $filePath): void
{
$this->zip = new ZipArchive();
if (true !== $this->zip->open($filePath)) {
throw new IOException("Could not open {$filePath} for reading.");
}
$this->sheetIterator = new SheetIterator($filePath, $this->options, new ODS(), new SettingsHelper());
}
/**
* Closes the reader. To be used after reading the file.
*/
protected function closeReader(): void
{
$this->zip->close();
}
}

@ -0,0 +1,343 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\ODS;
use DOMElement;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Row;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Reader\Common\XMLProcessor;
use OpenSpout\Reader\Exception\InvalidValueException;
use OpenSpout\Reader\Exception\IteratorNotRewindableException;
use OpenSpout\Reader\Exception\SharedStringNotFoundException;
use OpenSpout\Reader\ODS\Helper\CellValueFormatter;
use OpenSpout\Reader\RowIteratorInterface;
use OpenSpout\Reader\Wrapper\XMLReader;
final class RowIterator implements RowIteratorInterface
{
/**
* Definition of XML nodes names used to parse data.
*/
public const XML_NODE_TABLE = 'table:table';
public const XML_NODE_ROW = 'table:table-row';
public const XML_NODE_CELL = 'table:table-cell';
public const MAX_COLUMNS_EXCEL = 16384;
/**
* Definition of XML attribute used to parse data.
*/
public const XML_ATTRIBUTE_NUM_ROWS_REPEATED = 'table:number-rows-repeated';
public const XML_ATTRIBUTE_NUM_COLUMNS_REPEATED = 'table:number-columns-repeated';
private readonly Options $options;
/** @var XMLProcessor Helper Object to process XML nodes */
private readonly XMLProcessor $xmlProcessor;
/** @var CellValueFormatter Helper to format cell values */
private readonly CellValueFormatter $cellValueFormatter;
/** @var bool Whether the iterator has already been rewound once */
private bool $hasAlreadyBeenRewound = false;
/** @var Row The currently processed row */
private Row $currentlyProcessedRow;
/** @var null|Row Buffer used to store the current row, while checking if there are more rows to read */
private ?Row $rowBuffer = null;
/** @var bool Indicates whether all rows have been read */
private bool $hasReachedEndOfFile = false;
/** @var int Last row index processed (one-based) */
private int $lastRowIndexProcessed = 0;
/** @var int Row index to be processed next (one-based) */
private int $nextRowIndexToBeProcessed = 1;
/** @var null|Cell Last processed cell (because when reading cell at column N+1, cell N is processed) */
private ?Cell $lastProcessedCell = null;
/** @var int Number of times the last processed row should be repeated */
private int $numRowsRepeated = 1;
/** @var int Number of times the last cell value should be copied to the cells on its right */
private int $numColumnsRepeated = 1;
/** @var bool Whether at least one cell has been read for the row currently being processed */
private bool $hasAlreadyReadOneCellInCurrentRow = false;
public function __construct(
Options $options,
CellValueFormatter $cellValueFormatter,
XMLProcessor $xmlProcessor
) {
$this->cellValueFormatter = $cellValueFormatter;
// Register all callbacks to process different nodes when reading the XML file
$this->xmlProcessor = $xmlProcessor;
$this->xmlProcessor->registerCallback(self::XML_NODE_ROW, XMLProcessor::NODE_TYPE_START, [$this, 'processRowStartingNode']);
$this->xmlProcessor->registerCallback(self::XML_NODE_CELL, XMLProcessor::NODE_TYPE_START, [$this, 'processCellStartingNode']);
$this->xmlProcessor->registerCallback(self::XML_NODE_ROW, XMLProcessor::NODE_TYPE_END, [$this, 'processRowEndingNode']);
$this->xmlProcessor->registerCallback(self::XML_NODE_TABLE, XMLProcessor::NODE_TYPE_END, [$this, 'processTableEndingNode']);
$this->options = $options;
}
/**
* Rewind the Iterator to the first element.
* NOTE: It can only be done once, as it is not possible to read an XML file backwards.
*
* @see http://php.net/manual/en/iterator.rewind.php
*
* @throws IteratorNotRewindableException If the iterator is rewound more than once
*/
public function rewind(): void
{
// Because sheet and row data are located in the same XML file, and that file cannot be
// read backwards, we can't rewind both the sheet iterator and the row iterator.
// Therefore, rewinding the row iterator more than once has been disabled.
if ($this->hasAlreadyBeenRewound) {
throw new IteratorNotRewindableException();
}
$this->hasAlreadyBeenRewound = true;
$this->lastRowIndexProcessed = 0;
$this->nextRowIndexToBeProcessed = 1;
$this->rowBuffer = null;
$this->hasReachedEndOfFile = false;
$this->next();
}
/**
* Checks if current position is valid.
*
* @see http://php.net/manual/en/iterator.valid.php
*/
public function valid(): bool
{
return !$this->hasReachedEndOfFile;
}
/**
* Move forward to next element. Empty rows will be skipped.
*
* @see http://php.net/manual/en/iterator.next.php
*
* @throws SharedStringNotFoundException If a shared string was not found
* @throws IOException If unable to read the sheet data XML
*/
public function next(): void
{
if ($this->doesNeedDataForNextRowToBeProcessed()) {
$this->readDataForNextRow();
}
++$this->lastRowIndexProcessed;
}
/**
* Return the current element, from the buffer.
*
* @see http://php.net/manual/en/iterator.current.php
*/
public function current(): Row
{
return $this->rowBuffer;
}
/**
* Return the key of the current element.
*
* @see http://php.net/manual/en/iterator.key.php
*/
public function key(): int
{
return $this->lastRowIndexProcessed;
}
/**
* Returns whether we need data for the next row to be processed.
* We DO need to read data if:
* - we have not read any rows yet
* OR
* - the next row to be processed immediately follows the last read row.
*
* @return bool whether we need data for the next row to be processed
*/
private function doesNeedDataForNextRowToBeProcessed(): bool
{
$hasReadAtLeastOneRow = (0 !== $this->lastRowIndexProcessed);
return
!$hasReadAtLeastOneRow
|| $this->lastRowIndexProcessed === $this->nextRowIndexToBeProcessed - 1;
}
/**
* @throws SharedStringNotFoundException If a shared string was not found
* @throws IOException If unable to read the sheet data XML
*/
private function readDataForNextRow(): void
{
$this->currentlyProcessedRow = new Row([], null);
$this->xmlProcessor->readUntilStopped();
$this->rowBuffer = $this->currentlyProcessedRow;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<table:table-row>" starting node
*
* @return int A return code that indicates what action the processor should take next
*/
private function processRowStartingNode(XMLReader $xmlReader): int
{
// Reset data from current row
$this->hasAlreadyReadOneCellInCurrentRow = false;
$this->lastProcessedCell = null;
$this->numColumnsRepeated = 1;
$this->numRowsRepeated = $this->getNumRowsRepeatedForCurrentNode($xmlReader);
return XMLProcessor::PROCESSING_CONTINUE;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<table:table-cell>" starting node
*
* @return int A return code that indicates what action the processor should take next
*/
private function processCellStartingNode(XMLReader $xmlReader): int
{
$currentNumColumnsRepeated = $this->getNumColumnsRepeatedForCurrentNode($xmlReader);
// NOTE: expand() will automatically decode all XML entities of the child nodes
/** @var DOMElement $node */
$node = $xmlReader->expand();
$currentCell = $this->getCell($node);
// process cell N only after having read cell N+1 (see below why)
if ($this->hasAlreadyReadOneCellInCurrentRow) {
for ($i = 0; $i < $this->numColumnsRepeated; ++$i) {
$this->currentlyProcessedRow->addCell($this->lastProcessedCell);
}
}
$this->hasAlreadyReadOneCellInCurrentRow = true;
$this->lastProcessedCell = $currentCell;
$this->numColumnsRepeated = $currentNumColumnsRepeated;
return XMLProcessor::PROCESSING_CONTINUE;
}
/**
* @return int A return code that indicates what action the processor should take next
*/
private function processRowEndingNode(): int
{
$isEmptyRow = $this->isEmptyRow($this->currentlyProcessedRow, $this->lastProcessedCell);
// if the fetched row is empty and we don't want to preserve it...
if (!$this->options->SHOULD_PRESERVE_EMPTY_ROWS && $isEmptyRow) {
// ... skip it
return XMLProcessor::PROCESSING_CONTINUE;
}
// if the row is empty, we don't want to return more than one cell
$actualNumColumnsRepeated = (!$isEmptyRow) ? $this->numColumnsRepeated : 1;
$numCellsInCurrentlyProcessedRow = $this->currentlyProcessedRow->getNumCells();
// Only add the value if the last read cell is not the trailing empty-cell filler written by Excel.
// The number of columns read so far is the number of cells in "$this->currentlyProcessedRow".
// This avoids creating a lot of empty cells, as Excel appends a last empty "<table:table-cell>"
// whose number-columns-repeated value equals (supported columns - used columns).
// In Excel, the number of supported columns is 16384, but we don't want to return rows
// that always contain 16384 cells.
if (($numCellsInCurrentlyProcessedRow + $actualNumColumnsRepeated) !== self::MAX_COLUMNS_EXCEL) {
for ($i = 0; $i < $actualNumColumnsRepeated; ++$i) {
$this->currentlyProcessedRow->addCell($this->lastProcessedCell);
}
}
// If we are processing row N and the row is repeated M times,
// then the next row to be processed will be row (N+M).
$this->nextRowIndexToBeProcessed += $this->numRowsRepeated;
// at this point, we have all the data we need for the row
// so that we can populate the buffer
return XMLProcessor::PROCESSING_STOP;
}
/**
* @return int A return code that indicates what action the processor should take next
*/
private function processTableEndingNode(): int
{
// The closing "</table:table>" marks the end of the sheet, which this iterator treats as the end of the file
$this->hasReachedEndOfFile = true;
return XMLProcessor::PROCESSING_STOP;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<table:table-row>" starting node
*
* @return int The value of "table:number-rows-repeated" attribute of the current node, or 1 if attribute missing
*/
private function getNumRowsRepeatedForCurrentNode(XMLReader $xmlReader): int
{
$numRowsRepeated = $xmlReader->getAttribute(self::XML_ATTRIBUTE_NUM_ROWS_REPEATED);
return (null !== $numRowsRepeated) ? (int) $numRowsRepeated : 1;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<table:table-cell>" starting node
*
* @return int The value of "table:number-columns-repeated" attribute of the current node, or 1 if attribute missing
*/
private function getNumColumnsRepeatedForCurrentNode(XMLReader $xmlReader): int
{
$numColumnsRepeated = $xmlReader->getAttribute(self::XML_ATTRIBUTE_NUM_COLUMNS_REPEATED);
return (null !== $numColumnsRepeated) ? (int) $numColumnsRepeated : 1;
}
/**
* Returns a cell holding the unescaped, correctly marshalled value associated with the given XML node.
*
* @return Cell The cell holding the value read from the node, or an ErrorCell if the value is invalid
*/
private function getCell(DOMElement $node): Cell
{
try {
$cellValue = $this->cellValueFormatter->extractAndFormatNodeValue($node);
$cell = Cell::fromValue($cellValue);
} catch (InvalidValueException $exception) {
$cell = new Cell\ErrorCell($exception->getInvalidValue(), null);
}
return $cell;
}
/**
* After finishing processing each cell, a row is considered empty if it contains
* no cells or if the last read cell is empty.
* After finishing processing each cell, the last read cell is not part of the
* row data yet (as we still need to apply the "num-columns-repeated" attribute).
*
* @param null|Cell $lastReadCell The last read cell
*
* @return bool Whether the row is empty
*/
private function isEmptyRow(Row $currentRow, ?Cell $lastReadCell): bool
{
return
$currentRow->isEmpty()
&& (null === $lastReadCell || $lastReadCell instanceof Cell\EmptyCell);
}
}
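
The handling of "number-columns-repeated" in the row iterator above comes down to two rules: every cell is emitted as many times as its repeat count, except a last cell that would pad the row to exactly the column limit. Below is a simplified sketch of those rules. `expandRepeatedCells()` is a hypothetical function, cells are reduced to `[value, repeatCount]` pairs, and the empty-row handling of the real class is left out.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: expands [value, repeatCount] pairs into a flat list of values.
 *
 * @param list<array{0: mixed, 1: int}> $cells
 *
 * @return list<mixed>
 */
function expandRepeatedCells(array $cells, int $maxColumns = 16384): array
{
    $row = [];
    $lastIndex = count($cells) - 1;
    foreach ($cells as $index => [$value, $repeat]) {
        // Drop the last cell when it only pads the row up to the column limit.
        if ($index === $lastIndex && count($row) + $repeat === $maxColumns) {
            break;
        }
        for ($i = 0; $i < $repeat; ++$i) {
            $row[] = $value;
        }
    }

    return $row;
}

// Three used columns followed by a filler cell repeated 16381 times (3 + 16381 = 16384).
$padded = expandRepeatedCells([['a', 1], ['b', 2], ['', 16381]]);
// A short trailing repeat does not reach the limit, so it is kept.
$plain = expandRepeatedCells([['a', 2], ['', 3]]);
```

The check is an exact equality on the total, as in the class above, so a trailing repeat that stops short of the limit is always expanded.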

@ -0,0 +1,81 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\ODS;
use OpenSpout\Reader\SheetWithVisibilityInterface;
/**
* @implements SheetWithVisibilityInterface<RowIterator>
*/
final readonly class Sheet implements SheetWithVisibilityInterface
{
/** @var RowIterator To iterate over sheet's rows */
private RowIterator $rowIterator;
/** @var int Index of the sheet, based on order in the workbook (zero-based) */
private int $index;
/** @var string Name of the sheet */
private string $name;
/** @var bool Whether the sheet was the active one */
private bool $isActive;
/** @var bool Whether the sheet is visible */
private bool $isVisible;
/**
* @param RowIterator $rowIterator The corresponding row iterator
* @param int $sheetIndex Index of the sheet, based on order in the workbook (zero-based)
* @param string $sheetName Name of the sheet
* @param bool $isSheetActive Whether the sheet was defined as active
* @param bool $isSheetVisible Whether the sheet is visible
*/
public function __construct(RowIterator $rowIterator, int $sheetIndex, string $sheetName, bool $isSheetActive, bool $isSheetVisible)
{
$this->rowIterator = $rowIterator;
$this->index = $sheetIndex;
$this->name = $sheetName;
$this->isActive = $isSheetActive;
$this->isVisible = $isSheetVisible;
}
public function getRowIterator(): RowIterator
{
return $this->rowIterator;
}
/**
* @return int Index of the sheet, based on order in the workbook (zero-based)
*/
public function getIndex(): int
{
return $this->index;
}
/**
* @return string Name of the sheet
*/
public function getName(): string
{
return $this->name;
}
/**
* @return bool Whether the sheet was defined as active
*/
public function isActive(): bool
{
return $this->isActive;
}
/**
* @return bool Whether the sheet is visible
*/
public function isVisible(): bool
{
return $this->isVisible;
}
}

@ -0,0 +1,228 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\ODS;
use DOMElement;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Common\Helper\Escaper\ODS;
use OpenSpout\Reader\Common\XMLProcessor;
use OpenSpout\Reader\Exception\XMLProcessingException;
use OpenSpout\Reader\ODS\Helper\CellValueFormatter;
use OpenSpout\Reader\ODS\Helper\SettingsHelper;
use OpenSpout\Reader\SheetIteratorInterface;
use OpenSpout\Reader\Wrapper\XMLReader;
/**
* @implements SheetIteratorInterface<Sheet>
*/
final class SheetIterator implements SheetIteratorInterface
{
public const CONTENT_XML_FILE_PATH = 'content.xml';
public const XML_STYLE_NAMESPACE = 'urn:oasis:names:tc:opendocument:xmlns:style:1.0';
/**
* Definition of XML nodes name and attribute used to parse sheet data.
*/
public const XML_NODE_AUTOMATIC_STYLES = 'office:automatic-styles';
public const XML_NODE_STYLE_TABLE_PROPERTIES = 'table-properties';
public const XML_NODE_TABLE = 'table:table';
public const XML_ATTRIBUTE_STYLE_NAME = 'style:name';
public const XML_ATTRIBUTE_TABLE_NAME = 'table:name';
public const XML_ATTRIBUTE_TABLE_STYLE_NAME = 'table:style-name';
public const XML_ATTRIBUTE_TABLE_DISPLAY = 'table:display';
/** @var string Path of the file to be read */
private readonly string $filePath;
private readonly Options $options;
/** @var XMLReader The XMLReader object that will help read sheet's XML data */
private readonly XMLReader $xmlReader;
/** @var ODS Used to unescape XML data */
private readonly ODS $escaper;
/** @var bool Whether there is still at least one sheet to be read */
private bool $hasFoundSheet;
/** @var int The index of the sheet being read (zero-based) */
private int $currentSheetIndex;
/** @var null|string The name of the sheet that was defined as active, or NULL if none was found */
private readonly ?string $activeSheetName;
/** @var array<string, bool> Associative array [STYLE_NAME] => [IS_SHEET_VISIBLE] */
private array $sheetsVisibility;
public function __construct(
string $filePath,
Options $options,
ODS $escaper,
SettingsHelper $settingsHelper
) {
$this->filePath = $filePath;
$this->options = $options;
$this->xmlReader = new XMLReader();
$this->escaper = $escaper;
$this->activeSheetName = $settingsHelper->getActiveSheetName($filePath);
}
/**
* Rewind the Iterator to the first element.
*
* @see http://php.net/manual/en/iterator.rewind.php
*
* @throws IOException If unable to open the XML file containing sheets' data
*/
public function rewind(): void
{
$this->xmlReader->close();
if (false === $this->xmlReader->openFileInZip($this->filePath, self::CONTENT_XML_FILE_PATH)) {
$contentXmlFilePath = $this->filePath.'#'.self::CONTENT_XML_FILE_PATH;
throw new IOException("Could not open \"{$contentXmlFilePath}\".");
}
try {
$this->sheetsVisibility = $this->readSheetsVisibility();
$this->hasFoundSheet = $this->xmlReader->readUntilNodeFound(self::XML_NODE_TABLE);
} catch (XMLProcessingException $exception) {
throw new IOException("The content.xml file is invalid and cannot be read. [{$exception->getMessage()}]");
}
$this->currentSheetIndex = 0;
}
/**
* Checks if current position is valid.
*
* @see http://php.net/manual/en/iterator.valid.php
*/
public function valid(): bool
{
$valid = $this->hasFoundSheet;
if (!$valid) {
$this->xmlReader->close();
}
return $valid;
}
/**
* Move forward to next element.
*
* @see http://php.net/manual/en/iterator.next.php
*/
public function next(): void
{
$this->hasFoundSheet = $this->xmlReader->readUntilNodeFound(self::XML_NODE_TABLE);
if ($this->hasFoundSheet) {
++$this->currentSheetIndex;
}
}
/**
* Return the current element.
*
* @see http://php.net/manual/en/iterator.current.php
*/
public function current(): Sheet
{
$escapedSheetName = $this->xmlReader->getAttribute(self::XML_ATTRIBUTE_TABLE_NAME);
\assert(null !== $escapedSheetName);
$sheetName = $this->escaper->unescape($escapedSheetName);
$isSheetActive = $this->isSheetActive($sheetName, $this->currentSheetIndex, $this->activeSheetName);
$sheetStyleName = $this->xmlReader->getAttribute(self::XML_ATTRIBUTE_TABLE_STYLE_NAME);
\assert(null !== $sheetStyleName);
$isSheetVisible = $this->isSheetVisible($sheetStyleName);
return new Sheet(
new RowIterator(
$this->options,
new CellValueFormatter($this->options->SHOULD_FORMAT_DATES, new ODS()),
new XMLProcessor($this->xmlReader)
),
$this->currentSheetIndex,
$sheetName,
$isSheetActive,
$isSheetVisible
);
}
/**
* Return the key of the current element.
*
* @see http://php.net/manual/en/iterator.key.php
*/
public function key(): int
{
return $this->currentSheetIndex + 1;
}
/**
* Extracts the visibility of the sheets.
*
* @return array<string, bool> Associative array [STYLE_NAME] => [IS_SHEET_VISIBLE]
*/
private function readSheetsVisibility(): array
{
$sheetsVisibility = [];
$this->xmlReader->readUntilNodeFound(self::XML_NODE_AUTOMATIC_STYLES);
$automaticStylesNode = $this->xmlReader->expand();
\assert($automaticStylesNode instanceof DOMElement);
$tableStyleNodes = $automaticStylesNode->getElementsByTagNameNS(self::XML_STYLE_NAMESPACE, self::XML_NODE_STYLE_TABLE_PROPERTIES);
foreach ($tableStyleNodes as $tableStyleNode) {
$isSheetVisible = ('false' !== $tableStyleNode->getAttribute(self::XML_ATTRIBUTE_TABLE_DISPLAY));
$parentStyleNode = $tableStyleNode->parentNode;
\assert($parentStyleNode instanceof DOMElement);
$styleName = $parentStyleNode->getAttribute(self::XML_ATTRIBUTE_STYLE_NAME);
$sheetsVisibility[$styleName] = $isSheetVisible;
}
return $sheetsVisibility;
}
/**
* Returns whether the current sheet was defined as the active one.
*
* @param string $sheetName Name of the current sheet
* @param int $sheetIndex Index of the current sheet
* @param null|string $activeSheetName Name of the sheet that was defined as active or NULL if none defined
*
* @return bool Whether the current sheet was defined as the active one
*/
private function isSheetActive(string $sheetName, int $sheetIndex, ?string $activeSheetName): bool
{
// The given sheet is active if its name matches the defined active sheet's name
// or if no information about the active sheet was found, it defaults to the first sheet.
return
(null === $activeSheetName && 0 === $sheetIndex)
|| ($activeSheetName === $sheetName);
}
/**
* Returns whether the current sheet is visible.
*
* @param string $sheetStyleName Name of the sheet style
*
* @return bool Whether the current sheet is visible
*/
private function isSheetVisible(string $sheetStyleName): bool
{
return $this->sheetsVisibility[$sheetStyleName] ?? true;
}
}
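
The visibility map built by `readSheetsVisibility()` can be illustrated outside the iterator. The sketch below assumes the DOM extension and works on an in-memory fragment instead of a streamed `content.xml`. The free function is a simplified stand-in for the private method above, and the style names `ta1` and `ta2` are made up.

```php
<?php
declare(strict_types=1);

const STYLE_NS = 'urn:oasis:names:tc:opendocument:xmlns:style:1.0';

/**
 * Hypothetical stand-in: maps each table style name to whether sheets using it are visible.
 *
 * @return array<string, bool>
 */
function readSheetsVisibility(DOMElement $automaticStyles): array
{
    $visibility = [];
    foreach ($automaticStyles->getElementsByTagNameNS(STYLE_NS, 'table-properties') as $properties) {
        $style = $properties->parentNode;
        if (!$properties instanceof DOMElement || !$style instanceof DOMElement) {
            continue;
        }
        // Only an explicit table:display="false" hides a sheet.
        $visibility[$style->getAttribute('style:name')] = 'false' !== $properties->getAttribute('table:display');
    }

    return $visibility;
}

$xml = <<<'XML'
<office:automatic-styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0">
  <style:style style:name="ta1" style:family="table">
    <style:table-properties table:display="true"/>
  </style:style>
  <style:style style:name="ta2" style:family="table">
    <style:table-properties table:display="false"/>
  </style:style>
</office:automatic-styles>
XML;
$document = new DOMDocument();
$document->loadXML($xml);

$visibility = readSheetsVisibility($document->documentElement);
// Unknown style names fall back to "visible", as in isSheetVisible().
$unknownIsVisible = $visibility['ta3'] ?? true;
```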

@ -0,0 +1,37 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader;
use OpenSpout\Common\Exception\IOException;
/**
* @template T of SheetIteratorInterface
*/
interface ReaderInterface
{
/**
* Prepares the reader to read the given file. It also makes sure
* that the file exists and is readable.
*
* @param string $filePath Path of the file to be read
*
* @throws IOException
*/
public function open(string $filePath): void;
/**
* Returns an iterator to iterate over sheets.
*
* @return T
*
* @throws Exception\ReaderNotOpenedException If called before opening the reader
*/
public function getSheetIterator(): SheetIteratorInterface;
/**
* Closes the reader, preventing any additional reading.
*/
public function close(): void;
}

@ -0,0 +1,16 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader;
use Iterator;
use OpenSpout\Common\Entity\Row;
/**
* @extends Iterator<Row>
*/
interface RowIteratorInterface extends Iterator
{
public function current(): ?Row;
}
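
The ODS `RowIterator` earlier in this import implements this interface as a forward-only iterator: the first `rewind()` starts the traversal and a second one throws. Here is a minimal sketch of that contract, using a hypothetical `ForwardOnlyLines` class over a stream instead of an XML reader and a plain `LogicException` instead of the library's own exception.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical forward-only iterator over the lines of a stream (not OpenSpout API).
 *
 * @implements Iterator<int, string>
 */
final class ForwardOnlyLines implements Iterator
{
    private bool $hasAlreadyBeenRewound = false;
    private int $key = 0;
    private string|false $line = false;

    /** @param resource $stream */
    public function __construct(private $stream) {}

    public function rewind(): void
    {
        // The underlying stream is only read forwards, so a second pass is refused.
        if ($this->hasAlreadyBeenRewound) {
            throw new LogicException('This iterator can only be traversed once.');
        }
        $this->hasAlreadyBeenRewound = true;
        $this->next();
    }

    public function valid(): bool
    {
        return false !== $this->line;
    }

    public function next(): void
    {
        $this->line = fgets($this->stream);
        if (false !== $this->line) {
            ++$this->key; // one-based, like the row index above
        }
    }

    public function current(): string
    {
        return rtrim((string) $this->line, "\n");
    }

    public function key(): int
    {
        return $this->key;
    }
}

$stream = fopen('php://memory', 'w+');
fwrite($stream, "a\nb\n");
rewind($stream);

$lines = new ForwardOnlyLines($stream);
$firstPass = iterator_to_array($lines);

$secondPassRefused = false;
try {
    iterator_to_array($lines);
} catch (LogicException) {
    $secondPassRefused = true;
}
```

Callers that need the rows twice would have to collect them during the first pass or reopen the source.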

@ -0,0 +1,31 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader;
/**
* @template T of RowIteratorInterface
*/
interface SheetInterface
{
/**
* @return T iterator to iterate over the sheet's rows
*/
public function getRowIterator(): RowIteratorInterface;
/**
* @return int Index of the sheet
*/
public function getIndex(): int;
/**
* @return string Name of the sheet
*/
public function getName(): string;
/**
* @return bool Whether the sheet was defined as active
*/
public function isActive(): bool;
}

@ -0,0 +1,20 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader;
use Iterator;
/**
* @template T of SheetInterface
*
* @extends Iterator<T>
*/
interface SheetIteratorInterface extends Iterator
{
/**
* @return T of SheetInterface
*/
public function current(): SheetInterface;
}


@ -0,0 +1,18 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader;
/**
* @template T of RowIteratorInterface
*
* @extends SheetInterface<T>
*/
interface SheetWithMergeCellsInterface extends SheetInterface
{
/**
* @return list<string> List of merged cell ranges, e.g. ["C7:E7", "A9:D10"]
*/
public function getMergeCells(): array;
}


@ -0,0 +1,18 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader;
/**
* @template T of RowIteratorInterface
*
* @extends SheetInterface<T>
*/
interface SheetWithVisibilityInterface extends SheetInterface
{
/**
* @return bool Whether the sheet is visible
*/
public function isVisible(): bool;
}


@ -0,0 +1,77 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Wrapper;
use OpenSpout\Reader\Exception\XMLProcessingException;
/**
* @internal
*/
trait XMLInternalErrorsHelper
{
/** @var bool Stores whether XML errors were initially stored internally - used to reset */
private bool $initialUseInternalErrorsValue;
/**
* To avoid displaying lots of warning/error messages on screen,
* stores errors internally instead.
*/
private function useXMLInternalErrors(): void
{
libxml_clear_errors();
$this->initialUseInternalErrorsValue = libxml_use_internal_errors(true);
}
/**
* Throws an XMLProcessingException if an error occurred.
* It also always resets the "libxml_use_internal_errors" setting back to its initial value.
*
* @throws XMLProcessingException
*/
private function resetXMLInternalErrorsSettingAndThrowIfXMLErrorOccured(): void
{
if ($this->hasXMLErrorOccured()) {
$this->resetXMLInternalErrorsSetting();
throw new XMLProcessingException($this->getLastXMLErrorMessage());
}
$this->resetXMLInternalErrorsSetting();
}
private function resetXMLInternalErrorsSetting(): void
{
libxml_use_internal_errors($this->initialUseInternalErrorsValue);
}
/**
* Returns whether an XML error has occurred since the last time errors were cleared.
*
* @return bool TRUE if an error occurred, FALSE otherwise
*/
private function hasXMLErrorOccured(): bool
{
return false !== libxml_get_last_error();
}
/**
* Returns the error message for the last XML error that occurred.
*
* @see libxml_get_last_error
*
* @return string Last XML error message, or an empty string if no error occurred
*/
private function getLastXMLErrorMessage(): string
{
$errorMessage = '';
$error = libxml_get_last_error();
if (false !== $error) {
$errorMessage = trim($error->message);
}
return $errorMessage;
}
}
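
The trait above wraps libxml calls in a save/restore of the "use internal errors" setting. A minimal standalone sketch of that pattern follows, assuming the SimpleXML extension is available. The function name `parseXmlOrFail` is hypothetical and not part of OpenSpout; only the `libxml_*` calls mirror what the trait does.

```php
<?php
// Hypothetical helper (not part of OpenSpout): parses an XML string while
// keeping libxml errors internal instead of emitting PHP warnings.
function parseXmlOrFail(string $xml): SimpleXMLElement
{
    libxml_clear_errors();
    // libxml_use_internal_errors() returns the previous setting, so it can be restored later.
    $previousSetting = libxml_use_internal_errors(true);

    try {
        $document = simplexml_load_string($xml);
        $error = libxml_get_last_error();
        if (false === $document || false !== $error) {
            $message = (false !== $error) ? trim($error->message) : 'Unknown XML error';

            throw new RuntimeException($message);
        }

        return $document;
    } finally {
        // Always restore the caller's setting, whether parsing succeeded or not.
        libxml_clear_errors();
        libxml_use_internal_errors($previousSetting);
    }
}
```

Using `finally` is a design choice of this sketch; the trait instead resets the setting explicitly on both the error and the success path.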


@ -0,0 +1,187 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\Wrapper;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Reader\Exception\XMLProcessingException;
use ZipArchive;
/**
* @internal
*/
final class XMLReader extends \XMLReader
{
use XMLInternalErrorsHelper;
public const ZIP_WRAPPER = 'zip://';
/**
* Opens the XML Reader to read a file located inside a ZIP file.
*
* @param string $zipFilePath Path to the ZIP file
* @param string $fileInsideZipPath Relative or absolute path of the file inside the zip
*
* @return bool TRUE on success or FALSE on failure
*/
public function openFileInZip(string $zipFilePath, string $fileInsideZipPath): bool
{
$wasOpenSuccessful = false;
$realPathURI = $this->getRealPathURIForFileInZip($zipFilePath, $fileInsideZipPath);
// Check first that the file we are trying to read really exists,
// because PHP emits a warning when trying to open a file that does not exist.
if ($this->fileExistsWithinZip($realPathURI)) {
$wasOpenSuccessful = $this->open($realPathURI, null, LIBXML_NONET);
}
return $wasOpenSuccessful;
}
/**
* Returns the real path for the given path components.
* This is useful to avoid issues on some Windows setups.
*
* @param string $zipFilePath Path to the ZIP file
* @param string $fileInsideZipPath Relative or absolute path of the file inside the zip
*
* @return string The real path URI
*
* @throws IOException If the ZIP file does not exist
*/
public function getRealPathURIForFileInZip(string $zipFilePath, string $fileInsideZipPath): string
{
// The file path should not start with a '/', otherwise it won't be found
$fileInsideZipPathWithoutLeadingSlash = ltrim($fileInsideZipPath, '/');
$realpath = realpath($zipFilePath);
if (false === $realpath) {
throw new IOException("Could not open {$zipFilePath} for reading! File does not exist.");
}
return self::ZIP_WRAPPER.$realpath.'#'.$fileInsideZipPathWithoutLeadingSlash;
}
/**
* Move to next node in document.
*
* @see \XMLReader::read
*
* @throws XMLProcessingException If an error/warning occurred
*/
public function read(): bool
{
$this->useXMLInternalErrors();
$wasReadSuccessful = parent::read();
$this->resetXMLInternalErrorsSettingAndThrowIfXMLErrorOccured();
return $wasReadSuccessful;
}
/**
* Reads until the element with the given name is found or the end of the file is reached.
*
* @param string $nodeName Name of the node to find
*
* @return bool TRUE on success or FALSE on failure
*
* @throws XMLProcessingException If an error/warning occurred
*/
public function readUntilNodeFound(string $nodeName): bool
{
do {
$wasReadSuccessful = $this->read();
$isNotPositionedOnStartingNode = !$this->isPositionedOnStartingNode($nodeName);
} while ($wasReadSuccessful && $isNotPositionedOnStartingNode);
return $wasReadSuccessful;
}
/**
* Move cursor to next node skipping all subtrees.
*
* @see \XMLReader::next
*
* @param null|string $localName The name of the next node to move to
*
* @throws XMLProcessingException If an error/warning occurred
*/
public function next($localName = null): bool
{
$this->useXMLInternalErrors();
$wasNextSuccessful = parent::next($localName);
$this->resetXMLInternalErrorsSettingAndThrowIfXMLErrorOccured();
return $wasNextSuccessful;
}
/**
* @return bool Whether the XML Reader is currently positioned on the starting node with given name
*/
public function isPositionedOnStartingNode(string $nodeName): bool
{
return $this->isPositionedOnNode($nodeName, self::ELEMENT);
}
/**
* @return bool Whether the XML Reader is currently positioned on the ending node with given name
*/
public function isPositionedOnEndingNode(string $nodeName): bool
{
return $this->isPositionedOnNode($nodeName, self::END_ELEMENT);
}
/**
* @return string The name of the current node, un-prefixed
*/
public function getCurrentNodeName(): string
{
return $this->localName;
}
/**
* Returns whether the file at the given location exists.
*
* @param string $zipStreamURI URI of a zip stream, e.g. "zip://file.zip#path/inside.xml"
*
* @return bool TRUE if the file exists, FALSE otherwise
*/
private function fileExistsWithinZip(string $zipStreamURI): bool
{
$doesFileExists = false;
$pattern = '/zip:\/\/([^#]+)#(.*)/';
if (1 === preg_match($pattern, $zipStreamURI, $matches)) {
$zipFilePath = $matches[1];
$innerFilePath = $matches[2];
$zip = new ZipArchive();
if (true === $zip->open($zipFilePath)) {
$doesFileExists = (false !== $zip->locateName($innerFilePath));
$zip->close();
}
}
return $doesFileExists;
}
/**
* @return bool Whether the XML Reader is currently positioned on the node with given name and type
*/
private function isPositionedOnNode(string $nodeName, int $nodeType): bool
{
/**
* In some cases, the node has a prefix (for instance, "<sheet>" can also be "<x:sheet>").
* So if the given node name does not have a prefix, we need to look at the unprefixed name ("localName").
*
* @see https://github.com/box/spout/issues/233
*/
$hasPrefix = str_contains($nodeName, ':');
$currentNodeName = ($hasPrefix) ? $this->name : $this->localName;
return $this->nodeType === $nodeType && $currentNodeName === $nodeName;
}
}


@ -0,0 +1,85 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Helper;
use OpenSpout\Common\Exception\InvalidArgumentException;
/**
* @internal
*/
final class CellHelper
{
// Using ord() is super slow... Using a pre-computed hash table instead.
private const columnLetterToIndexMapping = [
'A' => 0, 'B' => 1, 'C' => 2, 'D' => 3, 'E' => 4, 'F' => 5, 'G' => 6,
'H' => 7, 'I' => 8, 'J' => 9, 'K' => 10, 'L' => 11, 'M' => 12, 'N' => 13,
'O' => 14, 'P' => 15, 'Q' => 16, 'R' => 17, 'S' => 18, 'T' => 19, 'U' => 20,
'V' => 21, 'W' => 22, 'X' => 23, 'Y' => 24, 'Z' => 25,
];
/**
* Returns the base 10 column index associated to the cell index (base 26).
* Excel uses A to Z letters for column indexing, where A is the 1st column,
* Z is the 26th and AA is the 27th.
* The mapping is zero based, so that A1 maps to 0, B2 maps to 1, Z13 to 25 and AA4 to 26.
*
* @param string $cellIndex The Excel cell index ('A1', 'BC13', ...)
*
* @throws InvalidArgumentException When the given cell index is invalid
*/
public static function getColumnIndexFromCellIndex(string $cellIndex): int
{
if (!self::isValidCellIndex($cellIndex)) {
throw new InvalidArgumentException('Cannot get column index from an invalid cell index.');
}
$columnIndex = 0;
// Remove row information
$columnLetters = preg_replace('/\d/', '', $cellIndex);
// strlen() is super slow too... Using isset() is way faster and not too unreadable,
// since we checked before that there are between 1 and 3 letters.
$columnLength = isset($columnLetters[1]) ? (isset($columnLetters[2]) ? 3 : 2) : 1;
// Looping over the different letters of the column is slower than this method.
// Also, not using the pow() function because it's slooooow...
switch ($columnLength) {
case 1:
$columnIndex = self::columnLetterToIndexMapping[$columnLetters];
break;
case 2:
$firstLetterIndex = (self::columnLetterToIndexMapping[$columnLetters[0]] + 1) * 26;
$secondLetterIndex = self::columnLetterToIndexMapping[$columnLetters[1]];
$columnIndex = $firstLetterIndex + $secondLetterIndex;
break;
case 3:
$firstLetterIndex = (self::columnLetterToIndexMapping[$columnLetters[0]] + 1) * 676;
$secondLetterIndex = (self::columnLetterToIndexMapping[$columnLetters[1]] + 1) * 26;
$thirdLetterIndex = self::columnLetterToIndexMapping[$columnLetters[2]];
$columnIndex = $firstLetterIndex + $secondLetterIndex + $thirdLetterIndex;
break;
}
return $columnIndex;
}
/**
* Returns whether a cell index is valid, in an Excel world.
* To be valid, the cell index should start with capital letters and be followed by numbers.
* There can be at most 3 letters, as there can only be 16,384 columns, the last of which is 'XFD'.
*
* @param string $cellIndex The Excel cell index ('A1', 'BC13', ...)
*/
private static function isValidCellIndex(string $cellIndex): bool
{
return 1 === preg_match('/^[A-Z]{1,3}\d+$/', $cellIndex);
}
}
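
For comparison with the unrolled `switch` above, here is a minimal sketch of the same letters-to-index conversion written as a general loop. It is not the upstream implementation (which avoids the loop and `ord()` for speed); the function name `columnLettersToIndex` is hypothetical, and the input is assumed to contain only uppercase letters A to Z.

```php
<?php
// Hypothetical helper (not part of OpenSpout): converts Excel column letters
// to a zero-based column index using bijective base 26.
function columnLettersToIndex(string $columnLetters): int
{
    $oneBasedIndex = 0;
    foreach (str_split($columnLetters) as $letter) {
        // Each letter is a "digit" from 1 (A) to 26 (Z); there is no zero digit.
        $oneBasedIndex = $oneBasedIndex * 26 + (ord($letter) - ord('A') + 1);
    }

    // Convert from Excel's one-based column number to a zero-based index.
    return $oneBasedIndex - 1;
}
```

With this sketch, `A` maps to 0, `Z` to 25, `AA` to 26 and `XFD` to 16383, which matches the arithmetic of the three `case` branches.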


@ -0,0 +1,344 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Helper;
use DateInterval;
use DateTimeImmutable;
use DOMElement;
use Exception;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Helper\Escaper\XLSX;
use OpenSpout\Reader\Exception\InvalidValueException;
use OpenSpout\Reader\XLSX\Manager\SharedStringsManager;
use OpenSpout\Reader\XLSX\Manager\StyleManagerInterface;
/**
* This class provides helper functions to format cell values.
*/
final readonly class CellValueFormatter
{
/**
* Definition of all possible cell types.
*/
public const CELL_TYPE_INLINE_STRING = 'inlineStr';
public const CELL_TYPE_STR = 'str';
public const CELL_TYPE_SHARED_STRING = 's';
public const CELL_TYPE_BOOLEAN = 'b';
public const CELL_TYPE_NUMERIC = 'n';
public const CELL_TYPE_DATE = 'd';
public const CELL_TYPE_ERROR = 'e';
/**
* Definition of XML nodes names used to parse data.
*/
public const XML_NODE_VALUE = 'v';
public const XML_NODE_INLINE_STRING_VALUE = 't';
public const XML_NODE_FORMULA = 'f';
/**
* Definition of XML attributes used to parse data.
*/
public const XML_ATTRIBUTE_TYPE = 't';
public const XML_ATTRIBUTE_STYLE_ID = 's';
/**
* Constants used for date formatting.
*/
public const NUM_SECONDS_IN_ONE_DAY = 86400;
/** @var SharedStringsManager Manages shared strings */
private SharedStringsManager $sharedStringsManager;
/** @var StyleManagerInterface Manages styles */
private StyleManagerInterface $styleManager;
/** @var bool Whether date/time values should be returned as PHP objects or be formatted as strings */
private bool $shouldFormatDates;
/** @var bool Whether date/time values should use a calendar starting in 1904 instead of 1900 */
private bool $shouldUse1904Dates;
/** @var XLSX Used to unescape XML data */
private XLSX $escaper;
/**
* @param SharedStringsManager $sharedStringsManager Manages shared strings
* @param StyleManagerInterface $styleManager Manages styles
* @param bool $shouldFormatDates Whether date/time values should be returned as PHP objects or be formatted as strings
* @param bool $shouldUse1904Dates Whether date/time values should use a calendar starting in 1904 instead of 1900
* @param XLSX $escaper Used to unescape XML data
*/
public function __construct(
SharedStringsManager $sharedStringsManager,
StyleManagerInterface $styleManager,
bool $shouldFormatDates,
bool $shouldUse1904Dates,
XLSX $escaper
) {
$this->sharedStringsManager = $sharedStringsManager;
$this->styleManager = $styleManager;
$this->shouldFormatDates = $shouldFormatDates;
$this->shouldUse1904Dates = $shouldUse1904Dates;
$this->escaper = $escaper;
}
/**
* Returns the unescaped, correctly marshalled cell value associated with the given XML node.
*/
public function extractAndFormatNodeValue(DOMElement $node): Cell
{
// Default cell type is "n"
$cellType = $node->getAttribute(self::XML_ATTRIBUTE_TYPE);
if ('' === $cellType) {
$cellType = self::CELL_TYPE_NUMERIC;
}
$vNodeValue = $this->getVNodeValue($node);
$fNodeValue = $node->getElementsByTagName(self::XML_NODE_FORMULA)->item(0)?->nodeValue;
if (null !== $fNodeValue) {
$computedValue = $this->formatRawValueForCellType($cellType, $node, $vNodeValue);
return new Cell\FormulaCell(
'='.$fNodeValue,
null,
$computedValue instanceof Cell\ErrorCell ? null : $computedValue
);
}
if ('' === $vNodeValue && self::CELL_TYPE_INLINE_STRING !== $cellType) {
return Cell::fromValue($vNodeValue);
}
$rawValue = $this->formatRawValueForCellType($cellType, $node, $vNodeValue);
if ($rawValue instanceof Cell) {
return $rawValue;
}
return Cell::fromValue($rawValue);
}
/**
* Returns the cell's string value from a node's nested value node.
*
* @return string The value associated with the cell
*/
private function getVNodeValue(DOMElement $node): string
{
// Only some cell types have a "v" node containing the value.
// If it is absent, an empty string is returned.
$vNode = $node->getElementsByTagName(self::XML_NODE_VALUE)->item(0);
return (string) $vNode?->nodeValue;
}
/**
* Returns the cell String value where string is inline.
*
* @return string The value associated with the cell
*/
private function formatInlineStringCellValue(DOMElement $node): string
{
// inline strings are formatted this way (they can contain any number of <t> nodes):
// <c r="A1" t="inlineStr"><is><t>[INLINE_STRING]</t><t>[INLINE_STRING_2]</t></is></c>
$tNodes = $node->getElementsByTagName(self::XML_NODE_INLINE_STRING_VALUE);
$cellValue = '';
for ($i = 0; $i < $tNodes->count(); ++$i) {
$nodeValue = $tNodes->item($i)->nodeValue;
\assert(null !== $nodeValue);
$cellValue .= $this->escaper->unescape($nodeValue);
}
return $cellValue;
}
/**
* Returns the cell String value from shared-strings file using nodeValue index.
*
* @return string The value associated with the cell
*/
private function formatSharedStringCellValue(string $nodeValue): string
{
// shared strings are formatted this way:
// <c r="A1" t="s"><v>[SHARED_STRING_INDEX]</v></c>
$sharedStringIndex = (int) $nodeValue;
$escapedCellValue = $this->sharedStringsManager->getStringAtIndex($sharedStringIndex);
return $this->escaper->unescape($escapedCellValue);
}
/**
* Returns the cell String value, where string is stored in value node.
*
* @return string The value associated with the cell
*/
private function formatStrCellValue(string $nodeValue): string
{
$escapedCellValue = trim($nodeValue);
return $this->escaper->unescape($escapedCellValue);
}
/**
* Returns the cell Numeric value from string of nodeValue.
* The value can also represent a timestamp or a duration, in which case a DateTimeImmutable or DateInterval (or its formatted string) is returned.
*
* @param int $cellStyleId 0 being the default style
*/
private function formatNumericCellValue(float|int|string $nodeValue, int $cellStyleId): DateInterval|DateTimeImmutable|float|int|string
{
// Numeric values can represent numbers as well as timestamps.
// We need to look at the style of the cell to determine whether it is one or the other.
$formatCode = $this->styleManager->getNumberFormatCode($cellStyleId);
if (DateIntervalFormatHelper::isDurationFormat($formatCode)) {
$cellValue = $this->formatExcelDateIntervalValue((float) $nodeValue, $formatCode);
} elseif ($this->styleManager->shouldFormatNumericValueAsDate($cellStyleId)) {
$cellValue = $this->formatExcelTimestampValue((float) $nodeValue, $cellStyleId);
} else {
$nodeIntValue = (int) $nodeValue;
$nodeFloatValue = (float) $nodeValue;
$cellValue = ((float) $nodeIntValue === $nodeFloatValue) ? $nodeIntValue : $nodeFloatValue;
}
return $cellValue;
}
private function formatExcelDateIntervalValue(float $nodeValue, string $excelFormat): DateInterval|string
{
$dateInterval = DateIntervalFormatHelper::createDateIntervalFromHours($nodeValue);
if ($this->shouldFormatDates) {
return DateIntervalFormatHelper::formatDateInterval($dateInterval, $excelFormat);
}
return $dateInterval;
}
/**
* Returns a cell's PHP Date value, associated to the given timestamp.
* NOTE: The timestamp is a float representing the number of days since the base Excel date:
* Dec 30th, 1899 (1900 date system) or Jan 1st, 1904 (1904 date system), depending on the Workbook setting.
* NOTE: The timestamp can also represent a time, if it is a value between 0 and 1.
*
* @param int $cellStyleId 0 being the default style
*
* @throws InvalidValueException If the value is not a valid timestamp
*
* @see ECMA-376 Part 1 - §18.17.4
*/
private function formatExcelTimestampValue(float $nodeValue, int $cellStyleId): DateTimeImmutable|string
{
if (!$this->isValidTimestampValue($nodeValue)) {
throw new InvalidValueException((string) $nodeValue);
}
return $this->formatExcelTimestampValueAsDateTimeValue($nodeValue, $cellStyleId);
}
/**
* Returns whether the given timestamp is supported by SpreadsheetML.
*
* @see ECMA-376 Part 1 - §18.17.4 - this specifies the timestamp boundaries.
*/
private function isValidTimestampValue(float $timestampValue): bool
{
// @NOTE: some versions of Excel don't support negative dates (e.g. Excel for Mac 2011)
return
$this->shouldUse1904Dates && $timestampValue >= -695055 && $timestampValue <= 2957003.9999884
|| !$this->shouldUse1904Dates && $timestampValue >= -693593 && $timestampValue <= 2958465.9999884;
}
/**
* Returns a cell's PHP DateTime value, associated to the given timestamp.
* The integer part is added as days and the fractional part as seconds to the base Excel date:
* Dec 30th, 1899 (1900 date system) or Jan 1st, 1904 (1904 date system), depending on the Workbook setting.
*
* @param int $cellStyleId 0 being the default style
*/
private function formatExcelTimestampValueAsDateTimeValue(float $nodeValue, int $cellStyleId): DateTimeImmutable|string
{
$baseDate = $this->shouldUse1904Dates ? '1904-01-01' : '1899-12-30';
$daysSinceBaseDate = (int) $nodeValue;
$daysSign = '+';
if ($daysSinceBaseDate < 0) {
$daysSinceBaseDate = abs($daysSinceBaseDate);
$daysSign = '-';
}
$timeRemainder = fmod($nodeValue, 1);
$secondsRemainder = round($timeRemainder * self::NUM_SECONDS_IN_ONE_DAY, 0);
$secondsSign = '+';
if ($secondsRemainder < 0) {
$secondsRemainder = abs($secondsRemainder);
$secondsSign = '-';
}
$dateObj = DateTimeImmutable::createFromFormat('|Y-m-d', $baseDate);
\assert(false !== $dateObj);
$dateObj = $dateObj->modify($daysSign.$daysSinceBaseDate.'days');
\assert(false !== $dateObj);
$dateObj = $dateObj->modify($secondsSign.$secondsRemainder.'seconds');
\assert(false !== $dateObj);
if ($this->shouldFormatDates) {
$styleNumberFormatCode = $this->styleManager->getNumberFormatCode($cellStyleId);
$phpDateFormat = DateFormatHelper::toPHPDateFormat($styleNumberFormatCode);
$cellValue = $dateObj->format($phpDateFormat);
} else {
$cellValue = $dateObj;
}
return $cellValue;
}
/**
* Returns the cell Boolean value from a specific node's Value.
*
* @return bool The value associated with the cell
*/
private function formatBooleanCellValue(string $nodeValue): bool
{
return (bool) $nodeValue;
}
/**
* Returns a cell's PHP Date value, associated to the given stored nodeValue.
*
* @see ECMA-376 Part 1 - §18.17.4
*
* @param string $nodeValue ISO 8601 Date string
*/
private function formatDateCellValue(string $nodeValue): Cell\ErrorCell|DateTimeImmutable|string
{
// Mitigate thrown Exception on invalid date-time format (http://php.net/manual/en/datetime.construct.php)
try {
$cellValue = ($this->shouldFormatDates) ? $nodeValue : new DateTimeImmutable($nodeValue);
} catch (Exception) {
return new Cell\ErrorCell($nodeValue, null);
}
return $cellValue;
}
private function formatRawValueForCellType(
string $cellType,
DOMElement $node,
string $vNodeValue
): bool|Cell\ErrorCell|DateInterval|DateTimeImmutable|float|int|string {
return match ($cellType) {
self::CELL_TYPE_INLINE_STRING => $this->formatInlineStringCellValue($node),
self::CELL_TYPE_SHARED_STRING => $this->formatSharedStringCellValue($vNodeValue),
self::CELL_TYPE_STR => $this->formatStrCellValue($vNodeValue),
self::CELL_TYPE_BOOLEAN => $this->formatBooleanCellValue($vNodeValue),
self::CELL_TYPE_NUMERIC => $this->formatNumericCellValue(
$vNodeValue,
(int) $node->getAttribute(self::XML_ATTRIBUTE_STYLE_ID)
),
self::CELL_TYPE_DATE => $this->formatDateCellValue($vNodeValue),
default => new Cell\ErrorCell($vNodeValue, null),
};
}
}
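
The timestamp handling above splits the Excel serial value into whole days and a fraction of a day. A minimal standalone sketch of that conversion follows. The function name `excelSerialToDateTime` is hypothetical, and pinning the time zone to UTC is an assumption of this sketch (the class above relies on PHP's default time zone); range validation is omitted.

```php
<?php
// Hypothetical helper (not part of OpenSpout): converts an Excel serial date
// to a DateTimeImmutable. UTC is assumed here to keep the result deterministic.
function excelSerialToDateTime(float $serial, bool $use1904Dates = false): DateTimeImmutable
{
    $baseDate = $use1904Dates ? '1904-01-01 00:00:00' : '1899-12-30 00:00:00';
    $date = new DateTimeImmutable($baseDate, new DateTimeZone('UTC'));

    // Integer part: days since the base date (truncated toward zero).
    $days = (int) $serial;
    // Fractional part: share of a day, converted to whole seconds.
    $seconds = (int) round(fmod($serial, 1) * 86400);

    // "%+d" always prints the sign, producing e.g. "+45000 days" or "-3 days".
    return $date
        ->modify(sprintf('%+d days', $days))
        ->modify(sprintf('%+d seconds', $seconds));
}
```

For example, the serial value 45000.5 in the 1900 date system corresponds to noon on 2023-03-15.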


@ -0,0 +1,125 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Helper;
/**
* @internal
*/
final class DateFormatHelper
{
public const KEY_GENERAL = 'general';
public const KEY_HOUR_12 = '12h';
public const KEY_HOUR_24 = '24h';
/**
* This map is used to replace Excel format characters by their PHP equivalent.
* Keys should be ordered from longest to shortest.
*/
private const excelDateFormatToPHPDateFormatMapping = [
self::KEY_GENERAL => [
// Time
'am/pm' => 'A', // Uppercase Ante meridiem and Post meridiem
':mm' => ':i', // Minutes with leading zeros - if preceded by a ":" (otherwise month)
'mm:' => 'i:', // Minutes with leading zeros - if followed by a ":" (otherwise month)
'ss' => 's', // Seconds, with leading zeros
'.s' => '', // Ignore (fractional seconds format does not exist in PHP)
// Date
'e' => 'Y', // Full numeric representation of a year, 4 digits
'yyyy' => 'Y', // Full numeric representation of a year, 4 digits
'yy' => 'y', // Two digit representation of a year
'mmmmm' => 'M', // Short textual representation of a month, three letters ("mmmmm" should only contain the 1st letter...)
'mmmm' => 'F', // Full textual representation of a month
'mmm' => 'M', // Short textual representation of a month, three letters
'mm' => 'm', // Numeric representation of a month, with leading zeros
'm' => 'n', // Numeric representation of a month, without leading zeros
'dddd' => 'l', // Full textual representation of the day of the week
'ddd' => 'D', // Textual representation of a day, three letters
'dd' => 'd', // Day of the month, 2 digits with leading zeros
'd' => 'j', // Day of the month without leading zeros
],
self::KEY_HOUR_12 => [
'hh' => 'h', // 12-hour format of an hour with leading zeros
'h' => 'g', // 12-hour format of an hour without leading zeros
],
self::KEY_HOUR_24 => [
'hh' => 'H', // 24-hour format of an hour with leading zeros
'h' => 'G', // 24-hour format of an hour without leading zeros
],
];
/**
* Converts the given Excel date format to a format understandable by the PHP date function.
*
* @param string $excelDateFormat Excel date format
*
* @return string PHP date format (as defined here: http://php.net/manual/en/function.date.php)
*/
public static function toPHPDateFormat(string $excelDateFormat): string
{
// Remove brackets potentially present at the beginning of the format string
// and text portion of the format at the end of it (starting with ";")
// See §18.8.31 of ECMA-376 for more detail.
$dateFormat = preg_replace('/^(?:\[\$[^\]]+?\])?([^;]*).*/', '$1', $excelDateFormat);
\assert(null !== $dateFormat);
// Double quotes are used to escape characters that must not be interpreted.
// For instance, ["Day " dd] should result in "Day 13" and we should not try to interpret "D", "a", "y"
// By exploding the format string using double quote as a delimiter, we can get all parts
// that must be transformed (even indexes) and all parts that must not be (odd indexes).
$dateFormatParts = explode('"', $dateFormat);
foreach ($dateFormatParts as $partIndex => $dateFormatPart) {
// do not look at odd indexes
if (1 === $partIndex % 2) {
continue;
}
// Make sure all characters are lowercase, as the mapping table is using lowercase characters
$transformedPart = strtolower($dateFormatPart);
// Remove escapes related to non-format characters
$transformedPart = str_replace('\\', '', $transformedPart);
// Apply general transformation first...
$transformedPart = strtr($transformedPart, self::excelDateFormatToPHPDateFormatMapping[self::KEY_GENERAL]);
// ... then apply hour transformation, for 12-hour or 24-hour format
if (self::has12HourFormatMarker($dateFormatPart)) {
$transformedPart = strtr($transformedPart, self::excelDateFormatToPHPDateFormatMapping[self::KEY_HOUR_12]);
} else {
$transformedPart = strtr($transformedPart, self::excelDateFormatToPHPDateFormatMapping[self::KEY_HOUR_24]);
}
// overwrite the parts array with the new transformed part
$dateFormatParts[$partIndex] = $transformedPart;
}
// Merge all transformed parts back together
$phpDateFormat = implode('"', $dateFormatParts);
// Finally, to have the date format compatible with the DateTime::format() function, we need to escape
// all characters that are inside double quotes (and double quotes must be removed).
// For instance, ["Day " dd] should become [\D\a\y\ dd]
return preg_replace_callback('/"(.+?)"/', static function ($matches): string {
$stringToEscape = $matches[1];
$letters = preg_split('//u', $stringToEscape, -1, PREG_SPLIT_NO_EMPTY);
\assert(false !== $letters);
return '\\'.implode('\\', $letters);
}, $phpDateFormat);
}
/**
* @param string $excelDateFormat Date format as defined by Excel
*
* @return bool Whether the given date format has the 12-hour format marker
*/
private static function has12HourFormatMarker(string $excelDateFormat): bool
{
return false !== stripos($excelDateFormat, 'am/pm');
}
}
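
The conversion above relies on `strtr()` with an array: the longest matching key is tried first, and text that has already been replaced is not scanned again. A minimal sketch of that behaviour follows, using a heavily reduced mapping. The constant `SIMPLE_EXCEL_TO_PHP` and the function `convertSimpleExcelDateFormat` are hypothetical, and quoted literals, time tokens and locale prefixes are deliberately ignored.

```php
<?php
// Hypothetical, reduced mapping; the real table above covers many more tokens.
const SIMPLE_EXCEL_TO_PHP = [
    'yyyy' => 'Y', // 4-digit year
    'mm' => 'm',   // month with leading zeros
    'm' => 'n',    // month without leading zeros
    'dd' => 'd',   // day with leading zeros
    'd' => 'j',    // day without leading zeros
];

// Hypothetical helper (not part of OpenSpout).
function convertSimpleExcelDateFormat(string $excelFormat): string
{
    // strtr() prefers the longest key at each position ("dd" before "d")
    // and never re-replaces its own output ("d" is not turned into "j" afterwards).
    return strtr(strtolower($excelFormat), SIMPLE_EXCEL_TO_PHP);
}
```

A chain of `str_replace()` calls would not give the same guarantee, because each call would see the output of the previous one.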


@ -0,0 +1,100 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Helper;
use DateInterval;
final class DateIntervalFormatHelper
{
/**
* @see https://www.php.net/manual/en/dateinterval.format.php
*/
private const dateIntervalFormats = [
'hh' => '%H',
'h' => '%h',
'mm' => '%I',
'm' => '%i',
'ss' => '%S',
's' => '%s',
];
/**
* Excel stores durations as fractions of days (24h = 1).
*
* Only fills hours/minutes/seconds because those are the only values that we can format back out again.
* Excel can also only handle those units as duration.
* PHP's DateInterval is also quite limited - it will not automatically convert unit overflow
* (60 seconds are not converted to 1 minute).
*/
public static function createDateIntervalFromHours(float $dayFractions): DateInterval
{
$time = abs($dayFractions) * 24; // convert to hours
$hours = floor($time);
$time = ($time - $hours) * 60;
$minutes = (int) floor($time); // must cast to int for type strict compare below
$time = ($time - $minutes) * 60;
$seconds = (int) round($time); // must cast to int for type strict compare below
// Bubble up rounding gain if we ended up with 60 seconds - disadvantage of using fraction of days for small durations:
if (60 === $seconds) {
$seconds = 0;
++$minutes;
}
if (60 === $minutes) {
$minutes = 0;
++$hours;
}
$interval = new DateInterval("P0DT{$hours}H{$minutes}M{$seconds}S");
if ($dayFractions < 0) {
$interval->invert = 1;
}
return $interval;
}
public static function isDurationFormat(string $excelFormat): bool
{
// Only consider formats with leading brackets as valid duration formats (e.g. "[hh]:mm", "[mm]:ss", etc.):
return 1 === preg_match('/^(\[hh?](:mm(:ss)?)?|\[mm?](:ss)?|\[ss?])$/', $excelFormat);
}
public static function toPHPDateIntervalFormat(string $excelDateFormat, string &$startUnit): string
{
$startUnitStarted = false;
$phpFormatParts = [];
$formatParts = explode(':', str_replace(['[', ']'], '', $excelDateFormat));
foreach ($formatParts as $formatPart) {
if (false === $startUnitStarted) {
$startUnit = $formatPart;
$startUnitStarted = true;
}
$phpFormatParts[] = self::dateIntervalFormats[$formatPart];
}
// Add the minus sign for potential negative durations:
return '%r'.implode(':', $phpFormatParts);
}
public static function formatDateInterval(DateInterval $dateInterval, string $excelDateFormat): string
{
$startUnit = '';
$phpFormat = self::toPHPDateIntervalFormat($excelDateFormat, $startUnit);
// We have to move the hours to minutes or hours+minutes to seconds if the format in Excel did the same:
$startUnit = $startUnit[0]; // only take the first char
$dateIntervalClone = clone $dateInterval;
if ('m' === $startUnit) {
$dateIntervalClone->i = $dateIntervalClone->i + $dateIntervalClone->h * 60;
$dateIntervalClone->h = 0;
} elseif ('s' === $startUnit) {
$dateIntervalClone->s = $dateIntervalClone->s + $dateIntervalClone->i * 60 + $dateIntervalClone->h * 3600;
$dateIntervalClone->i = 0;
$dateIntervalClone->h = 0;
}
return $dateIntervalClone->format($phpFormat);
}
}
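
The duration arithmetic above converts a fraction of a day into hours, minutes and seconds, then carries rounding overflow upward. A minimal sketch of an alternative follows: it rounds to whole seconds first and splits with integer division, so no carry step is needed. This is not the upstream approach, and the function name `dayFractionToHms` is hypothetical.

```php
<?php
// Hypothetical helper (not part of OpenSpout): formats an Excel duration
// (fraction of days, 24h = 1) as [-]HH:MM:SS, with hours allowed to exceed 23.
function dayFractionToHms(float $dayFraction): string
{
    // Round once, at the smallest unit, so 59.9999s can never show up as "60".
    $totalSeconds = (int) round(abs($dayFraction) * 86400);

    $hours = intdiv($totalSeconds, 3600);
    $minutes = intdiv($totalSeconds % 3600, 60);
    $seconds = $totalSeconds % 60;
    $sign = ($dayFraction < 0) ? '-' : '';

    return sprintf('%s%02d:%02d:%02d', $sign, $hours, $minutes, $seconds);
}
```

A value of 1.5 (one and a half days) therefore formats as 36 hours, which is how an Excel `[hh]:mm:ss` format displays it.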


@ -0,0 +1,103 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Manager\SharedStringsCaching;
/**
* @internal
*/
final readonly class CachingStrategyFactory implements CachingStrategyFactoryInterface
{
/**
* The memory amount needed to store a string was obtained empirically from this data:
*
* ------------------------------------
* | Number of chars⁺ | Memory needed |
* ------------------------------------
* |            3,000 |          1 MB |
* |           15,000 |          2 MB |
* |           30,000 |          5 MB |
* |           75,000 |         11 MB |
* |          150,000 |         21 MB |
* |          300,000 |         43 MB |
* |          750,000 |        105 MB |
* |        1,500,000 |        210 MB |
* |        2,250,000 |        315 MB |
* |        3,000,000 |        420 MB |
* |        4,500,000 |        630 MB |
* ------------------------------------
*
* ⁺ All characters were 1 byte long
*
* This gives a linear graph where each 1-byte character requires about 150 bytes to be stored.
* Given that some characters can take up to 4 bytes, we need 600 bytes per character to be safe.
* Also, there are on average about 20 characters per cell (this is entirely empirical data...).
*
* This means that in order to store one shared string in memory, the memory amount needed is:
* => 20 * 600 ≈ 12KB
public const AMOUNT_MEMORY_NEEDED_PER_STRING_IN_KB = 12;
/**
* To avoid running out of memory when extracting a huge number of shared strings, they can be saved to temporary files
* instead of in memory. Then, when accessing a string, the corresponding file contents will be loaded in memory
* and the string will be quickly retrieved.
* The performance bottleneck is not when creating these temporary files, but rather when loading their content.
* Because the contents of the last loaded file stays in memory until another file needs to be loaded, it works
* best when the indexes of the shared strings are sorted in the sheet data.
* 10,000 was chosen because it creates small files that are fast to load into memory.
*/
public const MAX_NUM_STRINGS_PER_TEMP_FILE = 10000;
private MemoryLimit $memoryLimit;
public function __construct(MemoryLimit $memoryLimit)
{
$this->memoryLimit = $memoryLimit;
}
/**
* Returns the best caching strategy, given the number of unique shared strings
* and the amount of memory available.
*
* @param null|int $sharedStringsUniqueCount Number of unique shared strings (NULL if unknown)
* @param string $tempFolder Folder where the temporary files holding shared strings will be stored
*
* @return CachingStrategyInterface The best caching strategy
*/
public function createBestCachingStrategy(?int $sharedStringsUniqueCount, string $tempFolder): CachingStrategyInterface
{
if ($this->isInMemoryStrategyUsageSafe($sharedStringsUniqueCount)) {
return new InMemoryStrategy($sharedStringsUniqueCount);
}
return new FileBasedStrategy($tempFolder, self::MAX_NUM_STRINGS_PER_TEMP_FILE);
}
/**
* Returns whether it is safe to use in-memory caching, given the number of unique shared strings
* and the amount of memory available.
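*
* Illustrative example (the "128M" limit is an assumed value, not part of the benchmark above):
* 128M = 128 * 1024 = 131,072 KB available. 10,000 strings need 10,000 * 12 = 120,000 KB, so in-memory
* caching is considered safe; 11,000 strings need 132,000 KB, so the file-based strategy is used instead.
* When the limit is unknown or unlimited (-1), in-memory caching is only used below 10,000 strings.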
*
* @param null|int $sharedStringsUniqueCount Number of unique shared strings (NULL if unknown)
*/
private function isInMemoryStrategyUsageSafe(?int $sharedStringsUniqueCount): bool
{
// if the number of shared strings is unknown, do not use "in memory" strategy
if (null === $sharedStringsUniqueCount) {
return false;
}
$memoryAvailable = $this->memoryLimit->getMemoryLimitInKB();
if (-1 === (int) $memoryAvailable) {
// if cannot get memory limit or if memory limit set as unlimited, don't trust and play safe
$isInMemoryStrategyUsageSafe = ($sharedStringsUniqueCount < self::MAX_NUM_STRINGS_PER_TEMP_FILE);
} else {
$memoryNeeded = $sharedStringsUniqueCount * self::AMOUNT_MEMORY_NEEDED_PER_STRING_IN_KB;
$isInMemoryStrategyUsageSafe = ($memoryAvailable > $memoryNeeded);
}
return $isInMemoryStrategyUsageSafe;
}
}

View File

@ -0,0 +1,19 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Manager\SharedStringsCaching;
interface CachingStrategyFactoryInterface
{
/**
* Returns the best caching strategy, given the number of unique shared strings
* and the amount of memory available.
*
* @param null|int $sharedStringsUniqueCount Number of unique shared strings (NULL if unknown)
* @param string $tempFolder Temporary folder where the temporary files to store shared strings will be stored
*
* @return CachingStrategyInterface The best caching strategy
*/
public function createBestCachingStrategy(?int $sharedStringsUniqueCount, string $tempFolder): CachingStrategyInterface;
}

View File

@ -0,0 +1,43 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Manager\SharedStringsCaching;
use OpenSpout\Reader\Exception\SharedStringNotFoundException;
/**
* @internal
*/
interface CachingStrategyInterface
{
/**
* Adds the given string to the cache.
*
* @param string $sharedString The string to be added to the cache
* @param int $sharedStringIndex Index of the shared string in the sharedStrings.xml file
*/
public function addStringForIndex(string $sharedString, int $sharedStringIndex): void;
/**
* Closes the cache after the last shared string was added.
* This prevents any additional string from being added to the cache.
*/
public function closeCache(): void;
/**
* Returns the string located at the given index from the cache.
*
* @param int $sharedStringIndex Index of the shared string in the sharedStrings.xml file
*
* @return string The shared string at the given index
*
* @throws SharedStringNotFoundException If no shared string found for the given index
*/
public function getStringAtIndex(int $sharedStringIndex): string;
/**
* Destroys the cache, freeing memory and removing any created artifacts.
*/
public function clearCache(): void;
}

View File

@ -0,0 +1,184 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Manager\SharedStringsCaching;
use OpenSpout\Common\Helper\FileSystemHelper;
use OpenSpout\Reader\Exception\SharedStringNotFoundException;
/**
* This class implements the file-based caching strategy for shared strings.
* Shared strings are stored in small files (with a max number of strings per file).
* This strategy is slower than an in-memory strategy but is used to avoid out of memory crashes.
*
* @internal
*/
final class FileBasedStrategy implements CachingStrategyInterface
{
/**
* Value to use to escape the line feed character ("\n").
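*
* Illustrative example (sample value chosen for this note): "first\nsecond" is written to the temp file
* as "first_x000A_second" and restored to "first\nsecond" on read.
* Note that a string which already contains the literal text "_x000A_" is also turned into a line feed on read.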
*/
public const ESCAPED_LINE_FEED_CHARACTER = '_x000A_';
/** @var FileSystemHelper Helper to perform file system operations */
private readonly FileSystemHelper $fileSystemHelper;
/** @var string Temporary folder where the temporary files will be created */
private readonly string $tempFolder;
/**
* @var int Maximum number of strings that can be stored in one temp file
*
* @see CachingStrategyFactory::MAX_NUM_STRINGS_PER_TEMP_FILE
*/
private readonly int $maxNumStringsPerTempFile;
/** @var null|resource Pointer to the last temp file a shared string was written to */
private $tempFilePointer;
/**
* @var string Path of the temporary file whose contents are currently stored in memory
*
* @see CachingStrategyFactory::MAX_NUM_STRINGS_PER_TEMP_FILE
*/
private string $readMemoryTempFilePath = '';
/** @var string Path of the temporary file that is currently being written to */
private string $writeMemoryTempFilePath = '';
/**
* @see CachingStrategyFactory::MAX_NUM_STRINGS_PER_TEMP_FILE
*
* @var string[] Contents of the temporary file that was last read
*/
private array $inMemoryTempFileContents;
/**
* @param string $tempFolder Temporary folder where the temporary files to store shared strings will be stored
* @param int $maxNumStringsPerTempFile Maximum number of strings that can be stored in one temp file
*/
public function __construct(string $tempFolder, int $maxNumStringsPerTempFile)
{
$this->fileSystemHelper = new FileSystemHelper($tempFolder);
$this->tempFolder = $this->fileSystemHelper->createFolder($tempFolder, uniqid('sharedstrings'));
$this->maxNumStringsPerTempFile = $maxNumStringsPerTempFile;
}
/**
* Adds the given string to the cache.
*
* @param string $sharedString The string to be added to the cache
* @param int $sharedStringIndex Index of the shared string in the sharedStrings.xml file
*/
public function addStringForIndex(string $sharedString, int $sharedStringIndex): void
{
$tempFilePath = $this->getSharedStringTempFilePath($sharedStringIndex);
if ($this->writeMemoryTempFilePath !== $tempFilePath) {
if (null !== $this->tempFilePointer) {
fclose($this->tempFilePointer);
}
$resource = fopen($tempFilePath, 'w');
\assert(false !== $resource);
$this->tempFilePointer = $resource;
$this->writeMemoryTempFilePath = $tempFilePath;
}
// The shared string retrieval logic expects each cell's data to be on one line only.
// Encoding the line feed characters preserves this assumption.
$lineFeedEncodedSharedString = $this->escapeLineFeed($sharedString);
fwrite($this->tempFilePointer, $lineFeedEncodedSharedString.PHP_EOL);
}
/**
* Closes the cache after the last shared string was added.
* This prevents any additional string from being added to the cache.
*/
public function closeCache(): void
{
// close pointer to the last temp file that was written
if (null !== $this->tempFilePointer) {
$this->writeMemoryTempFilePath = '';
fclose($this->tempFilePointer);
}
}
/**
* Returns the string located at the given index from the cache.
*
* @param int $sharedStringIndex Index of the shared string in the sharedStrings.xml file
*
* @return string The shared string at the given index
*
* @throws SharedStringNotFoundException If no shared string found for the given index
*/
public function getStringAtIndex(int $sharedStringIndex): string
{
$tempFilePath = $this->getSharedStringTempFilePath($sharedStringIndex);
$indexInFile = $sharedStringIndex % $this->maxNumStringsPerTempFile;
if ($this->readMemoryTempFilePath !== $tempFilePath) {
$contents = @file_get_contents($tempFilePath);
if (false === $contents) {
throw new SharedStringNotFoundException("Shared string temp file could not be read: {$tempFilePath} ; for index: {$sharedStringIndex}");
}
$this->inMemoryTempFileContents = explode(PHP_EOL, $contents);
$this->readMemoryTempFilePath = $tempFilePath;
}
$sharedString = null;
// Using isset here because it is way faster than array_key_exists...
if (isset($this->inMemoryTempFileContents[$indexInFile])) {
$escapedSharedString = $this->inMemoryTempFileContents[$indexInFile];
$sharedString = $this->unescapeLineFeed($escapedSharedString);
}
if (null === $sharedString) {
throw new SharedStringNotFoundException("Shared string not found for index: {$sharedStringIndex}");
}
return rtrim($sharedString, PHP_EOL);
}
/**
* Destroys the cache, freeing memory and removing any created artifacts.
*/
public function clearCache(): void
{
$this->fileSystemHelper->deleteFolderRecursively($this->tempFolder);
}
/**
* Returns the path for the temp file that should contain the string for the given index.
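*
* Illustrative example (index chosen arbitrarily): with 10,000 strings per file, index 25,000 maps to the file
* "sharedstrings2", since (int) (25000 / 10000) = 2, and is read from position 25000 % 10000 = 5000 in that file.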
*
* @param int $sharedStringIndex Index of the shared string in the sharedStrings.xml file
*
* @return string The temp file path for the given index
*/
private function getSharedStringTempFilePath(int $sharedStringIndex): string
{
$numTempFile = (int) ($sharedStringIndex / $this->maxNumStringsPerTempFile);
return $this->tempFolder.\DIRECTORY_SEPARATOR.'sharedstrings'.$numTempFile;
}
/**
* Escapes the line feed characters (\n).
*/
private function escapeLineFeed(string $unescapedString): string
{
return str_replace("\n", self::ESCAPED_LINE_FEED_CHARACTER, $unescapedString);
}
/**
* Unescapes the line feed characters (\n).
*/
private function unescapeLineFeed(string $escapedString): string
{
return str_replace(self::ESCAPED_LINE_FEED_CHARACTER, "\n", $escapedString);
}
}

View File

@ -0,0 +1,81 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Manager\SharedStringsCaching;
use OpenSpout\Reader\Exception\SharedStringNotFoundException;
use RuntimeException;
use SplFixedArray;
/**
* This class implements the in-memory caching strategy for shared strings.
* This strategy is used when the number of unique strings is low, compared to the memory available.
*
* @internal
*/
final class InMemoryStrategy implements CachingStrategyInterface
{
/** @var SplFixedArray<string> Array used to cache the shared strings */
private SplFixedArray $inMemoryCache;
/** @var bool Whether the cache has been closed */
private bool $isCacheClosed = false;
/**
* @param int $sharedStringsUniqueCount Number of unique shared strings
*/
public function __construct(int $sharedStringsUniqueCount)
{
$this->inMemoryCache = new SplFixedArray($sharedStringsUniqueCount);
}
/**
* Adds the given string to the cache.
*
* @param string $sharedString The string to be added to the cache
* @param int $sharedStringIndex Index of the shared string in the sharedStrings.xml file
*/
public function addStringForIndex(string $sharedString, int $sharedStringIndex): void
{
if (!$this->isCacheClosed) {
$this->inMemoryCache->offsetSet($sharedStringIndex, $sharedString);
}
}
/**
* Closes the cache after the last shared string was added.
* This prevents any additional string from being added to the cache.
*/
public function closeCache(): void
{
$this->isCacheClosed = true;
}
/**
* Returns the string located at the given index from the cache.
*
* @param int $sharedStringIndex Index of the shared string in the sharedStrings.xml file
*
* @return string The shared string at the given index
*
* @throws SharedStringNotFoundException If no shared string found for the given index
*/
public function getStringAtIndex(int $sharedStringIndex): string
{
try {
return $this->inMemoryCache->offsetGet($sharedStringIndex);
} catch (RuntimeException) {
throw new SharedStringNotFoundException("Shared string not found for index: {$sharedStringIndex}");
}
}
/**
* Destroys the cache, freeing memory and removing any created artifacts.
*/
public function clearCache(): void
{
$this->inMemoryCache = new SplFixedArray(0);
$this->isCacheClosed = false;
}
}

View File

@ -0,0 +1,50 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Manager\SharedStringsCaching;
/**
* @internal
*/
final readonly class MemoryLimit
{
private string $memoryLimit;
public function __construct(string $memoryLimit)
{
$this->memoryLimit = $memoryLimit;
}
/**
* Returns the PHP "memory_limit" in Kilobytes.
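*
* Illustrative conversions, derived from the parsing rules below (the input values are examples only):
* "128M" => 131072, "1G" => 1048576, "512K" => 512, "2048B" => 2,
* "-1" => -1 (no limit), and a value without a unit suffix, such as "1024", => -1 (treated as unknown).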
*/
public function getMemoryLimitInKB(): float
{
$memoryLimitFormatted = strtolower(trim($this->memoryLimit));
// No memory limit
if ('-1' === $memoryLimitFormatted) {
return -1;
}
if (1 === preg_match('/(\d+)([bkmgt])b?/', $memoryLimitFormatted, $matches)) {
$amount = (int) $matches[1];
$unit = $matches[2];
switch ($unit) {
case 'b': return $amount / 1024;
case 'k': return $amount;
case 'm': return $amount * 1024;
case 'g': return $amount * 1024 * 1024;
case 't': return $amount * 1024 * 1024 * 1024;
}
}
return -1;
}
}

View File

@ -0,0 +1,241 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Manager;
use DOMElement;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Reader\Exception\SharedStringNotFoundException;
use OpenSpout\Reader\Exception\XMLProcessingException;
use OpenSpout\Reader\Wrapper\XMLReader;
use OpenSpout\Reader\XLSX\Manager\SharedStringsCaching\CachingStrategyFactoryInterface;
use OpenSpout\Reader\XLSX\Manager\SharedStringsCaching\CachingStrategyInterface;
use OpenSpout\Reader\XLSX\Options;
/**
* @internal
*/
final class SharedStringsManager
{
/**
* Definition of XML nodes names used to parse data.
*/
public const XML_NODE_SST = 'sst';
public const XML_NODE_SI = 'si';
public const XML_NODE_R = 'r';
public const XML_NODE_T = 't';
/**
* Definition of XML attributes used to parse data.
*/
public const XML_ATTRIBUTE_COUNT = 'count';
public const XML_ATTRIBUTE_UNIQUE_COUNT = 'uniqueCount';
public const XML_ATTRIBUTE_XML_SPACE = 'xml:space';
public const XML_ATTRIBUTE_VALUE_PRESERVE = 'preserve';
/** @var string Path of the XLSX file being read */
private readonly string $filePath;
private readonly Options $options;
/** @var WorkbookRelationshipsManager Helps retrieving workbook relationships */
private readonly WorkbookRelationshipsManager $workbookRelationshipsManager;
/** @var CachingStrategyFactoryInterface Factory to create shared strings caching strategies */
private readonly CachingStrategyFactoryInterface $cachingStrategyFactory;
/** @var CachingStrategyInterface The best caching strategy for storing shared strings */
private CachingStrategyInterface $cachingStrategy;
public function __construct(
string $filePath,
Options $options,
WorkbookRelationshipsManager $workbookRelationshipsManager,
CachingStrategyFactoryInterface $cachingStrategyFactory
) {
$this->filePath = $filePath;
$this->options = $options;
$this->workbookRelationshipsManager = $workbookRelationshipsManager;
$this->cachingStrategyFactory = $cachingStrategyFactory;
}
/**
* Returns whether the XLSX file contains a shared strings XML file.
*/
public function hasSharedStrings(): bool
{
return $this->workbookRelationshipsManager->hasSharedStringsXMLFile();
}
/**
* Builds the cache (in memory or file-based, depending on the chosen strategy) containing all the shared strings of the workbook.
* All the strings are stored in an XML file, located at 'xl/sharedStrings.xml'.
* The sheet data then accesses them via the string index in the built table.
*
* More documentation available here: http://msdn.microsoft.com/en-us/library/office/gg278314.aspx
*
* The XML file can be really big for sheets containing a lot of data. That is why
* we need to use an XML reader that provides streaming, like the XMLReader library.
*
* @throws IOException If shared strings XML file can't be read
*/
public function extractSharedStrings(): void
{
$sharedStringsXMLFilePath = $this->workbookRelationshipsManager->getSharedStringsXMLFilePath();
$xmlReader = new XMLReader();
$sharedStringIndex = 0;
if (false === $xmlReader->openFileInZip($this->filePath, $sharedStringsXMLFilePath)) {
throw new IOException('Could not open "'.$sharedStringsXMLFilePath.'".');
}
try {
$sharedStringsUniqueCount = $this->getSharedStringsUniqueCount($xmlReader);
$this->cachingStrategy = $this->getBestSharedStringsCachingStrategy($sharedStringsUniqueCount);
$xmlReader->readUntilNodeFound(self::XML_NODE_SI);
while (self::XML_NODE_SI === $xmlReader->getCurrentNodeName()) {
$this->processSharedStringsItem($xmlReader, $sharedStringIndex);
++$sharedStringIndex;
// jump to the next '<si>' tag
$xmlReader->next(self::XML_NODE_SI);
}
$this->cachingStrategy->closeCache();
} catch (XMLProcessingException $exception) {
throw new IOException("The sharedStrings.xml file is invalid and cannot be read. [{$exception->getMessage()}]");
}
$xmlReader->close();
}
/**
* Returns the shared string at the given index, using the previously chosen caching strategy.
*
* @param int $sharedStringIndex Index of the shared string in the sharedStrings.xml file
*
* @return string The shared string at the given index
*
* @throws SharedStringNotFoundException If no shared string found for the given index
*/
public function getStringAtIndex(int $sharedStringIndex): string
{
return $this->cachingStrategy->getStringAtIndex($sharedStringIndex);
}
/**
* Destroys the cache, freeing memory and removing any created artifacts.
*/
public function cleanup(): void
{
if (isset($this->cachingStrategy)) {
$this->cachingStrategy->clearCache();
}
}
/**
* Returns the shared strings unique count, as specified in <sst> tag.
*
* @param XMLReader $xmlReader XMLReader instance
*
* @return null|int Number of unique shared strings in the sharedStrings.xml file
*
* @throws IOException If sharedStrings.xml is invalid and can't be read
*/
private function getSharedStringsUniqueCount(XMLReader $xmlReader): ?int
{
$xmlReader->next(self::XML_NODE_SST);
// Iterate over the "sst" elements to get the actual "sst ELEMENT" (skips any DOCTYPE)
while (self::XML_NODE_SST === $xmlReader->getCurrentNodeName() && XMLReader::ELEMENT !== $xmlReader->nodeType) {
$xmlReader->read();
}
$uniqueCount = $xmlReader->getAttribute(self::XML_ATTRIBUTE_UNIQUE_COUNT);
// some software does not add the "uniqueCount" attribute and only uses the "count" one
// @see https://github.com/box/spout/issues/254
if (null === $uniqueCount) {
$uniqueCount = $xmlReader->getAttribute(self::XML_ATTRIBUTE_COUNT);
}
return (null !== $uniqueCount) ? (int) $uniqueCount : null;
}
/**
* Returns the best shared strings caching strategy.
*
* @param null|int $sharedStringsUniqueCount Number of unique shared strings (NULL if unknown)
*/
private function getBestSharedStringsCachingStrategy(?int $sharedStringsUniqueCount): CachingStrategyInterface
{
return $this->cachingStrategyFactory
->createBestCachingStrategy($sharedStringsUniqueCount, $this->options->getTempFolder())
;
}
/**
* Processes the shared strings item XML node which the given XML reader is positioned on.
*
* @param XMLReader $xmlReader XML Reader positioned on a "<si>" node
* @param int $sharedStringIndex Index of the processed shared strings item
*/
private function processSharedStringsItem(XMLReader $xmlReader, int $sharedStringIndex): void
{
$sharedStringValue = '';
// NOTE: expand() will automatically decode all XML entities of the child nodes
$siNode = $xmlReader->expand();
\assert($siNode instanceof DOMElement);
$textNodes = $siNode->getElementsByTagName(self::XML_NODE_T);
foreach ($textNodes as $textNode) {
if ($this->shouldExtractTextNodeValue($textNode)) {
$textNodeValue = $textNode->nodeValue;
\assert(null !== $textNodeValue);
$shouldPreserveWhitespace = $this->shouldPreserveWhitespace($textNode);
$sharedStringValue .= $shouldPreserveWhitespace
? $textNodeValue
: trim($textNodeValue);
}
}
$this->cachingStrategy->addStringForIndex($sharedStringValue, $sharedStringIndex);
}
/**
* Not all text nodes' values must be extracted.
* Some text nodes are part of a node describing the pronunciation for instance.
* We'll only consider the nodes whose parents are "<si>" or "<r>".
*
* @param DOMElement $textNode Text node to check
*
* @return bool Whether the given text node's value must be extracted
*/
private function shouldExtractTextNodeValue(DOMElement $textNode): bool
{
$parentNode = $textNode->parentNode;
\assert(null !== $parentNode);
$parentTagName = $parentNode->localName;
return self::XML_NODE_SI === $parentTagName || self::XML_NODE_R === $parentTagName;
}
/**
* If the text node has the attribute 'xml:space="preserve"', then preserve whitespace.
*
* @param DOMElement $textNode The text node element (<t>) whose whitespace may be preserved
*
* @return bool Whether whitespace should be preserved
*/
private function shouldPreserveWhitespace(DOMElement $textNode): bool
{
$spaceValue = $textNode->getAttribute(self::XML_ATTRIBUTE_XML_SPACE);
return self::XML_ATTRIBUTE_VALUE_PRESERVE === $spaceValue;
}
}

View File

@ -0,0 +1,295 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Manager;
use OpenSpout\Common\Helper\Escaper\XLSX;
use OpenSpout\Reader\Common\Manager\RowManager;
use OpenSpout\Reader\Common\XMLProcessor;
use OpenSpout\Reader\Wrapper\XMLReader;
use OpenSpout\Reader\XLSX\Helper\CellValueFormatter;
use OpenSpout\Reader\XLSX\Options;
use OpenSpout\Reader\XLSX\RowIterator;
use OpenSpout\Reader\XLSX\Sheet;
use OpenSpout\Reader\XLSX\SheetHeaderReader;
use OpenSpout\Reader\XLSX\SheetMergeCellsReader;
/**
* @internal
*/
final class SheetManager
{
/**
* Paths of XML files relative to the XLSX file root.
*/
public const WORKBOOK_XML_RELS_FILE_PATH = 'xl/_rels/workbook.xml.rels';
public const WORKBOOK_XML_FILE_PATH = 'xl/workbook.xml';
/**
* Definition of XML node names used to parse data.
*/
public const XML_NODE_WORKBOOK_PROPERTIES = 'workbookPr';
public const XML_NODE_WORKBOOK_VIEW = 'workbookView';
public const XML_NODE_SHEET = 'sheet';
public const XML_NODE_SHEETS = 'sheets';
public const XML_NODE_RELATIONSHIP = 'Relationship';
/**
* Definition of XML attributes used to parse data.
*/
public const XML_ATTRIBUTE_DATE_1904 = 'date1904';
public const XML_ATTRIBUTE_ACTIVE_TAB = 'activeTab';
public const XML_ATTRIBUTE_R_ID = 'r:id';
public const XML_ATTRIBUTE_NAME = 'name';
public const XML_ATTRIBUTE_STATE = 'state';
public const XML_ATTRIBUTE_ID = 'Id';
public const XML_ATTRIBUTE_TARGET = 'Target';
/**
* State value to represent a hidden sheet.
*/
public const SHEET_STATE_HIDDEN = 'hidden';
/** @var string Path of the XLSX file being read */
private readonly string $filePath;
private readonly Options $options;
/** @var SharedStringsManager Manages shared strings */
private readonly SharedStringsManager $sharedStringsManager;
/** @var XLSX Used to unescape XML data */
private readonly XLSX $escaper;
/** @var Sheet[] List of sheets */
private array $sheets;
/** @var int Index of the sheet currently read */
private int $currentSheetIndex;
/** @var int Index of the active sheet (0 by default) */
private int $activeSheetIndex;
public function __construct(
string $filePath,
Options $options,
SharedStringsManager $sharedStringsManager,
XLSX $escaper
) {
$this->filePath = $filePath;
$this->options = $options;
$this->sharedStringsManager = $sharedStringsManager;
$this->escaper = $escaper;
}
/**
* Returns the sheets metadata of the file located at the previously given file path.
* The sheets are listed in "xl/workbook.xml" and the paths to their data are read from "xl/_rels/workbook.xml.rels".
*
* @return Sheet[] Sheets within the XLSX file
*/
public function getSheets(): array
{
$this->sheets = [];
$this->currentSheetIndex = 0;
$this->activeSheetIndex = 0; // By default, the first sheet is active
$xmlReader = new XMLReader();
$xmlProcessor = new XMLProcessor($xmlReader);
$xmlProcessor->registerCallback(self::XML_NODE_WORKBOOK_PROPERTIES, XMLProcessor::NODE_TYPE_START, [$this, 'processWorkbookPropertiesStartingNode']);
$xmlProcessor->registerCallback(self::XML_NODE_WORKBOOK_VIEW, XMLProcessor::NODE_TYPE_START, [$this, 'processWorkbookViewStartingNode']);
$xmlProcessor->registerCallback(self::XML_NODE_SHEET, XMLProcessor::NODE_TYPE_START, [$this, 'processSheetStartingNode']);
$xmlProcessor->registerCallback(self::XML_NODE_SHEETS, XMLProcessor::NODE_TYPE_END, [$this, 'processSheetsEndingNode']);
if ($xmlReader->openFileInZip($this->filePath, self::WORKBOOK_XML_FILE_PATH)) {
$xmlProcessor->readUntilStopped();
$xmlReader->close();
}
return $this->sheets;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<workbookPr>" starting node
*
* @return int A return code that indicates what action the processor should take next
*/
private function processWorkbookPropertiesStartingNode(XMLReader $xmlReader): int
{
// Using "filter_var($x, FILTER_VALIDATE_BOOLEAN)" here because the value of the "date1904" attribute
// may be the string "false", that is not mapped to the boolean "false" by default...
$shouldUse1904Dates = filter_var($xmlReader->getAttribute(self::XML_ATTRIBUTE_DATE_1904), FILTER_VALIDATE_BOOLEAN);
$this->options->SHOULD_USE_1904_DATES = $shouldUse1904Dates;
return XMLProcessor::PROCESSING_CONTINUE;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<workbookView>" starting node
*
* @return int A return code that indicates what action the processor should take next
*/
private function processWorkbookViewStartingNode(XMLReader $xmlReader): int
{
// The "workbookView" node is located before "sheet" nodes, ensuring that
// the active sheet is known before parsing sheets data.
$this->activeSheetIndex = (int) $xmlReader->getAttribute(self::XML_ATTRIBUTE_ACTIVE_TAB);
return XMLProcessor::PROCESSING_CONTINUE;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<sheet>" starting node
*
* @return int A return code that indicates what action the processor should take next
*/
private function processSheetStartingNode(XMLReader $xmlReader): int
{
$isSheetActive = ($this->currentSheetIndex === $this->activeSheetIndex);
$this->sheets[] = $this->getSheetFromSheetXMLNode($xmlReader, $this->currentSheetIndex, $isSheetActive);
++$this->currentSheetIndex;
return XMLProcessor::PROCESSING_CONTINUE;
}
/**
* @return int A return code that indicates what action the processor should take next
*/
private function processSheetsEndingNode(): int
{
return XMLProcessor::PROCESSING_STOP;
}
/**
* Returns an instance of a sheet, given the XML node describing the sheet - from "workbook.xml".
* We can find the XML file path describing the sheet inside "workbook.xml.rels", by mapping with the sheet ID
* ("r:id" in "workbook.xml", "Id" in "workbook.xml.rels").
*
* @param XMLReader $xmlReaderOnSheetNode XML Reader instance, pointing on the node describing the sheet, as defined in "workbook.xml"
* @param int $sheetIndexZeroBased Index of the sheet, based on order of appearance in the workbook (zero-based)
* @param bool $isSheetActive Whether this sheet was defined as active
*
* @return Sheet Sheet instance
*/
private function getSheetFromSheetXMLNode(XMLReader $xmlReaderOnSheetNode, int $sheetIndexZeroBased, bool $isSheetActive): Sheet
{
$sheetId = $xmlReaderOnSheetNode->getAttribute(self::XML_ATTRIBUTE_R_ID);
\assert(null !== $sheetId);
$sheetState = $xmlReaderOnSheetNode->getAttribute(self::XML_ATTRIBUTE_STATE);
$isSheetVisible = (self::SHEET_STATE_HIDDEN !== $sheetState);
$escapedSheetName = $xmlReaderOnSheetNode->getAttribute(self::XML_ATTRIBUTE_NAME);
\assert(null !== $escapedSheetName);
$sheetName = $this->escaper->unescape($escapedSheetName);
$sheetDataXMLFilePath = $this->getSheetDataXMLFilePathForSheetId($sheetId);
$mergeCells = [];
if ($this->options->SHOULD_LOAD_MERGE_CELLS) {
$mergeCells = (new SheetMergeCellsReader(
$this->filePath,
$sheetDataXMLFilePath,
$xmlReader = new XMLReader(),
new XMLProcessor($xmlReader)
))->getMergeCells();
}
return new Sheet(
$this->createRowIterator($this->filePath, $sheetDataXMLFilePath, $this->options, $this->sharedStringsManager),
$this->createSheetHeaderReader($this->filePath, $sheetDataXMLFilePath),
$sheetIndexZeroBased,
$sheetName,
$isSheetActive,
$isSheetVisible,
$mergeCells
);
}
/**
* @param string $sheetId The sheet ID, as defined in "workbook.xml"
*
* @return string The XML file path describing the sheet inside "workbook.xml.rels", for the given sheet ID
*/
private function getSheetDataXMLFilePathForSheetId(string $sheetId): string
{
$sheetDataXMLFilePath = '';
// find the file path of the sheet, by looking at the "workbook.xml.rels" file
$xmlReader = new XMLReader();
if ($xmlReader->openFileInZip($this->filePath, self::WORKBOOK_XML_RELS_FILE_PATH)) {
while ($xmlReader->read()) {
if ($xmlReader->isPositionedOnStartingNode(self::XML_NODE_RELATIONSHIP)) {
$relationshipSheetId = $xmlReader->getAttribute(self::XML_ATTRIBUTE_ID);
if ($relationshipSheetId === $sheetId) {
// In workbook.xml.rels, it is only "worksheets/sheet1.xml"
// In [Content_Types].xml, the path is "/xl/worksheets/sheet1.xml"
$sheetDataXMLFilePath = $xmlReader->getAttribute(self::XML_ATTRIBUTE_TARGET);
\assert(null !== $sheetDataXMLFilePath);
// sometimes, the sheet data file path already contains "/xl/"...
if (!str_starts_with($sheetDataXMLFilePath, '/xl/')) {
$sheetDataXMLFilePath = '/xl/'.$sheetDataXMLFilePath;
}
// the matching relationship was found, no need to read further
break;
}
}
}
$xmlReader->close();
}
return $sheetDataXMLFilePath;
}
private function createRowIterator(
string $filePath,
string $sheetDataXMLFilePath,
Options $options,
SharedStringsManager $sharedStringsManager
): RowIterator {
$workbookRelationshipsManager = new WorkbookRelationshipsManager($filePath);
$styleManager = new StyleManager(
$filePath,
$workbookRelationshipsManager->hasStylesXMLFile()
? $workbookRelationshipsManager->getStylesXMLFilePath()
: null
);
$cellValueFormatter = new CellValueFormatter(
$sharedStringsManager,
$styleManager,
$options->SHOULD_FORMAT_DATES,
$options->SHOULD_USE_1904_DATES,
new XLSX()
);
return new RowIterator(
$filePath,
$sheetDataXMLFilePath,
$options->SHOULD_PRESERVE_EMPTY_ROWS,
$xmlReader = new XMLReader(),
new XMLProcessor($xmlReader),
$cellValueFormatter,
new RowManager()
);
}
private function createSheetHeaderReader(
string $filePath,
string $sheetDataXMLFilePath
): SheetHeaderReader {
$xmlReader = new XMLReader();
return new SheetHeaderReader(
$filePath,
$sheetDataXMLFilePath,
$xmlReader,
new XMLProcessor($xmlReader)
);
}
}

View File

@ -0,0 +1,325 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Manager;
use OpenSpout\Reader\Wrapper\XMLReader;
class StyleManager implements StyleManagerInterface
{
/**
* Nodes used to find relevant information in the styles XML file.
*/
final public const XML_NODE_NUM_FMTS = 'numFmts';
final public const XML_NODE_NUM_FMT = 'numFmt';
final public const XML_NODE_CELL_XFS = 'cellXfs';
final public const XML_NODE_XF = 'xf';
/**
* Attributes used to find relevant information in the styles XML file.
*/
final public const XML_ATTRIBUTE_NUM_FMT_ID = 'numFmtId';
final public const XML_ATTRIBUTE_FORMAT_CODE = 'formatCode';
final public const XML_ATTRIBUTE_APPLY_NUMBER_FORMAT = 'applyNumberFormat';
final public const XML_ATTRIBUTE_COUNT = 'count';
/**
* By convention, default style ID is 0.
*/
final public const DEFAULT_STYLE_ID = 0;
final public const NUMBER_FORMAT_GENERAL = 'General';
/**
* Mapping between built-in numFmtId and the associated format - for dates only.
*
* @see https://msdn.microsoft.com/en-us/library/ff529597(v=office.12).aspx
*/
private const builtinNumFmtIdToNumFormatMapping = [
14 => 'm/d/yyyy', // @NOTE: ECMA spec is 'mm-dd-yy'
15 => 'd-mmm-yy',
16 => 'd-mmm',
17 => 'mmm-yy',
18 => 'h:mm AM/PM',
19 => 'h:mm:ss AM/PM',
20 => 'h:mm',
21 => 'h:mm:ss',
22 => 'm/d/yyyy h:mm', // @NOTE: ECMA spec is 'm/d/yy h:mm'
45 => 'mm:ss',
46 => '[h]:mm:ss',
47 => 'mm:ss.0', // @NOTE: ECMA spec is 'mmss.0'
];
/** @var string Path of the XLSX file being read */
private readonly string $filePath;
/** @var null|string Path of the styles XML file */
private readonly ?string $stylesXMLFilePath;
/** @var array<int, string> Array containing a mapping NUM_FMT_ID => FORMAT_CODE */
private array $customNumberFormats;
/** @var array<array-key, array<string, null|bool|int>> Array containing a mapping STYLE_ID => [STYLE_ATTRIBUTES] */
private array $stylesAttributes;
/** @var array<int, bool> Cache containing a mapping NUM_FMT_ID => IS_DATE_FORMAT. Used to avoid lots of recalculations */
private array $numFmtIdToIsDateFormatCache = [];
/**
* @param string $filePath Path of the XLSX file being read
*/
public function __construct(string $filePath, ?string $stylesXMLFilePath)
{
$this->filePath = $filePath;
$this->stylesXMLFilePath = $stylesXMLFilePath;
}
public function shouldFormatNumericValueAsDate(int $styleId): bool
{
if (null === $this->stylesXMLFilePath) {
return false;
}
$stylesAttributes = $this->getStylesAttributes();
// Default style (0) does not format numeric values as timestamps. Only custom styles do.
// Also if the style ID does not exist in the styles.xml file, format as numeric value.
// Using isset here because it is way faster than array_key_exists...
if (self::DEFAULT_STYLE_ID === $styleId || !isset($stylesAttributes[$styleId])) {
return false;
}
$styleAttributes = $stylesAttributes[$styleId];
return $this->doesStyleIndicateDate($styleAttributes);
}
public function getNumberFormatCode(int $styleId): string
{
if (null === $this->stylesXMLFilePath) {
return '';
}
$stylesAttributes = $this->getStylesAttributes();
$styleAttributes = $stylesAttributes[$styleId];
$numFmtId = $styleAttributes[self::XML_ATTRIBUTE_NUM_FMT_ID];
\assert(\is_int($numFmtId));
if ($this->isNumFmtIdBuiltInDateFormat($numFmtId)) {
$numberFormatCode = self::builtinNumFmtIdToNumFormatMapping[$numFmtId];
} else {
$customNumberFormats = $this->getCustomNumberFormats();
$numberFormatCode = $customNumberFormats[$numFmtId] ?? '';
}
return $numberFormatCode;
}
/**
* @return array<int, string> The custom number formats
*/
protected function getCustomNumberFormats(): array
{
if (!isset($this->customNumberFormats)) {
$this->extractRelevantInfo();
}
return $this->customNumberFormats;
}
/**
* @return array<array-key, array<string, null|bool|int>> The styles attributes
*/
protected function getStylesAttributes(): array
{
if (!isset($this->stylesAttributes)) {
$this->extractRelevantInfo();
}
return $this->stylesAttributes;
}
/**
* Reads the styles.xml file and extracts the relevant information from the file.
*/
private function extractRelevantInfo(): void
{
$this->customNumberFormats = [];
$this->stylesAttributes = [];
$xmlReader = new XMLReader();
if ($xmlReader->openFileInZip($this->filePath, $this->stylesXMLFilePath)) {
while ($xmlReader->read()) {
if ($xmlReader->isPositionedOnStartingNode(self::XML_NODE_NUM_FMTS)
&& '0' !== $xmlReader->getAttribute(self::XML_ATTRIBUTE_COUNT)) {
$this->extractNumberFormats($xmlReader);
} elseif ($xmlReader->isPositionedOnStartingNode(self::XML_NODE_CELL_XFS)) {
$this->extractStyleAttributes($xmlReader);
}
}
$xmlReader->close();
}
}
/**
* Extracts number formats from the "numFmt" nodes.
* For simplicity, the number formats are kept in memory. This is possible thanks
* to the reuse of formats. So 1 million cells should not use 1 million formats.
*
* @param XMLReader $xmlReader XML Reader positioned on the "numFmts" node
*/
private function extractNumberFormats(XMLReader $xmlReader): void
{
while ($xmlReader->read()) {
if ($xmlReader->isPositionedOnStartingNode(self::XML_NODE_NUM_FMT)) {
$numFmtId = (int) $xmlReader->getAttribute(self::XML_ATTRIBUTE_NUM_FMT_ID);
$formatCode = $xmlReader->getAttribute(self::XML_ATTRIBUTE_FORMAT_CODE);
\assert(null !== $formatCode);
$this->customNumberFormats[$numFmtId] = $formatCode;
} elseif ($xmlReader->isPositionedOnEndingNode(self::XML_NODE_NUM_FMTS)) {
// Once done reading "numFmts" node's children
break;
}
}
}
/**
* Extracts style attributes from the "xf" nodes, inside the "cellXfs" section.
* For simplicity, the styles attributes are kept in memory. This is possible thanks
* to the reuse of styles. So 1 million cells should not use 1 million styles.
*
* @param XMLReader $xmlReader XML Reader positioned on the "cellXfs" node
*/
private function extractStyleAttributes(XMLReader $xmlReader): void
{
while ($xmlReader->read()) {
if ($xmlReader->isPositionedOnStartingNode(self::XML_NODE_XF)) {
$numFmtId = $xmlReader->getAttribute(self::XML_ATTRIBUTE_NUM_FMT_ID);
$normalizedNumFmtId = (null !== $numFmtId) ? (int) $numFmtId : null;
$applyNumberFormat = $xmlReader->getAttribute(self::XML_ATTRIBUTE_APPLY_NUMBER_FORMAT);
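// xsd:boolean allows "true"/"false" as well as "1"/"0"; a plain (bool) cast would treat "false" as true
$normalizedApplyNumberFormat = (null !== $applyNumberFormat) ? !\in_array(strtolower($applyNumberFormat), ['', '0', 'false'], true) : null;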
$this->stylesAttributes[] = [
self::XML_ATTRIBUTE_NUM_FMT_ID => $normalizedNumFmtId,
self::XML_ATTRIBUTE_APPLY_NUMBER_FORMAT => $normalizedApplyNumberFormat,
];
} elseif ($xmlReader->isPositionedOnEndingNode(self::XML_NODE_CELL_XFS)) {
// Once done reading "cellXfs" node's children
break;
}
}
}
/**
* @param array<string, null|bool|int> $styleAttributes Array containing the style attributes (2 keys: "applyNumberFormat" and "numFmtId")
*
* @return bool Whether the style with the given attributes indicates that the number is a date
*/
private function doesStyleIndicateDate(array $styleAttributes): bool
{
$applyNumberFormat = $styleAttributes[self::XML_ATTRIBUTE_APPLY_NUMBER_FORMAT];
$numFmtId = $styleAttributes[self::XML_ATTRIBUTE_NUM_FMT_ID];
// A style may apply a date format if it has:
// - "applyNumberFormat" attribute not set to "false"
// - "numFmtId" attribute set
// This is a preliminary check, as having "numFmtId" set just means the style should apply a specific number format,
// but this is not necessarily a date.
if (false === $applyNumberFormat || !\is_int($numFmtId)) {
return false;
}
return $this->doesNumFmtIdIndicateDate($numFmtId);
}
/**
* Returns whether the number format ID indicates that the number is a date.
* The result is cached to avoid recomputing the same thing over and over, as
* "numFmtId" attributes can be shared between multiple styles.
*
* @return bool Whether the number format ID indicates that the number is a date
*/
private function doesNumFmtIdIndicateDate(int $numFmtId): bool
{
if (!isset($this->numFmtIdToIsDateFormatCache[$numFmtId])) {
$formatCode = $this->getFormatCodeForNumFmtId($numFmtId);
$this->numFmtIdToIsDateFormatCache[$numFmtId] = (
$this->isNumFmtIdBuiltInDateFormat($numFmtId)
|| $this->isFormatCodeCustomDateFormat($formatCode)
);
}
return $this->numFmtIdToIsDateFormatCache[$numFmtId];
}
/**
* @return null|string The custom number format or NULL if none defined for the given numFmtId
*/
private function getFormatCodeForNumFmtId(int $numFmtId): ?string
{
$customNumberFormats = $this->getCustomNumberFormats();
// Using the null coalescing operator here because it is faster than array_key_exists
return $customNumberFormats[$numFmtId] ?? null;
}
/**
* @return bool Whether the number format ID is one of the built-in date formats
*/
private function isNumFmtIdBuiltInDateFormat(int $numFmtId): bool
{
return \array_key_exists($numFmtId, self::builtinNumFmtIdToNumFormatMapping);
}
/**
* @return bool Whether the given format code indicates that the number is a date
*/
private function isFormatCodeCustomDateFormat(?string $formatCode): bool
{
// if no associated format code or if using the default "General" format
if (null === $formatCode || 0 === strcasecmp($formatCode, self::NUMBER_FORMAT_GENERAL)) {
return false;
}
return $this->isFormatCodeMatchingDateFormatPattern($formatCode);
}
/**
* @return bool Whether the given format code matches a date format pattern
*/
private function isFormatCodeMatchingDateFormatPattern(string $formatCode): bool
{
// Remove extra formatting (what's between [ ], the brackets should not be preceded by a "\")
$pattern = '((?<!\\\)\[.+?(?<!\\\)\])';
$formatCode = preg_replace($pattern, '', $formatCode);
\assert(null !== $formatCode);
// Remove strings in double quotes, as they won't be interpreted as date format characters
$formatCode = preg_replace('/"[^"]+"/', '', $formatCode);
\assert(null !== $formatCode);
// custom date formats contain specific characters to represent the date:
// e - yy - m - d - h - s
// and all of their variants (yyyy - mm - dd...)
$dateFormatCharacters = ['e', 'yy', 'm', 'd', 'h', 's'];
$hasFoundDateFormatCharacter = false;
foreach ($dateFormatCharacters as $dateFormatCharacter) {
// character not preceded by "\" (case insensitive)
$pattern = '/(?<!\\\)'.$dateFormatCharacter.'/i';
if (1 === preg_match($pattern, $formatCode)) {
$hasFoundDateFormatCharacter = true;
break;
}
}
return $hasFoundDateFormatCharacter;
}
}
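
The date detection in `isFormatCodeMatchingDateFormatPattern()` can be tried in isolation. The sketch below is a hypothetical standalone function (`looksLikeDateFormat` is not part of OpenSpout) that follows the same three steps: strip bracketed sections, strip double-quoted literals, then look for an unescaped date or time token. As in the class, the "General" format is assumed to be filtered out by the caller beforehand.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper mirroring the heuristic above; not an OpenSpout API.
 */
function looksLikeDateFormat(string $formatCode): bool
{
    // Drop bracketed sections such as [Red] or [$-409], unless the brackets are escaped with "\"
    $stripped = preg_replace('/(?<!\\\\)\[.+?(?<!\\\\)\]/', '', $formatCode) ?? '';

    // Drop double-quoted literals: their characters are never date tokens
    $stripped = preg_replace('/"[^"]+"/', '', $stripped) ?? '';

    // A date format contains at least one unescaped token among e, yy, m, d, h, s (case insensitive)
    return 1 === preg_match('/(?<!\\\\)(?:e|yy|m|d|h|s)/i', $stripped);
}
```

Stripping the bracketed sections first matters: without it, a color code such as `[Red]` would match on its "e" and "d" and a plain numeric format would be reported as a date.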

View File

@ -0,0 +1,31 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Manager;
/**
* @internal
*/
interface StyleManagerInterface
{
/**
* Returns whether the style with the given ID should consider
* numeric values as timestamps and format the cell as a date.
*
* @param int $styleId Zero-based style ID
*
* @return bool Whether a cell with the given style should display a date instead of a numeric value
*/
public function shouldFormatNumericValueAsDate(int $styleId): bool;
/**
* Returns the format as defined in "styles.xml" of the given style.
* NOTE: It is assumed that the style DOES have a number format associated to it.
*
* @param int $styleId Zero-based style ID
*
* @return string The number format code associated with the given style
*/
public function getNumberFormatCode(int $styleId): string;
}

View File

@ -0,0 +1,151 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX\Manager;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Reader\Wrapper\XMLReader;
/**
* @internal
*/
final class WorkbookRelationshipsManager
{
public const BASE_PATH = 'xl/';
/**
* Path of workbook relationships XML file inside the XLSX file.
*/
public const WORKBOOK_RELS_XML_FILE_PATH = 'xl/_rels/workbook.xml.rels';
/**
* Relationships types - For Transitional and Strict OOXML.
*/
public const RELATIONSHIP_TYPE_SHARED_STRINGS = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings';
public const RELATIONSHIP_TYPE_STYLES = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles';
public const RELATIONSHIP_TYPE_SHARED_STRINGS_STRICT = 'http://purl.oclc.org/ooxml/officeDocument/relationships/sharedStrings';
public const RELATIONSHIP_TYPE_STYLES_STRICT = 'http://purl.oclc.org/ooxml/officeDocument/relationships/styles';
/**
* Nodes and attributes used to find relevant information in the workbook relationships XML file.
*/
public const XML_NODE_RELATIONSHIP = 'Relationship';
public const XML_ATTRIBUTE_TYPE = 'Type';
public const XML_ATTRIBUTE_TARGET = 'Target';
/** @var string Path of the XLSX file being read */
private readonly string $filePath;
/** @var array<string, string> Cache of the already read workbook relationships: [TYPE] => [FILE_NAME] */
private array $cachedWorkbookRelationships;
/**
* @param string $filePath Path of the XLSX file being read
*/
public function __construct(string $filePath)
{
$this->filePath = $filePath;
}
/**
* @return string The path of the shared strings XML file
*/
public function getSharedStringsXMLFilePath(): string
{
$workbookRelationships = $this->getWorkbookRelationships();
$sharedStringsXMLFilePath = $workbookRelationships[self::RELATIONSHIP_TYPE_SHARED_STRINGS]
?? $workbookRelationships[self::RELATIONSHIP_TYPE_SHARED_STRINGS_STRICT];
// the file path can be relative (e.g. "sharedStrings.xml") or absolute (e.g. "/xl/sharedStrings.xml")
$doesContainBasePath = str_contains($sharedStringsXMLFilePath, self::BASE_PATH);
if (!$doesContainBasePath) {
// make sure we return a full path
$sharedStringsXMLFilePath = self::BASE_PATH.$sharedStringsXMLFilePath;
}
return $sharedStringsXMLFilePath;
}
/**
* @return bool Whether the XLSX file contains a shared strings XML file
*/
public function hasSharedStringsXMLFile(): bool
{
$workbookRelationships = $this->getWorkbookRelationships();
return isset($workbookRelationships[self::RELATIONSHIP_TYPE_SHARED_STRINGS])
|| isset($workbookRelationships[self::RELATIONSHIP_TYPE_SHARED_STRINGS_STRICT]);
}
/**
* @return bool Whether the XLSX file contains a styles XML file
*/
public function hasStylesXMLFile(): bool
{
$workbookRelationships = $this->getWorkbookRelationships();
return isset($workbookRelationships[self::RELATIONSHIP_TYPE_STYLES])
|| isset($workbookRelationships[self::RELATIONSHIP_TYPE_STYLES_STRICT]);
}
/**
* @return string The path of the styles XML file
*/
public function getStylesXMLFilePath(): string
{
$workbookRelationships = $this->getWorkbookRelationships();
$stylesXMLFilePath = $workbookRelationships[self::RELATIONSHIP_TYPE_STYLES]
?? $workbookRelationships[self::RELATIONSHIP_TYPE_STYLES_STRICT];
// the file path can be relative (e.g. "styles.xml") or absolute (e.g. "/xl/styles.xml")
$doesContainBasePath = str_contains($stylesXMLFilePath, self::BASE_PATH);
if (!$doesContainBasePath) {
// make sure we return a full path
$stylesXMLFilePath = self::BASE_PATH.$stylesXMLFilePath;
}
return $stylesXMLFilePath;
}
/**
* Reads the workbook.xml.rels file and extracts the filename associated with each relationship type.
* It caches the result so that the file is read only once.
*
* @return array<string, string>
*
* @throws IOException If workbook.xml.rels can't be read
*/
private function getWorkbookRelationships(): array
{
if (!isset($this->cachedWorkbookRelationships)) {
$xmlReader = new XMLReader();
if (false === $xmlReader->openFileInZip($this->filePath, self::WORKBOOK_RELS_XML_FILE_PATH)) {
throw new IOException('Could not open "'.self::WORKBOOK_RELS_XML_FILE_PATH.'".');
}
$this->cachedWorkbookRelationships = [];
while ($xmlReader->readUntilNodeFound(self::XML_NODE_RELATIONSHIP)) {
$this->processWorkbookRelationship($xmlReader);
}
// Release the underlying resource once all relationships have been read
$xmlReader->close();
}
return $this->cachedWorkbookRelationships;
}
/**
* Extracts and stores the data of the current workbook relationship.
*/
private function processWorkbookRelationship(XMLReader $xmlReader): void
{
$type = $xmlReader->getAttribute(self::XML_ATTRIBUTE_TYPE);
$target = $xmlReader->getAttribute(self::XML_ATTRIBUTE_TARGET);
\assert(null !== $target);
// @NOTE: if a type is defined more than once, we overwrite the previous value
// To be changed if we want to get the file paths of sheet XML files for instance.
$this->cachedWorkbookRelationships[$type] = $target;
}
}
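
The lookup and path handling shared by `getSharedStringsXMLFilePath()` and `getStylesXMLFilePath()` can be summarized in a few lines. This is a minimal sketch, not the class's API: `resolvePartPath` is a hypothetical name, and returning NULL for a missing relationship is an assumption made here for illustration (the methods above expect the caller to check `hasStylesXMLFile()` or `hasSharedStringsXMLFile()` first).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper; not part of OpenSpout.
 *
 * @param array<string, string> $relationships Mapping TYPE => TARGET, as read from workbook.xml.rels
 */
function resolvePartPath(array $relationships, string $transitionalType, string $strictType): ?string
{
    // Prefer the Transitional OOXML type, fall back to the Strict one
    $target = $relationships[$transitionalType] ?? $relationships[$strictType] ?? null;
    if (null === $target) {
        // Assumption for this sketch: report a missing relationship as NULL
        return null;
    }

    // Targets may be relative ("styles.xml") or already contain the base path ("/xl/styles.xml")
    return str_contains($target, 'xl/') ? $target : 'xl/'.$target;
}
```

Note that a target already containing `xl/` is returned unchanged, leading slash included; the leading slash is only trimmed later for sheet paths, in `normalizeSheetDataXMLFilePath()`.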

View File

@ -0,0 +1,17 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX;
use OpenSpout\Common\TempFolderOptionTrait;
final class Options
{
use TempFolderOptionTrait;
public bool $SHOULD_FORMAT_DATES = false;
public bool $SHOULD_PRESERVE_EMPTY_ROWS = false;
public bool $SHOULD_USE_1904_DATES = false;
public bool $SHOULD_LOAD_MERGE_CELLS = false;
}

View File

@ -0,0 +1,111 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Common\Helper\Escaper\XLSX;
use OpenSpout\Reader\AbstractReader;
use OpenSpout\Reader\Exception\NoSheetsFoundException;
use OpenSpout\Reader\XLSX\Manager\SharedStringsCaching\CachingStrategyFactory;
use OpenSpout\Reader\XLSX\Manager\SharedStringsCaching\CachingStrategyFactoryInterface;
use OpenSpout\Reader\XLSX\Manager\SharedStringsCaching\MemoryLimit;
use OpenSpout\Reader\XLSX\Manager\SharedStringsManager;
use OpenSpout\Reader\XLSX\Manager\SheetManager;
use OpenSpout\Reader\XLSX\Manager\WorkbookRelationshipsManager;
use ZipArchive;
/**
* @extends AbstractReader<SheetIterator>
*/
final class Reader extends AbstractReader
{
private ZipArchive $zip;
/** @var SharedStringsManager Manages shared strings */
private SharedStringsManager $sharedStringsManager;
/** @var SheetIterator To iterate over the XLSX sheets */
private SheetIterator $sheetIterator;
private readonly Options $options;
private readonly CachingStrategyFactoryInterface $cachingStrategyFactory;
public function __construct(
?Options $options = null,
?CachingStrategyFactoryInterface $cachingStrategyFactory = null
) {
$this->options = $options ?? new Options();
if (null === $cachingStrategyFactory) {
$memoryLimit = \ini_get('memory_limit');
$cachingStrategyFactory = new CachingStrategyFactory(new MemoryLimit($memoryLimit));
}
$this->cachingStrategyFactory = $cachingStrategyFactory;
}
public function getSheetIterator(): SheetIterator
{
$this->ensureStreamOpened();
return $this->sheetIterator;
}
/**
* Returns whether stream wrappers are supported.
*/
protected function doesSupportStreamWrapper(): bool
{
return false;
}
/**
* Opens the file at the given file path to make it ready to be read.
* It also parses the sharedStrings.xml file to get all the shared strings available in memory
* and fetches all the available sheets.
*
* @param string $filePath Path of the file to be read
*
* @throws IOException If the file at the given path or its content cannot be read
* @throws NoSheetsFoundException If there are no sheets in the file
*/
protected function openReader(string $filePath): void
{
$this->zip = new ZipArchive();
if (true !== $this->zip->open($filePath)) {
throw new IOException("Could not open {$filePath} for reading.");
}
$this->sharedStringsManager = new SharedStringsManager(
$filePath,
$this->options,
new WorkbookRelationshipsManager($filePath),
$this->cachingStrategyFactory
);
if ($this->sharedStringsManager->hasSharedStrings()) {
// Extracts all the strings from the sheets for easy access in the future
$this->sharedStringsManager->extractSharedStrings();
}
$this->sheetIterator = new SheetIterator(
new SheetManager(
$filePath,
$this->options,
$this->sharedStringsManager,
new XLSX()
)
);
}
/**
* Closes the reader. To be used after reading the file.
*/
protected function closeReader(): void
{
$this->zip->close();
$this->sharedStringsManager->cleanup();
}
}

View File

@ -0,0 +1,398 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX;
use DOMElement;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Row;
use OpenSpout\Common\Exception\InvalidArgumentException;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Reader\Common\Manager\RowManager;
use OpenSpout\Reader\Common\XMLProcessor;
use OpenSpout\Reader\Exception\SharedStringNotFoundException;
use OpenSpout\Reader\RowIteratorInterface;
use OpenSpout\Reader\Wrapper\XMLReader;
use OpenSpout\Reader\XLSX\Helper\CellHelper;
use OpenSpout\Reader\XLSX\Helper\CellValueFormatter;
final class RowIterator implements RowIteratorInterface
{
/**
* Definition of XML nodes names used to parse data.
*/
public const XML_NODE_DIMENSION = 'dimension';
public const XML_NODE_WORKSHEET = 'worksheet';
public const XML_NODE_ROW = 'row';
public const XML_NODE_CELL = 'c';
/**
* Definition of XML attributes used to parse data.
*/
public const XML_ATTRIBUTE_REF = 'ref';
public const XML_ATTRIBUTE_SPANS = 'spans';
public const XML_ATTRIBUTE_ROW_INDEX = 'r';
public const XML_ATTRIBUTE_CELL_INDEX = 'r';
/** @var string Path of the XLSX file being read */
private readonly string $filePath;
/** @var string Path of the sheet data XML file as in [Content_Types].xml */
private readonly string $sheetDataXMLFilePath;
/** @var XMLReader The XMLReader object that will help read sheet's XML data */
private readonly XMLReader $xmlReader;
/** @var XMLProcessor Helper Object to process XML nodes */
private readonly XMLProcessor $xmlProcessor;
/** @var CellValueFormatter Helper to format cell values */
private readonly CellValueFormatter $cellValueFormatter;
/** @var RowManager Manages rows */
private readonly RowManager $rowManager;
/**
* TODO: This variable can be deleted when row indices get preserved.
*
* @var int Number of read rows
*/
private int $numReadRows = 0;
/** @var Row Contains the row currently processed */
private Row $currentlyProcessedRow;
/** @var null|Row Buffer used to store the current row, while checking if there are more rows to read */
private ?Row $rowBuffer = null;
/** @var bool Indicates whether all rows have been read */
private bool $hasReachedEndOfFile = false;
/** @var int The number of columns the sheet has (0 meaning undefined) */
private int $numColumns = 0;
/** @var bool Whether empty rows should be returned or skipped */
private readonly bool $shouldPreserveEmptyRows;
/** @var int Last row index processed (one-based) */
private int $lastRowIndexProcessed = 0;
/** @var int Row index to be processed next (one-based) */
private int $nextRowIndexToBeProcessed = 0;
/** @var int Last column index processed (zero-based) */
private int $lastColumnIndexProcessed = -1;
/**
* @param string $filePath Path of the XLSX file being read
* @param string $sheetDataXMLFilePath Path of the sheet data XML file as in [Content_Types].xml
* @param bool $shouldPreserveEmptyRows Whether empty rows should be preserved
* @param XMLReader $xmlReader XML Reader
* @param XMLProcessor $xmlProcessor Helper to process XML files
* @param CellValueFormatter $cellValueFormatter Helper to format cell values
* @param RowManager $rowManager Manages rows
*/
public function __construct(
string $filePath,
string $sheetDataXMLFilePath,
bool $shouldPreserveEmptyRows,
XMLReader $xmlReader,
XMLProcessor $xmlProcessor,
CellValueFormatter $cellValueFormatter,
RowManager $rowManager
) {
$this->filePath = $filePath;
$this->sheetDataXMLFilePath = $this->normalizeSheetDataXMLFilePath($sheetDataXMLFilePath);
$this->shouldPreserveEmptyRows = $shouldPreserveEmptyRows;
$this->xmlReader = $xmlReader;
$this->cellValueFormatter = $cellValueFormatter;
$this->rowManager = $rowManager;
// Register all callbacks to process different nodes when reading the XML file
$this->xmlProcessor = $xmlProcessor;
$this->xmlProcessor->registerCallback(self::XML_NODE_DIMENSION, XMLProcessor::NODE_TYPE_START, [$this, 'processDimensionStartingNode']);
$this->xmlProcessor->registerCallback(self::XML_NODE_ROW, XMLProcessor::NODE_TYPE_START, [$this, 'processRowStartingNode']);
$this->xmlProcessor->registerCallback(self::XML_NODE_CELL, XMLProcessor::NODE_TYPE_START, [$this, 'processCellStartingNode']);
$this->xmlProcessor->registerCallback(self::XML_NODE_ROW, XMLProcessor::NODE_TYPE_END, [$this, 'processRowEndingNode']);
$this->xmlProcessor->registerCallback(self::XML_NODE_WORKSHEET, XMLProcessor::NODE_TYPE_END, [$this, 'processWorksheetEndingNode']);
}
/**
* Rewind the Iterator to the first element.
* Initializes the XMLReader object that reads the associated sheet data.
* The XMLReader is configured to be safe from the billion laughs attack.
*
* @see http://php.net/manual/en/iterator.rewind.php
*
* @throws IOException If the sheet data XML cannot be read
*/
public function rewind(): void
{
$this->xmlReader->close();
if (false === $this->xmlReader->openFileInZip($this->filePath, $this->sheetDataXMLFilePath)) {
throw new IOException("Could not open \"{$this->sheetDataXMLFilePath}\".");
}
$this->numReadRows = 0;
$this->lastRowIndexProcessed = 0;
$this->nextRowIndexToBeProcessed = 0;
$this->rowBuffer = null;
$this->hasReachedEndOfFile = false;
$this->numColumns = 0;
$this->next();
}
/**
* Checks if current position is valid.
*
* @see http://php.net/manual/en/iterator.valid.php
*/
public function valid(): bool
{
$valid = !$this->hasReachedEndOfFile;
if (!$valid) {
$this->xmlReader->close();
}
return $valid;
}
/**
* Move forward to next element. Reads data describing the next unprocessed row.
*
* @see http://php.net/manual/en/iterator.next.php
*
* @throws SharedStringNotFoundException If a shared string was not found
* @throws IOException If unable to read the sheet data XML
*/
public function next(): void
{
++$this->nextRowIndexToBeProcessed;
if ($this->doesNeedDataForNextRowToBeProcessed()) {
$this->readDataForNextRow();
}
}
/**
* Return the current element, either an empty row or from the buffer.
*
* @see http://php.net/manual/en/iterator.current.php
*/
public function current(): Row
{
$rowToBeProcessed = $this->rowBuffer;
if ($this->shouldPreserveEmptyRows) {
// when we need to preserve empty rows, we will either return
// an empty row or the last row read. This depends on whether the
// index of the last row that was read matches the index of the
// row whose value should be returned.
if ($this->lastRowIndexProcessed !== $this->nextRowIndexToBeProcessed) {
// return empty row if mismatch between last processed row
// and the row that needs to be returned
$rowToBeProcessed = new Row([], null);
}
}
\assert(null !== $rowToBeProcessed);
return $rowToBeProcessed;
}
/**
* Return the key of the current element. Here, the row index.
*
* @see http://php.net/manual/en/iterator.key.php
*/
public function key(): int
{
// TODO: This should return $this->nextRowIndexToBeProcessed
// but to avoid a breaking change, the return value for
// this function has been kept as the number of rows read.
return $this->shouldPreserveEmptyRows ?
$this->nextRowIndexToBeProcessed :
$this->numReadRows;
}
/**
* @param string $sheetDataXMLFilePath Path of the sheet data XML file as in [Content_Types].xml
*
* @return string path of the XML file containing the sheet data,
* without the leading slash
*/
private function normalizeSheetDataXMLFilePath(string $sheetDataXMLFilePath): string
{
return ltrim($sheetDataXMLFilePath, '/');
}
/**
* Returns whether we need data for the next row to be processed.
* We don't need to read data if:
* we have already read at least one row
* AND
* we need to preserve empty rows
* AND
* the last row that was read is not the row that needs to be processed
* (i.e. if we need to return empty rows).
*
* @return bool whether we need data for the next row to be processed
*/
private function doesNeedDataForNextRowToBeProcessed(): bool
{
$hasReadAtLeastOneRow = (0 !== $this->lastRowIndexProcessed);
return
!$hasReadAtLeastOneRow
|| !$this->shouldPreserveEmptyRows
|| $this->lastRowIndexProcessed < $this->nextRowIndexToBeProcessed;
}
/**
* @throws SharedStringNotFoundException If a shared string was not found
* @throws IOException If unable to read the sheet data XML
*/
private function readDataForNextRow(): void
{
$this->currentlyProcessedRow = new Row([], null);
$this->xmlProcessor->readUntilStopped();
$this->rowBuffer = $this->currentlyProcessedRow;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<dimension>" starting node
*
* @return int A return code that indicates what action the processor should take next
*/
private function processDimensionStartingNode(XMLReader $xmlReader): int
{
// Read dimensions of the sheet
$dimensionRef = $xmlReader->getAttribute(self::XML_ATTRIBUTE_REF); // returns 'A1:M13' for instance (or 'A1' for empty sheet)
\assert(null !== $dimensionRef);
if (1 === preg_match('/[A-Z]+\d+:([A-Z]+\d+)/', $dimensionRef, $matches)) {
$this->numColumns = CellHelper::getColumnIndexFromCellIndex($matches[1]) + 1;
}
return XMLProcessor::PROCESSING_CONTINUE;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<row>" starting node
*
* @return int A return code that indicates what action the processor should take next
*/
private function processRowStartingNode(XMLReader $xmlReader): int
{
// Reset index of the last processed column
$this->lastColumnIndexProcessed = -1;
// Mark the last processed row as the one currently being read
$this->lastRowIndexProcessed = $this->getRowIndex($xmlReader);
// Read spans info if present
$numberOfColumnsForRow = $this->numColumns;
$spans = $xmlReader->getAttribute(self::XML_ATTRIBUTE_SPANS); // returns '1:5' for instance
if (null !== $spans && '' !== $spans) {
[, $numberOfColumnsForRow] = explode(':', $spans);
$numberOfColumnsForRow = (int) $numberOfColumnsForRow;
}
$cells = array_fill(0, $numberOfColumnsForRow, Cell::fromValue(''));
$this->currentlyProcessedRow->setCells($cells);
return XMLProcessor::PROCESSING_CONTINUE;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<c>" starting node
*
* @return int A return code that indicates what action the processor should take next
*/
private function processCellStartingNode(XMLReader $xmlReader): int
{
$currentColumnIndex = $this->getColumnIndex($xmlReader);
// NOTE: expand() will automatically decode all XML entities of the child nodes
$node = $xmlReader->expand();
\assert($node instanceof DOMElement);
$cell = $this->cellValueFormatter->extractAndFormatNodeValue($node);
$this->currentlyProcessedRow->setCellAtIndex($cell, $currentColumnIndex);
$this->lastColumnIndexProcessed = $currentColumnIndex;
return XMLProcessor::PROCESSING_CONTINUE;
}
/**
* @return int A return code that indicates what action the processor should take next
*/
private function processRowEndingNode(): int
{
// if the fetched row is empty and we don't want to preserve it...
if (!$this->shouldPreserveEmptyRows && $this->currentlyProcessedRow->isEmpty()) {
// ... skip it
return XMLProcessor::PROCESSING_CONTINUE;
}
++$this->numReadRows;
// If needed, we fill the empty cells
if (0 === $this->numColumns) {
$this->rowManager->fillMissingIndexesWithEmptyCells($this->currentlyProcessedRow);
}
// at this point, we have all the data we need for the row
// so that we can populate the buffer
return XMLProcessor::PROCESSING_STOP;
}
/**
* @return int A return code that indicates what action the processor should take next
*/
private function processWorksheetEndingNode(): int
{
// The closing "</worksheet>" marks the end of the file
$this->hasReachedEndOfFile = true;
return XMLProcessor::PROCESSING_STOP;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<row>" node
*
* @return int Row index (one-based)
*/
private function getRowIndex(XMLReader $xmlReader): int
{
// Get "r" attribute if present (from something like <row r="3"...>)
$currentRowIndex = $xmlReader->getAttribute(self::XML_ATTRIBUTE_ROW_INDEX);
return (null !== $currentRowIndex) ?
(int) $currentRowIndex :
$this->lastRowIndexProcessed + 1;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<c>" node
*
* @return int Column index
*
* @throws InvalidArgumentException When the given cell index is invalid
*/
private function getColumnIndex(XMLReader $xmlReader): int
{
// Get "r" attribute if present (from something like <c r="A1"...>)
$currentCellIndex = $xmlReader->getAttribute(self::XML_ATTRIBUTE_CELL_INDEX);
return (null !== $currentCellIndex) ?
CellHelper::getColumnIndexFromCellIndex($currentCellIndex) :
$this->lastColumnIndexProcessed + 1;
}
}
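
`processDimensionStartingNode()` derives the sheet's column count from the `ref` attribute of the `<dimension>` node. The sketch below illustrates that derivation in a self-contained form. Both function names are hypothetical, and `columnIndexFromLetters` is only an assumed stand-in for `CellHelper::getColumnIndexFromCellIndex()`, whose implementation is not shown in this part of the diff.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the column part of CellHelper: "A" => 0, "Z" => 25, "AA" => 26.
 */
function columnIndexFromLetters(string $letters): int
{
    $index = 0;
    foreach (str_split(strtoupper($letters)) as $letter) {
        // Column letters form a base-26 number without a zero digit (A = 1 ... Z = 26)
        $index = $index * 26 + (\ord($letter) - \ord('A') + 1);
    }

    // Convert to a zero-based index
    return $index - 1;
}

/**
 * Returns the number of columns described by a dimension reference, 0 meaning undefined.
 */
function columnCountFromDimension(string $dimensionRef): int
{
    // Only a range such as "A1:M13" carries a last column; a lone "A1" (empty sheet) does not
    if (1 === preg_match('/[A-Z]+\d+:([A-Z]+)\d+/', $dimensionRef, $matches)) {
        return columnIndexFromLetters($matches[1]) + 1;
    }

    return 0;
}
```

A result of 0 corresponds to the "undefined" state of `$numColumns`, in which case `processRowEndingNode()` falls back to filling missing indexes with empty cells.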

View File

@ -0,0 +1,116 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX;
use OpenSpout\Reader\Common\ColumnWidth;
use OpenSpout\Reader\SheetWithMergeCellsInterface;
use OpenSpout\Reader\SheetWithVisibilityInterface;
/**
* @implements SheetWithVisibilityInterface<RowIterator>
* @implements SheetWithMergeCellsInterface<RowIterator>
*/
final readonly class Sheet implements SheetWithVisibilityInterface, SheetWithMergeCellsInterface
{
/** @var RowIterator To iterate over sheet's rows */
private RowIterator $rowIterator;
/** @var SheetHeaderReader To read the header of the sheet, containing for instance the column widths */
private SheetHeaderReader $headerReader;
/** @var int Index of the sheet, based on order in the workbook (zero-based) */
private int $index;
/** @var string Name of the sheet */
private string $name;
/** @var bool Whether the sheet was the active one */
private bool $isActive;
/** @var bool Whether the sheet is visible */
private bool $isVisible;
/** @var list<string> Merge cells list ["C7:E7", "A9:D10"] */
private array $mergeCells;
/**
* @param RowIterator $rowIterator The corresponding row iterator
* @param SheetHeaderReader $headerReader To read the header of the sheet (e.g. the column widths)
* @param int $sheetIndex Index of the sheet, based on order in the workbook (zero-based)
* @param string $sheetName Name of the sheet
* @param bool $isSheetActive Whether the sheet was defined as active
* @param bool $isSheetVisible Whether the sheet is visible
* @param list<string> $mergeCells Merge cells list ["C7:E7", "A9:D10"]
*/
public function __construct(
RowIterator $rowIterator,
SheetHeaderReader $headerReader,
int $sheetIndex,
string $sheetName,
bool $isSheetActive,
bool $isSheetVisible,
array $mergeCells
) {
$this->rowIterator = $rowIterator;
$this->headerReader = $headerReader;
$this->index = $sheetIndex;
$this->name = $sheetName;
$this->isActive = $isSheetActive;
$this->isVisible = $isSheetVisible;
$this->mergeCells = $mergeCells;
}
public function getRowIterator(): RowIterator
{
return $this->rowIterator;
}
/**
* @return ColumnWidth[] a list of column-widths
*/
public function getColumnWidths(): array
{
return $this->headerReader->getColumnWidths();
}
/**
* @return int Index of the sheet, based on order in the workbook (zero-based)
*/
public function getIndex(): int
{
return $this->index;
}
/**
* @return string Name of the sheet
*/
public function getName(): string
{
return $this->name;
}
/**
* @return bool Whether the sheet was defined as active
*/
public function isActive(): bool
{
return $this->isActive;
}
/**
* @return bool Whether the sheet is visible
*/
public function isVisible(): bool
{
return $this->isVisible;
}
/**
* @return list<string> Merge cells list ["C7:E7", "A9:D10"]
*/
public function getMergeCells(): array
{
return $this->mergeCells;
}
}

View File

@ -0,0 +1,119 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Reader\Common\ColumnWidth;
use OpenSpout\Reader\Common\XMLProcessor;
use OpenSpout\Reader\Wrapper\XMLReader;
final class SheetHeaderReader
{
public const XML_NODE_COL = 'col';
public const XML_NODE_SHEETDATA = 'sheetData';
public const XML_ATTRIBUTE_MIN = 'min';
public const XML_ATTRIBUTE_MAX = 'max';
public const XML_ATTRIBUTE_WIDTH = 'width';
/** @var string Path of the XLSX file being read */
private readonly string $filePath;
/** @var string Path of the sheet data XML file as in [Content_Types].xml */
private readonly string $sheetDataXMLFilePath;
/** @var XMLReader The XMLReader object that will help read sheet's XML data */
private readonly XMLReader $xmlReader;
/** @var XMLProcessor Helper Object to process XML nodes */
private readonly XMLProcessor $xmlProcessor;
/** @var ColumnWidth[] The widths of the columns in the sheet, if specified */
private array $columnWidths = [];
/**
* @param string $filePath Path of the XLSX file being read
* @param string $sheetDataXMLFilePath Path of the sheet data XML file as in [Content_Types].xml
* @param XMLReader $xmlReader XML Reader
* @param XMLProcessor $xmlProcessor Helper to process XML files
*/
public function __construct(
string $filePath,
string $sheetDataXMLFilePath,
XMLReader $xmlReader,
XMLProcessor $xmlProcessor
) {
$this->filePath = $filePath;
$this->sheetDataXMLFilePath = $this->normalizeSheetDataXMLFilePath($sheetDataXMLFilePath);
$this->xmlReader = $xmlReader;
// Register all callbacks to process different nodes when reading the XML file
$this->xmlProcessor = $xmlProcessor;
$this->xmlProcessor->registerCallback(self::XML_NODE_COL, XMLProcessor::NODE_TYPE_START, [$this, 'processColStartingNode']);
$this->xmlProcessor->registerCallback(self::XML_NODE_SHEETDATA, XMLProcessor::NODE_TYPE_START, [$this, 'processSheetDataStartingNode']);
// The reader should be unused, but we close it to be sure
$this->xmlReader->close();
if (false === $this->xmlReader->openFileInZip($this->filePath, $this->sheetDataXMLFilePath)) {
throw new IOException("Could not open \"{$this->sheetDataXMLFilePath}\".");
}
// Now read the entire header of the sheet, until we reach the <sheetData> element
$this->xmlProcessor->readUntilStopped();
// We don't need the reader anymore, so we close it
$this->xmlReader->close();
}
/**
* @internal
*
* @return ColumnWidth[]
*/
public function getColumnWidths(): array
{
return $this->columnWidths;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<col>" starting node
*
* @return int A return code that indicates what action the processor should take next
*/
private function processColStartingNode(XMLReader $xmlReader): int
{
$min = (int) $xmlReader->getAttribute(self::XML_ATTRIBUTE_MIN);
$max = (int) $xmlReader->getAttribute(self::XML_ATTRIBUTE_MAX);
$width = (float) $xmlReader->getAttribute(self::XML_ATTRIBUTE_WIDTH);
\assert($min > 0);
\assert($max > 0);
$columnwidth = new ColumnWidth($min, $max, $width);
$this->columnWidths[] = $columnwidth;
return XMLProcessor::PROCESSING_CONTINUE;
}
/**
* @return int A return code that indicates what action the processor should take next
*/
private function processSheetDataStartingNode(): int
{
// The opening "<sheetData>" marks the end of the sheet header
return XMLProcessor::PROCESSING_STOP;
}
/**
* @param string $sheetDataXMLFilePath Path of the sheet data XML file as in [Content_Types].xml
*
* @return string path of the XML file containing the sheet data,
* without the leading slash
*/
private function normalizeSheetDataXMLFilePath(string $sheetDataXMLFilePath): string
{
return ltrim($sheetDataXMLFilePath, '/');
}
}


@ -0,0 +1,86 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX;
use OpenSpout\Reader\Exception\NoSheetsFoundException;
use OpenSpout\Reader\SheetIteratorInterface;
use OpenSpout\Reader\XLSX\Manager\SheetManager;
/**
* @implements SheetIteratorInterface<Sheet>
*/
final class SheetIterator implements SheetIteratorInterface
{
/** @var Sheet[] The list of sheets present in the file */
private array $sheets;
/** @var int The index of the sheet being read (zero-based) */
private int $currentSheetIndex = 0;
/**
* @param SheetManager $sheetManager Manages sheets
*
* @throws NoSheetsFoundException If there are no sheets in the file
*/
public function __construct(SheetManager $sheetManager)
{
// Fetch all available sheets
$this->sheets = $sheetManager->getSheets();
if (0 === \count($this->sheets)) {
throw new NoSheetsFoundException('The file must contain at least one sheet.');
}
}
/**
* Rewind the Iterator to the first element.
*
* @see http://php.net/manual/en/iterator.rewind.php
*/
public function rewind(): void
{
$this->currentSheetIndex = 0;
}
/**
* Checks if current position is valid.
*
* @see http://php.net/manual/en/iterator.valid.php
*/
public function valid(): bool
{
return $this->currentSheetIndex < \count($this->sheets);
}
/**
* Move forward to next element.
*
* @see http://php.net/manual/en/iterator.next.php
*/
public function next(): void
{
++$this->currentSheetIndex;
}
/**
* Return the current element.
*
* @see http://php.net/manual/en/iterator.current.php
*/
public function current(): Sheet
{
return $this->sheets[$this->currentSheetIndex];
}
/**
* Return the key of the current element (one-based: the current sheet index plus one).
*
* @see http://php.net/manual/en/iterator.key.php
*/
public function key(): int
{
return $this->currentSheetIndex + 1;
}
}


@ -0,0 +1,69 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Reader\XLSX;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Reader\Common\XMLProcessor;
use OpenSpout\Reader\Wrapper\XMLReader;
use function ltrim;
/**
* @internal
*/
final class SheetMergeCellsReader
{
public const XML_NODE_MERGE_CELL = 'mergeCell';
public const XML_ATTRIBUTE_REF = 'ref';
/** @var list<string> Merged cells list */
private array $mergeCells = [];
/**
* @param string $filePath Path of the XLSX file being read
* @param string $sheetDataXMLFilePath Path of the sheet data XML file as in [Content_Types].xml
* @param XMLReader $xmlReader XML Reader
* @param XMLProcessor $xmlProcessor Helper to process XML files
*/
public function __construct(
string $filePath,
string $sheetDataXMLFilePath,
XMLReader $xmlReader,
XMLProcessor $xmlProcessor
) {
$sheetDataXMLFilePath = ltrim($sheetDataXMLFilePath, '/');
// Register all callbacks to process different nodes when reading the XML file
$xmlProcessor->registerCallback(self::XML_NODE_MERGE_CELL, XMLProcessor::NODE_TYPE_START, [$this, 'processMergeCellsStartingNode']);
$xmlReader->close();
if (false === $xmlReader->openFileInZip($filePath, $sheetDataXMLFilePath)) {
throw new IOException("Could not open \"{$sheetDataXMLFilePath}\".");
}
// Read the sheet XML to the end: no stop callback is registered, so every <mergeCell> element is collected
$xmlProcessor->readUntilStopped();
$xmlReader->close();
}
/**
* @return list<string>
*/
public function getMergeCells(): array
{
return $this->mergeCells;
}
/**
* @param XMLReader $xmlReader XMLReader object, positioned on a "<mergeCell>" starting node
*
* @return int A return code that indicates what action the processor should take next
*/
private function processMergeCellsStartingNode(XMLReader $xmlReader): int
{
$this->mergeCells[] = $xmlReader->getAttribute(self::XML_ATTRIBUTE_REF);
return XMLProcessor::PROCESSING_CONTINUE;
}
}


@ -0,0 +1,169 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Writer;
use OpenSpout\Common\Entity\Row;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Writer\Exception\WriterNotOpenedException;
abstract class AbstractWriter implements WriterInterface
{
/** @var resource Pointer to the file/stream we will write to */
protected $filePointer;
/** @var string document creator */
protected string $creator = 'OpenSpout';
/** @var string Content-Type value for the header - to be defined by child class */
protected static string $headerContentType;
/** @var string Path to the output file */
private string $outputFilePath;
/** @var bool Indicates whether the writer has been opened or not */
private bool $isWriterOpened = false;
/** @var 0|positive-int */
private int $writtenRowCount = 0;
final public function openToFile($outputFilePath): void
{
$this->outputFilePath = $outputFilePath;
$errorMessage = null;
set_error_handler(static function ($nr, $message) use (&$errorMessage): bool {
$errorMessage = $message;
return true;
});
$resource = fopen($this->outputFilePath, 'w');
restore_error_handler();
if (null !== $errorMessage) {
throw new IOException("Unable to open file {$this->outputFilePath}: {$errorMessage}");
}
\assert(false !== $resource);
$this->filePointer = $resource;
$this->openWriter();
$this->isWriterOpened = true;
}
/**
* @codeCoverageIgnore
*
* @param mixed $outputFileName
*/
final public function openToBrowser($outputFileName): void
{
$this->outputFilePath = basename($outputFileName);
$resource = fopen('php://output', 'w');
\assert(false !== $resource);
$this->filePointer = $resource;
// Clear any previous output (otherwise the generated file will be corrupted)
// @see https://github.com/box/spout/issues/241
if (ob_get_length() > 0) {
ob_end_clean();
}
/*
* Set headers
*
* Newer browsers (Firefox, Chrome, Opera, Safari, etc.) all support and use the `filename*` parameter
* specified by the newer standard (RFC 6266), so it does not matter that they do not automatically
* decode the plain `filename`. Older versions of Internet Explorer do not recognize `filename*`: they
* ignore it and fall back to the old `filename` (the only minor flaw is that the file extension must be
* in English). This solves the cross-browser, multi-language compatibility problem without any
* User-Agent detection, and is more in line with the standard.
*
* @see https://github.com/box/spout/issues/745
* @see https://tools.ietf.org/html/rfc6266
* @see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
*/
header('Content-Type: '.static::$headerContentType);
header(
'Content-Disposition: attachment; '.
'filename="'.rawurlencode($this->outputFilePath).'"; '.
'filename*=UTF-8\'\''.rawurlencode($this->outputFilePath)
);
/*
* When forcing the download of a file over SSL, IE8 and lower browsers fail
* if the Cache-Control and Pragma headers are not set.
*
* @see http://support.microsoft.com/KB/323308
* @see https://github.com/liuggio/ExcelBundle/issues/45
*/
header('Cache-Control: max-age=0');
header('Pragma: public');
$this->openWriter();
$this->isWriterOpened = true;
}
final public function addRow(Row $row): void
{
if (!$this->isWriterOpened) {
throw new WriterNotOpenedException('The writer needs to be opened before adding row.');
}
$this->addRowToWriter($row);
++$this->writtenRowCount;
}
final public function addRows(array $rows): void
{
foreach ($rows as $row) {
$this->addRow($row);
}
}
final public function setCreator(string $creator): void
{
$this->creator = $creator;
}
final public function getWrittenRowCount(): int
{
return $this->writtenRowCount;
}
final public function close(): void
{
if (!$this->isWriterOpened) {
return;
}
$this->closeWriter();
fclose($this->filePointer);
$this->isWriterOpened = false;
}
/**
* Opens the streamer and makes it ready to accept data.
*
* @throws IOException If the writer cannot be opened
*/
abstract protected function openWriter(): void;
/**
* Adds a row to the currently opened writer.
*
* @param Row $row The row containing cells and styles
*
* @throws WriterNotOpenedException If the workbook is not created yet
* @throws IOException If unable to write data
*/
abstract protected function addRowToWriter(Row $row): void;
/**
* Closes the streamer, preventing any additional writing.
*/
abstract protected function closeWriter(): void;
}


@ -0,0 +1,121 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Writer;
use OpenSpout\Common\Entity\Row;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Writer\Common\Entity\Sheet;
use OpenSpout\Writer\Common\Manager\WorkbookManagerInterface;
use OpenSpout\Writer\Exception\SheetNotFoundException;
use OpenSpout\Writer\Exception\WriterNotOpenedException;
abstract class AbstractWriterMultiSheets extends AbstractWriter
{
private WorkbookManagerInterface $workbookManager;
/**
* Returns all the workbook's sheets.
*
* @return Sheet[] All the workbook's sheets
*
* @throws WriterNotOpenedException If the writer has not been opened yet
*/
final public function getSheets(): array
{
$this->throwIfWorkbookIsNotAvailable();
$externalSheets = [];
$worksheets = $this->workbookManager->getWorksheets();
foreach ($worksheets as $worksheet) {
$externalSheets[] = $worksheet->getExternalSheet();
}
return $externalSheets;
}
/**
* Creates a new sheet and makes it the current sheet. The data will now be written to this sheet.
*
* @return Sheet The created sheet
*
* @throws IOException
* @throws WriterNotOpenedException If the writer has not been opened yet
*/
final public function addNewSheetAndMakeItCurrent(): Sheet
{
$this->throwIfWorkbookIsNotAvailable();
$worksheet = $this->workbookManager->addNewSheetAndMakeItCurrent();
return $worksheet->getExternalSheet();
}
/**
* Returns the current sheet.
*
* @return Sheet The current sheet
*
* @throws WriterNotOpenedException If the writer has not been opened yet
*/
final public function getCurrentSheet(): Sheet
{
$this->throwIfWorkbookIsNotAvailable();
return $this->workbookManager->getCurrentWorksheet()->getExternalSheet();
}
/**
* Sets the given sheet as the current one. New data will be written to this sheet.
* The writing will resume where it stopped (i.e. data won't be truncated).
*
* @param Sheet $sheet The sheet to set as current
*
* @throws SheetNotFoundException If the given sheet does not exist in the workbook
* @throws WriterNotOpenedException If the writer has not been opened yet
*/
final public function setCurrentSheet(Sheet $sheet): void
{
$this->throwIfWorkbookIsNotAvailable();
$this->workbookManager->setCurrentSheet($sheet);
}
abstract protected function createWorkbookManager(): WorkbookManagerInterface;
protected function openWriter(): void
{
if (!isset($this->workbookManager)) {
$this->workbookManager = $this->createWorkbookManager();
$this->workbookManager->addNewSheetAndMakeItCurrent();
}
}
/**
* @throws Exception\WriterException
*/
protected function addRowToWriter(Row $row): void
{
$this->throwIfWorkbookIsNotAvailable();
$this->workbookManager->addRowToCurrentWorksheet($row);
}
protected function closeWriter(): void
{
if (isset($this->workbookManager)) {
$this->workbookManager->close($this->filePointer);
}
}
/**
* Checks if the workbook has been created. Throws an exception if not created yet.
*
* @throws WriterNotOpenedException If the workbook is not created yet
*/
private function throwIfWorkbookIsNotAvailable(): void
{
if (!isset($this->workbookManager)) {
throw new WriterNotOpenedException('The writer must be opened before performing this action.');
}
}
}


@ -0,0 +1,21 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Writer;
final readonly class AutoFilter
{
/**
* @param 0|positive-int $fromColumnIndex
* @param positive-int $fromRow
* @param 0|positive-int $toColumnIndex
* @param positive-int $toRow
*/
public function __construct(
public int $fromColumnIndex,
public int $fromRow,
public int $toColumnIndex,
public int $toRow
) {}
}


@ -0,0 +1,15 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Writer\CSV;
final class Options
{
public string $FIELD_DELIMITER = ',';
public string $FIELD_ENCLOSURE = '"';
public bool $SHOULD_ADD_BOM = true;
/** @var positive-int */
public int $FLUSH_THRESHOLD = 500;
}


@ -0,0 +1,91 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Writer\CSV;
use OpenSpout\Common\Entity\Cell;
use OpenSpout\Common\Entity\Row;
use OpenSpout\Common\Exception\IOException;
use OpenSpout\Common\Helper\EncodingHelper;
use OpenSpout\Writer\AbstractWriter;
final class Writer extends AbstractWriter
{
/** @var string Content-Type value for the header */
protected static string $headerContentType = 'text/csv; charset=UTF-8';
private readonly Options $options;
private int $lastWrittenRowIndex = 0;
public function __construct(?Options $options = null)
{
$this->options = $options ?? new Options();
}
public function getOptions(): Options
{
return $this->options;
}
/**
* Opens the CSV streamer and makes it ready to accept data.
*/
protected function openWriter(): void
{
if ($this->options->SHOULD_ADD_BOM) {
// Adds UTF-8 BOM for Unicode compatibility
fwrite($this->filePointer, EncodingHelper::BOM_UTF8);
}
}
/**
* Adds a row to the currently opened writer.
*
* @param Row $row The row containing cells and styles
*
* @throws IOException If unable to write data
*/
protected function addRowToWriter(Row $row): void
{
$cells = array_map(static function (Cell\BooleanCell|Cell\DateIntervalCell|Cell\DateTimeCell|Cell\EmptyCell|Cell\FormulaCell|Cell\NumericCell|Cell\StringCell $value): string {
if ($value instanceof Cell\BooleanCell) {
return (string) (int) $value->getValue();
}
if ($value instanceof Cell\DateTimeCell) {
return $value->getValue()->format(DATE_ATOM);
}
if ($value instanceof Cell\DateIntervalCell) {
return $value->getValue()->format('P%yY%mM%dDT%hH%iM%sS%fF');
}
return (string) $value->getValue();
}, $row->getCells());
$wasWriteSuccessful = fputcsv(
$this->filePointer,
$cells,
$this->options->FIELD_DELIMITER,
$this->options->FIELD_ENCLOSURE,
''
);
if (false === $wasWriteSuccessful) {
throw new IOException('Unable to write data'); // @codeCoverageIgnore
}
++$this->lastWrittenRowIndex;
if (0 === $this->lastWrittenRowIndex % $this->options->FLUSH_THRESHOLD) {
fflush($this->filePointer);
}
}
/**
* Closes the CSV streamer, preventing any additional writing.
* Only the row counter used for periodic flushing is reset here; the file pointer itself
* is closed by AbstractWriter::close().
*/
protected function closeWriter(): void
{
$this->lastWrittenRowIndex = 0;
}
}


@ -0,0 +1,67 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Writer\Common;
use OpenSpout\Common\Entity\Style\Style;
use OpenSpout\Common\TempFolderOptionTrait;
abstract class AbstractOptions
{
use TempFolderOptionTrait;
public Style $DEFAULT_ROW_STYLE;
public bool $SHOULD_CREATE_NEW_SHEETS_AUTOMATICALLY = true;
public ?float $DEFAULT_COLUMN_WIDTH = null;
public ?float $DEFAULT_ROW_HEIGHT = null;
/** @var ColumnWidth[] List of column-width definitions (start column, end column, width) */
private array $COLUMN_WIDTHS = [];
public function __construct()
{
$this->DEFAULT_ROW_STYLE = new Style();
}
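/**
* Sets the same width on the given columns. Consecutive columns are grouped into ranges,
* e.g. columns 1, 2, 3, 7 are stored as the ranges 1-3 and 7-7.
*
* @param positive-int ...$columns One or more columns with this width
*/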
final public function setColumnWidth(float $width, int ...$columns): void
{
// Gather sequences
$sequence = [];
foreach ($columns as $column) {
$sequenceLength = \count($sequence);
if ($sequenceLength > 0) {
$previousValue = $sequence[$sequenceLength - 1];
if ($column !== $previousValue + 1) {
$this->setColumnWidthForRange($width, $sequence[0], $previousValue);
$sequence = [];
}
}
$sequence[] = $column;
}
$this->setColumnWidthForRange($width, $sequence[0], $sequence[\count($sequence) - 1]);
}
/**
* @param float $width The width to set
* @param positive-int $start First column index of the range
* @param positive-int $end Last column index of the range
*/
final public function setColumnWidthForRange(float $width, int $start, int $end): void
{
$this->COLUMN_WIDTHS[] = new ColumnWidth($start, $end, $width);
}
/**
* @internal
*
* @return ColumnWidth[]
*/
final public function getColumnWidths(): array
{
return $this->COLUMN_WIDTHS;
}
}


@ -0,0 +1,21 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Writer\Common;
/**
* @internal
*/
final readonly class ColumnWidth
{
/**
* @param positive-int $start
* @param positive-int $end
*/
public function __construct(
public int $start,
public int $end,
public float $width,
) {}
}


@ -0,0 +1,39 @@
<?php
declare(strict_types=1);
namespace OpenSpout\Writer\Common\Creator;
use OpenSpout\Common\Exception\UnsupportedTypeException;
use OpenSpout\Writer\CSV\Writer as CSVWriter;
use OpenSpout\Writer\ODS\Writer as ODSWriter;
use OpenSpout\Writer\WriterInterface;
use OpenSpout\Writer\XLSX\Writer as XLSXWriter;
/**
* This factory is used to create writers, based on the type of the file to be written.
* It supports CSV, XLSX and ODS formats.
*
* @deprecated Guessing mechanisms are brittle by nature and won't be provided by this library anymore
*/
final class WriterFactory
{
/**
* This creates an instance of the appropriate writer, given the extension of the file to be written.
*
* @param string $path The path to the spreadsheet file. Supported extensions are .csv, .ods and .xlsx
*
* @throws UnsupportedTypeException
*/
public static function createFromFile(string $path): WriterInterface
{
$extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
return match ($extension) {
'csv' => new CSVWriter(),
'xlsx' => new XLSXWriter(),
'ods' => new ODSWriter(),
default => throw new UnsupportedTypeException('No writers supporting the given type: '.$extension),
};
}
}

Some files were not shown because too many files have changed in this diff