The Sommier Converter is a command-line tool that converts PDF documents into CSV. It is intended for users who need to extract structured data from PDF files for further analysis, processing, or integration into other systems.
This project is written in Java and utilizes libraries like Apache PDFBox for PDF processing and OpenCSV for CSV file generation.
- PDF to CSV Conversion: Convert structured PDF documents to CSV format with ease.
- Custom PDF Parsing: Handles custom PDF layouts and extracts relevant information.
- Data Extraction: Automatically extracts and organizes data into rows and columns based on predefined markers within the PDF document.
- Simple Command-Line Interface (CLI): A single command takes the input PDF and the output CSV path.
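As a rough illustration of marker-based extraction, the sketch below takes text that has already been extracted from a PDF, keeps the lines between a start marker and an end marker, and splits each line into columns on runs of two or more whitespace characters. The class name `MarkerRowExtractor`, the marker strings, and the whitespace rule are assumptions made for this example; they are not taken from the project's actual parsing code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the project's source. */
final class MarkerRowExtractor {

    /**
     * Returns the non-empty lines found between startMarker and endMarker,
     * each split into columns wherever two or more whitespace characters occur.
     */
    static List<String[]> extractRows(String text, String startMarker, String endMarker) {
        List<String[]> rows = new ArrayList<>();
        boolean inTable = false;
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!inTable) {
                // Skip everything until the start marker line is seen.
                inTable = trimmed.equals(startMarker);
                continue;
            }
            if (trimmed.equals(endMarker)) {
                break; // Stop collecting rows at the end marker.
            }
            if (!trimmed.isEmpty()) {
                rows.add(trimmed.split("\\s{2,}"));
            }
        }
        return rows;
    }
}
```

Calling `MarkerRowExtractor.extractRows(text, "ITEMS", "TOTAL")` on text whose table sits between lines reading `ITEMS` and `TOTAL` would return one `String[]` per table line, ready to be written out as a CSV row.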
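Before building and running the project, make sure you have the following:
- Java 8 or later: The project is developed using Java.
- Maven: Used to build the project and resolve its dependencies.
- Apache PDFBox: For PDF parsing (declared as a Maven dependency).
- OpenCSV: For handling CSV file output (declared as a Maven dependency).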
Make sure your Maven pom.xml includes the necessary dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.27</version>
    </dependency>
    <dependency>
        <groupId>com.opencsv</groupId>
        <artifactId>opencsv</artifactId>
        <version>5.6</version>
    </dependency>
</dependencies>
Use Maven to build the project and resolve dependencies:
mvn clean install
Ensure that you have Java 8 or later installed. You can check this by running:
java -version
To convert a PDF document into CSV, use the following command:
java -jar sommier_converter.jar input.pdf output.csv
Where:
- input.pdf: The PDF file you want to convert.
- output.csv: The destination CSV file where the extracted data will be saved.
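For readers adapting the entry point, here is a minimal sketch of how the two positional arguments could be validated before any conversion starts. The class `CliArguments`, its error messages, and the `.pdf` extension check are hypothetical; the project's real `Main.java` may handle arguments differently.

```java
import java.util.Locale;

/** Hypothetical argument holder for illustration; not the project's Main class. */
final class CliArguments {
    final String inputPdf;
    final String outputCsv;

    private CliArguments(String inputPdf, String outputCsv) {
        this.inputPdf = inputPdf;
        this.outputCsv = outputCsv;
    }

    /** Expects exactly two arguments: the input PDF path and the output CSV path. */
    static CliArguments parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: java -jar sommier_converter.jar <input.pdf> <output.csv>");
        }
        // Assumed rule for this sketch: the input must carry a .pdf extension.
        if (!args[0].toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            throw new IllegalArgumentException("Input file must be a PDF: " + args[0]);
        }
        return new CliArguments(args[0], args[1]);
    }
}
```

Failing early with a usage message keeps a mistyped command from producing an empty or partial CSV file.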
If your PDF uses a custom layout, you may need to adjust the extraction rules. Modify the relevant configuration files or the parsing logic (PdfParser.java) to match your specific PDF format.
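The configuration format is not described here, so the following is only a sketch of one way extraction rules could be kept outside the code: a Java properties file read with `java.util.Properties`. The keys `table.start`, `table.end`, and `column.separator`, their default values, and the class `ExtractionRules` are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Pattern;

/** Hypothetical rule set for illustration; keys and defaults are assumptions. */
final class ExtractionRules {
    final String startMarker;
    final String endMarker;
    final Pattern columnSeparator;

    private ExtractionRules(String startMarker, String endMarker, Pattern columnSeparator) {
        this.startMarker = startMarker;
        this.endMarker = endMarker;
        this.columnSeparator = columnSeparator;
    }

    /** Builds rules from properties, falling back to defaults for missing keys. */
    static ExtractionRules fromProperties(Properties props) {
        return new ExtractionRules(
            props.getProperty("table.start", "ITEMS"),
            props.getProperty("table.end", "TOTAL"),
            Pattern.compile(props.getProperty("column.separator", "\\s{2,}")));
    }

    /** Parses rules from text written in .properties syntax. */
    static ExtractionRules parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fromProperties(props);
    }

    /** Splits one line of extracted text into columns using the configured separator. */
    String[] splitColumns(String line) {
        return columnSeparator.split(line.trim());
    }
}
```

With rules such as `column.separator=;`, a line like `Steel bolts;120;4.5` would split into three columns. Note that backslashes are escape characters in properties files, so a regex such as `\s{2,}` would have to be written as `\\s{2,}` there.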
Assume we have a PDF file customs_declaration.pdf. To convert this PDF to CSV, use:
java -jar sommier_converter.jar customs_declaration.pdf customs_declaration.csv
The output CSV will contain rows and columns with extracted data from the PDF, such as item descriptions, quantities, and tariffs.
The project is organized as follows:
sommier_converter/
│
├── src/ # Java source files
│ ├── Main.java # Main class for execution
│ ├── PdfParser.java # PDF parsing logic
│ └── CsvWriter.java # Logic for writing CSV
│
├── lib/ # External libraries
│
└── pom.xml # Maven configuration file
- PDF Parsing Issues: If the converter fails to parse the PDF, check that the PDF follows a consistent layout and contains text-based content. Scanned images and other PDFs without a text layer might not convert well, because there is no text to extract.
- CSV Formatting: If the output CSV is not formatted as expected, check the delimiters and structure in the input PDF and adjust the parsing logic as necessary.
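Both checks above can be approximated with a few lines of diagnostic code. The sketch below assumes you already have the extracted text (for example, the string returned by PDFBox's `PDFTextStripper.getText`) and the parsed rows in memory. The class `ConversionDiagnostics` and its method names are hypothetical and not part of the project.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical troubleshooting helpers for illustration. */
final class ConversionDiagnostics {

    /**
     * True when text extraction produced nothing but whitespace, which is
     * typical of image-only (scanned) PDFs.
     */
    static boolean hasNoTextLayer(String extractedText) {
        return extractedText == null || extractedText.trim().isEmpty();
    }

    /**
     * Returns the zero-based indexes of rows whose column count differs from
     * that of the first row, a common sign of a misaligned parse.
     */
    static List<Integer> inconsistentRows(List<String[]> rows) {
        List<Integer> bad = new ArrayList<>();
        if (rows.isEmpty()) {
            return bad;
        }
        int expected = rows.get(0).length;
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i).length != expected) {
                bad.add(i);
            }
        }
        return bad;
    }
}
```

An empty result from `inconsistentRows` does not prove the columns are correct, but a non-empty one points to the lines in the PDF where the parsing logic needs attention.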
Contributions are welcome! If you'd like to help improve the project, you can:
- Fork the repository.
- Create a branch for your changes.
- Submit a pull request with your improvements or fixes.
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or feedback, feel free to reach out to Oussama Ezziouri at oussama.ezziouri@example.com.
Clone the repository:
git clone https://github.com/OUSSAMA-EZZIOURI/sommier_converter.git
cd sommier_converter