How to Set Up Tabula to Extract PDF Tables Using Docker
- Ashik Nesin
Tabula is an open-source tool for extracting data tables from PDF files.
I've tried many cloud-based apps for extracting tables from PDFs, and so far nothing has been as good as Tabula 🔥
You can also save templates for extracting data, so that you can reuse them on similar documents.
Here is how to set up Tabula using Docker:
FROM openjdk:8
ENV TABULA_VERSION 1.2.1
RUN wget -q https://github.com/tabulapdf/tabula/releases/download/v$TABULA_VERSION/tabula-jar-$TABULA_VERSION.zip && \
    unzip tabula-jar-$TABULA_VERSION.zip && \
    rm tabula-jar-$TABULA_VERSION.zip
EXPOSE 8080
CMD ["java", "-Dfile.encoding=utf-8", "-Xms256M", "-Xmx1024M", "-jar", "tabula/tabula.jar"]
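To try the image locally before deploying it, something like the following should work. This is a minimal sketch, not part of the original setup: it assumes the file above is saved as `Dockerfile` in the current directory, that Tabula listens on port 8080 inside the container (as the `EXPOSE` line suggests), and the image tag `tabula` and the helper name `run_tabula` are just placeholders I picked.

```shell
# Hypothetical image tag and host port; change them to suit your setup.
TABULA_IMAGE="tabula"
TABULA_PORT=8080

# Build the image from the Dockerfile in the current directory, then start
# a container in the foreground and map the host port to container port 8080.
run_tabula() {
  docker build -t "$TABULA_IMAGE" . &&
    docker run --rm -p "$TABULA_PORT:8080" "$TABULA_IMAGE"
}
```

Call `run_tabula` and, once the container is up, open http://localhost:8080 in your browser. Press Ctrl+C to stop it; because of `--rm`, the container is removed on exit.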
Now you can host this on a cloud provider such as Railway.
- Based on tabula-docker
Happy data extracting!