How to Set Up Tabula to Extract PDF Tables Using Docker
- Author: Ashik Nesin (@AshikNesin)
Tabula is an open-source tool for extracting data tables from PDF files.
I've tried many cloud-based apps for extracting tables from PDFs, and so far nothing has been as good as Tabula 🔥
You can also save your extraction settings as templates, so you can reuse them on similar PDFs.
Here is how to set up Tabula using Docker:
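# Tabula runs on the JVM, so start from a Java 8 base image
FROM openjdk:8
ENV TABULA_VERSION=1.2.1
# Download the release zip, unpack it, and delete the archive in a single layer
RUN wget -q https://github.com/tabulapdf/tabula/releases/download/v$TABULA_VERSION/tabula-jar-$TABULA_VERSION.zip && \
    unzip tabula-jar-$TABULA_VERSION.zip && \
    rm tabula-jar-$TABULA_VERSION.zip
# Tabula's web UI listens on port 8080 by default
EXPOSE 8080
CMD ["java", "-Dfile.encoding=utf-8", "-Xms256M", "-Xmx1024M", "-jar", "tabula/tabula.jar"]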
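To try the image locally before deploying it anywhere, something like the following should work. It is a minimal sketch, assuming Docker is installed and the Dockerfile above is saved as `Dockerfile` in the current directory. The image tag `tabula` and the two helper function names are placeholders chosen for this example, not anything Tabula requires.

```shell
# Hypothetical names for this example; change them freely.
IMAGE_NAME="tabula"   # tag given to the built image
HOST_PORT=8080        # port on your machine; the container side stays 8080 (see EXPOSE)

# Build the image from the Dockerfile in the current directory.
build_tabula() {
  docker build -t "$IMAGE_NAME" .
}

# Start a throwaway container and publish Tabula's port to the host.
run_tabula() {
  docker run --rm -p "$HOST_PORT:8080" "$IMAGE_NAME"
}
```

After sourcing the snippet, run `build_tabula && run_tabula` and open http://localhost:8080 in your browser. The `--rm` flag deletes the container when it stops, so uploaded PDFs and saved templates will not persist between runs.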
You can now host this image on a cloud provider such as Railway.
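Some hosting platforms tell the container which port to listen on through a `PORT` environment variable. If yours does (an assumption, so check your provider's docs), a start script along these lines could pass that value to Tabula through its `warbler.port` system property. The `TABULA_PORT` variable and the `start_tabula` function are hypothetical names used only in this sketch.

```shell
# Use the platform-provided PORT if present, otherwise fall back to Tabula's default.
TABULA_PORT="${PORT:-8080}"

# Launch Tabula on the chosen port; exec replaces the shell so signals reach the JVM.
start_tabula() {
  exec java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M \
    -Dwarbler.port="$TABULA_PORT" \
    -jar tabula/tabula.jar
}
```

To use it, you would copy the script into the image and call `start_tabula` as the container's command in place of the fixed `CMD`. Treat this as a sketch rather than a tested Railway configuration.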
Reference
- Based on tabula-docker
Happy extracting!