Welcome to semtools_parseout. This tool converts PDF documents into parsed Markdown files and copies the output to a directory you choose.
To get started, download the latest release from the Download Releases page.
Follow these steps to install semtools_parseout:
Run the following commands:
mkdir -p ~/.local/bin
curl -fsSL https://raw.githubusercontent.com/jerryjliu/semtools_parseout/main/parseout -o ~/.local/bin/parseout
chmod +x ~/.local/bin/parseout
Ensure that ~/.local/bin is included in your PATH. Run:
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc # or ~/.bashrc
source ~/.zshrc
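To confirm that the PATH change took effect, a quick check along these lines may help. This is a minimal sketch for bash or another POSIX-style shell; `path_contains` is a hypothetical helper introduced here for illustration and is not part of semtools_parseout.

```shell
# Hypothetical helper: succeed if directory $1 appears in the colon-separated list $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "$HOME/.local/bin" "$PATH"; then
  echo "$HOME/.local/bin is on PATH"
else
  echo "$HOME/.local/bin is missing from PATH"
fi

# command -v prints the resolved location of a command, or nothing if it is not found.
command -v parseout || echo "parseout not found on PATH"
```

If the directory is reported as missing, open a new terminal or re-run the `source` command for the shell you actually use.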
Make sure the parse command is available.
Using semtools_parseout is simple. The basic command format is:
parseout <out_dir> <files...>
This command lets you specify where to save your parsed files and which documents to parse.
Here are some common tasks you might perform:
To parse a single PDF document, use the following command:
parseout ./parsed document.pdf
To parse multiple PDF documents at once from a folder, run:
parseout ./parsed scotus_118/*.pdf
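One thing to watch with the glob form: in bash, a pattern that matches nothing is passed to the command literally (zsh reports an error instead). The guard below is a small sketch for bash or POSIX sh; `has_pdfs` is a hypothetical helper, not something the tool provides.

```shell
# Hypothetical helper: succeed only if at least one .pdf exists directly in directory $1.
has_pdfs() {
  for f in "$1"/*.pdf; do
    # When nothing matches, the loop sees the literal pattern, which does not exist.
    if [ -e "$f" ]; then
      return 0
    fi
  done
  return 1
}

# Usage (assumes parseout is installed as described above):
#   has_pdfs scotus_118 && parseout ./parsed scotus_118/*.pdf
```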
The parseout script is a wrapper around the semtools parse command. When you run parseout, it calls parse, which generates parsed Markdown files and stores them in ~/.parse/. The wrapper then copies those files to the output directory you specify.
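The flow described above can be sketched roughly as follows. This is an illustration, not the actual parseout script: the function name `parseout_sketch` is hypothetical, and the sketch assumes that `parse` prints the path of each generated Markdown file on stdout, one per line.

```shell
# Hypothetical sketch of the wrapper flow; the real parseout script may differ.
# Assumption: `parse` prints one generated Markdown path per line on stdout.
parseout_sketch() {
  if [ "$#" -lt 2 ]; then
    echo "usage: parseout_sketch <out_dir> <files...>" >&2
    return 2
  fi

  out_dir=$1
  shift

  # Create the destination directory if it does not exist yet.
  mkdir -p "$out_dir" || return 1

  # Run parse on the remaining arguments and copy each reported file.
  parse "$@" | while IFS= read -r md_path; do
    if [ -f "$md_path" ]; then
      cp "$md_path" "$out_dir/"
    fi
  done
}
```

Copying rather than moving leaves the originals in place under ~/.parse/, which matches the description above of files being stored there first.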
After you run the command, you will find the parsed Markdown files in your specified output directory. You can view, edit, or use them as needed.
For more detailed information on how to use the underlying semtools library, check the official repository: semtools Documentation.
If you run into issues or have questions, feel free to open an issue on the GitHub page. Our community is here to help you.
We plan to add more features based on user feedback. Stay tuned for updates that will enhance your experience.
Thank you for using semtools_parseout!