Dev & Type: How the PDF Format Differs from the HTML Markup Language

You’ll also notice that PDFs do not separate content from presentation. This is a fundamental difference between creating a PDF versus an HTML document. PDFs represent content and formatting at the same time using procedural operators, while other popular languages like HTML and CSS apply style rules to semantic elements. This allows PDFs to represent pixel-perfect layouts, but it also makes it much harder to extract text from a document.
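To make the contrast concrete, here is a minimal sketch that emits the content-stream operators a PDF page might use to paint one line of text. It is only a fragment, not a complete PDF file. It assumes a font resource named `/F1` is defined in the page's resource dictionary, and the helper names are hypothetical. Note how font, size and position are operands of the same operators that draw the text.

```python
def pdf_escape(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_content_stream(text: str, x: float, y: float,
                        font: str = "F1", size: float = 12) -> str:
    """Return content-stream operators that paint `text` at (x, y).

    The font, size and position are operands of the very operators that
    draw the text; nothing records *what* the text is (heading, paragraph...).
    """
    return "\n".join([
        "BT",                          # begin a text object
        f"/{font} {size} Tf",          # select font resource (assumed /F1) and size
        f"{x} {y} Td",                 # move to (x, y) in user-space units (points)
        f"({pdf_escape(text)}) Tj",    # show the string
        "ET",                          # end the text object
    ])


# HTML would instead say <p style="font-size: 12pt">Hello, World!</p> and
# leave it to the browser to decide where the paragraph lands on the page.
print(text_content_stream("Hello, World!", 72, 712))
```

Extracting text means reversing this process: a reader sees only positioned strings and must guess which of them belong together.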

Remember that PDF is a low-level representation of text and graphics, so there is no “underlined text” in a PDF document. There are only text and lines, which are entirely independent entities. To underline text, you must draw a line beneath it yourself.
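A rough sketch of "manual" underlining: draw the text, then stroke a separate line under it. This assumes the hypothetical `/F1` resource maps to Courier, whose standard metrics give every glyph a width of 600/1000 em, so the line length can be computed without reading font metrics. The underline offset below the baseline is an arbitrary illustrative choice, not a value from any specification.

```python
COURIER_ADVANCE = 600  # Courier glyph width in 1/1000 em (all glyphs equal)


def fmt(v: float) -> str:
    """Format a number compactly for a content stream (e.g. 72.0 -> '72')."""
    return f"{v:.3f}".rstrip("0").rstrip(".")


def pdf_escape(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def underlined_text(text: str, x: float, y: float,
                    size: float = 12, font: str = "F1") -> str:
    # Assumes /F1 is Courier, so width = characters * advance * size.
    width = len(text) * COURIER_ADVANCE * size / 1000
    line_y = y - 0.1 * size  # hypothetical offset below the baseline
    return "\n".join([
        # 1. The text itself.
        "BT",
        f"/{font} {fmt(size)} Tf",
        f"{fmt(x)} {fmt(y)} Td",
        f"({pdf_escape(text)}) Tj",
        "ET",
        # 2. A separate stroked path; nothing ties it to the text above.
        "0.5 w",                              # line width
        f"{fmt(x)} {fmt(line_y)} m",          # start of the underline
        f"{fmt(x + width)} {fmt(line_y)} l",  # end of the underline
        "S",                                  # stroke the path
    ])


print(underlined_text("Hello, World!", 72, 712))
```

With a proportional font such as Helvetica, the width would have to come from that font's per-glyph metrics instead, which is one more reason underlining is left to the document producer.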

PDFs were initially designed to be a digital representation of physical paper and ink. The graphics operators presented in this chapter make it possible to represent arbitrary paths as a sequence of lines and curves. Like their textual counterparts, graphics operators are procedural. They mimic the actions an artist would take to draw the same image. This can be intuitive if you’re creating graphics from scratch, but can become quite complicated if you’re trying to manually edit an image. For example, it’s easy to say something like, “Draw a line from here to there,” but it’s much harder to say, “Move this box two inches to the left.” Once again, this task is left up to PDF editor applications.
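The sketch below shows the procedural style of the path operators: `m` (move to), `l` (line to), `c` (cubic Bézier curve to), `h` (close path) and `S` (stroke). The helper functions are hypothetical. The `translate` helper illustrates one possible way to "move this box two inches to the left" once an editor already knows which operators draw the box: wrap them in `q`/`Q` (save/restore graphics state) with a translation set by `cm`. PDF's default user space has 72 units per inch, so two inches is 144 units. Identifying which operators form "the box" in an arbitrary page is the genuinely hard part, and this sketch does not attempt it.

```python
def box_path(x, y, w, h):
    """Outline a w-by-h box whose lower-left corner is (x, y)."""
    return [
        f"{x} {y} m",            # m: begin a subpath at (x, y)
        f"{x + w} {y} l",        # l: straight line segments
        f"{x + w} {y + h} l",
        f"{x} {y + h} l",
        "h",                     # h: close the subpath
        "S",                     # S: stroke it
    ]


def curve_path(start, c1, c2, end):
    """A single cubic Bezier curve from start to end via control points c1, c2."""
    return [
        f"{start[0]} {start[1]} m",
        f"{c1[0]} {c1[1]} {c2[0]} {c2[1]} {end[0]} {end[1]} c",  # c: curveto
        "S",
    ]


def translate(ops, dx, dy):
    """Shift already-identified drawing operators by (dx, dy) units.

    Wraps them in q/Q with a translation matrix set by cm. Finding which
    operators form the shape in the first place is not solved here.
    """
    return ["q", f"1 0 0 1 {dx} {dy} cm", *ops, "Q"]


INCH = 72  # default PDF user space: 72 units per inch

box = box_path(216, 72, 100, 50)
print("\n".join(box + curve_path((72, 200), (100, 280), (200, 280), (228, 200))))
# "Move this box two inches to the left":
print("\n".join(translate(box, -2 * INCH, 0)))
```

An alternative would be to rewrite every coordinate operand of the box directly; either way, the editor has to reconstruct the notion of "a box" from a flat list of drawing instructions.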