astlan-diff.py


Viewing 7 posts - 1 through 7 (of 7 total)
    #6416

    I have been adding character names and some place names to the dictionary, so that if I misspell a name it shows up in red.

    I have also put the more complicated ones in autotext so I don’t have to type them all.
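    The same idea can be tried outside a word processor. Below is a minimal sketch. It assumes a plain list of known names (the names used are hypothetical placeholders, and suspicious_names is an illustrative helper). The standard library's difflib.get_close_matches flags capitalized words that are close to a known name without matching it exactly:

```python
import difflib
import re

# Hypothetical custom dictionary; 'Korvath' and 'Mirel' are placeholder names.
KNOWN_NAMES = ['Astlan', 'Korvath', 'Mirel']


def suspicious_names(text, known_names, cutoff=0.8):
    """Return (word, suggestion) pairs for capitalized words that are close
    to, but not exactly, one of known_names."""
    known = set(known_names)
    results = []
    for word in re.findall(r'\b[A-Z][a-z]+\b', text):
        if word in known:
            continue  # spelled correctly
        close = difflib.get_close_matches(word, known_names, n=1, cutoff=cutoff)
        if close:
            results.append((word, close[0]))
    return results


print(suspicious_names('The road from Astlen to Korvath was long.', KNOWN_NAMES))
# prints [('Astlen', 'Astlan')]
```

    The cutoff of 0.8 is a guess. Lower values catch more typos but also raise more false alarms. Names with accents, apostrophes or hyphens would need a broader pattern than [A-Z][a-z]+.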

    I am not happy with any of the diff options I have been trying. A good free tool that non-technical people could easily use to compare PDFs or EPUBs would be ideal, but I’m not seeing one, and I’m not sure how much time it’s worth spending on it.
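    One low-effort option for non-technical readers is Python's built-in difflib.HtmlDiff. It writes a side-by-side comparison page that opens in any web browser. Below is a minimal sketch. It assumes the text has already been pulled out of each edition as a list of paragraphs. The side_by_side_report helper and the edition labels are hypothetical:

```python
import difflib


def side_by_side_report(old_paras, new_paras, old_label='Alpha 1', new_label='Alpha 2'):
    """Return a complete HTML page comparing two lists of paragraphs.

    context=True shows only the changed paragraphs plus numlines of
    surrounding context, instead of the whole chapter.
    """
    differ = difflib.HtmlDiff(wrapcolumn=70)  # wrap long paragraphs at 70 characters
    return differ.make_file(old_paras, new_paras, fromdesc=old_label,
                            todesc=new_label, context=True, numlines=1)


page = side_by_side_report(['It was dark.', 'The rider went north.'],
                           ['It was dark.', 'The rider turned north.'])
```

    You could save page to a file such as changes.html and send it around. Readers would then see the changed text highlighted without installing anything. HtmlDiff does not read PDF or EPUB itself, so the paragraphs would still have to be extracted first.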

    What I might do instead is publish a PDF version with the revisions/markup showing. I’ve actually had to turn the markup off to get “clean” alpha PDFs.

    Once that is published, I accept all the changes, which starts the clock anew (I print the clean version to PDF/EPUB), and then I start tracking changes again for the next release.

    #6418
    Iume
    Member

    No no no. Less time programming, more time writing.

    #1420
    Mikey
    Member
    #6413
    Mikey
    Member

    A Python script for finding changes between Astlan EPUB editions:

    [code=python]
    #!/usr/bin/python3
    # -*- coding: utf-8 -*-

    import difflib

    import epub  # the python-epub package
    from bs4 import BeautifulSoup


    def text_paras(book, item):
        """Return the text of every <p> element in one XHTML manifest item."""
        try:
            content = book.read_item(item)  # read_item() takes a manifest item, not a bare href
        except KeyError:
            # No file at this path in this edition (e.g. a chapter was added or removed).
            return []
        soup = BeautifulSoup(content, 'lxml')
        return [p.get_text() for p in soup.find_all('p')]


    book1 = epub.open_epub('Apostles of Doom Alpha 1 – J. L. Langland.epub')
    book2 = epub.open_epub('Apostles of Doom Alpha 2 – J. L. Langland.epub')

    # Compare each XHTML file in the first edition with the file at the
    # same path (href) in the second edition.
    for item in book1.opf.manifest.values():
        if item.media_type != 'application/xhtml+xml':
            continue
        print(item.href)
        paras1 = text_paras(book1, item)
        paras2 = text_paras(book2, item)

        s = difflib.SequenceMatcher(None, paras1, paras2)
        for tag, i1, i2, j1, j2 in s.get_opcodes():
            if tag == 'equal':
                continue  # unchanged run of paragraphs
            print('%7s a[%d:%d] b[%d:%d]' % (tag, i1, i2, j1, j2))
            if tag in ('replace', 'delete'):
                print('A: ', paras1[i1:i2])
            if tag in ('replace', 'insert'):
                print('B: ', paras2[j1:j2])

    [/code]
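    The heart of the script is difflib.SequenceMatcher run over lists of paragraphs rather than characters. Below is a small sketch of just that step, so it can be tried without any EPUB files. It uses made-up sample paragraphs, and paragraph_changes is a hypothetical helper:

```python
import difflib


def paragraph_changes(old_paras, new_paras):
    """Return (tag, old_slice, new_slice) for every non-equal opcode."""
    matcher = difflib.SequenceMatcher(None, old_paras, new_paras)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            changes.append((tag, old_paras[i1:i2], new_paras[j1:j2]))
    return changes


old = ['It was dark.', 'The rider went north.', 'The end.']
new = ['It was dark.', 'The rider turned north.', 'A new scene begins.', 'The end.']
for change in paragraph_changes(old, new):
    print(change)
# prints ('replace', ['The rider went north.'], ['The rider turned north.', 'A new scene begins.'])
```

    Whole paragraphs are the unit of comparison, so fixing a single letter marks the entire paragraph as replaced. Running a second SequenceMatcher over the words of each replaced pair would narrow that down.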

    #6414

    Very interesting!

    Great minds think alike! Just before reading this, I posted a reply asking people whether a diff document, something that shows the changes, would be useful.

    #6415
    Mikey
    Member

    I was also looking at using either NLTK or Google’s TensorFlow language models to extract all of the character and place names for spell checking, but that will have to wait until later.
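    In the meantime, a much cruder stand-in can produce a first list of likely names to review. It uses plain capitalization rules, not NLTK or TensorFlow. The sketch assumes that a capitalized word that does not start a sentence is probably a name. candidate_names is a hypothetical helper:

```python
import re
from collections import Counter


def candidate_names(text, min_count=1):
    """Return capitalized words seen mid-sentence at least min_count times.

    A rough heuristic: the first word of each sentence is skipped because it
    is capitalized anyway, and one-letter words such as 'I' are ignored.
    """
    counts = Counter()
    # Split into sentences after ., ! or ? followed by whitespace (approximate).
    for sentence in re.split(r'(?<=[.!?])\s+', text):
        words = re.findall(r'[^\W\d_]+', sentence)  # runs of letters, any alphabet
        for word in words[1:]:
            if len(word) > 1 and word[0].isupper():
                counts[word] += 1
    return sorted(w for w, c in counts.items() if c >= min_count)


sample = 'Mira rode to Korvath. Later, Mira left Korvath at dawn. Then I slept.'
print(candidate_names(sample))  # ['Korvath', 'Mira']
```

    The heuristic has clear gaps. Names that only ever begin a sentence are missed, and ordinary capitalized words such as titles and days of the week slip in. Treat the output as a list to check by hand before adding entries to the dictionary.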

    #6417
    Mikey
    Member

    It should be reasonably straightforward to modify this diff tool to produce an EPUB with the changes highlighted and bookmarks to each change.
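    Below is a sketch of the highlighting half. It assumes the paragraph lists produced by text_paras() in the script above:

    - Changed paragraphs are wrapped in <del>/<ins>, word by word when old and new paragraphs pair up one to one.
    - Each change gets an id.
    - A list of links to those ids at the top acts as the bookmarks.

    Both function names are hypothetical. Packaging the result into an EPUB (OPF manifest, container.xml and so on) is left out:

```python
import difflib
import html


def word_diff_html(old, new):
    """Mark word-level differences between two paragraphs with <del>/<ins>."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            parts.append(html.escape(' '.join(old_words[i1:i2])))
            continue
        if i2 > i1:  # words removed ('replace' or 'delete')
            parts.append('<del>%s</del>' % html.escape(' '.join(old_words[i1:i2])))
        if j2 > j1:  # words added ('replace' or 'insert')
            parts.append('<ins>%s</ins>' % html.escape(' '.join(new_words[j1:j2])))
    return ' '.join(parts)


def diff_chapter_html(old_paras, new_paras):
    """Return an XHTML fragment: a list of links to each change, followed by
    the new edition's paragraphs with every change highlighted."""
    matcher = difflib.SequenceMatcher(None, old_paras, new_paras)
    body, links = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            body.extend('<p>%s</p>' % html.escape(p) for p in new_paras[j1:j2])
            continue
        anchor = 'change-%d' % (len(links) + 1)
        links.append('<li><a href="#%s">%s</a></li>' % (anchor, anchor))
        old_chunk, new_chunk = old_paras[i1:i2], new_paras[j1:j2]
        if len(old_chunk) == len(new_chunk):
            # Same number of paragraphs: pair them up and diff word by word.
            marked = [word_diff_html(o, n) for o, n in zip(old_chunk, new_chunk)]
        else:
            # Paragraphs added, removed or merged: show whole paragraphs.
            marked = (['<del>%s</del>' % html.escape(p) for p in old_chunk]
                      + ['<ins>%s</ins>' % html.escape(p) for p in new_chunk])
        body.append('<div class="change" id="%s">' % anchor)
        body.extend('<p>%s</p>' % m for m in marked)
        body.append('</div>')
    return '<ol class="changes">%s</ol>\n%s' % (''.join(links), '\n'.join(body))


old = ['It was dark.', 'The rider went north.', 'The end.']
new = ['It was dark.', 'The rider turned north.', 'The end.']
print(diff_chapter_html(old, new))
```

    A stylesheet rule such as ins { background: #cfc } del { background: #fcc } would colour the changes, though e-readers vary in how much CSS they honour.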
