Penelope

Penelope is a multi-tool for creating, editing, and converting dictionaries, especially for eReader devices.

Version: 3.1.3
Date: 2016-09-23
Developer: Alberto Pettarin
License: the MIT License (MIT)
Contact: click here

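With the current version you can:

Convert a dictionary from/to the following formats (R = read, W = write):
- Bookeen Cybook Odyssey (R/W)
- CSV (R/W)
- EPUB (W only)
- MOBI (Kindle, W only)
- Kobo (R index only, W unencrypted/unobfuscated only)
- StarDict (R/W)
- XML (R/W)
Merge several dictionaries of the same type into a single dictionary
Merge multiple definitions for the same headword
Sort by headword and/or by definition
Define your own input parser for merging/sorting/editing the definitions
Define your own collation function (bookeen output format only)
Output an EPUB file containing the dictionary (e.g., if your eReader lacks a search function)
Output a MOBI (Kindle) dictionary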
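
To make the merge and sort features more concrete, here is a minimal Python sketch of what merging definitions and sorting by headword mean for a list of (headword, definition) pairs. It is an illustration only, not Penelope's actual implementation: the function names `merge_definitions` and `sort_by_headword` are hypothetical, and the only detail taken from Penelope is the default separator ' | ' documented for --merge-separator.

```python
from collections import OrderedDict


def merge_definitions(entries, separator=" | "):
    """Join the definitions of entries that share the same headword.

    entries is an iterable of (headword, definition) pairs. The first
    occurrence of each headword determines its position in the result.
    (Hypothetical helper, not Penelope code.)
    """
    merged = OrderedDict()
    for headword, definition in entries:
        merged.setdefault(headword, []).append(definition)
    return [(headword, separator.join(definitions))
            for headword, definitions in merged.items()]


def sort_by_headword(entries, ignore_case=False, reverse=False):
    """Sort (headword, definition) pairs by headword."""
    if ignore_case:
        key = lambda entry: entry[0].lower()
    else:
        key = lambda entry: entry[0]
    return sorted(entries, key=key, reverse=reverse)


entries = [("cat", "gatto"), ("dog", "cane"), ("cat", "micio")]
print(sort_by_headword(merge_definitions(entries)))
# prints [('cat', 'gatto | micio'), ('dog', 'cane')]
```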

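Important update

2016-04-17 Sadly, I no longer have time to work on Penelope: other FLOSS projects are taking up 100% of my FLOSS time, and I need to pay rent and bills and spend time with my family and friends, like anyone else. Hence, I will not work on issues or pull requests; do not expect them to be processed at all. I am actively looking for other developers willing to take over this project. (This notice should be removed when the switch is made.) If you need to convert dictionaries and the current version of Penelope does not work for you, you might want to look at PyGlossary. I am sorry for the inconvenience.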

Installation

Using pip

Open a console and type:
```
$ [sudo] pip install penelope
```
That's it! To get the manual, just run it without arguments (or with -h or --help):
```
$ penelope
```

This procedure installs lxml and marisa-trie. You might need to install dictzip (StarDict output) and kindlegen (MOBI output) separately; see below.

From source code

Get the source code:
- clone this repository with git:
```
$ git clone https://github.com/pettarin/penelope.git
```
- or download the latest release and uncompress it somewhere,
- or download the current master ZIP and uncompress it somewhere.
Open a console and enter the penelope (cloned) directory:
```
$ cd /path/to/penelope
```
That's it! To get the manual, just run it without arguments (or with -h or --help):
```
$ python -m penelope
```

This procedure does not install any dependencies: you will need to do that manually; see below.

Dependencies

Python, version 2.7.x or 3.4.x (or above)
To write StarDict dictionaries: the dictzip executable, available in your $PATH or specified with --dictzip-path:
```
$ [sudo] apt-get install dictzip
```
To read/write Kobo dictionaries: the Python module marisa-trie:
```
$ [sudo] pip install marisa-trie
```
or the MARISA executables, available in your $PATH or specified with --marisa-bin-path
To write MOBI (Kindle) dictionaries: the kindlegen executable, available in your $PATH or specified with --kindlegen-path
To read/write XML dictionaries: the Python module lxml:
```
$ [sudo] pip install lxml
```
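
The dependencies listed above can be checked before running a conversion. The following is a minimal sketch, assuming Python 3; it is not part of Penelope. The helper name `check_dependencies` is hypothetical, and `marisa_trie` is assumed to be the importable module name of the marisa-trie package. The sketch only looks for executables on $PATH, so tools supplied through --dictzip-path, --kindlegen-path, or --marisa-bin-path are not covered.

```python
import importlib.util
import shutil


def check_dependencies(executables=("dictzip", "kindlegen"),
                       modules=("lxml", "marisa_trie")):
    """Return a dict mapping each dependency name to True (found) or False.

    Hypothetical helper, not part of Penelope.
    """
    status = {}
    for executable in executables:
        # shutil.which returns None when the executable is not on $PATH
        status[executable] = shutil.which(executable) is not None
    for module in modules:
        # find_spec returns None when a top-level module cannot be located
        status[module] = importlib.util.find_spec(module) is not None
    return status
```

Calling `check_dependencies()` and printing the result shows which optional formats are likely to work: for example, a missing kindlegen only matters for MOBI output.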

Usage

usage:
  $ penelope -h
  $ penelope -i INPUT_FILE -j INPUT_FORMAT -f LANGUAGE_FROM -t LANGUAGE_TO -p OUTPUT_FORMAT -o OUTPUT_FILE [OPTIONS]
  $ penelope -i IN1,IN2[,IN3...] -j INPUT_FORMAT -f LANGUAGE_FROM -t LANGUAGE_TO -p OUTPUT_FORMAT -o OUTPUT_FILE [OPTIONS]

description:
  Convert dictionary file(s) with file name prefix INPUT_FILE from format INPUT_FORMAT to format OUTPUT_FORMAT, saving it as OUTPUT_FILE.
  The dictionary is from LANGUAGE_FROM to LANGUAGE_TO, possibly the same.
  You can merge several dictionaries (with the same format), by providing a list of comma-separated prefixes, as shown by the third synopsis above.

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           enable debug mode (default: False)
  -f LANGUAGE_FROM, --language-from LANGUAGE_FROM
                        from language (ISO 639-1 code)
  -i INPUT_FILE, --input-file INPUT_FILE
                        input file name prefix(es). Multiple prefixes must be
                        comma-separated.
  -j INPUT_FORMAT, --input-format INPUT_FORMAT
                        from format (values: bookeen|csv|kobo|stardict|xml)
  -k, --keep            keep temporary files (default: False)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        output file name
  -p OUTPUT_FORMAT, --output-format OUTPUT_FORMAT
                        to format (values:
                        bookeen|csv|epub|kobo|mobi|stardict|xml)
  -t LANGUAGE_TO, --language-to LANGUAGE_TO
                        to language (ISO 639-1 code)
  -v, --version         print version and exit
  --author AUTHOR       author string
  --copyright COPYRIGHT
                        copyright string
  --cover-path COVER_PATH
                        path of the cover image file
  --description DESCRIPTION
                        description string
  --email EMAIL         email string
  --identifier IDENTIFIER
                        identifier string
  --license LICENSE     license string
  --title TITLE         title string
  --website WEBSITE     website string
  --year YEAR           year string
  --apply-css APPLY_CSS
                        apply the given CSS file (epub and mobi output only)
  --bookeen-collation-function BOOKEEN_COLLATION_FUNCTION
                        use the specified collation function
  --bookeen-install-file
                        create *.install file (default: False)
  --csv-fs CSV_FS       CSV field separator (default: ',')
  --csv-ignore-first-line
                        ignore the first line of the input CSV file(s)
                        (default: False)
  --csv-ls CSV_LS       CSV line separator (default: '\n')
  --dictzip-path DICTZIP_PATH
                        path to dictzip executable
  --epub-no-compress    do not create the compressed container (epub output
                        only, default: False)
  --escape-strings      escape HTML strings (default: False)
  --flatten-synonyms    flatten synonyms, creating a new entry with
                        headword=synonym and using the definition of the
                        original headword (default: False)
  --group-by-prefix-function GROUP_BY_PREFIX_FUNCTION
                        compute the prefix of headwords using the given prefix
                        function file
  --group-by-prefix-length GROUP_BY_PREFIX_LENGTH
                        group headwords by prefix of given length (default: 2)
  --group-by-prefix-merge-across-first
                        merge headword groups even when the first character
                        changes (default: False)
  --group-by-prefix-merge-min-size GROUP_BY_PREFIX_MERGE_MIN_SIZE
                        merge headword groups until the given minimum number
                        of headwords is reached (default: 0, meaning no merge
                        will take place)
  --ignore-case         ignore headword case, all headwords will be lowercased
                        (default: False)
  --ignore-synonyms     ignore synonyms, not reading/writing them if present
                        (default: False)
  --include-index-page  include an index page (epub and mobi output only,
                        default: False)
  --input-file-encoding INPUT_FILE_ENCODING
                        use the specified encoding for reading the raw
                        contents of input file(s) (default: 'utf-8')
  --input-parser INPUT_PARSER
                        use the specified parser function after reading the
                        raw contents of input file(s)
  --kindlegen-path KINDLEGEN_PATH
                        path to kindlegen executable
  --marisa-bin-path MARISA_BIN_PATH
                        path to MARISA bin directory
  --marisa-index-size MARISA_INDEX_SIZE
                        maximum size of the MARISA index (default: 1000000)
  --merge-definitions   merge definitions for the same headword (default:
                        False)
  --merge-separator MERGE_SEPARATOR
                        add this string between merged definitions (default: '
                        | ')
  --mobi-no-kindlegen   do not run kindlegen, keep .opf and .html files
                        (default: False)
  --no-definitions      do not output definitions for EPUB and MOBI formats
                        (default: False)
  --sd-ignore-sametypesequence
                        ignore the value of sametypesequence in StarDict .ifo
                        files (default: False)
  --sd-no-dictzip       do not compress the .dict file in StarDict files
                        (default: False)
  --sort-after          sort after merging/flattening (default: False)
  --sort-before         sort before merging/flattening (default: False)
  --sort-by-definition  sort by definition (default: False)
  --sort-by-headword    sort by headword (default: False)
  --sort-ignore-case    ignore case when sorting (default: False)
  --sort-reverse        reverse the sort order (default: False)

examples:

  $ penelope -i dict.csv -j csv -f en -t it -p stardict -o output.zip
    Convert en->it dictionary dict.csv (in CSV format) into output.zip (in StarDict format)

  $ penelope -i dict.csv -j csv -f en -t it -p stardict -o output.zip --merge-definitions
    As above, but also merge definitions

  $ penelope -i d1,d2,d3 -j csv -f en -t it -p csv -o output.csv --sort-after --sort-by-headword
    Merge CSV dictionaries d1, d2, and d3 into output.csv, sorting by headword

  $ penelope -i d1,d2,d3 -j csv -f en -t it -p csv -o output.csv --sort-after --sort-by-headword --sort-ignore-case
    As above, but ignore case for sorting

  $ penelope -i d1,d2,d3 -j csv -f en -t it -p csv -o output.csv --sort-after --sort-by-headword --sort-reverse
    As above, but reverse the order

  $ penelope -i dict.zip -j stardict -f en -t it -p csv -o output.csv
    Convert en->it dictionary dict.zip (in StarDict format) into output.csv (in CSV format)

  $ penelope -i dict.zip -j stardict -f en -t it -p csv -o output.csv --ignore-synonyms
    As above, but do not read the .syn synonym file if present

  $ penelope -i dict.zip -j stardict -f en -t it -p csv -o output.csv --flatten-synonyms
    As above, but flatten synonyms

  $ penelope -i dict.zip -j stardict -f en -t it -p bookeen -o output
    Convert dict.zip into output.dict.idx and output.dict for Bookeen devices

  $ penelope -i dict.zip -j stardict -f en -t it -p kobo -o dicthtml-en-it
    Convert dict.zip into dicthtml-en-it.zip for Kobo devices

  $ penelope -i dict.csv -j csv -f en -t it -p mobi -o output.mobi --cover-path mycover.png --title "My English->Italian Dictionary"
    Convert dict.csv into a MOBI (Kindle) dictionary, using the specified cover image and title

  $ penelope -i dict.xml -j xml -f en -t it -p epub -o output.epub
    Convert dict.xml into an EPUB dictionary

  $ penelope -i dict.xml -j xml -f en -t it -p epub -o output.epub --no-definitions
    As above, but do not output definitions

You can find ISO 639-1 language codes here.
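
When conversions are scripted, the command line shown in the synopsis can be assembled programmatically. The sketch below assumes Python 3.5 or later and that the penelope executable is on $PATH. The helper names `build_penelope_command` and `run_penelope` are hypothetical; only options listed in the help text above are used.

```python
import subprocess


def build_penelope_command(input_files, input_format, language_from,
                           language_to, output_format, output_file,
                           extra_options=()):
    """Build the argument list for a penelope call (hypothetical helper)."""
    command = [
        "penelope",
        "-i", ",".join(input_files),  # multiple prefixes are comma-separated
        "-j", input_format,
        "-f", language_from,
        "-t", language_to,
        "-p", output_format,
        "-o", output_file,
    ]
    command.extend(extra_options)
    return command


def run_penelope(command):
    """Run the command; subprocess raises CalledProcessError if the
    process exits with a non-zero status."""
    return subprocess.run(command, check=True)


# Mirrors the third example above: merge d1, d2, d3 and sort by headword.
command = build_penelope_command(
    ["d1", "d2", "d3"], "csv", "en", "it", "csv", "output.csv",
    extra_options=["--sort-after", "--sort-by-headword"],
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with values such as a --title string that contains spaces.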

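Installing the dictionaries

Bookeen Odyssey devices

For example, suppose you want to use an IT -> EN dictionary.

On your PC, create/download the IT -> EN dictionary files it-en.dict and it-en.dict.idx.
Connect your Odyssey device to your PC via the USB cable.
Using your file manager, copy the two files it-en.dict and it-en.dict.idx from your PC into the Dictionaries/ directory of your Odyssey device.
Restart your Odyssey, open a book in Italian, and select a word: the English definition should appear. (For this test, select a common word that is surely present in the dictionary.)

Note that the Bookeen dictionary software selects the dictionary to use by reading the dc:language metadata of your eBook. Make sure your eBooks have the proper dc:language metadata; if it is missing, the correct dictionary might not be loaded.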
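
To verify the dc:language metadata of an eBook before copying it to the device, something like the following sketch can be used. It assumes Python 3 and a standard EPUB layout, where META-INF/container.xml points to the package (OPF) document that declares the Dublin Core metadata. The function name `read_epub_languages` is hypothetical and not part of Penelope.

```python
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_epub_languages(epub_path):
    """Return the list of dc:language values declared in an EPUB file.

    An empty list means no dc:language element was found.
    """
    with zipfile.ZipFile(epub_path) as epub:
        # META-INF/container.xml points to the package (OPF) document
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find(".//{%s}rootfile" % CONTAINER_NS)
        if rootfile is None:
            return []
        opf = ET.fromstring(epub.read(rootfile.get("full-path")))
    return [
        element.text.strip()
        for element in opf.iter("{%s}language" % DC_NS)
        if element.text
    ]
```

For an Italian book intended for the IT -> EN dictionary, the returned list would be expected to contain an Italian language code such as it.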

Kobo devices

At the time of writing (2016-02-16), Kobo devices load a dictionary only if its file name is one of the official Kobo dictionary file names:

dicthtml.zip (EN),
dicthtml-de.zip (DE), dicthtml-de-en.zip (DE -> EN), dicthtml-en-de.zip (EN -> DE),
dicthtml-es.zip (ES), dicthtml-es-en.zip (ES -> EN), dicthtml-en-es.zip (EN -> ES),
dicthtml-fr.zip (FR), dicthtml-fr-en.zip (FR -> EN), dicthtml-en-fr.zip (EN -> FR),
dicthtml-it.zip (IT), dicthtml-it-en.zip (IT -> EN), dicthtml-en-it.zip (EN -> IT),
dicthtml-nl.zip (NL),
dicthtml-ja.zip (JA), dicthtml-en-ja.zip (EN -> JA),
dicthtml-pt.zip (PT), dicthtml-pt-en.zip (PT -> EN), dicthtml-en-pt.zip (EN -> PT)

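(See this MobileRead thread.)

Hence, if you want to install a custom dictionary produced with Penelope, you must choose one of the official Kobo dictionaries to overwrite, which effectively means you can no longer use that official dictionary.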
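
The file name restriction can be expressed as a simple lookup. The sketch below hard-codes the list given above, which reflects the situation as of 2016-02-16 and may differ on other firmware versions. The names `OFFICIAL_KOBO_DICTIONARY_NAMES` and `is_official_kobo_name` are hypothetical and not part of Penelope.

```python
# File names taken from the list above (as of 2016-02-16).
OFFICIAL_KOBO_DICTIONARY_NAMES = frozenset([
    "dicthtml.zip",
    "dicthtml-de.zip", "dicthtml-de-en.zip", "dicthtml-en-de.zip",
    "dicthtml-es.zip", "dicthtml-es-en.zip", "dicthtml-en-es.zip",
    "dicthtml-fr.zip", "dicthtml-fr-en.zip", "dicthtml-en-fr.zip",
    "dicthtml-it.zip", "dicthtml-it-en.zip", "dicthtml-en-it.zip",
    "dicthtml-nl.zip",
    "dicthtml-ja.zip", "dicthtml-en-ja.zip",
    "dicthtml-pt.zip", "dicthtml-pt-en.zip", "dicthtml-en-pt.zip",
])


def is_official_kobo_name(file_name):
    """Return True if file_name is one of the listed official names."""
    return file_name in OFFICIAL_KOBO_DICTIONARY_NAMES


print(is_official_kobo_name("dicthtml-pt.zip"))  # prints True
print(is_official_kobo_name("dicthtml-pl.zip"))  # prints False
```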

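For example, suppose you want to use a Polish dictionary (dicthtml-pl.zip), and you do not plan to use the official Portuguese dictionary (dicthtml-pt.zip).

On your PC, create/download the Polish dictionary dicthtml-pl.zip.
On your Kobo device, go to the settings and activate the Portuguese dictionary.
Connect your Kobo device to your PC via the USB cable.
Using your file manager, copy dicthtml-pl.zip from your PC into the .kobo/dict/ directory of your Kobo device. (Note that .kobo is a hidden directory: you might need to enable the "show hidden files/directories" setting of your file manager.)
Rename dicthtml-pl.zip to dicthtml-pt.zip.
Restart your Kobo, open a book in Polish, and select a word: the definition should appear. (For this test, select a common word that is surely present in the dictionary.)

Note that updating the firmware of your Kobo might overwrite your custom dictionaries with the official ones. Hence, keep a backup copy of your custom dictionaries in a safe place, e.g. your PC or an SD card.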
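
The copy, rename, and backup steps above can be sketched in Python as follows, assuming Python 3 and a Kobo device already mounted on the PC. The function name `install_kobo_dictionary` and its parameters are hypothetical; the mount point depends on your operating system. Note that the sketch overwrites the official dictionary file on the device, as described above.

```python
import os
import shutil


def install_kobo_dictionary(custom_zip, kobo_mount_point,
                            official_name="dicthtml-pt.zip",
                            backup_dir=None):
    """Copy custom_zip over an official dictionary slot on a mounted Kobo.

    Hypothetical helper, not part of Penelope. Returns the target path.
    """
    dict_dir = os.path.join(kobo_mount_point, ".kobo", "dict")
    if not os.path.isdir(dict_dir):
        raise IOError("Kobo dictionary directory not found: " + dict_dir)
    if backup_dir is not None:
        # Keep a copy of the custom dictionary off the device,
        # since a firmware update may overwrite the installed one.
        os.makedirs(backup_dir, exist_ok=True)
        shutil.copy2(custom_zip,
                     os.path.join(backup_dir, os.path.basename(custom_zip)))
    # Copy and rename in a single step.
    target = os.path.join(dict_dir, official_name)
    shutil.copy2(custom_zip, target)
    return target
```

After running it, the device still needs to be restarted, and the Portuguese dictionary must be activated in the settings, as in the manual procedure.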

In this MobileRead thread you can find a list of custom dictionaries, mostly produced with Penelope.

License

Penelope has been released under the MIT License since version 2.0.0 (2014-06-30).

Previous versions, hosted on Google Code, were released under the GNU GPL 3 License.

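Limitations and missing features

Bookeen has no official documentation for its dictionary format (it has been reverse-engineered), so YMMV.
Kobo has no official documentation for its dictionary format (it has been reverse-engineered), so YMMV.
Reading Kobo dictionaries is only partially supported (the index is read, but the definitions are not, because they are encrypted/obfuscated).
Reading EPUB (3) dictionaries is not supported; the writing part needs polishing/refactoring.
Reading PRC/MOBI (Kindle) dictionaries is not supported.
There are some limitations on the StarDict files that can be read (see the comments in format_stardict.py).
The documentation is not complete.
Unit tests are missing.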

Sponsors

December 2015: IngleseXpress.it: "Thank you for helping us publish the Dizionario Inglese-Italiano della Pronuncia Scritta Semplificata for Kindle!"

Acknowledgments

Many thanks to:

uwelovesdonna for providing ideas for improving the code and for setting up many pages of the project Wiki;
Jens Sadowski for pointing out a bug with Unicode file names and for suggesting using multiset dict() instead of set dict();
oldnat for pointing out a bug on Windows and Python 3;
Wolfgang Miller-Reichling for providing the code for reading CSV dictionaries;
branok for providing ideas and the initial code for the German collation function;
pal for suggesting passing the -l switch to MARISA_BUILD;
Lukas Brückner for suggesting escaping & < > when outputting in XML format;
Stephan Lichtenhagen for suggesting forcing UTF-8 encoding on Python 3;
niconavarrete for pointing out the dependency on $CWD (issue #1), solved in v2.0.1;
elchamaco for providing StarDict dictionaries with .syn files for testing.