Friday, 17 July 2015

Playing with Odoo and Tesseract.

Over the last few days I have been reading about Odoo and how it has improved since I last worked with it professionally, and I came across Basics-developing-simple-module-openerp, a great resource for anyone who wants to start working on Odoo.
So work started on document_scanner, a plugin for Odoo that uses Tesseract to import pictures and convert them to text.

The entry point of an Odoo plugin is the __init__.py file, and as my plugin is written in document_scanner.py, my __init__.py file looks like this:


# -*- coding: utf-8 -*-

import document_scanner


Next, let's take a look at the __openerp__.py file, where you must include the dependencies, the description and your personal information, which will be visible to the user when the plugin is installed. My file looks like:


{
    'name': 'Document Scanner',
    'version': '1.0',
    'category': 'Tools',
    'description': """
Module which uses OCR to treat image files in Odoo.
==============================================
    """,
    'author': 'Rui Caridade',
    'depends': ['base','document'],
    'data': ['document_scanner_view.xml'],
    'demo': [], 
    'installable': True,
    'auto_install': False
}


which visually translates to
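
Since the manifest is just a Python dict literal, it can be sanity-checked outside Odoo with the standard library. The following is only a minimal sketch: `check_manifest` and the `REQUIRED_KEYS` list are hypothetical names chosen for illustration, not part of Odoo, and the set of "required" keys is an assumption rather than an official rule.

```python
import ast

# Assumption for this sketch: keys we expect in our own manifest.
# This is NOT an official Odoo requirement list.
REQUIRED_KEYS = ('name', 'version', 'depends', 'data')


def check_manifest(source):
    """Parse manifest source text and return a list of problems found."""
    # literal_eval only accepts plain literals, so no code gets executed.
    manifest = ast.literal_eval(source.strip())
    if not isinstance(manifest, dict):
        return ['manifest is not a dict']
    problems = ['missing key: %s' % key
                for key in REQUIRED_KEYS if key not in manifest]
    # 'depends' and 'data' are written as lists in the manifest above.
    for key in ('depends', 'data'):
        if key in manifest and not isinstance(manifest[key], list):
            problems.append('%s should be a list' % key)
    return problems


EXAMPLE = """
{
    'name': 'Document Scanner',
    'version': '1.0',
    'depends': ['base', 'document'],
    'data': ['document_scanner_view.xml'],
    'installable': True,
}
"""

if __name__ == '__main__':
    print(check_manifest(EXAMPLE))
```

Running the check before restarting the server can catch a typo in the dict without a round trip through the Odoo module installer.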


If you look at the __openerp__.py file, you will notice a reference to an XML file, "document_scanner_view.xml", which is where the look and feel of the plugin is defined.
For document_scanner, the XML file looks like this:


<?xml version="1.0" encoding="utf-8"?>
<openerp>
    <data>

        <record id="document_scanner_wizard_view_form" model="ir.ui.view">
            <field name="name">document.scanner.wizard.form</field>
            <field name="model">document.scanner.wizard</field>
            <field name="arch" type="xml">
                <form string="Document Scanner">
                    <group col="4">
                        <field name="data"/>
                    </group>
                    <group col="4">
                        <field name="textoDaImagem"/>
                    </group>
                    <button name="import_file" string="Import" type="object" />
                    <button icon="gtk-cancel" special="cancel" string="Cancel"/>
                </form>
            </field>
        </record>

        <record model="ir.actions.act_window" id="action_document_scanner_form">
            <field name="name">Document Scanner</field>
            <field name="type">ir.actions.act_window</field>
            <field name="res_model">document.scanner.wizard</field>
            <field name="view_type">form</field>
            <field name="view_mode">form</field>
            <field name="target">new</field>
        </record>

        <record id="action_dir_doc_scanner_view" model="ir.actions.act_window.view">
            <field name="view_mode">form</field>
            <field name="view_id" ref="document_scanner_wizard_view_form"/>
            <field name="act_window_id" ref="action_document_scanner_form"/>
        </record>

        <menuitem
            action="action_document_scanner_form"
            id="menu_document_scanner"
            parent="knowledge.menu_document_configuration"/>

    </data>
</openerp>


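To better understand the file, I recommend reading it from the menuitem up to the top: the menu item points to the window action, the action names the wizard model, and the form view defines what the wizard displays.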
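
The form above has a binary field (`data`), a text field (`textoDaImagem`) and an `import_file` button, but the Python side is not shown here. As a rough sketch of the kind of helper that button could call, the function below decodes the uploaded file and hands it to Tesseract. The names `extract_text` and `default_ocr` are hypothetical, and it assumes the upload arrives base64-encoded (as Odoo binary fields do) and that Pillow, pytesseract and the tesseract binary are installed.

```python
import base64


def default_ocr(image_bytes):
    """Run Tesseract on raw image bytes through pytesseract."""
    # Third-party imports are kept local so the module still imports
    # on a machine without Pillow / pytesseract installed.
    import io
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))


def extract_text(b64_data, ocr=default_ocr):
    """Decode a base64 payload and return the recognised text.

    `ocr` is any callable taking raw image bytes and returning a string,
    which makes it easy to swap in a fake engine for testing.
    """
    if not b64_data:
        return u''
    image_bytes = base64.b64decode(b64_data)
    return ocr(image_bytes).strip()
```

Inside the wizard, `import_file` would then only need to pass the content of `data` to `extract_text` and write the result into `textoDaImagem`. Keeping the OCR call behind a parameter is a design choice for this sketch, so the decoding logic can be exercised without Tesseract.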

Once that was done, I needed a picture to test with, and I found Factura Vaca.

The end result:

The text is not quite right yet; I still have to look into it. I am also still thinking about how to improve this plugin. Any suggestions?



Friday, 3 July 2015

Starting with Odoo - Part 1

Some years ago I worked professionally with OpenERP, versions 5 and 6, for a small Portuguese company called OpenSecure. Life sometimes takes you down different routes and I lost track of its development, which made Odoo a really pleasant surprise.
So, to refresh my own knowledge of OpenERP and Python, I decided to write a series of tutorials, or progress reports if you prefer, in which we'll build a simple app for Odoo: a document scanner using Tesseract OCR (Tesseract-ocr).

First we need to install Odoo.
The most complete tutorial (really great job, people) can be found here: how-to-install-openerp-odoo-8-on-ubuntu-server

We'll also need a Python binding for Tesseract, pytesseract, which means that on Ubuntu, or your OS of choice, you may also need to install pip.

In Part 2 I'll describe the anatomy of an OpenERP app and how to use pytesseract to extract information from an image file.
Once you can identify which kind of document it is, the really fun part can begin :).
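
To give an idea of what "identifying the kind of document" could mean, here is a deliberately naive sketch that counts keywords in the OCR output. Everything in it is an assumption made for illustration: the function name `guess_document_type`, the document types and the keyword lists are hypothetical, and real scans would need tuning and probably something smarter than whole-word matching.

```python
import re

# Hypothetical keyword lists (Portuguese and English); tune for real documents.
DOCUMENT_KEYWORDS = {
    'invoice': ('factura', 'invoice', 'iva', 'total'),
    'receipt': ('recibo', 'receipt'),
}


def guess_document_type(text):
    """Return the document type whose keywords appear most often in text."""
    # Compare on lower-cased whole words to ignore case and punctuation.
    words = set(re.findall(r'\w+', text.lower(), re.UNICODE))
    best_type, best_hits = 'unknown', 0
    # Sorted iteration keeps the result stable when two types tie.
    for doc_type in sorted(DOCUMENT_KEYWORDS):
        hits = sum(1 for kw in DOCUMENT_KEYWORDS[doc_type] if kw in words)
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type
```

Fed with the text extracted from an image, this returns a label such as 'invoice', or 'unknown' when no keyword matches, which could later decide how the document is handled.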