French company accounts extraction and processing by PapIT

This project processes data from opendata-rncs.inpi.fr, which publishes XML files of the annual account declarations of all French companies. The overall project is meant to be low-code and open source, and aims to provide ethical indicators on companies. The data is exposed through a MySQL database, CSV files, a web visualisation and a Swagger API. The search engine endpoint returns JSON-LD (Hydra) compliant JSON. The company JSON cannot conform to the JSON-LD Organization type because some data is missing (contact details, for instance). Scores and indicators are computed in batch, in SQL and, where useful, with external libraries. Help with data processing to improve the scoring would be appreciated, in particular on scoring, AI and data scraping for segmentation. Shell and Python scripts are to be launched from their corresponding directories, ./sh and ./py.

Install dependencies and python package

The following Synaptic packages must be installed: libxml2-utils, mysql-server, tree and python3. Pip packages must also be installed for development purposes.

$ sh ./install-dependencies.sh

To install only the enthic Python package:

$ sh ./install-wheel.sh

Run an instance

Extract data from zip

Data is stored in zip files on opendata-rncs.inpi.fr, grouped by month. Each XML file is in a zip. The first step is therefore to extract the XML files.

$ sh ./clear-data.sh
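
For readers who prefer Python to shell, the extraction step can be sketched with the standard zipfile module. This is only an illustration, not the content of clear-data.sh: the function name extract_xml and both directory arguments are assumptions, and archives nested inside other archives are not handled.

```python
import zipfile
from pathlib import Path


def extract_xml(input_dir, output_dir):
    """Extract every XML member of every zip found under input_dir."""
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(input_dir).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zip_file:
            for member in zip_file.namelist():
                if member.lower().endswith(".xml"):
                    # Flatten any folder structure stored inside the archive.
                    target = output_path / Path(member).name
                    target.write_bytes(zip_file.read(member))
                    extracted.append(target)
    return extracted
```

Calling extract_xml("input", "xml") would return the list of written paths, which can then be handed to the validation step.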

Check the XML format

An XSD is provided by the INPI. This step verifies that all the XML files follow this XSD. The filenames of XML files that do not follow it are printed out.

$ sh ./check-data.sh
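
The check can be pictured as one xmllint call per file (xmllint ships with libxml2-utils, listed in the dependencies). Whether check-data.sh works exactly this way is an assumption; the two helper names below are hypothetical, and the sketch requires xmllint to be on the PATH when invalid_files is called.

```python
import subprocess


def xmllint_command(xsd_path, xml_path):
    """Build the xmllint call validating one XML file against an XSD."""
    return ["xmllint", "--noout", "--schema", str(xsd_path), str(xml_path)]


def invalid_files(xsd_path, xml_paths):
    """Return the XML files for which xmllint exits with a non-zero code."""
    rejected = []
    for xml_path in xml_paths:
        result = subprocess.run(
            xmllint_command(xsd_path, xml_path),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            rejected.append(xml_path)
    return rejected
```

With the XSD from the repository, invalid_files("bilans-saisis-v1.1.xsd", paths) would give the list of filenames to report.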

Format XML

Create two CSV files from the XML files, one for each table: the identity of the company and its bundles.

$ sh ./table-csv.sh
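
The split into an identity row and bundle rows could look like the sketch below. The XML layout used here (tag and attribute names) is invented for the example and is NOT the real INPI schema; the ";" separator and the function names are assumptions as well.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout, made up for this example (not the INPI schema).
SAMPLE = """<bilan siren="005420120" denomination="STE DES SUCRERIES DU MARQUENTERRE" year="2016">
  <liasse code="DI" amount="-261053"/>
</bilan>"""


def xml_to_rows(xml_text):
    """Split one declaration into an identity row and its bundle rows."""
    root = ET.fromstring(xml_text)
    siren = root.get("siren")
    identity_row = [siren, root.get("denomination")]
    bundle_rows = [
        [siren, root.get("year"), node.get("code"), node.get("amount")]
        for node in root.iter("liasse")
    ]
    return identity_row, bundle_rows


def write_csv(rows, handle):
    """Write rows to an open text handle using ';' as the separator."""
    csv.writer(handle, delimiter=";", lineterminator="\n").writerows(rows)


identity, bundles = xml_to_rows(SAMPLE)
buffer = io.StringIO()
write_csv(bundles, buffer)
```

Keeping the SIREN in both outputs is what lets the two tables be joined later.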

Create MySQL database

Create the database, tables and indexes. The content of the two tables comes from the previous CSV files.

$ sh ./database-creation.sh
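
To make the table, insert and index sequence concrete without a MySQL server, here is a sketch that swaps in the standard sqlite3 module as a stand-in. The column names are hypothetical, and creating the indexes after the bulk insert is a choice made for this example rather than a description of the project's SQL scripts.

```python
import sqlite3


def create_database(identity_rows, bundle_rows):
    """Create both tables in memory, load the rows, then index them."""
    connection = sqlite3.connect(":memory:")  # stand-in for MySQL
    # Hypothetical columns, chosen for the example only.
    connection.execute("CREATE TABLE identity (siren TEXT, denomination TEXT)")
    connection.execute(
        "CREATE TABLE bundle (siren TEXT, declaration INTEGER, bundle TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO identity VALUES (?, ?)", identity_rows)
    connection.executemany("INSERT INTO bundle VALUES (?, ?, ?, ?)", bundle_rows)
    # Indexes on the lookup key, created once the data is loaded.
    connection.execute("CREATE INDEX index_identity_siren ON identity (siren)")
    connection.execute("CREATE INDEX index_bundle_siren ON bundle (siren)")
    connection.commit()
    return connection
```

Note that sqlite3 uses "?" placeholders, whereas MySQL drivers for Python generally use "%s".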

Run API

A Flask REST API serves the data over the web, following the Swagger standard.

$ python3 ./app.py

Development and contribution

Development and Coding Rules

  • snake_case for variables and function definitions, CamelCase for classes.

  • The only argument of a Python script is its configuration file.

  • No output or printed information (only raw results are allowed), just logs and files.

  • SonarQube integration.

  • Pytest for Python and API testing.

  • Automatic documentation using Sphinx 1.8.5.

  • Benchmark of CPython vs PyPy.

  • Common sense and clean code.
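
The rules on the single configuration argument and on logging instead of printing can be illustrated as follows. This is a sketch under assumptions: the configuration is taken to be JSON (the tree below contains a configuration.json), and the "logFile" key as well as the function names are hypothetical.

```python
import argparse
import json
import logging


def load_configuration(argv):
    """Parse the single expected argument and load it as JSON."""
    parser = argparse.ArgumentParser(description="Hypothetical enthic script")
    parser.add_argument("configuration", help="path to a JSON configuration file")
    arguments = parser.parse_args(argv)
    with open(arguments.configuration, encoding="utf-8") as configuration_file:
        return json.load(configuration_file)


def main(argv=None):
    """Entry point: read the configuration, then log instead of printing."""
    configuration = load_configuration(argv)
    # "logFile" is a made-up key naming the log destination.
    logging.basicConfig(filename=configuration["logFile"], level=logging.INFO)
    logging.info("configuration loaded with %d keys", len(configuration))
    return configuration
```

A script built this way would be started as python3 script.py configuration.json and would write its progress to the log file only.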

Build and install python enthic package

$ sh ./install-wheel.sh

Testing

Only the Python package is tested. The test framework used is pytest. Tests can be run with pytest from the python/enthic/ directory.

Generate documentation

Generate the HTML documentation with the Sphinx documentation framework. Sphinx is called programmatically at the beginning of setup.py, so the installation above builds the documentation at the same time.

Library structure

./enthic
├── account-ontology.csv
├── bilans-saisis-v1.1.xsd
├── .gitignore
├── input
├── LICENSE.md
├── output
│   ├── bundle.csv
│   └── identity.csv
├── python
│   ├── doc
│   │   ├── conf.py
│   │   ├── index.rst
│   │   └── papit.png
│   ├── enthic
│   │   ├── app.py
│   │   ├── company
│   │   │   ├── company.py
│   │   │   ├── denomination_company.py
│   │   │   ├── __init__.py
│   │   │   └── siren_company.py
│   │   ├── database
│   │   │   ├── mysql.py
│   │   │   ├── mysql_data.py
│   │   │   ├── fetchall.py
│   │   │   └── __init__.py
│   │   ├── configuration.json
│   │   ├── conftest.py
│   │   ├── decorator
│   │   │   ├── check_sql_injection.py
│   │   │   ├── __init__.py
│   │   │   └── insert_request.py
│   │   ├── extract_bundle.py
│   │   ├── __init__.py
│   │   ├── ontology.py
│   │   ├── result
│   │   │   ├── __init__.py
│   │   │   └── result.py
│   │   ├── score
│   │   │   ├── classification.py
│   │   │   └── __init__.py
│   │   ├── static
│   │   │   ├── 404.html
│   │   │   ├── 500.html
│   │   │   ├── bootstrap.min.css
│   │   │   ├── documentation
│   │   │   │   ├── .buildinfo
│   │   │   │   ├── doctrees
│   │   │   │   │   ├── environment.pickle
│   │   │   │   │   └── index.doctree
│   │   │   │   ├── genindex.html
│   │   │   │   ├── index.html
│   │   │   │   ├── _modules
│   │   │   │   │   ├── company
│   │   │   │   │   │   ├── company.html
│   │   │   │   │   │   ├── denomination_company.html
│   │   │   │   │   │   └── siren_company.html
│   │   │   │   │   ├── decorator
│   │   │   │   │   │   ├── check_sql_injection.html
│   │   │   │   │   │   └── insert_request.html
│   │   │   │   │   ├── index.html
│   │   │   │   │   ├── score
│   │   │   │   │   │   └── classification.html
│   │   │   │   │   └── utils
│   │   │   │   │       ├── error_json_response.html
│   │   │   │   │       ├── json_response.html
│   │   │   │   │       ├── not_found_response.html
│   │   │   │   │       └── ok_json_response.html
│   │   │   │   ├── .nojekyll
│   │   │   │   ├── objects.inv
│   │   │   │   ├── py-modindex.html
│   │   │   │   ├── search.html
│   │   │   │   ├── searchindex.js
│   │   │   │   ├── _sources
│   │   │   │   │   └── index.rst.txt
│   │   │   │   └── _static
│   │   │   │       ├── ajax-loader.gif
│   │   │   │       ├── alabaster.css
│   │   │   │       ├── basic.css
│   │   │   │       ├── comment-bright.png
│   │   │   │       ├── comment-close.png
│   │   │   │       ├── comment.png
│   │   │   │       ├── custom.css
│   │   │   │       ├── doctools.js
│   │   │   │       ├── documentation_options.js
│   │   │   │       ├── down.png
│   │   │   │       ├── down-pressed.png
│   │   │   │       ├── file.png
│   │   │   │       ├── jquery-3.2.1.js
│   │   │   │       ├── jquery.js
│   │   │   │       ├── language_data.js
│   │   │   │       ├── minus.png
│   │   │   │       ├── papit.png
│   │   │   │       ├── plus.png
│   │   │   │       ├── pygments.css
│   │   │   │       ├── searchtools.js
│   │   │   │       ├── underscore-1.3.1.js
│   │   │   │       ├── underscore.js
│   │   │   │       ├── up.png
│   │   │   │       ├── up-pressed.png
│   │   │   │       └── websupport.js
│   │   │   ├── favicon.ico
│   │   │   ├── google7775f38904c3d3fc.html
│   │   │   ├── index.html
│   │   │   ├── jquery.min.js
│   │   │   ├── robot.txt
│   │   │   ├── sitemap.xml
│   │   │   ├── swagger.json
│   │   │   ├── swagger-ui-bundle.js
│   │   │   ├── swagger-ui-bundle.js.map
│   │   │   ├── swagger-ui.css
│   │   │   ├── swagger-ui.css.map
│   │   │   ├── swagger-ui.js
│   │   │   ├── swagger-ui.js.map
│   │   │   ├── swagger-ui-standalone-preset.js
│   │   │   └── swagger-ui-standalone-preset.js.map
│   │   ├── test_app.py
│   │   ├── test_extract_bundle.py
│   │   ├── test_treat_bundle.py
│   │   ├── treat_bundle.py
│   │   └── utils
│   │       ├── error_json_response.py
│   │       ├── conversion.py
│   │       ├── __init__.py
│   │       ├── json_response.py
│   │       ├── not_found_response.py
│   │       └── ok_json_response.py
│   ├── __init__.py
│   ├── MANIFEST.in
│   ├── setup.cfg
│   └── setup.py
├── README.rst
├── sh
│   ├── check-data.sh
│   ├── clear-data.sh
│   ├── csv-table.sh
│   ├── database-creation.sh
│   ├── install-dependencies.sh
│   └── install-wheel.sh
├── sonar-project.properties
└── sql
    ├── create-database-enthic.sql
    ├── create-index-bundle.sql
    ├── create-index-identity.sql
    ├── create-table-bundle.sql
    ├── create-table-identity.sql
    ├── create-table-request.sql
    ├── insert-bundle.sql
    └── insert-identity.sql

Donation

You can donate to support Python and Open Source development.

BTC 32JSkGXcBK2dirP6U4vCx9YHHjV5iSYb1G

ETH 0xF556505d13aC9a820116d43c29dc61417d3aB2F8

Documentation of the Python Code

The following two Python scripts are used in the shell script table-csv.sh.

Parse all the available XML files to list all the bundle codes

PROGRAM BY PAPIT SASU, 2019

Coding Rules:

  • Snake case for variables.

  • Only argument is configuration file.

  • No output or print, just log and files.

Sum all the bundles of the year for one company from a sorted CSV file

PROGRAM BY PAPIT SASU, 2019

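
Because the input file is sorted, the yearly sum can be computed in a single pass with itertools.groupby. The sketch below assumes a hypothetical column order (SIREN, year, bundle code, amount) and a ";" separator; neither is taken from the actual script.

```python
import csv
import io
from itertools import groupby


def sum_bundles(sorted_lines):
    """Yield (siren, year, code, total) from rows sorted on the first three fields."""
    reader = csv.reader(sorted_lines, delimiter=";")
    # groupby only merges adjacent rows, hence the need for a sorted file.
    for key, rows in groupby(reader, key=lambda row: tuple(row[:3])):
        yield (*key, sum(float(row[3]) for row in rows))


lines = io.StringIO("1;2016;DI;10\n1;2016;DI;5\n1;2017;DI;1\n")
totals = list(sum_bundles(lines))
```

Only one group is held in memory at a time, which matters when the CSV covers every declaration.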

The following modules relate to the Flask server.

Flask application, compatible with Sphinx documentation and Gunicorn server

PROGRAM BY PAPIT SASU, 2019

Decorator inserting data from the incoming request after having executed the function

decorator.insert_request.insert_request(func)

Decorator inserting the relevant request data, timestamped.

param func

Function decorated.

return

The function decorated.
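
The general shape of such a decorator is sketched below. It is not the enthic implementation: a plain list stands in for the request table, the recorded fields are hypothetical, and the real decorator presumably reads the data from the Flask request object instead of the call arguments.

```python
import functools
from datetime import datetime, timezone

REQUEST_LOG = []  # stand-in for the request table of the database


def insert_request(func):
    """Run func first, then record what was asked, with a timestamp."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        response = func(*args, **kwargs)
        REQUEST_LOG.append({
            "function": func.__name__,
            "arguments": args,
            "timestamp": datetime.now(timezone.utc),
        })
        return response
    return wrapper


@insert_request
def company_by_siren(siren):
    """Hypothetical endpoint body used to exercise the decorator."""
    return {"siren": siren}
```

Recording after the call means a request that raises an exception is not stored in this sketch.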

JSON response Class

PROGRAM BY PAPIT SASU, 2020

class utils.json_response.JSONResponse(object_response, status=200)

Abstraction on top of the Flask Response class, as most responses are application/json with the default HTTP return code 200. The status can be changed.

Valid JSON response Class

PROGRAM BY PAPIT SASU, 2020

class utils.ok_json_response.OKJSONResponse(object_response)

Abstraction on top of the Enthic JSONResponse class, as most responses are application/json with HTTP return code 200.

Error JSON response Class

PROGRAM BY PAPIT SASU, 2020

class utils.error_json_response.ErrorJSONResponse(error_message)

Abstraction on top of the Flask JSON Response class for error JSON: application/json with HTTP return code 400. Formatted as Hydra JSON-LD.

Not found JSON response Class

PROGRAM BY PAPIT SASU, 2020

class utils.not_found_response.NotFoundJSONResponse

Abstraction on top of the Flask JSON Response class for data that was not found: application/json with HTTP return code 404. Formatted as Hydra JSON-LD.
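
How the four response classes relate can be sketched without Flask. Everything below is an assumption made for illustration: the small Response class replaces flask.Response, the error classes are derived from JSONResponse, and the hydra_error helper with its field layout and messages is hypothetical.

```python
import json


class Response:
    """Minimal stand-in for flask.Response, to keep the sketch dependency-free."""

    def __init__(self, body, status, mimetype):
        self.body = body
        self.status = status
        self.mimetype = mimetype


def hydra_error(title, description):
    """Build an error body using terms of the Hydra core vocabulary."""
    return {
        "@context": "http://www.w3.org/ns/hydra/context.jsonld",
        "@type": "Error",
        "title": title,
        "description": description,
    }


class JSONResponse(Response):
    def __init__(self, object_response, status=200):
        super().__init__(json.dumps(object_response), status, "application/json")


class OKJSONResponse(JSONResponse):
    def __init__(self, object_response):
        super().__init__(object_response, 200)


class ErrorJSONResponse(JSONResponse):
    def __init__(self, error_message):
        super().__init__(hydra_error("Bad Request", error_message), 400)


class NotFoundJSONResponse(JSONResponse):
    def __init__(self):
        super().__init__(hydra_error("Not Found", "No data found"), 404)
```

Each subclass only fixes the status code and the body shape, so an endpoint can return OKJSONResponse(data) or NotFoundJSONResponse() without repeating the MIME type.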

Set of functions converting the ontology from INPI format to base integer format

PROGRAM BY PAPIT SASU, 2020

Class representing a company, constructed with a SIREN

PROGRAM BY PAPIT SASU, 2020

class company.siren_company.AllSirenCompany(siren)

Class AllSirenCompany inherits from the SirenCompany and MultipleBundleCompany classes, as it potentially has multiple declarations.

class company.siren_company.AverageSirenCompany(siren)

Class AverageSirenCompany inherits from the UniqueBundleCompany class, as it has a unique average bundle. It also inherits from YearCompany to check the year.

class company.siren_company.YearSirenCompany(siren, year)

Class YearSirenCompany inherits from the UniqueBundleCompany class, as it has a unique average bundle. It also inherits from YearCompany to check the year.

Class representing a company, constructed with a denomination

PROGRAM BY PAPIT SASU, 2020

class company.denomination_company.AllDenominationCompany(denomination)

Class AllDenominationCompany inherits from the MultipleBundleCompany and DenominationCompany classes, as it has multiple bundles.

class company.denomination_company.AverageDenominationCompany(denomination)

Class AverageDenominationCompany inherits from the UniqueBundleCompany and DenominationCompany classes, as it has a unique average bundle.

class company.denomination_company.YearDenominationCompany(denomination, year)

Class YearDenominationCompany inherits from the YearCompany, UniqueBundleCompany and DenominationCompany classes, as it has a unique average bundle. YearCompany is inherited to check the year.

Generic classes representing a company and their function(s)

PROGRAM BY PAPIT SASU, 2020

class company.company.Bundle(*args)

All the declared bundles and the scoring of a company. Can cover several years, a single year, or an average.

class company.company.CompanyIdentity(*args)

Identity data of the company.

class company.company.DenominationCompany(denomination)

Denomination-defined company.

class company.company.JSONGenKey

Generic keys found in the JSON response.

class company.company.MultipleBundleCompany(sql_request, args)

Company data returned with an array of Bundle objects, one for each declaration. Inherits from OKJSONResponse to return JSON and from SQLData for the underlying database data.

{
    "siren": {
        "value": "005420120",
        "description": "SIREN"
    },
    "denomination": {
        "value": "STE DES SUCRERIES DU MARQUENTERRE",
        "description": "Dénomination"
    },
    "ape": {
        "value": "Activités des sièges sociaux",
        "description": "Code Activité Principale Exercée (NAF)"
    },
    "postal_code": {
        "value": "62140",
        "description": "Code Postal"
    },
    "town": {
        "value": "MARCONNELLE",
        "description": "Commune"
    },
    "devise": {
        "value": "Euro",
        "description": "Devise"
    },
    "declarations": [
        {
            "declaration": {
                "value": 2016,
                "description": "Année de déclaration"
            },
            "financial_data": [
                {
                    "di": {
                        "account": "Compte annuel complet",
                        "value": -261053.0,
                        "description": "Résultat de l’exercice (bénéfice ou perte)"
                    }
                }
            ]
        }
    ]
}
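
A client receiving a response of this shape can pull one bundle code out of every declaration with a few lines of standard-library Python. The sketch works on a trimmed copy of the sample above; the function name results_by_year is hypothetical.

```python
import json

# Trimmed copy of the sample response shown above.
RESPONSE = """{
    "siren": {"value": "005420120", "description": "SIREN"},
    "declarations": [
        {
            "declaration": {"value": 2016, "description": "Année de déclaration"},
            "financial_data": [
                {"di": {"account": "Compte annuel complet", "value": -261053.0}}
            ]
        }
    ]
}"""


def results_by_year(company, bundle_code="di"):
    """Map each declaration year to the value of one bundle code."""
    results = {}
    for declaration in company["declarations"]:
        year = declaration["declaration"]["value"]
        for entry in declaration["financial_data"]:
            if bundle_code in entry:
                results[year] = entry[bundle_code]["value"]
    return results


yearly_results = results_by_year(json.loads(RESPONSE))
```

Here "di" is the key carrying the result of the financial year in the sample, so yearly_results maps 2016 to that figure.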

class company.company.SirenCompany(siren)

Siren defined company.

class company.company.UniqueBundleCompany(sql_request, args)

Company data returned with a unique bundle as attribute. Inherits from OKJSONResponse to return JSON and from SQLData for the underlying database data.

class company.company.YearCompany(year)

Company data for a given year.
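
The way these generic classes combine through multiple inheritance can be sketched as follows. The class bodies are guesses written for illustration: the validation rules, the SQL text and its column names are hypothetical, and the real classes also inherit from OKJSONResponse and SQLData, which is left out here.

```python
class YearCompany:
    """Holds and checks the requested year."""

    def __init__(self, year):
        self.year = int(year)  # raises ValueError on a non-numeric year


class SirenCompany:
    """Holds and checks the SIREN, a 9-digit company identifier."""

    def __init__(self, siren):
        if not (len(siren) == 9 and siren.isdigit()):
            raise ValueError("a SIREN is made of 9 digits")
        self.siren = siren


class UniqueBundleCompany:
    """Keeps the request that would return a single bundle."""

    def __init__(self, sql_request, args):
        self.sql_request = sql_request
        self.args = args


class YearSirenCompany(YearCompany, SirenCompany, UniqueBundleCompany):
    def __init__(self, siren, year):
        # Each parent validates or stores its own piece of the request.
        YearCompany.__init__(self, year)
        SirenCompany.__init__(self, siren)
        UniqueBundleCompany.__init__(
            self,
            # Hypothetical query and column names.
            "SELECT bundle, amount FROM bundle WHERE siren = %s AND declaration = %s",
            (self.siren, self.year),
        )
```

Splitting the checks into small parents is what allows the SIREN and denomination variants to share the year and bundle logic.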

Flask MySQL initialisation

PROGRAM BY PAPIT SASU, 2019

Fetch results on the MySQL database

PROGRAM BY PAPIT SASU, 2019

database.fetch.fetchall(request, args=None)

Return a fetchall for a given SQL request on the application MySQL database.

param request

SQL request to execute as a string.

param args

SQL arguments to pass to the request. Default is None.

database.fetch.fetchone(request, args=None)

Return a fetchone for a given SQL request on the application MySQL database.

param request

SQL request to execute as a string.

param args

SQL arguments to pass to the request. Default is None.

database.fetch.get_results(request, args, sql_func)

Return the result of the given cursor fetch method for a given SQL request on the application MySQL database.

param request

SQL request to execute as a string.

param args

SQL arguments to pass to the request.

param sql_func

Cursor callable attribute to call.
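
The relationship between the three functions can be sketched with the standard sqlite3 module swapped in for MySQL. Passing the cursor method by name and resolving it with getattr is an assumption about how sql_func is used, and the table and its content are sample data for the example.

```python
import sqlite3

# In-memory stand-in for the application MySQL connection.
CONNECTION = sqlite3.connect(":memory:")
CONNECTION.execute("CREATE TABLE identity (siren TEXT, denomination TEXT)")
CONNECTION.execute(
    "INSERT INTO identity VALUES ('005420120', 'STE DES SUCRERIES DU MARQUENTERRE')"
)


def get_results(request, args, sql_func):
    """Execute the request, then call the cursor method named sql_func."""
    cursor = CONNECTION.cursor()
    try:
        cursor.execute(request, args or ())
        return getattr(cursor, sql_func)()
    finally:
        cursor.close()


def fetchall(request, args=None):
    """Return every row matching the request."""
    return get_results(request, args, "fetchall")


def fetchone(request, args=None):
    """Return the first matching row, or None."""
    return get_results(request, args, "fetchone")
```

Note that sqlite3 expects "?" placeholders in the request, while MySQL drivers for Python generally expect "%s".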

SQL data object

PROGRAM BY PAPIT SASU, 2019

class database.mysql_data.SQLData(sql_request, args)

Execute a request and store the data as an attribute. Responds with a 404 if no data is retrieved.