Data Mapping

Our goal for this software project was to load the data we receive from group 1 (Data Context) into a database instance and make it available to group 3 (Data Visualization).

Wikibase is the software that enables MediaWiki to store structured data or access data that is stored in a structured data repository.

Challenges:

Find a suitable database where we can store the data. We now have the data, and it must be made available.

Research:

We looked for similar existing solutions that we could reuse.

Results:

We succeeded in loading the data into Wikibase, where it can be queried from Scholia to visualize it.

Server

You’re going to need some kind of server to run your Wikibase instance. Depending on your resources and use case there are a lot of options: a dedicated x86 server, a rented vServer, a VM, or even a Raspberry Pi. It just needs to be able to run Docker. In our case we went for a VM provided by our university department.

Docker

Bot Password

  • Open and log in to Wikibase in the browser (address, username and password are specified in docker-compose.yml)
  • Go to Special Pages -> Bot passwords ({address}/wiki/Special:BotPasswords)
  • Create a bot named bot, check all permissions and click create
  • Copy the created password (it should look like this: bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8)
Alternatively
  • simply use the admin user and password specified in docker-compose.yml
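
The bot password is what the import code later uses to authenticate against the wiki's API. As a minimal sketch (the host URL, account name and token below are placeholders, and this is not the project's base.import.py), logging in with WikidataIntegrator could look like this:

```python
# Minimal sketch: log in to the local Wikibase with a bot password.
# Host, user name and token are placeholders taken from docker-compose.yml
# and Special:BotPasswords, not real credentials.
from wikidataintegrator import wdi_login

MEDIAWIKI_API_URL = "http://localhost:8181/w/api.php"  # host from docker-compose.yml

login = wdi_login.WDLogin(
    user="admin",                                   # the account the bot password belongs to
    pwd="bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8",     # the copied bot password (botname@token)
    mediawiki_api_url=MEDIAWIKI_API_URL,
)
```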
    Clean Up

    To completely delete the data from your Wikibase instance, follow these instructions:

    1. In the console, in the folder where your docker-compose.yml is located, run
      docker-compose down --volumes
      to delete all data from the database and other data related to Wikibase
    2. In the console, in the folder data-mapping, run
      rm -rf apicache-py3 && rm -rf import_env && rm -rf pywikibot.lwp && rm -rf user-config.py && rm -rf import.py && rm -rf data && rm -rf password && rm -rf throttle.ctrl && rm -rf
      to delete the temporary data
      Now you have an empty Wikibase instance again.

    Requirements to run code

    • Python 3.7.x
    • WikidataIntegrator
    • Pywikibot

    We wrote a script to add all items to Wikibase

    1. Properties must already be present in the wiki
    2. Takes a .CSV file as parameter

    After the script completes successfully, two files should be created:

    1. csv_path_updated.csv (contains all successfully added items with their newly created QIDs)
    2. csv_path_errors.csv

    Important parts of the code:

    1. Open CSV file.
    2. Go through all items in a loop.
    3. Compare type of item with data type from wiki and add accordingly.
    4. Save added items with QIDs in a new file.
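
    The core of this loop could look roughly like the sketch below. This is not the project's import script: the property numbers (P1, P2), CSV column names and file paths are made-up placeholders, and the credentials are the bot password from above.

```python
# Sketch of the item import loop: read a CSV, create one Wikibase item per
# row, and collect the newly created QIDs plus any failures. Property
# numbers, column names and paths are placeholders.
import csv
from wikidataintegrator import wdi_core, wdi_login

API_URL = "http://localhost:8181/w/api.php"   # MEDIA_WIKI_SERVER + /w/api.php
SPARQL_URL = "http://localhost:8181"          # SPARQL_ENDPOINT from import.sh

login = wdi_login.WDLogin(user="admin",
                          pwd="bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8",
                          mediawiki_api_url=API_URL)

added, failed = [], []
with open("data/items.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # Build one statement per column; the statement class has to match
        # the property's data type in the wiki (string vs. link to an item).
        statements = [wdi_core.WDString(value=row["title"], prop_nr="P1")]
        if row.get("author_qid"):
            statements.append(wdi_core.WDItemID(value=row["author_qid"], prop_nr="P2"))
        try:
            item = wdi_core.WDItemEngine(data=statements, new_item=True,
                                         mediawiki_api_url=API_URL,
                                         sparql_endpoint_url=SPARQL_URL)
            item.set_label(row["title"], lang="en")
            item.write(login)
            added.append({**row, "qid": item.wd_item_id})
        except Exception as exc:
            failed.append({**row, "error": str(exc)})

# added -> csv_path_updated.csv, failed -> csv_path_errors.csv
```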

    We wrote a script to add all properties to Wikibase

    • Takes a .CSV file as parameter
    • Adds all properties to the wiki
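
    Creating a property needs its data type in addition to its label. As a rough sketch (not the project's script, which builds on WikidataIntegrator/Pywikibot), the same thing can be done directly against the MediaWiki API with wbeditentity; the host, credentials and the CSV columns "label" and "datatype" are placeholders:

```python
# Sketch: create Wikibase properties from a CSV via the MediaWiki API
# (action=wbeditentity, new=property). Host, credentials and CSV columns
# ("label", "datatype") are placeholders.
import csv
import json
import requests

API = "http://localhost:8181/w/api.php"
session = requests.Session()

# Log in with the bot password.
login_token = session.get(API, params={
    "action": "query", "meta": "tokens", "type": "login", "format": "json",
}).json()["query"]["tokens"]["logintoken"]
session.post(API, data={
    "action": "login", "lgname": "admin",
    "lgpassword": "bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8",
    "lgtoken": login_token, "format": "json",
})

# Fetch an edit (CSRF) token.
csrf = session.get(API, params={
    "action": "query", "meta": "tokens", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

# Create one property per CSV row.
with open("data/properties.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        entity = {
            "labels": {"en": {"language": "en", "value": row["label"]}},
            "datatype": row["datatype"],  # e.g. "string" or "wikibase-item"
        }
        resp = session.post(API, data={
            "action": "wbeditentity", "new": "property",
            "data": json.dumps(entity), "token": csrf, "format": "json",
        }).json()
        print(row["label"], "->", resp.get("entity", {}).get("id"))  # new P-number
```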

    We then determined that the code (for items and properties) could be written more efficiently as a single program.

    This program contains these files:

    1. base.import.py: adds a new data model to Wikibase.
    2. base.user-config.py: contains user information.
    3. requirements.txt: contains all requirements to execute the code.
    4. import.sh: contains commands to execute the code.

    Executing the script

    1. Clone the data-mapping repository
      git clone git@github.com:code-openness/data-mapping.git
    2. Open data-mapping/import.sh in editor
    3. Paste your bot password
      export BOT_PASSWORD=bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8
    4. Adjust MEDIA_WIKI_SERVER with the host address from your docker-compose.yml file
      export MEDIA_WIKI_SERVER=http://localhost:8181
    5. Adjust SPARQL_ENDPOINT with the host address from your docker-compose.yml file
      export SPARQL_ENDPOINT=http://localhost:8181
    6. Save import.sh file
    7. Execute import.sh
      ./import.sh
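
    Inside import.sh these exported variables are handed to the Python import code via the environment. A small sketch of how they could be read (the exact names used inside base.import.py are an assumption):

```python
# Sketch: pick up the configuration exported by import.sh from the
# environment. Variable names come from import.sh; defaults are placeholders.
import os

BOT_PASSWORD = os.environ["BOT_PASSWORD"]          # bot@<token> from Special:BotPasswords
MEDIA_WIKI_SERVER = os.environ.get("MEDIA_WIKI_SERVER", "http://localhost:8181")
SPARQL_ENDPOINT = os.environ.get("SPARQL_ENDPOINT", MEDIA_WIKI_SERVER)

# The MediaWiki API itself is served under /w/api.php on that host.
MEDIAWIKI_API_URL = MEDIA_WIKI_SERVER.rstrip("/") + "/w/api.php"
```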

    Adding new items:

    1. Assuming you have already followed the steps of "Executing the script" at least once before
    2. Open import_new_data.sh in editor
    3. Replace path in variable NEW_FILE with path to the file with new data
    4. Replace the paths in the variables ITEM_MAP and PROP_MAP if necessary; the paths need to point to existing files!
    5. Save import_new_data.sh
    6. Execute import_new_data.sh
      ./import_new_data.sh
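
    The ITEM_MAP and PROP_MAP files let the script reference entities that already exist instead of creating duplicates. A hypothetical sketch of how such mappings could be loaded (the two-column label/id layout is an assumption, not the repository's actual format):

```python
# Sketch: load existing label -> entity-id mappings before importing new
# data, so already-created items and properties are reused. The CSV layout
# (columns "label" and "id") is an assumption.
import csv
import os

def load_map(path):
    """Read a two-column CSV (label, entity id) into a dict."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["label"]: row["id"] for row in csv.DictReader(f)}

item_map = load_map(os.environ["ITEM_MAP"])   # label -> Q-number of existing items
prop_map = load_map(os.environ["PROP_MAP"])   # label -> P-number of existing properties
new_file = os.environ["NEW_FILE"]             # path to the CSV with the new data
```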

    Then we tried to make the code perform better through multithreading.
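
    One way to do this is a thread pool around the per-row writes; the API calls are I/O bound, so threads can overlap them, although the wiki may throttle or reject concurrent edits. A sketch (not necessarily what the project ended up with):

```python
# Sketch: run the per-row item imports in a thread pool. import_row stands
# in for the single-row logic from the earlier item sketch.
import csv
from concurrent.futures import ThreadPoolExecutor, as_completed

def import_row(row):
    """Create one Wikibase item for a CSV row (see the item sketch above)."""
    return row  # placeholder for the actual write

with open("data/items.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(import_row, row): row for row in rows}
    for future in as_completed(futures):
        if future.exception() is not None:
            print("failed:", futures[future], future.exception())
```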

    Team:
    - [belarara](https://github.com/belarara)
    - [altan](https://github.com/karacaltan)
    - [Xenja](https://github.com/XenjaCh)
    - [vahid](https://github.com/vahidhk)
    - [besendorf](https://github.com/besendorf)

    [View on GitHub](https://github.com/code-openness/data-mapping){: .btn .btn-purple }
    [View Scrum Board](https://github.com/orgs/code-openness/projects/2){: .btn .btn-purple }
    [Further Internal Project Documentation](https://github.com/code-openness/Documentation/wiki){: .btn .btn-purple }
