Data Mapping

Our goal for this software project was to load the data we receive from group 1 (Data Context) into a database instance and make it available to group 3 (Data Visualization).

Wikibase is the software that enables MediaWiki to store structured data or access data that is stored in a structured data repository.

Challenges:

Find a suitable database where we can store the data. We now have the data, and it must be made available.

Research:

We looked for similar existing solutions that we could reuse.

Results:

We succeeded in loading the data into Wikibase, where it can be queried from Scholia to visualize it.

Server

You’re going to need some kind of server to run your Wikibase instance. Depending on your resources and use case there are a lot of options: a dedicated x86 server, a rented vServer, a VM, or even a Raspberry Pi. It just needs to be able to run Docker. In our case we went for a VM provided by our university department.

Docker

Bot Password

  • Open and log in to Wikibase in the browser (address, username and password are specified in docker-compose.yml)
  • Go to Special Pages -> Bot passwords ({address}/wiki/Special:BotPasswords)
  • Create a bot named bot, check all permissions and click create
  • Copy the created password (it should look like this: bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8)
Alternatively
  • simply use the admin user and password specified in docker-compose.yml
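
The bot password is what the import code later uses to authenticate against the wiki's API. As a minimal sketch (the host URL, account name and token below are placeholders, and this is not the project's base.import.py), logging in with WikidataIntegrator could look like this:

```python
# Minimal sketch: log in to the local Wikibase with a bot password.
# Host, user name and token are placeholders taken from docker-compose.yml
# and Special:BotPasswords, not real credentials.
from wikidataintegrator import wdi_login

MEDIAWIKI_API_URL = "http://localhost:8181/w/api.php"  # host from docker-compose.yml

login = wdi_login.WDLogin(
    user="admin",                                   # the account the bot password belongs to
    pwd="bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8",     # the copied bot password (botname@token)
    mediawiki_api_url=MEDIAWIKI_API_URL,
)
```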
    Clean Up

    To completely delete the data from your Wikibase instance, follow these instructions:

    1. In the console, in the folder where your docker-compose.yml is located, run
      docker-compose down --volumes
      to delete all data from the database and other data related to Wikibase
    2. In the console, in the folder data-mapping, run
      rm -rf apicache-py3 && rm -rf import_env && rm -rf pywikibot.lwp && rm -rf user-config.py && rm -rf import.py && rm -rf data && rm -rf password && rm -rf throttle.ctrl && rm -rf
      to delete the temporary data
      Now you have an empty Wikibase instance again.

    Requirements to run code

    • Python 3.7.x
    • WikidataIntegrator
    • Pywikibot

    We wrote a script to add all items to Wikibase

    1. Properties must already be present in the wiki
    2. Takes a .CSV file as parameter

    After the script completes successfully, two files should be created:

    1. csv_path_updated.csv (contains all successfully added items with their newly created QIDs)
    2. csv_path_errors.csv

    Important parts of the code:

    1. Open CSV file.
    2. Go through all items in a loop.
    3. Compare type of item with data type from wiki and add accordingly.
    4. Save added items with QIDs in a new file.
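
    The core of this loop could look roughly like the sketch below. This is not the project's import script: the property numbers (P1, P2), CSV column names and file paths are made-up placeholders, and the credentials are the bot password from above.

```python
# Sketch of the item import loop: read a CSV, create one Wikibase item per
# row, and collect the newly created QIDs plus any failures. Property
# numbers, column names and paths are placeholders.
import csv
from wikidataintegrator import wdi_core, wdi_login

API_URL = "http://localhost:8181/w/api.php"   # MEDIA_WIKI_SERVER + /w/api.php
SPARQL_URL = "http://localhost:8181"          # SPARQL_ENDPOINT from import.sh

login = wdi_login.WDLogin(user="admin",
                          pwd="bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8",
                          mediawiki_api_url=API_URL)

added, failed = [], []
with open("data/items.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # Build one statement per column; the statement class has to match
        # the property's data type in the wiki (string vs. link to an item).
        statements = [wdi_core.WDString(value=row["title"], prop_nr="P1")]
        if row.get("author_qid"):
            statements.append(wdi_core.WDItemID(value=row["author_qid"], prop_nr="P2"))
        try:
            item = wdi_core.WDItemEngine(data=statements, new_item=True,
                                         mediawiki_api_url=API_URL,
                                         sparql_endpoint_url=SPARQL_URL)
            item.set_label(row["title"], lang="en")
            item.write(login)
            added.append({**row, "qid": item.wd_item_id})
        except Exception as exc:
            failed.append({**row, "error": str(exc)})

# added -> csv_path_updated.csv, failed -> csv_path_errors.csv
```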

    We wrote a script to add all properties to Wikibase

    • Takes a .CSV file as parameter
    • Adds all properties to the wiki
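
    Creating a property needs its data type in addition to its label. As a rough sketch (not the project's script, which builds on WikidataIntegrator/Pywikibot), the same thing can be done directly against the MediaWiki API with wbeditentity; the host, credentials and the CSV columns "label" and "datatype" are placeholders:

```python
# Sketch: create Wikibase properties from a CSV via the MediaWiki API
# (action=wbeditentity, new=property). Host, credentials and CSV columns
# ("label", "datatype") are placeholders.
import csv
import json
import requests

API = "http://localhost:8181/w/api.php"
session = requests.Session()

# Log in with the bot password.
login_token = session.get(API, params={
    "action": "query", "meta": "tokens", "type": "login", "format": "json",
}).json()["query"]["tokens"]["logintoken"]
session.post(API, data={
    "action": "login", "lgname": "admin",
    "lgpassword": "bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8",
    "lgtoken": login_token, "format": "json",
})

# Fetch an edit (CSRF) token.
csrf = session.get(API, params={
    "action": "query", "meta": "tokens", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

# Create one property per CSV row.
with open("data/properties.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        entity = {
            "labels": {"en": {"language": "en", "value": row["label"]}},
            "datatype": row["datatype"],  # e.g. "string" or "wikibase-item"
        }
        resp = session.post(API, data={
            "action": "wbeditentity", "new": "property",
            "data": json.dumps(entity), "token": csrf, "format": "json",
        }).json()
        print(row["label"], "->", resp.get("entity", {}).get("id"))  # new P-number
```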

    We then determined that the code (for items and properties) could be written more efficiently as a single program.

    This program contains these files:

    1. base.import.py: adds a new data model to Wikibase.
    2. base.user-config.py: contains user information.
    3. requirements.txt: contains all requirements to execute the code.
    4. import.sh: contains commands to execute the code.

    Executing the script

    1. Clone the data-mapping repository
      git clone git@github.com:code-openness/data-mapping.git
    2. Open data-mapping/import.sh in editor
    3. Paste your bot password
      export BOT_PASSWORD=bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8
    4. Adjust MEDIA_WIKI_SERVER with the host address from your docker-compose.yml file
      export MEDIA_WIKI_SERVER=http://localhost:8181
    5. Adjust SPARQL_ENDPOINT with the host address from your docker-compose.yml file
      export SPARQL_ENDPOINT=http://localhost:8181
    6. Save import.sh file
    7. Execute import.sh
      ./import.sh
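
    Inside import.sh these exported variables are handed to the Python import code via the environment. A small sketch of how they could be read (the exact names used inside base.import.py are an assumption):

```python
# Sketch: pick up the configuration exported by import.sh from the
# environment. Variable names come from import.sh; defaults are placeholders.
import os

BOT_PASSWORD = os.environ["BOT_PASSWORD"]          # bot@<token> from Special:BotPasswords
MEDIA_WIKI_SERVER = os.environ.get("MEDIA_WIKI_SERVER", "http://localhost:8181")
SPARQL_ENDPOINT = os.environ.get("SPARQL_ENDPOINT", MEDIA_WIKI_SERVER)

# The MediaWiki API itself is served under /w/api.php on that host.
MEDIAWIKI_API_URL = MEDIA_WIKI_SERVER.rstrip("/") + "/w/api.php"
```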

    Adding new items:

    1. Assuming you have already followed the steps of "Executing the script" at least once before
    2. Open import_new_data.sh in editor
    3. Replace path in variable NEW_FILE with path to the file with new data
    4. Replace the paths in the variables ITEM_MAP and PROP_MAP if necessary; the paths need to point to existing files!
    5. Save import_new_data.sh
    6. Execute import_new_data.sh
      ./import_new_data.sh
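
    The ITEM_MAP and PROP_MAP files let the script reference entities that already exist instead of creating duplicates. A hypothetical sketch of how such mappings could be loaded (the two-column label/id layout is an assumption, not the repository's actual format):

```python
# Sketch: load existing label -> entity-id mappings before importing new
# data, so already-created items and properties are reused. The CSV layout
# (columns "label" and "id") is an assumption.
import csv
import os

def load_map(path):
    """Read a two-column CSV (label, entity id) into a dict."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["label"]: row["id"] for row in csv.DictReader(f)}

item_map = load_map(os.environ["ITEM_MAP"])   # label -> Q-number of existing items
prop_map = load_map(os.environ["PROP_MAP"])   # label -> P-number of existing properties
new_file = os.environ["NEW_FILE"]             # path to the CSV with the new data
```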

    Then we tried to make the code perform better through multithreading.
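
    One way to do this is a thread pool around the per-row writes; the API calls are I/O bound, so threads can overlap them, although the wiki may throttle or reject concurrent edits. A sketch (not necessarily what the project ended up with):

```python
# Sketch: run the per-row item imports in a thread pool. import_row stands
# in for the single-row logic from the earlier item sketch.
import csv
from concurrent.futures import ThreadPoolExecutor, as_completed

def import_row(row):
    """Create one Wikibase item for a CSV row (see the item sketch above)."""
    return row  # placeholder for the actual write

with open("data/items.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(import_row, row): row for row in rows}
    for future in as_completed(futures):
        if future.exception() is not None:
            print("failed:", futures[future], future.exception())
```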

    Team:
    - [belarara](https://github.com/belarara)
    - [altan](https://github.com/karacaltan)
    - [Xenja](https://github.com/XenjaCh)
    - [vahid](https://github.com/vahidhk)
    - [besendorf](https://github.com/besendorf)

    [View on GitHub](https://github.com/code-openness/data-mapping){: .btn .btn-purple }
    [View Scrum Board](https://github.com/orgs/code-openness/projects/2){: .btn .btn-purple }
    [Further Internal Project Documentation](https://github.com/code-openness/Documentation/wiki){: .btn .btn-purple }
