Data Mapping
Our goal for this software project was to load the data we receive from group 1 (Data Context) into a database instance and make it available to group 3 (Data Visualization).
Wikibase is the software that enables MediaWiki to store structured data or access data that is stored in a structured data repository.
Challenges:
Find a suitable database in which we can store the data and make it available.
Research:
We then looked for similar existing solutions that we could reuse.
Results:
We succeeded in loading the data into Wikibase, from where it can be queried by Scholia to visualize it.
Server
You’re going to need some kind of server to run your Wikibase instance. Depending on your resources and use case there are a lot of options: a dedicated x86 server, a rented vServer, a VM, or even a Raspberry Pi. It just needs to be able to run Docker. In our case we went with a VM provided by our university department.
Docker
- Set up Docker as shown here
- Start Docker
- Run `docker-compose up`
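For reference, the docker-compose.yml for a Wikibase instance typically looks roughly like the following. This is only a sketch, not the project's actual file: the service names, images, port mapping, and credentials are assumptions based on the common wikibase-docker setup and must match your own configuration.

```yaml
# Sketch of a minimal Wikibase docker-compose.yml (assumed values).
version: "3"
services:
  wikibase:
    image: wikibase/wikibase
    ports:
      - "8181:80"            # Wikibase reachable at http://localhost:8181
    environment:
      DB_SERVER: mysql:3306
      DB_USER: wikiuser
      DB_PASS: change-me
      DB_NAME: my_wiki
      MW_ADMIN_NAME: admin   # login used in the browser and by the scripts
      MW_ADMIN_PASS: change-me
    depends_on:
      - mysql
  mysql:
    image: mariadb:10.3
    environment:
      MYSQL_DATABASE: my_wiki
      MYSQL_USER: wikiuser
      MYSQL_PASSWORD: change-me
      MYSQL_RANDOM_ROOT_PASSWORD: "yes"
```

The host address and admin credentials referenced in the steps below all come from this file.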
Bot Password
<ul style="list-style-type:disc;">
<li>Open Wikibase in the browser and log in (address, username and password are specified in docker-compose.yml)</li>
<li>Go to Special Pages -> Bot passwords {address}/wiki/Special:BotPasswords</li>
<li>Create a bot named bot, check all permissions and click create</li>
<li>Copy the created Password (it should look like this: bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8)</li>
</ul>
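If you want Pywikibot to use this bot password non-interactively, it can be stored in a password file. The following is a sketch only; the family/site names and file names are assumptions and must be aligned with your own Pywikibot configuration:

```python
# user-config.py (sketch) -- tells Pywikibot which user to log in as
# and where to find the password. Family and code names are assumptions.
family = 'my_wikibase'
mylang = 'my_wikibase'
usernames['my_wikibase']['my_wikibase'] = 'Admin'
password_file = 'user-password.py'

# user-password.py (sketch) -- pairs the user with the bot password
# created on Special:BotPasswords ("bot" is the bot name, the second
# argument is the generated suffix after the "@"):
# ('Admin', BotPassword('bot', '7f7fqvfd5v5mn0nb7v214pu8vtj2vif8'))
```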
Alternatively
- simply use the admin user and password specified in docker-compose.yml
- In the console, in the folder where your docker-compose.yml is located, run
docker-compose down --volumes
to delete all data from the database and other data related to Wikibase
- In the console, in the folder data-mapping, run
rm -rf apicache-py3 && rm -rf import_env && rm -rf pywikibot.lwp && rm -rf user-config.py && rm -rf import.py && rm -rf data && rm -rf password && rm -rf throttle.ctrl
to delete the temporary data
Now you have an empty Wikibase instance again.
- Python 3.7.x
- WikidataIntegrator
- Pywikibot
- Properties must already be present in the Wiki
- Get a .CSV file as parameter
- csv_path_updated.csv (contains all successfully added items with their newly created QIDs)
- csv_path_errors.csv (contains the items that could not be added)
- Open CSV file.
- Go through all items in a loop.
- Compare type of item with data type from wiki and add accordingly.
- Save added items with QIDs in a new file.
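The loop described above can be sketched as follows. This is a simplified, self-contained illustration: the CSV column names, the `PROPERTY_TYPES` map, and the local QID counter are assumptions; the real script reads the datatypes from the Wikibase instance and creates items there via WikidataIntegrator.

```python
import csv
import io

# Hypothetical mapping from a wiki property to its expected datatype;
# in the real script this information comes from the Wikibase instance.
PROPERTY_TYPES = {"P1": "string", "P2": "quantity"}

def matches_type(prop, value):
    """Compare the type of a CSV value with the datatype from the wiki."""
    expected = PROPERTY_TYPES.get(prop)
    if expected == "quantity":
        try:
            float(value)
            return True
        except ValueError:
            return False
    return expected == "string"

def import_items(csv_text):
    """Go through all rows, add matching ones, and collect successes/errors."""
    updated, errors = [], []
    next_qid = 1
    for row in csv.DictReader(io.StringIO(csv_text)):
        if matches_type(row["property"], row["value"]):
            row["qid"] = f"Q{next_qid}"   # stands in for the QID the wiki assigns
            next_qid += 1
            updated.append(row)           # -> written to csv_path_updated.csv
        else:
            errors.append(row)            # -> written to csv_path_errors.csv
    return updated, errors
```

For example, `import_items("label,property,value\nfoo,P2,42\nbar,P2,oops\n")` would place `foo` (a valid quantity) in the updated list and `bar` in the error list.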
- Receives a .CSV file as parameter
- Adds all properties to the Wiki
- base.import.py: adds a new data model to Wikibase.
- base.user-config.py: contains user information.
- requirements.txt: contains all requirements to execute the code.
- import.sh: contains commands to execute the code.
- Clone the data-mapping repository
git clone git@github.com:code-openness/data-mapping.git
- Open data-mapping/import.sh in an editor
- Paste your bot password
export BOT_PASSWORD=bot@7f7fqvfd5v5mn0nb7v214pu8vtj2vif8
- Adjust MEDIA_WIKI_SERVER with the host address from your docker-compose.yml file
export MEDIA_WIKI_SERVER=http://localhost:8181
- Adjust SPARQL_ENDPOINT with the host address from your docker-compose.yml file
export SPARQL_ENDPOINT=http://localhost:8181
- Save the import.sh file
- Execute import.sh
./import.sh
- Assuming you already followed the steps of "Executing the script" at least once before
- Open import_new_data.sh in editor
- Replace path in variable NEW_FILE with path to the file with new data
- Replace path in variables ITEM_MAP and PROP_MAP if necessary, paths need to be existing files!
- Save import_new_data.sh
- Execute import_new_data.sh
./import_new_data.sh
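The ITEM_MAP and PROP_MAP files let a re-run of the import skip entities that were already created. A minimal sketch of that idea, assuming the map files are CSVs with `label,qid` columns (the actual file format used by the script may differ):

```python
import csv
import io

def load_map(csv_text):
    """Read an existing label -> QID map, as ITEM_MAP/PROP_MAP might store it."""
    return {row["label"]: row["qid"] for row in csv.DictReader(io.StringIO(csv_text))}

def select_new(item_map, new_labels):
    """Return only the labels not yet in the wiki, so re-running the
    import creates genuinely new items instead of duplicates."""
    return [label for label in new_labels if label not in item_map]
```

With a map containing only `foo`, `select_new(item_map, ["foo", "bar"])` would return just `["bar"]`.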
Clean Up
To completely delete the data from your Wikibase instance follow the instructions:
Requirements to run code
We wrote a script to add all items to Wikibase.
After successful completion of the script 2 files should be created:
Important parts of the code:
We wrote a script to add all properties to Wikibase.
We then realized that the code (for items and properties) could be written more efficiently in a single file.
This program contains these files:
Executing the script
Adding new items:
Then we tried to make the code perform better through multithreading.
Team:
- [belarara](https://github.com/belarara)
- [altan](https://github.com/karacaltan)
- [Xenja](https://github.com/XenjaCh)
- [vahid](https://github.com/vahidhk)
- [besendorf](https://github.com/besendorf)

[View on GitHub](https://github.com/code-openness/data-mapping){: .btn .btn-purple }
[View Scrum Board](https://github.com/orgs/code-openness/projects/2){: .btn .btn-purple }
[Further Internal Project Documentation](https://github.com/code-openness/Documentation/wiki){: .btn .btn-purple }