Migrating existing Smoothie content to Makerforums

Hey there, nice people of the maker forums.

So, I’m thinking about what to do about the end of G+.
You guys have offered to open a Smoothie category, which is nice, thanks! I’ve added a link to it in Smoothie’s top menu.

Now here are the things I’m wondering about:

  1. Could you guys take the history of G+ posts from the old Smoothie G+ community (it’s full of really, really good data) and re-insert it here seamlessly as a history of forum posts?
    If that’s possible, I think I’d seriously consider using this forum as the official G+ archive for Smoothie, instead of self-hosting it.

  2. Smoothie has a forum, hosted at http://forum.smoothieware.org/forum/c-496918/general . It’s a Wikidot forum, because Smoothie used to be a Wikidot wiki years ago (I copied the guys at Contraptor, whom I was working with at the time), but it turned out Wikidot sucked, so I moved to self-hosting (DokuWiki, on my own server) a few years back. But I couldn’t move the forum, so that stayed on Wikidot.
    Is it imaginable to do an export/grab of that forum (I could potentially write the script for that myself), and then re-insert that forum’s history here seamlessly, as a history of posts that’d look like they were posted here originally?
    If that’s imaginable, I think I’d seriously consider using this forum as both the G+ archive and as the new forum for Smoothie’s community. Comments welcome from the Smoothie community, this community, and anyone who’d feel like helping with the migration effort.

Thanks a ton in advance for anything you do or think about :slight_smile:

@mcdanlj Do you have a takeout of the Smoothieware G+ community?

@Arthur_Wolf and @funinthefalls A total of 722 G+ Smoothie posts were imported as Discourse topics, with 4840 comments imported as Discourse posts. Spam filtering removed one post by someone identified as having posted spam or spam-like content in a previous import of at least one other community, along with 110 comments on posts so identified.

Imaginable. I don’t know how much work it would be. Can you please generate a ZIP archive snapshot/backup of your wiki for me to look through?

Hey!

I’ve just seen this category filling up with the G+ data, thanks a ton that is a very much appreciated import, and I’m so glad this data will be available here.

Now, about the forum: Wikidot doesn’t offer a good way to back up the forum data, so I’m going to need to code a crawler that grabs the whole thing and saves it in a usable format.
Can you give me a format that would make it easy for you to import the old forum’s posts? I would guess that if I do my export in the same format the G+ posts were saved in, you’d be able to just run your G+ script on this forum data and have everything work without any extra effort, right?
If that’s something you think would work, then all I need from you is an example of the data format your script wants to eat. Just tell me and I’ll get right on it.

Thanks for all the support! If we get this all to work, this forum becomes the official Smoothie forum :slight_smile:

You’re very welcome!

Have you tried the ZIP archive that Wikidot says they offer, and that I linked to the help for? I’d love to look through that and see how close to usable it is.

My G+ import script depends on the Google user information that’s scraped from G+, so it wouldn’t work for importing from Wikidot.

If we need to crawl the site, I’ll need two kinds of data: users and posts. I’ll want it all in JSON, because why anything else? :slight_smile:

User:

  • email (alternatively, we have to make up @wikidot.invalid addresses, which Discourse knows not to send email to)
  • username
  • unique wikidot ID (if there is such a thing separate from the username)
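In case it helps when wiring things up, here’s a rough Python sketch of how such a user record might be synthesized; the field names and the `make_user` helper are my own illustration, not anything the importer requires, and the only load-bearing assumption is that `.invalid` addresses never generate outgoing mail.

```python
# Hypothetical helper: build a user record from scraped Wikidot data,
# synthesizing a @wikidot.invalid address when no real email is available
# (.invalid is a reserved TLD, so such addresses can never receive mail).

def make_user(username, email=None, wikidot_id=None):
    return {
        "username": username,
        "email": email or f"{username}@wikidot.invalid",
        # Fall back to the username if Wikidot has no separate unique ID.
        "id": wikidot_id or username,
    }

print(make_user("papergeek"))
```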

Picking a user at random, the information at http://www.wikidot.com/user:info/papergeek is sparse. As a logged-in user, do you have more information?

Then for posts, I need posts and their associated comments.

For a post, I need:

  • A unique ID for the post itself (could be the URL)
  • The title of the post
  • The date (ISO timestamp) on which it was created
  • The id of the author (whatever we decide above)
  • The text of the post, which I need to get into markdown (which includes simple HTML) or bbcode
  • List of any images to attach to the end — though I think if we just use links to images in wikidot, the system will auto-download the images, so I don’t think we need to copy them over the way I did with G+
  • If you have sub-categories within the forum, some way of indicating that so I can convert them to tags
  • A list of comments, each of which has all the attributes of a post except for comments. (I am not aware of an ability to preserve threading, so just in linear order, I guess)
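As a sketch of what assembling one record with those fields could look like in code (the field names and the `make_post` helper are illustrative, not a fixed schema; the timestamp format matches the minutes-precision ISO style used in the example below):

```python
import json
from datetime import datetime

# Hypothetical helper: build one post record with the attributes listed
# above, and fail early if the timestamp doesn't parse.

def make_post(post_id, title, date, author, text, tags=(), comments=()):
    post = {
        "id": post_id,               # unique ID, e.g. the post URL
        "title": title,
        "date": date,                # ISO timestamp, e.g. "2015-01-12T21:25Z"
        "author": author,            # whatever author ID we decided on
        "text": text,                # markdown/simple HTML or bbcode
        "tags": list(tags),          # sub-categories mapped to tags, if any
        "comments": list(comments),  # linear order; threading not preserved
    }
    # Sanity-check the timestamp (minutes precision, literal Z suffix).
    datetime.strptime(post["date"], "%Y-%m-%dT%H:%MZ")
    return post

record = make_post("post-content-2205295", "External drivers",
                   "2015-01-12T21:25Z", "harry11733", "<p>...</p>")
print(json.dumps(record, indent=2))
```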

Yep, I did the Wikidot export this afternoon, and it gives me the wiki pages but zero forum data.

About crawling the forums: I can do the script, but because of how Wikidot sets things up, I don’t have access to any user data, so for example I won’t have user emails, etc. Pretty much, I’m going to write a crawler that gets anything that’s public, nothing more.

I think I can get all the information you are requesting for the posts, though. I don’t know when I’ll have time to work on the crawler code, but you’ll know when I do :slight_smile:

If you can show me the exact JSON structure for the export, I can just follow that; I think that’s what would be the least work for you. Just give me example JSON data (from another export/import, for example?). wolf.arthur@gmail.com

Cheers, and thanks again.

Well, that’s… awesome. :grimacing:

Moved the rest of the conversation to email.

If someone here other than @Arthur_Wolf is interested in doing the work to get me data sooner, PM me and I’ll help you know what to do. I have it all written down in a long email. Short version: if you know, or want to learn, Selenium, it’s probably a few hours’ work.

Took a look at Selenium… way above my pay grade… lol. But if there is something simple I can do, let me know.

For anyone, @Arthur_Wolf or otherwise, who is interested in helping, here are some resources for scraping JavaScript-driven websites like Wikidot:

The Google+ importer I wrote is built around a core assumption of Google users, which isn’t a facility Wikidot provides, so it would be a false economy to try to re-use it. It will actually be easier for me to start with something simple than to work around the data not being in the same format.

Something like this would be easy to import:

{
  "users": {
    "arthurwolf": "Arthur_Wolf"
  },
  "posts": [
    {
      "id": "post-content-2205295",
      "url": "http://forum.smoothieware.org/forum/t-1081758/external-drivers",
      "title": "External drivers",
      "date": "2015-01-12T21:25Z",
      "author": "harry11733",
      "text": "<p>I plan to use some 570 oz stepper motors on a vertical mill CNC conversion. I am trying to decide what stepper motor drivers to use. It seems that the DQ860MA Stepper Motor Driver is the only one that has been reported as working with the smoothieboard.</p>
<p>I am interested in trying the digital steppers from Automation Technology, specifically the KL-5056D driver, which people have been happy with for this purpose. Do these drivers have any advantages or disadvantages relative to the DQ860MA? Will they even work with the smoothieboard? I don't really understand how these newer digital drivers work, for all I know it may be the same basic technology as the DQ860MA.</p>
<p>I am happy to pay a little more for the Automation Technology products in the anticipation of better customer support.</p>
<p>The current technology that people use for mill CNC conversions seems a bit nutty to me. They use the ethernet smooth stepper board to convert an ethernet signal to a parallel port in order to transmit the g-code from Mach 3/4 to the drivers. This seems circuitous compared to using a smoothieboard just to translate a g-code file, and much more expensive, but maybe I am missing some advantage that this other system offers.</p>",
      "comments": [
        {
          "id": "post-content-2205401",
          "date": "2015-01-13T00:35Z",
          "author": "bouni",
          "text": "<p>Hi,</p>
<p>I've tried the DQ860MA external steppers with the smoothieboard and they work without any problems so far.</p>
<p><strong>vimeo.com / 115509540</strong></p>
<p>In my opinion the KL-5056D should work as well.<br>
In the <strong>kelinginc.net / KL-5056D.pdf</strong> , page 4 figure 3 you can see how you have to wire the drivers to the ST,DIR,EN and GND pins of the smothieboard. The internal resistors are 270Ohm and calculated for 5VDC signals, as far as i know the smoothieboard outputs only 3.3V, but for my DQ860MA that was not a problem, the optocouplers get only 8mA in this case but it seems to be enough to switch them.</p>
<p>Bouni</p>"
        },
        {
          "id": "post-content-2205875",
          "date": "2015-01-13T10:21Z",
          "author": "arthurwolf",
          "text": "<blockquote>
<p>I plan to use some 570 oz stepper motors on a vertical mill CNC conversion. I am trying to decide what stepper motor drivers to use. It seems that the DQ860MA Stepper Motor Driver is the only one that has been reported as working with the smoothieboard.</p>
</blockquote>
<p>Pretty much any external driver with a step/direction interface will work with Smoothieboard.</p>
<p>In some cases the driver will want 5V input, and Smoothieboard outputs 3.3V, but generally the drivers are fine with 3.3V even if rated at 5V. If 3.3V is not sufficent it's trivial to use a level shifter to bump the 3.3V up to 5V.</p>
<p>So generally, the vast majority of external drivers work out of the box with Smoothieboard.</p>
<p>I personally use the CW5045&nbsp;and am pretty happy with it.</p>
<blockquote>
<p>I am interested in trying the digital steppers from Automation Technology, specifically the KL-5056D driver, which people have been happy with for this purpose. Do these drivers have any advantages or disadvantages relative to the DQ860MA? Will they even work with the smoothieboard?</p>
</blockquote>
<p>Yes.<br>
All of those drivers are very similar, and work with Smoothie.</p>
<p>Pretty much, if you read DIR+ DIR- EN+ EN- PUL+ PUL- on it, you know it'll work with Smoothieboard.</p>
<blockquote>
<p>The current technology that people use for mill CNC conversions seems a bit nutty to me. They use the ethernet smooth stepper board to convert an ethernet signal to a parallel port in order to transmit the g-code from Mach 3/4 to the drivers.</p>
</blockquote>
<p>Yeah that's just a relic of the 80s :)</p>
<p>Smoothie is the modern way to do it :)</p>"
        }
      ]
    }
  ]
}

The user mapping to a string means to attach those posts to an existing makerforums user. You can fill those in where you know the mapping; otherwise I’ll just create a new user whenever I need to. Those won’t give people magic edit rights like imported G+ posts do, but it’s a support tool for @Arthur_Wolf’s forum, and it’s the best I can do. I think that referencing the original URL and author, inasmuch as we have that information, would comply with the license terms posted there; at least, that’s my intent.

The URL I put in the example shows the source page that I used to create the example. The ID I put in there is from what they put on the div, and Discourse imports really like unique IDs from the source system.
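For anyone writing the crawler, a minimal stdlib sketch of pulling those div IDs out of a page might look like this. The `post-content-` prefix is taken from the example above, and the sample HTML here is made up for illustration; a real run would feed in the fetched page source instead.

```python
from html.parser import HTMLParser

# Sketch: collect the per-post div ids (e.g. "post-content-2205295"),
# since Discourse imports want a stable unique ID per source post.

class PostIdScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        div_id = dict(attrs).get("id", "")
        if tag == "div" and div_id.startswith("post-content-"):
            self.ids.append(div_id)

# Made-up sample standing in for a fetched forum page.
sample = '<div id="post-content-2205295"><p>hi</p></div>'
scraper = PostIdScraper()
scraper.feed(sample)
print(scraper.ids)  # ['post-content-2205295']
```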

This doesn’t actually look hard to do with Selenium. I think I’m 80% of the way there after about 30 minutes of poking around.

Show off :slight_smile:

I should have been clear: that was only the scraper, not the importer, and of course the last 20% of the scraper will take the other 80% of the time! Like my script breaking a long way in because of how Wikidot handles deleted users, and having to start over.

I discovered html2markdown, so the posts will look better here.

I don’t see a trivial way to preserve threading, but since Discourse hides the threading for the most part, I’m not going to worry about that.


@Arthur_Wolf, do you want to try to map at least some well-known users from the old forum here? I can do that slightly more easily before doing the import than afterward.

Unless something unexpected happens, I expect to complete the import today. You will of course own the arthurwolf posts, and if I happen to see any users who obviously map to users here I’ll map them. Here’s where I’m tracking the user maps:

That’s all I could confidently map. Everyone else will have abandoned made-up names ending with _SW. Sorry.


I finished a complete run of the scraper, and it looks sane at first glance:

$ ls -lh smoothie.json 
... 5.6M Apr  6 09:42 smoothie.json
$ jq '.posts | length' smoothie.json 
1256
$ jq '.users | length' smoothie.json 
970

On to the importer!

It’s been a mostly-chores Saturday, so most of the time was away from keyboard, but I am now at the point of having successful imports happening, and I have just been tweaking the display a bit. 1255 topics, with 4212 comments beyond the initial topic posts. All users noted above are mapped. The most time-consuming part of this project was visually scanning 928 users for usernames that were sufficiently meaningful to even check for users here.

This update is now underway. Users not mapped to existing users are created with a _SW suffix to indicate where they came from. They aren’t connected to any authentication.
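That mapping rule could be sketched like this; `KNOWN_USERS` and `target_username` are my own stand-ins for the real hand-built mapping table and import logic, with the one entry taken from the example earlier in the thread.

```python
# Stand-in for the hand-built table of Wikidot -> makerforums usernames.
KNOWN_USERS = {"arthurwolf": "Arthur_Wolf"}

def target_username(wikidot_name):
    """Return the forum username a scraped Wikidot user should become:
    a mapped existing account, or a new _SW-suffixed placeholder."""
    if wikidot_name in KNOWN_USERS:
        return KNOWN_USERS[wikidot_name]
    return wikidot_name + "_SW"  # new account, no authentication attached

print(target_username("arthurwolf"))  # Arthur_Wolf
print(target_username("harry11733"))  # harry11733_SW
```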

Allow me to present all the imported posts from the Wikidot forum as part of the Smoothie category.

@Arthur_Wolf, you now have a single forum containing all the Google+ and Wikidot conversations about Smoothie.

You are set to change everything to make this the official smoothie forum! :tada:


Apparently I wound up with two accounts here?

Possible to

Yup! That’s the default for any import unless there’s actually a key to join them together. The duplicate wasn’t from the Smoothie import, though; those duplicates have “SW” in the name.