https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior
https://archive.org/details/archiveteam-warrior-v3-20171013
https://archive.org/download/archiveteam-warrior-v3-20171013
https://archive.org/download/archiveteam-warrior-v3-20171013/archiveteam-warrior-v3-20171013.ova
https://ia903000.us.archive.org/7/items/archiveteam-warrior-v3-20171013/archiveteam-warrior-v3-20171013.ova
https://github.com/ArchiveTeam/Ubuntu-Warrior/releases
https://wiki.archiveteam.org/index.php/Yahoo!_Answers
https://webirc.hackint.org/#irc://irc.hackint.org/#noanswers

NickServ 14:45:36 Welcome to HackINT, plshelp! Here on HackINT, we provide services to enable the registration of nicknames and channels! For details, type /msg NickServ help and /msg ChanServ help.
ⓘ Archiving Yahoo! Answers | Archiving has started! | Read-only 2021-04-20, shutting down 2021-05-04 | https://wiki.archiveteam.org/index.php/Yahoo!_Answers | https://tracker.archiveteam.org/yahooanswers2/
→ plshelp has joined
rewby 14:45:48 Me and HCross were looking into the numbers earlier today and we found that between 2500 and 3k is where it would start throwing errors
ben40 14:46:01 *for now
raidandfade 14:46:08 why "for now"? what's changing?
rewby 14:46:19 We're expecting the rate to go up when they go readonly
raidandfade 14:46:26 ahhhh
forkwhilefork 14:46:28 yeah that makes sense
ben40 14:46:31 Also, I hope that the engineers come back on Monday and realise they need to put more servers on the case
raidandfade 14:46:32 so you're saying we should come back around that time
rewby 14:46:35 Databases are generally quite good at reading, but since there's also a write load right now
ben40 14:46:44 I think they'll scale on Monday
rewby 14:46:53 I *doubt* they'll scale on monday But we can try
raidandfade 14:46:56 i wouldnt be too hopeful
qc 14:47:00 unless they come back monday and implement IP rate limiting
ben40 14:47:04 lol
raidandfade 14:47:06 if anything they'll implement limiting lol not scale
forkwhilefork 14:47:08 qc exactly lol
thaeli 14:47:10 yeah, i'm just hoping they don't scale _in_ on Monday
Zora 14:47:10 I doubt we'll be able to finish for a while At least until the site becomes read-only
ben40 14:47:32 If I'm honest I don't think we will finish
raidandfade 14:47:34 realistically do we have any idea how much better it'd be in RO?
ben40 14:47:37 But we can get a good chunk done
rewby 14:47:44 raidandfade: Not a clue
raidandfade 14:47:44 considering its 2250 rn
plshelp 14:47:46 I got a virtualbox error: "VT-x/AMD-V hardware acceleration is not available on your system. Your 64-bit guest will fail to detect a 64-bit CPU and will not be able to boot."
raidandfade 14:47:54 ok plshelp
rewby 14:48:09 plshelp: That sounds like a warrior problem. In which case see #warrior
raidandfade 14:48:35 i ran out of ram on one of my VMs :(
ben40 14:48:39 It sounds like the VM hasn't been set up correctly
forkwhilefork 14:48:39 poor bby
rewby 14:48:43 raidandfade: We eyeballed 2500 as the limit where 503/504s start coming in and just subtracted 10% from that for safety
raidandfade 14:48:51 yeah i figured as much for that but im not sure how much better it'll be when it goes RO since... 2250 is not a big number :)
ben40 14:49:20 I guess every so often we could bump it up to 2500 and see if the errors don't come
forkwhilefork 14:49:34 sounds like a good strategy imo
raidandfade 14:49:36 or we could remove the limit and watch yahoo burn
forkwhilefork 14:49:43 well, but then we wouldn't get the data
rewby 14:49:45 Removing the limit makes everything worse
raidandfade 14:49:51 "and watch yahoo burn"
qc 14:49:56 maybe a floating limit based on error rate?
forkwhilefork 14:49:57 that's the problem
rewby 14:50:04 If we're overloading the servers the netto amount of items we process decreases
forkwhilefork 14:50:09 we're trying to stop yahoo from burning while we copy it
raidandfade 14:50:14 yes, i was making a joke xd
ben40 14:50:15 We put the limit up to 10k earlier today and it actually made the overall yield decrease
rewby 14:50:15 Because a lot more things abort so we get less good items
forkwhilefork 14:50:20 your joke sucked raid
raidandfade 14:50:24 no y'all are too serious
ben40 14:50:42 This is literally IRC No jokes allowed
raidandfade 14:50:54 sorry father, for i have joked
forkwhilefork 14:50:58 lmaooo
Zora 14:50:59 At least it isn't Nintendo where they shut off the servers out of spite
OrIdow6 14:51:37 I think it took me about half an hour to read this backlog
plshelp 14:51:39 forkwhilefork: >poor bby
rewby 14:51:40 Yahoo has historically been good at keeping their shit up until the cutoff date IIRC
plshelp 14:51:46 r u insulting me
forkwhilefork 14:51:56 plshelp lmao no I was making fun of raid
← ambor has left (#noanswers)
rewby 14:52:22 OrIdow6: Yeeaaahhhh we may have gone slightly offtopic
ben40 14:52:34 For the person asking if I could get the warrior up on a Raspberry Pi: I just bricked it, so no
rewby 14:52:36 *slightly*
EggplantN2 14:52:48 ben40: i dont think anyone did if they do disregard We need to test it ourselves Check data content etc
ben40 14:53:00 Someone definitely did
Zora 14:53:02 The CPU will be the bottleneck
EggplantN2 14:53:03 ensure its not garbage data whoever did ignore
ben40 14:53:15 Larsenv@21:08 yeah ok probably for the better that it did brick itself
OrIdow6 14:53:41 Isn't this project already saturated with workers anyway?
rewby 14:53:42 I've already offered one of my pis if AT needs a test host. OrIdow6: Yeah, it is
Zora 14:53:50 I've had several servers crash today because of too many containers and connections going at once
forkwhilefork 14:54:52 tbh it doesn't look like any of the active AT projects need more workers at the moment
thaeli 14:55:01 load average: 208.43 yeah maybe i'll scale that one back in a tad
ben40 14:55:03 Do warriors send discovered links to the tracker where they get put in the pool and eventually picked back up by warriors? Or do they crawl their own discovered links when they find them
forkwhilefork 14:55:07 not sure what to do with all this extra server capacity
rewby 14:55:13 Sent to the tracker
ben40 14:55:17 ty
Island 14:55:27 After stepping through the lua file and combing through my firewall yet again turns out I'm just overtired and missing stuff. Firewall was catching the 23038 port needed for submission, what do you know. Seems to be submitting now
OrIdow6 14:56:45 ben40: Should be the former
ben40 14:57:09 Depending on how the todo is doing like, 10 or so days before shutdown, could we start not scraping questions with only 1 answer? Try to prioritise content I guess?
rewby 14:57:30 The question is: How do you know a question has one answer until you scrape it
forkwhilefork 14:57:31 how would we know they only have one answer? yeah
ben40 14:57:46 o you guys are smart
rewby 14:58:17 By the time you know how many answers there are, you've already scraped it and might as well package it up and rsync it off
← Island has left (Remote host closed the connection)
ben40 14:58:33 good point
rewby 14:59:22 Literally the only data we have on the question before the scrapers get to it is "what is the ID of this question" (The part with numbers and characters at the end of the URL)
ben40 14:59:29 Right
rewby 14:59:52 Plus, even a question with only one answer might discover many more questions with many more answers
← Megame has left (#noanswers)
ThreeHeadedMonkey 15:01:42 About Raspberry Pis: Maybe not helpful here, but running manually from github works on ARM for other projects
EggplantN2 15:02:04 ThreeHeadedMonkey: don't do that for now please :)
raidandfade 15:02:37 aaaand my docker broke containers are running but dont show up in the list man i love docker
ThreeHeadedMonkey 15:02:50 Only here or would that be a problem for other projects too?
rewby 15:03:01 All projects
EggplantN2 15:03:08 all for now
ThreeHeadedMonkey 15:03:25 ok
rewby 15:03:25 The grabs haven't been tested on ARM yet, so there is no guarantee they produce useful captures and not garbage data.
EggplantN2 15:03:49 wget-at is our baby and we haven't tested her on ARM
OrIdow6 15:04:20 We should probably make the preprocessor or something enforce that, then
ThreeHeadedMonkey 15:06:33 That might be a good idea, this is the first time I've heard about this :'D
Disconnected

utc now=Sat Apr 10 21:23:11 2021 UTC

https://webirc.hackint.org/#irc://irc.hackint.org/#warrior
ⓘ Thanks for NOT asking about upload or project-specific problems | Latest Warrior: https://warriorhq.archiveteam.org/downloads/warrior3/
→ plshelp has joined
ChanServ 14:48:13 [#warrior] Welcome to #warrior! Please note that this channel is publicly logged.
plshelp 14:48:29 I got a virtualbox error: "VT-x/AMD-V hardware acceleration is not available on your system. Your 64-bit guest will fail to detect a 64-bit CPU and will not be able to boot." and "This kernel requires an x86-64 CPU, but only detected an i686 CPU. Unable to boot - please use a kernel appropriate for your CPU." from running archiveteam-warrior-v3.2-20210306.ova
rewby 14:51:15 What kind of CPU do you have?
rewby 15:02:08 That error generally means that you've either not got virtualization support enabled in your BIOS or that you have a 32bit OS/CPU (which the warrior doesn't support as far as I know)
Disconnected
plshelp 15:19:04 hey So I should try enabling it in my BIOS? Will do. Control Panel\System and Security\System says I have a 64bit OS
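
Note on rewby's diagnosis at 15:02:08: the two causes he names (virtualization support not enabled, or a 32-bit CPU) can be partly told apart from the CPU feature flags. The following is a minimal sketch that assumes a Linux host exposing `/proc/cpuinfo`; it does not apply to the Windows host plshelp appears to be using, and the function name `cpu_virtualization_report` is hypothetical, not part of the Warrior or VirtualBox. The flags `lm` (64-bit long mode), `vmx` (Intel VT-x) and `svm` (AMD-V) are the standard x86 names in that file.

```python
# Hypothetical helper for illustration; not part of the Warrior or VirtualBox.
# Assumption: the text passed in has the x86 /proc/cpuinfo layout ("flags : ...").


def cpu_virtualization_report(cpuinfo_text: str) -> dict:
    """Summarise 64-bit and hardware-virtualization flags found in cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines carry the feature list on x86 Linux.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return {
        "cpu_is_64bit": "lm" in flags,  # long mode, i.e. an x86-64 capable CPU
        "has_vt_x": "vmx" in flags,     # Intel VT-x
        "has_amd_v": "svm" in flags,    # AMD-V
    }


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as handle:
            report = cpu_virtualization_report(handle.read())
    except OSError:
        print("/proc/cpuinfo is not readable here (probably not a Linux host)")
    else:
        for name, present in report.items():
            print(f"{name}: {present}")
```

If `cpu_is_64bit` is true but both virtualization flags are missing, the firmware setting is the more likely culprit. A present `vmx` or `svm` flag shows the CPU has the feature but, depending on the kernel version, may not prove it is switched on in the BIOS, so checking the firmware setup is still worthwhile.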