#1 |
Confirmed User
Join Date: Oct 2003
Location: Atlanta
Posts: 2,840
|
Shell script help needed
Can someone who knows shell scripting spot my problem? Everything appears to be correct, but it's returning no results.
This is the HTML that contains the item_id value (e.g. 55963573) I need to collect:
Code:
<a href="http://www.domain.com/vendors/cat.html?item_id=55963573" onclick="itemPlayPlop.open(this.href); return false;">
Code:
while read prodName; do
    wget -q -U Mozilla "http://www.domain.com/$prodName/" -O - \
    | tr '"' '\n' | grep "^?item_id=" | cut -d ' ' -f 4 >> itemIDs.txt
done < catNames.txt
#2 |
Confirmed User
Join Date: Aug 2009
Posts: 478
|
Check that the domain you are querying is actually returning results to you. A smart admin blocks wget requests to database and query servers to avoid certain DDoS attacks, while a smart coder sets wget's client settings to match Mozilla or another popular web browser so the requests do not look automated. Set wget to look like a browser and see if you get better results. Code looks straight. :-)
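One way to act on that advice is to test whether any response body comes back before blaming the parsing. The sketch below is only an illustration: `has_body` is a hypothetical helper name, and the two `printf` lines stand in for wget output so the example runs without network access. The script in the first post already passes `-U Mozilla`, so this checks the response itself rather than the user agent.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if stdin contains at least one
# non-empty line, i.e. the server sent back something to parse.
has_body() {
    grep -q .
}

# Stand-ins for wget output (assumption: real use would pipe wget here).
printf '<html></html>\n' | has_body && echo "got a response body"
printf '' | has_body || echo "empty response"
```

In the real loop the same check could sit behind the fetch, for example `wget -q -U Mozilla "http://www.domain.com/$prodName/" -O - | has_body || echo "empty response for $prodName" >&2`. Running wget once with `-S` instead of `-q` also prints the server's response headers, which shows whether the request was refused.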
#3 | |
Confirmed User
Join Date: Oct 2003
Location: Atlanta
Posts: 2,840
|
Quote:
Code:
for page in {1..50}
do
    wget -q -U Mozilla "http://www.domain.com/catalog_search/cat?p=$page" -O - \
    | tr '"' '\n' | grep "^Product photo for " | cut -d ' ' -f 4 >> catNames.txt
    sleep 15
done
#4 |
Damn Right I Kiss Ass!
Industry Role:
Join Date: Dec 2003
Location: Cowtown, USA
Posts: 32,409
|
No idea how to help you without the data example.
|
#5 | |
It's 42
Industry Role:
Join Date: Jun 2010
Location: Global
Posts: 18,083
|
from the manual; |
|
#6 |
Confirmed User
Join Date: Oct 2003
Location: Atlanta
Posts: 2,840
|
|
#7 |
Confirmed User
Join Date: Oct 2002
Posts: 3,745
|
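It appears one problem is that you've anchored the grep:
Code:
grep "^?item_id="
Because of the ^ anchor, only lines that begin with ?item_id= can match. After the tr, the line holding the ID begins with http://, so nothing matches. Also, remember ? is a metacharacter in extended regular expressions (grep -E or egrep), though plain grep treats it literally. You'll probably not get much more help without posting your actual code with the real URL so somebody can see what is going on. When you obfuscate things you may as well ask why this doesn't work:
Code:
some code
some more code
also code
if code
then
    do some stuff
fi < input I'm not showing you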
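For reference, here is a minimal sketch of an unanchored extraction, tested only against the sample HTML line from the first post. `extract_item_ids` is a hypothetical helper name, and swapping the grep and cut stages for a single sed expression is a choice made for this sketch, not something taken from the original script. The real pages may be laid out differently.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin, print one item_id per line.
extract_item_ids() {
    # Split on double quotes so each attribute value gets its own line,
    # then print only the digits following "?item_id=" anywhere in a line.
    # [?] matches a literal question mark in any regex flavour.
    tr '"' '\n' | sed -n 's/.*[?]item_id=\([0-9][0-9]*\).*/\1/p'
}

# Sample line copied from the first post.
sample='<a href="http://www.domain.com/vendors/cat.html?item_id=55963573" onclick="itemPlayPlop.open(this.href); return false;">'

printf '%s\n' "$sample" | extract_item_ids   # prints 55963573
```

Inside the original loop, the helper would take the place of everything after the wget call: `wget -q -U Mozilla "http://www.domain.com/$prodName/" -O - | extract_item_ids >> itemIDs.txt`.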