GoFuckYourself.com - Adult Webmaster Forum


acctman 08-06-2011 12:51 PM

Shell script help needed
 
Can someone who knows shell scripting spot my problem? Everything appears to be correct, but it's returning no results.

This is the HTML that contains the item_id value (e.g. 55963573) that I need to collect:
Code:

<a href="http://www.domain.com/vendors/cat.html?item_id=55963573"
onclick="itemPlayPlop.open(this.href); return false;">

shell script
Code:

while read prodName;
do
  wget -q -U Mozilla "http://www.domain.com/$prodName/" -O - \
  | tr '"' '\n' | grep "^?item_id=" | cut -d ' ' -f 4 >> itemIDs.txt
done < catNames.txt

thanks in advance

critical 08-06-2011 01:33 PM

Check to make sure the domain you are querying is actually returning results to you. A smart admin blocks queries from wget to db/query servers to avoid certain DDoS attacks, while a smart coder sets the client settings in wget to match those of Mozilla or another popular web browser so it does not look automated. Set wget to look like a browser and see if you get better results. The code itself looks fine.

:-)

acctman 08-06-2011 02:06 PM

Quote:

Originally Posted by critical (Post 18336495)
Check to make sure the domain you are querying is actually returning results to you. A smart admin blocks queries from wget to db/query servers to avoid certain DDoS attacks, while a smart coder sets the client settings in wget to match those of Mozilla or another popular web browser so it does not look automated. Set wget to look like a browser and see if you get better results. The code itself looks fine.

:-)

Weird, because I used similar code to get the product names:

Code:

for page in {1..50}
do
        wget -q -U Mozilla "http://www.domain.com/catalog_search/cat?p=$page" -O - \
        | tr '"' '\n' | grep "^Product photo for " | cut -d ' ' -f 4 >> catNames.txt
        sleep 15
done


V_RocKs 08-06-2011 02:16 PM

No idea how to help you without the data example.

Barry-xlovecam 08-06-2011 02:49 PM

From the manual:

Quote:

‘-U agent-string’
‘--user-agent=agent-string’
Identify as agent-string to the HTTP server.

The HTTP protocol allows the clients to identify themselves using a User-Agent header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as ‘Wget/version’, version being the current version number of Wget.

However, some sites have been known to impose the policy of tailoring the output according to the User-Agent-supplied information. While this is not such a bad idea in theory, it has been abused by servers denying information to clients other than (historically) Netscape or, more frequently, Microsoft Internet Explorer. This option allows you to change the User-Agent line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.

Specifying empty user agent with ‘--user-agent=""’ instructs Wget not to send the User-Agent header in HTTP requests.
http://www.gnu.org/software/wget/man....html#Invoking
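
One way to check whether the server is sending anything back at all is to fetch a single page with an explicit agent string and look at how many bytes arrive. This is only a rough sketch: probe_url is a made-up helper name, and the default agent string is a placeholder assumption, not something the site is known to require.

```shell
# Hypothetical helper: fetch a URL with a given User-Agent and report the body size.
# Usage: probe_url URL [AGENT]
probe_url() {
  url=$1
  agent=${2:-Mozilla/5.0}   # assumed placeholder agent string
  # -q: quiet, -O -: write the body to stdout, --user-agent: the option quoted above
  bytes=$(wget -q --user-agent="$agent" -O - "$url" | wc -c)
  echo "$url: $bytes bytes"
}
```

Calling it as probe_url "http://www.domain.com/$prodName/" with one name from catNames.txt would show whether the page body is empty before any grep or cut is involved.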

acctman 08-06-2011 02:53 PM

Quote:

Originally Posted by V_RocKs (Post 18336559)
No idea how to help you without the data example.

This is the HTML line I'm interested in. I need to extract 55963573:
Code:

<a href="http://www.domain.com/vendors/cat.html?item_id=55963573"
onclick="itemPlayPlop.open(this.href); return false;">


raymor 08-06-2011 05:23 PM

It appears one problem is that you've anchored the grep:

Code:

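grep "^?item_id="

In your example, "?item_id" isn't at the beginning of a line: after tr splits the input on double quotes, the line starts with "http://", so with the ^ anchor nothing matches. Also, remember that ? is a metacharacter in extended regular expressions (grep -E or egrep); in grep's default basic syntax it is a literal character.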
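
A minimal sketch of an unanchored match, run against the sample line from the first post rather than the real site. extract_item_ids is a hypothetical helper name, and grep -o is a GNU/BSD extension rather than POSIX, so treat this as one possible approach. It also drops the original cut -d ' ' -f 4, which would pass the whole URL through unchanged because the line contains no spaces.

```shell
# Hypothetical helper: read HTML on stdin, print each item_id value on its own line.
extract_item_ids() {
  # No ^ anchor: the parameter sits in the middle of the URL.
  # -o prints only the matched text; cut then keeps what follows the "=".
  grep -o 'item_id=[0-9][0-9]*' | cut -d '=' -f 2
}

# Sample line taken from the first post.
sample='<a href="http://www.domain.com/vendors/cat.html?item_id=55963573"'
ids=$(printf '%s\n' "$sample" | extract_item_ids)
echo "$ids"   # prints 55963573
```

In the original loop this would replace the tr, grep and cut stages, i.e. the wget output would be piped straight into extract_item_ids and appended to itemIDs.txt.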

You'll probably not get much more help without posting your actual code with the
real URL so somebody can see what is going on. When you obfuscate things you may
as well ask why this doesn't work:

Code:

some code
  some more code
 also code
if code then
do some stuff
fi
< input I'm not showing you


