
Set the Referer HTTP header when retrieving images

Some news/blog sites refuse access to inline images when the request is not
referred from the article page (that is, when the Referer: header is not set to the article URL).

Setting the Referer header when retrieving an image should allow NewsRob to show those images.

Example feed (sorry, it's in Japanese):

http://mrss.dokoda.jp/a/http/rss.rssad.jp/rss/itmnews/1.0/news_bursts.xml

Without a Referer header, the request is redirected to an error page:

ts1@quad:/tmp
$ wget http://image.itmedia.co.jp/news/articles/0912/28/yog_konbu01.jpg
--2009-12-30 13:30:59-- http://image.itmedia.co.jp/news/articles/0912/28/yog_konbu01.jpg
Resolving image.itmedia.co.jp... 202.218.219.10, 202.218.219.9
Connecting to image.itmedia.co.jp|202.218.219.10|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://www.itmedia.co.jp/messages/referer_error.html [following]
--2009-12-30 13:30:59-- http://www.itmedia.co.jp/messages/referer_error.html
Resolving www.itmedia.co.jp... 202.218.219.9, 202.218.219.10
Reusing existing connection to image.itmedia.co.jp:80.
HTTP request sent, awaiting response... 200 OK
Length: 233 [text/html]
Saving to: `referer_error.html'

100%[======================================>] 233 --.-K/s in 0s

2009-12-30 13:30:59 (28.9 MB/s) - `referer_error.html' saved [233/233]

With Referer set to the actual article page, the JPEG image is retrieved:

ts1@quad:/tmp
$ wget --header "Referer: http://www.itmedia.co.jp/news/articles/0912/28/news044.html" http://image.itmedia.co.jp/news/articles/0912/28/yog_konbu01.jpg
--2009-12-30 13:31:02-- http://image.itmedia.co.jp/news/articles/0912/28/yog_konbu01.jpg
Resolving image.itmedia.co.jp... 202.218.219.9, 202.218.219.10
Connecting to image.itmedia.co.jp|202.218.219.9|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 51280 (50K) [image/jpeg]
Saving to: `yog_konbu01.jpg'

100%[======================================>] 51,280 --.-K/s in 0.05s

2009-12-30 13:31:02 (1.04 MB/s) - `yog_konbu01.jpg' saved [51280/51280]
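
The same request can be sketched in code. This is only a minimal illustration in Python, not NewsRob's actual implementation; the helper names `build_image_request` and `fetch_image` are hypothetical, and it assumes the client knows the article URL for each inline image it downloads.

```python
import urllib.request


def build_image_request(image_url, article_url):
    """Build a GET request for an inline image that names the article page as Referer."""
    # Assumption: the article URL is the value the image server expects in Referer.
    return urllib.request.Request(image_url, headers={"Referer": article_url})


def fetch_image(image_url, article_url, timeout=10):
    """Download the image bytes, sending the Referer header built above."""
    request = build_image_request(image_url, article_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `fetch_image` with the image URL and the article URL from the wget example above would send the same header as `wget --header "Referer: ..."`.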

    ts1 shared this idea

    6 comments

      • Mariano Kamp (Admin, newsrob) commented

        I know this might be annoying to you, but I really need a Western site as an example. I tried to go through the Japanese site, but it's really hard for me to identify anything.
        And I also think that these sites have other issues that keep them from being displayed properly. But again, I have trouble debugging them.

        Example:
        curl http://www.itmedia.co.jp/news/articles/0912/28/news044.html > outp -D header
        In the file named header you will then see:

        HTTP/1.0 200 OK
        Date: Fri, 12 Feb 2010 11:05:34 GMT
        Server: Apache
        Accept-Ranges: bytes
        Cache-Control: max-age=1800
        Expires: Fri, 12 Feb 2010 11:35:34 GMT
        Vary: Accept-Encoding,User-Agent
        Content-Type: text/html
        X-Cache: MISS from lbccl05.itmedia.co.jp
        X-Cache: MISS from lbccl01.itmedia.co.jp
        Via: 1.1 lbccl05.itmedia.co.jp:80 (squid), 1.0 lbccl01.itmedia.co.jp:80 (squid)
        Connection: close

        So the charset is not set in the Content-Type header, which means it's iso-8859-1, which is what NewsRob then uses to decode the stream and to save it.

        But in the output itself the charset is then set to shift_jis.

        <meta http-equiv="Content-Type" content="text/html; charset=shift_jis">

        So it's misconfigured, or at least not processable by NewsRob. I don't know how to do this differently. If I am to investigate this, it would be great to have a Western site that shows the same symptoms. If you don't find one, submit a new suggestion to fix itmedia.co.jp, but then there need to be enough votes for me to dedicate more time to it. This is really not about an hour or two (which I already spent, by the way).

      • ts1 commented

        Thank you for implementing this, but it is not fixed for the feeds I gave here.

      • Mariano Kamp (Admin, newsrob) commented

        Do any of the other "supporters" have a Western example site?

        The two Japanese sites are both not rendered properly on my phone, and I would like to have an example where I can fix this issue in isolation, preferably one where I can search the HTML for Western phrases ;-)

      • Mariano Kamp (Admin, newsrob) commented

        ts1, do you happen to have another example with a Latin charset? I have trouble making heads or tails of the pages, e.g. I can't decide whether an image is part of the article or a banner ad.

        Do those pages work in the Android browser?
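
The charset mismatch described in the first comment (no charset in the HTTP Content-Type header, shift_jis in the page's meta tag) could in principle be handled by falling back to the meta tag before assuming iso-8859-1. The following is a rough sketch in Python under that assumption, not a description of how NewsRob decodes pages; `header_charset` and `detect_charset` are hypothetical names, and the regular expression is a simplification rather than a full HTML parser.

```python
import re

# Simplified pattern: finds charset=... inside a <meta ...> tag near the top of the page.
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not content_type:
        return None
    for part in content_type.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.strip().lower() == "charset" and value:
            return value.strip().strip('"\'')
    return None


def detect_charset(content_type, body, default="iso-8859-1"):
    """Prefer the HTTP header, then a <meta> declaration, then the default."""
    charset = header_charset(content_type)
    if charset:
        return charset
    match = META_CHARSET.search(body[:4096])  # only sniff the start of the document
    if match:
        return match.group(1).decode("ascii")
    return default
```

For the headers and meta tag quoted above, `detect_charset("text/html", body)` would pick shift_jis from the page itself instead of the iso-8859-1 default.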
